Llama-3-Smaug-8B
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | Llama 2 |
| Base Model | Meta-Llama-3-8B-Instruct |
| Tensor Type | BF16 |
| Research Paper | Smaug Paper |
What is Llama-3-Smaug-8B?
Llama-3-Smaug-8B is an advanced language model developed by Abacus.AI on top of Meta's Llama 3 architecture. It applies the Smaug recipe to improve performance in real-world multi-turn conversations, showing notable gains over the base model in benchmark tests.
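For context, here is a minimal loading-and-generation sketch using the Hugging Face transformers library. It assumes the model is published under the repo id abacusai/Llama-3-Smaug-8B, that enough GPU memory is available for an 8B model in bfloat16, and that the tokenizer ships a Llama 3 chat template; check the model card for any usage details the authors recommend.

```python
# Minimal sketch (not an official example): load the model in BF16 and run one turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Llama-3-Smaug-8B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed above
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the Pythagorean theorem in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```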
Implementation Details
The model is trained on several high-quality datasets, including AQUA-RAT, Microsoft's Orca math word problems, CodeFeedback, and ShareGPT Vicuna. It applies newer techniques than its predecessor, Smaug-72B, and is optimized for both single-turn and multi-turn conversational scenarios.
- Achieves an average MT-Bench score of 8.33, up from the base model's 8.10
- Significantly improved first-turn performance (8.78 vs. 8.31)
- Consistent second-turn performance (7.89)
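To make the multi-turn numbers above concrete, the sketch below formats an MT-Bench-style two-turn exchange with the tokenizer's chat template, so the second user request is answered with the first exchange still in context. The repo id and prompts are assumptions for illustration, not taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Llama-3-Smaug-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A two-turn exchange in the format the chat template expects; the second user
# turn is generated with the first question and answer still in context.
conversation = [
    {"role": "user", "content": "Write a three-sentence summary of the water cycle."},
    {"role": "assistant", "content": "Water evaporates from oceans and lakes, condenses into clouds, and returns as precipitation that flows back to the sea."},
    {"role": "user", "content": "Now rewrite that summary for a five-year-old."},
]

prompt_ids = tokenizer.apply_chat_template(
    conversation, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

reply_ids = model.generate(prompt_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(reply_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```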
Core Capabilities
- Enhanced multi-turn conversation handling
- Mathematical problem-solving abilities
- Code-related feedback and analysis
- General instruction following and task completion
Frequently Asked Questions
Q: What makes this model unique?
The model's implementation of the Smaug recipe, combined with its optimization for multi-turn conversations, sets it apart. It shows particular strength in first-turn interactions while maintaining competitive performance in follow-up exchanges.
Q: What are the recommended use cases?
The model is well-suited for conversational applications, mathematical problem-solving, code-related tasks, and general instruction-following scenarios. Its balanced performance makes it particularly valuable for applications requiring sustained dialogue.
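As a sketch of such a sustained-dialogue application, the hypothetical chat_turn helper below appends every reply to the conversation history so each new turn is generated with full context. The helper is not part of any library, and the repo id and prompts are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Llama-3-Smaug-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def chat_turn(history, user_message, max_new_tokens=256):
    """Append a user turn, generate a reply with the full history in context, store it."""
    history.append({"role": "user", "content": user_message})
    input_ids = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=False)
    reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
print(chat_turn(history, "A train travels 120 km in 1.5 hours. What is its average speed?"))
print(chat_turn(history, "Now express that speed in metres per second."))
```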