Llama-3.1-Tulu-3-70B-SFT
| Property | Value |
|---|---|
| Parameter Count | 70.6B |
| Base Model | Llama 3.1 70B |
| License | Llama 3.1 Community License |
| Research Paper | arXiv:2411.15124 |
| Training Type | Supervised Fine-Tuning (SFT) |
What is Llama-3.1-Tulu-3-70B-SFT?
Llama-3.1-Tulu-3-70B-SFT is a state-of-the-art language model developed by the Allen Institute for AI (Ai2) as part of the Tülu 3 model family. It represents the supervised fine-tuning (SFT) stage of Ai2's comprehensive post-training recipe for creating high-performing instruction-following models. Built on Meta's Llama 3.1 70B foundation model, it was fine-tuned on a diverse mixture of publicly available, synthetic, and human-created datasets.
Implementation Details
The model was trained with a learning rate of 2e-6, an effective batch size of 128, and a maximum sequence length of 4,096 tokens, using a linear learning-rate schedule with a 0.03 warmup ratio over 2 epochs. The weights are stored in BF16 precision, and the model ships with a dedicated chat template for structured multi-turn interactions.
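To make those settings concrete, the sketch below mirrors them as a Hugging Face `TrainingArguments` object. This is a minimal sketch, not the released recipe (Ai2's actual training code lives in its open-instruct repository); the per-device batch size, gradient-accumulation split, GPU count, and output path are illustrative assumptions chosen so the effective batch size works out to 128.

```python
from transformers import TrainingArguments

# Hypothetical mirror of the reported Tülu 3 SFT hyperparameters.
# Effective batch size 128 = per_device_train_batch_size (1)
#   x gradient_accumulation_steps (16) x 8 GPUs (assumed hardware split).
args = TrainingArguments(
    output_dir="tulu3-70b-sft",        # illustrative path
    learning_rate=2e-6,                # reported learning rate
    per_device_train_batch_size=1,     # assumption
    gradient_accumulation_steps=16,    # assumption (x8 GPUs -> 128 effective)
    num_train_epochs=2,                # reported epoch count
    lr_scheduler_type="linear",        # reported schedule
    warmup_ratio=0.03,                 # reported warmup ratio
    bf16=True,                         # reported BF16 precision
)
# The 4,096-token maximum sequence length is enforced by the data pipeline
# or trainer (e.g., TRL's SFTTrainer), not by TrainingArguments itself.
```

Beyond the raw hyperparameters, the release highlights several features: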
- Comprehensive instruction-following capabilities
- Optimized for both chat and complex reasoning tasks
- Uses sum-based (rather than mean) loss accumulation during training
- Supports efficient deployment through vLLM (a serving sketch appears at the end of this page)
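A minimal inference sketch using Hugging Face Transformers and the model's bundled chat template is shown below. The repository ID follows Ai2's naming on the Hugging Face Hub; the prompt and generation budget are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Llama-3.1-Tulu-3-70B-SFT"  # Hub ID as published by Ai2
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights described above
    device_map="auto",           # shard across available GPUs
)

# The chat template wraps messages in the model's expected turn markers.
messages = [{"role": "user", "content": "What is 17 * 24?"}]  # example prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)  # illustrative budget
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```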
Core Capabilities
- Strong performance in mathematical reasoning (MATH and GSM8K benchmarks)
- Excellent safety metrics (94.4% average across 6 safety tasks)
- High accuracy in knowledge-intensive tasks (78.9% on MMLU)
- Robust code generation capabilities (92.9% pass@10 on HumanEval; see the note on pass@k after this list)
- Solid results on complex reasoning benchmarks such as BIG-Bench Hard (BBH)
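A note on reading the HumanEval figure: pass@10 is typically computed with the unbiased estimator from the original HumanEval paper (Chen et al., 2021) rather than by literally sampling exactly 10 completions. The helper below implements that standard formula; it is included for clarity and is not taken from the Tülu 3 codebase.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: completions sampled per problem
    c: completions that pass all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    # 1 - C(n-c, k) / C(n, k), expanded as a numerically stable product
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# e.g., 20 samples per task with 5 passing -> pass@10 ≈ 0.9837
print(round(pass_at_k(20, 5, 10), 4))
```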
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balanced performance across a wide range of tasks, excelling particularly in mathematical reasoning and safety while maintaining strong general capabilities. It is part of a fully open approach to model development, with released training data, code, and evaluation procedures.
Q: What are the recommended use cases?
The model is particularly well-suited for complex problem-solving tasks, including mathematical reasoning, code generation, and general instruction following. It's designed for research and educational use, with strong safety considerations built in, making it appropriate for controlled deployment in academic and research contexts.
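For the controlled deployments mentioned above, a minimal offline-inference sketch with vLLM follows. The tensor-parallel degree and sampling settings are assumptions (a 70B model in BF16 needs roughly 140 GB for weights alone, so multiple GPUs are required); the Hub ID is the same one used earlier on this page.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "allenai/Llama-3.1-Tulu-3-70B-SFT"

# Build a chat-formatted prompt with the model's own template.
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the quadratic formula."}],
    tokenize=False,
    add_generation_prompt=True,
)

# tensor_parallel_size=4 assumes four ~80 GB GPUs; adjust to your hardware.
llm = LLM(model=model_id, dtype="bfloat16", tensor_parallel_size=4)
outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=512))
print(outputs[0].outputs[0].text)
```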