Llama-3.1-Tulu-3-70B-SFT

Maintained By
allenai


Property          Value
Parameter Count   70.6B
Base Model        Llama 3.1 70B
License           Llama 3.1 Community License
Research Paper    arXiv:2411.15124
Training Type     Supervised Fine-Tuning (SFT)

What is Llama-3.1-Tulu-3-70B-SFT?

Llama-3.1-Tulu-3-70B-SFT is a state-of-the-art language model developed by the Allen Institute for AI (Ai2) as part of the Tülu 3 model family. It is the supervised fine-tuning stage of the Tülu 3 post-training recipe for building high-performing instruction-following models. Built on Meta's Llama 3.1 70B base model, it was trained on a diverse mixture of publicly available, synthetic, and human-created datasets.

Implementation Details

The model was trained with a learning rate of 2 × 10⁻⁶, an effective batch size of 128, and a maximum sequence length of 4,096 tokens, using a linear learning-rate schedule with a 0.03 warmup ratio over 2 epochs. Weights are stored in bfloat16 (BF16), and the model uses a dedicated chat template for structured multi-turn interactions.
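The schedule described above (linear warmup over the first 3% of steps, then linear decay) can be sketched as follows. This is an illustrative sketch only: the total step count is a hypothetical placeholder, not the actual Tülu 3 training-run length.

```python
def linear_schedule_lr(step, total_steps, peak_lr=2e-6, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then linear decay toward zero.

    Mirrors the schedule shape described in the text; total_steps
    here is a hypothetical value, not the real Tulu 3 step count.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from ~0 to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    # Decay linearly from the peak back to zero at the final step.
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)

# Example: with 1,000 optimizer steps, warmup covers the first 30.
lr_at_peak = linear_schedule_lr(30, 1000)   # equals peak_lr (2e-6)
lr_early = linear_schedule_lr(0, 1000)      # small warmup value
```

The same shape is what libraries such as Hugging Face Transformers produce with a "linear" scheduler plus a warmup-ratio argument.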

  • Comprehensive instruction-following capabilities
  • Optimized for both chat and complex reasoning tasks
  • Implements advanced loss accumulation techniques
  • Supports efficient deployment through vLLM
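The chat template mentioned above follows the Tülu-style role-tag convention. The sketch below is illustrative, assuming that convention; for real inference you should rely on the tokenizer's built-in template (e.g. `tokenizer.apply_chat_template` in Transformers) rather than hand-rolled formatting.

```python
def format_tulu_chat(messages, add_generation_prompt=True):
    """Render a message list in a Tulu-style chat format.

    Illustrative sketch of the <|role|> tag convention used by the
    Tulu model family; not a substitute for the official template
    shipped with the tokenizer.
    """
    parts = []
    for msg in messages:
        # Each turn is a role tag on its own line, then the content.
        parts.append(f"<|{msg['role']}|>\n{msg['content']}\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = format_tulu_chat([{"role": "user", "content": "What is 2 + 2?"}])
```

A server such as vLLM applies this template automatically when you use its chat endpoint with the model's bundled tokenizer configuration.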

Core Capabilities

  • Strong performance in mathematical reasoning (MATH and GSM8K benchmarks)
  • Excellent safety metrics (94.4% average across 6 safety tasks)
  • High accuracy in knowledge-intensive tasks (78.9% on MMLU)
  • Robust code generation capabilities (92.9% pass@10 on HumanEval)
  • Superior performance on complex reasoning suites like BIG-Bench Hard (BBH)

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its balanced performance across a wide range of tasks, particularly excelling in mathematical reasoning and safety while maintaining strong general capabilities. It is also part of a fully open approach to model development: the Tülu 3 release includes its training data, code, and recipes, with transparent training procedures and evaluation metrics.

Q: What are the recommended use cases?

The model is particularly well-suited for complex problem-solving tasks, including mathematical reasoning, code generation, and general instruction following. It's designed for research and educational use, with strong safety considerations built in, making it appropriate for controlled deployment in academic and research contexts.
