TinyLlama-1.1B-intermediate-step-1431k-3T

Maintained by: TinyLlama

Property          Value
Parameter Count   1.1B
License           Apache 2.0
Training Tokens   3 Trillion
Architecture      LLaMA-based Transformer

What is TinyLlama-1.1B-intermediate-step-1431k-3T?

TinyLlama-1.1B is a project aimed at building a compact yet capable language model by pretraining a 1.1B-parameter model on 3 trillion tokens. This checkpoint (intermediate step 1431k) is the final stage of that 3T-token run, delivering strong benchmark results for its size while keeping a small computational footprint.

Implementation Details

The model adopts the same architecture and tokenizer as Llama 2, making it highly compatible with existing Llama-based projects. It was trained using 16 A100-40G GPUs over a 90-day period, demonstrating efficient resource utilization for large-scale training.

  • Identical architecture to Llama 2 for seamless integration
  • Trained on SlimPajama-627B and StarCoder datasets
  • Optimized for both performance and memory efficiency
  • Uses F32 tensor type for computations
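Since the weights are stored as F32, a back-of-the-envelope estimate of the memory needed just to hold them is straightforward (a sketch, assuming 4 bytes per parameter for F32, 2 for FP16/BF16, and 1 for INT8; actual runtime usage adds activations and overhead):

```python
# Rough weight-memory estimate for a 1.1B-parameter model.
PARAMS = 1.1e9  # parameter count from the model card

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

for dtype, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

At F32 the weights alone come to roughly 4.1 GiB, which is why half-precision or quantized variants are the usual choice on edge hardware.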

Core Capabilities

  • HellaSwag (10-Shot): 60.31% normalized accuracy
  • Winogrande (5-shot): 59.51% accuracy
  • TruthfulQA (0-shot): 37.32% accuracy
  • MMLU (5-Shot): 26.04% accuracy
  • Efficient performance in resource-constrained environments

Frequently Asked Questions

Q: What makes this model unique?

TinyLlama stands out for delivering strong benchmark results with only 1.1B parameters, making it significantly cheaper to run than larger models while retaining useful capabilities. Its compatibility with the Llama ecosystem makes it particularly valuable for resource-constrained applications.

Q: What are the recommended use cases?

The model is ideal for applications requiring a balance between performance and computational efficiency, such as edge devices, rapid prototyping, and scenarios where larger models would be impractical. It's particularly well-suited for text generation tasks where resource constraints are a primary concern.
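Because the checkpoint shares Llama 2's architecture and tokenizer, it loads with standard Hugging Face `transformers` tooling. A minimal sketch, assuming `transformers` and `torch` are installed (imports are deferred inside the function so the snippet can be inspected without them):

```python
def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the TinyLlama checkpoint and generate a plain-text continuation."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Base (pretrained-only) model: expect free-form continuation,
    # not instruction following -- use a chat-tuned variant for that.
    print(generate("The capital of France is"))
```

Note this is a base pretrained model, so it continues text rather than following instructions; chat-style use calls for a fine-tuned variant.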
