Minitron-8B-Base

  • Model Size: 8B parameters
  • Developer: NVIDIA
  • License: NVIDIA Open Model License
  • Research Paper: arXiv:2407.14679
  • Training Period: February 2024 - June 2024

What is Minitron-8B-Base?

Minitron-8B-Base is a large language model that NVIDIA obtained by pruning and distilling the larger Nemotron-4 15B model. What makes it particularly interesting is its training efficiency: it requires up to 40x fewer training tokens than training an equivalent model from scratch, while remaining competitive with models like Mistral 7B and Gemma 7B.

Implementation Details

The model uses an embedding size of 4096, 48 attention heads, and an MLP intermediate dimension of 16384. It implements Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). A minimal loading sketch follows the list below.

  • Architecture: Transformer Decoder (auto-regressive language model)
  • Network Base: Nemotron-4
  • Training Data: 94 billion tokens
  • Input/Output: Text-based string format
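
As a minimal loading and generation sketch, assuming a transformers release with Nemotron architecture support (plus accelerate for device placement) and the nvidia/Minitron-8B-Base checkpoint on the Hugging Face Hub:

```python
# Minimal loading sketch; assumes transformers with Nemotron support,
# accelerate installed, and access to the Hugging Face checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Minitron-8B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load in bf16 to keep memory use modest
    device_map="auto",           # place layers across available devices
)

prompt = "Complete the paragraph: our solar system is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```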

Core Capabilities

  • MMLU Score: 64.5 (5-shot)
  • HellaSwag: 81.6 (zero-shot)
  • GSM8K: 54.2 (zero-shot)
  • Code Generation: 31.6 (HumanEval pass@1, zero-shot)
  • Supports multiple natural languages as well as code generation
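
One way to reproduce scores like these is EleutherAI's lm-evaluation-harness. The sketch below assumes `pip install lm-eval` (version 0.4+) and enough GPU memory for an 8B model; exact result-dictionary keys may vary by harness version:

```python
# Sketch: 5-shot MMLU evaluation via lm-evaluation-harness (assumed setup).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nvidia/Minitron-8B-Base,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"]["mmlu"])  # aggregate score; key layout may vary
```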

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its efficient training approach: it achieves performance competitive with comparably sized models, and approaches that of its larger Nemotron-4 15B teacher, while requiring significantly fewer computational resources. The pruning and distillation process results in 1.8x compute cost savings for training the entire model family; a sketch of the distillation objective follows below.
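
For intuition, distillation retrains the pruned student to match the teacher's output distribution rather than only hard next-token labels. The snippet below is an illustrative sketch of a standard logit-distillation loss, not NVIDIA's exact training code; the temperature value is an assumption:

```python
# Illustrative logit-distillation loss (generic recipe, not NVIDIA's exact
# training code). Expects logits of shape (num_tokens, vocab_size).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```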

Q: What are the recommended use cases?

The model is intended for research and development. It performs well on language understanding, general text generation, and code generation tasks (see the usage sketch below). However, as a base model it may produce toxic content and reflect societal biases, so users should review outputs before downstream use.
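
As a small illustration of the code-generation use case, the sketch below reuses the model and tokenizer objects from the loading example above; the prompt is purely illustrative:

```python
# Sketch: greedy code completion with the base model. Assumes the `model`
# and `tokenizer` objects from the earlier loading example are in scope.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```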
