Mistral-NeMo-12B-Instruct
| Property | Value |
| --- | --- |
| Parameter Count | 12 Billion |
| License | Apache 2.0 |
| Context Window | 128k tokens |
| Architecture | Transformer Decoder |
| Training Period | June 2024 - July 2024 |
What is Mistral-NeMo-12B-Instruct?
Mistral-NeMo-12B-Instruct is a large language model (LLM) developed through a collaboration between NVIDIA and Mistral AI. This 12B-parameter model offers multilingual capabilities and supports FP8 quantization without accuracy loss.
Implementation Details
The model uses an architecture of 40 layers with a model dimension of 5,120 and 32 attention heads. It employs Grouped-Query Attention (GQA) with 8 key-value heads and SwiGLU activation functions, along with rotary position embeddings using a theta value of 1M and a vocabulary of approximately 128,000 tokens.
- 40 transformer layers with a model dimension of 5,120
- 32 attention heads with a head dimension of 128
- Feed-forward (hidden) dimension of 14,336
- SwiGLU activation function
- Grouped-Query Attention with 8 KV heads (see the sketch after this list)
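To make the attention layout concrete, here is a minimal PyTorch sketch of Grouped-Query Attention with these dimensions: 32 query heads sharing 8 key/value heads, each with a head dimension of 128. It illustrates the shape bookkeeping only; the projection layers and tensor sizes are illustrative, not the model's actual implementation.

```python
import torch

# Illustrative GQA shapes; not the model's actual code.
hidden_dim = 5120   # model dimension
n_heads = 32        # query heads
n_kv_heads = 8      # key/value heads (GQA)
head_dim = 128      # per-head dimension

# Queries project to all 32 heads, keys/values to only 8.
q_proj = torch.nn.Linear(hidden_dim, n_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(hidden_dim, n_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(hidden_dim, n_kv_heads * head_dim, bias=False)

x = torch.randn(1, 16, hidden_dim)                 # (batch, seq, dim)
q = q_proj(x).view(1, 16, n_heads, head_dim)
k = k_proj(x).view(1, 16, n_kv_heads, head_dim)
v = v_proj(x).view(1, 16, n_kv_heads, head_dim)

# Each group of 4 query heads (32 / 8) shares one KV head.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=2)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=2)

attn = torch.nn.functional.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
)
print(attn.shape)  # torch.Size([1, 32, 16, 128])
```

Sharing KV heads across query-head groups shrinks the key/value cache, which is what makes the 128k context window practical to serve.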
Core Capabilities
- Multilingual support with emphasis on English language tasks
- 128k context window for handling long-form content
- FP8 quantization support for efficient deployment
- Strong performance metrics (MT Bench: 7.84, MixEval Hard: 0.534)
- Customizable through NeMo Framework tools
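As a usage illustration, the snippet below loads the instruct checkpoint with Hugging Face Transformers and runs a short chat-style generation. The checkpoint name `mistralai/Mistral-Nemo-Instruct-2407` is assumed from the Hugging Face Hub; adjust it if your deployment uses a different path or the NeMo-format checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face Hub checkpoint name; change if yours differs.
model_id = "mistralai/Mistral-Nemo-Instruct-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the benefits of a 128k context window."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```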
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for combining a relatively compact 12B-parameter size with strong benchmark performance for its size class. Its FP8 quantization support and 128k context window make it well suited to production deployments.
Q: What are the recommended use cases?
The model is primarily designed for English language chat applications but supports multilingual tasks. It's particularly well-suited for scenarios requiring long context understanding and can be customized using NVIDIA's NeMo Framework for specific use cases through techniques like P-tuning, Adapters, and LoRA.
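The NeMo Framework provides its own PEFT workflows; as a rough illustration of the same idea, the sketch below attaches LoRA adapters using the Hugging Face PEFT library instead. The rank, alpha, and target module names are assumptions chosen for illustration, not prescribed settings.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumed Hugging Face checkpoint name, as in the earlier example.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

# Low-rank adapters on the attention projections; values are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the small adapter matrices are updated, this kind of fine-tuning fits on far less GPU memory than full-parameter training of the 12B model.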