Mistral-NeMo-12B-Instruct

Maintained By: nvidia

  • Parameter Count: 12 Billion
  • License: Apache 2.0
  • Context Window: 128k tokens
  • Architecture: Transformer Decoder
  • Training Period: June 2024 - July 2024

What is Mistral-NeMo-12B-Instruct?

Mistral-NeMo-12B-Instruct is a large language model (LLM) developed through a collaboration between NVIDIA and Mistral AI. This 12B-parameter model pairs multilingual capability with a 128k-token context window and supports FP8 quantization without accuracy loss.

Implementation Details

The model is built on a robust architecture consisting of 40 layers with a dimension of 5,120 and 32 attention heads. It utilizes Grouped-Query Attention (GQA) with 8 key-value heads and implements SwiGLU activation functions. The model employs rotary embeddings with a theta value of 1M and supports a substantial vocabulary size of approximately 128,000 tokens.

  • 40 transformer layers with 5,120-dimensional representations
  • 32 attention heads with a head dimension of 128
  • Hidden (feed-forward) dimension of 14,336
  • SwiGLU activation function
  • Grouped-Query Attention with 8 KV heads (see the configuration sketch after this list)
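
The hyperparameters above can be collected into a minimal configuration sketch, shown below as a plain Python dataclass. The field names mirror common transformers-style configs and are illustrative assumptions rather than the model's actual config schema; the vocabulary and context-window values are rounded from the "approximately 128k" figures quoted here.

```python
# Illustrative architecture sketch; field names and rounded values are assumptions.
from dataclasses import dataclass


@dataclass
class MistralNemoArchSketch:
    num_layers: int = 40             # transformer decoder layers
    hidden_size: int = 5120          # model (residual stream) dimension
    num_attention_heads: int = 32    # query heads
    head_dim: int = 128              # per-head dimension (set explicitly)
    num_kv_heads: int = 8            # GQA: KV heads shared across query heads
    ffn_hidden_size: int = 14336     # SwiGLU feed-forward hidden dimension
    rope_theta: float = 1_000_000.0  # rotary embedding base (theta = 1M)
    vocab_size: int = 128_000        # "approximately 128k" tokens (rounded)
    max_seq_len: int = 128_000       # 128k-token context window (rounded)


cfg = MistralNemoArchSketch()
# Grouped-Query Attention: each KV head serves a group of query heads.
print(cfg.num_attention_heads // cfg.num_kv_heads)  # -> 4 query heads per KV head
```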

Core Capabilities

  • Multilingual support with emphasis on English language tasks
  • 128k context window for handling long-form content (a minimal inference sketch follows this list)
  • FP8 quantization support for efficient deployment
  • Strong performance metrics (MT Bench: 7.84, MixEval Hard: 0.534)
  • Customizable through NeMo Framework tools
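
These capabilities can be exercised with standard open-source tooling. The sketch below uses Hugging Face transformers as a stand-in for NVIDIA's NeMo or TensorRT-LLM deployment paths; the model identifier (the community transformers export of the weights) and the generation settings are assumptions, and it runs in bf16 rather than FP8.

```python
# Minimal chat-inference sketch; model id and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed transformers-format export

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 here; FP8 deployment goes through NeMo/TensorRT-LLM tooling
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize the advantages of grouped-query attention."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```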

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for combining a relatively compact 12B-parameter footprint with strong benchmark performance. Its FP8 quantization support and 128k-token context window make it particularly suitable for production deployments.

Q: What are the recommended use cases?

The model is primarily designed for English language chat applications but supports multilingual tasks. It's particularly well-suited for scenarios requiring long context understanding and can be customized using NVIDIA's NeMo Framework for specific use cases through techniques like P-tuning, Adapters, and LoRA.
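
As a rough illustration of adapter-based customization, the sketch below uses Hugging Face PEFT's LoRA as a stand-in for the NeMo Framework tooling mentioned above. The model identifier and the target module names (q_proj, v_proj, following the transformers Mistral implementation) are assumptions; the NeMo Framework exposes its own LoRA, adapter, and P-tuning workflows for its native checkpoint format.

```python
# Illustrative LoRA setup via Hugging Face PEFT; identifiers and module names are assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",  # assumed transformers-format export
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,                         # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```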
