Mistral-Nemo-Instruct-2407-GGUF

Property	Value
Parameter Count	12.2B
License	Apache 2.0
Supported Languages	9 (EN, FR, DE, ES, IT, PT, RU, ZH, JA)
Context Window	128k tokens
Architecture	40 layers, 5,120 dim, 32 heads

What is Mistral-Nemo-Instruct-2407-GGUF?

Mistral-Nemo-Instruct-2407-GGUF is a quantized version of the powerful Mistral-Nemo-Instruct model, jointly developed by Mistral AI and NVIDIA. This GGUF variant maintains the original model's capabilities while offering optimized performance for deployment. It's an instruction-tuned language model that excels in multilingual tasks and code generation.

Implementation Details

The model features a sophisticated architecture with 40 transformer layers, 5,120 dimensional embeddings, and uses GQA attention with 32 heads (8 KV-heads). It implements SwiGLU activation and rotary embeddings with theta=1M, supporting an extensive vocabulary of approximately 128k tokens.

Advanced GQA (Grouped Query Attention) implementation
128k context window for handling long sequences
Multi-lingual capability with strong performance across 9 languages
Quantized format for efficient deployment

Core Capabilities

Strong performance on key benchmarks (83.5% on HellaSwag, 68.0% on MMLU)
Multilingual MMLU scores ranging from 59-64.6% across different languages
Efficient function calling and chat completion capabilities
Compatible with multiple frameworks including mistral_inference, transformers, and NeMo

Frequently Asked Questions

Q: What makes this model unique?

The model combines high multilingual performance, extensive context window, and efficient quantization in GGUF format, making it particularly suitable for production deployments while maintaining strong performance across multiple languages.

Q: What are the recommended use cases?

The model excels in multilingual applications, instruction following, chat completion, and function calling. It's particularly well-suited for applications requiring long context understanding and multilingual capabilities.