Mistral-Nemo-Instruct-2407-GGUF

Maintained By
QuantFactory

Mistral-Nemo-Instruct-2407-GGUF

PropertyValue
Parameter Count12.2B
LicenseApache 2.0
Supported Languages9 (EN, FR, DE, ES, IT, PT, RU, ZH, JA)
Context Window128k tokens
Architecture40 layers, 5,120 dim, 32 heads

What is Mistral-Nemo-Instruct-2407-GGUF?

Mistral-Nemo-Instruct-2407-GGUF is a quantized version of the powerful Mistral-Nemo-Instruct model, jointly developed by Mistral AI and NVIDIA. This GGUF variant maintains the original model's capabilities while offering optimized performance for deployment. It's an instruction-tuned language model that excels in multilingual tasks and code generation.

Implementation Details

The model features a sophisticated architecture with 40 transformer layers, 5,120 dimensional embeddings, and uses GQA attention with 32 heads (8 KV-heads). It implements SwiGLU activation and rotary embeddings with theta=1M, supporting an extensive vocabulary of approximately 128k tokens.

  • Advanced GQA (Grouped Query Attention) implementation
  • 128k context window for handling long sequences
  • Multi-lingual capability with strong performance across 9 languages
  • Quantized format for efficient deployment

Core Capabilities

  • Strong performance on key benchmarks (83.5% on HellaSwag, 68.0% on MMLU)
  • Multilingual MMLU scores ranging from 59-64.6% across different languages
  • Efficient function calling and chat completion capabilities
  • Compatible with multiple frameworks including mistral_inference, transformers, and NeMo

Frequently Asked Questions

Q: What makes this model unique?

The model combines high multilingual performance, extensive context window, and efficient quantization in GGUF format, making it particularly suitable for production deployments while maintaining strong performance across multiple languages.

Q: What are the recommended use cases?

The model excels in multilingual applications, instruction following, chat completion, and function calling. It's particularly well-suited for applications requiring long context understanding and multilingual capabilities.

The first platform built for prompt engineering