# Magnum-v4-22b
| Property | Value |
|---|---|
| Parameter Count | 22.2B |
| Model Type | Text Generation |
| Architecture | Mistral-based Transformer |
| License | MRL |
| Training Hardware | 8x H100 GPUs |
## What is magnum-v4-22b?
Magnum-v4-22b is a language model designed to emulate the prose quality of the Claude 3 models (Sonnet and Opus). Built on Mistral-Small-Instruct-2409, it was fine-tuned on six carefully curated datasets with a focus on high-quality instruction following and natural conversation.
## Implementation Details
The model underwent full-parameter fine-tuning for 2 epochs on 8x NVIDIA H100 GPUs. It uses Liger RoPE, RMS normalization, and SwiGLU activations, with a sequence length of 32,768 tokens and BF16 precision.
- Trained on six specialized datasets focused on high-quality instruction following and conversation
- Uses flash attention and gradient checkpointing for efficient training
- Implements sample packing and cosine learning rate scheduling
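The cosine learning-rate schedule mentioned above can be sketched in a few lines. This is a generic illustration (the actual peak learning rate, warmup length, and step count used for this model are not stated in the card, so the values below are placeholders):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float,
              warmup_steps: int = 0, min_lr: float = 0.0) -> float:
    """Cosine learning-rate decay with optional linear warmup,
    the schedule commonly used in full-parameter fine-tuning runs."""
    if step < warmup_steps:
        # Linear warmup from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup run completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Smoothly decay from peak_lr down to min_lr
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Placeholder hyperparameters, not the model's actual training config
schedule = [cosine_lr(s, total_steps=100, peak_lr=1e-5, warmup_steps=10)
            for s in range(100)]
```

The schedule rises linearly to the peak during warmup, then follows a half-cosine down toward the floor, which avoids the abrupt drops of step schedules.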
## Core Capabilities
- Achieves 56.29% accuracy on IFEval (0-shot)
- 35.55% normalized accuracy on BBH (3-shot)
- 17.6% exact match on MATH Level 5 (4-shot)
- 31.44% accuracy on MMLU-Pro (5-shot)
- Specialized in generating Claude 3-like prose quality
## Frequently Asked Questions
**Q: What makes this model unique?**

A: The model's unique strength lies in its ability to replicate Claude 3-like prose quality while maintaining strong performance across various benchmarks. It is specifically optimized for natural conversation and instruction following, with an extensive 32k context window.
**Q: What are the recommended use cases?**

A: The model excels in conversational AI applications, technical writing, and complex problem-solving tasks. It is particularly well-suited for applications requiring high-quality prose generation and detailed instruction following.
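For readers who want to try the model, the magnum series is typically prompted with ChatML-style formatting. The sketch below shows what that format looks like; the repository id and the exact template are assumptions, so in practice prefer `tokenizer.apply_chat_template()`, which uses the template shipped with the model:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Format a single-turn conversation in ChatML (assumed prompt format)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Summarize the plot of Hamlet in two sentences.",
)

# With transformers, the template bundled with the tokenizer should be used
# instead (repo id below is an assumption):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("anthracite-org/magnum-v4-22b")
# prompt = tokenizer.apply_chat_template(
#     [{"role": "user", "content": "Hello!"}],
#     tokenize=False, add_generation_prompt=True,
# )
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to complete, which is what `add_generation_prompt=True` does in the `transformers` path.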