# Magnum-v4-22b
| Property | Value |
|---|---|
| Parameter Count | 22.2B |
| Model Type | Text Generation |
| Architecture | Mistral-based Transformer |
| License | MRL |
| Training Hardware | 8x H100 GPUs |
## What is magnum-v4-22b?
Magnum-v4-22b is a language model designed to emulate the prose quality of the Claude 3 models (Sonnet and Opus). Built on Mistral-Small-Instruct-2409, it was fine-tuned on six carefully curated datasets with a focus on high-quality instruction following and natural conversation.
## Implementation Details
The model underwent full-parameter fine-tuning for 2 epochs on 8x NVIDIA H100 GPUs. It uses Liger RoPE, RMS normalization, and SwiGLU activations, with a sequence length of 32,768 tokens and BF16 precision.
- Trained on six specialized datasets focused on high-quality instruction following and conversation
- Uses flash attention and gradient checkpointing for efficient training
- Implements sample packing and cosine learning rate scheduling
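The cosine learning-rate schedule mentioned above can be sketched in a few lines. This is a generic illustration (the actual peak learning rate, warmup length, and step count used for this model are not stated in the card, so the values below are placeholders):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float,
              warmup_steps: int = 0, min_lr: float = 0.0) -> float:
    """Cosine learning-rate decay with optional linear warmup,
    the schedule commonly used in full-parameter fine-tuning runs."""
    if step < warmup_steps:
        # Linear warmup from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup run completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Smoothly decay from peak_lr down to min_lr
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Placeholder hyperparameters, not the model's actual training config
schedule = [cosine_lr(s, total_steps=100, peak_lr=1e-5, warmup_steps=10)
            for s in range(100)]
```

The schedule rises linearly to the peak during warmup, then follows a half-cosine down toward the floor, which avoids the abrupt drops of step schedules.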
## Core Capabilities
- Achieves 56.29% accuracy on IFEval (0-shot)
- 35.55% normalized accuracy on BBH (3-shot)
- 17.6% exact match on MATH Level 5 (4-shot)
- 31.44% accuracy on MMLU-Pro (5-shot)
- Specialized in generating Claude 3-like prose quality
## Frequently Asked Questions
**Q: What makes this model unique?**

A: The model's unique strength lies in its ability to replicate Claude 3-like prose quality while maintaining strong performance across various benchmarks. It is specifically optimized for natural conversation and instruction following, with an extensive 32k context window.
**Q: What are the recommended use cases?**

A: The model excels in conversational AI applications, technical writing, and complex problem-solving tasks. It is particularly well-suited for applications requiring high-quality prose generation and detailed instruction following.
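For readers who want to try the model, the magnum series is typically prompted with ChatML-style formatting. The sketch below shows what that format looks like; the repository id and the exact template are assumptions, so in practice prefer `tokenizer.apply_chat_template()`, which uses the template shipped with the model:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Format a single-turn conversation in ChatML (assumed prompt format)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Summarize the plot of Hamlet in two sentences.",
)

# With transformers, the template bundled with the tokenizer should be used
# instead (repo id below is an assumption):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("anthracite-org/magnum-v4-22b")
# prompt = tokenizer.apply_chat_template(
#     [{"role": "user", "content": "Hello!"}],
#     tokenize=False, add_generation_prompt=True,
# )
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to complete, which is what `add_generation_prompt=True` does in the `transformers` path.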