magnum-v4-22b

Maintained By
anthracite-org


Parameter Count: 22.2B
Model Type: Text Generation
Architecture: Mistral-based Transformer
License: MRL
Training Hardware: 8x H100 GPUs

What is magnum-v4-22b?

Magnum-v4-22b is a sophisticated language model designed to emulate the prose quality of Claude 3 models (Sonnet and Opus). Built on Mistral-Small-Instruct-2409, this model represents a significant advancement in natural language processing, trained across 6 carefully curated datasets with a focus on high-quality instruction following and natural conversation.

Implementation Details

The model underwent full-parameter fine-tuning for 2 epochs on 8x NVIDIA H100 GPUs. Training used Liger kernels for RoPE, RMS normalization, and the SwiGLU activation, with a sequence length of 32,768 tokens and BF16 precision.

  • Trained on 6 specialized datasets focusing on high-quality instruction and conversation
  • Uses flash attention and gradient checkpointing for efficient training
  • Implements sample packing and cosine learning rate scheduling
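Cosine learning rate scheduling, mentioned above, linearly warms the learning rate up to a peak and then decays it along a half cosine over the remaining steps. A minimal sketch in plain Python; the peak rate, warmup length, and step count below are illustrative placeholders, not the actual training hyperparameters:

```python
import math

def cosine_lr(step, total_steps, peak_lr, warmup_steps=0, min_lr=0.0):
    """Cosine learning-rate schedule with linear warmup."""
    if step < warmup_steps:
        # Linear ramp from ~0 up to peak_lr during warmup.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Illustrative values only.
schedule = [cosine_lr(s, total_steps=100, peak_lr=1e-5, warmup_steps=10)
            for s in range(100)]
```

In practice a trainer (e.g. the axolotl stack commonly used for such fine-tunes) computes this per optimizer step; the shape is what matters: a short ramp, then a smooth decay toward zero.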

Core Capabilities

  • Achieves 56.29% accuracy on IFEval (0-Shot)
  • 35.55% normalized accuracy on BBH (3-Shot)
  • 17.6% exact match on MATH Level 5 (4-Shot)
  • 31.44% accuracy on MMLU-PRO (5-shot)
  • Specialized in generating Claude 3-like prose quality

Frequently Asked Questions

Q: What makes this model unique?

The model's unique strength lies in its ability to replicate Claude 3-like prose quality while maintaining strong performance across various benchmarks. It's specifically optimized for natural conversation and instruction following, with an extensive 32k context window.

Q: What are the recommended use cases?

The model excels in conversational AI applications, technical writing, and complex problem-solving tasks. It's particularly well-suited for applications requiring high-quality prose generation and detailed instruction following.
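For conversational use, prompts need to follow the chat template of the Mistral-Small-Instruct-2409 base model. The sketch below hand-rolls a Mistral-style [INST] prompt to show the shape of the format; it is an assumption-laden illustration (the helper name is our own), and in practice you would call the tokenizer's apply_chat_template instead:

```python
def build_mistral_prompt(messages, system=None):
    """Flatten a chat into a Mistral-style [INST] prompt (sketch only).

    `messages` is a list of {"role": "user"|"assistant", "content": str}
    dicts in turn order. With the transformers library you would normally
    use tokenizer.apply_chat_template rather than this helper.
    """
    prompt = "<s>"
    for i, msg in enumerate(messages):
        if msg["role"] == "user":
            content = msg["content"]
            # A system prompt, if given, is prepended to the first user turn.
            if i == 0 and system:
                content = system + "\n\n" + content
            prompt += f"[INST] {content} [/INST]"
        elif msg["role"] == "assistant":
            # Completed assistant turns are closed with the EOS token.
            prompt += f" {msg['content']}</s>"
    return prompt

example = build_mistral_prompt([{"role": "user", "content": "Write a short poem."}])
```

The resulting string ends after `[/INST]`, which is where the model begins generating its reply.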
