Magnum v4 12B
| Property | Value |
|---|---|
| Parameter Count | 12.2B |
| License | Apache 2.0 |
| Architecture | Mistral-based |
| Training Hardware | 8x H100 GPUs |
| Tensor Type | BF16 |
What is magnum-v4-12b?
Magnum v4 12B is an advanced language model designed to replicate the prose quality of the Claude 3 models (Sonnet and Opus). Built on the Mistral-Nemo-Instruct-2407 base model, it was trained across 6 carefully curated datasets with a focus on high-quality instruction following and natural language generation.
Implementation Details
The model underwent full-parameter fine-tuning for 2 epochs on 8x NVIDIA H100 GPUs. Training employed gradient checkpointing, flash attention, and cosine learning rate scheduling, and was run with the Axolotl framework using the Liger plugin for kernel-level performance optimizations.
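The cosine learning rate schedule mentioned above can be sketched with the standard annealing formula; the function below is a generic illustration, not the exact hyperparameters used for this model:

```python
import math

def cosine_lr(step: int, total_steps: int, max_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate: decays smoothly from max_lr to min_lr.

    Generic formula for illustration; the actual peak/minimum learning
    rates used to train Magnum v4 12B are not stated in this card.
    """
    progress = step / total_steps  # fraction of training completed, in [0, 1]
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 this returns `max_lr`, at the final step it returns `min_lr`, and at the midpoint it is exactly halfway between the two.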
- Context Length: 32,768 tokens
- Training Framework: Axolotl with Liger Plugin optimization
- Base Model: Mistral-Nemo-Instruct-2407
- Evaluation Metrics: 33.93% on IFEval, 30.50% on BBH
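The setup above can be sketched as an Axolotl-style YAML config. This is an illustrative fragment only, limited to the settings stated in this card; it is not the actual training recipe, and anything not stated above (learning rate, batch size, datasets) is deliberately omitted:

```yaml
# Illustrative Axolotl config sketch, not the released recipe.
base_model: mistralai/Mistral-Nemo-Instruct-2407
sequence_len: 32768
bf16: true

gradient_checkpointing: true
flash_attention: true
lr_scheduler: cosine
num_epochs: 2

plugins:
  - axolotl.integrations.liger.LigerPlugin
```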
Core Capabilities
- High-quality prose generation similar to Claude 3
- Strong performance on complex reasoning tasks
- Effective instruction following with custom prompting format
- Comprehensive evaluation across multiple benchmarks including MMLU-PRO and MATH
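The custom prompting format referenced above is, for the Magnum v4 family, a ChatML-style template. A minimal sketch of assembling such a prompt follows; `build_chatml_prompt` is a hypothetical helper for illustration, not part of any library, and in practice the tokenizer's built-in chat template should be preferred:

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Assemble a ChatML-style prompt from role/content messages.

    Hypothetical helper illustrating the <|im_start|>/<|im_end|> turn
    delimiters; not an official API of this model or its tokenizer.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave the final assistant turn open so the model generates its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short scene set in a lighthouse."},
])
```

With a Hugging Face tokenizer, the equivalent is typically `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which applies the template shipped with the model.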
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its careful optimization towards Claude 3-like prose quality while maintaining strong performance across various benchmarks. It combines six specialized datasets and advanced training techniques to achieve high-quality output.
Q: What are the recommended use cases?
The model excels in conversational AI, instruction following, and complex reasoning tasks. It's particularly well-suited for applications requiring high-quality prose generation and detailed responses to complex queries.