Magnum v4 12B
| Property | Value |
|---|---|
| Parameter Count | 12.2B |
| License | Apache 2.0 |
| Architecture | Mistral-based |
| Training Hardware | 8x H100 GPUs |
| Tensor Type | BF16 |
What is magnum-v4-12b?
Magnum v4 12B is an advanced language model designed to replicate the prose quality of the Claude 3 models (Sonnet and Opus). Built on the Mistral-Nemo-Instruct-2407 base model, it was trained across 6 carefully curated datasets with a focus on high-quality instruction following and natural language generation.
Implementation Details
The model underwent full-parameter fine-tuning for 2 epochs on 8x NVIDIA H100 GPUs. Training employed gradient checkpointing, flash attention, and cosine learning rate scheduling, and was run with the Axolotl framework using the Liger plugin for kernel-level performance optimizations.
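The cosine learning rate schedule mentioned above can be sketched with the standard annealing formula; the function below is a generic illustration, not the exact hyperparameters used for this model:

```python
import math

def cosine_lr(step: int, total_steps: int, max_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate: decays smoothly from max_lr to min_lr.

    Generic formula for illustration; the actual peak/minimum learning
    rates used to train Magnum v4 12B are not stated in this card.
    """
    progress = step / total_steps  # fraction of training completed, in [0, 1]
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 this returns `max_lr`, at the final step it returns `min_lr`, and at the midpoint it is exactly halfway between the two.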
- Context Length: 32,768 tokens
- Training Framework: Axolotl with Liger Plugin optimization
- Base Model: Mistral-Nemo-Instruct-2407
- Evaluation Metrics: 33.93% on IFEval, 30.50% on BBH
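The setup above can be sketched as an Axolotl-style YAML config. This is an illustrative fragment only, limited to the settings stated in this card; it is not the actual training recipe, and anything not stated above (learning rate, batch size, datasets) is deliberately omitted:

```yaml
# Illustrative Axolotl config sketch, not the released recipe.
base_model: mistralai/Mistral-Nemo-Instruct-2407
sequence_len: 32768
bf16: true

gradient_checkpointing: true
flash_attention: true
lr_scheduler: cosine
num_epochs: 2

plugins:
  - axolotl.integrations.liger.LigerPlugin
```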
Core Capabilities
- High-quality prose generation similar to Claude 3
- Strong performance on complex reasoning tasks
- Effective instruction following with custom prompting format
- Comprehensive evaluation across multiple benchmarks including MMLU-PRO and MATH
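The custom prompting format referenced above is, for the Magnum v4 family, a ChatML-style template. A minimal sketch of assembling such a prompt follows; `build_chatml_prompt` is a hypothetical helper for illustration, not part of any library, and in practice the tokenizer's built-in chat template should be preferred:

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Assemble a ChatML-style prompt from role/content messages.

    Hypothetical helper illustrating the <|im_start|>/<|im_end|> turn
    delimiters; not an official API of this model or its tokenizer.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave the final assistant turn open so the model generates its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short scene set in a lighthouse."},
])
```

With a Hugging Face tokenizer, the equivalent is typically `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which applies the template shipped with the model.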
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its careful optimization towards Claude 3-like prose quality while maintaining strong performance across various benchmarks. It combines six specialized datasets and advanced training techniques to achieve high-quality output.
Q: What are the recommended use cases?
The model excels in conversational AI, instruction following, and complex reasoning tasks. It's particularly well-suited for applications requiring high-quality prose generation and detailed responses to complex queries.