Magnum v4 12B (magnum-v4-12b)

Maintained by anthracite-org

  • Parameter Count: 12.2B
  • License: Apache 2.0
  • Architecture: Mistral-based
  • Training Hardware: 8x H100 GPUs
  • Tensor Type: BF16

What is magnum-v4-12b?

Magnum v4 12B is an advanced language model designed to replicate the prose quality of Claude 3 models (Sonnet and Opus). Built on the Mistral-Nemo-Instruct-2407 architecture, this model represents a significant achievement in open-source AI, trained across 6 carefully curated datasets with a focus on high-quality instruction following and natural language generation.

Implementation Details

The model underwent full-parameter fine-tuning for 2 epochs on 8x NVIDIA H100 GPUs. Training employed gradient checkpointing, flash attention, and cosine learning-rate scheduling, and was run with the Axolotl framework using the Liger kernel plugin for throughput.

  • Context Length: 32,768 tokens
  • Training Framework: Axolotl with Liger Plugin optimization
  • Base Model: Mistral-Nemo-Instruct-2407
  • Evaluation Metrics: 33.93% on IFEval, 30.50% on BBH
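The cosine learning-rate schedule mentioned above decays the learning rate smoothly from its peak toward zero over training. A minimal sketch of that schedule (the warmup and minimum-LR parameters are illustrative, not taken from the model's training config):

```python
import math

def cosine_lr(step, total_steps, peak_lr, warmup_steps=0, min_lr=0.0):
    """Cosine learning-rate schedule with optional linear warmup.

    Rises linearly to peak_lr during warmup, then follows a half-cosine
    decay from peak_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 this returns the peak rate, at the midpoint half of it, and at the final step it has decayed to `min_lr`.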

Core Capabilities

  • High-quality prose generation similar to Claude 3
  • Strong performance on complex reasoning tasks
  • Effective instruction following with custom prompting format
  • Comprehensive evaluation across multiple benchmarks including MMLU-PRO and MATH
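The capabilities above mention a custom prompting format. The magnum series generally follows ChatML-style turns; as a hedged sketch assuming that convention (verify the exact special tokens against the model's tokenizer config), a prompt can be assembled like this:

```python
def build_chatml_prompt(messages):
    """Format a list of {"role", "content"} dicts as ChatML-style turns.

    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers, and a
    trailing open assistant turn cues the model to generate its reply.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet."},
])
```

In practice, applying the tokenizer's own chat template (e.g. `tokenizer.apply_chat_template` in Transformers) is preferable, since it uses the format the model was actually trained with.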

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its careful optimization towards Claude 3-like prose quality while maintaining strong performance across various benchmarks. It combines six specialized datasets and advanced training techniques to achieve high-quality output.

Q: What are the recommended use cases?

The model excels in conversational AI, instruction following, and complex reasoning tasks. It's particularly well-suited for applications requiring high-quality prose generation and detailed responses to complex queries.
