Magnum v4 123B
| Property | Value |
|---|---|
| Parameter Count | 123 billion |
| Base Model | Mistral-Large-Instruct-2407 |
| Context Length | 16,384 tokens |
| License | MRL (Mistral Research License) |
| Training Infrastructure | 8× AMD MI300X GPUs |
What is magnum-v4-123b?
Magnum v4 123B is a large language model fine-tuned to approximate the prose quality of the Claude 3 models (Sonnet and Opus). Built on Mistral-Large-Instruct-2407, it targets high-quality text generation and conversational use.
Implementation Details
The model was trained in BF16 precision using the Axolotl framework on six curated datasets, with a sequence length of 16,384 tokens. Training employed gradient checkpointing and flash attention to reduce memory use and speed up attention computation.
- Full-parameter fine-tuning on 8× AMD MI300X GPUs
- Flash attention and gradient checkpointing for memory and compute efficiency
- Sample packing and a cosine learning-rate scheduler
- adamw_bnb_8bit (8-bit AdamW) optimizer
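The cosine learning-rate scheduler mentioned above decays the learning rate along a half-cosine curve after an initial warmup. The sketch below illustrates the shape of such a schedule; the actual hyperparameters used for Magnum v4 123B (peak learning rate, warmup length, total steps) are not stated in this card, so the values here are purely illustrative.

```python
import math

def cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Cosine learning-rate schedule with linear warmup (illustrative sketch).

    Linearly ramps from ~0 to peak_lr over warmup_steps, then decays
    along a cosine curve from peak_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, with `total_steps=1000` and `warmup_steps=100`, the rate peaks at step 100 and falls back to `min_lr` by step 1000.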
Core Capabilities
- High-quality prose generation similar to Claude 3
- Extended context window of 16k tokens
- Efficient text generation with optimized architecture
- Compatible with standard chat interfaces and SillyTavern templates
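Since the model is compatible with standard chat interfaces, prompts are typically formatted with the base model's instruct template. A minimal sketch is shown below, assuming Magnum v4 123B inherits the `[INST] ... [/INST]` template from Mistral-Large-Instruct-2407; in practice, prefer the chat template bundled with the model's tokenizer (e.g. `tokenizer.apply_chat_template`) over hand-building strings.

```python
def build_mistral_prompt(turns):
    """Format a multi-turn chat as a Mistral-Instruct-style prompt string.

    `turns` is a list of (user_message, assistant_message) pairs; the
    final pair may have assistant_message=None to prompt a new reply.
    This layout is an assumption based on the Mistral base model's
    template, not a documented Magnum-specific format.
    """
    parts = ["<s>"]
    for user_msg, assistant_msg in turns:
        parts.append(f"[INST] {user_msg} [/INST]")
        if assistant_msg is not None:
            # Completed assistant turns are closed with an end-of-sequence tag.
            parts.append(f" {assistant_msg}</s>")
    return "".join(parts)
```

SillyTavern users can achieve the same layout by selecting its Mistral context/instruct templates rather than formatting prompts manually.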
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its targeted optimization toward Claude-like prose quality, combined with the robust Mistral-Large architecture and a 16k-token context window. It was trained on a diverse set of curated, high-quality datasets using full-parameter fine-tuning.
Q: What are the recommended use cases?
The model excels in conversational AI applications, creative writing, and tasks requiring high-quality prose generation. It's particularly well-suited for long-form content generation and complex dialogue interactions.