# Medius Erebus Magnum 14B
| Property | Value |
|---|---|
| Parameter Count | 14.8B |
| Base Model | Qwen2.5-14B |
| Training Framework | Axolotl v0.4.1 |
| Tensor Type | BF16 |
| Sequence Length | 32,768 tokens |
## What is medius-erebus-magnum-14b?
Medius Erebus Magnum is a large language model built on the Qwen2.5-14B architecture and fine-tuned for conversational use and general text-generation tasks. It was trained with the Axolotl framework, using flash attention and gradient checkpointing to keep long-sequence training memory-efficient.
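As a rough orientation, a checkpoint of this kind can be loaded with the Hugging Face transformers library as sketched below. The repository id is a placeholder, and flash attention requires a compatible GPU plus the flash-attn package; this is an illustrative sketch, not the card's official usage snippet.

```python
# Minimal loading sketch (assumes transformers >= 4.37, accelerate, and a CUDA GPU).
# The repo id is hypothetical; substitute the actual model repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/medius-erebus-magnum-14b"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                 # the card lists BF16 tensors
    attn_implementation="flash_attention_2",    # needs flash-attn installed
    device_map="auto",                          # needs accelerate installed
)
```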
## Implementation Details
The model was trained on a mix of specialized datasets, including conversation logs and instructional data. Training ran on an 8-GPU distributed setup with the AdamW optimizer and a cosine learning rate schedule, and used flash attention plus Unsloth gradient checkpointing to improve memory efficiency and throughput.
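In plain PyTorch terms, the optimizer and schedule described here (with the learning rate and epoch count listed below) amount to roughly the following. Axolotl drives this from its YAML config rather than hand-written code; the step and warmup counts in the sketch are assumptions, not values from the card.

```python
# Rough sketch of the reported AdamW + cosine-schedule setup, reusing `model`
# from the loading sketch above. steps_per_epoch and num_warmup_steps are
# illustrative assumptions; only the learning rate and epoch count come from the card.
import torch
from transformers import get_cosine_schedule_with_warmup

num_epochs = 2                   # from the training details below
steps_per_epoch = 1000           # assumption: depends on dataset size and batch size
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-6)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,                          # assumption
    num_training_steps=num_epochs * steps_per_epoch,
)
```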
- Trained with a learning rate of 8e-06 over 2 epochs
- Implements flash attention and specialized RoPE adaptations
- Uses the ChatML template for conversation formatting (see the sketch after this list)
- Supports sequence lengths up to 32,768 tokens
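Since the card indicates ChatML formatting, conversations can be built with the tokenizer's built-in chat template, assuming the ChatML template ships in the tokenizer config. The sketch below reuses the placeholder `model` and `tokenizer` from the loading example above.

```python
# ChatML-style prompting sketch. apply_chat_template inserts the
# <|im_start|>/<|im_end|> markers if the tokenizer defines a ChatML chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant header so the model continues
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```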
## Core Capabilities
- Advanced text generation and completion
- Optimized for conversational interactions
- Extended context window handling
- Efficient processing through specialized attention mechanisms
## Frequently Asked Questions
Q: What makes this model unique?
The model combines the Qwen2.5-14B architecture with training-time optimizations, including the Liger plugin for faster, more memory-efficient training, and supports a 32k-token context window.
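If you want similar Liger kernels when further fine-tuning a Qwen2-family checkpoint, the liger-kernel project exposes per-architecture patch helpers. The call below is a hedged sketch: the helper name and its defaults depend on the installed liger-kernel version, and this is not necessarily the exact setup used for this model.

```python
# Hedged sketch: patch Qwen2-family modules with Liger's fused kernels before
# instantiating the model. Helper availability varies by liger-kernel version.
import torch
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_qwen2()  # must run before the model is created

model = AutoModelForCausalLM.from_pretrained(
    "your-org/medius-erebus-magnum-14b",   # placeholder repo id
    torch_dtype=torch.bfloat16,
)
```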
Q: What are the recommended use cases?
This model is particularly well-suited for conversational AI applications, text generation tasks, and scenarios requiring long-context understanding. It's optimized for both general-purpose dialogue and specialized instruction-following tasks.