# Medius Erebus Magnum 14B
| Property | Value |
|---|---|
| Parameter Count | 14.8B |
| Base Model | Qwen2.5-14B |
| Training Framework | Axolotl v0.4.1 |
| Tensor Type | BF16 |
| Sequence Length | 32,768 tokens |
## What is medius-erebus-magnum-14b?
Medius Erebus Magnum is a large language model built on the Qwen2.5-14B architecture and fine-tuned for conversational use and general text-generation tasks. It was trained with the Axolotl framework, using flash attention and gradient checkpointing to keep long-sequence training memory-efficient.
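As a rough orientation, a checkpoint of this kind can be loaded with the Hugging Face transformers library as sketched below. The repository id is a placeholder, and flash attention requires a compatible GPU plus the flash-attn package; this is an illustrative sketch, not the card's official usage snippet.

```python
# Minimal loading sketch (assumes transformers >= 4.37, accelerate, and a CUDA GPU).
# The repo id is hypothetical; substitute the actual model repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/medius-erebus-magnum-14b"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                 # the card lists BF16 tensors
    attn_implementation="flash_attention_2",    # needs flash-attn installed
    device_map="auto",                          # needs accelerate installed
)
```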
## Implementation Details
The model was trained on a mix of specialized datasets, including conversation logs and instructional data. Training ran on an 8-GPU distributed setup with the AdamW optimizer and a cosine learning rate schedule, and used flash attention plus Unsloth gradient checkpointing to improve memory efficiency and throughput.
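In plain PyTorch terms, the optimizer and schedule described here (with the learning rate and epoch count listed below) amount to roughly the following. Axolotl drives this from its YAML config rather than hand-written code; the step and warmup counts in the sketch are assumptions, not values from the card.

```python
# Rough sketch of the reported AdamW + cosine-schedule setup, reusing `model`
# from the loading sketch above. steps_per_epoch and num_warmup_steps are
# illustrative assumptions; only the learning rate and epoch count come from the card.
import torch
from transformers import get_cosine_schedule_with_warmup

num_epochs = 2                   # from the training details below
steps_per_epoch = 1000           # assumption: depends on dataset size and batch size
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-6)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,                          # assumption
    num_training_steps=num_epochs * steps_per_epoch,
)
```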
- Trained with a learning rate of 8e-06 over 2 epochs
- Implements flash attention and specialized RoPE adaptations
- Uses the ChatML template for conversation formatting (see the sketch after this list)
- Supports sequence lengths up to 32,768 tokens
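Since the card indicates ChatML formatting, conversations can be built with the tokenizer's built-in chat template, assuming the ChatML template ships in the tokenizer config. The sketch below reuses the placeholder `model` and `tokenizer` from the loading example above.

```python
# ChatML-style prompting sketch. apply_chat_template inserts the
# <|im_start|>/<|im_end|> markers if the tokenizer defines a ChatML chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant header so the model continues
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```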
## Core Capabilities
- Advanced text generation and completion
- Optimized for conversational interactions
- Extended context window handling
- Efficient processing through specialized attention mechanisms
## Frequently Asked Questions
Q: What makes this model unique?
The model combines the Qwen2.5-14B architecture with training-time optimizations, including the Liger plugin for faster, more memory-efficient training, and supports a 32k-token context window.
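If you want similar Liger kernels when further fine-tuning a Qwen2-family checkpoint, the liger-kernel project exposes per-architecture patch helpers. The call below is a hedged sketch: the helper name and its defaults depend on the installed liger-kernel version, and this is not necessarily the exact setup used for this model.

```python
# Hedged sketch: patch Qwen2-family modules with Liger's fused kernels before
# instantiating the model. Helper availability varies by liger-kernel version.
import torch
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_qwen2()  # must run before the model is created

model = AutoModelForCausalLM.from_pretrained(
    "your-org/medius-erebus-magnum-14b",   # placeholder repo id
    torch_dtype=torch.bfloat16,
)
```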
Q: What are the recommended use cases?
This model is particularly well-suited for conversational AI applications, text generation tasks, and scenarios requiring long-context understanding. It's optimized for both general-purpose dialogue and specialized instruction-following tasks.