
Maintained By: underwoods

Medius Erebus Magnum 14B

Parameter Count: 14.8B
Base Model: Qwen2.5-14B
Training Framework: Axolotl v0.4.1
Tensor Type: BF16
Sequence Length: 32,768 tokens

What is medius-erebus-magnum-14b?

Medius Erebus Magnum is a sophisticated large language model built upon the Qwen2.5-14B architecture, specifically engineered for enhanced conversational abilities and text generation tasks. The model leverages advanced training techniques including flash attention and gradient checkpointing through the Axolotl framework.
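
Because it is a Qwen2.5-based causal language model, it should load through the standard Hugging Face Transformers API. Below is a minimal sketch, assuming the repository id underwoods/medius-erebus-magnum-14b (inferred from the maintainer and model name, not stated on the card) and a GPU with enough memory for a 14.8B-parameter model in BF16:

```python
# Minimal loading sketch with Hugging Face Transformers.
# The repo id is an assumption inferred from the card's maintainer and
# model name; adjust it to the actual repository if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "underwoods/medius-erebus-magnum-14b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 tensors
    device_map="auto",
)
```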

Implementation Details

The model was trained using a comprehensive setup involving multiple specialized datasets, including conversation logs and instructional data. Training utilized an 8-GPU distributed setup with AdamW optimizer and cosine learning rate scheduling, implementing advanced features like flash attention and unsloth gradient checkpointing for improved efficiency.

  • Trained with a learning rate of 8e-06 over 2 epochs
  • Implements flash attention and specialized RoPE adaptations
  • Uses ChatML template for conversation formatting (see the prompt-formatting sketch after this list)
  • Supports sequence lengths up to 32,768 tokens
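
Since the card specifies ChatML formatting and a 32,768-token sequence length, prompts can be built with the tokenizer's bundled chat template. The sketch below reuses the model and tokenizer objects from the loading example and assumes the shipped template follows the ChatML format used in training:

```python
# ChatML-style prompt construction via the tokenizer's chat template.
# Assumes `model` and `tokenizer` from the loading example above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of a short mystery story."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header for generation
    return_tensors="pt",
).to(model.device)

# The card reports a 32,768-token sequence length; long inputs can be
# checked against that limit before generation.
assert inputs.shape[-1] <= 32768
```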

Core Capabilities

  • Advanced text generation and completion
  • Optimized for conversational interactions
  • Extended context window handling
  • Efficient processing through specialized attention mechanisms

Frequently Asked Questions

Q: What makes this model unique?

The model combines the Qwen2.5-14B architecture with specialized training optimizations, including Liger kernel plugin integrations for improved training throughput, and offers an extensive 32k-token context window.
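
The Liger integration referenced here is a training-time optimization applied through Axolotl's plugin system, not something required at inference. For readers who want similar fused kernels when running or further fine-tuning the model, the standalone liger-kernel package exposes per-architecture patchers; the sketch below uses its published Qwen2 patch function, but treat the exact API as an assumption rather than anything specified by this card:

```python
# Hedged sketch: enabling Liger fused kernels for a Qwen2-family model with
# the standalone liger-kernel package. This mirrors, but is not identical to,
# the Axolotl Liger plugin mentioned above.
import torch
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_qwen2()  # patch Qwen2 modules before loading the model

model = AutoModelForCausalLM.from_pretrained(
    "underwoods/medius-erebus-magnum-14b",  # assumed repo id (see above)
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```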

Q: What are the recommended use cases?

This model is particularly well-suited for conversational AI applications, text generation tasks, and scenarios requiring long-context understanding. It's optimized for both general-purpose dialogue and specialized instruction-following tasks.
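
For a conversational turn, generation follows the usual Transformers pattern. This sketch reuses the `model`, `tokenizer`, and ChatML-formatted `inputs` from the earlier examples; the sampling settings are illustrative defaults, not values taken from the card:

```python
# Minimal generation sketch for a single conversational turn.
output_ids = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens after the prompt.
response = tokenizer.decode(
    output_ids[0, inputs.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```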
