# SuperNova-Medius
| Property | Value |
|---|---|
| Parameter Count | 14 billion |
| Model Type | Text Generation |
| Architecture | Qwen2.5-14B-Instruct |
| License | Apache-2.0 |
| Tensor Type | BF16 |
## What is SuperNova-Medius?
SuperNova-Medius is a language model developed by Arcee.ai, built on the Qwen2.5-14B-Instruct architecture. It combines knowledge from two teacher models, Qwen2.5-72B-Instruct and Llama-3.1-405B-Instruct, through a cross-architecture distillation pipeline, delivering strong benchmark performance at a practical 14B parameter size.
## Implementation Details
The model is trained with a multi-teacher distillation process: logit distillation from Llama-3.1-405B-Instruct, cross-architecture vocabulary adaptation using mergekit-tokensurgeon, and a parallel distillation from Qwen2.5-72B-Instruct. This lets the model retain much of its teachers' capability at a fraction of their computational cost.
- Achieves 55.6% accuracy on IFEval (0-Shot)
- Scores 49.3% on BBH (3-Shot)
- Demonstrates 32.48% accuracy on MATH Level 5 (4-Shot)
- Maintains strong performance across GPQA and MMLU-Pro benchmarks
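As an illustration of the logit-distillation step described above: the standard objective (Hinton et al., 2015) minimizes the KL divergence between temperature-softened teacher and student token distributions. The NumPy sketch below, with a hypothetical `logit_distillation_loss` helper, shows this generic form; Arcee.ai's exact loss and training pipeline are not specified in this card.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last (vocabulary) axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened token distributions.

    Generic knowledge-distillation objective; the exact loss used for
    SuperNova-Medius is not documented here.
    """
    p_t = softmax(teacher_logits, temperature)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits, temperature) + 1e-12)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=-1)
    # The T^2 factor keeps gradient scale comparable across temperatures.
    return float(kl.mean() * temperature ** 2)

# Toy example: two token positions over a 4-token vocabulary.
teacher = np.array([[4.0, 1.0, 0.5, 0.2], [0.1, 3.0, 0.3, 0.2]])
student = np.array([[3.5, 1.2, 0.4, 0.3], [0.2, 2.5, 0.5, 0.1]])
loss = logit_distillation_loss(student, teacher)  # small non-negative scalar
```

In a full pipeline this term is typically mixed with the ordinary cross-entropy loss on ground-truth tokens; the mixing weight and temperature are tuning choices.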
## Core Capabilities
- Advanced instruction-following and dialogue management
- High-quality content creation across diverse domains
- Technical documentation and programming assistance
- Complex reasoning and problem-solving
- Customer support and interaction handling
## Frequently Asked Questions
**Q: What makes this model unique?**
SuperNova-Medius's uniqueness lies in its cross-architecture distillation approach, combining knowledge from two distinct model families (Qwen and Llama) into a more efficient form factor while maintaining high performance levels.
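To illustrate why cross-architecture distillation needs an adaptation step: before a Llama teacher's logits can supervise a Qwen-based student, the two tokenizers' vocabularies must be reconciled. The toy sketch below shows only the shared-token mapping idea, with made-up vocabularies and a hypothetical `align_vocabularies` helper; the real mergekit-tokensurgeon tool also handles the harder cases, such as approximating embeddings for tokens the donor vocabulary lacks.

```python
def align_vocabularies(donor_vocab, target_vocab):
    """Map target token ids to donor token ids for tokens both vocabularies share.

    Toy illustration only: tokens present in both vocabularies can reuse the
    donor's representations directly; the rest must be approximated by other
    means (e.g. from sub-token embeddings).
    """
    shared, missing = {}, []
    for token, target_id in target_vocab.items():
        if token in donor_vocab:
            shared[target_id] = donor_vocab[token]
        else:
            missing.append(token)
    return shared, missing

# Made-up vocabularies (token -> id) standing in for Llama and Qwen tokenizers.
llama_vocab = {"the": 0, "cat": 1, "sat": 2, "<s>": 3}
qwen_vocab = {"the": 0, "cat": 1, "mat": 2}
shared, missing = align_vocabularies(llama_vocab, qwen_vocab)
# "the" and "cat" map directly; "mat" has no donor counterpart.
```

Real subword vocabularies overlap only partially, which is exactly why the adaptation step matters for transferring knowledge across model families.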
**Q: What are the recommended use cases?**
The model is particularly well-suited for customer support automation, content creation, technical assistance, and programming tasks. Its balanced performance makes it ideal for organizations seeking advanced AI capabilities without the resource requirements of larger models.