# SuperNova-Medius
| Property | Value |
|---|---|
| Parameter Count | 14 billion |
| Model Type | Text Generation |
| Architecture | Qwen2.5-14B-Instruct |
| License | Apache-2.0 |
| Tensor Type | BF16 |
## What is SuperNova-Medius?
SuperNova-Medius is a language model developed by Arcee.ai, built on the Qwen2.5-14B-Instruct architecture. It combines knowledge from two teacher models, Qwen2.5-72B-Instruct and Llama-3.1-405B-Instruct, through a cross-architecture distillation pipeline, delivering strong benchmark performance at a practical 14B parameter size.
## Implementation Details
The model is trained with a multi-teacher distillation process: logit distillation from Llama-3.1-405B-Instruct, cross-architecture vocabulary adaptation using mergekit-tokensurgeon, and a parallel distillation from Qwen2.5-72B-Instruct. This lets the model retain much of its teachers' capability at a fraction of their computational cost.
- Achieves 55.6% accuracy on IFEval (0-Shot)
- Scores 49.3% on BBH (3-Shot)
- Demonstrates 32.48% accuracy on MATH Level 5 (4-Shot)
- Maintains strong performance across GPQA and MMLU-Pro benchmarks
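As an illustration of the logit-distillation step described above: the standard objective (Hinton et al., 2015) minimizes the KL divergence between temperature-softened teacher and student token distributions. The NumPy sketch below, with a hypothetical `logit_distillation_loss` helper, shows this generic form; Arcee.ai's exact loss and training pipeline are not specified in this card.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last (vocabulary) axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened token distributions.

    Generic knowledge-distillation objective; the exact loss used for
    SuperNova-Medius is not documented here.
    """
    p_t = softmax(teacher_logits, temperature)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits, temperature) + 1e-12)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=-1)
    # The T^2 factor keeps gradient scale comparable across temperatures.
    return float(kl.mean() * temperature ** 2)

# Toy example: two token positions over a 4-token vocabulary.
teacher = np.array([[4.0, 1.0, 0.5, 0.2], [0.1, 3.0, 0.3, 0.2]])
student = np.array([[3.5, 1.2, 0.4, 0.3], [0.2, 2.5, 0.5, 0.1]])
loss = logit_distillation_loss(student, teacher)  # small non-negative scalar
```

In a full pipeline this term is typically mixed with the ordinary cross-entropy loss on ground-truth tokens; the mixing weight and temperature are tuning choices.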
## Core Capabilities
- Advanced instruction-following and dialogue management
- High-quality content creation across diverse domains
- Technical documentation and programming assistance
- Complex reasoning and problem-solving
- Customer support and interaction handling
## Frequently Asked Questions
**Q: What makes this model unique?**
SuperNova-Medius's uniqueness lies in its cross-architecture distillation approach, combining knowledge from two distinct model families (Qwen and Llama) into a more efficient form factor while maintaining high performance levels.
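To illustrate why cross-architecture distillation needs an adaptation step: before a Llama teacher's logits can supervise a Qwen-based student, the two tokenizers' vocabularies must be reconciled. The toy sketch below shows only the shared-token mapping idea, with made-up vocabularies and a hypothetical `align_vocabularies` helper; the real mergekit-tokensurgeon tool also handles the harder cases, such as approximating embeddings for tokens the donor vocabulary lacks.

```python
def align_vocabularies(donor_vocab, target_vocab):
    """Map target token ids to donor token ids for tokens both vocabularies share.

    Toy illustration only: tokens present in both vocabularies can reuse the
    donor's representations directly; the rest must be approximated by other
    means (e.g. from sub-token embeddings).
    """
    shared, missing = {}, []
    for token, target_id in target_vocab.items():
        if token in donor_vocab:
            shared[target_id] = donor_vocab[token]
        else:
            missing.append(token)
    return shared, missing

# Made-up vocabularies (token -> id) standing in for Llama and Qwen tokenizers.
llama_vocab = {"the": 0, "cat": 1, "sat": 2, "<s>": 3}
qwen_vocab = {"the": 0, "cat": 1, "mat": 2}
shared, missing = align_vocabularies(llama_vocab, qwen_vocab)
# "the" and "cat" map directly; "mat" has no donor counterpart.
```

Real subword vocabularies overlap only partially, which is exactly why the adaptation step matters for transferring knowledge across model families.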
**Q: What are the recommended use cases?**
The model is particularly well-suited for customer support automation, content creation, technical assistance, and programming tasks. Its balanced performance makes it ideal for organizations seeking advanced AI capabilities without the resource requirements of larger models.