SuperNova-Medius

Maintained By
arcee-ai

SuperNova-Medius

PropertyValue
Parameter Count14 billion
Model TypeText Generation
ArchitectureQwen2.5-14B-Instruct
LicenseApache-2.0
Tensor TypeBF16

What is SuperNova-Medius?

SuperNova-Medius is an advanced language model developed by Arcee.ai that represents a significant achievement in model distillation technology. Built on the Qwen2.5-14B-Instruct architecture, it uniquely combines knowledge from both Qwen2.5-72B-Instruct and Llama-3.1-405B-Instruct through an innovative cross-architecture distillation pipeline. The model achieves impressive performance scores across various benchmarks while maintaining a practical 14B parameter size.

Implementation Details

The model employs a sophisticated multi-teacher distillation process that includes logit distillation from Llama 3.1 405B, cross-architecture adaptation using mergekit-tokensurgeon, and parallel Qwen distillation. This implementation allows the model to maintain high performance while reducing computational requirements.

  • Achieves 55.6% accuracy on IFEval (0-Shot)
  • Scores 49.3% on BBH (3-Shot)
  • Demonstrates 32.48% accuracy on MATH Level 5 (4-Shot)
  • Maintains strong performance across GPQA and MMLU-Pro benchmarks

Core Capabilities

  • Advanced instruction-following and dialogue management
  • High-quality content creation across diverse domains
  • Technical documentation and programming assistance
  • Complex reasoning and problem-solving
  • Customer support and interaction handling

Frequently Asked Questions

Q: What makes this model unique?

SuperNova-Medius's uniqueness lies in its cross-architecture distillation approach, combining knowledge from two distinct model families (Qwen and Llama) into a more efficient form factor while maintaining high performance levels.

Q: What are the recommended use cases?

The model is particularly well-suited for customer support automation, content creation, technical assistance, and programming tasks. Its balanced performance makes it ideal for organizations seeking advanced AI capabilities without the resource requirements of larger models.

The first platform built for prompt engineering