Tulu-3.1-8B-SuperNova

Maintained by: bunnycore

Property           Value
Parameter Count    8.03B
Model Type         Merged LLM
Architecture       LLaMA-based
Tensor Type        BF16
Paper              Linear Merge Paper

What is Tulu-3.1-8B-SuperNova?

Tulu-3.1-8B-SuperNova is a language model created through a linear merge of three base models: Llama-3.1-MedIT-SUN-8B, Llama-3.1-Tulu-3-8B, and Llama-3.1-SuperNova-Lite. Built with the mergekit toolkit, it combines the strengths of each model with equal weighting to create a versatile text-generation system.

Implementation Details

The model employs a linear merge with bfloat16 precision and int8 masking. Each constituent model contributes with a weight of 1.0, which the merge normalizes to equal shares, giving balanced capabilities across domains. The key configuration details are listed here, with a reproduction sketch after the list below.

  • Linear merge architecture with normalized weights
  • BFloat16 precision for optimal performance
  • Int8 masking for efficient processing
  • Equal contribution from three specialized base models
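For illustration, here is a minimal sketch of how an equal-weight linear merge like this could be reproduced with mergekit's Python interface. The repository paths, output directory, and option values are assumptions for the sketch, not the actual configuration used to build this model, and the mergekit API shown follows its published notebook example rather than this model's release.

```python
# Hypothetical sketch of an equal-weight linear merge with mergekit.
# Model repo ids, output path, and options are assumptions.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YAML = """
merge_method: linear
dtype: bfloat16
parameters:
  normalize: true      # normalize the 1.0 weights to equal shares
  int8_mask: true      # int8 masking, as described above
models:
  - model: meditsolutions/Llama-3.1-MedIT-SUN-8B   # assumed repo id
    parameters:
      weight: 1.0
  - model: allenai/Llama-3.1-Tulu-3-8B             # assumed repo id
    parameters:
      weight: 1.0
  - model: arcee-ai/Llama-3.1-SuperNova-Lite       # assumed repo id
    parameters:
      weight: 1.0
"""

# Validate the config and run the merge locally.
merge_config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))

run_merge(
    merge_config,
    out_path="./Tulu-3.1-8B-SuperNova",   # assumed output directory
    options=MergeOptions(
        cuda=torch.cuda.is_available(),   # use a GPU if one is present
        copy_tokenizer=True,              # copy a tokenizer into the output
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```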

Core Capabilities

  • Outstanding performance on IFEval with 81.94% accuracy
  • Solid performance on BBH (32.50%) and MMLU-PRO (31.27%)
  • Specialized capability in MATH problems (24.32% exact match)
  • Balanced performance across various text generation tasks

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its balanced merge of medical, general knowledge, and specialized capabilities from its base models, achieving particularly strong results on instruction-following tasks as demonstrated by its IFEval score.

Q: What are the recommended use cases?

The model is particularly well-suited for instruction-following tasks, general text generation, and specialized applications requiring medical knowledge or mathematical reasoning. It performs best in scenarios where balanced, reliable responses are needed across various domains.
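As a usage illustration, the snippet below loads the model for chat-style generation with Hugging Face transformers. The repository id bunnycore/Tulu-3.1-8B-SuperNova, the example prompt, and the generation settings are assumptions made for this sketch.

```python
# Hypothetical usage sketch with Hugging Face transformers.
# The repo id and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bunnycore/Tulu-3.1-8B-SuperNova"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 tensor type listed above
    device_map="auto",
)

# Build a chat-style prompt for an instruction-following request.
messages = [
    {"role": "user", "content": "Summarize the key symptoms of iron-deficiency anemia."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and print only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```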
