# HomerCreativeAnvita-Mix-Qw7B
| Property | Value |
|---|---|
| Parameter Count | 7.62B |
| Model Type | Merged Language Model |
| Architecture | Qwen2-based Transformer |
| Tensor Type | BF16 |
## What is HomerCreativeAnvita-Mix-Qw7B?
HomerCreativeAnvita-Mix-Qw7B is a merged language model that combines two Qwen2.5-7B base models using the SLERP (spherical linear interpolation) merge method. At the time of writing, it ranks #1 on the Open LLM Leaderboard among models up to 13B parameters and performs strongly across the leaderboard's benchmarks.
## Implementation Details
The model is built with mergekit, using a SLERP merge configuration that combines ZeroXClem/Qwen2.5-7B-HomerAnvita-NerdMix and ZeroXClem/Qwen2.5-7B-HomerCreative-Mix. The configuration applies distinct attention and MLP layer weightings across all 28 layers; a sketch of the underlying interpolation follows the list below.
- SLERP merge method with layer-wise interpolation weights
- BFloat16 precision to balance quality and memory usage
- Separate mixing ratios for attention and MLP layers
- 28-layer architecture inherited from the Qwen2.5 base models
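To make the merge method concrete, here is a minimal, self-contained sketch of SLERP applied to two weight tensors. It illustrates the math only; it is not mergekit's actual implementation, and the function name and fallback threshold are choices made for this example:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors at fraction t in [0, 1]."""
    shape, dtype = v0.shape, v0.dtype
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the tensors, treated as high-dimensional vectors.
    cos_theta = torch.dot(a / (a.norm() + eps), b / (b.norm() + eps)).clamp(-1.0, 1.0)
    theta = torch.acos(cos_theta)
    if theta.abs() < 1e-4:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        merged = (1.0 - t) * a + t * b
    else:
        # Interpolate along the arc between the two tensors instead of
        # the straight line a plain weighted average would follow.
        sin_theta = torch.sin(theta)
        w0 = torch.sin((1.0 - t) * theta) / sin_theta
        w1 = torch.sin(t * theta) / sin_theta
        merged = w0 * a + w1 * b
    return merged.reshape(shape).to(dtype)
```

In the actual configuration, the interpolation factor `t` varies per layer and per parameter group (attention vs. MLP), which is what the mixing ratios listed above control.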
## Core Capabilities
- 78.08% accuracy on IFEval (0-shot)
- 36.98% normalized accuracy on BBH (3-shot)
- 31.04% exact match on MATH Level 5 (4-shot)
- 38.28% accuracy on MMLU-PRO (5-shot)
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive feature is its SLERP merge strategy, which combines two specialized Qwen2.5 variants into a model that leads the Open LLM Leaderboard in its parameter class. The per-layer balance of attention and MLP weightings contributes to consistent performance across diverse tasks.
**Q: What are the recommended use cases?**
Given its benchmark results above, the model is well-suited to text generation, complex reasoning problems, and educational applications requiring mathematical comprehension. It performs especially well in zero-shot and few-shot scenarios; a minimal loading sketch follows.
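As a quick-start sketch, the model can be loaded through the standard transformers API. The repo id below is an assumption inferred from the parent models' namespace, so substitute the actual Hub repository:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub repo id -- replace with the model's actual repository.
model_id = "ZeroXClem/HomerCreativeAnvita-Mix-Qw7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merged checkpoint's BF16 tensor type
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain spherical linear interpolation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading in bfloat16 matches the checkpoint's native precision and roughly halves memory use relative to float32.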