Teleut-7b-GGUF
| Property | Value |
|---|---|
| Parameter Count | 7.62B |
| Base Model | Qwen/Qwen2.5-7B |
| License | Apache-2.0 |
| Format | GGUF (Quantized) |
What is Teleut-7b-GGUF?
Teleut-7b-GGUF is a quantized version of Teleut-7b, a model that aims to replicate the capabilities of Tulu 3 on the Qwen 2.5 architecture. The GGUF format makes it practical to run locally with llama.cpp-compatible tooling, and the underlying model performs strongly on reasoning and knowledge benchmarks (see Core Capabilities below).
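As a minimal sketch of local inference, assuming llama-cpp-python and a downloaded GGUF file (the file name below is a placeholder for whichever quantization level you chose):

```python
from llama_cpp import Llama

# Path is a placeholder; point it at the quantization you downloaded
# (e.g. a Q4_K_M file).
llm = Llama(
    model_path="./Teleut-7b.Q4_K_M.gguf",
    n_ctx=8192,       # matches the model's trained sequence length
    n_gpu_layers=-1,  # offload all layers to GPU if one is available
)

# create_chat_completion applies the chat template stored in the GGUF
# metadata; Teleut-7b was trained with chatml.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```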
Implementation Details
The model was trained with the Axolotl framework, using flash attention and gradient checkpointing. Training ran on 8 GPUs with a batch size of 128, the paged_ademamix_8bit optimizer, and a cosine learning-rate schedule.
- Sequence length: 8192 tokens
- Learning rate: 3.5e-06
- Training framework: Transformers 4.46.3
- Chat template: chatml
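Because the model expects the chatml template, prompts should carry the `<|im_start|>`/`<|im_end|>` turn markers. A small sketch using the Transformers tokenizer; the repository id is an assumption and should be swapped for the actual full-precision Teleut-7b repo:

```python
from transformers import AutoTokenizer

# Repo id is a placeholder for the full-precision Teleut-7b repository.
tokenizer = AutoTokenizer.from_pretrained("allura-org/Teleut-7b")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 23?"},
]

# apply_chat_template renders the chatml turn markers
# (<|im_start|>role ... <|im_end|>) and appends an assistant header
# so the model continues as the assistant.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```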
Core Capabilities
- BBH: 64.4% (3-shot, chain-of-thought)
- MMLU: 73.2% (zero-shot)
- GSM8K (mathematical reasoning): 78.5% (8-shot, chain-of-thought)
- IFEval (instruction following): 66.3%
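Numbers in this range are typically reproduced with an evaluation harness; the sketch below uses lm-evaluation-harness under stated assumptions (the repository id and task configuration are placeholders, and the figures above may come from a different eval setup):

```python
import lm_eval

# Sketch only: the repo id is a placeholder, and the scores reported
# above may have been produced with a different harness or config.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allura-org/Teleut-7b,dtype=bfloat16",
    tasks=["gsm8k"],   # matches the 8-shot CoT setting listed above
    num_fewshot=8,
)
print(results["results"]["gsm8k"])
```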
Frequently Asked Questions
Q: What makes this model unique?
The model combines the Qwen 2.5 architecture with the Tulu 3 training methodology, yielding strong results across a range of benchmarks, while GGUF quantization keeps it efficient to deploy.
Q: What are the recommended use cases?
The model excels in tasks requiring reasoning, mathematical problem-solving, and general knowledge application, making it suitable for educational applications, research, and general-purpose conversational AI systems.