Teleut-7b-GGUF

Maintained by: QuantFactory

Property          Value
----------------  ------------------
Parameter Count   7.62B
Base Model        Qwen/Qwen2.5-7B
License           Apache-2.0
Format            GGUF (Quantized)

What is Teleut-7b-GGUF?

Teleut-7b-GGUF is a quantized (GGUF) build of Teleut-7b, a model that aims to replicate the capabilities of Tulu 3 on the Qwen 2.5 architecture. It performs well across a range of benchmarks, particularly reasoning and knowledge-based tasks; see the scores under Core Capabilities below.

Implementation Details

The model was trained with the Axolotl framework, with flash attention and gradient checkpointing enabled. Training ran on 8 GPUs with an effective batch size of 128, using the paged_ademamix_8bit optimizer and a cosine learning rate schedule.

  • Sequence length: 8192 tokens
  • Learning rate: 3.5e-06
  • Training framework: Transformers 4.46.3
  • Chat template: chatml
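
Since inference-time prompts should match the training-time chat template, here is a minimal sketch of how a conversation is laid out in chatml form. The `to_chatml` helper is purely illustrative (not part of any released tooling); the special tokens follow the standard chatml convention.

```python
# Illustrative sketch: rendering a conversation in the standard chatml layout.
def to_chatml(messages: list[dict]) -> str:
    """Render a list of {role, content} dicts as a chatml prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # An open assistant turn marks where the model should continue generating.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 24?"},
])
print(prompt)
```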

Core Capabilities

  • Strong performance on BBH (64.4% with 3-shot CoT)
  • Impressive MMLU scores (73.2% zero-shot)
  • GSM8K mathematical reasoning (78.5% with 8-shot CoT)
  • Competitive IFEval performance (66.3%)
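
Benchmark numbers like these are typically measured on the full-precision checkpoint with EleutherAI's lm-evaluation-harness, not on the GGUF quant. A hedged sketch of reproducing the GSM8K 8-shot CoT score follows; the repo id `allura-org/Teleut-7b` is an assumption, so substitute the actual Hugging Face path of the full-precision model.

```python
# Hedged sketch: GSM8K 8-shot CoT with lm-evaluation-harness (pip install lm-eval).
# The pretrained repo id below is an assumption; replace it with the real one.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allura-org/Teleut-7b,dtype=bfloat16",  # assumed repo id
    tasks=["gsm8k_cot"],  # chain-of-thought GSM8K task
    num_fewshot=8,
    batch_size=8,
)
print(results["results"]["gsm8k_cot"])
```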

Frequently Asked Questions

Q: What makes this model unique?

The model pairs the Qwen 2.5 architecture with the Tulu 3 training methodology, yielding strong performance across a range of benchmarks while remaining lightweight to deploy thanks to GGUF quantization.
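
In practice, GGUF deployment starts by pulling a single quantized file rather than cloning the whole repo. A minimal sketch, assuming the repo id `QuantFactory/Teleut-7b-GGUF` and a Q4_K_M quant file name (check the repo's file list for the quantization levels actually published):

```python
# Hedged sketch: fetch one quantized GGUF file from the Hugging Face Hub.
# Both the repo id and the file name are assumptions; verify on the model page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="QuantFactory/Teleut-7b-GGUF",  # assumed repo id
    filename="Teleut-7b.Q4_K_M.gguf",       # assumed quant file name
)
print(path)  # local cache path, usable with llama.cpp or llama-cpp-python
```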

Q: What are the recommended use cases?

The model excels in tasks requiring reasoning, mathematical problem-solving, and general knowledge application, making it suitable for educational applications, research, and general-purpose conversational AI systems.
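
As a concrete example of the conversational use case, here is a minimal sketch with llama-cpp-python, whose built-in `chatml` chat format matches this model's template. The model path is a placeholder for whichever quant file you downloaded, and the sampling settings are illustrative defaults rather than official recommendations.

```python
# Minimal chat-inference sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Teleut-7b.Q4_K_M.gguf",  # placeholder: path to your quant file
    n_ctx=8192,                          # matches the training sequence length
    chat_format="chatml",                # matches the model's chat template
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a careful math tutor."},
        {"role": "user", "content": "A train covers 60 km in 45 minutes. "
                                    "What is its average speed in km/h?"},
    ],
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"])
```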
