Teleut-7b-GGUF

Maintained by: QuantFactory

Property          Value
----------------  ------------------
Parameter Count   7.62B
Base Model        Qwen/Qwen2.5-7B
License           Apache-2.0
Format            GGUF (Quantized)

What is Teleut-7b-GGUF?

Teleut-7b-GGUF is a quantized (GGUF) build of Teleut-7b, a model that aims to replicate the capabilities of Tulu 3 on the Qwen 2.5 architecture. It performs well across a range of benchmarks, particularly reasoning and knowledge-based tasks; see the scores under Core Capabilities below.

Implementation Details

The model was trained with the Axolotl framework, with flash attention and gradient checkpointing enabled. Training ran on 8 GPUs with an effective batch size of 128, using the paged_ademamix_8bit optimizer and a cosine learning rate schedule.

  • Sequence length: 8192 tokens
  • Learning rate: 3.5e-06
  • Training framework: Transformers 4.46.3
  • Chat template: chatml
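
Since inference-time prompts should match the training-time chat template, here is a minimal sketch of how a conversation is laid out in chatml form. The `to_chatml` helper is purely illustrative (not part of any released tooling); the special tokens follow the standard chatml convention.

```python
# Illustrative sketch: rendering a conversation in the standard chatml layout.
def to_chatml(messages: list[dict]) -> str:
    """Render a list of {role, content} dicts as a chatml prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # An open assistant turn marks where the model should continue generating.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 24?"},
])
print(prompt)
```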

Core Capabilities

  • Strong performance on BBH (64.4% with 3-shot CoT)
  • Impressive MMLU scores (73.2% zero-shot)
  • GSM8K mathematical reasoning (78.5% with 8-shot CoT)
  • Competitive IFEval performance (66.3%)
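
Benchmark numbers like these are typically measured on the full-precision checkpoint with EleutherAI's lm-evaluation-harness, not on the GGUF quant. A hedged sketch of reproducing the GSM8K 8-shot CoT score follows; the repo id `allura-org/Teleut-7b` is an assumption, so substitute the actual Hugging Face path of the full-precision model.

```python
# Hedged sketch: GSM8K 8-shot CoT with lm-evaluation-harness (pip install lm-eval).
# The pretrained repo id below is an assumption; replace it with the real one.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allura-org/Teleut-7b,dtype=bfloat16",  # assumed repo id
    tasks=["gsm8k_cot"],  # chain-of-thought GSM8K task
    num_fewshot=8,
    batch_size=8,
)
print(results["results"]["gsm8k_cot"])
```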

Frequently Asked Questions

Q: What makes this model unique?

The model pairs the Qwen 2.5 architecture with the Tulu 3 training methodology, yielding strong performance across a range of benchmarks while remaining lightweight to deploy thanks to GGUF quantization.
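
In practice, GGUF deployment starts by pulling a single quantized file rather than cloning the whole repo. A minimal sketch, assuming the repo id `QuantFactory/Teleut-7b-GGUF` and a Q4_K_M quant file name (check the repo's file list for the quantization levels actually published):

```python
# Hedged sketch: fetch one quantized GGUF file from the Hugging Face Hub.
# Both the repo id and the file name are assumptions; verify on the model page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="QuantFactory/Teleut-7b-GGUF",  # assumed repo id
    filename="Teleut-7b.Q4_K_M.gguf",       # assumed quant file name
)
print(path)  # local cache path, usable with llama.cpp or llama-cpp-python
```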

Q: What are the recommended use cases?

The model excels in tasks requiring reasoning, mathematical problem-solving, and general knowledge application, making it suitable for educational applications, research, and general-purpose conversational AI systems.
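
As a concrete example of the conversational use case, here is a minimal sketch with llama-cpp-python, whose built-in `chatml` chat format matches this model's template. The model path is a placeholder for whichever quant file you downloaded, and the sampling settings are illustrative defaults rather than official recommendations.

```python
# Minimal chat-inference sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Teleut-7b.Q4_K_M.gguf",  # placeholder: path to your quant file
    n_ctx=8192,                          # matches the training sequence length
    chat_format="chatml",                # matches the model's chat template
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a careful math tutor."},
        {"role": "user", "content": "A train covers 60 km in 45 minutes. "
                                    "What is its average speed in km/h?"},
    ],
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"])
```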
