# Teleut-7b
| Property | Value |
|---|---|
| Parameter Count | 7.62B |
| Base Model | Qwen/Qwen2.5-7B |
| License | Apache 2.0 |
| Training Data | allenai/tulu-3-sft-mixture |
| Precision | BF16 |
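A minimal loading sketch using Hugging Face transformers is shown below. The repository id is a placeholder (the actual hub path is not given here), and loading in BF16 follows the precision listed in the table.

```python
# Minimal loading sketch with Hugging Face transformers.
# The repo id is a placeholder assumption: substitute the actual hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Teleut-7b"  # placeholder, not the confirmed hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision in the table
    device_map="auto",           # requires accelerate; spreads layers across devices
)
```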
## What is Teleut-7b?
Teleut-7b is a replication attempt of Tulu 3 built on the Qwen 2.5 architecture. It performs strongly across standard benchmarks, particularly on BBH (64.4%) and MMLU (73.2%) in zero-shot settings.
## Implementation Details
The model was trained with the Axolotl framework using Liger kernel optimizations (the fused RoPE, RMSNorm, and GLU-activation kernels). Training ran on 8 GPUs with a global batch size of 128, using the paged_ademamix_8bit optimizer and a cosine learning-rate scheduler; key hyperparameters are listed below, with a schedule sketch after the list.
- Learning rate: 3.5e-06
- Sequence length: 8192
- Training epochs: 1
- Gradient accumulation steps: 2
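These numbers imply a per-GPU micro-batch of 8 (8 GPUs × micro-batch 8 × 2 accumulation steps = global batch 128). The sketch below reproduces the cosine schedule at the listed peak learning rate; it is illustrative only: AdamW stands in for paged_ademamix_8bit (a bitsandbytes optimizer), and the step counts are invented.

```python
# Illustrative cosine LR schedule at the card's peak LR of 3.5e-6.
# AdamW stands in for paged_ademamix_8bit so the snippet runs without bitsandbytes.
import torch
from transformers import get_cosine_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in parameters
optimizer = torch.optim.AdamW(params, lr=3.5e-6)

scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=50,      # invented for illustration
    num_training_steps=1000,  # in practice, determined by dataset size and epochs
)

for _ in range(5):            # LR warms up linearly, then decays on a cosine curve
    optimizer.step()
    scheduler.step()
print(optimizer.param_groups[0]["lr"])
```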
## Core Capabilities
- Strong performance on Big-Bench Hard (BBH) tasks with 64.4% accuracy
- Excellent MMLU performance (73.2%) in zero-shot settings
- Competitive GSM8K performance (78.5%)
- Robust instruction following capabilities (IFEval: 66.3%)
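The instruction-following capability can be exercised through the tokenizer's chat template, as in the short generation sketch below. The repository id is again a placeholder, and the snippet assumes the checkpoint ships a chat template (Qwen 2.5 derivatives typically do).

```python
# Instruction-following sketch via the tokenizer's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Teleut-7b"  # placeholder, not the confirmed hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt and generate deterministically.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```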
## Frequently Asked Questions
Q: What makes this model unique?
A: The model combines the architectural advantages of Qwen 2.5 with the training methodology of Tulu 3, delivering strong benchmark performance at a relatively compact 7.62B-parameter size.
Q: What are the recommended use cases?
A: The model is particularly well-suited for tasks requiring strong reasoning capabilities, including mathematical problem-solving, knowledge-intensive tasks, and general instruction following. It performs especially well in zero-shot and few-shot scenarios.
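Since the card highlights few-shot scenarios, here is a minimal sketch of few-shot prompt construction; the worked exemplar is invented for illustration, not drawn from any benchmark.

```python
# Few-shot prompting sketch: prepend worked exemplars so the model
# imitates the question/answer reasoning format. The exemplar is invented.
FEW_SHOT = (
    "Q: A baker fills 5 trays with 12 muffins each. How many muffins?\n"
    "A: 5 trays times 12 muffins is 60 muffins. The answer is 60.\n\n"
)

def build_prompt(question: str) -> str:
    """Return a few-shot prompt ending where the model should answer."""
    return FEW_SHOT + f"Q: {question}\nA:"

print(build_prompt("A train travels 60 km/h for 3 hours. How far does it go?"))
```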