Cosmo-1B

Maintained by: HuggingFaceTB

Property            Value
Parameter Count     1.74B
License             Apache 2.0
Architecture        LLaMA-2
Training Tokens     180B
Training Hardware   160 H100 GPUs

What is cosmo-1b?

Cosmo-1B is a lightweight language model built on the LLaMA-2 architecture and trained primarily on the Cosmopedia synthetic dataset. The model aims to balance efficiency and performance, offering strong text generation and, for its size, solid results on academic benchmarks.

Implementation Details

The model was trained on a diverse corpus of 30B tokens: 25B synthetic tokens from Cosmopedia and 5B additional tokens from other sources, including code repositories and educational content. Training ran for 6 epochs (roughly 180B tokens in total, matching the table above) on 160 H100 GPUs, with a 2k-token sequence length and a global batch size of 1.3M tokens.

  • Trained with bfloat16 precision
  • Uses the Mistral-7B-v0.1 tokenizer
  • Recommended sampling settings of temperature 0.6 with top-p 0.95 filtering (see the usage sketch after this list)
  • Features built-in chat capabilities without additional instruction tuning

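A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as HuggingFaceTB/cosmo-1b (inferred from the maintainer and model name), showing generation with the recommended sampling settings from the list above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Repo id is an assumption based on the maintainer and model name.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo-1b")
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/cosmo-1b",
    torch_dtype=torch.bfloat16,  # the model was trained in bfloat16
).to(device)

prompt = "Photosynthesis is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,  # recommended temperature
    top_p=0.95,       # recommended top-p filtering
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
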
Core Capabilities

  • Text generation and completion tasks
  • Strong performance on academic benchmarks (ARC-easy, ARC-challenge)
  • Educational content generation
  • Chat-based interactions
  • Code-aware text processing

Frequently Asked Questions

Q: What makes this model unique?

Its main strength is the combination of an efficient architecture with training largely on the Cosmopedia synthetic dataset, which lets it perform comparably to larger models on specific tasks while keeping a relatively small parameter count of 1.74B.

Q: What are the recommended use cases?

Cosmo-1B is well-suited for educational content generation, text completion tasks, and basic chat interactions. It performs particularly well on academic benchmarks and can be effectively used for both completion and instruction-following tasks.
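For chat-style interactions, a hedged sketch is shown below; it assumes the tokenizer ships a chat template, which is not confirmed above, so fall back to plain prompts if apply_chat_template raises an error:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id and the presence of a chat template are assumptions, not confirmed by this card.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo-1b")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/cosmo-1b")

messages = [{"role": "user", "content": "Write a short lesson on why the sky is blue."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=200, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```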
