Cosmo-1B

Maintained by: HuggingFaceTB

Property            Value
Parameter Count     1.74B
License             Apache 2.0
Architecture        LLaMA-2
Training Tokens     180B
Training Hardware   160 H100 GPUs

What is cosmo-1b?

Cosmo-1B is a lightweight language model built on the LLaMA-2 architecture and trained primarily on the Cosmopedia synthetic dataset. The model aims to balance efficiency and performance, offering strong text generation and, for its size, solid results on academic benchmarks.

Implementation Details

The model was trained on a diverse corpus of 30B tokens: 25B synthetic tokens from Cosmopedia and 5B additional tokens from other sources, including code repositories and educational content. Training ran for 6 epochs (roughly 180B tokens in total, matching the table above) on 160 H100 GPUs, with a 2k-token sequence length and a global batch size of 1.3M tokens.

  • Trained with bfloat16 precision
  • Uses the Mistral-7B-v0.1 tokenizer
  • Recommended sampling settings of temperature 0.6 with top-p 0.95 filtering (see the usage sketch after this list)
  • Features built-in chat capabilities without additional instruction tuning

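A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as HuggingFaceTB/cosmo-1b (inferred from the maintainer and model name), showing generation with the recommended sampling settings from the list above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Repo id is an assumption based on the maintainer and model name.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo-1b")
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/cosmo-1b",
    torch_dtype=torch.bfloat16,  # the model was trained in bfloat16
).to(device)

prompt = "Photosynthesis is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,  # recommended temperature
    top_p=0.95,       # recommended top-p filtering
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
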
Core Capabilities

  • Text generation and completion tasks
  • Strong performance on academic benchmarks (ARC-easy, ARC-challenge)
  • Educational content generation
  • Chat-based interactions
  • Code-aware text processing

Frequently Asked Questions

Q: What makes this model unique?

Its main strength is the combination of an efficient architecture with training largely on the Cosmopedia synthetic dataset, which lets it perform comparably to larger models on specific tasks while keeping a relatively small parameter count of 1.74B.

Q: What are the recommended use cases?

Cosmo-1B is well-suited for educational content generation, text completion tasks, and basic chat interactions. It performs particularly well on academic benchmarks and can be effectively used for both completion and instruction-following tasks.
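For chat-style interactions, a hedged sketch is shown below; it assumes the tokenizer ships a chat template, which is not confirmed above, so fall back to plain prompts if apply_chat_template raises an error:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id and the presence of a chat template are assumptions, not confirmed by this card.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo-1b")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/cosmo-1b")

messages = [{"role": "user", "content": "Write a short lesson on why the sky is blue."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=200, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```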
