# Sarvam-2b-v0.5
| Property | Value |
|---|---|
| Parameter Count | 2.51B |
| Model Type | Text Generation, Transformers |
| License | Other |
| Tensor Type | BF16 |
## What is sarvam-2b-v0.5?
Sarvam-2b-v0.5 is an early checkpoint of a multilingual language model designed specifically for Indian languages. This 2.51B-parameter model was pre-trained from scratch on 2 trillion tokens, with a focus on supporting 10 Indic languages alongside English. The model represents a significant advancement in Indian-language AI; it was trained with the NVIDIA NeMo™ Framework on HGX H100 systems.
## Implementation Details
The model employs a custom tokenizer optimized for Indic languages, achieving an average fertility score of ~2 (tokens produced per word; lower is better), significantly below competitors like Llama-3.1 (9.34) and GPT-4 (3.00). This tokenization efficiency makes it particularly effective for processing Indian languages; a minimal way to measure it yourself is sketched after the list below.
- Trained on Yotta Shakti Cloud using HGX H100 systems
- Implements transformer architecture with BF16 precision
- Supports 10 Indic languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu
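To check the tokenizer's efficiency in practice, the sketch below computes a simple fertility score (tokens per whitespace-separated word) for a sample sentence. The repo id `sarvamai/sarvam-2b-v0.5` is an assumption based on the model name; substitute the actual Hugging Face path if it differs.

```python
from transformers import AutoTokenizer

MODEL_ID = "sarvamai/sarvam-2b-v0.5"  # assumed repo id

def fertility(tokenizer, text: str) -> float:
    """Tokens produced per whitespace-delimited word (lower is better)."""
    words = text.split()
    tokens = tokenizer.tokenize(text)
    return len(tokens) / max(len(words), 1)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Hindi sample: "Hello, how are you?"
print(fertility(tokenizer, "नमस्ते, आप कैसे हैं?"))
```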
## Core Capabilities
- Efficient multilingual text generation
- Superior tokenization for Indic scripts
- Balanced performance across 11 languages
- Easy integration with the Hugging Face Transformers library (see the loading example below)
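Here is a minimal loading-and-generation sketch, again assuming the `sarvamai/sarvam-2b-v0.5` repo id and a GPU with BF16 support, to match the tensor type listed above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "sarvamai/sarvam-2b-v0.5"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type
    device_map="auto",           # requires the accelerate package
)

# Hindi prompt: "The capital of India"
prompt = "भारत की राजधानी"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is an early pre-trained checkpoint rather than an instruction-tuned model, prompts work best as plain text to be continued.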
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive feature is its optimized tokenization for Indic languages, achieving significantly better (lower) fertility scores than comparable models. It's specifically designed to handle multiple Indian scripts effectively while maintaining strong English language capabilities.
**Q: What are the recommended use cases?**
The model is well-suited for multilingual text generation tasks, particularly those involving Indian languages. It can be used for content generation, translation assistance, and general language understanding tasks across the supported languages.
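For quick experiments with such tasks, the high-level pipeline API also works. This is a sketch under the same repo-id assumption; because this is a base checkpoint, the prompt is phrased as a continuation rather than an instruction:

```python
from transformers import pipeline

# Assumed repo id; this is a pre-trained (base) checkpoint, so phrase
# prompts as text to continue rather than chat-style instructions.
generator = pipeline("text-generation", model="sarvamai/sarvam-2b-v0.5")

# Kannada prompt: "The history of the Kannada language"
result = generator("ಕನ್ನಡ ಭಾಷೆಯ ಇತಿಹಾಸ", max_new_tokens=40)
print(result[0]["generated_text"])
```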