# BLOOMZ-3B
| Property | Value |
|---|---|
| Parameter Count | 3 billion |
| Architecture | BLOOM architecture (FP16) |
| License | bigscience-bloom-rail-1.0 |
| Paper | Crosslingual Generalization through Multitask Finetuning |
| Languages | 46 languages |
## What is BLOOMZ-3B?
BLOOMZ-3B is a 3-billion-parameter multilingual language model fine-tuned on the xP3 dataset to follow natural-language instructions across 46 languages. It builds on the BLOOM architecture and shows strong zero-shot generalization: it can perform tasks in languages and formats it was not explicitly fine-tuned on.
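A minimal inference sketch using the Hugging Face `transformers` library. The model id `bigscience/bloomz-3b` is the published checkpoint; the `build_prompt` helper and the generation settings are our own illustrative choices, not part of the model card:

```python
# Sketch of BLOOMZ-3B inference. build_prompt is a hypothetical helper;
# only the model id "bigscience/bloomz-3b" comes from the model card.

def build_prompt(instruction: str, text: str) -> str:
    """Join an explicit instruction with its input text."""
    return f"{instruction}: {text}"

def generate(prompt: str, model_id: str = "bigscience/bloomz-3b",
             max_new_tokens: int = 64) -> str:
    """Greedy decoding with transformers. Imported lazily so the ~6 GB
    FP16 checkpoint is only downloaded when this is actually called."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example call (triggers the checkpoint download):
# generate(build_prompt("Translate to French", "The book is on the table."))
```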
## Implementation Details
The model was fine-tuned on 128 A100 80GB GPUs using both pipeline and tensor parallelism, orchestrated by the Megatron-DeepSpeed framework. Fine-tuning ran for 2,000 steps and processed 8.39 billion tokens.
- Training Infrastructure: 128 A100 80GB GPUs with NVLink 4 inter-GPU links
- Framework: PyTorch with DeepSpeed optimization
- Precision: FP16 training
- Fine-tuning Dataset: bigscience/xP3
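The reported numbers imply a large effective batch per step. A quick sanity check (the 2048-token sequence length is an assumption carried over from BLOOM pre-training, not stated above):

```python
# Back-of-the-envelope check on the fine-tuning numbers reported above.
total_tokens = 8.39e9  # tokens processed during fine-tuning (from above)
steps = 2000           # fine-tuning steps (from above)
seq_len = 2048         # ASSUMPTION: BLOOM's pre-training sequence length

tokens_per_step = total_tokens / steps          # tokens per optimizer step
sequences_per_step = tokens_per_step / seq_len  # implied global batch size

print(f"{tokens_per_step:,.0f} tokens/step, "
      f"~{sequences_per_step:,.0f} sequences/step")
```

That works out to roughly 4.2 million tokens per step, i.e. a global batch on the order of two thousand 2048-token sequences, which is consistent with heavy pipeline and tensor parallelism across 128 GPUs.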
## Core Capabilities
- Multilingual instruction following across 46 languages
- Zero-shot task generalization
- Natural language understanding and generation
- Cross-lingual inference and translation
- Code understanding in 13 programming languages
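In practice, zero-shot use of these capabilities comes down to phrasing the task in plain natural language. A few illustrative prompt patterns (the wording is our own, not taken from the xP3 dataset):

```python
# Illustrative zero-shot prompt patterns for BLOOMZ-style models.
# The templates are our own examples, not prompts from xP3.

def translation_prompt(src_text: str, target_lang: str) -> str:
    return f"Translate to {target_lang}: {src_text}"

def sentiment_prompt(review: str) -> str:
    return f"Review: {review}\nIs this review positive or negative?"

def qa_prompt(context: str, question: str) -> str:
    return f"{context}\nQuestion: {question}\nAnswer:"

print(translation_prompt("I love reading.", "Spanish"))
```

Because the model was multitask-fine-tuned on instructions, the same pattern transfers across languages: the instruction itself can also be written in a non-English language.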
## Frequently Asked Questions
**Q: What makes this model unique?**
BLOOMZ-3B stands out for its ability to perform cross-lingual task generalization without requiring task-specific fine-tuning in target languages. It can understand and follow instructions across dozens of languages while maintaining high performance.
**Q: What are the recommended use cases?**
The model excels at tasks expressed in natural language, including translation, sentiment analysis, question answering, and creative writing across multiple languages. It's particularly effective when given clear, well-structured prompts with explicit instructions.
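One way to make "clear, well-structured" concrete: state the task, delimit the input, and end with an explicit output cue so the model knows where its answer should begin. A hypothetical example:

```python
# Hypothetical example of a well-structured prompt: explicit task,
# delimited input, and a trailing cue marking where the answer goes.
prompt = (
    "Classify the sentiment of the following movie review as "
    "positive or negative.\n"
    'Review: "A slow start, but the final act is breathtaking."\n'
    "Sentiment:"
)
print(prompt)
```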