bloomz-3b

Maintained by: bigscience

BLOOMZ-3B

Property         Value
Parameter Count  3 billion
Architecture     BLOOM Architecture (FP16)
License          bigscience-bloom-rail-1.0
Paper            Crosslingual Generalization through Multitask Finetuning
Languages        46 languages
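
As a rough illustration of what the parameter count and FP16 precision imply, the sketch below estimates the memory needed just to hold the weights. It assumes exactly 3 billion parameters (the table gives a rounded figure) and 2 bytes per FP16 value, and it ignores activations, the key/value cache, and framework overhead, so real usage will be higher.

```python
# Rough estimate of weight memory only; not a measurement.
PARAM_COUNT = 3_000_000_000   # "3 billion" from the table (rounded figure)
BYTES_PER_PARAM_FP16 = 2      # FP16 stores each value in 16 bits


def weight_memory_bytes(param_count: int, bytes_per_param: int) -> int:
    """Return the number of bytes needed to store the weights alone."""
    return param_count * bytes_per_param


weights_bytes = weight_memory_bytes(PARAM_COUNT, BYTES_PER_PARAM_FP16)
weights_gib = weights_bytes / 2**30

print(f"{weights_bytes / 1e9:.1f} GB ({weights_gib:.2f} GiB) for weights alone")
```

Under these assumptions the weights alone take about 6 GB, which is a useful lower bound when choosing a GPU for inference.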

What is BLOOMZ-3B?

BLOOMZ-3B is a multilingual, instruction-following language model with 3 billion parameters. It builds on the BLOOM architecture and is fine-tuned on the xP3 dataset so that it can follow instructions and perform tasks across 46 languages. The model shows zero-shot generalization across a range of languages and tasks.

Implementation Details

The model was fine-tuned on 128 A100 80GB GPUs using both pipeline and tensor parallelism. Fine-tuning ran for 2,000 steps over 8.39 billion tokens, with the Megatron-DeepSpeed framework handling orchestration.

  • Training Infrastructure: 128 A100 80GB GPUs with NVLink 4 inter-GPU connects
  • Framework: PyTorch with DeepSpeed optimization
  • Precision: FP16 training
  • Fine-tuning Dataset: bigscience/xP3
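
The step and token counts above imply a per-step token budget, which the following back-of-the-envelope sketch works out. The sequence length of 2048 is an assumption (it is not stated on this page), so the derived sequences-per-step figure is only indicative.

```python
# Figures taken from the implementation details above.
FINETUNING_STEPS = 2000
FINETUNING_TOKENS = 8_390_000_000   # "8.39 billion", a rounded figure

# Assumption: sequence length is not given on this page; 2048 is a guess.
ASSUMED_SEQUENCE_LENGTH = 2048

# Average number of tokens consumed per optimizer step.
tokens_per_step = FINETUNING_TOKENS // FINETUNING_STEPS

# Approximate number of sequences per step under the assumed length.
sequences_per_step = tokens_per_step / ASSUMED_SEQUENCE_LENGTH

print(f"tokens per step: {tokens_per_step:,}")
print(f"sequences per step (assumed length {ASSUMED_SEQUENCE_LENGTH}): "
      f"{sequences_per_step:.0f}")
```

With the stated figures this gives roughly 4.2 million tokens per step, or on the order of 2,000 sequences per step if the assumed sequence length holds.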

Core Capabilities

  • Multilingual instruction following across 46 languages
  • Zero-shot task generalization
  • Natural language understanding and generation
  • Cross-lingual inference and translation
  • Code understanding in 13 programming languages

Frequently Asked Questions

Q: What makes this model unique?

BLOOMZ-3B stands out for cross-lingual task generalization: it can understand and follow instructions in dozens of languages without requiring task-specific fine-tuning in each target language.

Q: What are the recommended use cases?

The model excels at tasks expressed in natural language, including translation, sentiment analysis, question answering, and creative writing across multiple languages. It's particularly effective when given clear, well-structured prompts with explicit instructions.
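
To make the advice about explicit instructions concrete, here is a minimal sketch of prompting the model through the Hugging Face transformers library. The Hub id "bigscience/bloomz-3b" and the helper names build_prompt and generate are assumptions for illustration, not part of this page. The generation call is left commented out because it downloads several gigabytes of weights.

```python
MODEL_ID = "bigscience/bloomz-3b"  # assumed Hugging Face Hub id


def build_prompt(instruction: str, text: str) -> str:
    """Hypothetical helper: place an explicit instruction before the input."""
    return f"{instruction.strip()} {text.strip()}"


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the model and return the decoded output for a prompt."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # For a causal LM the decoded text contains the prompt and the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


prompt = build_prompt("Translate to English:", "Je t'aime.")
print(prompt)

# Uncomment to run generation (requires transformers, torch, and a large download):
# print(generate(prompt))
```

Keeping the instruction explicit and ending the prompt cleanly tends to help the model recognise where its answer should begin.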
