Granite-3.0-2B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 2.63B |
| License | Apache 2.0 |
| Training Tokens | 12T |
| Context Length | 4096 tokens |
| Supported Languages | 12, including English, German, Spanish, and French |
What is granite-3.0-2b-instruct?
Granite-3.0-2B-Instruct is a language model developed by IBM's Granite Team, featuring 2.63B parameters and trained on 12 trillion tokens. The model is built on a decoder-only dense transformer architecture incorporating grouped-query attention (GQA) and rotary position embeddings (RoPE), making it well suited to multilingual applications and instruction-following tasks.
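To make the GQA mention concrete, below is a minimal PyTorch sketch of grouped-query attention using the model's published head counts (32 query heads sharing 8 key/value heads, 64-dim heads, so 32 × 64 = 2048 matches the embedding size). The shapes come from the spec list in the next section; the tensor names, toy sequence length, and the omission of a causal mask are illustrative choices, not IBM's implementation.

```python
# Sketch of grouped-query attention (GQA) with Granite-3.0-2B's published
# shapes: 32 query heads sharing 8 key/value heads. Illustrative only.
import torch
import torch.nn.functional as F

batch, seq_len = 1, 16
n_heads, n_kv_heads, head_dim = 32, 8, 64   # 32 * 64 = 2048 embedding size

q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Each group of 32 / 8 = 4 query heads attends to the same K/V head,
# shrinking the KV cache 4x versus full multi-head attention.
group = n_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)       # (1, 32, 16, 64)
v = v.repeat_interleave(group, dim=1)

# Standard scaled dot-product attention (causal mask omitted for brevity).
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
out = F.softmax(scores, dim=-1) @ v         # (1, 32, 16, 64)
print(out.shape)
```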
Implementation Details
The model uses 40 transformer layers, a 2048-dimensional embedding, and 32 attention heads, with SwiGLU activations and RoPE position embeddings. It was trained on IBM's Blue Vela supercomputing cluster, built on NVIDIA H100 GPUs and powered entirely by renewable energy. Key dimensions:
- Embedding size: 2048
- Number of layers: 40
- Attention heads: 32 (8 KV heads)
- MLP hidden size: 8192
- Sequence length: 4096
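For a quick smoke test, the following sketch loads the model through Hugging Face transformers. The checkpoint ID `ibm-granite/granite-3.0-2b-instruct` follows IBM's Hub naming for this release (worth verifying before use), and the dtype and generation settings are arbitrary example values.

```python
# Minimal sketch: load and query the instruct model via Hugging Face
# transformers. Checkpoint ID assumed from IBM's Hub naming convention.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what GQA is in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the accelerate package; on a CPU-only machine, drop it along with the bfloat16 cast.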
Core Capabilities
- Text summarization and classification
- Question-answering and extraction
- Retrieval Augmented Generation (RAG)
- Strong performance on code-related tasks
- Function calling (a tool-use sketch follows this list)
- Multilingual dialog support
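Since function calling is listed above, here is a hedged sketch of tool-use prompting through the tokenizer's chat template. It assumes a recent transformers version (tool support in `apply_chat_template`) and that the Granite chat template renders a tools list; `get_weather` is a made-up stub, not part of any IBM API.

```python
# Hedged sketch of tool-use prompting via the tokenizer's chat template.
# Requires a transformers version with tool support in apply_chat_template,
# and assumes the Granite chat template accepts a `tools` list.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 22 C"  # stub; a real app would call a weather API

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.0-2b-instruct")
messages = [{"role": "user", "content": "What's the weather in Berlin?"}]

# transformers derives a JSON schema from the function's type hints and
# docstring and injects it into the prompt via the chat template.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt)  # inspect how the tool schema is rendered into the prompt
```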
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its balanced architecture and strong results across standard benchmarks, including 76.79% on HellaSwag and 59.66% on GSM8K, along with solid multilingual capabilities. It is specifically designed for instruction following and business applications while maintaining high ethical standards.
Q: What are the recommended use cases?
The model excels in business applications, general instruction following, and multilingual tasks. It's particularly suitable for RAG implementations, code-related tasks, and building AI assistants across multiple domains.