granite-3.0-2b-instruct

Maintained By
ibm-granite

Granite-3.0-2B-Instruct

PropertyValue
Parameter Count2.63B
LicenseApache 2.0
Training Tokens12T
Context Length4096
Supported Languages12 languages including English, German, Spanish, French, and more

What is granite-3.0-2b-instruct?

Granite-3.0-2B-Instruct is an advanced language model developed by IBM's Granite Team, featuring 2.63B parameters and trained on 12 trillion tokens. The model is built on a decoder-only dense transformer architecture incorporating modern techniques like GQA and RoPE, making it particularly effective for multilingual applications and instruction-following tasks.

Implementation Details

The model utilizes a sophisticated architecture with 40 layers, 2048 embedding size, and 32 attention heads. It implements SwiGLU activation and RoPE position embeddings, optimized for both performance and efficiency. The model is trained using IBM's Blue Vela supercomputing cluster, powered by NVIDIA H100 GPUs and running on 100% renewable energy.

  • Embedding size: 2048
  • Number of layers: 40
  • Attention heads: 32 (8 KV heads)
  • MLP hidden size: 8192
  • Sequence length: 4096

Core Capabilities

  • Text summarization and classification
  • Question-answering and extraction
  • Retrieval Augmented Generation (RAG)
  • Code-related tasks with strong performance
  • Function-calling capabilities
  • Multilingual dialog support

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its balanced architecture and impressive performance across various benchmarks, including 76.79% on Hellaswag, 59.66% on GSM8K, and strong multilingual capabilities. It's specifically designed for instruction-following and business applications while maintaining high ethical standards.

Q: What are the recommended use cases?

The model excels in business applications, general instruction following, and multilingual tasks. It's particularly suitable for RAG implementations, code-related tasks, and building AI assistants across multiple domains.

The first platform built for prompt engineering