
Maintained By
ibm-granite

Granite-3.0-8B-Instruct

Parameter Count: 8.17B
License: Apache 2.0
Architecture: Decoder-only dense transformer
Supported Languages: 12 languages including English, German, Spanish, French, and more
Context Length: 4096 tokens

What is granite-3.0-8b-instruct?

Granite-3.0-8B-Instruct is a language model developed by IBM's Granite team, featuring 8.17B parameters and optimized for instruction-following tasks. The model is built on a decoder-only dense transformer architecture and has been fine-tuned using a combination of open-source instruction datasets and internally collected synthetic data. It performs well across multiple domains, from general text generation to specialized tasks such as code generation and mathematical reasoning.

Implementation Details

The model architecture incorporates several modern techniques, including Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE), along with SwiGLU activation functions and RMSNorm. With an embedding size of 4096 and 40 layers, it balances computational efficiency with strong performance. The model uses 32 attention heads with 8 key/value heads and a substantial MLP hidden size of 12800.

  • Trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs
  • Leverages 100% renewable energy sources for training
  • Supports context length of 4096 tokens
  • Implements shared input/output embeddings for efficiency

Core Capabilities

  • Text summarization and classification
  • Question-answering and information extraction
  • Retrieval Augmented Generation (RAG)
  • Code-related tasks and function calling
  • Multilingual dialogue support
  • Mathematical reasoning (68.99% on GSM8K)
  • Common sense reasoning (82.61% on HellaSwag)
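For the dialogue and instruction-following capabilities above, prompts need to follow the model's chat format. The role markers below are an assumption based on the chat template reported for Granite 3.0 models; in practice, the released tokenizer's apply_chat_template() should be treated as the source of truth:

```python
# Sketch of a Granite-3.0-style chat prompt. The <|start_of_role|>,
# <|end_of_role|>, and <|end_of_text|> markers are an ASSUMPTION about the
# model's chat template; prefer tokenizer.apply_chat_template() in practice.
def build_prompt(messages: list[dict]) -> str:
    parts = []
    for m in messages:
        parts.append(
            f"<|start_of_role|>{m['role']}<|end_of_role|>"
            f"{m['content']}<|end_of_text|>"
        )
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "\n".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Résume ce paragraphe en une phrase."},
])
print(prompt)
```

The French user turn illustrates the multilingual dialogue support listed above; the same structure applies to any of the 12 supported languages.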

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its balanced architecture combining high performance with practical deployment considerations. It achieves strong results across diverse tasks while maintaining a reasonable parameter count, and notably includes support for 12 languages out of the box.

Q: What are the recommended use cases?

The model excels in business applications requiring multilingual support, code-related tasks, and general instruction following. It's particularly well-suited for building AI assistants, handling RAG applications, and performing complex reasoning tasks across multiple domains.
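For RAG applications in particular, the 4096-token context window is the binding constraint: retrieved passages must fit alongside the question and leave room for the answer. A minimal sketch of that budgeting, using a crude whitespace split as a stand-in for the real tokenizer (an assumption; actual budgeting should use the model's tokenizer), and with all prompt wording being illustrative rather than prescribed by the model card:

```python
# Minimal RAG prompt assembly under a fixed context budget.
# Whitespace-split "token" counting is a rough stand-in for the tokenizer.
def assemble_rag_prompt(question: str, passages: list[str],
                        budget: int = 4096,
                        reserve_for_answer: int = 512) -> str:
    header = "Answer the question using only the documents below.\n"
    used = len(header.split()) + len(question.split()) + reserve_for_answer
    kept = []
    for p in passages:  # passages assumed pre-sorted by retrieval score
        cost = len(p.split())
        if used + cost > budget:
            break  # drop lower-ranked passages that no longer fit
        kept.append(p)
        used += cost
    docs = "\n\n".join(f"[Document {i + 1}]\n{p}"
                       for i, p in enumerate(kept))
    return f"{header}\n{docs}\n\nQuestion: {question}\nAnswer:"

prompt = assemble_rag_prompt(
    "What license is Granite 3.0 released under?",
    ["Granite 3.0 models are released under the Apache 2.0 license.",
     "The models were trained on IBM's Blue Vela cluster."],
)
print(prompt)
```

Reserving headroom for the answer up front, rather than filling the entire window with retrieved text, is the design choice that keeps generation from being truncated mid-response.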
