
Maintained By
ibm-granite

Granite-3.0-8B-Instruct

Parameter Count: 8.17B
License: Apache 2.0
Architecture: Decoder-only dense transformer
Supported Languages: 12 languages including English, German, Spanish, French, and more
Context Length: 4096 tokens

What is granite-3.0-8b-instruct?

Granite-3.0-8B-Instruct is a language model developed by IBM's Granite team, featuring 8.17B parameters and optimized for instruction-following tasks. The model is built on a decoder-only dense transformer architecture and has been fine-tuned using a combination of open-source instruction datasets and internally collected synthetic data. It performs well across multiple domains, from general text generation to specialized tasks such as code generation and mathematical reasoning.

Implementation Details

The model architecture incorporates several modern techniques, including Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE), along with SwiGLU activation functions and RMSNorm. With an embedding size of 4096 and 40 layers, it balances computational efficiency with strong performance. The model uses 32 attention heads with 8 key/value heads and a substantial MLP hidden size of 12800.

  • Trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs
  • Leverages 100% renewable energy sources for training
  • Supports context length of 4096 tokens
  • Implements shared input/output embeddings for efficiency

Core Capabilities

  • Text summarization and classification
  • Question-answering and information extraction
  • Retrieval Augmented Generation (RAG)
  • Code-related tasks and function calling
  • Multilingual dialogue support
  • Mathematical reasoning (68.99% on GSM8K)
  • Common sense reasoning (82.61% on HellaSwag)
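For the dialogue and instruction-following capabilities above, prompts need to follow the model's chat format. The role markers below are an assumption based on the chat template reported for Granite 3.0 models; in practice, the released tokenizer's apply_chat_template() should be treated as the source of truth:

```python
# Sketch of a Granite-3.0-style chat prompt. The <|start_of_role|>,
# <|end_of_role|>, and <|end_of_text|> markers are an ASSUMPTION about the
# model's chat template; prefer tokenizer.apply_chat_template() in practice.
def build_prompt(messages: list[dict]) -> str:
    parts = []
    for m in messages:
        parts.append(
            f"<|start_of_role|>{m['role']}<|end_of_role|>"
            f"{m['content']}<|end_of_text|>"
        )
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "\n".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Résume ce paragraphe en une phrase."},
])
print(prompt)
```

The French user turn illustrates the multilingual dialogue support listed above; the same structure applies to any of the 12 supported languages.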

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its balanced architecture combining high performance with practical deployment considerations. It achieves strong results across diverse tasks while maintaining a reasonable parameter count, and notably includes support for 12 languages out of the box.

Q: What are the recommended use cases?

The model excels in business applications requiring multilingual support, code-related tasks, and general instruction following. It's particularly well-suited for building AI assistants, handling RAG applications, and performing complex reasoning tasks across multiple domains.
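For RAG applications in particular, the 4096-token context window is the binding constraint: retrieved passages must fit alongside the question and leave room for the answer. A minimal sketch of that budgeting, using a crude whitespace split as a stand-in for the real tokenizer (an assumption; actual budgeting should use the model's tokenizer), and with all prompt wording being illustrative rather than prescribed by the model card:

```python
# Minimal RAG prompt assembly under a fixed context budget.
# Whitespace-split "token" counting is a rough stand-in for the tokenizer.
def assemble_rag_prompt(question: str, passages: list[str],
                        budget: int = 4096,
                        reserve_for_answer: int = 512) -> str:
    header = "Answer the question using only the documents below.\n"
    used = len(header.split()) + len(question.split()) + reserve_for_answer
    kept = []
    for p in passages:  # passages assumed pre-sorted by retrieval score
        cost = len(p.split())
        if used + cost > budget:
            break  # drop lower-ranked passages that no longer fit
        kept.append(p)
        used += cost
    docs = "\n\n".join(f"[Document {i + 1}]\n{p}"
                       for i, p in enumerate(kept))
    return f"{header}\n{docs}\n\nQuestion: {question}\nAnswer:"

prompt = assemble_rag_prompt(
    "What license is Granite 3.0 released under?",
    ["Granite 3.0 models are released under the Apache 2.0 license.",
     "The models were trained on IBM's Blue Vela cluster."],
)
print(prompt)
```

Reserving headroom for the answer up front, rather than filling the entire window with retrieved text, is the design choice that keeps generation from being truncated mid-response.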
