Granite Guardian 3.0 2B
| Property | Value |
|---|---|
| Parameter Count | 2.53B |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Developer | IBM Research |
| Release Date | October 21, 2024 |
What is granite-guardian-3.0-2b?
Granite Guardian 3.0 2B is a specialized AI safety model developed by IBM Research to detect risks in both user prompts and AI responses. Built on the Granite 3.0 2B architecture, it acts as a guardian model that identifies potential risks across multiple dimensions, including harm, social bias, jailbreaking attempts, violence, profanity, sexual content, and unethical behavior.
Implementation Details
The model uses a transformer-based architecture optimized for risk detection. It assesses a given prompt or response by generating a binary yes/no verdict, accompanied by a probability score that indicates the confidence of the detection. The model is implemented with the Hugging Face transformers library and supports BF16 precision for efficient inference; a minimal usage sketch follows the list below.
- Trained on human-annotated and synthetic data from diverse sources
- Achieves high F1 scores across multiple safety benchmarks
- Supports both prompt assessment and response evaluation
- Includes specialized RAG (Retrieval-Augmented Generation) risk detection capabilities
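As a rough illustration of the usage described above, the sketch below loads the model in BF16 with the Hugging Face transformers library, screens a user prompt, and derives a confidence score from the yes/no verdict. The repository id, the `guardian_config` argument, and the `"harm"` risk name reflect the model's published usage recipe as best understood here; treat them as assumptions and verify against the official model card before relying on them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 weights for efficient inference
    device_map="auto",
)
model.eval()

# A user prompt to screen before it reaches a downstream assistant
messages = [{"role": "user", "content": "How do I pick a lock to get into my neighbour's house?"}]

# guardian_config selects which risk the chat template asks about;
# "harm" is the umbrella risk name per the model documentation (assumption).
input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "harm"},
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=20,
        return_dict_in_generate=True,
        output_scores=True,
    )

# The model answers the risk question with "Yes" or "No"
verdict = tokenizer.decode(
    output.sequences[0, input_ids.shape[-1]:], skip_special_tokens=True
).strip()

# Rough confidence score: softmax over the first generated token's logits,
# restricted to the leading token ids of "Yes" and "No" (may need adjustment
# for the exact vocabulary).
yes_id = tokenizer("Yes", add_special_tokens=False).input_ids[0]
no_id = tokenizer("No", add_special_tokens=False).input_ids[0]
step_logits = output.scores[0][0].float()
p_yes = torch.softmax(torch.stack([step_logits[yes_id], step_logits[no_id]]), dim=0)[0]

print(f"risk verdict: {verdict}, P(yes) ~ {p_yes.item():.3f}")
```

In a guardrail setting, the probability score can be thresholded rather than relying on the raw yes/no verdict, trading off precision against recall for the application at hand.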
Core Capabilities
- Risk Detection: Comprehensive assessment of harmful content, bias, and ethical concerns
- RAG Evaluation: Assessment of context relevance, groundedness, and answer relevance
- Benchmark Performance: Strong results on standard safety benchmarks (aggregate F1 score of 0.67)
- Custom Risk Definitions: Supports user-defined risk assessment criteria
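A similar, equally hedged sketch for the RAG-specific checks: here a generated answer is scored for groundedness against a retrieved passage. The `"context"` message role, the `"groundedness"` risk name, and the commented-out custom `risk_definition` pattern are assumptions drawn from the model's documentation and may need adjusting to match the official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Retrieved context and the generated answer that should be grounded in it
context_text = "The store's return policy allows refunds within 30 days of purchase with a receipt."
response_text = "You can return items at any time within 90 days, and no receipt is needed."

# Message roles below follow the model's documented RAG recipe as understood
# here (assumption; verify against the official model card).
messages = [
    {"role": "context", "content": context_text},
    {"role": "assistant", "content": response_text},
]

# A user-defined risk can reportedly be passed the same way, e.g.:
# guardian_config = {
#     "risk_name": "personal_information",
#     "risk_definition": "The message shares personally identifiable information.",
# }
guardian_config = {"risk_name": "groundedness"}

input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config=guardian_config,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20)

verdict = tokenizer.decode(
    output[0, input_ids.shape[-1]:], skip_special_tokens=True
).strip()
print(f"ungrounded answer: {verdict}")  # "Yes" flags an answer not supported by the context
```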
Frequently Asked Questions
Q: What makes this model unique?
The model's comprehensive approach to risk detection, covering both traditional safety concerns and RAG-specific issues, sets it apart. It's specifically designed for enterprise applications and provides quantifiable risk assessments with probability scores.
Q: What are the recommended use cases?
The model is well suited to enterprise applications that require risk assessment, including content moderation, AI system guardrails, and RAG pipeline validation. It is particularly appropriate for scenarios with moderate cost, latency, and throughput requirements, such as model risk assessment and monitoring.
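To make the guardrail use case concrete, the sketch below wraps the same call pattern in a small helper that screens both the user prompt and the assistant's draft reply before anything is returned, covering prompt assessment and response evaluation. The `flags_risk` helper and the surrounding gating logic are illustrative only, not part of the model's API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
guardian = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
guardian.eval()

def flags_risk(messages, risk_name="harm"):
    """Return True if Granite Guardian answers 'Yes' for the given risk (risk name is an assumption)."""
    input_ids = tokenizer.apply_chat_template(
        messages,
        guardian_config={"risk_name": risk_name},
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(guardian.device)
    with torch.no_grad():
        out = guardian.generate(input_ids, max_new_tokens=20)
    verdict = tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)
    return verdict.strip().lower().startswith("yes")

# Guardrail: screen the user prompt, then the assistant's draft reply,
# before anything is shown to the end user.
user_prompt = "Tell me a joke about my coworkers' nationality."
draft_reply = "..."  # produced by whatever assistant model sits behind the guardrail

if flags_risk([{"role": "user", "content": user_prompt}]):
    print("Prompt blocked by guardrail.")
elif flags_risk([{"role": "user", "content": user_prompt},
                 {"role": "assistant", "content": draft_reply}]):
    print("Response withheld by guardrail.")
else:
    print(draft_reply)
```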