granite-guardian-3.0-2b

Maintained By
ibm-granite

Granite Guardian 3.0 2B

PropertyValue
Parameter Count2.53B
LicenseApache 2.0
Tensor TypeBF16
DeveloperIBM Research
Release DateOctober 21st, 2024

What is granite-guardian-3.0-2b?

Granite Guardian 3.0 2B is a specialized AI safety model developed by IBM Research to detect various risks in both user prompts and AI responses. Built on the Granite 3.0 2B architecture, this model serves as a sophisticated guardian system that can identify potential risks across multiple dimensions including harm, social bias, jailbreaking attempts, violence, profanity, sexual content, and unethical behavior.

Implementation Details

The model utilizes a transformer-based architecture optimized for risk detection tasks. It operates by generating binary yes/no responses to assess potential risks, with probability scores indicating the confidence level of risk detection. The model is implemented using the Hugging Face transformers library and supports BF16 precision for efficient inference.

  • Trained on human-annotated and synthetic data from diverse sources
  • Achieves high F1 scores across multiple safety benchmarks
  • Supports both prompt assessment and response evaluation
  • Includes specialized RAG (Retrieval-Augmented Generation) risk detection capabilities

Core Capabilities

  • Risk Detection: Comprehensive assessment of harmful content, bias, and ethical concerns
  • RAG Evaluation: Assessment of context relevance, groundedness, and answer relevance
  • Benchmark Performance: Strong results on standard safety datasets (F1 score of 0.67 aggregate)
  • Custom Risk Definitions: Supports user-defined risk assessment criteria

Frequently Asked Questions

Q: What makes this model unique?

The model's comprehensive approach to risk detection, covering both traditional safety concerns and RAG-specific issues, sets it apart. It's specifically designed for enterprise applications and provides quantifiable risk assessments with probability scores.

Q: What are the recommended use cases?

The model is ideal for enterprise applications requiring risk assessment, including content moderation, AI system guardrails, and RAG pipeline validation. It's particularly suitable for moderate cost, latency, and throughput scenarios such as model risk assessment and monitoring.

The first platform built for prompt engineering