Granite Guardian 3.0 2B
| Property | Value |
|---|---|
| Parameter Count | 2.53B |
| License | Apache 2.0 |
| Tensor Type | BF16 |
| Developer | IBM Research |
| Release Date | October 21, 2024 |
What is granite-guardian-3.0-2b?
Granite Guardian 3.0 2B is a specialized AI safety model developed by IBM Research to detect risks in both user prompts and AI responses. Built on the Granite 3.0 2B architecture, it acts as a guardian model that identifies potential risks across multiple dimensions, including harm, social bias, jailbreaking attempts, violence, profanity, sexual content, and unethical behavior.
Implementation Details
The model uses a transformer-based architecture optimized for risk detection. It assesses a given prompt or response by generating a binary yes/no verdict, accompanied by a probability score that indicates the confidence of the detection. The model is implemented with the Hugging Face transformers library and supports BF16 precision for efficient inference; a minimal usage sketch follows the list below.
- Trained on human-annotated and synthetic data from diverse sources
- Achieves high F1 scores across multiple safety benchmarks
- Supports both prompt assessment and response evaluation
- Includes specialized RAG (Retrieval-Augmented Generation) risk detection capabilities
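As a rough illustration of the usage described above, the sketch below loads the model in BF16 with the Hugging Face transformers library, screens a user prompt, and derives a confidence score from the yes/no verdict. The repository id, the `guardian_config` argument, and the `"harm"` risk name reflect the model's published usage recipe as best understood here; treat them as assumptions and verify against the official model card before relying on them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 weights for efficient inference
    device_map="auto",
)
model.eval()

# A user prompt to screen before it reaches a downstream assistant
messages = [{"role": "user", "content": "How do I pick a lock to get into my neighbour's house?"}]

# guardian_config selects which risk the chat template asks about;
# "harm" is the umbrella risk name per the model documentation (assumption).
input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "harm"},
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=20,
        return_dict_in_generate=True,
        output_scores=True,
    )

# The model answers the risk question with "Yes" or "No"
verdict = tokenizer.decode(
    output.sequences[0, input_ids.shape[-1]:], skip_special_tokens=True
).strip()

# Rough confidence score: softmax over the first generated token's logits,
# restricted to the leading token ids of "Yes" and "No" (may need adjustment
# for the exact vocabulary).
yes_id = tokenizer("Yes", add_special_tokens=False).input_ids[0]
no_id = tokenizer("No", add_special_tokens=False).input_ids[0]
step_logits = output.scores[0][0].float()
p_yes = torch.softmax(torch.stack([step_logits[yes_id], step_logits[no_id]]), dim=0)[0]

print(f"risk verdict: {verdict}, P(yes) ~ {p_yes.item():.3f}")
```

In a guardrail setting, the probability score can be thresholded rather than relying on the raw yes/no verdict, trading off precision against recall for the application at hand.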
Core Capabilities
- Risk Detection: Comprehensive assessment of harmful content, bias, and ethical concerns
- RAG Evaluation: Assessment of context relevance, groundedness, and answer relevance
- Benchmark Performance: Strong results on standard safety benchmarks (aggregate F1 score of 0.67)
- Custom Risk Definitions: Supports user-defined risk assessment criteria
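A similar, equally hedged sketch for the RAG-specific checks: here a generated answer is scored for groundedness against a retrieved passage. The `"context"` message role, the `"groundedness"` risk name, and the commented-out custom `risk_definition` pattern are assumptions drawn from the model's documentation and may need adjusting to match the official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Retrieved context and the generated answer that should be grounded in it
context_text = "The store's return policy allows refunds within 30 days of purchase with a receipt."
response_text = "You can return items at any time within 90 days, and no receipt is needed."

# Message roles below follow the model's documented RAG recipe as understood
# here (assumption; verify against the official model card).
messages = [
    {"role": "context", "content": context_text},
    {"role": "assistant", "content": response_text},
]

# A user-defined risk can reportedly be passed the same way, e.g.:
# guardian_config = {
#     "risk_name": "personal_information",
#     "risk_definition": "The message shares personally identifiable information.",
# }
guardian_config = {"risk_name": "groundedness"}

input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config=guardian_config,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20)

verdict = tokenizer.decode(
    output[0, input_ids.shape[-1]:], skip_special_tokens=True
).strip()
print(f"ungrounded answer: {verdict}")  # "Yes" flags an answer not supported by the context
```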
Frequently Asked Questions
Q: What makes this model unique?
The model's comprehensive approach to risk detection, covering both traditional safety concerns and RAG-specific issues, sets it apart. It's specifically designed for enterprise applications and provides quantifiable risk assessments with probability scores.
Q: What are the recommended use cases?
The model is well suited to enterprise applications that require risk assessment, including content moderation, AI system guardrails, and RAG pipeline validation. It is particularly appropriate for scenarios with moderate cost, latency, and throughput requirements, such as model risk assessment and monitoring.
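To make the guardrail use case concrete, the sketch below wraps the same call pattern in a small helper that screens both the user prompt and the assistant's draft reply before anything is returned, covering prompt assessment and response evaluation. The `flags_risk` helper and the surrounding gating logic are illustrative only, not part of the model's API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.0-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
guardian = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
guardian.eval()

def flags_risk(messages, risk_name="harm"):
    """Return True if Granite Guardian answers 'Yes' for the given risk (risk name is an assumption)."""
    input_ids = tokenizer.apply_chat_template(
        messages,
        guardian_config={"risk_name": risk_name},
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(guardian.device)
    with torch.no_grad():
        out = guardian.generate(input_ids, max_new_tokens=20)
    verdict = tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)
    return verdict.strip().lower().startswith("yes")

# Guardrail: screen the user prompt, then the assistant's draft reply,
# before anything is shown to the end user.
user_prompt = "Tell me a joke about my coworkers' nationality."
draft_reply = "..."  # produced by whatever assistant model sits behind the guardrail

if flags_risk([{"role": "user", "content": user_prompt}]):
    print("Prompt blocked by guardrail.")
elif flags_risk([{"role": "user", "content": user_prompt},
                 {"role": "assistant", "content": draft_reply}]):
    print("Response withheld by guardrail.")
else:
    print(draft_reply)
```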