DeBERTa-v3-base-zeroshot-v1.1-all-33
Property | Value |
---|---|
Parameter Count | 184M |
Architecture | DeBERTa-v3 |
License | MIT |
Paper | arxiv.org/pdf/2312.17543.pdf |
Training Data | 33 datasets, 387 classes |
What is deberta-v3-base-zeroshot-v1.1-all-33?
This is a specialized zero-shot classification model built on the DeBERTa-v3 architecture. It reduces any classification task to a universal binary task via natural language inference (NLI): given a text and a hypothesis that verbalizes a candidate class, the model predicts whether the hypothesis is "true" or "not true". Because a class is described in plain language rather than learned as a fixed output label, the model can be applied to new classification scenarios without task-specific training.
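The label-to-hypothesis step described above can be sketched in a few lines. This is a minimal illustration, not the model's own preprocessing code: the function names and the template string `"This example is about {}"` are assumptions chosen for the example.

```python
def build_hypotheses(labels, template="This example is about {}"):
    """Verbalize each candidate label as a natural-language hypothesis.

    The template is an illustrative assumption; any sentence with one
    `{}` placeholder works.
    """
    return {label: template.format(label) for label in labels}


def build_nli_pairs(text, labels, template="This example is about {}"):
    """Pair the input text (premise) with one hypothesis per label.

    An NLI model would then judge each pair independently as
    "true" (entailment) or "not true" (not_entailment).
    """
    hypotheses = build_hypotheses(labels, template)
    return [(text, hypotheses[label]) for label in labels]


pairs = build_nli_pairs("The new phone has a great camera.", ["technology", "sports"])
for premise, hypothesis in pairs:
    print(f"premise={premise!r}  hypothesis={hypothesis!r}")
```

Each candidate class becomes one (premise, hypothesis) pair, so adding a new class only means adding a new label string.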
Implementation Details
The model was trained on a diverse mixture of 33 datasets encompassing 387 classes: 5 NLI datasets (~885k texts) and 28 classification tasks reformatted into the universal NLI format (~51k cleaned texts). It uses a binary classification scheme (entailment vs. not_entailment) rather than the traditional three-way NLI scheme (entailment, neutral, contradiction).
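One plausible way to reformat a labeled classification dataset into the binary NLI format is sketched below. This is an assumption about the general recipe, not the authors' actual data pipeline: the function name, the row layout, and the choice of one randomly sampled wrong class per text are all hypothetical.

```python
import random


def to_binary_nli(examples, class_hypotheses, seed=0):
    """Turn (text, label) examples into binary NLI rows.

    For every text, emit one "entailment" row using the hypothesis of
    its true class and, if another class exists, one "not_entailment"
    row using the hypothesis of a randomly sampled wrong class.
    (Sampling one negative per text is an illustrative assumption.)
    """
    rng = random.Random(seed)
    rows = []
    for text, true_label in examples:
        rows.append({
            "premise": text,
            "hypothesis": class_hypotheses[true_label],
            "label": "entailment",
        })
        wrong_labels = [name for name in class_hypotheses if name != true_label]
        if wrong_labels:
            rows.append({
                "premise": text,
                "hypothesis": class_hypotheses[rng.choice(wrong_labels)],
                "label": "not_entailment",
            })
    return rows


# Hypothetical two-class sentiment dataset.
hypotheses = {
    "positive": "The sentiment of this text is positive",
    "negative": "The sentiment of this text is negative",
}
for row in to_binary_nli([("Loved it.", "positive"), ("Awful.", "negative")], hypotheses):
    print(row)
```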
- Trained on multiple domains including sentiment analysis, emotion detection, topic classification, and more
- Uses the FP16 tensor type for efficient computation
- Supports both single-label and multi-label classification
- Designed for English-language text; for multilingual use cases, machine-translating the input to English is recommended
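The difference between single-label and multi-label classification comes down to how the per-label NLI outputs are normalized. The sketch below uses made-up logits and hypothetical function names; it illustrates one common scoring scheme and is not taken from the model's code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """Single-label: candidate labels compete with each other.

    Softmax over the entailment logits of all labels, so the
    scores sum to 1.
    """
    return softmax(entailment_logits)


def multi_label_scores(logit_pairs):
    """Multi-label: each label is judged on its own.

    For every (entailment, not_entailment) logit pair, take the
    softmax within the pair and keep the entailment probability.
    Scores are independent and need not sum to 1.
    """
    return [softmax([entail, not_entail])[0] for entail, not_entail in logit_pairs]


# Made-up logits for three candidate labels (illustrative only).
pairs = [(2.0, -2.0), (1.5, -1.0), (-3.0, 3.0)]
print(single_label_scores([entail for entail, _ in pairs]))
print(multi_label_scores(pairs))
```

In the multi-label case, several labels can score above 0.5 at once, which suits tasks where a text may belong to more than one class.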
Core Capabilities
- Zero-shot classification across diverse domains
- Binary classification through natural language inference
- Flexible hypothesis template customization
- Strong performance across multiple benchmark datasets (70.7% average accuracy in zero-shot scenarios)
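The capabilities above can be exercised with the Hugging Face `transformers` zero-shot classification pipeline. The following is a minimal sketch under stated assumptions: the Hub identifier in `MODEL_ID` is assumed rather than stated in this document, the helper names are hypothetical, and the default template is only an example.

```python
from functools import lru_cache

# Assumed Hugging Face Hub identifier; verify it before use.
MODEL_ID = "MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33"


@lru_cache(maxsize=1)
def load_classifier():
    """Build the zero-shot pipeline once and reuse it on later calls."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=MODEL_ID)


def classify(text, labels, template="This example is about {}", multi_label=False):
    """Score `text` against `labels` using a custom hypothesis template.

    The default template is an illustrative assumption; it must contain
    exactly one `{}` placeholder for the label.
    """
    classifier = load_classifier()
    return classifier(
        text,
        candidate_labels=labels,
        hypothesis_template=template,
        multi_label=multi_label,
    )


# Example call (downloads the model on first use):
# result = classify(
#     "Angela Merkel is a politician in Germany",
#     ["politics", "economy", "entertainment", "environment"],
# )
# print(result["labels"], result["scores"])
```

The pipeline returns a dictionary whose `labels` and `scores` lists are sorted from most to least likely. Passing `multi_label=True` scores each label independently instead of making the labels compete.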
Frequently Asked Questions
Q: What makes this model unique?
Its universal binary classification approach through NLI makes the model adaptable to new classification tasks without task-specific training. At 184M parameters, it is also more efficient than larger language models.
Q: What are the recommended use cases?
The model excels in various classification tasks including sentiment analysis, emotion detection, topic classification, and content moderation, particularly when traditional supervised learning isn't feasible due to lack of labeled data.