DeBERTa-v3-base-squad2

Property	Value
Parameter Count	184M
License	CC-BY-4.0
Base Model	microsoft/deberta-v3-base
Training Data	SQuAD 2.0
Primary Task	Question Answering

What is deberta-v3-base-squad2?

DeBERTa-v3-base-squad2 is a version of DeBERTa-v3-base fine-tuned for extractive question answering, where the answer is a span copied from a given passage. It was trained on SQuAD 2.0 and reaches 83.82% exact match (EM) and 87.41% F1 on the validation set. Because SQuAD 2.0 mixes answerable and unanswerable questions, the model can also signal that a passage contains no answer, which matters in applications where many queries have none.

Implementation Details

The model was trained for 4 epochs with a batch size of 12, a learning rate of 2e-5, and a maximum sequence length of 512 tokens. The learning rate followed a linear warmup schedule with a warmup proportion of 0.2, meaning the first 20% of training steps ramp the rate up to its peak. Training ran on NVIDIA A10G GPU hardware.
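The schedule described above can be sketched in a few lines. This is an illustration only, not the training code used for this model: the linear decay after warmup is an assumption (the source states only that warmup is linear), and `num_features` is a hypothetical placeholder for the number of training windows.

```python
import math


def linear_warmup_lr(step, total_steps, peak_lr=2e-5, warmup_proportion=0.2):
    """Learning rate at `step`: linear ramp to `peak_lr`, then linear decay.

    The decay phase is an assumption; the source only specifies linear warmup.
    """
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase (assumed): scale linearly from the peak down to 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical dataset size, chosen only to make the arithmetic concrete.
num_features = 10_000
batch_size = 12
epochs = 4

steps_per_epoch = math.ceil(num_features / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * 0.2)

print(total_steps, warmup_steps)
print(linear_warmup_lr(warmup_steps // 2, total_steps))
```

With these placeholder values, the optimizer would spend roughly the first fifth of its updates below the peak rate of 2e-5.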

Maximum query length: 64 tokens
Document stride: 128 tokens
Usable with PyTorch through the Transformers library
Weights available in safetensors format
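The sequence length, query length, and document stride listed above imply that passages too long for one input are split into overlapping windows. The sketch below shows one way this can work. It assumes the stride is the step between window starts (the convention of the original BERT SQuAD script; some tooling instead treats the value as the overlap between windows) and that three special tokens are added per input. `split_into_windows` is a hypothetical helper, not part of any library.

```python
def split_into_windows(
    passage_tokens,
    question_len,
    max_seq_len=512,
    max_query_len=64,
    doc_stride=128,
    num_special_tokens=3,  # assumption: one start token and two separators
):
    """Return (start, end) token spans covering the passage with overlap."""
    if doc_stride <= 0:
        raise ValueError("doc_stride must be positive")

    # Questions longer than the query limit are truncated.
    question_len = min(question_len, max_query_len)

    # Tokens left for the passage once the question and specials are placed.
    window_size = max_seq_len - question_len - num_special_tokens
    if window_size <= 0:
        raise ValueError("no room left for passage tokens")

    windows = []
    start = 0
    while True:
        end = min(start + window_size, len(passage_tokens))
        windows.append((start, end))
        if end == len(passage_tokens):
            break
        # Assumed convention: advance the window start by the stride.
        start += doc_stride
    return windows


# A 1,000-token passage with a 10-token question.
spans = split_into_windows(list(range(1000)), question_len=10)
print(spans)
```

Each window is scored separately, and the best-scoring span across all windows is typically taken as the final answer.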

Core Capabilities

Achieves 84.97% exact match on the SQuAD plain-text validation set
Robust performance across different domains (NYT: 81.57% EM, New Wiki: 80.17% EM)
Handles adversarial questions with 79.29% exact match on the AddOneSent dataset
Effective on both answerable and unanswerable questions
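The exact match and F1 figures above are per-question scores averaged over a dataset. The sketch below follows the normalization commonly used for SQuAD-style evaluation (lowercasing, stripping punctuation and English articles, collapsing whitespace). It is a simplified illustration that compares against a single reference answer, and it is not the script used to produce the reported numbers.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Unanswerable case: both empty counts as correct, otherwise as wrong.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower."))
print(f1_score("in Paris, France", "Paris"))
```

For an unanswerable question the reference is the empty string, so a model is credited only when it also returns no answer.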

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the DeBERTa-v3 architecture with fine-tuning aimed at question answering. It reaches 83.82% exact match and 87.41% F1 on the SQuAD 2.0 validation set while staying relatively compact at 184M parameters.

Q: What are the recommended use cases?

The model is well suited to extractive question answering over English text. It fits scenarios where some questions have no answer in the passage, and where the text comes from domains other than the training data, as suggested by its exact match scores on the NYT, New Wiki, and AddOneSent evaluations.
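A minimal usage sketch for the unanswerable-question case follows. It assumes the model is published on the Hugging Face Hub under the identifier `deepset/deberta-v3-base-squad2` (the identifier is not stated in this document) and that the `transformers` library is installed. The helper names `load_qa_pipeline` and `answer_or_none` are hypothetical.

```python
def load_qa_pipeline(model_name="deepset/deberta-v3-base-squad2"):
    """Build a question-answering pipeline.

    The default model identifier is an assumption, not taken from the source.
    The import is local so the helper below can be used without transformers.
    """
    from transformers import pipeline

    return pipeline("question-answering", model=model_name, tokenizer=model_name)


def answer_or_none(qa, question, context):
    """Return the extracted answer, or None if the model predicts no answer."""
    result = qa(
        question=question,
        context=context,
        # Allow the pipeline to return an empty string for unanswerable questions.
        handle_impossible_answer=True,
    )
    answer = result["answer"].strip()
    return answer or None
```

Calling `answer_or_none(load_qa_pipeline(), question, context)` downloads the model on first use. A return value of `None` means the model judged that the passage does not contain an answer.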