# minilm-uncased-squad2
| Property | Value |
|---|---|
| Parameter Count | 33.4M |
| License | CC-BY-4.0 |
| Task Type | Question Answering |
| Performance | 76.19% Exact Match, 79.55% F1 |
## What is minilm-uncased-squad2?
minilm-uncased-squad2 is a compact and efficient question-answering model developed by deepset, based on Microsoft's MiniLM architecture. This model has been specifically fine-tuned on the SQuAD 2.0 dataset to perform extractive question answering tasks in English. With its relatively small size of 33.4M parameters, it offers an excellent balance between performance and computational efficiency.
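The quickest way to try the model is through the Transformers question-answering pipeline. The snippet below is a minimal sketch, assuming the model is published on the Hugging Face Hub under the `deepset/minilm-uncased-squad2` identifier.

```python
from transformers import pipeline

# Load the model into a question-answering pipeline
# (weights are fetched from the Hub on first use).
qa = pipeline("question-answering", model="deepset/minilm-uncased-squad2")

result = qa(
    question="How many parameters does the model have?",
    context="minilm-uncased-squad2 is a compact QA model with 33.4M parameters.",
)
print(result)  # dict with 'score', 'start', 'end', and 'answer' keys
```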
## Implementation Details
The model was trained on a Tesla V100 GPU with carefully selected hyperparameters: a batch size of 12, 4 training epochs, a maximum sequence length of 384 tokens, and gradient accumulation over 4 steps. The learning rate follows a linear schedule with a warmup proportion of 0.2 (a configuration sketch follows the list below).
- Base model: microsoft/MiniLM-L12-H384-uncased
- Training dataset: SQuAD 2.0
- Maximum query length: 64 tokens
- Document stride: 128 tokens
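For reference, these hyperparameters map roughly onto Hugging Face `TrainingArguments` as sketched below. This is an illustrative reconstruction, not the original training script, and the `output_dir` value is hypothetical.

```python
from transformers import TrainingArguments

# Approximate mapping of the reported hyperparameters onto
# TrainingArguments; an illustrative sketch, not the original recipe.
args = TrainingArguments(
    output_dir="minilm-uncased-squad2",  # hypothetical output path
    per_device_train_batch_size=12,
    gradient_accumulation_steps=4,       # effective batch size of 48
    num_train_epochs=4,
    lr_scheduler_type="linear",
    warmup_ratio=0.2,                    # linear warmup over 20% of steps
)

# Sequence handling is configured at tokenization time rather than here:
# max_seq_length=384, doc_stride=128, max_query_length=64.
```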
## Core Capabilities
- Extractive Question Answering with high accuracy
- Efficient processing of both answerable and unanswerable questions
- Easy integration with both the Haystack and Transformers libraries (see the sketch after this list)
- Balanced performance across different question types (78.36% exact match on answerable questions)
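A minimal Haystack integration could look like the following sketch, which assumes the Haystack 1.x `FARMReader` API and the same hypothetical Hub identifier as above.

```python
from haystack.nodes import FARMReader
from haystack.schema import Document

# Load the model as an extractive QA reader (Haystack 1.x API).
reader = FARMReader(model_name_or_path="deepset/minilm-uncased-squad2")

# Run a query directly against in-memory documents.
prediction = reader.predict(
    query="What dataset was the model fine-tuned on?",
    documents=[Document(content="The model was fine-tuned on SQuAD 2.0.")],
)
print(prediction["answers"][0].answer)
```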
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its efficient architecture while maintaining strong performance metrics. With only 33.4M parameters, it achieves an impressive 76.19% exact match accuracy on SQuAD 2.0, making it an excellent choice for production environments where computational resources are constrained.
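Because SQuAD 2.0 includes unanswerable questions, the model can also decline to answer. With the Transformers pipeline this behavior is controlled by the `handle_impossible_answer` flag, as in this sketch:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/minilm-uncased-squad2")

# handle_impossible_answer=True lets the pipeline return an empty answer
# when the model judges the question unanswerable from the given context.
result = qa(
    question="What is the capital of France?",
    context="MiniLM is a distilled transformer model from Microsoft.",
    handle_impossible_answer=True,
)
print(repr(result["answer"]))  # an empty string signals "no answer"
```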
**Q: What are the recommended use cases?**
The model is best suited for extractive question answering in English, particularly in applications that require efficient document analysis and information extraction. It integrates easily into both research and production environments through either the Haystack or Hugging Face Transformers frameworks, as in the lower-level example below.
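For finer control than the pipeline offers, the model can also be loaded with the lower-level Transformers classes. This sketch assumes the standard `AutoModelForQuestionAnswering` API and the same hypothetical Hub identifier used earlier.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "deepset/minilm-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What dataset was the model fine-tuned on?"
context = "minilm-uncased-squad2 was fine-tuned on the SQuAD 2.0 dataset."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end token positions and decode the span.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```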