MiniLM-L12-H384-uncased
| Property | Value |
|---|---|
| Parameters | 33M |
| License | MIT |
| Author | Microsoft |
| Paper | MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (arXiv:2002.10957) |
| Architecture | 12-layer, 384-hidden, 12-heads |
What is MiniLM-L12-H384-uncased?
MiniLM is a compressed transformer model developed by Microsoft that trades very little accuracy for a large gain in efficiency. This uncased version has 12 layers with a hidden size of 384, for a total of 33M parameters, roughly a third of BERT-Base's 109M, while being 2.7x faster.
Implementation Details
The model uses deep self-attention distillation to compress a pre-trained transformer while preserving its task-agnostic capabilities. It is designed as a drop-in replacement for BERT and must be fine-tuned on a downstream task before deployment; a minimal loading sketch follows the list below.
- Achieves performance comparable to or better than BERT-Base on a range of NLP tasks
- Reduces the parameter count to 33M (vs BERT-Base's 109M)
- Implements a 12-layer architecture with 384 hidden dimensions
- Supports uncased text processing
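As a quick sanity check of the figures above, the snippet below loads the checkpoint with the Hugging Face `transformers` library and prints its parameter count and hidden size. This is a minimal sketch, not part of the original card; it assumes the hub ID `microsoft/MiniLM-L12-H384-uncased` and that the repo ships BERT-compatible tokenizer files.

```python
# Minimal sketch (not from the original card): load the checkpoint and check
# its size. If the repo does not bundle tokenizer files, substitute the
# bert-base-uncased tokenizer, since MiniLM reuses BERT's uncased vocabulary.
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Parameter count should come out at roughly 33M.
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")

# Encode one sentence; the last hidden state has the 384-dim hidden size.
inputs = tokenizer("minilm is a compact bert alternative.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 384])
```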
Core Capabilities
- Strong performance on SQuAD 2.0 (81.7 vs BERT-Base's 76.8)
- Excellent MNLI-m accuracy (85.7)
- High performance on SST-2 (93.0) and QNLI (91.5)
- Effective on MRPC (89.5) and QQP (91.3) tasks
Frequently Asked Questions
Q: What makes this model unique?
MiniLM's uniqueness lies in its ability to maintain BERT-level performance while significantly reducing model size through deep self-attention distillation, making it 2.7x faster than BERT-Base.
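The exact speedup depends on hardware, batch size, and sequence length. The sketch below is one rough way to check the claim locally by timing MiniLM against `bert-base-uncased` on an identical batch; the model IDs and the assumption that MiniLM shares BERT's uncased vocabulary are mine, not from the card.

```python
# Rough, hardware-dependent latency comparison (not a rigorous benchmark).
# The bert-base-uncased tokenizer is reused for both models, assuming MiniLM
# shares BERT's uncased WordPiece vocabulary.
import time
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["MiniLM compresses BERT via deep self-attention distillation."] * 8,
    return_tensors="pt", padding="max_length", max_length=128,
)

def mean_latency(name, n_runs=20):
    model = AutoModel.from_pretrained(name).eval()
    with torch.no_grad():
        model(**batch)  # warm-up pass
        start = time.perf_counter()
        for _ in range(n_runs):
            model(**batch)
    return (time.perf_counter() - start) / n_runs

for name in ["microsoft/MiniLM-L12-H384-uncased", "bert-base-uncased"]:
    print(f"{name}: {mean_latency(name) * 1000:.1f} ms per batch")
```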
Q: What are the recommended use cases?
The model is well suited to text classification, question answering, and other NLP applications where computational efficiency matters but accuracy cannot be sacrificed; a fine-tuning sketch for a classification task follows.
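To make the classification use case concrete, here is a minimal fine-tuning sketch with the Hugging Face `Trainer`. The dataset (GLUE SST-2), hyperparameters, and output path are placeholder choices for illustration, not recommendations from the card.

```python
# Minimal fine-tuning sketch for binary text classification; SST-2 is used
# only as a stand-in dataset. Swap in your own text/label data as needed.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

checkpoint = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="minilm-sst2",          # placeholder output path
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```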