MiniLM-L12-H384-uncased
| Property | Value |
|---|---|
| Parameters | 33M |
| License | MIT |
| Author | Microsoft |
| Paper | MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (arXiv:2002.10957) |
| Architecture | 12-layer, 384-hidden, 12-heads |
What is MiniLM-L12-H384-uncased?
MiniLM is a compressed transformer model developed by Microsoft that trades very little accuracy for a large gain in efficiency. This uncased version has 12 layers with a hidden size of 384, for a total of 33M parameters, roughly a third of BERT-Base's 109M, while being 2.7x faster.
Implementation Details
The model uses deep self-attention distillation to compress a pre-trained transformer while preserving its task-agnostic capabilities. It is designed as a drop-in replacement for BERT and must be fine-tuned on a downstream task before deployment; a minimal loading sketch follows the list below.
- Achieves performance comparable to or better than BERT-Base on a range of NLP tasks
- Reduces the parameter count to 33M (vs BERT-Base's 109M)
- Implements a 12-layer architecture with 384 hidden dimensions
- Supports uncased text processing
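As a quick sanity check of the figures above, the snippet below loads the checkpoint with the Hugging Face `transformers` library and prints its parameter count and hidden size. This is a minimal sketch, not part of the original card; it assumes the hub ID `microsoft/MiniLM-L12-H384-uncased` and that the repo ships BERT-compatible tokenizer files.

```python
# Minimal sketch (not from the original card): load the checkpoint and check
# its size. If the repo does not bundle tokenizer files, substitute the
# bert-base-uncased tokenizer, since MiniLM reuses BERT's uncased vocabulary.
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Parameter count should come out at roughly 33M.
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")

# Encode one sentence; the last hidden state has the 384-dim hidden size.
inputs = tokenizer("minilm is a compact bert alternative.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 384])
```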
Core Capabilities
- Strong performance on SQuAD 2.0 (81.7 vs BERT-Base's 76.8)
- Excellent MNLI-m accuracy (85.7)
- High performance on SST-2 (93.0) and QNLI (91.5)
- Effective on MRPC (89.5) and QQP (91.3) tasks
Frequently Asked Questions
Q: What makes this model unique?
MiniLM's uniqueness lies in its ability to maintain BERT-level performance while significantly reducing model size through deep self-attention distillation, making it 2.7x faster than BERT-Base.
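The exact speedup depends on hardware, batch size, and sequence length. The sketch below is one rough way to check the claim locally by timing MiniLM against `bert-base-uncased` on an identical batch; the model IDs and the assumption that MiniLM shares BERT's uncased vocabulary are mine, not from the card.

```python
# Rough, hardware-dependent latency comparison (not a rigorous benchmark).
# The bert-base-uncased tokenizer is reused for both models, assuming MiniLM
# shares BERT's uncased WordPiece vocabulary.
import time
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["MiniLM compresses BERT via deep self-attention distillation."] * 8,
    return_tensors="pt", padding="max_length", max_length=128,
)

def mean_latency(name, n_runs=20):
    model = AutoModel.from_pretrained(name).eval()
    with torch.no_grad():
        model(**batch)  # warm-up pass
        start = time.perf_counter()
        for _ in range(n_runs):
            model(**batch)
    return (time.perf_counter() - start) / n_runs

for name in ["microsoft/MiniLM-L12-H384-uncased", "bert-base-uncased"]:
    print(f"{name}: {mean_latency(name) * 1000:.1f} ms per batch")
```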
Q: What are the recommended use cases?
The model is well suited to text classification, question answering, and other NLP applications where computational efficiency matters but accuracy cannot be sacrificed; a fine-tuning sketch for a classification task follows.
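To make the classification use case concrete, here is a minimal fine-tuning sketch with the Hugging Face `Trainer`. The dataset (GLUE SST-2), hyperparameters, and output path are placeholder choices for illustration, not recommendations from the card.

```python
# Minimal fine-tuning sketch for binary text classification; SST-2 is used
# only as a stand-in dataset. Swap in your own text/label data as needed.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

checkpoint = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="minilm-sst2",          # placeholder output path
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```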