MiniLM-L12-H384-uncased

Property        Value
Parameters      33M
License         MIT
Author          Microsoft
Paper           MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (arXiv:2002.10957)
Architecture    12-layer, 384-hidden, 12-heads

What is MiniLM-L12-H384-uncased?

MiniLM is a compressed transformer model developed by Microsoft that retains most of BERT-Base's accuracy at a fraction of its size. This uncased version uses 12 layers with a hidden size of 384, giving roughly 33M parameters versus BERT-Base's 109M, and runs about 2.7x faster at inference.

Implementation Details

The model uses deep self-attention distillation to compress a pre-trained transformer while preserving its task-agnostic capabilities. It is intended as a drop-in replacement for BERT and, like BERT, must be fine-tuned on a downstream task before deployment (a minimal loading sketch follows the list below).

  • Achieves comparable or better performance than BERT-Base on various NLP tasks
  • Features improved efficiency with 33M parameters (vs BERT's 109M)
  • Implements a 12-layer architecture with 384 hidden dimensions
  • Supports uncased text processing
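
As a concrete illustration of the drop-in usage described above, here is a minimal sketch of loading the checkpoint with the Hugging Face Transformers library. The hub ID microsoft/MiniLM-L12-H384-uncased and the use of AutoModel/AutoTokenizer are assumptions for illustration, not part of the original card.

```python
# Minimal sketch: load the checkpoint and run a forward pass.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "microsoft/MiniLM-L12-H384-uncased"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

inputs = tokenizer("MiniLM is a compact BERT-style encoder.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The final hidden states have shape (batch, sequence_length, 384),
# matching the 384 hidden size listed in the table above.
print(outputs.last_hidden_state.shape)
```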

Core Capabilities

  • Strong performance on SQuAD 2.0 (81.7 F1 vs BERT-Base's 76.8)
  • Excellent MNLI-m accuracy (85.7)
  • High performance on SST-2 (93.0) and QNLI (91.5)
  • Effective on MRPC (89.5) and QQP (91.3) tasks

Frequently Asked Questions

Q: What makes this model unique?

MiniLM's uniqueness lies in its ability to maintain BERT-level performance while significantly reducing model size through deep self-attention distillation, making it 2.7x faster than BERT-Base.

Q: What are the recommended use cases?

The model is particularly well-suited for text classification, question answering, and other NLP applications where computational efficiency is critical but accuracy cannot be sacrificed. Because the released checkpoint is pre-trained only, it must first be fine-tuned on the target task; a sketch of a classification fine-tuning setup follows below.
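
The sketch below shows one plausible way to attach a classification head before training. The hub ID, the num_labels value, and the use of AutoModelForSequenceClassification are illustrative assumptions rather than instructions from the original card.

```python
# Hypothetical fine-tuning setup: attach a sequence-classification head
# to the pre-trained encoder. The model must still be trained on a
# labeled downstream dataset before it produces useful predictions.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "microsoft/MiniLM-L12-H384-uncased"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, num_labels=2  # e.g. a binary task such as SST-2
)

# From here, train with transformers.Trainer or a standard PyTorch loop,
# exactly as you would fine-tune BERT-Base.
```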
