Multilingual-MiniLM-L12-H384
| Property | Value |
|---|---|
| Architecture | 12-layer Transformer |
| Hidden Size | 384 |
| Parameters | 21M (Transformer) + 96M (Embedding) |
| License | MIT |
| Languages | 15 evaluated (en, ar, bg, de, el, es, fr, hi, ru, sw, th, tr, ur, vi, zh) |
What is Multilingual-MiniLM-L12-H384?
Multilingual-MiniLM-L12-H384 is a compact yet capable multilingual transformer model developed by Microsoft. It is obtained by distilling a larger multilingual teacher model into a small student, providing efficient cross-lingual understanding while retaining strong performance. The model pairs a BERT-style transformer architecture with XLM-R's tokenizer, striking a balance between computational efficiency and multilingual coverage.
Implementation Details
The model features a 12-layer transformer architecture with 384 hidden dimensions and 12 attention heads. Knowledge distillation is used to compress the capabilities of a larger teacher model into a more efficient student, resulting in just 21M transformer parameters plus 96M embedding parameters (the embedding table dominates because the multilingual vocabulary contains roughly 250K tokens).
- Utilizes the XLM-RoBERTa tokenizer for multilingual support (see the loading sketch after this list)
- Implements a BERT-style transformer architecture
- Optimized for cross-lingual transfer learning
- Achieves 71.1% average accuracy on the XNLI benchmark
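As a minimal loading sketch (assuming the checkpoint is published on the Hugging Face Hub as `microsoft/Multilingual-MiniLM-L12-H384` and that 🤗 Transformers with sentencepiece is installed), the encoder and tokenizer classes can be paired explicitly:

```python
import torch
from transformers import BertModel, XLMRobertaTokenizer

# The checkpoint pairs a BERT-style encoder with XLM-R's sentencepiece tokenizer,
# so the two classes are specified explicitly rather than via Auto* classes.
model_name = "microsoft/Multilingual-MiniLM-L12-H384"  # assumed Hub identifier
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

inputs = tokenizer("Bonjour tout le monde", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 384)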
Core Capabilities
- Cross-lingual Natural Language Inference (evaluated on the XNLI benchmark; see the fine-tuning sketch after this list)
- Multilingual Question Answering (MLQA benchmark support)
- Text Classification across the 15 evaluated languages
- Efficient deployment with reduced parameter count
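As an illustrative sketch of the cross-lingual NLI use case (the 3-way label mapping, the German example pair, and the omitted training loop are assumptions for illustration, not the official recipe), a classification head can be attached to the encoder and fine-tuned on XNLI-style premise/hypothesis pairs:

```python
from transformers import BertForSequenceClassification, XLMRobertaTokenizer

model_name = "microsoft/Multilingual-MiniLM-L12-H384"  # assumed Hub identifier
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)

# A freshly initialized 3-way head (entailment / neutral / contradiction);
# it only produces meaningful predictions after fine-tuning on XNLI-style data.
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)

premise = "Der Hund schläft auf dem Sofa."   # German premise
hypothesis = "Ein Tier ruht sich aus."       # German hypothesis
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 3])
```

Because the encoder is shared across languages, a head fine-tuned on English NLI data can be applied zero-shot to the other supported languages, which is the cross-lingual transfer setting measured by XNLI.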
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for an efficient architecture that maintains strong performance while using only 21M parameters in its transformer component (plus 96M for embeddings). By combining BERT's architecture with XLM-R's tokenization, it is effective for multilingual applications while requiring substantially less compute than larger multilingual models.
Q: What are the recommended use cases?
The model is particularly well-suited for cross-lingual tasks such as natural language inference and question answering. It's ideal for applications requiring multilingual understanding with limited computational resources, showing strong performance on benchmarks like XNLI and MLQA across multiple languages.