BERT-Tiny (bert_uncased_L-2_H-128_A-2)
| Property | Value |
|---|---|
| Parameter Count | 4.43M |
| Model Type | Transformer-based Language Model |
| License | Apache 2.0 |
| Research Paper | arXiv:1908.08962 |
| GLUE Score | 64.2 |
What is bert_uncased_L-2_H-128_A-2?
BERT-Tiny is part of Google's BERT miniatures series, specifically designed for environments with restricted computational resources. This particular model features 2 layers, 128-dimensional hidden states, and 2 attention heads, making it the most compact variant in the BERT family. Despite its small size, it maintains the core BERT architecture and training methodology.
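For a quick sanity check of the architecture, the model can be loaded through the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the `google/bert_uncased_L-2_H-128_A-2` checkpoint ID on the Hugging Face Hub and a standard transformers + PyTorch installation.

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hub checkpoint ID for the Google-released BERT-Tiny weights
model_name = "google/bert_uncased_L-2_H-128_A-2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# The tokenizer lowercases input, since this is an uncased model
inputs = tokenizer(
    "BERT-Tiny keeps the full BERT pipeline at a fraction of the size.",
    return_tensors="pt",
)
outputs = model(**inputs)

# Hidden states are 128-dimensional, matching the H-128 in the model name
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 128])
```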
Implementation Details
The model is pre-trained with the standard BERT recipe, using WordPiece masking on English uncased text. It is particularly effective in knowledge distillation setups, where a larger BERT model acts as the teacher; a masked-language-modeling sketch follows the list below.
- Architecture: 2 transformer layers with 128-dimensional hidden states
- Attention Heads: 2
- Training Approach: Pre-trained using masked language modeling
- Input Processing: Uncased text (lowercase)
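To illustrate the masked language modeling objective, the fill-mask pipeline below runs the pre-trained checkpoint directly. This is a sketch rather than an official recipe, and the checkpoint ID is again an assumption about the Hub-hosted weights.

```python
from transformers import pipeline

# Minimal sketch: exercise the pre-training objective (masked token prediction).
# The checkpoint ID is assumed to point at the Hub-hosted BERT-Tiny weights.
fill_mask = pipeline("fill-mask", model="google/bert_uncased_L-2_H-128_A-2")

# Input is lowercased by the uncased tokenizer before prediction
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")
```

Predictions will be noisier than those of larger BERT variants; the 2-layer, 128-dimensional configuration trades accuracy for footprint.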
Core Capabilities
- GLUE Score: 64.2 average across tasks
- Strong performance on SST-2 (83.2) and QNLI (81.5); a fine-tuning sketch follows this list
- Optimized for resource-constrained environments
- Suitable for knowledge distillation applications
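The SST-2 and QNLI numbers above come from task-specific fine-tuning. The sketch below shows one way to reproduce that setup, assuming the datasets library for GLUE data and the same Hub checkpoint ID; treat the hyperparameters as illustrative rather than the values used in the paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/bert_uncased_L-2_H-128_A-2"  # assumed Hub checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# SST-2 from the GLUE benchmark, loaded via the datasets library
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)
encoded = encoded.rename_column("label", "labels")

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-tiny-sst2",          # illustrative output path
        per_device_train_batch_size=32,       # illustrative hyperparameters
        num_train_epochs=3,
        learning_rate=3e-4,
    ),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```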
Frequently Asked Questions
Q: What makes this model unique?
This model represents the smallest variant in the BERT family, offering a balance between computational efficiency and performance. It's particularly valuable for deployment in environments with limited resources while maintaining core BERT capabilities.
Q: What are the recommended use cases?
The model is best suited for:
- Initial prototyping of BERT-based applications
- Edge device deployment where computational resources are limited
- Knowledge distillation scenarios where it can learn from larger BERT models (a minimal distillation-loss sketch follows below)
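For the distillation route, one common setup blends the teacher's softened predictions with the hard labels. This is a generic sketch in PyTorch, not the paper's exact pre-training distillation recipe, and the temperature and alpha values are illustrative assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target KL term (teacher guidance) with hard-label cross-entropy.

    temperature and alpha are illustrative hyperparameters, not values from the paper.
    """
    # Softened log-probabilities for teacher and student
    soft_targets = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_preds = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between softened distributions, rescaled by T^2 (standard correction)
    kd_loss = F.kl_div(soft_preds, soft_targets, reduction="batchmean",
                       log_target=True) * temperature ** 2

    # Standard cross-entropy against the ground-truth labels
    ce_loss = F.cross_entropy(student_logits, labels)

    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In this setup, a fine-tuned larger BERT model produces `teacher_logits` while BERT-Tiny produces `student_logits` on the same batch, and the combined loss is backpropagated through the student only.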