BERT-Tiny (bert_uncased_L-2_H-128_A-2)
| Property | Value |
|---|---|
| Parameter Count | 4.43M |
| Model Type | Transformer-based Language Model |
| License | Apache 2.0 |
| Research Paper | arXiv:1908.08962 |
| GLUE Score | 64.2 |
What is bert_uncased_L-2_H-128_A-2?
BERT-Tiny is part of Google's BERT miniatures series, specifically designed for environments with restricted computational resources. This particular model features 2 layers, 128-dimensional hidden states, and 2 attention heads, making it the most compact variant in the BERT family. Despite its small size, it maintains the core BERT architecture and training methodology.
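For a quick sanity check of the architecture, the model can be loaded through the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the `google/bert_uncased_L-2_H-128_A-2` checkpoint ID on the Hugging Face Hub and a standard transformers + PyTorch installation.

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hub checkpoint ID for the Google-released BERT-Tiny weights
model_name = "google/bert_uncased_L-2_H-128_A-2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# The tokenizer lowercases input, since this is an uncased model
inputs = tokenizer(
    "BERT-Tiny keeps the full BERT pipeline at a fraction of the size.",
    return_tensors="pt",
)
outputs = model(**inputs)

# Hidden states are 128-dimensional, matching the H-128 in the model name
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 128])
```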
Implementation Details
The model is pre-trained with the standard BERT recipe, using WordPiece masking on English uncased text. It is particularly effective in knowledge distillation setups, where a larger BERT model acts as the teacher; a masked-language-modeling sketch follows the list below.
- Architecture: 2 transformer layers with 128-dimensional hidden states
- Attention Heads: 2
- Training Approach: Pre-trained using masked language modeling
- Input Processing: Uncased text (lowercase)
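To illustrate the masked language modeling objective, the fill-mask pipeline below runs the pre-trained checkpoint directly. This is a sketch rather than an official recipe, and the checkpoint ID is again an assumption about the Hub-hosted weights.

```python
from transformers import pipeline

# Minimal sketch: exercise the pre-training objective (masked token prediction).
# The checkpoint ID is assumed to point at the Hub-hosted BERT-Tiny weights.
fill_mask = pipeline("fill-mask", model="google/bert_uncased_L-2_H-128_A-2")

# Input is lowercased by the uncased tokenizer before prediction
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")
```

Predictions will be noisier than those of larger BERT variants; the 2-layer, 128-dimensional configuration trades accuracy for footprint.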
Core Capabilities
- GLUE Score: 64.2 average across tasks
- Strong performance on SST-2 (83.2) and QNLI (81.5); a fine-tuning sketch follows this list
- Optimized for resource-constrained environments
- Suitable for knowledge distillation applications
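The SST-2 and QNLI numbers above come from task-specific fine-tuning. The sketch below shows one way to reproduce that setup, assuming the datasets library for GLUE data and the same Hub checkpoint ID; treat the hyperparameters as illustrative rather than the values used in the paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/bert_uncased_L-2_H-128_A-2"  # assumed Hub checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# SST-2 from the GLUE benchmark, loaded via the datasets library
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)
encoded = encoded.rename_column("label", "labels")

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-tiny-sst2",          # illustrative output path
        per_device_train_batch_size=32,       # illustrative hyperparameters
        num_train_epochs=3,
        learning_rate=3e-4,
    ),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```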
Frequently Asked Questions
Q: What makes this model unique?
This model represents the smallest variant in the BERT family, offering a balance between computational efficiency and performance. It's particularly valuable for deployment in environments with limited resources while maintaining core BERT capabilities.
Q: What are the recommended use cases?
The model is best suited for:
- Initial prototyping of BERT-based applications
- Edge device deployment where computational resources are limited
- Knowledge distillation scenarios where it can learn from larger BERT models (a minimal distillation-loss sketch follows below)
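For the distillation route, one common setup blends the teacher's softened predictions with the hard labels. This is a generic sketch in PyTorch, not the paper's exact pre-training distillation recipe, and the temperature and alpha values are illustrative assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target KL term (teacher guidance) with hard-label cross-entropy.

    temperature and alpha are illustrative hyperparameters, not values from the paper.
    """
    # Softened log-probabilities for teacher and student
    soft_targets = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_preds = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between softened distributions, rescaled by T^2 (standard correction)
    kd_loss = F.kl_div(soft_preds, soft_targets, reduction="batchmean",
                       log_target=True) * temperature ** 2

    # Standard cross-entropy against the ground-truth labels
    ce_loss = F.cross_entropy(student_logits, labels)

    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In this setup, a fine-tuned larger BERT model produces `teacher_logits` while BERT-Tiny produces `student_logits` on the same batch, and the combined loss is backpropagated through the student only.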