distilbert-base-multilingual-cased

Maintained by: distilbert

DistilBERT Base Multilingual Cased

  • Parameter Count: 134M parameters
  • Model Type: Transformer-based language model
  • License: Apache 2.0
  • Paper: arXiv:1910.01108
  • Languages Supported: 104 languages

What is distilbert-base-multilingual-cased?

DistilBERT base multilingual cased is a compressed version of the BERT base multilingual (mBERT) model, designed to be more efficient while maintaining strong performance. Created by Hugging Face, it was distilled from mBERT-base, cutting the model size by about 24% while running roughly twice as fast.
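As a quick illustration of how the checkpoint is typically loaded (a minimal sketch using the Hugging Face transformers library with PyTorch; the example sentences are made up for this card), the same model and tokenizer handle text in multiple languages:

```python
from transformers import AutoModel, AutoTokenizer

# Load the pretrained checkpoint from the Hugging Face Hub
model_id = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# The cased tokenizer distinguishes "Paris" from "paris", and the shared
# vocabulary covers the 104 Wikipedia languages the model was trained on.
for text in ["Paris is the capital of France.", "París es la capital de Francia."]:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    print(text, "->", outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```

Each forward pass returns one 768-dimensional hidden state per token, matching the architecture described below.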

Implementation Details

The model architecture consists of 6 transformer layers, a hidden size of 768, and 12 attention heads, totaling 134M parameters; these figures can be verified against the published configuration, as in the sketch after the list below. It was trained on Wikipedia content in 104 languages, making it highly versatile for multilingual applications. The model is case-sensitive, distinguishing capitalized from lowercase text.

  • 6-layer transformer architecture
  • 768-dimensional hidden states
  • 12 attention heads
  • 24% smaller than the original mBERT
  • 2x faster inference speed
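These figures can be checked directly against the published configuration. The following sketch (assuming transformers and PyTorch are installed; the printed values are what the DistilBERT config is expected to report) reads the layer count, hidden size, and head count, then sums the parameters:

```python
from transformers import AutoConfig, AutoModel

model_id = "distilbert-base-multilingual-cased"

# DistilBERT's config exposes the architecture as n_layers / dim / n_heads
config = AutoConfig.from_pretrained(model_id)
print(config.n_layers, config.dim, config.n_heads)  # expected: 6 768 12

# Summing parameter tensors should land close to the 134M quoted above
model = AutoModel.from_pretrained(model_id)
n_params = sum(p.numel() for p in model.parameters())
print(f"~{n_params / 1e6:.0f}M parameters")
```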

Core Capabilities

  • Masked language modeling (demonstrated in the fill-mask sketch after this list)
  • Next sentence prediction
  • Cross-lingual transfer learning
  • Fine-tuning for downstream tasks
  • Zero-shot learning across languages
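The masked language modeling capability can be exercised directly with the fill-mask pipeline. The sketch below is illustrative only (the prompts are invented for this example), but it shows a single checkpoint completing masked tokens in more than one language:

```python
from transformers import pipeline

# fill-mask exercises the model's pretraining objective: predicting [MASK] tokens
unmasker = pipeline("fill-mask", model="distilbert-base-multilingual-cased")

print(unmasker("Paris is the [MASK] of France.")[0]["token_str"])
print(unmasker("Berlin ist die [MASK] von Deutschland.")[0]["token_str"])
```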

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for the balance it strikes between accuracy and efficiency. While retaining roughly 97% of mBERT's language understanding capability, it runs about twice as fast and uses significantly fewer computational resources.

Q: What are the recommended use cases?

The model is best suited for tasks that require whole-sentence understanding, such as sequence classification, token classification, and question answering. It's particularly valuable for multilingual applications where resource efficiency is important. However, it's not recommended for text generation tasks.
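For such downstream tasks, the usual pattern is to attach a task-specific head to the pretrained encoder and fine-tune. The sketch below outlines sequence classification with the transformers Trainer; the dataset, label count, and hyperparameters are placeholders for illustration, not recommendations from the card:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# num_labels=2 is a placeholder for a binary classification task
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Placeholder dataset; any labeled text classification set works the same way
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="distilmbert-seq-cls",
    per_device_train_batch_size=16,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
    tokenizer=tokenizer,
)
trainer.train()
```

The same pattern applies to token classification (AutoModelForTokenClassification) and extractive question answering (AutoModelForQuestionAnswering); only the head and the data preprocessing change.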
