opus-mt-tc-big-en-gmq
| Property | Value |
|---|---|
| Parameter Count | 232M |
| License | CC-BY-4.0 |
| Framework | PyTorch/Transformers |
| Languages | English to Danish, Faroese, Icelandic, Norwegian (Bokmål/Nynorsk), Swedish |
What is opus-mt-tc-big-en-gmq?
opus-mt-tc-big-en-gmq is a neural machine translation model for translating English text into North Germanic languages. Developed by Helsinki-NLP as part of the OPUS-MT project, it is built on the Marian NMT framework and converted to PyTorch. The model uses a transformer-big architecture and reports BLEU scores ranging from 21.5 to 61.6 across its language pairs and test sets.
Implementation Details
The model implements a transformer-big architecture trained on the opusTCv20210807+bt dataset. It uses SentencePiece tokenization with 32k vocabulary size and requires target language tokens (e.g., >>dan<<) to specify the desired output language.
- Model Size: 232M parameters
- Tokenization: SentencePiece (spm32k,spm32k)
- Training Data: OPUS dataset with backtranslation
- Release Date: 2022-03-17
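Because a single checkpoint serves several target languages, every source sentence must begin with a target-language token such as `>>dan<<`. A minimal sketch of that prefixing step (the helper name and the code list here are illustrative, not part of the model's API):

```python
# Target-language tokens accepted by the model: ISO 639-3 codes
# for its six North Germanic targets (Danish, Faroese, Icelandic,
# Norwegian Nynorsk, Norwegian Bokmål, Swedish).
TARGET_TOKENS = {"dan", "fao", "isl", "nno", "nob", "swe"}

def with_target_token(lang: str, text: str) -> str:
    """Prefix a source sentence with the >>xxx<< token the model expects."""
    if lang not in TARGET_TOKENS:
        raise ValueError(f"unsupported target language: {lang!r}")
    return f">>{lang}<< {text}"

print(with_target_token("swe", "Good morning!"))  # → >>swe<< Good morning!
```

Sentences in one batch may carry different target tokens, which is how a single forward pass can serve several output languages at once.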
Core Capabilities
- Multi-target translation covering six North Germanic language codes: Danish, Faroese, Icelandic, Norwegian (Bokmål and Nynorsk), and Swedish
- High performance on Tatoeba test sets (BLEU scores up to 61.6 for English-Danish)
- Evaluated across multiple text domains, including news test sets and Tatoeba sentence pairs
- Support for both formal and informal language variants
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to handle multiple North Germanic languages with a single model while maintaining high performance makes it unique. It's particularly effective for Danish, Norwegian, and Swedish translations, with BLEU scores above 45 on standard benchmarks.
Q: What are the recommended use cases?
The model is ideal for professional translation tasks involving English to North Germanic languages, particularly for Danish, Swedish, and Norwegian Bokmål where it shows the strongest performance. It's suitable for both general text and specific domains like news translation.