opus-mt-tc-big-en-gmq
| Property | Value |
|---|---|
| Parameter Count | 232M |
| License | CC-BY-4.0 |
| Framework | PyTorch/Transformers |
| Languages | English to Danish, Faroese, Icelandic, Norwegian (Bokmål/Nynorsk), Swedish |
What is opus-mt-tc-big-en-gmq?
opus-mt-tc-big-en-gmq is a neural machine translation model for translating English text into North Germanic languages. Developed by Helsinki-NLP as part of the OPUS-MT project, it is built on the Marian NMT framework and converted to PyTorch. The model uses a transformer-big architecture and reports BLEU scores ranging from 21.5 to 61.6 across its language pairs and test sets.
Implementation Details
The model implements a transformer-big architecture trained on the opusTCv20210807+bt dataset. It uses SentencePiece tokenization with 32k vocabulary size and requires target language tokens (e.g., >>dan<<) to specify the desired output language.
- Model Size: 232M parameters
- Tokenization: SentencePiece (spm32k,spm32k)
- Training Data: OPUS dataset with backtranslation
- Release Date: 2022-03-17
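Because a single checkpoint serves several target languages, every source sentence must begin with a target-language token such as `>>dan<<`. A minimal sketch of that prefixing step (the helper name and the code list here are illustrative, not part of the model's API):

```python
# Target-language tokens accepted by the model: ISO 639-3 codes
# for its six North Germanic targets (Danish, Faroese, Icelandic,
# Norwegian Nynorsk, Norwegian Bokmål, Swedish).
TARGET_TOKENS = {"dan", "fao", "isl", "nno", "nob", "swe"}

def with_target_token(lang: str, text: str) -> str:
    """Prefix a source sentence with the >>xxx<< token the model expects."""
    if lang not in TARGET_TOKENS:
        raise ValueError(f"unsupported target language: {lang!r}")
    return f">>{lang}<< {text}"

print(with_target_token("swe", "Good morning!"))  # → >>swe<< Good morning!
```

Sentences in one batch may carry different target tokens, which is how a single forward pass can serve several output languages at once.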
Core Capabilities
- Multi-target translation covering six North Germanic language codes: Danish, Faroese, Icelandic, Norwegian (Bokmål and Nynorsk), and Swedish
- High performance on Tatoeba test sets (BLEU scores up to 61.6 for English-Danish)
- Evaluated across multiple text domains, including news test sets and Tatoeba sentence pairs
- Support for both formal and informal language variants
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to handle multiple North Germanic languages with a single model while maintaining high performance makes it unique. It's particularly effective for Danish, Norwegian, and Swedish translations, with BLEU scores above 45 on standard benchmarks.
Q: What are the recommended use cases?
The model is ideal for professional translation tasks involving English to North Germanic languages, particularly for Danish, Swedish, and Norwegian Bokmål where it shows the strongest performance. It's suitable for both general text and specific domains like news translation.