# TowerInstruct-7B-v0.1
| Property | Value |
|---|---|
| Parameter Count | 6.74B |
| License | CC-BY-NC-4.0 |
| Paper | arXiv:2402.17733 |
| Supported Languages | 10 (EN, DE, FR, ZH, PT, NL, RU, KO, IT, ES) |
| Base Model | TowerBase |
## What is TowerInstruct-7B-v0.1?
TowerInstruct-7B-v0.1 is a specialized language model developed by Unbabel in collaboration with Instituto Superior Técnico and CentraleSupélec (Université Paris-Saclay). It is fine-tuned specifically for translation-related tasks, covering ten languages with a single open model.
## Implementation Details
The model is built upon TowerBase and fine-tuned on the TowerBlocks supervised fine-tuning dataset. It uses the ChatML prompt template and stores its weights as F32 tensors. Key training hyperparameters:
- Total training batch size: 256
- Learning rate: 7e-06
- Maximum sequence length: 2048
- Training epochs: 4
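The ChatML template mentioned above wraps each conversational turn in special tokens. Below is a minimal, hand-rolled sketch of that format for illustration only; in practice the rendering should come from `tokenizer.apply_chat_template` in the transformers library, so treat the exact string layout here as an assumption:

```python
# Illustrative sketch of the ChatML prompt format used by TowerInstruct.
# NOTE: real code should call tokenizer.apply_chat_template; this
# hand-rolled formatter only shows the <|im_start|>/<|im_end|> structure.

def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts as a ChatML string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # A trailing assistant header asks the model to start generating.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "user",
     "content": "Translate the following text from Portuguese into English.\n"
                "Portuguese: Um grupo de investigadores lançou um novo modelo.\n"
                "English:"}
])
print(prompt)
```

The maximum sequence length of 2048 applies to the whole rendered prompt plus the generated continuation, so long paragraph-level inputs should be budgeted accordingly.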
## Core Capabilities
- General machine translation (sentence and paragraph-level)
- Terminology-aware translation
- Context-aware translation
- Automatic post-editing
- Named-entity recognition
- Grammatical error correction
- Paraphrase generation
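Each capability is exercised through a plain instruction in the user turn. The sketch below shows how two of them might be invoked with the transformers `pipeline` API; the translation template follows the model card's usual phrasing, while the grammatical-error-correction template is an assumption, since the exact TowerBlocks instruction wording is not reproduced here:

```python
# Sketch of prompting TowerInstruct for two of the capabilities above.
# The GEC instruction text is an assumed phrasing, not the exact
# template from the TowerBlocks training data.

TASK_PROMPTS = {
    "translation": (
        "Translate the following text from Portuguese into English.\n"
        "Portuguese: {text}\nEnglish:"
    ),
    "gec": (
        "Correct the grammatical errors in the following English sentence.\n"
        "Sentence: {text}\nCorrected:"
    ),
}

def make_messages(task: str, text: str) -> list:
    """Build a single-turn chat message for the given task."""
    return [{"role": "user", "content": TASK_PROMPTS[task].format(text=text)}]

def run(task: str, text: str) -> str:
    """Load the model and generate a completion.

    Defined but not invoked here: loading pulls down roughly 27 GB of
    F32 weights, so call run() yourself once the model is available.
    """
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="Unbabel/TowerInstruct-7B-v0.1",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    prompt = pipe.tokenizer.apply_chat_template(
        make_messages(task, text), tokenize=False, add_generation_prompt=True
    )
    out = pipe(prompt, max_new_tokens=256, do_sample=False)
    return out[0]["generated_text"]
```

The other capabilities in the list follow the same pattern with different instruction templates in the user message; only the wording of the instruction changes, not the calling code.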
## Frequently Asked Questions
Q: What makes this model unique?
TowerInstruct-7B-v0.1 stands out for its specialized focus on translation-related tasks across 10 major languages, combining various translation capabilities with additional language understanding tasks like named-entity recognition and grammatical error correction.
Q: What are the recommended use cases?
The model is best suited for translation tasks, automatic post-editing, and related language transformations. However, it is not intended for use as a conversational chatbot or code assistant: its fine-tuning focused on translation-related tasks rather than open-ended dialogue or coding.