# Llama-3-Taiwan-70B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 70B |
| Context Length | 8K tokens (128K version available) |
| License | Llama-3 |
| Research Paper | TMLU Paper |
| Training Framework | NVIDIA NeMo |
## What is Llama-3-Taiwan-70B-Instruct?
Llama-3-Taiwan-70B-Instruct is a state-of-the-art language model specifically fine-tuned for Traditional Mandarin and English language processing. Built on Meta's Llama-3 70B architecture, it has been optimized using high-quality Traditional Mandarin and English corpora, covering diverse domains including legal, manufacturing, medical, and electronics.
## Implementation Details
The model was trained using NVIDIA's NeMo Framework on DGX H100 systems, featuring a batch size of 2M tokens per step. It demonstrates impressive performance across various benchmarks, notably achieving 74.76% on TMLU and 80.95% on Taiwan Truthful QA.
- Advanced 70B parameter architecture
- 8K context window with 128K version available
- Trained on NVIDIA DGX H100 systems
- Bilingual: supports both Traditional Mandarin and English
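The capabilities above are exercised through a standard chat interface. The sketch below builds a Llama-3-style prompt by hand so the turn format is visible; the special tokens follow the common Llama-3 chat convention (an assumption, since the model card does not spell it out), and in practice you would let `tokenizer.apply_chat_template` render this string for you.

```python
# Minimal sketch of a Llama-3-style chat prompt (format assumed from the
# usual Llama-3 convention; prefer tokenizer.apply_chat_template in practice).

LLAMA3_BOS = "<|begin_of_text|>"

def build_llama3_prompt(messages):
    """Render a list of {role, content} dicts into a Llama-3-style prompt string."""
    parts = [LLAMA3_BOS]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "你是一個來自台灣的AI助理。"},  # "You are an AI assistant from Taiwan."
    {"role": "user", "content": "請簡介台灣的半導體產業。"},      # "Briefly introduce Taiwan's semiconductor industry."
]
prompt = build_llama3_prompt(messages)
```

In a real deployment the prompt would be tokenized and passed to the loaded checkpoint (for example via `transformers.pipeline("text-generation", ...)`); the string above only illustrates the multi-turn structure.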
## Core Capabilities
- Multi-turn dialogue in Traditional Mandarin and English
- RAG (Retrieval-Augmented Generation) support
- Structured output and function calling
- Domain-specific knowledge in legal, medical, and technical fields
- High performance on Taiwanese-specific benchmarks
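Structured output and function calling are typically driven by prompting the model to emit JSON and then validating the reply against a tool schema. A minimal sketch of that validation step, with a hypothetical tool name and an illustrative model reply (neither comes from the model card):

```python
import json

# Illustrative tool schema (hypothetical name) in the common
# JSON-schema style used for function calling.
weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a Taiwanese city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(model_output: str) -> dict:
    """Parse a JSON function call emitted by the model and check required arguments."""
    call = json.loads(model_output)
    if call["name"] != weather_tool["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    for field in weather_tool["parameters"]["required"]:
        if field not in call["arguments"]:
            raise ValueError(f"missing required argument: {field}")
    return call

# A model reply of this shape parses into a structured call:
reply = '{"name": "get_weather", "arguments": {"city": "台北"}}'
call = parse_tool_call(reply)
```

Validating the model's JSON before dispatching the call is the design choice that makes structured output safe to wire into downstream systems.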
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its specialized optimization for Traditional Mandarin while maintaining strong English capabilities. It achieves SOTA performance on Taiwanese benchmarks and includes comprehensive domain knowledge across multiple professional fields.
**Q: What are the recommended use cases?**
The model excels in multi-turn conversations, RAG applications, structured output generation, and domain-specific tasks in legal, medical, and technical fields. It's particularly suited for applications requiring deep understanding of Traditional Mandarin context.
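For the RAG use case, the usual pattern is to place retrieved passages into the prompt ahead of the user's question so the model answers from the supplied context. A minimal sketch of that prompt assembly, with illustrative passage texts (the helper and instruction wording are assumptions, not part of the model card):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt from numbered retrieved passages plus the question."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    # Instruction (Traditional Mandarin): "Answer the question using the
    # material below, and cite passage numbers."
    return f"請根據以下資料回答問題,並引用資料編號。\n\n{context}\n\n問題:{question}"

passages = [
    "台灣的半導體產業以晶圓代工為主。",   # "Taiwan's semiconductor industry centers on foundry work."
    "新竹科學園區是主要的產業聚落。",     # "Hsinchu Science Park is the main industry cluster."
]
rag_prompt = build_rag_prompt("台灣半導體產業的核心是什麼?", passages)
```

The assembled string would then go into the user turn of the chat template; numbering the passages lets the model cite its sources in the answer.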