# Llama-3-Taiwan-70B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 70B |
| Context Length | 8K tokens (128K version available) |
| License | Llama-3 |
| Research Paper | TMLU Paper |
| Training Framework | NVIDIA NeMo |
## What is Llama-3-Taiwan-70B-Instruct?
Llama-3-Taiwan-70B-Instruct is a state-of-the-art language model specifically fine-tuned for Traditional Mandarin and English language processing. Built on Meta's Llama-3 70B architecture, it has been optimized using high-quality Traditional Mandarin and English corpora, covering diverse domains including legal, manufacturing, medical, and electronics.
## Implementation Details
The model was trained using NVIDIA's NeMo Framework on DGX H100 systems, featuring a batch size of 2M tokens per step. It demonstrates impressive performance across various benchmarks, notably achieving 74.76% on TMLU and 80.95% on Taiwan Truthful QA.
- Advanced 70B parameter architecture
- 8K context window with 128K version available
- Trained on NVIDIA DGX H100 systems
- Bilingual: supports both Traditional Mandarin and English
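The capabilities above are exercised through a standard chat interface. The sketch below builds a Llama-3-style prompt by hand so the turn format is visible; the special tokens follow the common Llama-3 chat convention (an assumption, since the model card does not spell it out), and in practice you would let `tokenizer.apply_chat_template` render this string for you.

```python
# Minimal sketch of a Llama-3-style chat prompt (format assumed from the
# usual Llama-3 convention; prefer tokenizer.apply_chat_template in practice).

LLAMA3_BOS = "<|begin_of_text|>"

def build_llama3_prompt(messages):
    """Render a list of {role, content} dicts into a Llama-3-style prompt string."""
    parts = [LLAMA3_BOS]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "你是一個來自台灣的AI助理。"},  # "You are an AI assistant from Taiwan."
    {"role": "user", "content": "請簡介台灣的半導體產業。"},      # "Briefly introduce Taiwan's semiconductor industry."
]
prompt = build_llama3_prompt(messages)
```

In a real deployment the prompt would be tokenized and passed to the loaded checkpoint (for example via `transformers.pipeline("text-generation", ...)`); the string above only illustrates the multi-turn structure.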
## Core Capabilities
- Multi-turn dialogue in Traditional Mandarin and English
- RAG (Retrieval-Augmented Generation) support
- Structured output and function calling
- Domain-specific knowledge in legal, medical, and technical fields
- High performance on Taiwanese-specific benchmarks
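Structured output and function calling are typically driven by prompting the model to emit JSON and then validating the reply against a tool schema. A minimal sketch of that validation step, with a hypothetical tool name and an illustrative model reply (neither comes from the model card):

```python
import json

# Illustrative tool schema (hypothetical name) in the common
# JSON-schema style used for function calling.
weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a Taiwanese city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(model_output: str) -> dict:
    """Parse a JSON function call emitted by the model and check required arguments."""
    call = json.loads(model_output)
    if call["name"] != weather_tool["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    for field in weather_tool["parameters"]["required"]:
        if field not in call["arguments"]:
            raise ValueError(f"missing required argument: {field}")
    return call

# A model reply of this shape parses into a structured call:
reply = '{"name": "get_weather", "arguments": {"city": "台北"}}'
call = parse_tool_call(reply)
```

Validating the model's JSON before dispatching the call is the design choice that makes structured output safe to wire into downstream systems.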
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its specialized optimization for Traditional Mandarin while maintaining strong English capabilities. It achieves SOTA performance on Taiwanese benchmarks and includes comprehensive domain knowledge across multiple professional fields.
**Q: What are the recommended use cases?**
The model excels in multi-turn conversations, RAG applications, structured output generation, and domain-specific tasks in legal, medical, and technical fields. It's particularly suited for applications requiring deep understanding of Traditional Mandarin context.
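For the RAG use case, the usual pattern is to place retrieved passages into the prompt ahead of the user's question so the model answers from the supplied context. A minimal sketch of that prompt assembly, with illustrative passage texts (the helper and instruction wording are assumptions, not part of the model card):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt from numbered retrieved passages plus the question."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    # Instruction (Traditional Mandarin): "Answer the question using the
    # material below, and cite passage numbers."
    return f"請根據以下資料回答問題,並引用資料編號。\n\n{context}\n\n問題:{question}"

passages = [
    "台灣的半導體產業以晶圓代工為主。",   # "Taiwan's semiconductor industry centers on foundry work."
    "新竹科學園區是主要的產業聚落。",     # "Hsinchu Science Park is the main industry cluster."
]
rag_prompt = build_rag_prompt("台灣半導體產業的核心是什麼?", passages)
```

The assembled string would then go into the user turn of the chat template; numbering the passages lets the model cite its sources in the answer.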