Llama-3.1-Swallow-8B-Instruct-v0.1
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Model Type | Llama 3.1 architecture |
| License | Llama 3.1 Community License & Gemma Terms of Use |
| Languages | Japanese, English |
| Paper | Llama 3 paper |
What is Llama-3.1-Swallow-8B-Instruct-v0.1?
Llama-3.1-Swallow-8B-Instruct-v0.1 is a language model that enhances the Japanese language capabilities of Meta's Llama 3.1 8B while maintaining strong English performance. It was developed through continual pre-training on approximately 200 billion tokens drawn from a large Japanese web corpus, Wikipedia articles, and other specialized content, followed by instruction tuning.
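As a quick orientation, the sketch below loads the instruct checkpoint with Hugging Face transformers and generates a short Japanese reply. The repository id (tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1), the prompt, and the sampling settings are illustrative assumptions rather than values taken from this card; adjust them to the checkpoint and hardware you actually use.

```python
# Minimal usage sketch, assuming the checkpoint is published on the Hugging Face
# Hub under the id below and that a GPU with bfloat16 support is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A Japanese instruction: "Briefly explain what makes spring in Japan appealing."
messages = [{"role": "user", "content": "日本の春の魅力を簡潔に教えてください。"}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```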
Implementation Details
Continual pre-training was carried out with the Megatron-LM framework, and the instruct variant was then fine-tuned on carefully curated instruction datasets. These datasets combine synthetic and human-curated data so that the model gives high-quality responses in both Japanese and English contexts.
- Built on the Llama 3.1 architecture with 8B parameters
- Trained on Swallow Corpus Version 2 and multilingual content
- Supports both Japanese and English instruction following
- Efficient tokenization of Japanese and English text (prompt formatting and token counts can be inspected with the sketch after this list)
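To make the prompt-format and tokenization points above concrete, the sketch below renders the chat template for the same request in Japanese and English and counts the resulting tokens. It only downloads the tokenizer, not the model weights; the repository id is an assumption, and the tokenizer is presumed to be the one shipped with the checkpoint.

```python
# Sketch: inspect the chat template and token counts without loading model weights.
# The repository id below is an assumption; only the tokenizer files are fetched.
from transformers import AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The same instruction phrased in Japanese and in English.
examples = {
    "Japanese": [{"role": "user", "content": "継続事前学習とは何か説明してください。"}],
    "English": [{"role": "user", "content": "Please explain what continual pre-training is."}],
}

for name, messages in examples.items():
    # Render the prompt exactly as the model would receive it.
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )
    n_tokens = len(tokenizer(prompt, add_special_tokens=False)["input_ids"])
    print(f"--- {name}: {n_tokens} prompt tokens ---")
    print(prompt)
```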
Core Capabilities
- Strong performance on Japanese NLP tasks, with top scores on multiple Japanese benchmarks
- Maintains competitive English language capabilities
- Excels at tasks such as translation, summarization, and question answering (a short task sketch follows this list)
- Specialized instruction-following abilities in both languages
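The list above names translation, summarization, and question answering; the sketch below runs one prompt of each type through the transformers text-generation pipeline, which applies the chat template automatically. The model id, prompts, and generation settings are illustrative assumptions.

```python
# Task sketch: translation, summarization, and question answering through the
# text-generation pipeline (which applies the chat template to message lists).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

tasks = {
    # "Translate the following sentence into English: 'I am a cat. As yet I have no name.'"
    "translation": "次の文を英語に翻訳してください:「吾輩は猫である。名前はまだ無い。」",
    "summarization": (
        "Summarize in one sentence: Continual pre-training adapts an existing "
        "language model to a new language or domain by training it further on "
        "additional data instead of training a new model from scratch."
    ),
    # "How high is Mt. Fuji in meters?"
    "question-answering": "富士山の標高は何メートルですか?",
}

for name, prompt in tasks.items():
    out = generator(
        [{"role": "user", "content": prompt}],
        max_new_tokens=128,
        do_sample=False,
    )
    # With chat-style input, generated_text is the full message list; the last
    # entry is the assistant's reply.
    print(f"[{name}] {out[0]['generated_text'][-1]['content'].strip()}")
```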
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its enhanced Japanese language capabilities: it achieves state-of-the-art results on a range of Japanese NLP benchmarks while preserving the English capabilities of the original Llama 3.1.
Q: What are the recommended use cases?
The model is well suited for bilingual applications, including translation, summarization, question answering, and general instruction following in both Japanese and English. It is particularly effective for tasks that require a deep understanding of the Japanese language and culture.
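For serving the bilingual use cases above in batches, a minimal offline-inference sketch with vLLM is shown below, relying on vLLM's standard support for Llama-architecture checkpoints. The repository id, prompts, and sampling parameters are assumptions, not values from this card.

```python
# Serving sketch with vLLM: batched bilingual requests against the (assumed)
# public checkpoint; prompts are rendered with the model's chat template.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, dtype="bfloat16")

# One Japanese and one English request.
conversations = [
    [{"role": "user", "content": "大規模言語モデルの継続事前学習の利点を三つ挙げてください。"}],
    [{"role": "user", "content": "Explain in two sentences why continual pre-training helps adapt a model to Japanese."}],
]
prompts = [
    tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=False)
    for m in conversations
]

params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
for request_output in llm.generate(prompts, params):
    print(request_output.outputs[0].text.strip())
    print("-" * 40)
```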