Llama-3.1-Swallow-8B-Instruct-v0.1
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Model Type | Llama 3.1 architecture |
| License | Llama 3.1 Community License & Gemma Terms of Use |
| Languages | Japanese, English |
| Paper | Llama 3 paper |
What is Llama-3.1-Swallow-8B-Instruct-v0.1?
Llama-3.1-Swallow-8B-Instruct-v0.1 is a language model that enhances the Japanese language capabilities of Meta's Llama 3.1 8B while maintaining strong English performance. It was developed through continual pre-training on approximately 200 billion tokens drawn from a large Japanese web corpus, Wikipedia articles, and other specialized content, followed by instruction tuning.
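As a quick orientation, the sketch below loads the instruct checkpoint with Hugging Face transformers and generates a short Japanese reply. The repository id (tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1), the prompt, and the sampling settings are illustrative assumptions rather than values taken from this card; adjust them to the checkpoint and hardware you actually use.

```python
# Minimal usage sketch, assuming the checkpoint is published on the Hugging Face
# Hub under the id below and that a GPU with bfloat16 support is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A Japanese instruction: "Briefly explain what makes spring in Japan appealing."
messages = [{"role": "user", "content": "日本の春の魅力を簡潔に教えてください。"}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```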
Implementation Details
Continual pre-training was carried out with the Megatron-LM framework, and the instruct variant was then fine-tuned on carefully curated instruction datasets. These datasets combine synthetic and human-curated data so that the model gives high-quality responses in both Japanese and English contexts.
- Built on the Llama 3.1 architecture with 8B parameters
- Trained on Swallow Corpus Version 2 and multilingual content
- Supports both Japanese and English instruction following
- Efficient tokenization of Japanese and English text (prompt formatting and token counts can be inspected with the sketch after this list)
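To make the prompt-format and tokenization points above concrete, the sketch below renders the chat template for the same request in Japanese and English and counts the resulting tokens. It only downloads the tokenizer, not the model weights; the repository id is an assumption, and the tokenizer is presumed to be the one shipped with the checkpoint.

```python
# Sketch: inspect the chat template and token counts without loading model weights.
# The repository id below is an assumption; only the tokenizer files are fetched.
from transformers import AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The same instruction phrased in Japanese and in English.
examples = {
    "Japanese": [{"role": "user", "content": "継続事前学習とは何か説明してください。"}],
    "English": [{"role": "user", "content": "Please explain what continual pre-training is."}],
}

for name, messages in examples.items():
    # Render the prompt exactly as the model would receive it.
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )
    n_tokens = len(tokenizer(prompt, add_special_tokens=False)["input_ids"])
    print(f"--- {name}: {n_tokens} prompt tokens ---")
    print(prompt)
```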
Core Capabilities
- Strong performance on Japanese NLP tasks, with top scores on multiple Japanese benchmarks
- Maintains competitive English language capabilities
- Excels at tasks such as translation, summarization, and question answering (a short task sketch follows this list)
- Specialized instruction-following abilities in both languages
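The list above names translation, summarization, and question answering; the sketch below runs one prompt of each type through the transformers text-generation pipeline, which applies the chat template automatically. The model id, prompts, and generation settings are illustrative assumptions.

```python
# Task sketch: translation, summarization, and question answering through the
# text-generation pipeline (which applies the chat template to message lists).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

tasks = {
    # "Translate the following sentence into English: 'I am a cat. As yet I have no name.'"
    "translation": "次の文を英語に翻訳してください:「吾輩は猫である。名前はまだ無い。」",
    "summarization": (
        "Summarize in one sentence: Continual pre-training adapts an existing "
        "language model to a new language or domain by training it further on "
        "additional data instead of training a new model from scratch."
    ),
    # "How high is Mt. Fuji in meters?"
    "question-answering": "富士山の標高は何メートルですか?",
}

for name, prompt in tasks.items():
    out = generator(
        [{"role": "user", "content": prompt}],
        max_new_tokens=128,
        do_sample=False,
    )
    # With chat-style input, generated_text is the full message list; the last
    # entry is the assistant's reply.
    print(f"[{name}] {out[0]['generated_text'][-1]['content'].strip()}")
```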
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its enhanced Japanese language capabilities: it achieves state-of-the-art results on a range of Japanese NLP benchmarks while preserving the English capabilities of the original Llama 3.1.
Q: What are the recommended use cases?
The model is well suited for bilingual applications, including translation, summarization, question answering, and general instruction following in both Japanese and English. It is particularly effective for tasks that require a deep understanding of the Japanese language and culture.
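For serving the bilingual use cases above in batches, a minimal offline-inference sketch with vLLM is shown below, relying on vLLM's standard support for Llama-architecture checkpoints. The repository id, prompts, and sampling parameters are assumptions, not values from this card.

```python
# Serving sketch with vLLM: batched bilingual requests against the (assumed)
# public checkpoint; prompts are rendered with the model's chat template.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, dtype="bfloat16")

# One Japanese and one English request.
conversations = [
    [{"role": "user", "content": "大規模言語モデルの継続事前学習の利点を三つ挙げてください。"}],
    [{"role": "user", "content": "Explain in two sentences why continual pre-training helps adapt a model to Japanese."}],
]
prompts = [
    tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=False)
    for m in conversations
]

params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
for request_output in llm.generate(prompts, params):
    print(request_output.outputs[0].text.strip())
    print("-" * 40)
```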