llm-jp-3-13b-instruct
| Property | Value |
|---|---|
| Parameter Count | 13.7B |
| License | Apache 2.0 |
| Context Length | 4096 tokens |
| Training Tokens | 2.1T |
| Languages | Japanese, English, Chinese, Korean |
What is llm-jp-3-13b-instruct?
llm-jp-3-13b-instruct is a large language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics. It is an instruction-tuned variant of the 13B-parameter base model, fine-tuned for stronger performance on conversational and instruction-following tasks.
Implementation Details
The model has 40 transformer layers with a hidden size of 5120 and 40 attention heads. It is distributed in BF16 precision and requires PyTorch 2.3.0 or later and transformers 4.40.1 or later. It was trained on a diverse corpus of 2.1T tokens spanning multiple languages and domains.
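As a rough sketch of how the model might be loaded under these requirements (the repository id `llm-jp/llm-jp-3-13b-instruct` is assumed from the usual Hugging Face naming convention; verify it against the official model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the naming convention; confirm on the model card.
MODEL_ID = "llm-jp/llm-jp-3-13b-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the model is distributed in BF16 precision
    device_map="auto",           # requires `accelerate`; shards the 13.7B params across GPUs
)
```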
- Comprehensive training on Japanese (Wikipedia, Common Crawl, WARP) and English (Dolma dataset) content
- Instruction-tuning using carefully curated datasets including ichikara-instruction and FLAN
- Tokenizer based on a Unigram byte-fallback model, so text outside the learned vocabulary degrades to raw bytes rather than an unknown token (see the sketch below)
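A minimal sketch of how the byte-fallback behavior could be observed, continuing from the loading example above (the exact token strings shown in the comment are illustrative, not verbatim output):

```python
# Mixed-script input including an emoji, which is unlikely to be a learned subword.
text = "日本語のトークン化 with byte-fallback 🙂"

ids = tokenizer(text)["input_ids"]
tokens = tokenizer.convert_ids_to_tokens(ids)

# Characters the Unigram vocabulary cannot cover fall back to byte-level
# tokens (typically rendered like <0xF0>, <0x9F>, ...), so nothing becomes <unk>.
print(tokens)
```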
Core Capabilities
- Strong performance in Japanese and English text generation
- Excels at instruction-following tasks, with an average score of 6.47 on Japanese MT-Bench (a generation sketch follows this list)
- Handles various tasks including coding, extraction, humanities, and reasoning
- 4096 token context window for processing longer sequences
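A hedged sketch of instruction-following generation, assuming the tokenizer ships a chat template (common for llm-jp instruct releases; `apply_chat_template` is the standard transformers entry point, but confirm the template against the model card):

```python
messages = [
    {"role": "user", "content": "自然言語処理とは何ですか?簡潔に説明してください。"},
]

# Render the conversation with the model's own chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,  # stay well inside the 4096-token context window
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)

# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```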
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balanced bilingual capabilities in Japanese and English, backed by extensive training on high-quality datasets and careful instruction tuning. Beyond its overall benchmark average, it posts strong per-category scores, notably in humanities (9.15/10) and writing tasks.
Q: What are the recommended use cases?
The model excels in conversational AI, text generation, and instruction-following tasks. It's particularly strong in humanities, writing, and STEM-related applications, making it suitable for educational, creative, and technical content generation in both Japanese and English.