llm-jp-3-13b-instruct
| Property | Value |
|---|---|
| Parameter Count | 13.7B |
| License | Apache 2.0 |
| Context Length | 4096 tokens |
| Training Tokens | 2.1T |
| Languages | Japanese, English, Chinese, Korean |
What is llm-jp-3-13b-instruct?
llm-jp-3-13b-instruct is a large language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics. It is an instruction-tuned variant of the 13B-parameter base model, fine-tuned for stronger performance on conversational and instruction-following tasks.
Implementation Details
The model has 40 transformer layers with a hidden size of 5120 and 40 attention heads. It is distributed in BF16 precision and requires PyTorch 2.3.0 or later and transformers 4.40.1 or later. It was trained on a diverse corpus of 2.1T tokens spanning multiple languages and domains.
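As a rough sketch of how the model might be loaded under these requirements (the repository id `llm-jp/llm-jp-3-13b-instruct` is assumed from the usual Hugging Face naming convention; verify it against the official model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the naming convention; confirm on the model card.
MODEL_ID = "llm-jp/llm-jp-3-13b-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the model is distributed in BF16 precision
    device_map="auto",           # requires `accelerate`; shards the 13.7B params across GPUs
)
```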
- Comprehensive training on Japanese (Wikipedia, Common Crawl, WARP) and English (Dolma dataset) content
- Instruction-tuning using carefully curated datasets including ichikara-instruction and FLAN
- Tokenizer based on a Unigram byte-fallback model, so text outside the learned vocabulary degrades to raw bytes rather than an unknown token (see the sketch below)
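A minimal sketch of how the byte-fallback behavior could be observed, continuing from the loading example above (the exact token strings shown in the comment are illustrative, not verbatim output):

```python
# Mixed-script input including an emoji, which is unlikely to be a learned subword.
text = "日本語のトークン化 with byte-fallback 🙂"

ids = tokenizer(text)["input_ids"]
tokens = tokenizer.convert_ids_to_tokens(ids)

# Characters the Unigram vocabulary cannot cover fall back to byte-level
# tokens (typically rendered like <0xF0>, <0x9F>, ...), so nothing becomes <unk>.
print(tokens)
```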
Core Capabilities
- Strong performance in Japanese and English text generation
- Excels at instruction-following tasks, with an average score of 6.47 on Japanese MT-Bench (a generation sketch follows this list)
- Handles various tasks including coding, extraction, humanities, and reasoning
- 4096 token context window for processing longer sequences
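A hedged sketch of instruction-following generation, assuming the tokenizer ships a chat template (common for llm-jp instruct releases; `apply_chat_template` is the standard transformers entry point, but confirm the template against the model card):

```python
messages = [
    {"role": "user", "content": "自然言語処理とは何ですか?簡潔に説明してください。"},
]

# Render the conversation with the model's own chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,  # stay well inside the 4096-token context window
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)

# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```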
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balanced bilingual capabilities in Japanese and English, backed by extensive training on high-quality datasets and careful instruction tuning. Beyond its overall benchmark average, it posts strong per-category scores, notably in humanities (9.15/10) and writing tasks.
Q: What are the recommended use cases?
The model excels in conversational AI, text generation, and instruction-following tasks. It's particularly strong in humanities, writing, and STEM-related applications, making it suitable for educational, creative, and technical content generation in both Japanese and English.