llm-jp-3-13b-instruct

Maintained By
llm-jp


Parameter Count: 13.7B
License: Apache 2.0
Context Length: 4096 tokens
Training Tokens: 2.1T
Languages: Japanese, English, Chinese, Korean

What is llm-jp-3-13b-instruct?

llm-jp-3-13b-instruct is a large language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics. It is an instruction-tuned variant of the base 13B-parameter model, designed for stronger performance on conversational and instruction-following tasks.

Implementation Details

The model features 40 transformer layers with a hidden size of 5120 and 40 attention heads. It runs in BF16 precision and requires PyTorch 2.3.0 or later and transformers 4.40.1 or later. The model was trained on a diverse dataset comprising 2.1T tokens across multiple languages and domains.

  • Comprehensive training on Japanese (Wikipedia, Common Crawl, WARP) and English (Dolma dataset) content
  • Instruction-tuning using carefully curated datasets including ichikara-instruction and FLAN
  • Advanced tokenizer based on Unigram byte-fallback model
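Given those requirements, a minimal loading-and-generation sketch with Hugging Face transformers might look as follows. The repository id llm-jp/llm-jp-3-13b-instruct and the sample prompt are assumptions here; verify the id and the chat template against the published model card.

```python
# Minimal sketch, assuming the Hugging Face repo id
# "llm-jp/llm-jp-3-13b-instruct" (verify on the hub) and that the
# tokenizer ships a chat template, as instruct releases usually do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llm-jp/llm-jp-3-13b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 precision
    device_map="auto",           # place layers across available devices
)

messages = [{"role": "user", "content": "自然言語処理とは何ですか？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the tokenizer is a Unigram model with byte fallback, arbitrary input text tokenizes without producing unknown tokens.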

Core Capabilities

  • Strong performance in Japanese and English text generation
  • Excels in instruction-following tasks with an average score of 6.47 on Japanese MT Bench
  • Handles various tasks including coding, extraction, humanities, and reasoning
  • 4096-token context window for processing longer sequences (see the budgeting sketch after this list)
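As a concrete illustration of working within that window, the sketch below trims an over-long prompt so that prompt tokens plus reserved output tokens stay under 4096. fit_to_context is a hypothetical helper for illustration, not part of the model release.

```python
from transformers import AutoTokenizer

# Hypothetical helper (not part of the release): trim a prompt so that
# prompt tokens + reserved output tokens fit the 4096-token window.
def fit_to_context(tokenizer, text, max_context=4096, reserve_for_output=512):
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    budget = max_context - reserve_for_output
    if len(ids) > budget:
        ids = ids[-budget:]  # drop the oldest tokens, keep the most recent
    return tokenizer.decode(ids)

tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-13b-instruct")
long_prompt = "長い文書の一部。" * 2000  # stand-in for an over-long input
trimmed = fit_to_context(tokenizer, long_prompt)
```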

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its balanced bilingual capabilities in Japanese and English, backed by extensive training on high-quality datasets and careful instruction tuning. It achieves impressive scores on various benchmarks, particularly in humanities (9.15/10) and writing tasks.

Q: What are the recommended use cases?

The model excels in conversational AI, text generation, and instruction-following tasks. It's particularly strong in humanities, writing, and STEM-related applications, making it suitable for educational, creative, and technical content generation in both Japanese and English.
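For a quick start on one of those tasks, the pipeline sketch below runs a simple Japanese extraction prompt. The repo id and prompt are assumptions, and the chat-style pipeline input requires a recent transformers release (the 4.40.1+ listed above is sufficient).

```python
import torch
from transformers import pipeline

# Sketch only: repo id and prompt are assumptions, not an official example.
generator = pipeline(
    "text-generation",
    model="llm-jp/llm-jp-3-13b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "次の文から人名をすべて抜き出してください: 田中さんと佐藤さんが会議に出席しました。",
    },
]
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```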
