llm-jp-3-13b

Maintained By
llm-jp

  • Parameter Count: 13.7B parameters
  • License: Apache 2.0
  • Context Length: 4096 tokens
  • Training Tokens: 2.1T tokens
  • Architecture: 40 layers, 5120 hidden size, 40 attention heads

What is llm-jp-3-13b?

llm-jp-3-13b is a large language model developed by the Research and Development Center for Large Language Models at Japan's National Institute of Informatics. It is designed for bilingual use, with particular strength in Japanese alongside strong English performance. The model has 13.7B parameters and was trained on 2.1T tokens spanning multiple languages and domains.
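As a rough sanity check, the 13.7B figure is consistent with the architecture in the table above. The sketch below uses the generic dense-Transformer estimate of about 12·h² weights per layer plus untied input and output embeddings; the vocabulary size and exact MLP shape are not stated in this card, so the ~99k vocabulary is an assumption and the result is an approximation only.

```python
hidden = 5120
layers = 40
vocab = 99_000  # assumed: approximate llm-jp-3 vocabulary size, not stated in this card

per_layer = 12 * hidden ** 2       # attention (~4h^2) + 4x-wide MLP (~8h^2), biases ignored
embeddings = 2 * vocab * hidden    # assumed untied input and output embedding matrices
total = layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # ~13.6B, consistent with the reported 13.7B
```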

Implementation Details

The model uses a dense Transformer decoder architecture with 40 layers, a hidden size of 5120, and 40 attention heads. Tokenization is handled by a unigram tokenizer with byte fallback, optimized for Japanese text while maintaining strong coverage of English and other languages.
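To see the tokenizer's behavior concretely, the minimal sketch below loads it via Hugging Face Transformers and tokenizes a Japanese and an English phrase. The Hub id llm-jp/llm-jp-3-13b is inferred from the model name and should be verified before use.

```python
from transformers import AutoTokenizer

# Assumed Hub id, inferred from the model name; verify before relying on it.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-13b")

# A unigram vocabulary keeps common Japanese words as single tokens, and
# byte fallback decomposes out-of-vocabulary characters into raw bytes
# instead of mapping them to an unknown token.
for text in ["自然言語処理", "natural language processing"]:
    ids = tokenizer.encode(text)
    print(f"{text!r} -> {len(ids)} tokens: {tokenizer.convert_ids_to_tokens(ids)}")
```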

  • Requires PyTorch 2.3.0+ and Transformers 4.40.1+
  • Supports BF16 precision for efficient inference (see the loading sketch after this list)
  • 4096 token context window
  • Comprehensive training across various domains including Wikipedia, Common Crawl, and specialized Japanese datasets
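The snippet below is a minimal loading and generation sketch under those requirements: BF16 weights, device_map="auto" to place the 13.7B parameters across available GPUs, and sampling settings chosen purely for illustration. The Hub id is again an assumption inferred from the model name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llm-jp/llm-jp-3-13b"  # assumed Hub id, inferred from the model name

tokenizer = AutoTokenizer.from_pretrained(model_id)

# BF16 halves memory versus FP32 and is the precision recommended above;
# device_map="auto" shards the weights across whatever GPUs are visible.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "自然言語処理とは何か"  # "What is natural language processing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative sampling settings; output must stay within the 4096-token context window.
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```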

Core Capabilities

  • Strong performance in both Japanese and English language tasks
  • Achieves 0.5802 average score on llm-jp-eval benchmark
  • Excels in tasks like extraction, humanities, and STEM topics
  • Supports code generation across multiple programming languages
  • Efficient processing of both Japanese and English text through specialized tokenization

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its specialized Japanese language capabilities while maintaining strong English performance, achieved through careful architecture design and extensive training on both languages. Its unique tokenizer and training approach make it particularly effective for Japanese text processing.

Q: What are the recommended use cases?

The model is well-suited for multilingual applications, particularly those involving Japanese and English content. It shows strong performance in text generation, comprehension, and specialized tasks like code generation, making it valuable for applications ranging from content creation to technical documentation.
