llm-jp-3-13b
| Property | Value |
|---|---|
| Parameter Count | 13.7B |
| License | Apache 2.0 |
| Context Length | 4096 tokens |
| Training Tokens | 2.1T |
| Architecture | 40 layers, hidden size 5120, 40 attention heads |
What is llm-jp-3-13b?
llm-jp-3-13b is a large language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics. The model targets bilingual use, with particular strength in Japanese and English language processing. It has 13.7B parameters and was trained on 2.1T tokens spanning multiple languages and domains.
Implementation Details
The model uses a Transformer-based architecture with 40 layers, a hidden size of 5120, and 40 attention heads. Its tokenizer is a unigram model with byte fallback, optimized for Japanese text while retaining strong coverage of English and other languages.
- Requires PyTorch 2.3.0+ and Transformers 4.40.1+
- Supports BF16 precision for efficient inference (see the loading sketch after this list)
- 4096 token context window
- Trained on data spanning various domains, including Wikipedia, Common Crawl, and specialized Japanese datasets
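As a concrete starting point, here is a minimal loading-and-generation sketch using Hugging Face Transformers. It assumes the checkpoint is published under the Hugging Face ID `llm-jp/llm-jp-3-13b` and that a GPU with enough memory for BF16 weights (roughly 28 GB for 13.7B parameters) is available; the prompt and sampling parameters are illustrative, not official recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the model is published on Hugging Face as "llm-jp/llm-jp-3-13b".
MODEL_ID = "llm-jp/llm-jp-3-13b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # BF16 inference, per the list above
    device_map="auto",           # place weights on the available GPU(s)
)

# Short Japanese prompt: "What is natural language processing?"
prompt = "自然言語処理とは何か"
input_ids = tokenizer.encode(
    prompt, add_special_tokens=False, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=100,  # stays well within the 4096-token window
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
    )[0]
print(tokenizer.decode(output, skip_special_tokens=True))
```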
Core Capabilities
- Strong performance in both Japanese and English language tasks
- Achieves 0.5802 average score on llm-jp-eval benchmark
- Performs well on benchmark categories such as extraction, humanities, and STEM
- Supports code generation across multiple programming languages
- Processes Japanese and English text efficiently through its specialized tokenization (illustrated in the sketch after this list)
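To make the tokenization point concrete, the sketch below (again assuming the `llm-jp/llm-jp-3-13b` checkpoint ID) tokenizes comparable Japanese and English sentences and prints the token counts; with a unigram byte-fallback tokenizer trained on both languages, neither script should fall back to long runs of per-byte tokens.

```python
from transformers import AutoTokenizer

# Assumption: same Hugging Face checkpoint ID as in the earlier sketch.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-13b")

samples = [
    "自然言語処理の研究を行っています。",  # Japanese
    "We are conducting research on natural language processing.",  # English
]
for text in samples:
    ids = tokenizer.encode(text, add_special_tokens=False)
    # Print the count and the first few token strings for inspection.
    print(f"{len(ids):3d} tokens | {tokenizer.convert_ids_to_tokens(ids)[:6]}")
```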
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its specialized Japanese language capabilities combined with strong English performance, achieved through its architecture design and extensive bilingual training. Its tokenizer and training data mix make it particularly effective for Japanese text processing.
Q: What are the recommended use cases?
The model is well-suited for multilingual applications, particularly those involving Japanese and English content. It shows strong performance in text generation, comprehension, and specialized tasks like code generation, making it valuable for applications ranging from content creation to technical documentation.
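As one use-case illustration, the hypothetical sketch below prompts the model to complete a Python function; the checkpoint ID and prompt are assumptions for illustration, and greedy decoding keeps the completion deterministic.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "llm-jp/llm-jp-3-13b"  # assumption: Hugging Face checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Code-style prompt: the base model continues the function body.
prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
input_ids = tokenizer.encode(
    prompt, add_special_tokens=False, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=80, do_sample=False)[0]
print(tokenizer.decode(output, skip_special_tokens=True))
```

Since llm-jp-3-13b is a base (non-instruction-tuned) model, completion-style prompts like this tend to work better than chat-style instructions.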