llm-jp-3-1.8b
| Property | Value |
|---|---|
| Parameter Count | 1.8B |
| Model Type | Transformer-based Language Model |
| Architecture | 24 layers, 2048 hidden size, 16 attention heads |
| Context Length | 4096 tokens |
| License | Apache 2.0 |
What is llm-jp-3-1.8b?
llm-jp-3-1.8b is a bilingual Japanese-English language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics. Trained on 2.1T tokens, it is aimed primarily at Japanese language processing while retaining solid English coverage.
Implementation Details
The model uses a transformer architecture with 24 layers, a hidden size of 2048, and 16 attention heads. Its tokenizer is based on a Unigram byte-fallback model trained for Japanese and English text. Inference requires PyTorch 2.3.0 or later and supports BF16 precision; a minimal loading sketch follows the list below.
- Comprehensive training on diverse datasets including Wikipedia, Common Crawl, and specialized Japanese corpora
- 4096 token context window for handling longer sequences
- Efficient architecture with 407M embedding parameters and 1.46B non-embedding parameters
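As a rough illustration of the requirements above, here is a minimal inference sketch using Hugging Face transformers. The repository id llm-jp/llm-jp-3-1.8b, the example prompt, and the sampling settings are assumptions for illustration; adjust them to the official model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id; verify against the official release.
MODEL_ID = "llm-jp/llm-jp-3-1.8b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # BF16 inference, as noted above
    device_map="auto",
)

# Japanese prompt: "What is natural language processing?"
text = "自然言語処理とは何か"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```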
Core Capabilities
- Bilingual text generation in Japanese and English (a brief prompting example follows this list)
- Solid performance on question answering, reading comprehension, and natural language inference tasks
- Code generation capabilities across multiple programming languages
- Achieves an average score of 0.3767 on the llm-jp-eval benchmark
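As a quick illustration of the bilingual usage, the model and tokenizer from the earlier sketch handle an English prompt through the same generation call; the continuation prompt below is purely illustrative, and this is a base (non-instruction-tuned) checkpoint, so outputs are plain continuations.

```python
# Reusing `model` and `tokenizer` from the sketch in Implementation Details.
prompt = "The history of natural language processing began"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=80, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```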
Frequently Asked Questions
Q: What makes this model unique?
The model is optimized for Japanese language processing while maintaining strong English capabilities. It is part of the llm-jp-3 family of models ranging from 1.8B to 172B parameters, which offers different size-performance trade-offs.
Q: What are the recommended use cases?
The model is well-suited for tasks including text generation, translation assistance, code generation, and general language understanding in both Japanese and English contexts. It's particularly effective for applications requiring balanced bilingual capabilities.