llm-jp-3-1.8b

Maintained By
llm-jp


Parameter Count: 1.8B
Model Type: Transformer-based Language Model
Architecture: 24 layers, 2048 hidden size, 16 attention heads
Context Length: 4096 tokens
License: Apache 2.0

What is llm-jp-3-1.8b?

llm-jp-3-1.8b is a bilingual Japanese-English language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics. Trained on 2.1T tokens, it focuses on Japanese language processing while retaining solid English capabilities.

Implementation Details

The model uses a transformer architecture with 24 layers, a hidden size of 2048, and 16 attention heads. Its tokenizer is a unigram model with byte fallback, trained for Japanese and English text. The model requires PyTorch 2.3.0 or later and supports BF16 precision for efficient inference (a minimal loading sketch follows the list below).

  • Comprehensive training on diverse datasets including Wikipedia, Common Crawl, and specialized Japanese corpora
  • 4096 token context window for handling longer sequences
  • Efficient architecture with 407M embedding parameters and 1.46B non-embedding parameters
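Under those requirements (PyTorch 2.3.0+, BF16 support) and assuming the Hugging Face Hub id llm-jp/llm-jp-3-1.8b and the standard transformers API, a minimal loading sketch might look like the following; `device_map="auto"` additionally requires the accelerate package.

```python
# Minimal loading sketch; the model id and settings are assumptions, adjust to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llm-jp/llm-jp-3-1.8b"

# The tokenizer (unigram with byte fallback) needs no custom configuration.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load weights in BF16 for efficient inference on supported GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; use model.to("cuda") otherwise
)
model.eval()
```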

Core Capabilities

  • Bilingual text generation in Japanese and English (see the generation sketch below)
  • Strong performance on tasks such as question answering, reading comprehension, and natural language inference
  • Code generation across multiple programming languages
  • An average score of 0.3767 on the llm-jp-eval benchmark
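Continuing from the loading sketch in Implementation Details, a short generation example with a Japanese prompt is sketched below; the prompt and sampling parameters are illustrative and not taken from the model card, and plain continuation-style prompting is assumed for this base model.

```python
# Continuation-style generation: the prompt is plain text to be completed by the model.
prompt = "自然言語処理とは何か"  # "What is natural language processing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```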

Frequently Asked Questions

Q: What makes this model unique?

The model is optimized for Japanese language processing while maintaining strong English capabilities. It is part of the llm-jp-3 family, which spans 1.8B to 172B parameters and offers different size-performance trade-offs.

Q: What are the recommended use cases?

The model is well-suited for tasks including text generation, translation assistance, code generation, and general language understanding in both Japanese and English contexts. It's particularly effective for applications requiring balanced bilingual capabilities.
