llm-jp-3-1.8b

Maintained By
llm-jp

  • Parameter Count: 1.8B
  • Model Type: Transformer-based Language Model
  • Architecture: 24 layers, 2048 hidden size, 16 attention heads
  • Context Length: 4096 tokens
  • License: Apache 2.0

What is llm-jp-3-1.8b?

llm-jp-3-1.8b is a bilingual Japanese-English language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics. Trained on 2.1T tokens, it is designed for strong Japanese language processing alongside English.

Implementation Details

The model uses a transformer architecture with 24 layers, a hidden size of 2048, and 16 attention heads. It employs a tokenizer based on the Unigram byte-fallback model, optimized for Japanese and English text. The model requires PyTorch 2.3.0 or later and supports BF16 precision for efficient inference; a minimal loading sketch follows the list below.

  • Trained on diverse datasets including Wikipedia, Common Crawl, and specialized Japanese corpora
  • 4096 token context window for handling longer sequences
  • Efficient architecture with 407M embedding parameters and 1.46B non-embedding parameters
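
As a concrete starting point, here is a minimal loading sketch using Hugging Face transformers. It assumes the hub ID llm-jp/llm-jp-3-1.8b (inferred from the model name) and uses the BF16 precision noted above; consult the official model card for exact usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID assumed from the model name above.
model_id = "llm-jp/llm-jp-3-1.8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 inference, as supported by the model
    device_map="auto",           # place weights on GPU if available (requires accelerate)
)
```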

Core Capabilities

  • Bilingual text generation in Japanese and English (see the sketch after this list)
  • Strong performance in various tasks including question answering, reading comprehension, and natural language inference
  • Code generation capabilities across multiple programming languages
  • Achieves 0.3767 average score on llm-jp-eval benchmark
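
To illustrate bilingual generation, the sketch below continues from the loading example above; the Japanese prompt and the sampling settings are illustrative choices, not official recommendations.

```python
# Continues from the loading sketch in Implementation Details.
prompt = "自然言語処理とは何ですか？"  # "What is natural language processing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,  # well within the 4096-token context window
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```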

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its specialized optimization for Japanese language processing while maintaining strong English capabilities. It's part of a larger family of models ranging from 1.8B to 172B parameters, offering different size-performance trade-offs.

Q: What are the recommended use cases?

The model is well-suited for tasks including text generation, translation assistance, code generation, and general language understanding in both Japanese and English contexts. It's particularly effective for applications requiring balanced bilingual capabilities.
