PLaMo-13B
| Property | Value |
|---|---|
| Parameter Count | 13.1B |
| License | Apache License 2.0 |
| Context Length | 4096 tokens |
| Training Tokens | 1.5T (1.32T English, 0.18T Japanese) |
| Paper | Research Paper |
What is PLaMo-13B?
PLaMo-13B is a bilingual language model developed by Preferred Networks, Inc., built on the LLaMA architecture. It is designed to handle both English and Japanese effectively, making it well suited to applications that require both languages.
Implementation Details
The model uses a causal decoder-only architecture and a custom SentencePiece tokenizer trained on a subset of the pre-training datasets. It is implemented with the Hugging Face Transformers library and supports both CPU and GPU inference (a loading sketch follows the list below).
- Trained on diverse datasets including C4, Project Gutenberg, RedPajama, and Japanese Wikipedia
- Uses BF16 tensor type for efficient computation
- Supports a context window of 4096 tokens
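The following is a minimal loading sketch, not an official snippet from this page: it assumes the checkpoint is published on Hugging Face under a repository id such as `pfnet/plamo-13b`, and that the custom architecture requires `trust_remote_code=True`; the `torch_dtype` and `device_map` settings are likewise illustrative choices.

```python
# Loading sketch -- repository id, trust_remote_code, and dtype/device settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pfnet/plamo-13b"  # hypothetical Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type noted above
    device_map="auto",           # place weights on GPU if available, otherwise CPU
    trust_remote_code=True,      # custom architectures often ship their own modeling code
)
model.eval()
```

Loading in BF16 keeps the weights at roughly 2 bytes per parameter (about 26 GB for 13.1B parameters), and `device_map="auto"` is one common way to spread that across available GPUs or fall back to CPU.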
Core Capabilities
- Bilingual text generation in English and Japanese
- High-quality language understanding and generation
- Flexible integration through Hugging Face Transformers
- Support for standard text generation parameters (temperature, top-k, top-p), as sketched below
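As a sketch of how those sampling parameters are typically passed, reusing the `tokenizer` and `model` objects from the loading example above; the prompt text and parameter values here are illustrative, not recommended settings:

```python
# Sampling sketch -- prompt and parameter values are illustrative only.
prompt = "これからの人工知能技術は"  # Japanese: "The future of AI technology is ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,    # sampling must be enabled for temperature/top-k/top-p to take effect
    temperature=0.8,   # <1.0 sharpens the token distribution, >1.0 flattens it
    top_k=50,          # restrict each step to the 50 most likely tokens
    top_p=0.95,        # nucleus sampling: smallest token set with cumulative prob >= 0.95
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```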
Frequently Asked Questions
Q: What makes this model unique?
PLaMo-13B stands out for its bilingual training: it was pre-trained on both English (1.32T tokens) and Japanese (0.18T tokens) data, making it particularly effective for applications that require both languages.
Q: What are the recommended use cases?
The model is well-suited for text generation tasks in both English and Japanese, including content creation, translation assistance, and general language understanding applications. However, safety testing is recommended before deployment in production environments.