shisa-v1-llama3-8b
| Property | Value |
|---|---|
| Base Model | Meta-Llama-3-8B-Instruct |
| Learning Rate | 8e-6 |
| Training Epochs | 3 |
| Model Type | LlamaForCausalLM |
| HuggingFace Link | shisa-ai/shisa-v1-llama3-8b |
What is shisa-v1-llama3-8b?
shisa-v1-llama3-8b is a fine-tuned version of Meta's Llama 3 8B Instruct model, optimized for Japanese-English language tasks. The model performs strongly across Japanese benchmarks, achieving an average score of 6.59 across the ELYZA100, JA MT-Bench, Rakuda, and Tengu-Bench evaluations.
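For reference, here is a minimal inference sketch using the standard Hugging Face transformers API. The model ID comes from the table above; the prompt, dtype, and generation settings are illustrative assumptions, so adjust them for your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v1-llama3-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)

# Chat formatting via the model's bundled (Llama 3 Instruct-style) chat template.
# Example prompt: "What is the capital of Japan?"
messages = [
    {"role": "user", "content": "日本の首都はどこですか？"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```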
Implementation Details
The model was trained with the Axolotl framework (version 0.4.0) at a sequence length of 8192, using gradient checkpointing and flash attention. Training ran on the ultra-orca-boros-en-ja-v1 dataset with a learning rate of 8e-6, which proved optimal among the tested configurations. Key settings (a rough Python equivalent is sketched after the list):
- Uses 8-bit AdamW optimizer with linear learning rate scheduling
- Implements gradient accumulation over 8 steps
- Trained with mixed precision (BF16) and flash attention
- Achieves 91.30% Japanese character accuracy
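The actual run was driven by an Axolotl YAML config rather than a Python script, but as a rough sketch the reported hyperparameters map onto Hugging Face TrainingArguments like this (the output directory is a placeholder, and dataset preparation plus the Trainer call are omitted):

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters expressed as HF TrainingArguments;
# the original training used an Axolotl 0.4.0 YAML config, not this script.
# Sequence length (8192) is handled at tokenization/packing time, and flash
# attention is enabled at model load time (attn_implementation="flash_attention_2").
training_args = TrainingArguments(
    output_dir="shisa-v1-llama3-8b",    # placeholder path
    learning_rate=8e-6,                 # best-performing LR among tested configs
    num_train_epochs=3,
    lr_scheduler_type="linear",         # linear learning rate scheduling
    optim="adamw_bnb_8bit",             # 8-bit AdamW optimizer
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    bf16=True,                          # mixed-precision (BF16) training
)
```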
Core Capabilities
- Strong performance on ELYZA100 (6.67 score)
- Excellent JA MT-Bench results (6.95 score)
- Robust Rakuda benchmark performance (7.05 score)
- Competitive positioning among other Japanese-capable models
Frequently Asked Questions
Q: What makes this model unique?
The model represents a sweet spot in the performance-size trade-off, achieving strong results with only 8B parameters. Its learning rate of 8e-6 was the best performer among the several configurations tested during tuning.
Q: What are the recommended use cases?
The model is particularly well-suited for Japanese-English bilingual tasks, showing strong performance in translation, comprehension, and general language understanding. It's positioned as a practical option for applications requiring reliable Japanese language capabilities without the computational overhead of larger models.
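As an illustration of the translation use case, the sketch below uses the transformers text-generation pipeline with a chat-style prompt. The prompt and generation settings are illustrative, and passing chat messages directly to pipeline() assumes a recent transformers release.

```python
from transformers import pipeline

# Illustrative Japanese-to-English translation prompt; the pipeline applies
# the model's chat template when given a list of chat messages.
generator = pipeline(
    "text-generation",
    model="shisa-ai/shisa-v1-llama3-8b",
    torch_dtype="bfloat16",
    device_map="auto",
)

# Prompt: "Please translate the following sentence into English:
# 'It looks like rain, so you should take an umbrella.'"
messages = [
    {"role": "user", "content": "次の文を英語に翻訳してください：「雨が降りそうなので、傘を持って行ったほうがいいですよ。」"},
]

result = generator(messages, max_new_tokens=128)
# The returned conversation includes the new assistant message last
print(result[0]["generated_text"][-1]["content"])
```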