shisa-v1-llama3-8b

Maintained By
shisa-ai

Property          Value
Base Model        Meta-Llama-3-8B-Instruct
Learning Rate     8e-6
Training Epochs   3
Model Type        LlamaForCausalLM
HuggingFace Link  shisa-ai/shisa-v1-llama3-8b

What is shisa-v1-llama3-8b?

shisa-v1-llama3-8b is a fine-tuned version of Meta's Llama 3 8B model, specifically optimized for Japanese-English language tasks. The model demonstrates impressive performance across multiple benchmarks, achieving an average score of 6.59 across ELYZA100, JA MT-Bench, Rakuda, and Tengu-Bench evaluations.
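Since the checkpoint is a standard LlamaForCausalLM model, it can be used with Hugging Face transformers in the usual way. The sketch below shows the prompt format only; the chat markup is the stock Llama 3 Instruct template (which `tokenizer.apply_chat_template` would normally produce for you), and the commented loading code assumes the repository id from the table above:

```python
# Minimal sketch: building a Llama 3 Instruct prompt for shisa-v1-llama3-8b.
# In practice you would load the tokenizer/model and call
# tokenizer.apply_chat_template(...) rather than formatting by hand:
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("shisa-ai/shisa-v1-llama3-8b")
#   model = AutoModelForCausalLM.from_pretrained(
#       "shisa-ai/shisa-v1-llama3-8b", torch_dtype="bfloat16")

def build_llama3_prompt(system: str, user: str) -> str:
    """Format one system + user turn in the Llama 3 Instruct chat markup."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful bilingual Japanese/English assistant.",
    "日本の首都はどこですか？",  # "What is the capital of Japan?"
)
print(prompt)
```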

Implementation Details

The model was trained using the Axolotl framework (version 0.4.0) with a sequence length of 8192 and employs advanced features such as gradient checkpointing and flash attention. Training was conducted using the ultra-orca-boros-en-ja-v1 dataset with a learning rate of 8e-6, which proved optimal among various tested configurations.

  • Uses 8-bit AdamW optimizer with linear learning rate scheduling
  • Implements gradient accumulation over 8 steps
  • Trained with mixed precision (BF16) and flash attention
  • Achieves 91.30% Japanese character accuracy
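The hyperparameters above amount to a fairly conventional Axolotl setup. As a hedged illustration, they can be collected into a single configuration object; the key names below follow common Axolotl conventions and are an assumption, while every value comes directly from this card:

```python
# Sketch of the training configuration described above, expressed as a
# Python dict with Axolotl-style key names (key names are an assumption;
# the values come directly from the model card).
training_config = {
    "base_model": "Meta-Llama-3-8B-Instruct",
    "sequence_len": 8192,
    "learning_rate": 8e-6,
    "num_epochs": 3,
    "gradient_accumulation_steps": 8,
    "optimizer": "adamw_bnb_8bit",    # 8-bit AdamW
    "lr_scheduler": "linear",
    "bf16": True,                     # mixed precision
    "flash_attention": True,
    "gradient_checkpointing": True,
}

# With gradient accumulation over 8 steps, the effective batch size is
# 8 x (per-device micro-batch) x (number of GPUs); the micro-batch size
# is not stated in the card, so it is left out here.
print(training_config["learning_rate"])  # 8e-06
```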

Core Capabilities

  • Strong performance on ELYZA100 (6.67 score)
  • Excellent JA MT-Bench results (6.95 score)
  • Robust Rakuda benchmark performance (7.05 score)
  • Competitive positioning among other Japanese-capable models
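Three of the four benchmark scores behind the 6.59 average are listed above. As a quick arithmetic sketch, and assuming that average is a plain unweighted mean over the four benchmarks, the unlisted Tengu-Bench score can be backed out:

```python
# Back out the unlisted Tengu-Bench score from the reported 6.59 average,
# assuming an unweighted mean over the four benchmarks.
scores = {"ELYZA100": 6.67, "JA MT-Bench": 6.95, "Rakuda": 7.05}
reported_average = 6.59

tengu_bench = reported_average * 4 - sum(scores.values())
print(round(tengu_bench, 2))  # 5.69
```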

Frequently Asked Questions

Q: What makes this model unique?

The model represents a sweet spot in the performance-size trade-off, achieving strong results with only 8B parameters. Its learning rate of 8e-6 outperformed the other values tested during tuning.

Q: What are the recommended use cases?

The model is particularly well-suited for Japanese-English bilingual tasks, showing strong performance in translation, comprehension, and general language understanding. It's positioned as a practical option for applications requiring reliable Japanese language capabilities without the computational overhead of larger models.
