Dolphin 2.9 LLaMA3 8B
Property | Value |
---|---|
Parameter Count | 8.03B |
Base Model | Meta-LLaMA-3-8B |
Context Length | 4096 tokens |
License | META LLAMA 3 COMMUNITY LICENSE |
Training Duration | 2.5 days on 8x L40S |
What is dolphin-2.9-llama3-8b?
Dolphin 2.9 is an advanced language model built on Meta's LLaMA3 architecture, fine-tuned by Eric Hartford, Lucas Atkins, and Fernando Fernandes at Cognitive Computations. This model represents a significant advancement in conversational AI, combining instruction-following capabilities with coding expertise and uncensored responses.
Implementation Details
The model underwent full-weight fine-tuning with a 4k sequence length, utilizing the ChatML prompt template format. Training was conducted using 8x L40S GPUs provided by Crusoe Cloud, incorporating multiple high-quality datasets including OpenHermes-2.5, CodeFeedback, and UltraChat.
- Trained using Axolotl framework version 0.4.0
- Implements BF16 precision and flash attention
- Uses cosine learning rate scheduler with 2e-5 learning rate
- Trained for 3 epochs with gradient checkpointing
Core Capabilities
- Advanced conversational abilities with ChatML format support
- Strong coding and technical task performance
- Function calling support
- Initial agentic capabilities
- Uncensored responses (requires careful deployment consideration)
- Mathematical problem-solving abilities
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its combination of uncensored capabilities, diverse training datasets, and optimal balance between size and performance. It's particularly notable for its implementation of the latest LLaMA3 architecture with carefully curated training data.
Q: What are the recommended use cases?
This model is well-suited for conversational AI applications, coding assistance, technical documentation, and general instruction-following tasks. However, due to its uncensored nature, implementing appropriate safety layers is recommended for production deployments.