bagel-dpo-34b-v0.2
Property | Value |
---|---|
Parameter Count | 34.4B |
Model Type | Language Model (LLaMA Architecture) |
License | Yi License |
Tensor Type | BF16 |
What is bagel-dpo-34b-v0.2?
bagel-dpo-34b-v0.2 is an experimental fine-tuned version of the Yi-34B-200K model, developed using the bagel training framework. This model stands out for its implementation of Direct Preference Optimization (DPO) and its training on an extensive collection of 29 diverse datasets. The model is designed to be less restricted in its responses while maintaining high-quality output across various tasks.
Implementation Details
The model employs a unique multi-format prompting system that incorporates four different prompt styles: Vicuna, LLaMA-2, Alpaca, and ChatML. Each instruction is processed through all four formats during training, effectively quadrupling the exposure to training data. The model uses BF16 precision and has been trained with careful attention to decontamination using approximate nearest neighbor search.
- Trained on 29 carefully curated datasets spanning multiple domains
- Implements four different prompt formats for enhanced versatility
- Uses Direct Preference Optimization for improved output quality
- Incorporates both standard and toxic DPO datasets for reduced censorship
Core Capabilities
- Advanced coding abilities through multiple programming datasets
- Strong mathematical reasoning from specialized math instruction sets
- Multi-lingual comprehension capabilities
- Creative writing and roleplay abilities
- SQL and database query handling
- Uncensored response capability when appropriately prompted
Frequently Asked Questions
Q: What makes this model unique?
This model's uniqueness stems from its comprehensive training approach using multiple prompt formats and its integration of both standard and specialized DPO datasets. It offers reduced censorship compared to similar models while maintaining high-quality outputs across various tasks.
Q: What are the recommended use cases?
The model is suited for a wide range of applications including coding, mathematical problem-solving, creative writing, and general conversation. It's particularly useful in scenarios requiring detailed, unrestricted responses while maintaining output quality.