Bagel-34b-v0.2
Property | Value |
---|---|
Parameter Count | 34.4B |
Base Model | Yi-34b-200k |
License | Apache 2.0 |
Tensor Type | BF16 |
What is bagel-34b-v0.2?
Bagel-34b-v0.2 is an experimental fine-tuned version of the Yi-34b-200k model, specifically designed to excel in creative writing and roleplay applications. This model represents the SFT (Supervised Fine-Tuning) phase before DPO implementation, trained on an extensive collection of 29 diverse datasets ranging from conversation and coding to mathematics and emotional understanding.
Implementation Details
The model employs a unique multi-format prompting system, supporting four different prompt formats: Vicuna, Llama-2, Alpaca, and ChatML. Each instruction is converted into all four formats during training, effectively quadrupling the exposure to each training example. The training process utilizes a conservative approach with a single epoch and low learning rate to prevent overfitting.
- Supports multiple prompt formats for enhanced flexibility
- Trained on 29 carefully curated datasets
- Implements decontamination using approximate nearest neighbor search
- Optimized for creative and conversational tasks
Core Capabilities
- Advanced creative writing and roleplay generation
- Multi-lingual comprehension and response
- Code generation in multiple programming languages
- Mathematical problem solving
- Emotional understanding and context awareness
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its multi-format prompt training approach, combined with an extensive and diverse training dataset collection. Unlike many other models, it maintains creative capabilities while incorporating technical knowledge from various domains.
Q: What are the recommended use cases?
The model excels in creative writing, roleplay scenarios, and general conversation. It's particularly well-suited for applications requiring a balance of creativity and technical knowledge, such as story writing, character interaction, and educational content generation.