L3-8B-Stheno-v3.2
Property | Value |
---|---|
Parameter Count | 8.03B |
Model Type | Text Generation |
Architecture | LLaMA-based Transformer |
License | CC-BY-NC-4.0 |
Tensor Type | BF16 |
What is L3-8B-Stheno-v3.2?
L3-8B-Stheno-v3.2 is a sophisticated language model developed by Sao10K, representing the sixth iteration of the Stheno series. Trained on an H100 SXM GPU over approximately 24 hours, this model combines creative writing capabilities with assistant-style functionality. It's built upon the LLaMA architecture and has been fine-tuned using four carefully curated datasets, including writing prompts, instruct data, and filtered conversational logs.
Implementation Details
The model employs specific sampling parameters for optimal performance, including a recommended temperature range of 1.12-1.22, Min-P of 0.075, Top-K of 50, and a repetition penalty of 1.1. It utilizes the LLaMA-3-Instruct prompting template and includes specialized system prompts for roleplay scenarios.
- Trained on multiple high-quality datasets including Opus-WritingPrompts and Claude-3-Opus-Instruct-15K
- Implements BF16 tensor format for efficient computation
- Features improved hyperparameters resulting in lower loss levels
- Includes sophisticated stopping mechanisms for coherent text generation
Core Capabilities
- Enhanced narrative and storywriting abilities
- Balanced handling of SFW and NSFW content
- Improved multi-turn coherency in conversations
- Better prompt and instruction adherence
- Assistant-style task handling
- Role-playing and character immersion
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its balanced approach to content generation, improved narrative capabilities, and enhanced instruction following compared to previous versions. It represents a careful trade-off between creativity and reliability.
Q: What are the recommended use cases?
The model excels in creative writing, storytelling, roleplay scenarios, and assistant-style tasks. It's particularly well-suited for applications requiring both creative expression and structured response generation.