RWKV7-Goose-World3-2.9B-HF
| Property | Value |
|---|---|
| Parameter Count | 2.9B |
| License | Apache-2.0 |
| Tokenizer | RWKV World tokenizer (65,536 vocab) |
| Training Tokens | 3.119 trillion |
| Final Loss | 1.8745 |
What is RWKV7-Goose-World3-2.9B-HF?
RWKV7-Goose-World3-2.9B-HF is a language model developed by the RWKV Project under the LF AI & Data Foundation. It represents a significant evolution of the RWKV architecture, implemented in the flash-linear-attention (FLA) format for improved efficiency and performance. The model was trained on 3.119 trillion tokens of World v3 data.
Implementation Details
The model was trained in bfloat16 precision with a delayed cosine-decay learning-rate schedule running from 4e-4 down to 1e-5, combined with a weight decay of 0.1. The HF implementation builds on the flash-linear-attention framework and requires transformers >= 4.48.0.
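The delayed cosine decay mentioned above can be sketched as a plain function: hold the peak rate for an initial fraction of training, then cosine-decay to the floor. The delay fraction and step counts here are hypothetical illustrations; the model card only states the 4e-4 and 1e-5 endpoints.

```python
import math

def delayed_cosine_lr(step, total_steps, lr_max=4e-4, lr_min=1e-5, delay_frac=0.1):
    """Delayed cosine decay: hold lr_max for the first delay_frac of
    training, then cosine-decay to lr_min. delay_frac is a hypothetical
    parameter, not taken from the model card."""
    delay_steps = int(total_steps * delay_frac)
    if step < delay_steps:
        return lr_max
    progress = (step - delay_steps) / max(1, total_steps - delay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

For example, with `total_steps=1000` and the defaults, the rate stays at 4e-4 for the first 100 steps and reaches 1e-5 at the final step.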
- Flash-linear attention architecture for efficient processing
- Custom RWKV World tokenizer with 65,536 vocabulary size
- Optimized for English language tasks
- Trained with varying batch sizes over the course of the run
Core Capabilities
- Large-scale text generation and completion
- Efficient processing with flash-linear attention
- Seamless integration with HuggingFace transformers library
- Support for chat-template formatting and generation
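To illustrate the chat-template formatting listed above, the sketch below builds a User/Assistant prompt by hand. This layout is an assumption for illustration only; in practice the authoritative template ships with the model's tokenizer and is applied via `tokenizer.apply_chat_template` in transformers.

```python
def build_prompt(messages):
    """Assemble a chat-style prompt from role/content messages.

    Hypothetical illustration of a User/Assistant layout; the real
    template is bundled with the model's tokenizer.
    """
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"{role}: {m['content']}")
    parts.append("Assistant:")  # trailing cue so the model generates a reply
    return "\n\n".join(parts)
```

For example, `build_prompt([{"role": "user", "content": "Hello"}])` yields `"User: Hello\n\nAssistant:"`.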
Frequently Asked Questions
Q: What makes this model unique?
The model combines the RWKV7 architecture with the flash-linear-attention format, providing efficient processing while maintaining high performance. Its training on 3.119 trillion tokens and its custom World tokenizer make it particularly effective for English language tasks.
Q: What are the recommended use cases?
The model is well-suited for text generation, completion tasks, and chatbot applications. It can be easily integrated into existing pipelines using the HuggingFace transformers library and supports sophisticated chat templating.