RWKV7-Goose-World3-2.9B-HF
| Property | Value |
|---|---|
| Parameter Count | 2.9B |
| License | Apache-2.0 |
| Tokenizer | RWKV World tokenizer (65,536 vocab) |
| Training Tokens | 3.119 trillion |
| Final Loss | 1.8745 |
What is RWKV7-Goose-World3-2.9B-HF?
RWKV7-Goose-World3-2.9B-HF is a language model developed by the RWKV Project under the LF AI & Data Foundation. It represents a significant evolution of the RWKV architecture, implemented in the flash-linear-attention (FLA) format for improved efficiency and performance. The model was trained on 3.119 trillion tokens of World v3 data.
Implementation Details
The model was trained in bfloat16 precision with a delayed cosine-decay learning-rate schedule running from 4e-4 down to 1e-5, combined with a weight decay of 0.1. The HF implementation builds on the flash-linear-attention framework and requires transformers >= 4.48.0.
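The delayed cosine decay mentioned above can be sketched as a plain function: hold the peak rate for an initial fraction of training, then cosine-decay to the floor. The delay fraction and step counts here are hypothetical illustrations; the model card only states the 4e-4 and 1e-5 endpoints.

```python
import math

def delayed_cosine_lr(step, total_steps, lr_max=4e-4, lr_min=1e-5, delay_frac=0.1):
    """Delayed cosine decay: hold lr_max for the first delay_frac of
    training, then cosine-decay to lr_min. delay_frac is a hypothetical
    parameter, not taken from the model card."""
    delay_steps = int(total_steps * delay_frac)
    if step < delay_steps:
        return lr_max
    progress = (step - delay_steps) / max(1, total_steps - delay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

For example, with `total_steps=1000` and the defaults, the rate stays at 4e-4 for the first 100 steps and reaches 1e-5 at the final step.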
- Flash-linear attention architecture for efficient processing
- Custom RWKV World tokenizer with 65,536 vocabulary size
- Optimized for English language tasks
- Trained with varying batch sizes over the course of the run
Core Capabilities
- Large-scale text generation and completion
- Efficient processing with flash-linear attention
- Seamless integration with HuggingFace transformers library
- Support for chat-template formatting and generation
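To illustrate the chat-template formatting listed above, the sketch below builds a User/Assistant prompt by hand. This layout is an assumption for illustration only; in practice the authoritative template ships with the model's tokenizer and is applied via `tokenizer.apply_chat_template` in transformers.

```python
def build_prompt(messages):
    """Assemble a chat-style prompt from role/content messages.

    Hypothetical illustration of a User/Assistant layout; the real
    template is bundled with the model's tokenizer.
    """
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"{role}: {m['content']}")
    parts.append("Assistant:")  # trailing cue so the model generates a reply
    return "\n\n".join(parts)
```

For example, `build_prompt([{"role": "user", "content": "Hello"}])` yields `"User: Hello\n\nAssistant:"`.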
Frequently Asked Questions
Q: What makes this model unique?
The model combines the RWKV7 architecture with the flash-linear-attention format, providing efficient processing while maintaining high performance. Its training on 3.119 trillion tokens and its custom World tokenizer make it particularly effective for English language tasks.
Q: What are the recommended use cases?
The model is well-suited for text generation, completion tasks, and chatbot applications. It can be easily integrated into existing pipelines using the HuggingFace transformers library and supports sophisticated chat templating.