German GPT-2 Larger
| Property | Value |
|---|---|
| Parameter Count | 137M |
| License | MIT |
| Training Data | 90GB GC4 Corpus |
| Architecture | GPT-2 |
| Author | stefan-it |
What is german-gpt2-larger?
german-gpt2-larger is a German language model trained on the German colossal, cleaned Common Crawl corpus (GC4). It builds on dbmdz/german-gpt2 and targets German text generation. It is released with research use in mind, including awareness of the bias that a model trained on crawled text can carry.
Implementation Details
The model was trained on a TPU v3-8 for approximately 17 days, covering 20 epochs. It reuses the tokenizer and vocabulary of dbmdz/german-gpt2. Training used a block size of 512, a batch size of 16, and a learning rate of 5e-3. The training corpus is approximately 90GB of cleaned German text from GC4.
- Built on the dbmdz/german-gpt2 backbone
- Trained on a TPU v3-8
- Reuses the dbmdz/german-gpt2 tokenizer and vocabulary
- Intended for research and as a base for fine-tuning
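To make the reported hyperparameters easier to reason about, the following minimal sketch collects them in one place and derives two rough quantities: tokens per batch and wall-clock days per epoch. The class and field names are illustrative, not taken from the original training script. The text does not say whether the batch size of 16 is global or per TPU core, so both readings are shown, assuming the 8 cores of a TPU v3-8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters as reported above; names are illustrative."""

    block_size: int = 512        # tokens per training sequence
    batch_size: int = 16         # sequences per batch (scope not stated)
    learning_rate: float = 5e-3
    epochs: int = 20
    training_days: float = 17.0  # approximate wall-clock time

    def tokens_per_batch(self, devices: int = 1) -> int:
        """Tokens per step if each of `devices` processes `batch_size` sequences."""
        return self.block_size * self.batch_size * devices

    def days_per_epoch(self) -> float:
        """Average wall-clock days spent on one epoch."""
        return self.training_days / self.epochs


setup = TrainingSetup()
print(setup.tokens_per_batch())           # 8192, batch size read as global
print(setup.tokens_per_batch(devices=8))  # 65536, batch size read as per core (assumption)
print(round(setup.days_per_epoch(), 2))   # 0.85
```

Under either reading, one epoch took a little under a day on average.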
Core Capabilities
- German text generation
- Starting point for fine-tuning on specific use cases
- Research use, including the study of bias in language models
- Compact size of 137M parameters
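As an illustration of the text generation capability, here is a minimal sketch using the Hugging Face `transformers` text-generation pipeline. The Hub identifier `stefan-it/german-gpt2-larger` is an assumption derived from the author and model name listed above, and the helper name `generate_german`, the sampling settings, and the example prompt are hypothetical choices rather than recommendations from the model's author.

```python
MODEL_ID = "stefan-it/german-gpt2-larger"  # assumed Hub ID: author + model name


def generate_german(prompt: str, max_new_tokens: int = 40, seed: int = 42) -> str:
    """Return one sampled continuation of `prompt`.

    Downloads the model weights on first use, so it needs network access.
    """
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline, set_seed

    set_seed(seed)  # make the sampled output reproducible
    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding
        top_p=0.95,      # nucleus sampling threshold (illustrative value)
    )
    return outputs[0]["generated_text"]


# Example call (left commented out because it downloads the model):
# print(generate_german("Heute ist ein schöner Tag und"))
```

Because the training data is crawled web text, generated output should be treated as potentially biased and reviewed before any downstream use.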
Frequently Asked Questions
Q: What makes this model unique?
The model was trained on about 90GB of cleaned German text from GC4 and is released with a research focus, particularly for studying and addressing bias in language models.
Q: What are the recommended use cases?
The model is primarily intended for research purposes and as a foundation for fine-tuning on specific tasks. It's particularly suitable for studying language model behavior, bias analysis, and German text generation tasks.