German GPT-2 Larger

Maintained by: stefan-it

Parameter Count: 137M
License: MIT
Training Data: 90GB GC4 Corpus
Architecture: GPT-2
Author: stefan-it

What is german-gpt2-larger?

German-gpt2-larger is a German language model trained on the German Colossal Clean Crawled Corpus (GC4). Built on the dbmdz/german-gpt2 foundation, it extends that model with large-scale training on cleaned German web text, improving German text generation while keeping an explicit focus on bias and ethical considerations.

Implementation Details

The model was trained on a v3-8 TPU for approximately 17 days over 20 epochs. It uses the same tokenizer and vocabulary as dbmdz/german-gpt2, with a block size of 512, a batch size of 16, and a learning rate of 5e-3 (see the configuration sketch after the list below). The training corpus comprises approximately 90GB of cleaned German text from GC4.

  • Built on dbmdz/german-gpt2 backbone
  • Trained with TPU v3-8 infrastructure
  • Reuses the German-language tokenizer and vocabulary of dbmdz/german-gpt2
  • Optimized for research and fine-tuning applications
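For orientation, the reported hyperparameters can be written down as a Hugging Face TrainingArguments object. This is a minimal sketch for reference only: the original training ran on TPU, and the exact training script, optimizer, and scheduler settings are not documented here.

```python
# Minimal sketch of the reported hyperparameters. The output directory is a
# placeholder; optimizer and scheduler details are not taken from the source.
from transformers import TrainingArguments

BLOCK_SIZE = 512  # reported block (context) size used to group texts

training_args = TrainingArguments(
    output_dir="./german-gpt2-larger",  # placeholder path
    per_device_train_batch_size=16,     # reported batch size
    learning_rate=5e-3,                 # reported learning rate
    num_train_epochs=20,                # reported number of epochs
)
```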

Core Capabilities

  • German text generation with high coherence
  • Fine-tuning compatibility for specific use-cases
  • Research-focused implementation with bias awareness
  • Compact and efficient at 137M parameters
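
As a quick usage illustration, the model can be loaded with the transformers text-generation pipeline. The Hub id stefan-it/german-gpt2-larger is an assumption based on the author and model name, not something stated on this card.

```python
# Hedged generation example; the model id is an assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="stefan-it/german-gpt2-larger")

prompt = "Heute ist sehr schönes Wetter in"  # "Today the weather is very nice in"
outputs = generator(prompt, max_length=60, num_return_sequences=1)
print(outputs[0]["generated_text"])
```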

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its extensive training on 90GB of clean German text data and its focus on research applications, particularly in studying and addressing bias in language models.

Q: What are the recommended use cases?

The model is primarily intended for research purposes and as a foundation for fine-tuning on specific tasks. It's particularly suitable for studying language model behavior, bias analysis, and German text generation tasks.
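A hedged fine-tuning sketch follows, showing how the model might be adapted to a downstream German corpus with the transformers Trainer. The model id, file names, and hyperparameters are illustrative assumptions rather than documented settings.

```python
# Illustrative fine-tuning sketch: causal-LM training on a plain-text German
# corpus. Model id, file paths, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "stefan-it/german-gpt2-larger"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# GPT-2 tokenizers usually ship without a padding token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus: one German text sample per line.
dataset = load_dataset("text", data_files={"train": "my_german_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False selects standard causal language modeling (next-token prediction).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./german-gpt2-larger-finetuned",  # placeholder
        per_device_train_batch_size=4,
        num_train_epochs=1,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```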
