longformer-gottbert-base-8192-aw512
| Property | Value |
|---|---|
| Parameter Count | 153M |
| Language | German |
| Maximum Sequence Length | 8192 tokens |
| Attention Window Size | 512 tokens |
| Training Data | OSCAR Corpus (500M tokens) |
What is longformer-gottbert-base-8192-aw512?
This is a German-language model based on the Longformer architecture, adapted from GottBERT-base. It is designed to process long text sequences of up to 8192 tokens, which makes it particularly useful for tasks requiring long-document understanding. The model was trained on a 500-million-token subset of the German OSCAR corpus.
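As a rough illustration, the model can be loaded with the Transformers library as follows. The repository ID below is taken from the model name and is an assumption; on the Hub it may carry a user or organization prefix.

```python
# Minimal loading sketch. The repository ID is assumed from the model name and
# may need a namespace prefix (e.g. "<user>/longformer-gottbert-base-8192-aw512").
from transformers import AutoModel, AutoTokenizer

model_id = "longformer-gottbert-base-8192-aw512"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

text = "Ein langes deutsches Dokument ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```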
Implementation Details
The model implements a hybrid attention mechanism, combining local attention windows of 512 tokens with task-specific global attention. It was trained with masked language modeling for 3 epochs, reaching a final validation loss of 1.4981. Training used mixed precision (native AMP) and the Adam optimizer; the key hyperparameters are listed below, followed by an illustrative configuration sketch.
- Initialized from GottBERT-base weights
- Training batch size: 16 (with gradient accumulation)
- Learning rate: 3e-05 with linear scheduling
- Trained using PyTorch 1.10.1 and Transformers 4.15.0
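The original training script is not part of this card; the following is a minimal sketch of an MLM setup matching the listed hyperparameters. The dataset preparation, the per-device batch size, and the gradient-accumulation split are assumptions.

```python
# Illustrative MLM training configuration only. Dataset preparation, per-device
# batch size, and accumulation split are assumptions; the original run used a
# 500M-token subset of the German OSCAR corpus.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "longformer-gottbert-base-8192-aw512"  # assumed Hub ID of the converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Tiny stand-in corpus so the sketch runs end to end; replace with the real data.
texts = [
    "Dies ist ein Beispielsatz für das Training.",
    "Noch ein deutscher Satz zur Demonstration.",
]
train_dataset = [tokenizer(t, truncation=True, max_length=8192) for t in texts]

# Standard 15% token masking for masked language modeling.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="longformer-gottbert-base-8192-aw512-mlm",
    num_train_epochs=3,
    per_device_train_batch_size=2,     # assumption: 2 x 8 accumulation steps = effective batch of 16
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    lr_scheduler_type="linear",
    fp16=True,                         # mixed-precision training with native AMP
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```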
Core Capabilities
- Long document processing (up to 8192 tokens)
- Efficient attention mechanism with 512-token windows
- Optimized for German language understanding
- Suitable for feature extraction tasks
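For feature extraction, a Longformer forward pass accepts a global_attention_mask in addition to the usual inputs. A common pattern, sketched below under the assumption that the checkpoint loads as a Longformer model via AutoModel, is to give the first token global attention and use its hidden state as a document embedding.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "longformer-gottbert-base-8192-aw512"  # assumed Hub ID, see above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

document = "Ein sehr langes deutsches Dokument ... " * 200
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=8192)

# Local sliding-window attention (512 tokens) everywhere; global attention on the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

# Hidden state of the globally attending token as a single document vector.
doc_embedding = outputs.last_hidden_state[:, 0, :]
```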
Frequently Asked Questions
Q: What makes this model unique?
This model combines the Longformer architecture with a German-language base model (GottBERT), allowing it to process long German texts while keeping computation tractable through its hybrid local/global attention mechanism.
Q: What are the recommended use cases?
The model is particularly well-suited for tasks involving long German documents, including document classification, feature extraction, and analysis of lengthy texts such as academic papers, legal documents, or technical documentation.
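For document classification, the checkpoint can in principle be loaded with a sequence-classification head and fine-tuned. The snippet below is only a sketch; the label count and Hub ID are assumptions, and the head is newly initialized and must be trained on labeled documents.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "longformer-gottbert-base-8192-aw512"  # assumed Hub ID
num_labels = 4                                    # assumption: e.g. four document categories

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Attaches a randomly initialized classification head on top of the pretrained encoder;
# fine-tune it on labeled long documents (e.g. with a Trainer setup as sketched above).
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=num_labels)

inputs = tokenizer("Ein langes Vertragsdokument ...", return_tensors="pt", truncation=True, max_length=8192)
logits = model(**inputs).logits
print(logits.shape)  # (batch_size, num_labels)
```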