longformer-gottbert-base-8192-aw512
| Property | Value |
|---|---|
| Parameter Count | 153M |
| Language | German |
| Maximum Sequence Length | 8192 tokens |
| Attention Window Size | 512 tokens |
| Training Data | OSCAR Corpus (500M tokens) |
What is longformer-gottbert-base-8192-aw512?
This is a German-language model based on the Longformer architecture, adapted from GottBERT-base. It is designed to process long text sequences of up to 8192 tokens, which makes it particularly useful for tasks requiring long-document understanding. The model was trained on a 500-million-token subset of the German OSCAR corpus.
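As a rough illustration, the model can be loaded with the Transformers library as follows. The repository ID below is taken from the model name and is an assumption; on the Hub it may carry a user or organization prefix.

```python
# Minimal loading sketch. The repository ID is assumed from the model name and
# may need a namespace prefix (e.g. "<user>/longformer-gottbert-base-8192-aw512").
from transformers import AutoModel, AutoTokenizer

model_id = "longformer-gottbert-base-8192-aw512"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

text = "Ein langes deutsches Dokument ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```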
Implementation Details
The model implements a hybrid attention mechanism, combining local attention windows of 512 tokens with task-specific global attention. It was trained with masked language modeling for 3 epochs, reaching a final validation loss of 1.4981. Training used mixed precision (native AMP) and the Adam optimizer; the key hyperparameters are listed below, followed by an illustrative configuration sketch.
- Initialized from GottBERT-base weights
- Training batch size: 16 (with gradient accumulation)
- Learning rate: 3e-05 with linear scheduling
- Trained using PyTorch 1.10.1 and Transformers 4.15.0
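The original training script is not part of this card; the following is a minimal sketch of an MLM setup matching the listed hyperparameters. The dataset preparation, the per-device batch size, and the gradient-accumulation split are assumptions.

```python
# Illustrative MLM training configuration only. Dataset preparation, per-device
# batch size, and accumulation split are assumptions; the original run used a
# 500M-token subset of the German OSCAR corpus.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "longformer-gottbert-base-8192-aw512"  # assumed Hub ID of the converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Tiny stand-in corpus so the sketch runs end to end; replace with the real data.
texts = [
    "Dies ist ein Beispielsatz für das Training.",
    "Noch ein deutscher Satz zur Demonstration.",
]
train_dataset = [tokenizer(t, truncation=True, max_length=8192) for t in texts]

# Standard 15% token masking for masked language modeling.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="longformer-gottbert-base-8192-aw512-mlm",
    num_train_epochs=3,
    per_device_train_batch_size=2,     # assumption: 2 x 8 accumulation steps = effective batch of 16
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    lr_scheduler_type="linear",
    fp16=True,                         # mixed-precision training with native AMP
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```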
Core Capabilities
- Long document processing (up to 8192 tokens)
- Efficient attention mechanism with 512-token windows
- Optimized for German language understanding
- Suitable for feature extraction tasks
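For feature extraction, a Longformer forward pass accepts a global_attention_mask in addition to the usual inputs. A common pattern, sketched below under the assumption that the checkpoint loads as a Longformer model via AutoModel, is to give the first token global attention and use its hidden state as a document embedding.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "longformer-gottbert-base-8192-aw512"  # assumed Hub ID, see above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

document = "Ein sehr langes deutsches Dokument ... " * 200
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=8192)

# Local sliding-window attention (512 tokens) everywhere; global attention on the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

# Hidden state of the globally attending token as a single document vector.
doc_embedding = outputs.last_hidden_state[:, 0, :]
```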
Frequently Asked Questions
Q: What makes this model unique?
This model combines the Longformer architecture with a German-language base model (GottBERT), allowing it to process long German texts while keeping computation tractable through its hybrid local/global attention mechanism.
Q: What are the recommended use cases?
The model is particularly well-suited for tasks involving long German documents, including document classification, feature extraction, and analysis of lengthy texts such as academic papers, legal documents, or technical documentation.
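For document classification, the checkpoint can in principle be loaded with a sequence-classification head and fine-tuned. The snippet below is only a sketch; the label count and Hub ID are assumptions, and the head is newly initialized and must be trained on labeled documents.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "longformer-gottbert-base-8192-aw512"  # assumed Hub ID
num_labels = 4                                    # assumption: e.g. four document categories

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Attaches a randomly initialized classification head on top of the pretrained encoder;
# fine-tune it on labeled long documents (e.g. with a Trainer setup as sketched above).
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=num_labels)

inputs = tokenizer("Ein langes Vertragsdokument ...", return_tensors="pt", truncation=True, max_length=8192)
logits = model(**inputs).logits
print(logits.shape)  # (batch_size, num_labels)
```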