longformer-gottbert-base-8192-aw512

Maintained by: LennartKeller


  • Parameter Count: 153M
  • Language: German
  • Maximum Sequence Length: 8192 tokens
  • Attention Window Size: 512 tokens
  • Training Data: OSCAR Corpus (500M tokens)

What is longformer-gottbert-base-8192-aw512?

This is a German language model based on the Longformer architecture, adapted from GottBERT-base. It is designed to process long text sequences of up to 8192 tokens, making it particularly valuable for tasks that require long-document understanding. The model was trained on a 500-million-token subset of the German OSCAR corpus.

Implementation Details

The model implements a hybrid attention mechanism, combining local attention windows of 512 tokens with task-specific global attention. It was trained with masked language modeling for 3 epochs, reaching a final validation loss of 1.4981. Training used mixed precision (native AMP) and the Adam optimizer with the hyperparameters listed below.

  • Initialized from GottBERT-base weights
  • Training batch size: 16 (with gradient accumulation)
  • Learning rate: 3e-05 with linear scheduling
  • Trained using PyTorch 1.10.1 and Transformers 4.15.0
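The following is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as LennartKeller/longformer-gottbert-base-8192-aw512; the example sentence and the choice of which tokens receive global attention are illustrative only.

```python
# Hedged sketch: load the checkpoint and run a masked-language-modeling forward pass.
# The repo id below is an assumption (maintainer name + model name).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "LennartKeller/longformer-gottbert-base-8192-aw512"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "Die Hauptstadt von Deutschland ist <mask>."
inputs = tokenizer(text, return_tensors="pt")

# Longformer-style models use sliding-window local attention by default;
# mark tokens that should attend globally (here: the first token).
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

# Decode the highest-scoring prediction for the masked position.
mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = outputs.logits[0, mask_idx].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```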

Core Capabilities

  • Long document processing (up to 8192 tokens)
  • Efficient attention mechanism with 512-token windows
  • Optimized for German language understanding
  • Suitable for feature extraction tasks
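As a sketch of feature extraction on a long document (assuming the same repo id as above; `long_text` is a placeholder), token embeddings can be mean-pooled into a single document vector:

```python
# Hedged sketch: document-level feature extraction via mean pooling.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "LennartKeller/longformer-gottbert-base-8192-aw512"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

long_text = "..."  # placeholder: a German document of up to 8192 tokens
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=8192)

# Common choice: give the first token global attention, keep the rest local.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

# Mean-pool token embeddings (ignoring padding) into one document vector.
mask = inputs["attention_mask"].unsqueeze(-1)
doc_embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(doc_embedding.shape)  # (1, hidden_size)
```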

Frequently Asked Questions

Q: What makes this model unique?

This model combines the Longformer architecture with a German-language initialization from GottBERT, allowing it to process long German texts while keeping computation tractable through its hybrid local/global attention mechanism.

Q: What are the recommended use cases?

The model is particularly well-suited for tasks involving long German documents, including document classification, feature extraction, and analysis of lengthy texts such as academic papers, legal documents, or technical documentation.
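For document classification, a hedged setup sketch follows; it assumes the same repo id as above, `num_labels` and the example text are placeholders, and the classification head is randomly initialized until fine-tuned on labeled data.

```python
# Hedged sketch: prepare the model for long-document classification.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "LennartKeller/longformer-gottbert-base-8192-aw512"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer(
    "Ein langes deutsches Dokument ...",  # placeholder document
    return_tensors="pt",
    truncation=True,
    max_length=8192,
)

# The classification head is untrained at this point; fine-tune before
# relying on these scores.
logits = model(**inputs).logits
print(logits.shape)  # (1, num_labels)
```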
