e5-base-sts-en-de

Maintained by: danielheinz

  • Parameter Count: 278M
  • License: MIT
  • Tensor Type: F32
  • Best Performance: 0.904 Spearman correlation

What is e5-base-sts-en-de?

e5-base-sts-en-de is a specialized language model fine-tuned for semantic textual similarity tasks, with a particular focus on German. Built on multilingual-e5-base, it is optimized to understand and compare the meaning of German text passages.
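
For orientation, here is a minimal inference sketch using sentence-transformers. The repository id "danielheinz/e5-base-sts-en-de" and the "query: " input prefix (which the underlying e5 family generally expects) are assumptions inferred from the model name and base model, not confirmed by this card.

```python
# Minimal sketch: score the similarity of two German sentences.
# Assumptions: the model loads via sentence-transformers under the repository id
# "danielheinz/e5-base-sts-en-de", and inputs use the e5-style "query: " prefix.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("danielheinz/e5-base-sts-en-de")

sentences = [
    "query: Der Hund spielt im Garten.",      # "The dog is playing in the garden."
    "query: Ein Hund tollt draußen herum.",   # "A dog is romping around outside."
]

# Encode with normalization, then take the cosine similarity of the two embeddings.
embeddings = model.encode(sentences, normalize_embeddings=True)
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))  # a value close to 1.0 indicates high semantic similarity
```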

Implementation Details

The model implements a two-stage training approach: Multiple Negatives Ranking Loss on paraphrase datasets first, followed by Cosine Similarity Loss training on semantic textual similarity datasets (a rough sketch follows the list below). It is built on the XLM-RoBERTa architecture and fine-tuned on three key datasets: a German paraphrase corpus, PAWS-X, and STSB-Multi-MT.

  • 278M parameters for robust language understanding
  • F32 tensor type for full-precision computation
  • Achieves 0.920 on the STSB validation subset
  • 0.904 Spearman correlation on the test set
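
The two-stage recipe described above can be sketched with sentence-transformers as follows. The datasets, example pairs, batch sizes, and epoch counts below are placeholders for illustration, not the author's actual training configuration.

```python
# Two-stage fine-tuning sketch with sentence-transformers.
# Stage 1: Multiple Negatives Ranking Loss on paraphrase pairs (positive pairs,
#          in-batch negatives). Stage 2: Cosine Similarity Loss on STS pairs with
#          gold scores scaled to [0, 1]. All data below is placeholder.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Stage 1: paraphrase pairs (e.g. a German paraphrase corpus, PAWS-X).
paraphrase_examples = [
    InputExample(texts=["Satz A", "Eine Umformulierung von Satz A"]),
    # ... more positive pairs
]
stage1_loader = DataLoader(paraphrase_examples, shuffle=True, batch_size=32)
stage1_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(stage1_loader, stage1_loss)], epochs=1, warmup_steps=100)

# Stage 2: STS pairs with similarity labels (e.g. STSB-Multi-MT).
sts_examples = [
    InputExample(texts=["Satz A", "Satz B"], label=0.8),
    # ... more scored pairs
]
stage2_loader = DataLoader(sts_examples, shuffle=True, batch_size=32)
stage2_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(stage2_loader, stage2_loss)], epochs=1, warmup_steps=100)
```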

Core Capabilities

  • Semantic similarity assessment for German text
  • Cross-lingual text comparison
  • Paraphrase detection and evaluation
  • Feature extraction for NLP tasks

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its specialized fine-tuning for German semantic similarity tasks, combining multiple training datasets and achieving state-of-the-art performance on the STSB benchmark.

Q: What are the recommended use cases?

This model is ideal for applications requiring semantic similarity comparison in German text, including document similarity analysis, paraphrase detection, and cross-lingual text matching between German and English content.
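
As a hypothetical illustration of cross-lingual matching, the snippet below ranks German candidates against an English query; it relies on the same assumed repository id and prefix convention as the earlier example.

```python
# Hypothetical cross-lingual matching: rank German candidates against an English
# query by cosine similarity. Repository id and "query: " prefix are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("danielheinz/e5-base-sts-en-de")

english_query = "query: How do I cancel my subscription?"
german_candidates = [
    "query: Wie kündige ich mein Abonnement?",     # "How do I cancel my subscription?"
    "query: Wie ändere ich meine Lieferadresse?",  # "How do I change my delivery address?"
]

query_embedding = model.encode(english_query, normalize_embeddings=True)
candidate_embeddings = model.encode(german_candidates, normalize_embeddings=True)

# Higher cosine similarity means a closer semantic match across languages.
scores = util.cos_sim(query_embedding, candidate_embeddings)[0]
for sentence, score in zip(german_candidates, scores):
    print(f"{float(score):.3f}  {sentence}")
```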
