sentence-camembert-base

Maintained By
dangvantuan

Property          Value
Parameter Count   111M
License           Apache 2.0
Paper             Research Paper
Language          French
Task              Sentence Similarity

What is sentence-camembert-base?

sentence-camembert-base is a French language model for generating sentence embeddings. It is built on Facebook's CamemBERT (camembert-base) and fine-tuned with the Siamese BERT-Networks approach on the STSB dataset for sentence similarity tasks. With 111M parameters, it reaches 82.36% Pearson correlation on the test set.

Implementation Details

The model is distributed for the sentence-transformers framework and can be loaded through its SentenceTransformer class. It takes French text as input and returns one dense vector per sentence that captures the sentence's semantic meaning.

  • Built on facebook/camembert-base architecture
  • Fine-tuned using Siamese BERT-Networks
  • Optimized for French language understanding
  • Supports batch processing of sentences
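The loading and batch-encoding steps above can be sketched as follows. This is a minimal sketch, not the maintainer's reference code: the Hub identifier is assumed from the maintainer and model names on this page, and the helper names `load_model`, `embed`, and `demo` are hypothetical. It requires the sentence-transformers package and downloads the weights on first use.

```python
from typing import Sequence

# Assumed Hub identifier, composed from the maintainer and model names on this page.
MODEL_ID = "dangvantuan/sentence-camembert-base"


def load_model(model_id: str = MODEL_ID):
    """Load the model (requires `pip install sentence-transformers`)."""
    # Imported lazily so the helpers below can be used without the dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def embed(model, sentences: Sequence[str], batch_size: int = 32):
    """Encode sentences in batches and return one vector per sentence."""
    return model.encode(
        list(sentences), batch_size=batch_size, show_progress_bar=False
    )


def demo():
    """Encode two French sentences; needs network access on the first call."""
    model = load_model()
    sentences = [
        "Un avion est en train de décoller.",
        "Un homme joue d'une grande flûte.",
    ]
    embeddings = embed(model, sentences)
    # Shape is (number of sentences, embedding dimension).
    print(embeddings.shape)
```

Calling `demo()` runs the full path end to end. `batch_size` only affects throughput and memory use, not the resulting vectors.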

Core Capabilities

  • Sentence embedding generation for French text
  • Semantic similarity computation
  • Outperforms multilingual alternatives (e.g., distiluse-base-multilingual-cased)
  • Achieves 86.73% Pearson correlation on dev set
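Semantic similarity between two embeddings is commonly measured with cosine similarity. The sketch below shows that computation in plain Python so the arithmetic is visible. It assumes the embeddings are plain sequences of floats, and the function names are illustrative. In practice, `sentence_transformers.util.cos_sim` provides an equivalent, vectorized computation.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all embeddings."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]
```

Scores close to 1 indicate sentences the model considers near-paraphrases, while scores near 0 indicate unrelated sentences.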

Frequently Asked Questions

Q: What makes this model unique?

This model targets French sentence embeddings specifically rather than covering many languages at once. It reports state-of-the-art performance for French, reaching 82.36% Pearson correlation on the test set against 78.62% for multilingual alternatives, a gap of about 3.7 points.
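For context on how such figures are usually produced: semantic textual similarity benchmarks typically compare the model's predicted similarity scores with human-annotated gold scores using the Pearson correlation coefficient, often reported multiplied by 100. The sketch below assumes that protocol, which this page does not spell out. The function name and the numbers in the usage note are purely illustrative.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson(predicted_similarities, gold_scores)` with predicted cosine similarities and gold scores on a 0 to 5 scale gives a value in [-1, 1]. The two lists do not need to share a scale, because Pearson correlation is invariant to linear rescaling.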

Q: What are the recommended use cases?

The model is ideal for French language tasks including semantic textual similarity, document classification, clustering, and information retrieval. It's particularly well-suited for applications requiring precise understanding of sentence relationships in French text.
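As an illustration of the information retrieval use case, the following sketch ranks a corpus of precomputed sentence embeddings against a query embedding and returns the best matches. It assumes the vectors were produced beforehand by the model. The function `top_k` is a hypothetical helper, not part of any library, and a real system with a large corpus would likely use a vectorized or approximate nearest-neighbour search instead.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(
    query_vec: Sequence[float],
    corpus_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (corpus index, cosine score) pairs for the k closest vectors."""
    q = _normalize(query_vec)
    scores = []
    for idx, vec in enumerate(corpus_vecs):
        d = _normalize(vec)
        # The dot product of unit vectors equals their cosine similarity.
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

The returned indices point back into the original list of corpus sentences, so the matching text can be looked up directly.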
