sentence-camembert-base

Property	Value
Parameter Count	111M
License	Apache 2.0
Paper	Research Paper
Language	French
Task	Sentence Similarity

What is sentence-camembert-base?

sentence-camembert-base is a French language model for generating sentence embeddings. It is built on Facebook's CamemBERT architecture and fine-tuned on the STSB dataset using the Siamese BERT-Networks (Sentence-BERT) approach, which targets sentence similarity tasks. The model has 111M parameters and achieves 82.36% Pearson correlation on the test set.

Implementation Details

The model is used through the sentence-transformers framework and is loaded with its SentenceTransformer class. It takes French text as input and produces dense vector representations that capture semantic meaning.

Built on facebook/camembert-base architecture
Fine-tuned using Siamese BERT-Networks
Optimized for French language understanding
Supports batch processing of sentences
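
The points above can be sketched in Python as follows. This is a minimal sketch rather than the author's reference code. The Hub identifier dangvantuan/sentence-camembert-base is an assumption (this page does not state one), and the helper names embed, cosine_similarity, and demo, the batch size, and the example sentences are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed(sentences, model_name="dangvantuan/sentence-camembert-base", batch_size=32):
    """Encode French sentences into dense vectors.

    model_name is an ASSUMED Hugging Face Hub ID, not taken from this page.
    """
    # Imported lazily so cosine_similarity works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() accepts a list of sentences and processes them in batches.
    return model.encode(sentences, batch_size=batch_size)


def demo():
    """Hypothetical usage; downloads the model on first call."""
    sentences = [
        "Un avion est en train de décoller.",
        "Un avion décolle.",
        "Un homme joue de la guitare.",
    ]
    vectors = embed(sentences)
    # The first pair is a paraphrase, the second is unrelated.
    print(cosine_similarity(vectors[0], vectors[1]))
    print(cosine_similarity(vectors[0], vectors[2]))
```

Calling demo() requires the sentence-transformers package and network access to fetch the weights. The paraphrase pair would be expected to score higher than the unrelated pair, though the exact values depend on the model.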

Core Capabilities

Sentence embedding generation for French text
Semantic similarity computation
Outperforms multilingual alternatives (e.g., distiluse-base-multilingual-cased)
Achieves 86.73% Pearson correlation on dev set
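
To make the reported figures easier to interpret, here is a sketch of how a Pearson correlation score of this kind is typically computed. It assumes the score is the Pearson coefficient between human similarity ratings and the model's cosine similarities, multiplied by 100; the page does not spell out the evaluation procedure. The numbers below are invented for illustration and are not STSB data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only, not real evaluation data.
gold_scores = [5.0, 3.2, 0.5, 4.1]    # human similarity ratings per sentence pair
predicted = [0.93, 0.61, 0.08, 0.70]  # cosine similarities produced by a model

# Scores are usually reported as the coefficient multiplied by 100.
score = 100 * pearson(gold_scores, predicted)
print(f"Pearson x 100: {score:.2f}")
```

Pearson correlation is unaffected by the scale of either list, so ratings on a 0 to 5 scale can be compared directly with cosine similarities.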

Frequently Asked Questions

Q: What makes this model unique?

This model targets French sentence embeddings specifically rather than covering many languages at once. On the test set it surpasses multilingual alternatives by about 3.7 points of Pearson correlation (82.36% vs 78.62%).

Q: What are the recommended use cases?

The model is suited to French language tasks including semantic textual similarity, document classification, clustering, and information retrieval. It fits best where an application needs to compare French sentences by meaning rather than by shared keywords.
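
As an illustration of the information retrieval use case, the sketch below ranks a small corpus against a query by cosine similarity. It assumes the embeddings have already been computed. The three-dimensional vectors are toy stand-ins for real sentence embeddings, and the names normalize and top_k are hypothetical helpers, not part of any library.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query_vec, corpus_vecs, k=2):
    """Return (index, score) pairs for the k corpus vectors closest to the query."""
    q = normalize(query_vec)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(vec))))
        for i, vec in enumerate(corpus_vecs)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of three French documents.
corpus = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.8, 0.3, 0.1]]
query = [1.0, 0.2, 0.0]

hits = top_k(query, corpus, k=2)
for index, score in hits:
    print(index, round(score, 3))
```

For a larger corpus, the document vectors would normally be normalized once and stored, so that each query costs only the dot products.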