sentence-camembert-base
| Property | Value |
|---|---|
| Parameter Count | 111M |
| License | Apache 2.0 |
| Paper | Research Paper |
| Language | French |
| Task | Sentence Similarity |
What is sentence-camembert-base?
sentence-camembert-base is a French language model for generating sentence embeddings. It builds on the facebook/camembert-base pretrained model and is fine-tuned as a Siamese BERT network on the STSB dataset for sentence similarity. With 111M parameters, it reaches 82.36% Pearson correlation on the test set.
Implementation Details
The model is built for the sentence-transformers framework and can be loaded through its SentenceTransformer class. It takes French text as input and returns dense vector representations that capture semantic meaning.
- Built on the facebook/camembert-base pretrained model
- Fine-tuned using Siamese BERT-Networks
- Optimized for French language understanding
- Supports batch processing of sentences
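The points above can be sketched as a short loading-and-encoding example. The Hub identifier `dangvantuan/sentence-camembert-base` is an assumption (this page does not state the repository name), and `load_model`, `embed`, and `demo` are illustrative helper names. `SentenceTransformer(...)` and `encode(..., batch_size=...)` come from the sentence-transformers library.

```python
from typing import Sequence

# Assumption: the Hub repository name is not given on this page; replace as needed.
MODEL_ID = "dangvantuan/sentence-camembert-base"


def load_model(model_id: str = MODEL_ID):
    """Load the model with sentence-transformers (downloads weights on first use)."""
    # Imported lazily so that embed() works with any object exposing encode().
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def embed(model, sentences: Sequence[str], batch_size: int = 32):
    """Encode sentences in batches; returns one dense vector per sentence."""
    return model.encode(list(sentences), batch_size=batch_size, convert_to_numpy=True)


def demo() -> None:
    """Hypothetical usage: embed two French sentences and print the array shape."""
    model = load_model()
    vectors = embed(
        model,
        ["Un avion est en train de décoller.", "Un homme joue d'une grande flûte."],
    )
    print(vectors.shape)  # (number of sentences, embedding dimension)
```

Calling `demo()` requires the sentence-transformers package and network access for the first download. A larger `batch_size` usually trades memory for throughput.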
Core Capabilities
- Sentence embedding generation for French text
- Semantic similarity computation
- Outperforms multilingual alternatives (e.g., distiluse-base-multilingual-cased) on the test set
- Achieves 86.73% Pearson correlation on the dev set
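Semantic similarity between two sentence embeddings is commonly scored with cosine similarity. The sketch below is a dependency-free version for illustration; the function name `cosine_similarity` and the toy vectors are hypothetical, and in practice `sentence_transformers.util.cos_sim` does the same computation on tensors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real sentence embeddings (hypothetical values).
vec_a = [0.2, 0.7, 0.1]
vec_b = [0.25, 0.65, 0.05]
score = cosine_similarity(vec_a, vec_b)
```

Scores close to 1 indicate near-identical direction in embedding space, which the model is trained to associate with similar meaning.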
Frequently Asked Questions
Q: What makes this model unique?
This model is trained specifically for French sentence embeddings rather than for many languages at once. On the test set it reaches 82.36% Pearson correlation against 78.62% for a multilingual alternative, a gap of about 3.7 points.
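For readers unfamiliar with the metric, here is a minimal sketch of a Pearson correlation between predicted similarity scores and human-annotated gold scores. It assumes the common STS convention of correlating cosine similarities with gold labels and reporting the coefficient times 100; whether the figures above were produced with exactly this protocol is an assumption. The function name `pearson` and the sample values are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: model cosine scores vs. gold similarity labels on a 0-5 scale.
predicted = [0.91, 0.15, 0.62, 0.40]
gold = [4.6, 0.8, 3.2, 2.5]
reported = 100.0 * pearson(predicted, gold)  # expressed as a percentage
```

Because Pearson correlation is invariant to linear rescaling, the predicted scores and gold labels do not need to share a range.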
Q: What are the recommended use cases?
The model suits French-language tasks such as semantic textual similarity, document classification, clustering, and information retrieval. It fits best where an application depends on how closely two French sentences match in meaning.
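As an illustration of the information-retrieval use case, the sketch below ranks a small corpus of precomputed embeddings against a query embedding. The helper names `_normalize` and `top_k` and the two-dimensional toy vectors are hypothetical; real embeddings would come from the model's `encode` call.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(
    query_vec: Sequence[float],
    corpus_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (corpus index, cosine score) pairs for the k best matches, best first."""
    query = _normalize(query_vec)
    scored = []
    for idx, vec in enumerate(corpus_vecs):
        doc = _normalize(vec)
        scored.append((idx, sum(q * d for q, d in zip(query, doc))))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-D vectors standing in for sentence embeddings (hypothetical values).
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k([1.0, 0.1], corpus, k=2)
```

The same ranking works for clustering or deduplication if each corpus vector is used as the query in turn. For large corpora an approximate nearest-neighbor index would typically replace the linear scan.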