# persian-embeddings
| Property | Value |
|---|---|
| Parameter Count | 560M parameters |
| License | Apache 2.0 |
| Base Model | XLM-RoBERTa-base |
| Languages | Persian, English |
| Output Dimensions | 1024 |
## What is persian-embeddings?
persian-embeddings is a specialized sentence transformer model designed for generating high-quality embeddings for both Persian and English text. Built on FacebookAI's XLM-RoBERTa architecture, it maps sentences and paragraphs to a 1024-dimensional vector space, enabling powerful semantic search and clustering capabilities.
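Below is a minimal usage sketch with the sentence-transformers library; the repository id used here is a placeholder and should be replaced with the model's actual Hugging Face path.

```python
from sentence_transformers import SentenceTransformer

# Placeholder repository id; replace with the model's actual Hugging Face path.
MODEL_ID = "your-namespace/persian-embeddings"

model = SentenceTransformer(MODEL_ID)

# The model accepts Persian and English input interchangeably.
sentences = [
    "این یک جمله آزمایشی است.",   # "This is a test sentence." in Persian
    "This is a test sentence.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # expected: (2, 1024)
```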
## Implementation Details
The model is built with the sentence-transformers framework and can be used through either the sentence-transformers library or Hugging Face Transformers. It applies mean pooling over token embeddings to produce sentence embeddings, with attention-mask handling so that padded positions are ignored during batch processing.
- Built on XLM-RoBERTa base architecture
- Generates 1024-dimensional dense vectors
- Supports both Persian and English text processing
- Stores weights in F32 (32-bit floating point) tensor format
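When using Hugging Face Transformers directly, sentence embeddings are typically obtained by mean pooling the token embeddings with the attention mask, as described above. The following is a sketch of that approach under the same placeholder repository id:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "your-namespace/persian-embeddings"  # placeholder repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padded positions via the attention mask.
    token_embeddings = model_output[0]  # (batch, seq_len, hidden)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, dim=1) / torch.clamp(mask.sum(dim=1), min=1e-9)

sentences = ["جستجوی معنایی در متون فارسی", "Semantic search over Persian text"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded)

embeddings = mean_pooling(output, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # optional L2 normalization
print(embeddings.shape)  # expected: (2, 1024)
```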
## Core Capabilities
- Bilingual text embedding generation
- Semantic search functionality (see the sketch after this list)
- Text clustering support
- Cross-lingual capabilities
- Efficient batch processing
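As a rough illustration of the semantic search capability, the sketch below embeds a small Persian/English corpus and retrieves the entries closest to a query with the sentence-transformers utility functions; the repository id remains a placeholder.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("your-namespace/persian-embeddings")  # placeholder id

corpus = [
    "پایتخت ایران تهران است.",                       # "The capital of Iran is Tehran."
    "هوش مصنوعی شاخه‌ای از علوم کامپیوتر است.",      # "AI is a branch of computer science."
    "The Persian Gulf lies to the south of Iran.",
]
query = "Which city is the capital of Iran?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top-2 most similar corpus entries by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```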
## Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its bilingual capabilities in Persian and English, making it particularly valuable for cross-lingual applications and Persian language processing tasks. The 1024-dimensional output vectors provide rich semantic representations suitable for various downstream tasks.
Q: What are the recommended use cases?
The model is ideal for semantic search applications, document clustering, text similarity analysis, and cross-lingual information retrieval. It's particularly useful in applications requiring understanding of both Persian and English content.
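For cross-lingual use cases, a simple check is to compare the embedding of a Persian sentence against its English translation; a high cosine similarity indicates the two languages share the same vector space. A minimal sketch, again assuming a placeholder repository id:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("your-namespace/persian-embeddings")  # placeholder id

persian = "کتاب‌ها منبع ارزشمندی از دانش هستند."   # "Books are a valuable source of knowledge."
english = "Books are a valuable source of knowledge."

emb_fa, emb_en = model.encode([persian, english], convert_to_tensor=True)

# Cosine similarity between the Persian sentence and its English translation.
print(util.cos_sim(emb_fa, emb_en).item())
```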