# persian-embeddings
| Property | Value |
|---|---|
| Parameter Count | 560M parameters |
| License | Apache 2.0 |
| Base Model | XLM-RoBERTa-base |
| Languages | Persian, English |
| Output Dimensions | 1024 |
## What is persian-embeddings?
persian-embeddings is a specialized sentence transformer model designed for generating high-quality embeddings for both Persian and English text. Built on FacebookAI's XLM-RoBERTa architecture, it maps sentences and paragraphs to a 1024-dimensional vector space, enabling powerful semantic search and clustering capabilities.
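Below is a minimal usage sketch with the sentence-transformers library; the repository id used here is a placeholder and should be replaced with the model's actual Hugging Face path.

```python
from sentence_transformers import SentenceTransformer

# Placeholder repository id; replace with the model's actual Hugging Face path.
MODEL_ID = "your-namespace/persian-embeddings"

model = SentenceTransformer(MODEL_ID)

# The model accepts Persian and English input interchangeably.
sentences = [
    "این یک جمله آزمایشی است.",   # "This is a test sentence." in Persian
    "This is a test sentence.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # expected: (2, 1024)
```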
## Implementation Details
The model is built with the sentence-transformers framework and can be used through either the sentence-transformers library or Hugging Face Transformers. It applies mean pooling over token embeddings to produce sentence embeddings, with attention-mask handling so that padded positions are ignored during batch processing.
- Built on XLM-RoBERTa base architecture
- Generates 1024-dimensional dense vectors
- Supports both Persian and English text processing
- Stores weights in F32 (32-bit floating point) tensor format
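When using Hugging Face Transformers directly, sentence embeddings are typically obtained by mean pooling the token embeddings with the attention mask, as described above. The following is a sketch of that approach under the same placeholder repository id:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "your-namespace/persian-embeddings"  # placeholder repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padded positions via the attention mask.
    token_embeddings = model_output[0]  # (batch, seq_len, hidden)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, dim=1) / torch.clamp(mask.sum(dim=1), min=1e-9)

sentences = ["جستجوی معنایی در متون فارسی", "Semantic search over Persian text"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded)

embeddings = mean_pooling(output, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # optional L2 normalization
print(embeddings.shape)  # expected: (2, 1024)
```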
## Core Capabilities
- Bilingual text embedding generation
- Semantic search functionality (see the sketch after this list)
- Text clustering support
- Cross-lingual capabilities
- Efficient batch processing
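As a rough illustration of the semantic search capability, the sketch below embeds a small Persian/English corpus and retrieves the entries closest to a query with the sentence-transformers utility functions; the repository id remains a placeholder.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("your-namespace/persian-embeddings")  # placeholder id

corpus = [
    "پایتخت ایران تهران است.",                       # "The capital of Iran is Tehran."
    "هوش مصنوعی شاخه‌ای از علوم کامپیوتر است.",      # "AI is a branch of computer science."
    "The Persian Gulf lies to the south of Iran.",
]
query = "Which city is the capital of Iran?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top-2 most similar corpus entries by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```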
## Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its bilingual capabilities in Persian and English, making it particularly valuable for cross-lingual applications and Persian language processing tasks. The 1024-dimensional output vectors provide rich semantic representations suitable for various downstream tasks.
Q: What are the recommended use cases?
The model is ideal for semantic search applications, document clustering, text similarity analysis, and cross-lingual information retrieval. It's particularly useful in applications requiring understanding of both Persian and English content.
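For cross-lingual use cases, a simple check is to compare the embedding of a Persian sentence against its English translation; a high cosine similarity indicates the two languages share the same vector space. A minimal sketch, again assuming a placeholder repository id:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("your-namespace/persian-embeddings")  # placeholder id

persian = "کتاب‌ها منبع ارزشمندی از دانش هستند."   # "Books are a valuable source of knowledge."
english = "Books are a valuable source of knowledge."

emb_fa, emb_en = model.encode([persian, english], convert_to_tensor=True)

# Cosine similarity between the Persian sentence and its English translation.
print(util.cos_sim(emb_fa, emb_en).item())
```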