persian-embeddings

Maintained by: heydariAI

  • Parameter Count: 560M parameters
  • License: Apache 2.0
  • Base Model: XLM-RoBERTa-base
  • Languages: Persian, English
  • Output Dimensions: 1024

What is persian-embeddings?

persian-embeddings is a specialized sentence transformer model designed for generating high-quality embeddings for both Persian and English text. Built on FacebookAI's XLM-RoBERTa architecture, it maps sentences and paragraphs to a 1024-dimensional vector space, enabling powerful semantic search and clustering capabilities.

Implementation Details

The model is built with the sentence-transformers framework and can be used through either the sentence-transformers library or Hugging Face Transformers. It applies mean pooling over token embeddings to produce sentence vectors and supports batch processing with proper attention-mask handling; a short usage sketch follows the list below.

  • Built on XLM-RoBERTa base architecture
  • Generates 1024-dimensional dense vectors
  • Supports both Persian and English text processing
  • Stores weights as F32 (32-bit floating point) tensors
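
The snippet below is a minimal usage sketch with the sentence-transformers library. It assumes the model is published under the repository id heydariAI/persian-embeddings (inferred from the maintainer and model name above); check the model page for the exact id.

```python
# Minimal sketch; the repo id "heydariAI/persian-embeddings" is an assumption, not confirmed here.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("heydariAI/persian-embeddings")

sentences = [
    "این یک جمله فارسی است.",        # "This is a Persian sentence."
    "This is an English sentence.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # expected: (2, 1024) per the stated output dimension
```

Equivalently, with Hugging Face Transformers the mean pooling step is applied manually over the token embeddings, using the attention mask to ignore padding:

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "heydariAI/persian-embeddings"  # assumed repo id

def mean_pooling(token_embeddings, attention_mask):
    # Average token embeddings, masking out padding positions
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentences = ["این یک جمله فارسی است.", "This is an English sentence."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)
embeddings = mean_pooling(output.last_hidden_state, encoded["attention_mask"])
print(embeddings.shape)
```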

Core Capabilities

  • Bilingual text embedding generation
  • Semantic search functionality (see the example after this list)
  • Text clustering support
  • Cross-lingual capabilities
  • Efficient batch processing
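
As a concrete illustration of the semantic search capability, the sketch below ranks a small bilingual corpus against a query by cosine similarity; the repo id is again an assumption.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("heydariAI/persian-embeddings")  # assumed repo id

corpus = [
    "پایتخت ایران تهران است.",          # "The capital of Iran is Tehran."
    "هوا امروز آفتابی است.",            # "The weather is sunny today."
    "Machine learning models need data.",
]
query = "What is the capital city of Iran?"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]   # cosine similarity against each document
best = int(scores.argmax())
print(corpus[best], float(scores[best]))
```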

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its bilingual capabilities in Persian and English, making it particularly valuable for cross-lingual applications and Persian language processing tasks. The 1024-dimensional output vectors provide rich semantic representations suitable for various downstream tasks.

Q: What are the recommended use cases?

The model is ideal for semantic search applications, document clustering, text similarity analysis, and cross-lingual information retrieval. It's particularly useful in applications requiring understanding of both Persian and English content.
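
As a rough sketch of the clustering use case (assuming the same repo id and scikit-learn installed), mixed Persian and English documents can be grouped by topic directly from their embeddings:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("heydariAI/persian-embeddings")  # assumed repo id

docs = [
    "قیمت طلا امروز افزایش یافت.",        # "Gold prices rose today."
    "Stock markets closed higher this week.",
    "تیم ملی فوتبال بازی را برد.",          # "The national football team won the match."
    "The tennis final ended in three sets.",
]

embeddings = model.encode(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(labels)  # documents on the same topic should share a label, regardless of language
```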
