paraphrase-MiniLM-L12-v2
| Property | Value |
|---|---|
| Parameter Count | 33.4M |
| License | Apache 2.0 |
| Paper | Sentence-BERT Paper |
| Output Dimensions | 384 |
What is paraphrase-MiniLM-L12-v2?
paraphrase-MiniLM-L12-v2 is a sentence-transformers model that maps sentences and paragraphs to 384-dimensional dense vectors, making it well suited to semantic similarity comparison, clustering, and information retrieval.
Implementation Details
The model is built on the sentence-transformers framework and uses a MiniLM architecture. Inference is a two-step process: the input is first passed through the transformer, and a pooling operation is then applied to the contextualized token embeddings. The model supports a maximum sequence length of 128 tokens and uses mean pooling, with the attention mask applied so that padding tokens are excluded from the average.
- Efficient architecture with only 33.4M parameters
- 384-dimensional output embeddings
- Supports both PyTorch and TensorFlow implementations
- Compatible with ONNX, Safetensors, and OpenVINO
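The pooling step described above can be sketched in plain numpy. This is an illustrative toy, not the model's actual implementation: the token embeddings and attention mask here are made up (real ones come from the transformer), and the embedding dimension is 3 instead of 384.

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of contextualized vectors
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[:, None].astype(float)          # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)        # sum over real tokens only
    count = max(float(mask.sum()), 1e-9)                  # avoid divide-by-zero
    return summed / count

# Toy example: 4 token positions, 3-dim embeddings, last position is padding
tokens = np.array([[1.0, 2.0, 3.0],
                   [3.0, 2.0, 1.0],
                   [2.0, 2.0, 2.0],
                   [9.0, 9.0, 9.0]])   # padding row, excluded by the mask
mask = np.array([1, 1, 1, 0])
print(mean_pooling(tokens, mask))      # -> [2. 2. 2.]
```

Because the mask zeroes out padding rows before averaging, the sentence embedding depends only on real tokens, regardless of how much padding a batch adds.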
Core Capabilities
- Sentence and paragraph embedding generation
- Semantic similarity computation
- Text clustering and classification
- Information retrieval and semantic search
- Cross-lingual text comparison (via the multilingual variant, paraphrase-multilingual-MiniLM-L12-v2)
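Semantic similarity between two embeddings is typically computed as cosine similarity. A minimal sketch, using toy 4-dim vectors as stand-ins for the model's 384-dim embeddings (the vectors and names here are invented for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for model outputs; real ones come from encoding text
emb_cat    = np.array([0.9, 0.1, 0.0, 0.2])
emb_kitten = np.array([0.8, 0.2, 0.1, 0.3])
emb_stock  = np.array([0.0, 0.9, 0.1, 0.0])

print(cosine_similarity(emb_cat, emb_kitten))  # high: related sentences
print(cosine_similarity(emb_cat, emb_stock))   # low: unrelated sentences
```

Cosine similarity ignores vector magnitude, so it compares only the direction of the embeddings, which is what encodes the semantics.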
Frequently Asked Questions
Q: What makes this model unique?
A: This model stands out for its efficient architecture that balances performance and resource usage. With only 33.4M parameters, it provides high-quality 384-dimensional embeddings suitable for production environments.
Q: What are the recommended use cases?
A: The model excels in semantic search applications, document similarity comparison, clustering related texts, and building semantic text retrieval systems. It's particularly effective for applications requiring efficient text similarity computations.
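The semantic-search use case reduces to ranking corpus embeddings by cosine similarity to a query embedding. A minimal sketch with numpy; the toy 4-dim corpus and query vectors below are invented (in real use they would come from encoding the documents and query with this model):

```python
import numpy as np

def top_k_search(query_emb, corpus_embs, k=2):
    """Return (index, score) pairs for the k corpus entries most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity of each doc vs. query
    order = np.argsort(-scores)[:k]      # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in order]

# Toy corpus: 3 "documents" as 4-dim vectors standing in for 384-dim embeddings
corpus = np.array([[0.9, 0.1, 0.0, 0.1],   # doc 0: about cats
                   [0.1, 0.9, 0.1, 0.0],   # doc 1: about finance
                   [0.8, 0.2, 0.1, 0.2]])  # doc 2: also about cats
query = np.array([0.85, 0.15, 0.05, 0.15])  # a cat-like query

for idx, score in top_k_search(query, corpus):
    print(f"doc {idx}: similarity {score:.3f}")
```

Normalizing once and using a matrix-vector product keeps retrieval a single dot-product pass over the corpus, which is why compact embeddings like these scale well to large document sets.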