all-mpnet-base-v2-embedding-all
| Property | Value |
|---|---|
| Parameter Count | 109M |
| License | Apache 2.0 |
| Framework | PyTorch, Sentence-Transformers |
| Training Datasets | 7 (including SQuAD, NewsQA, FIQA) |
What is all-mpnet-base-v2-embedding-all?
This is a fine-tuned version of the all-mpnet-base-v2 model, optimized for sentence-embedding tasks. It was developed as part of a Master's Thesis on service information systems and trained on seven diverse datasets to strengthen its sentence-similarity capabilities.
Implementation Details
The model was trained with D-Adaptation (a method that adapts the effective step size automatically) and bf16 mixed precision. Training ran for 15 epochs with an AdamW optimizer, reaching a final training loss of 0.012 and a validation loss of 0.0377.
- Effective batch size: 180
- Learning rate: 1.0 (the standard setting for D-Adaptation, which scales the step size itself)
- Weight decay: 0.02
- Warmup enabled
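The original training script is not published; as an illustration of the bf16 mixed-precision and effective-batch-size settings above, here is a minimal PyTorch sketch. The tiny stand-in model, the per-device batch of 30, and the 6 accumulation steps are illustrative assumptions (30 × 6 gives the effective batch of 180):

```python
import torch
import torch.nn as nn

# Tiny stand-in for the embedding model (assumption, not the real architecture)
model = nn.Linear(16, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.02)

accum_steps = 6        # assumption: 30 per device x 6 steps = 180 effective batch
per_device_batch = 30

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(per_device_batch, 16)
    target = torch.randn(per_device_batch, 8)
    # bf16 mixed precision, as stated in the model card
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), target)
    # Scale so the accumulated gradient matches one large batch of 180
    (loss / accum_steps).backward()
optimizer.step()
print(float(loss))
```

Dividing each partial loss by the number of accumulation steps keeps the gradient magnitude equivalent to a single forward pass over the full effective batch.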
Core Capabilities
- Sentence similarity assessment with top-1 accuracy of 0.385
- Feature extraction for text embeddings
- Multi-dataset optimization for robust performance
- Easy integration with sentence-transformers library
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its comprehensive training on seven diverse datasets and its optimization for sentence embedding tasks, making it particularly effective for service information systems and question-answering applications.
Q: What are the recommended use cases?
The model is ideal for sentence similarity tasks, semantic search, and feature extraction in natural language processing applications. It's particularly well-suited for question-answering systems and document retrieval tasks.
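For document retrieval, ranking reduces to cosine similarity between the query embedding and each corpus embedding. The sketch below uses small hand-made vectors in place of real model outputs so it runs standalone; the toy sentences and 3-dimensional vectors are illustrative only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-d "embeddings" standing in for model.encode(...) outputs
corpus = {
    "reset the router": [0.9, 0.1, 0.0],
    "bake a cake":      [0.0, 0.2, 0.9],
    "restart a device": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

# Rank documents by similarity to the query, best match first
ranked = sorted(corpus, key=lambda doc: cosine(query, corpus[doc]), reverse=True)
print(ranked[0])  # -> "reset the router"
```

In practice the query and corpus vectors would both come from `model.encode`, but the ranking step is exactly this.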