# GIST-small-Embedding-v0
| Property | Value |
|---|---|
| Parameter Count | 33.4M |
| License | MIT |
| Paper | [GISTEmbed](https://arxiv.org/abs/2402.16829) |
| Base Model | BAAI/bge-small-en-v1.5 |
## What is GIST-small-Embedding-v0?
GIST-small-Embedding-v0 is a text embedding model that implements Guided In-sample Selection of Training Negatives (GIST). Fine-tuned from the BAAI/bge-small-en-v1.5 base model on the MEDI dataset augmented with MTEB Classification training data, it generates high-quality text embeddings without requiring task-specific instructions.
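As a quick orientation, here is a minimal usage sketch. It assumes the sentence-transformers library and the Hugging Face model ID `avsolatorio/GIST-small-Embedding-v0`; treat the ID as an assumption and adjust it if the model is hosted under a different namespace.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Model ID assumed here; adjust if the model lives under another namespace.
model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

# No instruction prefix is needed; raw text is embedded directly.
texts = [
    "The new phone has an excellent camera.",
    "This handset takes great photos.",
]
embeddings = model.encode(texts, convert_to_tensor=True)

# Cosine similarity between the two 384-dimensional embeddings.
print(cos_sim(embeddings[0], embeddings[1]))
```

Because no prompt is prepended, the same call works for queries and documents alike.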
## Implementation Details
The model was trained for 40 epochs with a warmup ratio of 0.1, a learning rate of 5e-6, a contrastive loss temperature of 0.01, and a batch size of 16. The resulting embeddings can be used directly for a range of NLP tasks. A conceptual sketch of the training objective follows the list below.
- No instruction required for embedding generation
- Trained on combined MEDI and MTEB Classification datasets
- Released checkpoint selected at training step 102,000
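To make the training objective concrete, below is a conceptual sketch of a temperature-scaled contrastive loss with GIST-style guided negative selection: a frozen guide model scores in-batch candidates, and any candidate the guide rates at least as similar as the true positive is masked out as a likely false negative. This illustrates the idea under stated assumptions rather than reproducing the authors' training code; the function name and tensor layout are ours, and only the temperature (0.01) comes from the details above.

```python
import torch
import torch.nn.functional as F

def gist_contrastive_loss(q, p, guide_q, guide_p, temperature=0.01):
    """Illustrative GIST-style loss over a batch of B (query, positive) pairs.

    q, p: (B, D) L2-normalized embeddings from the model being trained.
    guide_q, guide_p: embeddings of the same texts from a frozen guide model.
    """
    B = q.size(0)
    sim = (q @ p.T) / temperature        # student similarity logits
    guide_sim = guide_q @ guide_p.T      # guide model similarities
    pos_score = guide_sim.diagonal().unsqueeze(1)  # guide score of true pairs

    # Off-diagonal candidates the guide scores at least as high as the true
    # positive are likely false negatives: exclude them from the softmax.
    eye = torch.eye(B, dtype=torch.bool, device=q.device)
    false_negatives = (guide_sim >= pos_score) & ~eye
    sim = sim.masked_fill(false_negatives, float("-inf"))

    # Standard in-batch cross-entropy: the i-th positive is the i-th label.
    labels = torch.arange(B, device=q.device)
    return F.cross_entropy(sim, labels)
```

Masking rather than resampling keeps the batch layout unchanged, which is what lets the selection happen in-sample, without a separate negative-mining pass.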
## Core Capabilities
- Strong performance in semantic similarity tasks
- Effective for classification and clustering applications
- Robust performance in retrieval tasks (see the retrieval sketch after this list)
- High accuracy in pair classification scenarios
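As a concrete example of the retrieval capability, the sketch below embeds a small corpus once and ranks it against a query with the sentence-transformers `semantic_search` utility; the model ID carries the same assumption as in the earlier sketch.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

corpus = [
    "The cat sat on the mat.",
    "Transformers compute self-attention over token sequences.",
    "Stock markets fell sharply on Friday.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query_emb = model.encode(
    "How does attention work in transformer models?",
    convert_to_tensor=True,
)

# Rank the corpus by cosine similarity and keep the two best hits.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f'{hit["score"]:.3f}  {corpus[hit["corpus_id"]]}')
```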
## Frequently Asked Questions
Q: What makes this model unique?
A: The model's distinguishing feature is that it generates high-quality embeddings without requiring task instructions, while using the GIST approach to guide the selection of in-batch training negatives. In practice this means text can be embedded directly, with no prompt engineering.
Q: What are the recommended use cases?
A: The model excels at semantic similarity, document retrieval, clustering, and classification. It is particularly well suited to scenarios where instruction-free embedding generation is desired.
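For instance, a clustering workflow needs nothing beyond the raw embeddings. The sketch below groups sentences with k-means; it assumes scikit-learn in addition to sentence-transformers, and the model ID remains the same assumption as above.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

texts = [
    "The weather is sunny today.",
    "Heavy rain is expected tomorrow.",
    "The new phone has a great camera.",
    "This laptop's battery lasts all day.",
]
embeddings = model.encode(texts)  # numpy array of shape (4, 384)

# Two clusters: weather-related vs. gadget-related sentences.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(labels)
```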