# GIST-small-Embedding-v0
| Property | Value |
|---|---|
| Parameter Count | 33.4M |
| License | MIT |
| Paper | [GISTEmbed](https://arxiv.org/abs/2402.16829) |
| Base Model | BAAI/bge-small-en-v1.5 |
## What is GIST-small-Embedding-v0?
GIST-small-Embedding-v0 is a text embedding model that implements Guided In-sample Selection of Training Negatives (GIST). Fine-tuned from the BAAI/bge-small-en-v1.5 base model on the MEDI dataset augmented with MTEB Classification training data, it generates high-quality text embeddings without requiring task-specific instructions.
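As a quick orientation, here is a minimal usage sketch. It assumes the sentence-transformers library and the Hugging Face model ID `avsolatorio/GIST-small-Embedding-v0`; treat the ID as an assumption and adjust it if the model is hosted under a different namespace.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Model ID assumed here; adjust if the model lives under another namespace.
model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

# No instruction prefix is needed; raw text is embedded directly.
texts = [
    "The new phone has an excellent camera.",
    "This handset takes great photos.",
]
embeddings = model.encode(texts, convert_to_tensor=True)

# Cosine similarity between the two 384-dimensional embeddings.
print(cos_sim(embeddings[0], embeddings[1]))
```

Because no prompt is prepended, the same call works for queries and documents alike.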
## Implementation Details
The model was trained for 40 epochs with a warmup ratio of 0.1, a learning rate of 5e-6, a contrastive loss temperature of 0.01, and a batch size of 16. The resulting embeddings can be used directly for a range of NLP tasks. A conceptual sketch of the training objective follows the list below.
- No instruction required for embedding generation
- Trained on combined MEDI and MTEB Classification datasets
- Released checkpoint selected at training step 102,000
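To make the training objective concrete, below is a conceptual sketch of a temperature-scaled contrastive loss with GIST-style guided negative selection: a frozen guide model scores in-batch candidates, and any candidate the guide rates at least as similar as the true positive is masked out as a likely false negative. This illustrates the idea under stated assumptions rather than reproducing the authors' training code; the function name and tensor layout are ours, and only the temperature (0.01) comes from the details above.

```python
import torch
import torch.nn.functional as F

def gist_contrastive_loss(q, p, guide_q, guide_p, temperature=0.01):
    """Illustrative GIST-style loss over a batch of B (query, positive) pairs.

    q, p: (B, D) L2-normalized embeddings from the model being trained.
    guide_q, guide_p: embeddings of the same texts from a frozen guide model.
    """
    B = q.size(0)
    sim = (q @ p.T) / temperature        # student similarity logits
    guide_sim = guide_q @ guide_p.T      # guide model similarities
    pos_score = guide_sim.diagonal().unsqueeze(1)  # guide score of true pairs

    # Off-diagonal candidates the guide scores at least as high as the true
    # positive are likely false negatives: exclude them from the softmax.
    eye = torch.eye(B, dtype=torch.bool, device=q.device)
    false_negatives = (guide_sim >= pos_score) & ~eye
    sim = sim.masked_fill(false_negatives, float("-inf"))

    # Standard in-batch cross-entropy: the i-th positive is the i-th label.
    labels = torch.arange(B, device=q.device)
    return F.cross_entropy(sim, labels)
```

Masking rather than resampling keeps the batch layout unchanged, which is what lets the selection happen in-sample, without a separate negative-mining pass.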
## Core Capabilities
- Strong performance in semantic similarity tasks
- Effective for classification and clustering applications
- Robust performance in retrieval tasks (see the retrieval sketch after this list)
- High accuracy in pair classification scenarios
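As a concrete example of the retrieval capability, the sketch below embeds a small corpus once and ranks it against a query with the sentence-transformers `semantic_search` utility; the model ID carries the same assumption as in the earlier sketch.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

corpus = [
    "The cat sat on the mat.",
    "Transformers compute self-attention over token sequences.",
    "Stock markets fell sharply on Friday.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query_emb = model.encode(
    "How does attention work in transformer models?",
    convert_to_tensor=True,
)

# Rank the corpus by cosine similarity and keep the two best hits.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f'{hit["score"]:.3f}  {corpus[hit["corpus_id"]]}')
```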
## Frequently Asked Questions
Q: What makes this model unique?
A: The model's distinguishing feature is that it generates high-quality embeddings without requiring task instructions, while using the GIST approach to guide the selection of in-batch training negatives. In practice this means text can be embedded directly, with no prompt engineering.
Q: What are the recommended use cases?
A: The model excels at semantic similarity, document retrieval, clustering, and classification. It is particularly well suited to scenarios where instruction-free embedding generation is desired.
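For instance, a clustering workflow needs nothing beyond the raw embeddings. The sketch below groups sentences with k-means; it assumes scikit-learn in addition to sentence-transformers, and the model ID remains the same assumption as above.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

texts = [
    "The weather is sunny today.",
    "Heavy rain is expected tomorrow.",
    "The new phone has a great camera.",
    "This laptop's battery lasts all day.",
]
embeddings = model.encode(texts)  # numpy array of shape (4, 384)

# Two clusters: weather-related vs. gadget-related sentences.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(labels)
```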