GIST-small-Embedding-v0

Maintained By
avsolatorio

  • Parameter Count: 33.4M
  • License: MIT
  • Paper: GISTEmbed Paper
  • Base Model: BAAI/bge-small-en-v1.5

What is GIST-small-Embedding-v0?

GIST-small-Embedding-v0 is a specialized text embedding model that implements the Guided In-sample Selection of Training Negatives (GIST) approach. Fine-tuned on the BAAI/bge-small-en-v1.5 base model, it leverages both the MEDI dataset and MTEB Classification training data to generate high-quality text embeddings without requiring explicit instructions.

Implementation Details

The model was trained for 40 epochs with a warmup ratio of 0.1 and a learning rate of 5e-6. Training used a contrastive loss with a temperature of 0.01 and a batch size of 16. The architecture is optimized for generating semantic embeddings that can be used directly for various NLP tasks.

  • No instruction required for embedding generation
  • Trained on combined MEDI and MTEB Classification datasets
  • Optimized checkpoint selection at 102,000 steps
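The temperature-scaled contrastive objective mentioned above can be illustrated with a minimal NumPy sketch of an in-batch (InfoNCE-style) loss. The function name and toy data here are illustrative, not the authors' training code:

```python
import numpy as np

def contrastive_loss(queries, positives, temperature=0.01):
    """Temperature-scaled in-batch contrastive loss (illustrative sketch).

    Each query's positive is the matching row in `positives`; all other
    rows in the batch serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # Similarity matrix scaled by temperature (0.01, as in this model's training).
    logits = (q @ p.T) / temperature
    # Cross-entropy against the diagonal (the matching query-positive pairs).
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
loss = contrastive_loss(q, q + 0.01 * rng.normal(size=(4, 8)))
print(loss)  # near zero, since each query almost matches its positive
```

A low temperature like 0.01 sharpens the softmax, heavily penalizing any negative whose similarity approaches that of the true pair.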

Core Capabilities

  • Strong performance in semantic similarity tasks
  • Effective for classification and clustering applications
  • Robust performance in retrieval tasks
  • High accuracy in pair classification scenarios
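For similarity, retrieval, and clustering, these capabilities typically reduce to comparing embeddings by cosine similarity. A small NumPy sketch, with made-up vectors standing in for real model outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for sentence embeddings (real vectors would come from the model).
doc_a = np.array([0.9, 0.1, 0.2])
doc_b = np.array([0.8, 0.2, 0.3])    # semantically close to doc_a
doc_c = np.array([-0.1, 0.9, -0.4])  # unrelated

# Related documents score higher than unrelated ones.
print(cosine_similarity(doc_a, doc_b) > cosine_similarity(doc_a, doc_c))  # True
```

Ranking documents by this score against a query embedding is the basis of retrieval; thresholding it supports pair classification.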

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its ability to generate high-quality embeddings without requiring instructions, while leveraging the GIST approach for optimal negative sample selection during training. This makes it particularly efficient for practical applications.

Q: What are the recommended use cases?

The model excels in semantic similarity tasks, document retrieval, clustering, and classification applications. It's particularly well-suited for scenarios where instruction-free embedding generation is desired.
