GIST-Embedding-v0

Maintained By: avsolatorio

Property        Value
Model Size      109M parameters
Base Model      BAAI/bge-base-en-v1.5
License         MIT
Paper           GISTEmbed Paper
Training Data   MEDI dataset + MTEB Classification

What is GIST-Embedding-v0?

GIST-Embedding-v0 is a text embedding model trained with an approach called Guided In-sample Selection of Training Negatives (GIST). It starts from the BAAI/bge-base-en-v1.5 model and is fine-tuned on the MEDI dataset combined with triplets mined from the MTEB Classification training data. A key advantage is that it produces high-quality embeddings from raw text, with no task-specific instruction or prompt prepended to the input.
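As a rough illustration of instruction-free encoding, the sketch below loads the model through the sentence-transformers library and encodes raw strings. The Hub identifier `avsolatorio/GIST-Embedding-v0` is an assumption built from the maintainer and model name shown on this page, and `embed` and `cosine` are hypothetical helper names, not part of any library.

```python
import math
from typing import List, Sequence

# Assumed Hugging Face Hub identifier (maintainer + model name); verify before use.
MODEL_ID = "avsolatorio/GIST-Embedding-v0"


def embed(texts: Sequence[str]) -> List[List[float]]:
    """Encode raw texts with no instruction prefix (downloads the model on first use)."""
    # Imported lazily so the cosine helper below works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    # Texts are passed as-is: no task instruction or prompt is prepended.
    return model.encode(list(texts), normalize_embeddings=True).tolist()


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Calling `embed(["first sentence", "second sentence"])` and passing the two resulting vectors to `cosine` gives a similarity score. This requires sentence-transformers to be installed and network access for the initial download.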

Implementation Details

The model was fine-tuned for 80 epochs with a learning rate of 5e-6, a warmup ratio of 0.1, and a batch size of 32. The contrastive loss uses a temperature of 0.01. The checkpoint step is reported as 103,500.

  • No instruction required for embedding generation
  • Built on the BERT architecture (via bge-base-en-v1.5)
  • Optimized for semantic search and similarity tasks
  • Trained on MEDI plus triplets from MTEB Classification training data
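To show what a contrastive loss temperature of 0.01 does, here is a minimal sketch of a temperature-scaled contrastive (InfoNCE-style) loss for a single query. It assumes cosine similarities as inputs and is not taken from the authors' training code. The function name `info_nce_loss` is hypothetical.

```python
import math
from typing import Sequence

TEMPERATURE = 0.01  # contrastive loss temperature stated above


def info_nce_loss(sim_row: Sequence[float], positive_index: int,
                  temperature: float = TEMPERATURE) -> float:
    """Cross-entropy over temperature-scaled similarities for one query.

    sim_row holds similarities between the query and every candidate in the
    batch; positive_index marks the true positive among them.
    """
    logits = [s / temperature for s in sim_row]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[positive_index]
```

Dividing by a small temperature magnifies differences between similarities, so a positive that is only slightly ahead of the negatives already yields a loss near zero, while a positive that trails a negative is penalized heavily.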

Core Capabilities

  • Text similarity computation
  • Semantic search implementation
  • Document classification
  • Text matching (English; the base model, bge-base-en-v1.5, is an English model)
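Semantic search over embeddings reduces to ranking documents by similarity to a query vector. The sketch below uses toy 3-dimensional vectors as stand-ins for real model embeddings. `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings produced by the model.
docs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.8, 0.6, 0.0]
results = top_k(query, docs, k=2)
```

A linear scan like this is adequate for small collections. Larger corpora usually call for an approximate nearest-neighbor index.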

Frequently Asked Questions

Q: What makes this model unique?

The model produces high-quality embeddings without instructions, and it was trained with guided negative selection, in which a guide model helps decide which in-batch examples are used as negatives. Because no instruction prefix has to be designed or prepended, queries and documents go through the same encoding call, which simplifies production deployments.
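One way to picture guided negative selection is as a mask over in-batch candidates. The sketch below assumes the rule that a candidate is dropped as a negative for a query when a guide model rates it as more similar to the query than the query's own positive. That rule is an assumption for illustration, as is the name `guided_negative_mask`, and neither is taken from the authors' code.

```python
from typing import List


def guided_negative_mask(guide_sims: List[List[float]]) -> List[List[bool]]:
    """Decide which in-batch candidates to keep for each query.

    guide_sims[i][j] is the guide model's similarity between query i and
    candidate j, where candidate i is the positive for query i.
    Returns keep[i][j], True when candidate j stays in the loss for query i.
    """
    keep = []
    for i, row in enumerate(guide_sims):
        threshold = row[i]  # guide similarity of the true (query, positive) pair
        # Assumed rule: a candidate scoring above the positive is a likely
        # false negative, so it is masked out. The positive itself is always kept.
        keep.append([j == i or s <= threshold for j, s in enumerate(row)])
    return keep
```

In a training loop, masked entries would typically have their logits set to negative infinity before the softmax so they contribute nothing to the contrastive loss.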

Q: What are the recommended use cases?

The model excels in semantic search, document similarity matching, and classification tasks. It's particularly well-suited for applications requiring efficient text embedding without the overhead of instruction engineering.
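For the classification use case, one simple option is a nearest-centroid classifier over embeddings. The sketch below uses toy 2-dimensional vectors in place of real embeddings. The labels "sports" and "finance" and the helper names are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(vec: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy labeled vectors standing in for embeddings of labeled documents.
train = {
    "sports": [[1.0, 0.1], [0.9, 0.0]],
    "finance": [[0.0, 1.0], [0.2, 0.9]],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}
label = classify([0.8, 0.3], centroids)
```

With real data, a trained classifier such as logistic regression over the embeddings is a common alternative when more labeled examples are available.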
