colpali-v1.3

Maintained by: vidore

ColPali-v1.3

Property        Value
License         MIT
Base Model      PaliGemma-3B
Paper           ColPali: Efficient Document Retrieval with Vision Language Models
Training Data   127,460 query-page pairs

What is colpali-v1.3?

ColPali-v1.3 is a visual document retrieval model that combines PaliGemma-3B with the ColBERT late-interaction strategy for efficient document indexing. This version improves on earlier releases: it was trained with a batch size of 256 for 3 epochs and uses right padding for queries to address token encoding issues.
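To make "right padding" concrete, here is a minimal sketch of padding a batch of tokenized queries on the right and building the matching attention mask. The function name pad_batch, the pad_id value, and the toy token ids are hypothetical; the source does not describe the model's actual tokenizer settings or the exact encoding issue that was fixed.

```python
def pad_batch(sequences, pad_id=0, side="right"):
    """Pad token-id lists to a common length and return (ids, attention_mask).

    pad_id=0 is an illustrative assumption, not the model's real pad token.
    """
    if not sequences:
        return [], []
    max_len = max(len(seq) for seq in sequences)
    ids, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "right":
            # Real tokens first, padding appended at the end.
            ids.append(list(seq) + [pad_id] * n_pad)
            mask.append([1] * len(seq) + [0] * n_pad)
        elif side == "left":
            # Padding first, real tokens pushed to the end.
            ids.append([pad_id] * n_pad + list(seq))
            mask.append([0] * n_pad + [1] * len(seq))
        else:
            raise ValueError("side must be 'right' or 'left'")
    return ids, mask


# Two toy queries of different lengths (made-up token ids).
ids, mask = pad_batch([[5, 6, 7], [8]])
print(ids)
print(mask)
```

With right padding, the real tokens of every query start at position 0, so their positions do not shift with the length of the longest query in the batch.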

Implementation Details

The model builds on SigLIP and PaliGemma-3B and is fine-tuned with LoRA adapters (alpha=32, r=32). Training uses the paged_adamw_8bit optimizer with a learning rate of 5e-5, linear decay, and warmup over the first 2.5% of steps.

  • Trained in bfloat16 format
  • Trained on an 8-GPU setup with data parallelism
  • Implements ColBERT-style multi-vector representations
  • Supports both text and image inputs
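The training figures on this page imply a rough step count and learning-rate curve, sketched below. It assumes that 256 is the effective (global) batch size, that all 127,460 pairs are used in each epoch, that the last partial batch is kept, and that warmup is linear; none of these details are confirmed by the source. The function names are illustrative.

```python
import math

# Figures taken from the model card.
TRAIN_PAIRS = 127_460
BATCH_SIZE = 256        # assumed to be the global batch size
EPOCHS = 3
PEAK_LR = 5e-5
WARMUP_FRACTION = 0.025


def total_steps(pairs=TRAIN_PAIRS, batch_size=BATCH_SIZE, epochs=EPOCHS):
    """Optimizer steps, assuming the last partial batch of each epoch is kept."""
    return math.ceil(pairs / batch_size) * epochs


def learning_rate(step, n_steps, peak_lr=PEAK_LR, warmup_fraction=WARMUP_FRACTION):
    """Linear warmup to peak_lr, then linear decay to zero at n_steps."""
    warmup_steps = max(1, round(n_steps * warmup_fraction))
    if step < warmup_steps:
        return peak_lr * (step / warmup_steps)
    remaining = max(0, n_steps - step)
    return peak_lr * (remaining / max(1, n_steps - warmup_steps))


n = total_steps()
for step in (0, 10, 37, 750, n):
    print(step, learning_rate(step, n))
```

Under these assumptions the run has 498 steps per epoch and 1,494 steps in total, with roughly the first 37 steps spent on warmup.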

Core Capabilities

  • Efficient document indexing from visual features
  • Multi-vector representations of text and images
  • Zero-shot generalization to non-English languages
  • Optimized for PDF-type documents

Frequently Asked Questions

Q: What makes this model unique?

The model combines a Vision Language Model with the ColBERT strategy, enabling efficient document retrieval through multi-vector representations. It processes both text and image inputs through a single architecture.
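ColBERT-style scoring over multi-vector representations is commonly computed as a "MaxSim" sum: each query vector is matched to its most similar page vector, and the best matches are summed. The sketch below shows that scoring rule in plain Python on toy 2-dimensional vectors. The function names and data are hypothetical, and it does not call the model itself; real embeddings would come from the model's query and page encoders.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, page_vecs):
    """Sum, over query vectors, of the best cosine similarity to any page vector."""
    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in page_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, pv)) for pv in p)
        for qv in q
    )


def rank_pages(query_vecs, pages):
    """Return page names sorted from highest to lowest MaxSim score."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: one query with two token vectors, two pages with patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "page_b": [[1.0, 0.0], [-1.0, 0.0]],
}
print(rank_pages(query, pages))
```

Because page vectors can be computed and stored ahead of time, only the query has to be encoded at search time, which is what makes this late-interaction scheme practical for indexing.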

Q: What are the recommended use cases?

The model excels in document retrieval tasks, particularly with PDF documents. It's especially useful for applications requiring visual-textual matching and efficient document indexing.
