ColPali-v1.3
Property | Value |
---|---|
License | MIT |
Base Model | PaliGemma-3B |
Paper | ColPali: Efficient Document Retrieval with Vision Language Models |
Training Data | 127,460 query-page pairs |
What is ColPali-v1.3?
ColPali-v1.3 is a visual retrieval model that pairs the PaliGemma-3B vision language model with the ColBERT late-interaction strategy for efficient document indexing. This version improves on earlier releases: it is trained with a batch size of 256 for 3 epochs and uses right padding for queries to fix token encoding issues.
Implementation Details
The model builds on SigLIP and PaliGemma-3B and is fine-tuned with LoRA adapters (alpha=32, r=32). Training uses the paged_adamw_8bit optimizer with a learning rate of 5e-5, linear decay, and 2.5% warmup steps.
- Trained in bfloat16 precision
- Trained on 8 GPUs with data parallelism
- Implements ColBERT-style multi-vector representations
- Supports both text and image inputs
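To make the training schedule concrete, here is a minimal sketch of a linear warmup followed by linear decay, using the numbers quoted on this card (127,460 pairs, batch size 256, 3 epochs, learning rate 5e-5, 2.5% warmup). It assumes that 256 is the global batch size, that the full dataset is used each epoch, and that the last partial batch is kept; the card does not confirm any of these. The function name `linear_warmup_decay` is illustrative and not part of the model's training code.

```python
import math


def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Learning rate at a 0-indexed step: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Figures taken from the card; how they combine is an assumption.
pairs, batch_size, epochs = 127_460, 256, 3
steps_per_epoch = math.ceil(pairs / batch_size)  # keeps the last partial batch
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.025 * total_steps)          # 2.5% of all optimizer steps

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("total steps:", total_steps)
    print("warmup steps:", warmup_steps)
    for s in (0, warmup_steps, total_steps // 2, total_steps):
        print(s, linear_warmup_decay(s, total_steps, warmup_steps))
```

Under these assumptions the run is roughly 1,500 optimizer steps, so the warmup phase covers only a few dozen steps before the rate starts to decay.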
Core Capabilities
- Efficient document indexing from visual features
- Multi-vector representations of text and images
- Zero-shot generalization to non-English languages
- Optimized for PDF-type documents
Frequently Asked Questions
Q: What makes this model unique?
The model combines a Vision Language Model with the ColBERT strategy: queries and document pages are each represented by multiple vectors rather than a single one, which enables efficient document retrieval. Text and image inputs are both processed by one unified architecture.
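The multi-vector matching can be illustrated with ColBERT-style late-interaction (MaxSim) scoring: each query vector is matched to its best page vector, and the per-token maxima are summed. The sketch below uses plain Python and toy 2-dimensional vectors. The names `maxsim_score` and `rank_pages` and the example data are hypothetical; real embeddings would come from the model and have many more dimensions.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, page_vecs):
    """Sum, over query vectors, of the best dot product with any page vector."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs, pages):
    """Return page names sorted from highest to lowest MaxSim score."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): two query token vectors, two pages of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "page_b": [[0.5, 0.5], [0.0, 0.0]],
}

if __name__ == "__main__":
    for name, vecs in pages.items():
        print(name, maxsim_score(query, vecs))
    print(rank_pages(query, pages))
```

Because page vectors do not depend on the query, they can be computed once at indexing time; only the query is encoded at search time.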
Q: What are the recommended use cases?
The model is best suited to document retrieval, particularly over PDF documents. It is especially useful for applications that need visual-textual matching and efficient document indexing.