ColPali v1.2
Property | Value |
---|---|
License | MIT |
Base Model | PaliGemma-3B |
Paper | ColPali: Efficient Document Retrieval with Vision Language Models |
Primary Language | English |
What is colpali-v1.2?
ColPali v1.2 is a visual document retrieval model that applies the ColBERT late-interaction strategy to PaliGemma-3B, so documents can be indexed directly from their page images. Compared with its predecessor, this version uses right padding for queries and a deterministic projection layer initialization. The model encodes both text and images into multi-vector representations, which are compared at retrieval time.
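To make "right padding for queries" concrete, here is a minimal sketch of what right padding does to a batch of token ids. It is an illustration only, not the model's tokenizer: the `right_pad` helper, the `PAD_ID` value, and the token ids are all hypothetical.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id, chosen only for this illustration


def right_pad(
    batch: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the right to the length of the longest one.

    Returns the padded ids and an attention mask (1 = real token, 0 = padding).
    """
    max_len = max((len(seq) for seq in batch), default=0)
    padded, mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Real tokens keep their positions; padding is appended after them.
        padded.append(list(seq) + [pad_id] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


ids, mask = right_pad([[5, 6, 7], [8]])
print(ids)   # [[5, 6, 7], [8, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

With right padding, real query tokens always start at position 0, and the mask marks which positions hold padding.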
Implementation Details
The model is built on PaliGemma-3B, which pairs a SigLIP vision encoder with a Gemma language model. Training uses LoRA adapters with alpha=32 and r=32 on the transformer layers, together with an 8-bit optimizer to reduce memory use. The model is trained for 5 epochs with a longer warmup to reduce non-English language collapse.
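As a rough illustration of the LoRA settings, the sketch below applies the standard LoRA update, W + (alpha / r) * B A, to a toy weight matrix. With alpha=32 and r=32 the scaling factor is 1.0. The helper names and the tiny matrices (which use r=1 for readability) are hypothetical and are not taken from the training code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A, with B of shape (out, r) and A of shape (r, in)."""
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


print(32 / 32)  # scaling factor for alpha=32, r=32 -> 1.0

# Toy example with rank 1 (hypothetical values).
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # shape (2, 1)
a = [[0.5, 0.5]]     # shape (1, 2)
print(lora_effective_weight(w, a, b, alpha=1.0, r=1))  # [[1.5, 0.5], [1.0, 2.0]]
```

Only the small matrices A and B are trained, while W stays frozen. That keeps the number of trainable parameters low.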
- Trained on 127,460 query-page pairs
- Computation in bfloat16 format
- Data parallelism across 8 GPUs
- Learning rate of 5e-5 with linear decay
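The learning-rate settings can be sketched as a linear warmup followed by linear decay. This is a minimal sketch under stated assumptions: the peak rate of 5e-5 comes from the list above, while the function name and the step counts in the example are hypothetical, since the source does not give them.

```python
def linear_warmup_decay_lr(
    step: int, total_steps: int, warmup_steps: int, peak_lr: float = 5e-5
) -> float:
    """Learning rate at `step`: linear ramp to peak_lr, then linear decay to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    decay_steps = max(total_steps - warmup_steps, 1)
    return peak_lr * (total_steps - step) / decay_steps


# Hypothetical schedule: 1000 total steps, 100 of them warmup.
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_decay_lr(s, total_steps=1000, warmup_steps=100))
```

A longer warmup keeps early updates small, which matches the stated goal of reducing non-English language collapse.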
Core Capabilities
- Efficient document indexing from visual features
- Multi-vector representations of text and images
- Potential zero-shot generalization to non-English languages
- Compatible with colpali-engine>=0.2.0
Frequently Asked Questions
Q: What makes this model unique?
The model passes image patch embeddings through the language model, so text and visual content are mapped into a shared latent space. Queries and pages are then compared with the ColBERT interaction mechanism, which matches each query token embedding against the patch embeddings of a page.
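The ColBERT interaction mentioned above is commonly computed as a "MaxSim" score: for each query vector, take the maximum dot product over all page vectors, then sum these maxima. The sketch below shows that scoring on toy vectors. The function names and data are hypothetical, and it does not call the colpali-engine API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Sum over query vectors of the best dot product with any page vector."""
    if not page_vecs:
        raise ValueError("page_vecs must contain at least one vector")
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy multi-vector embeddings (hypothetical, 2-dimensional).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]   # matches both query vectors
page_b = [[0.5, 0.5]]               # partial match only

print(maxsim_score(query, page_a))  # 2.0
print(maxsim_score(query, page_b))  # 1.0
print(rank_pages(query, [page_b, page_a]))  # [1, 0]
```

Because every query vector picks its own best-matching page vector, a page can score well when different regions of it answer different parts of the query.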
Q: What are the recommended use cases?
The model is suited to PDF document retrieval, particularly in academic and professional settings where precise document matching is crucial. It is most effective on English-language content but shows potential for cross-lingual applications.