ColPali-v1.3

Property	Value
License	MIT
Base Model	PaliGemma-3B
Paper	ColPali: Efficient Document Retrieval with Vision Language Models
Training Data	127,460 query-page pairs

What is colpali-v1.3?

ColPali-v1.3 is a visual retrieval model that combines PaliGemma-3B with the ColBERT strategy for efficient document indexing. This version improves on previous iterations: it is trained with a batch size of 256 for 3 epochs and uses right padding for queries to address token encoding issues.

Implementation Details

The model builds on SigLIP and PaliGemma-3B and uses LoRA adapters with alpha=32 and r=32. It is trained with the paged_adamw_8bit optimizer at a learning rate of 5e-5, with linear decay and 2.5% warmup steps.

Trained in bfloat16 format
Uses an 8-GPU setup with data parallelism
Implements ColBERT-style multi-vector representations
Supports both text and image inputs
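
To make the learning-rate schedule described above concrete (peak 5e-5, linear decay, 2.5% warmup), here is a minimal sketch in plain Python. It is not the authors' training code: the function name is hypothetical, and the step count assumes that 256 is the effective batch size, that the last partial batch of each epoch is kept, and that decay runs to zero.

```python
import math

def linear_warmup_decay_lr(step, total_steps, peak_lr=5e-5, warmup_frac=0.025):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to zero.

    Hypothetical helper for illustration; real trainers may round the
    warmup length differently.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Step count derived from the figures in this card (assumptions noted above).
num_pairs = 127_460
batch_size = 256
epochs = 3
steps_per_epoch = math.ceil(num_pairs / batch_size)
total_steps = steps_per_epoch * epochs

for step in (0, 10, 37, 750, total_steps):
    print(step, linear_warmup_decay_lr(step, total_steps))
```

Under these assumptions the warmup covers only the first few dozen optimizer steps, so almost all of training happens on the decaying part of the schedule.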

Core Capabilities

Efficient document indexing from visual features
Multi-vector representations of text and images
Zero-shot generalization to non-English languages
Optimized for PDF-type documents

Frequently Asked Questions

Q: What makes this model unique?

The model combines Vision Language Models with the ColBERT strategy, enabling efficient document retrieval through multi-vector representations. It processes both text and image inputs through a unified architecture.
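
To illustrate how multi-vector representations can be compared, the sketch below implements the standard ColBERT late-interaction score (often called MaxSim): each query vector is matched to its best-scoring page vector, and those maxima are summed. This is a toy example in plain Python, not the model's own code. The function names and the 2-dimensional vectors are made up for illustration; a real system would use the model's embeddings and batched tensor operations.

```python
def maxsim_score(query_vecs, page_vecs):
    """ColBERT-style late-interaction score between one query and one page.

    For every query vector, take the largest dot product against any page
    vector, then sum those maxima over the query vectors.
    """
    if not query_vecs or not page_vecs:
        raise ValueError("both inputs need at least one vector")
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, p)) for p in page_vecs)
    return total

def rank_pages(query_vecs, pages):
    """Return (page_name, score) pairs sorted from best to worst match."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data: two query token vectors and two pages with two patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query vectors
    "page_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first query vector
}

for name, score in rank_pages(query, pages):
    print(name, score)
# page_a 2.0
# page_b 1.0
```

Because the score is computed from stored page vectors, pages can be embedded once at indexing time and only the query needs to be embedded at search time.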

Q: What are the recommended use cases?

The model excels in document retrieval tasks, particularly with PDF documents. It's especially useful for applications requiring visual-textual matching and efficient document indexing.
