paligemma-siglip-so400m-patch14-448

Maintained By
Efficient-Large-Model


Property                  Value
Architecture              SoViT-400m
Training Data             WebLI dataset
Resolution                448x448 pixels
Paper                     Sigmoid Loss for Language Image Pre-Training
Training Infrastructure   16 TPU-v4 chips (3 days)

What is paligemma-siglip-so400m-patch14-448?

This model is a SigLIP (Sigmoid Loss for Language Image Pre-Training) model built on a shape-optimized Vision Transformer. SigLIP improves on CLIP-style training by replacing the softmax-based contrastive loss with a pairwise sigmoid loss that scores each image-text pair independently. The image encoder uses the SoViT-400m architecture, whose shape (width, depth, and MLP dimension) was chosen for compute-optimal performance, as described in "Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design".
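The pairwise sigmoid loss can be sketched in a few lines of NumPy. This follows the formulation in the SigLIP paper: each image-text pair (i, j) gets the logit t * (x_i · y_j) + b on L2-normalized embeddings, the label is +1 for matching pairs (the diagonal) and -1 otherwise, and the summed binary log-loss is divided by the batch size. The function names are illustrative, and t and b, which are learned during training, are fixed here to the paper's initial values (t = 10, b = -10) as an assumption.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def sigmoid_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                 t: float = 10.0, b: float = -10.0) -> float:
    """Illustrative SigLIP-style pairwise sigmoid loss.

    img_emb, txt_emb: (batch, dim) arrays where row i of each forms a matching pair.
    t, b: temperature and bias; learned in the paper, fixed here (assumption).
    """
    x = l2_normalize(img_emb)
    y = l2_normalize(txt_emb)
    logits = t * (x @ y.T) + b             # (batch, batch) score for every pair
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0         # +1 for matching pairs, -1 for all others
    # -log(sigmoid(z)) = log(1 + exp(-z)), computed stably with logaddexp
    per_pair = np.logaddexp(0.0, -labels * logits)
    return float(per_pair.sum() / n)       # normalized by batch size, as in the paper
```

Because every term depends on a single image-text pair, no softmax over the batch is needed, unlike CLIP's contrastive loss.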

Implementation Details

The model takes 448x448 RGB images, normalized per channel with mean 0.5 and standard deviation 0.5. Text inputs are tokenized and padded to 64 tokens. The sigmoid loss used in pre-training treats every image-text pair in a batch as an independent binary decision (match or no match), so, unlike CLIP's softmax loss, it does not normalize similarities across the whole batch.
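A minimal preprocessing sketch, assuming the image has already been resized to 448x448 (the resizing method is not specified here) and that token IDs come from the model's own tokenizer. The channel-first output layout, truncation of longer inputs, and the pad ID of 0 are illustrative assumptions, not values taken from the model's configuration. Normalizing with mean 0.5 and std 0.5 maps pixel values from [0, 1] to [-1, 1].

```python
import numpy as np

IMAGE_SIZE = 448
MEAN, STD = 0.5, 0.5
MAX_TEXT_LEN = 64


def normalize_image(rgb_uint8: np.ndarray) -> np.ndarray:
    """Map an already-resized (448, 448, 3) uint8 RGB image to float32 in [-1, 1]."""
    if rgb_uint8.shape != (IMAGE_SIZE, IMAGE_SIZE, 3):
        raise ValueError(f"expected ({IMAGE_SIZE}, {IMAGE_SIZE}, 3), got {rgb_uint8.shape}")
    x = rgb_uint8.astype(np.float32) / 255.0   # rescale to [0, 1]
    x = (x - MEAN) / STD                        # same mean/std on every channel: [0, 1] -> [-1, 1]
    return x.transpose(2, 0, 1)                 # HWC -> CHW (assumed channel-first layout)


def pad_token_ids(token_ids, pad_id: int = 0):
    """Truncate or pad a token-ID list to exactly 64 entries (pad_id is hypothetical)."""
    ids = list(token_ids[:MAX_TEXT_LEN])
    return ids + [pad_id] * (MAX_TEXT_LEN - len(ids))
```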

  • Shape-optimized architecture (SoViT-400m)
  • Efficient sigmoid loss function
  • Pre-trained on WebLI dataset
  • Patch size of 14x14 pixels
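With 14x14 patches, a 448x448 image is divided into a 32x32 grid (448 / 14 = 32), giving 1,024 patches of 14 x 14 x 3 = 588 values each. The sketch below shows only this splitting step on a channel-first array; in the ViT itself each flattened patch is then mapped to an embedding by a learned linear projection, which is omitted here. The function name is hypothetical.

```python
import numpy as np

PATCH = 14


def patchify(image_chw: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split a (C, H, W) image into non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, C * patch * patch), with patches
    ordered row by row across the patch grid.
    """
    c, h, w = image_chw.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    gh, gw = h // patch, w // patch                 # 448 // 14 = 32 in each direction
    x = image_chw.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)                  # (grid_row, grid_col, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)    # (1024, 588) for a 3x448x448 input
```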

Core Capabilities

  • Zero-shot image classification
  • Image-text retrieval
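  • Memory-efficient batch processing during training (no batch-wide softmax normalization)
  • Better performance than softmax-based contrastive training at small batch sizes, while still scaling to large batch sizes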
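A sketch of how SigLIP-style embeddings can serve the first two capabilities, assuming image and text embeddings have already been produced by the model's encoders. For zero-shot classification each candidate label (for example, a prompt such as "a photo of a cat") is scored independently with a sigmoid, so the scores need not sum to 1; the t and b arguments are placeholders for the model's learned temperature and bias, and the function names are hypothetical.

```python
import numpy as np


def zero_shot_scores(image_emb: np.ndarray, label_embs: np.ndarray,
                     t: float, b: float) -> np.ndarray:
    """Independent per-label probabilities: sigmoid(t * cosine + b)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    logits = t * (txt @ img) + b
    return 1.0 / (1.0 + np.exp(-logits))


def rank_for_retrieval(query_emb: np.ndarray, gallery_embs: np.ndarray) -> np.ndarray:
    """Indices of gallery items sorted by cosine similarity to the query, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return np.argsort(-(g @ q))
```

Retrieval only needs an ordering, and because the sigmoid is monotonic for t > 0, ranking by raw cosine similarity gives the same order as ranking by sigmoid score.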

Frequently Asked Questions

Q: What makes this model unique?

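What sets this model apart is the combination of the pairwise sigmoid loss used in pre-training and the shape-optimized SoViT-400m backbone. Because the sigmoid loss needs no normalization across the whole batch, training stays memory-efficient, and the SigLIP paper reports better results than softmax-based CLIP training at small batch sizes while the loss still scales to large ones.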
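Because the sigmoid loss is a plain sum of per-pair terms, it can be accumulated one block at a time without materializing the full B x B similarity matrix; the SigLIP paper uses this property for a chunked multi-device implementation that avoids gathering all embeddings on every device. The single-process sketch below loops over chunks of the text batch, so peak memory for the logits is B x chunk instead of B x B. It is an illustration with fixed t and b and hypothetical function names, not the paper's code.

```python
import numpy as np


def pair_loss_block(x: np.ndarray, y_chunk: np.ndarray, t: float, b: float,
                    positive_mask: np.ndarray) -> float:
    """Summed -log sigmoid(z * logit) over one (B, chunk) block of pairs."""
    logits = t * (x @ y_chunk.T) + b
    z = np.where(positive_mask, 1.0, -1.0)   # +1 for matching pairs, -1 otherwise
    return float(np.logaddexp(0.0, -z * logits).sum())


def chunked_sigmoid_loss(x: np.ndarray, y: np.ndarray, t: float = 10.0,
                         b: float = -10.0, chunk: int = 4) -> float:
    """x, y: L2-normalized (B, D) image/text embeddings; row i matches row i."""
    n = x.shape[0]
    total = 0.0
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Within this block, the positives are pairs (i, i) for i in [start, stop)
        mask = np.arange(n)[:, None] == np.arange(start, stop)[None, :]
        total += pair_loss_block(x, y[start:stop], t, b, mask)
    return total / n
```

Each block's contribution is final as soon as it is computed, so the chunked result matches the full-matrix loss up to floating-point rounding.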

Q: What are the recommended use cases?

The model excels in zero-shot image classification and image-text retrieval tasks. It's particularly suitable for applications requiring efficient processing of image-text pairs without the need for task-specific fine-tuning.
