CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K-augreg

Maintained By
laion


Property                     Value
License                      MIT
Training Dataset             LAION Aesthetic (~900M samples)
Resolution                   320x320
Zero-shot ImageNet Accuracy  71.3%
Paper                        ConvNeXt Paper

What is CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K-augreg?

This is a CLIP model that pairs a ConvNeXt-Base image tower with additional augmentation and regularization during training, aimed at zero-shot image classification. It was trained on the LAION Aesthetic dataset and shows that a convolutional image encoder can be used for contrastive language-image pre-training in place of the more common Vision Transformer.

Implementation Details

The model uses the ConvNeXt-Base architecture as its image tower, coupled with a text tower similar to the one in RN50x4 from OpenAI CLIP. It is trained at 320x320 resolution with additional augmentation and regularization: Random Resize Crop, Random Erasing (probability 0.35), and Stochastic Depth (rate 0.1).
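To make two of the techniques named above concrete, here is a minimal pure-Python sketch of Stochastic Depth and Random Erasing. It is an illustration under simplifying assumptions, not the training code used for this model: the function names are hypothetical, the inputs are plain lists rather than tensors, and the erased rectangle is sampled uniformly instead of by the area and aspect-ratio ranges that real implementations use.

```python
import random

def stochastic_depth(residual, drop_prob=0.1, training=True, rng=random):
    """Drop a whole residual branch with probability drop_prob (sketch).

    Kept branches are rescaled by 1 / (1 - drop_prob) so the expected
    output matches the un-dropped branch.
    """
    if not training or drop_prob == 0.0:
        return list(residual)
    if rng.random() < drop_prob:
        return [0.0 for _ in residual]          # branch skipped for this sample
    keep_prob = 1.0 - drop_prob
    return [x / keep_prob for x in residual]    # branch kept, rescaled

def random_erasing(image, prob=0.35, rng=random):
    """With probability prob, zero out one random rectangle of a 2D image.

    Simplified: rectangle height and width are sampled uniformly.
    """
    out = [row[:] for row in image]             # never modify the input
    if rng.random() >= prob:
        return out
    height, width = len(out), len(out[0])
    erase_h = rng.randint(1, height)
    erase_w = rng.randint(1, width)
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            out[r][c] = 0.0
    return out
```

Both functions are no-ops at evaluation time (`training=False` or `prob=0.0`), which mirrors how such regularizers are normally applied only during training.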

  • Trained for 13B samples seen on the LAION Aesthetic dataset (~900M samples)
  • Uses augmentation and regularization ("augreg") for improved generalization
  • Achieves 71.3% zero-shot accuracy on ImageNet
  • Optimized for higher resolution (320x320) input
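The training scale in the list above can be put in perspective with a short back-of-the-envelope calculation. This sketch assumes that "s13B" in the model name means 13 billion samples seen and that "b82K" means a global batch size of roughly 82,000; both readings are assumptions based on the name, and the results are approximate.

```python
# Assumed readings of the model name (not stated exactly in the text above).
samples_seen = 13e9      # "s13B": samples seen during training
dataset_size = 900e6     # "~900M samples" in LAION Aesthetic
batch_size = 82_000      # "b82K": approximate global batch size

passes_over_data = samples_seen / dataset_size   # roughly 14.4
optimizer_steps = samples_seen / batch_size      # roughly 158,500

print(f"approx. passes over the dataset: {passes_over_data:.1f}")
print(f"approx. optimizer steps: {optimizer_steps:,.0f}")
```

Under these assumptions, each image-text pair is seen about 14 times on average, which is one reason stronger augmentation and regularization can help.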

Core Capabilities

  • Zero-shot image classification
  • Image and text retrieval
  • Transfer learning for downstream tasks
  • Robust performance across different image resolutions
  • Improved generalization through added augmentation and regularization
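Zero-shot classification and retrieval both reduce to comparing embeddings by cosine similarity. The sketch below shows that scoring step in pure Python with toy 3-dimensional vectors standing in for real image and text embeddings. The function names are hypothetical, and the scale factor of 100 is an assumption based on the value commonly used with CLIP-style models.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Probability of each text prompt given one image embedding."""
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(txt)))  # cosine similarity
        for txt in text_embs
    ]
    return softmax([logit_scale * s for s in sims])

# Toy embeddings; a real model would produce these from an image and prompts.
labels = ["a photo of a cat", "a photo of a dog"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
```

For retrieval, the same similarities are used to rank a collection of images against one text query (or texts against one image) instead of being passed through a softmax over class prompts.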

Frequently Asked Questions

Q: What makes this model unique?

This model is one of the first ConvNeXt CLIP models trained at scale, offering competitive performance with ViT-B/16 while potentially being more sample efficient. It particularly excels when evaluated at higher resolutions than its training resolution.

Q: What are the recommended use cases?

The model is best suited for research purposes in zero-shot image classification, image-text retrieval, and as a foundation for transfer learning tasks. However, it's not recommended for deployment in production systems without thorough testing and evaluation.
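For research use, one way to try the model is through the OpenCLIP library. The sketch below assumes that `open_clip_torch`, `torch`, and `Pillow` are installed and that the weights are published on the Hugging Face Hub under the repository id shown; the repository id and the `classify` helper are assumptions, not something stated on this page. Calling the function downloads the weights on first use.

```python
# Assumed Hugging Face Hub id for this model (verify before use).
MODEL_ID = "hf-hub:laion/CLIP-convnext_base_w_320-laion_aesthetic-s13B-b82K-augreg"

def classify(image_path, labels):
    """Return {label: probability} for one image (hypothetical helper)."""
    # Imported lazily so the heavy dependencies load only when the function is called.
    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
    tokenizer = open_clip.get_tokenizer(MODEL_ID)
    model.eval()

    image = preprocess(Image.open(image_path)).unsqueeze(0)  # add batch dimension
    text = tokenizer(labels)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Normalize so the dot product is a cosine similarity.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    return dict(zip(labels, probs[0].tolist()))
```

A call such as `classify("example.jpg", ["a photo of a cat", "a photo of a dog"])` would return one probability per prompt; `example.jpg` is a placeholder path. Results should be checked on data that is representative of the intended task before drawing conclusions.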
