eva02_large_patch14_448.mim_m38m_ft_in22k_in1k

Maintained by: timm

EVA02 Large Patch14 448

Parameter Count:  305.1M
Model Type:       Vision Transformer
License:          MIT
Image Size:       448 x 448
Top-1 Accuracy:   90.054%
Paper:            EVA-02: A Visual Representation for Neon Genesis

What is eva02_large_patch14_448.mim_m38m_ft_in22k_in1k?

This is an advanced vision transformer model from the EVA02 family, designed for high-performance image classification tasks. It was pre-trained with masked image modeling (MIM) on the Merged-38M dataset (which includes ImageNet-22k, CC12M, CC3M, COCO, ADE20K, Object365, and OpenImages), using EVA-CLIP as the teacher model, and subsequently fine-tuned on ImageNet-22k and then ImageNet-1k.
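
A minimal inference sketch using timm's standard model-loading and preprocessing API is shown below; the image path `example.jpg` is a hypothetical placeholder.

```python
import timm
import torch
from PIL import Image

# Load the pretrained model in evaluation mode
model = timm.create_model(
    'eva02_large_patch14_448.mim_m38m_ft_in22k_in1k',
    pretrained=True,
)
model.eval()

# Build the preprocessing pipeline the model was trained with (448x448 input)
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg')  # hypothetical local image path
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # shape: [1, 1000] ImageNet-1k logits

top5_probs, top5_ids = logits.softmax(dim=-1).topk(5)
```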

Implementation Details

The model represents a significant advancement in vision transformer architecture, incorporating several key innovations:

  • Mean pooling for improved feature aggregation
  • SwiGLU activation functions for enhanced learning capability (see the sketch after this list)
  • Rotary Position Embeddings (RoPE) for better spatial understanding
  • Additional Layer Normalization in the MLP blocks
  • Patch size of 14x14 pixels
  • 448x448 input resolution
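
To make the SwiGLU-plus-LayerNorm MLP design concrete, here is a minimal PyTorch sketch of such a feed-forward block. Layer names and sizes are illustrative assumptions, not the exact timm implementation.

```python
import torch
import torch.nn as nn

class SwiGLUBlock(nn.Module):
    """Illustrative SwiGLU feed-forward block with an extra LayerNorm, as in EVA-02-style MLPs."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim)   # gating branch
        self.w_up = nn.Linear(dim, hidden_dim)     # value branch
        self.norm = nn.LayerNorm(hidden_dim)       # additional LayerNorm inside the MLP
        self.w_down = nn.Linear(hidden_dim, dim)   # project back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU (Swish) on the gate branch, elementwise product with the value branch
        gated = nn.functional.silu(self.w_gate(x)) * self.w_up(x)
        return self.w_down(self.norm(gated))
```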

Core Capabilities

  • State-of-the-art image classification performance (90.054% top-1 accuracy)
  • Robust feature extraction for downstream tasks (a feature-extraction sketch follows this list)
  • Efficient processing of high-resolution images
  • Strong transfer learning capabilities
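
For feature extraction, timm lets you drop the classifier head at load time and obtain pooled image embeddings directly; a short sketch with a random stand-in input:

```python
import timm
import torch

# num_classes=0 removes the classification head and returns pooled features
model = timm.create_model(
    'eva02_large_patch14_448.mim_m38m_ft_in22k_in1k',
    pretrained=True,
    num_classes=0,
)
model.eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 448, 448))  # shape: [1, 1024] pooled embedding
```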

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its comprehensive pre-training on the Merged-38M dataset and its advanced architectural features like ROPE and SwiGLU. It achieves exceptional accuracy while maintaining reasonable computational requirements for its size class.
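
To give an intuition for RoPE, here is a minimal, illustrative 1-D rotary embedding in PyTorch. Note that EVA-02 applies a 2-D variant over the patch grid; the function name and shapes below are simplifying assumptions for illustration only.

```python
import torch

def rope_1d(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate feature pairs of x by position-dependent angles (1-D RoPE sketch).

    x: [seq_len, dim] with dim even.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One frequency per feature pair, geometrically spaced
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)        # [half]
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs     # [seq_len, half]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Standard rotate-half formulation
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because positions are encoded as rotations of the features themselves rather than added embeddings, relative offsets between tokens are preserved under attention, which is what gives RoPE its spatial-understanding benefit.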

Q: What are the recommended use cases?

The model is ideal for high-precision image classification tasks, feature extraction for downstream applications, and transfer learning scenarios where high accuracy is crucial. It's particularly well-suited for applications requiring detailed image analysis at higher resolutions.
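
A transfer-learning setup is a one-liner in timm: pass your own class count and the head is re-initialized while the pretrained backbone is kept. The sketch below uses an assumed 10-class task, a stand-in random batch, and an untuned learning rate.

```python
import timm
import torch

# Replace the 1000-class head with a fresh 10-class head (hypothetical downstream task)
model = timm.create_model(
    'eva02_large_patch14_448.mim_m38m_ft_in22k_in1k',
    pretrained=True,
    num_classes=10,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed, not tuned
criterion = torch.nn.CrossEntropyLoss()

images = torch.randn(2, 3, 448, 448)     # stand-in batch
labels = torch.randint(0, 10, (2,))      # stand-in labels

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```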
