EVA02 Small Patch14 336
| Property | Value |
|---|---|
| Parameter Count | 22.1M |
| Image Size | 336x336 |
| License | MIT |
| Paper | EVA-02: A Visual Representation for Neon Genesis |
| Top-1 Accuracy | 85.74% |
What is eva02_small_patch14_336.mim_in22k_ft_in1k?
This is a small-scale variant of the EVA02 vision transformer architecture, designed for efficient image classification and feature extraction. It was first pre-trained on ImageNet-22k with masked image modeling, using EVA-CLIP as the teacher, and then fine-tuned on ImageNet-1k for classification.
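A minimal inference sketch with the timm library is shown below; the model name matches this card, while the image path and the top-5 printout are illustrative placeholders.

```python
import timm
import torch
from PIL import Image

# Load the pretrained checkpoint (weights are downloaded on first use).
model = timm.create_model(
    "eva02_small_patch14_336.mim_in22k_ft_in1k", pretrained=True
)
model.eval()

# Build the preprocessing pipeline that matches the model's 336x336 input.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open("example.jpg").convert("RGB")  # placeholder image path
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # shape: (1, 1000)
    top5 = logits.softmax(dim=-1).topk(5)

print(top5.indices, top5.values)
```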
Implementation Details
The model implements several key architectural features, including mean pooling, SwiGLU activation functions, and Rotary Position Embeddings (RoPE). It processes 336x336 images as 14x14 patches, requiring roughly 15.5 GMACs per forward pass with 54.3M activations. A feature-extraction sketch follows the list below.
- Pre-trained on ImageNet-22k dataset
- Fine-tuned on ImageNet-1k
- Uses mean pooling for feature aggregation
- Implements SwiGLU activation
- Incorporates Rotary Position Embeddings
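The following sketch shows how to obtain the mean-pooled embedding, assuming the timm API: creating the model with `num_classes=0` drops the classifier head so the forward pass returns pooled features, while `forward_features` returns the unpooled token sequence. The image path is a placeholder.

```python
import timm
import torch
from PIL import Image

# num_classes=0 removes the classifier head; the model then returns the
# mean-pooled embedding directly.
model = timm.create_model(
    "eva02_small_patch14_336.mim_in22k_ft_in1k", pretrained=True, num_classes=0
)
model.eval()

data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

x = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    pooled = model(x)                   # (1, embed_dim) pooled features
    tokens = model.forward_features(x)  # (1, num_tokens, embed_dim) unpooled

print(pooled.shape, tokens.shape)
```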
Core Capabilities
- Image Classification
- Feature Extraction
- Transfer Learning
- Visual Representation Learning
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient architecture that balances performance and size, achieving 85.74% top-1 accuracy on ImageNet-1k while maintaining a relatively small parameter count of 22.1M. It incorporates modern architectural improvements like RoPE and SwiGLU, making it effective for both classification and feature extraction tasks.
Q: What are the recommended use cases?
The model is well-suited for image classification tasks, particularly when working with high-resolution images (336x336). It's also effective for feature extraction in transfer learning scenarios, making it valuable for downstream computer vision tasks where pre-trained visual representations are needed.
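For transfer learning, one common pattern is to re-create the model with a fresh classifier head sized for the new task and train only that head. The sketch below is illustrative only: the class count, the frozen-backbone choice, and the dummy batch are assumptions, not recommendations from this card.

```python
import timm
import torch

NUM_CLASSES = 10  # placeholder for your downstream task

# Fresh classifier head for the new task; backbone weights still come from
# the pretrained checkpoint.
model = timm.create_model(
    "eva02_small_patch14_336.mim_in22k_ft_in1k",
    pretrained=True,
    num_classes=NUM_CLASSES,
)

# Optionally freeze the backbone and train only the new head.
for name, param in model.named_parameters():
    if "head" not in name:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
model.train()
images = torch.randn(2, 3, 336, 336)
labels = torch.randint(0, NUM_CLASSES, (2,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```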