# EVA Large Patch14 196
| Property | Value |
|---|---|
| Parameter Count | 304M |
| Model Type | Image Classification |
| Architecture | Vision Transformer |
| License | MIT |
| Paper | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| Image Size | 196 x 196 |
## What is `eva_large_patch14_196.in22k_ft_in22k_in1k`?
This is a large-scale vision transformer model that belongs to the EVA (Exploring the Limits of Masked Visual Representation Learning at Scale) family. It was initially pretrained on ImageNet-22k using masked image modeling with EVA-CLIP as a teacher, then fine-tuned on ImageNet-22k and finally on ImageNet-1k. The model achieves an impressive 88.592% top-1 accuracy on ImageNet-1k validation.
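A minimal inference sketch using timm (the image path and the top-5 printout are illustrative; this assumes a recent timm release that provides `resolve_model_data_config`):

```python
import timm
import torch
from PIL import Image

# Load the pretrained classifier (1000 ImageNet-1k classes).
model = timm.create_model(
    "eva_large_patch14_196.in22k_ft_in22k_in1k", pretrained=True
)
model = model.eval()

# Build the preprocessing pipeline from the model's pretrained config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open("example.jpg").convert("RGB")  # any RGB image
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # shape: (1, 1000)

top5 = logits.softmax(dim=-1).topk(5)
print(top5.indices, top5.values)
```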
## Implementation Details
The model uses a patch size of 14 x 14 pixels and operates on 196 x 196 images. With 304.1M parameters and 61.6 GMACs, it offers a strong balance between computational cost and accuracy. The published weights are stored as float32 (F32) tensors.
- Utilizes masked visual representation learning at scale
- Uses a plain (non-hierarchical) Vision Transformer architecture
- Supports both classification and feature extraction workflows (see the feature-extraction sketch below)
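A sketch of the feature-extraction path, assuming the standard timm API (`num_classes=0` removes the classifier head; the shapes in the comments reflect the ViT-L layout with a class token and are illustrative):

```python
import timm
import torch

# Create the model without its classification head to get pooled embeddings.
model = timm.create_model(
    "eva_large_patch14_196.in22k_ft_in22k_in1k",
    pretrained=True,
    num_classes=0,
)
model = model.eval()

x = torch.randn(1, 3, 196, 196)  # dummy batch at the native 196 x 196 resolution
with torch.no_grad():
    pooled = model(x)                   # pooled image embedding, e.g. (1, 1024)
    tokens = model.forward_features(x)  # per-token features, e.g. (1, 197, 1024)

print(pooled.shape, tokens.shape)
```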
## Core Capabilities
- Image classification with 1000 classes
- Feature extraction for downstream tasks
- High-quality visual representations with 88.592% top-1 accuracy
- Flexible input handling through timm's transform pipeline (see the transform example below)
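For reference, the preprocessing settings timm bundles with these weights can be inspected and turned into evaluation or training pipelines (a sketch; the exact config keys depend on the timm version):

```python
import timm
from timm.data import create_transform, resolve_model_data_config

model = timm.create_model(
    "eva_large_patch14_196.in22k_ft_in22k_in1k", pretrained=True
)

# Preprocessing settings shipped with the pretrained weights
# (input size, normalization mean/std, interpolation, crop fraction).
data_config = resolve_model_data_config(model)
print(data_config)

# Matching eval transform, and a training transform with augmentation enabled.
eval_transform = create_transform(**data_config, is_training=False)
train_transform = create_transform(**data_config, is_training=True)
print(eval_transform)
```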
## Frequently Asked Questions
### Q: What makes this model unique?
This model stands out for its efficient architecture that maintains high performance at a relatively small input resolution (196x196), making it practical for real-world applications while achieving competitive accuracy.
### Q: What are the recommended use cases?
The model is ideal for image classification tasks, feature extraction for downstream applications, and as a backbone for transfer learning. It's particularly suitable for applications requiring a good balance between accuracy and computational efficiency.
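As a rough illustration of the transfer-learning path, the sketch below re-initializes the head for a hypothetical 10-class task and trains only that head (the class count, learning rate, and frozen-backbone choice are assumptions, not recommendations):

```python
import timm
import torch
import torch.nn.functional as F

NUM_CLASSES = 10  # hypothetical downstream task

# Re-create the model with a freshly initialized classification head.
model = timm.create_model(
    "eva_large_patch14_196.in22k_ft_in22k_in1k",
    pretrained=True,
    num_classes=NUM_CLASSES,
)

# Freeze the backbone; train only the new head.
for name, param in model.named_parameters():
    param.requires_grad = "head" in name

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# One training step on a dummy batch.
x = torch.randn(4, 3, 196, 196)
y = torch.randint(0, NUM_CLASSES, (4,))
loss = F.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```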