eva_large_patch14_196.in22k_ft_in22k_in1k

Maintained By
timm

EVA Large Patch14 196

  • Parameter Count: 304M
  • Model Type: Image Classification
  • Architecture: Vision Transformer
  • License: MIT
  • Paper: EVA Paper
  • Image Size: 196 x 196

What is eva_large_patch14_196.in22k_ft_in22k_in1k?

This is a large-scale vision transformer model that belongs to the EVA (Exploring the Limits of Masked Visual Representation Learning at Scale) family. It was initially pretrained on ImageNet-22k using masked image modeling with EVA-CLIP as a teacher, then fine-tuned on ImageNet-22k and finally on ImageNet-1k. The model achieves an impressive 88.592% top-1 accuracy on ImageNet-1k validation.

Implementation Details

The model uses a 14x14 pixel patch size and operates on 196x196 resolution images. With 304.1M parameters and 61.6 GMACs, it strikes a strong balance between computational cost and accuracy. Weights are stored in the F32 tensor type for maximum compatibility.
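As a quick sanity check, the published parameter count can be reproduced directly from the checkpoint. A minimal sketch, assuming timm is installed and the pretrained weights can be downloaded:

```python
import timm

# Load the pretrained checkpoint (weights are downloaded on first use)
model = timm.create_model('eva_large_patch14_196.in22k_ft_in22k_in1k', pretrained=True)

# Count parameters; this should agree with the published ~304.1M figure
n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params / 1e6:.1f}M parameters')
```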

  • Utilizes masked visual representation learning at scale
  • Implements a plain (non-hierarchical) Vision Transformer architecture
  • Supports both classification and feature extraction workflows (a classification sketch follows this list)
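A minimal classification sketch using timm's standard API. The image path is a hypothetical placeholder; resolve_model_data_config and create_transform are timm's documented helpers for building the model-specific preprocessing pipeline:

```python
import timm
import torch
from PIL import Image

model = timm.create_model('eva_large_patch14_196.in22k_ft_in22k_in1k', pretrained=True)
model = model.eval()

# Build the model-specific eval transform (resize to 196x196, normalize)
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg')  # hypothetical local image
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # shape: (1, 1000)

# Top-5 predictions over the ImageNet-1k classes
top5_prob, top5_idx = torch.topk(logits.softmax(dim=1), k=5)
```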

Core Capabilities

  • Image classification with 1000 classes
  • Feature extraction for downstream tasks (see the sketch after this list)
  • High-quality visual representations with 88.592% top-1 accuracy
  • Flexible input handling through timm's transform pipeline
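A feature-extraction sketch under the same assumptions (hypothetical image path): passing num_classes=0 to create_model drops the classifier head, and forward_features/forward_head are timm's standard accessors for token-level and pooled embeddings:

```python
import timm
import torch
from PIL import Image

# num_classes=0 removes the ImageNet-1k head, exposing embeddings instead
model = timm.create_model(
    'eva_large_patch14_196.in22k_ft_in22k_in1k',
    pretrained=True,
    num_classes=0,
)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg')  # hypothetical local image
x = transform(img).unsqueeze(0)

with torch.no_grad():
    tokens = model.forward_features(x)                    # (1, 197, 1024): cls + 14x14 patch tokens
    pooled = model.forward_head(tokens, pre_logits=True)  # (1, 1024) image embedding
```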

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient architecture that maintains high performance at a relatively small input resolution (196x196), making it practical for real-world applications while achieving competitive accuracy.

Q: What are the recommended use cases?

The model is ideal for image classification tasks, feature extraction for downstream applications, and as a backbone for transfer learning. It's particularly suitable for applications requiring a good balance between accuracy and computational efficiency.
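For transfer learning, one common pattern is to swap in a fresh classification head and train only that head at first. A hedged sketch: the 10-class head and learning rate are illustrative assumptions, and get_classifier is timm's accessor for the classification head:

```python
import timm
import torch

# Replace the 1000-class head with a task-specific one (10 classes here,
# purely illustrative); the new head is randomly initialized
model = timm.create_model(
    'eva_large_patch14_196.in22k_ft_in22k_in1k',
    pretrained=True,
    num_classes=10,
)

# Freeze the backbone and train only the new classifier head
for p in model.parameters():
    p.requires_grad = False
for p in model.get_classifier().parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```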
