eva_large_patch14_196.in22k_ft_in22k_in1k

Maintained By
timm

EVA Large Patch14 196

  • Parameter Count: 304M
  • Model Type: Image Classification
  • Architecture: Vision Transformer
  • License: MIT
  • Paper: EVA Paper
  • Image Size: 196 x 196

What is eva_large_patch14_196.in22k_ft_in22k_in1k?

This is a large-scale vision transformer model that belongs to the EVA (Exploring the Limits of Masked Visual Representation Learning at Scale) family. It was initially pretrained on ImageNet-22k using masked image modeling with EVA-CLIP as a teacher, then fine-tuned on ImageNet-22k and finally on ImageNet-1k. The model achieves an impressive 88.592% top-1 accuracy on ImageNet-1k validation.

Implementation Details

The model uses a 14x14 pixel patch size and operates on 196x196 resolution images. With 304.1M parameters and 61.6 GMACs, it strikes a strong balance between computational cost and accuracy. Weights are stored in the F32 tensor type for maximum compatibility.
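As a quick sanity check, the published parameter count can be reproduced directly from the checkpoint. A minimal sketch, assuming timm is installed and the pretrained weights can be downloaded:

```python
import timm

# Load the pretrained checkpoint (weights are downloaded on first use)
model = timm.create_model('eva_large_patch14_196.in22k_ft_in22k_in1k', pretrained=True)

# Count parameters; this should agree with the published ~304.1M figure
n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params / 1e6:.1f}M parameters')
```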

  • Utilizes masked visual representation learning at scale
  • Implements a plain (non-hierarchical) Vision Transformer architecture
  • Supports both classification and feature extraction workflows (a classification sketch follows this list)
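A minimal classification sketch using timm's standard API. The image path is a hypothetical placeholder; resolve_model_data_config and create_transform are timm's documented helpers for building the model-specific preprocessing pipeline:

```python
import timm
import torch
from PIL import Image

model = timm.create_model('eva_large_patch14_196.in22k_ft_in22k_in1k', pretrained=True)
model = model.eval()

# Build the model-specific eval transform (resize to 196x196, normalize)
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg')  # hypothetical local image
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # shape: (1, 1000)

# Top-5 predictions over the ImageNet-1k classes
top5_prob, top5_idx = torch.topk(logits.softmax(dim=1), k=5)
```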

Core Capabilities

  • Image classification with 1000 classes
  • Feature extraction for downstream tasks (see the sketch after this list)
  • High-quality visual representations with 88.592% top-1 accuracy
  • Flexible input handling through timm's transform pipeline
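A feature-extraction sketch under the same assumptions (hypothetical image path): passing num_classes=0 to create_model drops the classifier head, and forward_features/forward_head are timm's standard accessors for token-level and pooled embeddings:

```python
import timm
import torch
from PIL import Image

# num_classes=0 removes the ImageNet-1k head, exposing embeddings instead
model = timm.create_model(
    'eva_large_patch14_196.in22k_ft_in22k_in1k',
    pretrained=True,
    num_classes=0,
)
model = model.eval()

data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg')  # hypothetical local image
x = transform(img).unsqueeze(0)

with torch.no_grad():
    tokens = model.forward_features(x)                    # (1, 197, 1024): cls + 14x14 patch tokens
    pooled = model.forward_head(tokens, pre_logits=True)  # (1, 1024) image embedding
```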

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient architecture that maintains high performance at a relatively small input resolution (196x196), making it practical for real-world applications while achieving competitive accuracy.

Q: What are the recommended use cases?

The model is ideal for image classification tasks, feature extraction for downstream applications, and as a backbone for transfer learning. It's particularly suitable for applications requiring a good balance between accuracy and computational efficiency.
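For transfer learning, one common pattern is to swap in a fresh classification head and train only that head at first. A hedged sketch: the 10-class head and learning rate are illustrative assumptions, and get_classifier is timm's accessor for the classification head:

```python
import timm
import torch

# Replace the 1000-class head with a task-specific one (10 classes here,
# purely illustrative); the new head is randomly initialized
model = timm.create_model(
    'eva_large_patch14_196.in22k_ft_in22k_in1k',
    pretrained=True,
    num_classes=10,
)

# Freeze the backbone and train only the new classifier head
for p in model.parameters():
    p.requires_grad = False
for p in model.get_classifier().parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```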
