deit_base_patch16_224.fb_in1k

Maintained by: timm

DeiT Base Patch16 224

Property         Value
Parameter Count  86.6M
License          Apache 2.0
Paper            Training data-efficient image transformers & distillation through attention
Image Size       224x224
GMACs            17.6

What is deit_base_patch16_224.fb_in1k?

DeiT (Data-efficient image Transformers) is a vision transformer trained on ImageNet-1k that reaches strong accuracy without the large-scale pretraining data the original ViT required. This particular variant uses 16x16 pixel patches and processes 224x224 images.
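
For orientation, here is a minimal sketch of loading this checkpoint through timm (assuming timm and PyTorch are installed; the pretrained weights are fetched from the Hugging Face Hub on first use):

```python
import timm

# Load the pretrained DeiT base checkpoint (weights download on first use)
model = timm.create_model('deit_base_patch16_224.fb_in1k', pretrained=True)
model.eval()
```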

Implementation Details

The model implements a vision transformer architecture with attention-based distillation, allowing it to learn efficiently from limited data. It processes an image by splitting it into 16x16 patches, linearly embedding them, and passing the resulting token sequence through a stack of transformer layers (see the sketch after the list below).

  • 86.6M total parameters
  • 17.6 GMACs computational complexity
  • 23.9M activations
  • Supports both classification and feature extraction
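
The figures above can be checked directly against the model object. A small sketch, assuming timm and torch are installed (no pretrained weights are needed for shape and parameter checks):

```python
import timm
import torch

# pretrained=False is enough for inspecting shapes and parameter counts
model = timm.create_model('deit_base_patch16_224.fb_in1k', pretrained=False)

# Total parameters: ~86.6M
n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params / 1e6:.1f}M parameters')

# A 224x224 image yields (224/16)^2 = 196 patch tokens of width 768
x = torch.randn(1, 3, 224, 224)
print(model.patch_embed(x).shape)  # torch.Size([1, 196, 768])
```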

Core Capabilities

  • Image classification over the 1,000 ImageNet-1k classes (see the inference sketch below)
  • Feature extraction for downstream tasks
  • Efficient training through attention-based knowledge distillation
  • F32 tensor operations
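
A sketch of the standard timm inference flow for the classification capability. The image path example.jpg is a placeholder, and the transform helpers assume a recent timm release:

```python
import timm
import torch
from PIL import Image

model = timm.create_model('deit_base_patch16_224.fb_in1k', pretrained=True)
model.eval()

# Recreate the preprocessing the checkpoint expects (resize, crop, normalize)
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

img = Image.open('example.jpg').convert('RGB')  # placeholder image path
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # [1, 1000] ImageNet-1k logits
top5 = logits.softmax(dim=-1).topk(5)
print(top5.indices, top5.values)
```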

Frequently Asked Questions

Q: What makes this model unique?

DeiT's uniqueness lies in its data-efficient training approach using attention-based distillation, allowing it to achieve strong performance with less training data than traditional vision transformers.

Q: What are the recommended use cases?

This model is ideal for image classification tasks, particularly when working with ImageNet-like datasets. It can also be used as a feature extractor for transfer learning applications, with the ability to output embeddings by removing the classification head.
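
A minimal sketch of that feature-extraction path, assuming timm and torch: passing num_classes=0 to create_model drops the classifier so the forward pass returns pooled embeddings, while forward_features returns per-token features.

```python
import timm
import torch

# num_classes=0 removes the classification head; forward() then returns
# pooled embeddings instead of class logits
model = timm.create_model('deit_base_patch16_224.fb_in1k', pretrained=True, num_classes=0)
model.eval()

x = torch.randn(1, 3, 224, 224)  # stands in for a preprocessed image batch
with torch.no_grad():
    pooled = model(x)                   # [1, 768] image embedding
    tokens = model.forward_features(x)  # [1, 197, 768]: class token + 196 patch tokens
print(pooled.shape, tokens.shape)
```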
