vit-mae-large
| Property | Value |
|---|---|
| Author | Facebook AI |
| License | Apache 2.0 |
| Paper | Masked Autoencoders Are Scalable Vision Learners |
| Framework | PyTorch, TensorFlow |
What is vit-mae-large?
vit-mae-large is a Vision Transformer (ViT-Large) model pre-trained with the Masked Autoencoder (MAE) approach. It is Facebook's implementation of a self-supervised method that masks a large fraction of each input image and learns to reconstruct the missing content. The model processes images as sequences of fixed-size patches and was pre-trained on the ImageNet-1K dataset.
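A minimal usage sketch for the pretraining objective, assuming the Hugging Face Transformers library and the checkpoint ID facebook/vit-mae-large (the image URL is only an example):

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, ViTMAEForPreTraining

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-large")
model = ViTMAEForPreTraining.from_pretrained("facebook/vit-mae-large")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.loss)          # pixel reconstruction loss on the masked patches
print(outputs.logits.shape)  # (batch, num_patches, patch_size**2 * 3) reconstructed pixel values
print(outputs.mask.shape)    # (batch, num_patches); 1 marks a masked patch
```

The logits are per-patch pixel reconstructions, and the mask indicates which patches were hidden and therefore contribute to the loss.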
Implementation Details
The model pairs a BERT-like transformer encoder with a lightweight decoder in an asymmetric pretraining setup. During pretraining it randomly masks 75% of the image patches, passes only the visible patches through the encoder, and then reconstructs the masked patches with a decoder that fills the hidden positions with a learned mask token. This high masking ratio is a key design choice: it forces the model to build robust visual representations rather than simply interpolate from nearby pixels (a minimal sketch of the masking step follows the list below).
- Transformer-based encoder-decoder architecture
- 75% masking ratio during pretraining
- A shared, learnable mask token inserted at masked positions
- Reconstruction of raw pixel values
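The shuffle-and-keep masking described above can be sketched in plain PyTorch; the function below is illustrative, and its names are not taken from the released code:

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch embeddings, MAE-style.

    patches: (batch, num_patches, dim) sequence of patch embeddings.
    Returns the visible patches, a binary mask (1 = masked), and the
    indices needed to restore the original patch order.
    """
    batch, num_patches, dim = patches.shape
    len_keep = int(num_patches * (1 - mask_ratio))

    noise = torch.rand(batch, num_patches, device=patches.device)  # one random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)        # random permutation per sample
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    ids_keep = ids_shuffle[:, :len_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, dim))

    mask = torch.ones(batch, num_patches, device=patches.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)  # back to the original patch order
    return visible, mask, ids_restore
```

With an input of shape (2, 196, 1024), matching the patch count and hidden size of the large model at 224x224 resolution with 16x16 patches, `visible` has shape (2, 49, 1024): only 25% of the patches reach the encoder.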
Core Capabilities
- Image classification tasks
- Feature extraction for downstream vision tasks (see the sketch after this list)
- Self-supervised visual representation learning
- Efficient pretraining, since the encoder only processes the roughly 25% of patches left visible
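For feature extraction, the encoder can be used on its own. The sketch below assumes the facebook/vit-mae-large checkpoint, that the mask_ratio config value can be overridden at load time so every patch is encoded, and a placeholder image path:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTMAEModel

processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-large")
# mask_ratio=0.0 keeps every patch visible, which is what we want for feature extraction
model = ViTMAEModel.from_pretrained("facebook/vit-mae-large", mask_ratio=0.0)

image = Image.open("example.jpg")  # any local RGB image (placeholder path)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (batch, 1 + num_patches, hidden_size) -- CLS token plus patch tokens
features = outputs.last_hidden_state
image_embedding = features[:, 0]  # CLS token as a global image descriptor
print(image_embedding.shape)      # torch.Size([1, 1024]) for the large model
```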
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its unusually high masking ratio (75%) during pretraining, significantly higher than in previous masked-prediction approaches (BERT, for example, masks only about 15% of its input tokens). This aggressive masking strategy, combined with the large model size, enables more efficient and effective self-supervised learning.
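In the Hugging Face implementation, the masking ratio is a configuration attribute and can be inspected directly (checkpoint ID assumed as above):

```python
from transformers import ViTMAEConfig

config = ViTMAEConfig.from_pretrained("facebook/vit-mae-large")
print(config.mask_ratio)         # 0.75 -- fraction of patches hidden during pretraining
print(config.hidden_size)        # encoder width (1024 for the large variant)
print(config.num_hidden_layers)  # encoder depth (24 for the large variant)
```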
Q: What are the recommended use cases?
The model is particularly well suited to image classification and can be fine-tuned for other downstream vision tasks. It is especially valuable when large amounts of unlabeled image data are available but labels are scarce, since the representation is learned without labels and only the task-specific head requires annotated examples.
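A common recipe is to attach a lightweight classification head to the pretrained encoder and fine-tune it on labeled data. The wrapper below is a hypothetical sketch (the class name, pooling choice, and mask_ratio=0.0 override are assumptions), not an official fine-tuning script:

```python
import torch
from torch import nn
from transformers import ViTMAEModel

class MAEClassifier(nn.Module):
    """Hypothetical fine-tuning head: pretrained MAE encoder + linear classifier."""

    def __init__(self, num_classes: int, checkpoint: str = "facebook/vit-mae-large"):
        super().__init__()
        # mask_ratio=0.0 so the encoder sees every patch during fine-tuning
        self.encoder = ViTMAEModel.from_pretrained(checkpoint, mask_ratio=0.0)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(pixel_values=pixel_values).last_hidden_state
        pooled = hidden[:, 1:].mean(dim=1)  # average the patch tokens (skip CLS)
        return self.head(pooled)

model = MAEClassifier(num_classes=10)
logits = model(torch.randn(2, 3, 224, 224))  # dummy batch of two 224x224 RGB images
print(logits.shape)  # torch.Size([2, 10])
```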