vit-mae-large

Author: Facebook
License: Apache 2.0
Paper: Masked Autoencoders Are Scalable Vision Learners
Framework: PyTorch, TensorFlow

What is vit-mae-large?

vit-mae-large is the large variant of the Vision Transformer (ViT) pre-trained with the Masked Autoencoder (MAE) approach. It is Facebook's implementation of a self-supervised learning method that masks large portions of an input image and learns to reconstruct them. The model processes images as sequences of fixed-size 16x16 patches and was pre-trained on the ImageNet-1K dataset.
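
A minimal loading sketch, assuming the Hugging Face transformers library and the facebook/vit-mae-large checkpoint on the Hub (exact config values may vary by library version):

```python
from transformers import AutoImageProcessor, ViTMAEModel

# Download the published checkpoint and its matching image processor
processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-large")
model = ViTMAEModel.from_pretrained("facebook/vit-mae-large")

cfg = model.config
print(cfg.image_size, cfg.patch_size)  # 224, 16 -> a 14 x 14 grid of 196 patches
print(cfg.hidden_size)                 # 1024 (the "large" encoder width)
print(cfg.mask_ratio)                  # 0.75, the masking ratio used in pretraining
```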

Implementation Details

The model pairs a BERT-like transformer encoder with a lightweight decoder in an asymmetric pretraining setup. During training, it randomly masks 75% of image patches, passes only the visible patches through the encoder, and reconstructs the masked portions with a decoder that consumes the encoded patches together with learnable mask tokens. This high masking ratio is a key innovation that forces the model to develop robust visual representations (a minimal sketch follows the list below).

  • Transformer-based encoder-decoder architecture
  • 75% masking ratio during pretraining
  • Learnable shared mask tokens
  • Reconstruction of raw pixel values
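
The pretraining objective can be exercised directly through ViTMAEForPreTraining in transformers, which returns the pixel-reconstruction loss over the masked patches. A minimal sketch, using a random array as a stand-in for a real image:

```python
import numpy as np
from transformers import AutoImageProcessor, ViTMAEForPreTraining

processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-large")
model = ViTMAEForPreTraining.from_pretrained("facebook/vit-mae-large")

# Random uint8 array standing in for a real 224 x 224 RGB image
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
inputs = processor(images=image, return_tensors="pt")

outputs = model(**inputs)
print(outputs.loss)          # reconstruction loss over the masked patches
print(outputs.mask.mean())   # ~0.75: fraction of patches that were masked
print(outputs.logits.shape)  # (1, 196, 768): predicted pixels per patch (16*16*3)
```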

Core Capabilities

  • Image classification tasks
  • Feature extraction for downstream vision tasks (see the sketch after this list)
  • Self-supervised visual representation learning
  • Efficient pretraining, since the encoder only processes the 25% of patches left visible
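
For feature extraction, the encoder can be run on its own via ViTMAEModel. One caveat worth noting: the transformers implementation applies random masking even at inference time, so the sketch below overrides mask_ratio to 0.0 so that every patch is embedded (an assumption about the intended usage, not a documented recipe):

```python
import numpy as np
from transformers import AutoImageProcessor, ViTMAEModel

processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-large")
# mask_ratio=0.0 disables random masking so all 196 patches are encoded
model = ViTMAEModel.from_pretrained("facebook/vit-mae-large", mask_ratio=0.0)

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
inputs = processor(images=image, return_tensors="pt")

features = model(**inputs).last_hidden_state
print(features.shape)  # (1, 197, 1024): CLS token + 196 patch embeddings
```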

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its unusually high masking ratio: 75% of patches are hidden during pretraining, far above the roughly 15% used in BERT-style masked language modeling. This aggressive masking makes reconstruction a genuinely hard task, and because the encoder only sees the remaining 25% of patches, pretraining is also substantially faster. Combined with the large model size, this enables more efficient and effective self-supervised learning.

Q: What are the recommended use cases?

The model is particularly well-suited for image classification tasks and can be fine-tuned for specific downstream vision tasks. It's especially valuable when working with large-scale image datasets where labeled data is limited, as it leverages self-supervised learning principles.
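
As a fine-tuning illustration: transformers does not ship a ready-made classification head for ViT-MAE, so a minimal hypothetical wrapper (the MAEClassifier class below is illustrative, not part of the library) can place a linear head on the encoder's CLS token:

```python
import torch
import torch.nn as nn
from transformers import ViTMAEModel

class MAEClassifier(nn.Module):
    """Hypothetical wrapper: linear classification head on the MAE encoder."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Disable masking so the encoder sees every patch while fine-tuning
        self.encoder = ViTMAEModel.from_pretrained(
            "facebook/vit-mae-large", mask_ratio=0.0
        )
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(pixel_values).last_hidden_state  # (B, 197, 1024)
        return self.head(hidden[:, 0])  # classify from the CLS token

model = MAEClassifier(num_classes=10)
logits = model(torch.rand(2, 3, 224, 224))  # dummy batch; real inputs need normalization
print(logits.shape)  # (2, 10)
```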
