videomae-large

Maintained By
MCG-NJU

VideoMAE Large

| Property | Value |
|---|---|
| Parameter Count | 343M |
| License | CC-BY-NC-4.0 |
| Paper | VideoMAE Paper |
| Framework | PyTorch |

What is videomae-large?

VideoMAE-large is an advanced self-supervised learning model designed for video understanding tasks. It extends the Masked Autoencoder (MAE) approach to video processing, utilizing a large-scale architecture with 343M parameters. Pre-trained on the Kinetics-400 dataset for 1600 epochs, it represents a significant advancement in video representation learning.

Implementation Details

The model processes videos as sequences of fixed-size 16x16 patches, using a Vision Transformer (ViT) encoder paired with a decoder that reconstructs masked patches during pre-training. A [CLS] token is added for classification tasks, and fixed sine/cosine position embeddings encode patch positions.

  • Large-scale architecture with 343M parameters
  • Self-supervised pre-training on Kinetics-400
  • 16x16 patch-based video processing
  • Transformer-based encoding with specialized decoder
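
As a rough illustration of the masked pre-training interface, the sketch below loads the checkpoint through the Hugging Face transformers VideoMAE classes (an assumption, since the card only names PyTorch as the framework) and runs a randomly masked dummy clip through the encoder-decoder to obtain the reconstruction loss. The checkpoint name MCG-NJU/videomae-large is inferred from the model name.

```python
from transformers import VideoMAEImageProcessor, VideoMAEForPreTraining
import numpy as np
import torch

# Dummy 16-frame clip (channels-first frames of size 224x224).
num_frames = 16
video = list(np.random.randn(num_frames, 3, 224, 224))

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-large")
model = VideoMAEForPreTraining.from_pretrained("MCG-NJU/videomae-large")

pixel_values = processor(video, return_tensors="pt").pixel_values

# Sequence length = (frames / tubelet size) * patches per frame.
num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
seq_length = (num_frames // model.config.tubelet_size) * num_patches_per_frame

# Randomly mask patches; the decoder reconstructs the masked ones.
bool_masked_pos = torch.randint(0, 2, (1, seq_length)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
print(outputs.loss)  # reconstruction loss on the masked patches
```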

Core Capabilities

  • Masked video patch prediction
  • Feature extraction for downstream tasks
  • Video representation learning
  • Transfer learning potential for various video tasks
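
For feature extraction, a minimal sketch along the following lines (again assuming the standard transformers API and the MCG-NJU/videomae-large checkpoint name) returns per-token hidden states that can feed a downstream head:

```python
from transformers import VideoMAEImageProcessor, VideoMAEModel
import numpy as np
import torch

video = list(np.random.randn(16, 3, 224, 224))  # dummy 16-frame clip

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-large")
model = VideoMAEModel.from_pretrained("MCG-NJU/videomae-large")

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch, sequence_length, hidden_size) token features for downstream use.
features = outputs.last_hidden_state
print(features.shape)
```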

Frequently Asked Questions

Q: What makes this model unique?

VideoMAE-large stands out for its self-supervised pre-training, which requires no labeled data: the model learns by reconstructing heavily masked video patches. Its large parameter count and encoder-decoder design enable robust feature learning from masked video content, which transfers well to downstream video understanding tasks.

Q: What are the recommended use cases?

The model is primarily designed for video understanding tasks and can be fine-tuned for specific applications like action recognition, video classification, and feature extraction. It's particularly useful when working with large video datasets that require sophisticated feature learning.
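
A hedged sketch of such fine-tuning, assuming the transformers VideoMAEForVideoClassification head and a hypothetical 10-class action-recognition dataset, might look like the following; the newly added classification head is randomly initialized and must be trained on your data.

```python
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification
import numpy as np
import torch

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-large")
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-large",
    num_labels=10,  # hypothetical number of action classes
)

video = list(np.random.randn(16, 3, 224, 224))  # dummy 16-frame clip
inputs = processor(video, return_tensors="pt")
labels = torch.tensor([3])  # dummy target class

outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # followed by an optimizer step in a real training loop
```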
