ConvNeXt V2 Base Model
| Property | Value |
|---|---|
| Parameter Count | 88.7M |
| License | CC-BY-NC-4.0 |
| Architecture | ConvNeXt V2 |
| Training Data | ImageNet-22k, ImageNet-1k |
| Paper | ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders |
What is convnextv2_base.fcmae_ft_in22k_in1k?
This is a convolutional neural network from the second generation of the ConvNeXt family. It was pretrained with the fully convolutional masked autoencoder (FCMAE) framework, then fine-tuned on ImageNet-22k and subsequently on ImageNet-1k. With 88.7M parameters, it reaches a top-1 accuracy of 86.74% on the ImageNet-1k validation set.
Implementation Details
The model is trained at an input resolution of 224x224 pixels and evaluated at 288x288. At the training resolution it requires 15.4 GMACs (giga multiply-accumulate operations) and produces 28.8M activations. It is implemented in PyTorch and distributed through the timm library; a minimal usage sketch follows the list below.
- Efficient architecture combining traditional CNN strengths with modern design principles
- Pretrained using masked autoencoder approach for better feature learning
- Optimized for both performance and computational efficiency
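As a minimal sketch of loading the model through timm and classifying a single image: `timm.create_model`, `timm.data.resolve_model_data_config`, and `timm.data.create_transform` are the standard timm entry points; the image path is a placeholder, and the output shape assumes the default 1000-class head.

```python
import torch
import timm
from PIL import Image

# Load the pretrained model (weights are downloaded on first use).
model = timm.create_model('convnextv2_base.fcmae_ft_in22k_in1k', pretrained=True)
model = model.eval()

# Build the preprocessing pipeline matching the model's pretrained config.
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# 'example.jpg' is a placeholder path to any RGB image.
img = Image.open('example.jpg').convert('RGB')

with torch.no_grad():
    logits = model(transforms(img).unsqueeze(0))  # shape: [1, 1000]

# Report the five most probable ImageNet-1k classes.
top5_prob, top5_idx = torch.topk(logits.softmax(dim=-1), k=5)
print(top5_idx[0].tolist(), top5_prob[0].tolist())
```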
Core Capabilities
- Image classification with 1000 classes
- Feature extraction for downstream tasks
- Generation of image embeddings
- Support for both inference and transfer learning
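The feature-extraction and embedding capabilities listed above map onto timm's standard interfaces. The sketch below assumes the same pretrained checkpoint; the printed feature-map shapes are what the ConvNeXt-Base stage widths (128/256/512/1024 channels at strides 4/8/16/32) imply for a 224x224 input.

```python
import torch
import timm

x = torch.randn(1, 3, 224, 224)  # dummy input at the training resolution

# Multi-scale feature maps for downstream tasks (detection, segmentation, ...).
feature_model = timm.create_model(
    'convnextv2_base.fcmae_ft_in22k_in1k',
    pretrained=True,
    features_only=True,
)
with torch.no_grad():
    feature_maps = feature_model(x)
for fm in feature_maps:
    print(fm.shape)  # [1, 128, 56, 56] ... [1, 1024, 7, 7]

# Pooled image embedding with the classifier head removed.
embed_model = timm.create_model(
    'convnextv2_base.fcmae_ft_in22k_in1k',
    pretrained=True,
    num_classes=0,  # return pooled pre-logits features instead of class logits
)
with torch.no_grad():
    embedding = embed_model(x)
print(embedding.shape)  # [1, 1024]
```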
Frequently Asked Questions
Q: What makes this model unique?
This model combines FCMAE pretraining with two-stage fine-tuning, first on ImageNet-22k and then on ImageNet-1k, offering a strong balance between accuracy and computational cost. It represents a clear improvement over the original ConvNeXt architecture.
Q: What are the recommended use cases?
The model excels at image classification, feature extraction for downstream applications, and generating image embeddings. It is particularly suitable for applications that require high accuracy at a reasonable computational cost.
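For transfer learning, one common pattern (a sketch, not a prescribed recipe from the paper) is to re-create the model with a freshly initialized head sized to the target dataset and fine-tune as usual. `NUM_CLASSES`, the optimizer settings, and the dummy batch below are illustrative assumptions.

```python
import torch
import timm

NUM_CLASSES = 10  # hypothetical target dataset size

# Pretrained backbone with a new classification head for the target task.
model = timm.create_model(
    'convnextv2_base.fcmae_ft_in22k_in1k',
    pretrained=True,
    num_classes=NUM_CLASSES,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f'loss: {loss.item():.4f}')
```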