ConvNeXt V2 Base Model
| Property | Value |
|---|---|
| Parameter Count | 88.7M |
| License | CC-BY-NC-4.0 |
| Architecture | ConvNeXt V2 |
| Training Data | ImageNet-22k, ImageNet-1k |
| Paper | ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders |
What is convnextv2_base.fcmae_ft_in22k_in1k?
This is a convolutional neural network from the second generation of the ConvNeXt family. It was pretrained with the fully convolutional masked autoencoder (FCMAE) framework, then fine-tuned on ImageNet-22k and subsequently on ImageNet-1k. With 88.7M parameters, it reaches a top-1 accuracy of 86.74% on the ImageNet-1k validation set.
Implementation Details
The model is trained at an input resolution of 224x224 pixels and evaluated at 288x288. At the training resolution it requires 15.4 GMACs (giga multiply-accumulate operations) and produces 28.8M activations. It is implemented in PyTorch and distributed through the timm library; a minimal usage sketch follows the list below.
- Efficient architecture combining traditional CNN strengths with modern design principles
- Pretrained using masked autoencoder approach for better feature learning
- Optimized for both performance and computational efficiency
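As a minimal sketch of loading the model through timm and classifying a single image: `timm.create_model`, `timm.data.resolve_model_data_config`, and `timm.data.create_transform` are the standard timm entry points; the image path is a placeholder, and the output shape assumes the default 1000-class head.

```python
import torch
import timm
from PIL import Image

# Load the pretrained model (weights are downloaded on first use).
model = timm.create_model('convnextv2_base.fcmae_ft_in22k_in1k', pretrained=True)
model = model.eval()

# Build the preprocessing pipeline matching the model's pretrained config.
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# 'example.jpg' is a placeholder path to any RGB image.
img = Image.open('example.jpg').convert('RGB')

with torch.no_grad():
    logits = model(transforms(img).unsqueeze(0))  # shape: [1, 1000]

# Report the five most probable ImageNet-1k classes.
top5_prob, top5_idx = torch.topk(logits.softmax(dim=-1), k=5)
print(top5_idx[0].tolist(), top5_prob[0].tolist())
```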
Core Capabilities
- Image classification with 1000 classes
- Feature extraction for downstream tasks
- Generation of image embeddings
- Support for both inference and transfer learning
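The feature-extraction and embedding capabilities listed above map onto timm's standard interfaces. The sketch below assumes the same pretrained checkpoint; the printed feature-map shapes are what the ConvNeXt-Base stage widths (128/256/512/1024 channels at strides 4/8/16/32) imply for a 224x224 input.

```python
import torch
import timm

x = torch.randn(1, 3, 224, 224)  # dummy input at the training resolution

# Multi-scale feature maps for downstream tasks (detection, segmentation, ...).
feature_model = timm.create_model(
    'convnextv2_base.fcmae_ft_in22k_in1k',
    pretrained=True,
    features_only=True,
)
with torch.no_grad():
    feature_maps = feature_model(x)
for fm in feature_maps:
    print(fm.shape)  # [1, 128, 56, 56] ... [1, 1024, 7, 7]

# Pooled image embedding with the classifier head removed.
embed_model = timm.create_model(
    'convnextv2_base.fcmae_ft_in22k_in1k',
    pretrained=True,
    num_classes=0,  # return pooled pre-logits features instead of class logits
)
with torch.no_grad():
    embedding = embed_model(x)
print(embedding.shape)  # [1, 1024]
```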
Frequently Asked Questions
Q: What makes this model unique?
This model combines FCMAE pretraining with two-stage fine-tuning, first on ImageNet-22k and then on ImageNet-1k, offering a strong balance between accuracy and computational cost. It represents a clear improvement over the original ConvNeXt architecture.
Q: What are the recommended use cases?
The model excels at image classification, feature extraction for downstream applications, and generating image embeddings. It is particularly suitable for applications that require high accuracy at a reasonable computational cost.
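For transfer learning, one common pattern (a sketch, not a prescribed recipe from the paper) is to re-create the model with a freshly initialized head sized to the target dataset and fine-tune as usual. `NUM_CLASSES`, the optimizer settings, and the dummy batch below are illustrative assumptions.

```python
import torch
import timm

NUM_CLASSES = 10  # hypothetical target dataset size

# Pretrained backbone with a new classification head for the target task.
model = timm.create_model(
    'convnextv2_base.fcmae_ft_in22k_in1k',
    pretrained=True,
    num_classes=NUM_CLASSES,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f'loss: {loss.item():.4f}')
```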