Cosmos-Tokenizer-CI8x8
| Property | Value |
|---|---|
| Developer | NVIDIA |
| License | NVIDIA Open Model License |
| Parameters | 77M |
| Compression Ratio | 8x8 |
| Processing Time | 62.7 ms per 1024x1024 image |
What is Cosmos-Tokenizer-CI8x8?
Cosmos-Tokenizer-CI8x8 is part of NVIDIA's suite of visual tokenizers designed for high-quality image compression. This continuous image tokenizer compresses images by a factor of 8 along each spatial dimension (8x8) while maintaining high reconstruction quality, and NVIDIA reports that it outperforms state-of-the-art alternatives in both speed and fidelity.
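To make the 8x8 figure concrete, here is a minimal sketch of the arithmetic behind it. The helper names `latent_grid` and `spatial_reduction` are hypothetical and not part of any NVIDIA library, and the requirement that image sides be multiples of 8 is an assumption made to keep the example simple.

```python
from typing import Tuple


def latent_grid(height: int, width: int, factor: int = 8) -> Tuple[int, int]:
    """Latent grid size for an image compressed `factor`x along each spatial axis.

    Assumption: both sides must be multiples of `factor`.
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the compression factor")
    return height // factor, width // factor


def spatial_reduction(factor: int = 8) -> int:
    """Number of input pixel positions represented by one latent position."""
    return factor * factor


h, w = latent_grid(1024, 1024)
print(h, w)                 # 128 128
print(spatial_reduction())  # 64
```

A 1024x1024 image therefore maps to a 128x128 latent grid, with each latent position covering an 8x8 patch of pixels. The number of channels per latent position is not stated here, so the sketch leaves it out.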
Implementation Details
The model employs a symmetrical encoder-decoder architecture with a 2-level Haar wavelet transform layer for efficient down-sampling. It operates in BF16 precision on NVIDIA Ampere and Hopper GPUs, processing images with resolutions from 256px up to 4K.
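As an illustration of what a 2-level Haar wavelet down-sampling step does, the following NumPy sketch applies an orthonormal 2D Haar transform twice. It is not NVIDIA's implementation: the function names are hypothetical, and re-applying the transform to all sub-bands at the second level (rather than only the low-pass band) is an assumption made for simplicity.

```python
import numpy as np


def haar_level(x: np.ndarray) -> np.ndarray:
    """One 2D Haar level on a (C, H, W) array; returns (4*C, H/2, W/2)."""
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0        # block average (orthonormal scaling)
    col_diff = (a - b + c - d) / 2.0   # left-minus-right detail
    row_diff = (a + b - c - d) / 2.0   # top-minus-bottom detail
    diag = (a - b - c + d) / 2.0       # diagonal detail
    return np.concatenate([low, col_diff, row_diff, diag], axis=0)


def haar_two_level(x: np.ndarray) -> np.ndarray:
    """Apply the Haar level twice (assumption: all sub-bands are transformed again)."""
    return haar_level(haar_level(x))


image = np.random.default_rng(0).standard_normal((3, 64, 64))
coeffs = haar_two_level(image)
print(coeffs.shape)  # (48, 16, 16)
```

Under these assumptions, two levels reduce each spatial side by 4x and multiply the channel count by 16 without discarding information, since the transform is orthonormal. The remaining factor needed to reach 8x would then come from the learned encoder layers, which this sketch does not model.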
- Lightweight and computationally efficient architecture
- Supports both PyTorch and NeMo frameworks
- Achieves a PSNR of 32.98 dB and an SSIM of 0.836 on the MS-COCO dataset
- Up to 12x faster than comparable tokenizers
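For readers unfamiliar with the PSNR figure quoted in the list above, here is a minimal sketch of how the metric is conventionally computed between an original image and its reconstruction. The function name `psnr` and the 8-bit peak value of 255 are assumptions for illustration; the exact evaluation protocol behind the reported number is not given here.

```python
import numpy as np


def psnr(reference, reconstruction, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)  # mean squared error over all pixels
    if mse == 0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


original = np.zeros((4, 4), dtype=np.uint8)
off_by_one = np.ones((4, 4), dtype=np.uint8)
print(round(psnr(original, off_by_one), 2))
```

Higher values mean the reconstruction is closer to the original. Because the scale is logarithmic, a gain of roughly 3 dB corresponds to halving the mean squared error.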
Core Capabilities
- High-quality image compression and reconstruction
- Fast processing speed (62.7ms per 1024x1024 image)
- Flexible resolution support (256px to 4K)
- Compatible with diffusion-based and autoregressive models
Frequently Asked Questions
Q: What makes this model unique?
The model combines high reconstruction quality, fast processing, and a lightweight architecture. NVIDIA reports that the Cosmos Tokenizer suite reaches up to 8x more total compression than prior state-of-the-art (SOTA) methods while maintaining higher image quality.
Q: What are the recommended use cases?
The model is ideal for image generation pipelines, particularly in diffusion models like Stable Diffusion, where high-quality image tokenization is crucial for downstream tasks.