Cosmos-Tokenizer-CI8x8

Property	Value
Developer	NVIDIA
License	NVIDIA Open Model License
Parameters	77M
Compression Ratio	8x8
Processing Time	62.7 ms per 1024x1024 image

What is Cosmos-Tokenizer-CI8x8?

Cosmos-Tokenizer-CI8x8 is the continuous image tokenizer in NVIDIA's suite of visual tokenizers designed for high-quality image compression. It compresses an image by a factor of 8 along each spatial dimension (8x8) into continuous latents while maintaining high reconstruction quality, and NVIDIA reports that it outperforms comparable tokenizers in both speed and fidelity.
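To make the 8x8 figure concrete, here is a minimal sketch of the arithmetic behind it. The helper names `latent_grid` and `spatial_reduction` are hypothetical and are not part of any NVIDIA API. The sketch also assumes image dimensions divisible by 8 and does not model whatever padding or cropping the real tokenizer applies.

```python
def latent_grid(height, width, factor=8):
    """Return (rows, cols) of the latent grid for an image of the given size.

    Hypothetical helper: it only illustrates what an 8x8 spatial
    compression ratio means and does not call the model itself.
    """
    if height % factor or width % factor:
        # Assumption of this sketch: inputs are multiples of the factor.
        raise ValueError("height and width must be multiples of the factor")
    return height // factor, width // factor


def spatial_reduction(factor=8):
    """Number of input pixel positions represented by one latent position."""
    return factor * factor


if __name__ == "__main__":
    for h, w in [(256, 256), (1024, 1024), (2160, 3840)]:
        print((h, w), "->", latent_grid(h, w))
    print("pixel positions per latent position:", spatial_reduction())
```

Under these assumptions a 1024x1024 image maps to a 128x128 latent grid, so each latent position stands in for 64 pixel positions. The number of channels per latent position is a separate property of the model and is not covered here.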

Implementation Details

The model employs a symmetrical encoder-decoder architecture with a 2-level Haar wavelet transform layer for efficient down-sampling. It operates in BF16 precision on NVIDIA Ampere and Hopper GPUs, processing images with resolutions from 256px up to 4K.
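As an illustration of how a Haar wavelet step can down-sample without discarding information, the following is a small pure-Python sketch. It is not NVIDIA's layer: the function names are hypothetical, it works on a single-channel list-of-lists image, and it assumes that each level is applied to every sub-band, so that two levels turn one H x W plane into 16 planes of size H/4 x W/4.

```python
def haar_2d(x):
    """One level of an orthonormal 2D Haar transform.

    x is a list of rows with even height and width. Returns four
    half-resolution sub-bands: low-pass, column difference,
    row difference, and diagonal difference.
    """
    low, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, len(x), 2):
        r_low, r_col, r_row, r_diag = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            # The 1/2 factor makes the 2x2 block transform orthonormal.
            r_low.append((a + b + c + d) / 2)
            r_col.append((a - b + c - d) / 2)
            r_row.append((a + b - c - d) / 2)
            r_diag.append((a - b - c + d) / 2)
        low.append(r_low)
        col_diff.append(r_col)
        row_diff.append(r_row)
        diag.append(r_diag)
    return [low, col_diff, row_diff, diag]


def haar_downsample(x, levels=2):
    """Apply haar_2d repeatedly to every sub-band (an assumption of this sketch)."""
    bands = [x]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_2d(band)]
    return bands


if __name__ == "__main__":
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    bands = haar_downsample(image, levels=2)
    print(len(bands), "sub-bands of size", len(bands[0]), "x", len(bands[0][0]))
```

Each level halves the height and width and quadruples the number of planes, so the total number of values is unchanged. In an encoder of this kind, the remaining spatial reduction would come from learned layers after the wavelet step.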

Lightweight and computationally efficient architecture
Supports both PyTorch and NeMo frameworks
Achieves a PSNR of 32.98 dB and an SSIM of 0.836 on the MS-COCO dataset
Up to 12x faster than comparable tokenizers, according to NVIDIA
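For readers unfamiliar with the PSNR metric quoted above, it is defined as 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the original and reconstructed pixels and MAX is the largest possible pixel value. The sketch below computes it in pure Python. The function name `psnr` and the 8-bit default of 255 are choices made for this illustration, and the exact evaluation protocol behind the MS-COCO figure is not specified here.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images.

    Images are lists of rows of pixel values. max_value is the peak
    pixel value (255 for 8-bit images, an assumption of this sketch).
    """
    flat_a = [p for row in original for p in row]
    flat_b = [p for row in reconstructed for p in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf  # identical images: no reconstruction error
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [[100, 100], [100, 100]]
    reconstructed = [[110, 110], [110, 110]]
    print(f"PSNR: {psnr(original, reconstructed):.2f} dB")
```

Higher values mean the reconstruction is closer to the original. Because the scale is logarithmic, a gain of 10 dB corresponds to a tenfold reduction in mean squared error.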

Core Capabilities

High-quality image compression and reconstruction
Fast processing speed (62.7 ms per 1024x1024 image)
Flexible resolution support (256px to 4K)
Compatible with diffusion-based and autoregressive models

Frequently Asked Questions

Q: What makes this model unique?

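The model's combination of high reconstruction quality, fast processing speed, and efficient architecture sets it apart. NVIDIA reports that the Cosmos Tokenizer suite as a whole achieves up to 8x more total compression than SOTA methods while maintaining higher image quality. This model's own spatial compression ratio is 8x8.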

Q: What are the recommended use cases?

The model is well suited to image generation pipelines, particularly latent diffusion models in the style of Stable Diffusion, where the quality of the continuous image tokenizer limits the quality of everything generated downstream.