Cosmos-Tokenizer-CI8x8

Maintained By
nvidia

Property            Value
Developer           NVIDIA
License             NVIDIA Open Model License
Parameters          77M
Compression Ratio   8x8
Processing Time     62.7 ms per 1024x1024 image

What is Cosmos-Tokenizer-CI8x8?

Cosmos-Tokenizer-CI8x8 is part of NVIDIA's suite of visual tokenizers designed for high-quality image compression. This continuous image tokenizer compresses each spatial dimension by a factor of 8 (8x8 spatial compression) while maintaining high reconstruction quality, and it is reported to outperform state-of-the-art alternatives in both speed and fidelity.
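To make the 8x8 figure concrete, the short sketch below computes the latent grid size for a given input resolution. It is plain arithmetic, not a call into the tokenizer: the function name `latent_grid` is hypothetical, and the requirement that the input dimensions be divisible by 8 is an assumption made for the example.

```python
def latent_grid(height: int, width: int, factor: int = 8) -> tuple:
    """Return the (height, width) of the latent grid after spatial compression.

    Assumption for this sketch: both dimensions are exact multiples of `factor`.
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the factor")
    return height // factor, width // factor


latent_h, latent_w = latent_grid(1024, 1024)
print(latent_h, latent_w)  # 128 128

# Each latent position summarizes an 8x8 patch, i.e. 64 input pixel positions.
print((1024 * 1024) // (latent_h * latent_w))  # 64
```

The number of channels stored at each latent position is a separate property of the model and is not covered by this calculation.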

Implementation Details

The model employs a symmetrical encoder-decoder architecture with a 2-level Haar wavelet transform layer for efficient down-sampling. It operates in BF16 precision on NVIDIA Ampere and Hopper GPUs, processing images with resolutions from 256px up to 4K.
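As a rough illustration of how a 2-level Haar wavelet transform can down-sample an image, here is a minimal NumPy sketch. It is not NVIDIA's implementation: the function names are hypothetical, and the choice to apply the second level to every sub-band (rather than only the low-pass band) is an assumption made so that the output is a single tensor with 4x smaller height and width.

```python
import numpy as np


def haar_level(x: np.ndarray) -> np.ndarray:
    """Apply one orthonormal 2D Haar level to a (C, H, W) array.

    Returns a (4 * C, H / 2, W / 2) array holding four sub-bands per channel.
    """
    if x.shape[1] % 2 or x.shape[2] % 2:
        raise ValueError("height and width must be even")
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0         # scaled block average (low-pass)
    left_right = (a - b + c - d) / 2.0  # left column minus right column
    top_bottom = (a + b - c - d) / 2.0  # top row minus bottom row
    diagonal = (a - b - c + d) / 2.0    # diagonal difference
    return np.concatenate([low, left_right, top_bottom, diagonal], axis=0)


def haar_two_level(x: np.ndarray) -> np.ndarray:
    """Apply two Haar levels, reducing height and width by 4x in total."""
    return haar_level(haar_level(x))


rng = np.random.default_rng(0)
image = rng.random((3, 256, 256))  # stand-in for an RGB image
coeffs = haar_two_level(image)
print(coeffs.shape)  # (48, 64, 64)
```

Because the transform is orthonormal, it rearranges the pixel information into more channels at lower resolution without discarding any of it; in a tokenizer of this kind, the learned encoder layers would account for the remaining compression.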

  • Lightweight and computationally efficient architecture
  • Supports both PyTorch and NeMo frameworks
  • Achieves a PSNR of 32.98 dB and an SSIM of 0.836 on the MS-COCO dataset
  • Reported to run up to 12x faster than comparable tokenizers
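For readers unfamiliar with the PSNR metric quoted above, the sketch below shows the standard definition, PSNR = 10 * log10(MAX^2 / MSE), in NumPy. The function name `psnr` and the assumption that pixel values lie in [0, 1] are choices made for this example; it does not reproduce the evaluation protocol behind the reported MS-COCO numbers.

```python
import numpy as np


def psnr(reference, reconstruction, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    if reference.shape != reconstruction.shape:
        raise ValueError("inputs must have the same shape")
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy example: a constant error of 0.1 gives an MSE of 0.01, i.e. 20 dB.
reference = np.zeros((3, 64, 64))
reconstruction = np.full((3, 64, 64), 0.1)
print(round(psnr(reference, reconstruction), 2))  # 20.0
```

Higher PSNR means a smaller average pixel error. SSIM is a separate, structure-aware metric and is not implemented here.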

Core Capabilities

  • High-quality image compression and reconstruction
  • Fast processing speed (62.7 ms per 1024x1024 image)
  • Flexible resolution support (256px to 4K)
  • Compatible with diffusion-based and autoregressive models
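The per-image latency above can be converted into an approximate throughput with simple arithmetic, as sketched below. This assumes images are processed one at a time at the quoted latency; batched throughput and results on other hardware may differ, and the variable names are illustrative only.

```python
LATENCY_MS = 62.7            # quoted time per 1024x1024 image
PIXELS_PER_IMAGE = 1024 * 1024

# Images per second when images are processed back to back.
images_per_second = 1000.0 / LATENCY_MS

# Equivalent pixel throughput in megapixels per second.
megapixels_per_second = images_per_second * PIXELS_PER_IMAGE / 1e6

print(f"{images_per_second:.1f} images/s")   # 15.9 images/s
print(f"{megapixels_per_second:.1f} MP/s")   # 16.7 MP/s
```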

Frequently Asked Questions

Q: What makes this model unique?

The model combines high reconstruction quality, fast processing, and a lightweight 77M-parameter architecture. According to NVIDIA, the Cosmos Tokenizer suite offers up to 8x more total compression than SOTA methods while maintaining higher image quality.

Q: What are the recommended use cases?

The model is ideal for image generation pipelines, particularly in diffusion models like Stable Diffusion, where high-quality image tokenization is crucial for downstream tasks.
