CogVideoX-5b-I2V

Maintained By: THUDM


Property | Value
Model Size | 5B parameters
License | Custom CogVideoX License
Paper | arXiv:2408.06072
Author | THUDM
Video Resolution | 720 x 480

What is CogVideoX-5b-I2V?

CogVideoX-5b-I2V is THUDM's image-to-video model: given a still image and an English text prompt, it generates a short video with a diffusion transformer. It has 5 billion parameters and produces 6-second clips at 8 frames per second (49 frames) at 720 x 480, aiming to preserve visual quality and stay coherent with the content of the input image.
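The clip length follows from the duration and frame rate: 6 seconds at 8 fps gives 48 frames, plus one leading frame. The sketch below assumes the "8N + 1 frames, N <= 6" rule from the upstream CogVideoX model card; the helper names are hypothetical and not part of any library.

```python
def frames_for_clip(seconds: int, fps: int) -> int:
    """Frame count for a clip of `seconds` at `fps`, plus one leading frame."""
    return seconds * fps + 1


def is_valid_frame_count(num_frames: int, max_n: int = 6) -> bool:
    """Check num_frames == 8N + 1 with 1 <= N <= max_n (assumed constraint)."""
    if num_frames < 1 or (num_frames - 1) % 8 != 0:
        return False
    n = (num_frames - 1) // 8
    return 1 <= n <= max_n


print(frames_for_clip(6, 8))     # 49
print(is_valid_frame_count(49))  # True: N = 6
print(is_valid_frame_count(48))  # False: not of the form 8N + 1
print(is_valid_frame_count(57))  # False: N = 7 exceeds the assumed limit
```

Validating the frame count up front is cheaper than discovering a shape mismatch deep inside the pipeline.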

Implementation Details

The model runs in BF16 precision (recommended) and, with the diffusers memory optimizations enabled, can run on NVIDIA GPUs with as little as about 5 GB of VRAM. Position information is encoded with 3D rotary position embeddings (RoPE) combined with learnable positional embeddings. Text prompts must be in English and are limited to 226 tokens.
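To see what the 3D positional embeddings index, it helps to count the video tokens. This is a sketch under stated assumptions: the 4x temporal and 8x spatial compression of the 3D causal VAE and the spatial patch size of 2 are taken from the CogVideoX paper and transformer config, and the function names are hypothetical. It only enumerates the (t, h, w) indices a 3D RoPE would rotate by; it does not compute the learnable embedding.

```python
from __future__ import annotations

from itertools import product

# Assumed values (CogVideoX paper / 5B transformer config), not verified here.
TEMPORAL_COMPRESSION = 4   # VAE compresses time 4x
SPATIAL_COMPRESSION = 8    # VAE compresses height and width 8x
PATCH_SIZE = 2             # transformer groups 2x2 latent pixels per token
MAX_TEXT_TOKENS = 226      # prompt length limit stated above


def latent_grid(num_frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Latent (T, H, W): the causal VAE keeps the first frame, then groups of 4."""
    t = (num_frames - 1) // TEMPORAL_COMPRESSION + 1
    return t, height // SPATIAL_COMPRESSION, width // SPATIAL_COMPRESSION


def patch_positions(num_frames: int, height: int, width: int) -> list[tuple[int, int, int]]:
    """(t, h, w) index of every video token; a 3D RoPE splits each attention
    head's dimensions into three parts and rotates each part by one axis index."""
    t, h, w = latent_grid(num_frames, height, width)
    return list(product(range(t), range(h // PATCH_SIZE), range(w // PATCH_SIZE)))


positions = patch_positions(49, 480, 720)
print(latent_grid(49, 480, 720))          # (13, 60, 90)
print(len(positions))                     # 17550 video tokens
print(len(positions) + MAX_TEXT_TOKENS)   # 17776 tokens with a full-length prompt
```

A 49-frame, 720 x 480 clip (6 s at 8 fps plus one frame) therefore yields a 13 x 30 x 45 token grid, which is why attention cost, rather than parameter count alone, drives the memory and time figures.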

  • Supports multiple precision formats including BF16, FP16, FP32, and INT8 quantization
  • Minimum VRAM with diffusers memory optimizations: about 5 GB
  • Inference time per video (50 steps): ~180 seconds on a single A100, ~90 seconds on a single H100
  • Integrated VAE tiling and slicing for memory optimization
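A minimal diffusers sketch of the memory optimizations listed above, assuming a recent diffusers release that ships `CogVideoXImageToVideoPipeline` and a CUDA GPU. The function name and the default output path are hypothetical; imports are local so the module loads even where diffusers and torch are not installed.

```python
def generate_video(image_path: str, prompt: str, output_path: str = "output.mp4") -> str:
    """Image-to-video generation with CPU offloading and VAE tiling/slicing (sketch)."""
    import torch
    from diffusers import CogVideoXImageToVideoPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        "THUDM/CogVideoX-5b-I2V",
        torch_dtype=torch.bfloat16,  # BF16 is the recommended precision
    )
    # Keep weights in CPU memory and move each submodule to the GPU only while it runs.
    pipe.enable_sequential_cpu_offload()
    # Decode the latent video in spatial tiles and one batch item at a time.
    pipe.vae.enable_tiling()
    pipe.vae.enable_slicing()

    video = pipe(
        prompt=prompt,
        image=load_image(image_path),
        num_inference_steps=50,
        num_frames=49,  # 6 seconds at 8 fps, plus one leading frame
        guidance_scale=6.0,
        generator=torch.Generator(device="cuda").manual_seed(42),
    ).frames[0]
    export_to_video(video, output_path, fps=8)
    return output_path
```

Sequential offloading gives the lowest VRAM but is the slowest option; on a GPU with more memory, `pipe.enable_model_cpu_offload()` moves whole components instead and runs faster.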

Core Capabilities

  • Image-to-video generation with text guidance
  • 6-second video output at 8 fps
  • 720 x 480 resolution output
  • INT8 quantization via TorchAO
  • Multi-GPU inference support
  • Fine-tuning with LoRA or full supervised fine-tuning (SFT)
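The quantization option can be sketched as INT8 weight-only quantization applied to each heavy component before the pipeline is assembled. This assumes a torchao release that still exports `quantize_` and `int8_weight_only` (newer releases favor config objects) and a diffusers version with the CogVideoX classes; the function name is hypothetical.

```python
def load_int8_pipeline():
    """Load CogVideoX-5b-I2V with TorchAO INT8 weight-only quantization (sketch)."""
    import torch
    from diffusers import (
        AutoencoderKLCogVideoX,
        CogVideoXImageToVideoPipeline,
        CogVideoXTransformer3DModel,
    )
    from torchao.quantization import int8_weight_only, quantize_
    from transformers import T5EncoderModel

    repo = "THUDM/CogVideoX-5b-I2V"
    dtype = torch.bfloat16

    # Quantize each component's weights in place, then hand them to the pipeline.
    text_encoder = T5EncoderModel.from_pretrained(repo, subfolder="text_encoder", torch_dtype=dtype)
    quantize_(text_encoder, int8_weight_only())
    transformer = CogVideoXTransformer3DModel.from_pretrained(repo, subfolder="transformer", torch_dtype=dtype)
    quantize_(transformer, int8_weight_only())
    vae = AutoencoderKLCogVideoX.from_pretrained(repo, subfolder="vae", torch_dtype=dtype)
    quantize_(vae, int8_weight_only())

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        repo, text_encoder=text_encoder, transformer=transformer, vae=vae, torch_dtype=dtype
    )
    # Quantization combines with offloading and VAE tiling/slicing for a lower peak.
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()
    pipe.vae.enable_slicing()
    return pipe
```

Weight-only INT8 roughly halves weight storage relative to BF16 while keeping activations in BF16, which is usually a safer first step than quantizing activations as well.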

Frequently Asked Questions

Q: What makes this model unique?

It pairs image-conditioned video generation with 3D positional embeddings (RoPE plus learnable embeddings) in a 5B-parameter model, and with CPU offloading, VAE tiling and slicing, and optional INT8 quantization it can run on GPUs with as little as about 5 GB of VRAM.

Q: What are the recommended use cases?

This model is ideal for creating short animated sequences from still images, content creation, prototyping video concepts, and research applications in computer vision and AI-driven video generation.
