CogVideoX-2b

Maintained By
THUDM

License: Apache 2.0
Paper: arXiv:2408.06072
Framework: Diffusers
Task: Text-to-Video Generation

What is CogVideoX-2b?

CogVideoX-2b is an entry-level text-to-video generation model designed for efficient video creation with minimal computational requirements. It is the lightweight member of the CogVideoX family, generating six-second videos at 720x480 resolution and 8 frames per second.

Implementation Details

The model runs in FP16 precision and is heavily optimized for low VRAM usage, requiring as little as 4GB when run through diffusers with memory optimizations enabled. It uses 3d_sincos_pos_embed positional encoding and supports multiple precision formats, including FP16, BF16, FP32, and INT8.

  • Inference speed: ~90 seconds on A100, ~45 seconds on H100 (50 steps)
  • VRAM usage: 18GB with SAT, 4GB with diffusers (FP16)
  • Supports English prompts up to 226 tokens
  • Compatible with PytorchAO and Optimum-quanto for quantization

Core Capabilities

  • High-quality video generation from text descriptions
  • Efficient memory management with multiple optimization options
  • Support for various precision formats and quantization methods
  • Multi-GPU inference support
  • Fine-tuning capabilities with LORA and SFT options

Frequently Asked Questions

Q: What makes this model unique?

CogVideoX-2b stands out for its balance between output quality and resource requirements: it delivers solid text-to-video generation while remaining usable on hardware with limited VRAM.

Q: What are the recommended use cases?

The model is ideal for standard text-to-video generation tasks, particularly suited for development and testing environments, content creation, and scenarios where computational resources are limited but quality video generation is still required.
