vae-kl-f8-d16

Maintained by: ostris

Ostris VAE - KL-f8-d16

  • Parameter Count: 57,266,643
  • License: MIT
  • Library: Diffusers
  • PSNR Score: 31.166
  • LPIPS Score: 0.0198
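
PSNR and LPIPS measure reconstruction quality (higher PSNR and lower LPIPS are better). The model card does not state the exact evaluation protocol behind these numbers, so the snippet below is only a sketch of the conventional PSNR definition, assuming images scaled to [0, 1].

```python
# Sketch: conventional PSNR (in dB) between an original image and its VAE
# reconstruction. Assumes both tensors are floats scaled to [0, 1]; the exact
# protocol behind the reported 31.166 figure is not specified on the model card.
import torch

def psnr(original: torch.Tensor, reconstruction: torch.Tensor, max_val: float = 1.0) -> float:
    mse = torch.mean((original - reconstruction) ** 2)
    return (10 * torch.log10(max_val ** 2 / mse)).item()
```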

What is vae-kl-f8-d16?

vae-kl-f8-d16 is a lightweight variational autoencoder (VAE) designed for efficient image processing. It uses a 16-channel latent space with 8x spatial downsampling and was trained from scratch on a diverse dataset including photos, artistic works, text, cartoons, and vector images. The model achieves performance metrics comparable to larger VAEs while using significantly fewer parameters.

Implementation Details

The model employs a streamlined architecture with 57.2M parameters, considerably fewer than standard VAEs such as the SD3 VAE (83.8M parameters). Despite its smaller size, it achieves a PSNR of 31.166 and an LPIPS of 0.0198, nearly matching SD3's reconstruction quality. A hedged loading sketch follows the list below.

  • Optimized 16-channel architecture
  • 8x downsampling capability
  • Compatible with Diffusers library
  • Requires adapter implementation for SD1.5 usage
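
Building on the Diffusers compatibility noted above, the snippet below shows one plausible way to load the VAE and run an encode/decode roundtrip. The repository id "ostris/vae-kl-f8-d16" is assumed from the model name; verify the actual Hugging Face path before use.

```python
# Sketch: load the VAE with Diffusers and check latent shapes.
# Assumption: the weights live at "ostris/vae-kl-f8-d16" on Hugging Face.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("ostris/vae-kl-f8-d16", torch_dtype=torch.float16)
vae.to("cuda").eval()

# Dummy 512x512 RGB batch scaled to [-1, 1], the range Diffusers VAEs expect.
pixels = torch.rand(1, 3, 512, 512, device="cuda", dtype=torch.float16) * 2 - 1

with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample()
    recon = vae.decode(latents).sample

# 8x downsampling and 16 latent channels: expect latents of shape (1, 16, 64, 64).
print(latents.shape, recon.shape)
```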

Core Capabilities

  • Efficient VRAM usage thanks to the lightweight design (see the parameter-count sketch after this list)
  • High-quality image reconstruction comparable to larger models
  • Versatile application across various image types
  • Full compatibility with the Diffusers pipeline
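
To make the size claim concrete, a quick check of the parameter count and approximate fp16 weight footprint might look like the following, reusing the vae object from the earlier snippet.

```python
# Sketch: verify the advertised parameter count and estimate fp16 weight size.
# Assumes `vae` was loaded as in the previous snippet.
n_params = sum(p.numel() for p in vae.parameters())
print(f"parameters: {n_params:,}")    # expect roughly 57.3M

fp16_mib = n_params * 2 / 1024**2     # 2 bytes per parameter in fp16
print(f"approx. fp16 weight size: {fp16_mib:.1f} MiB")
```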

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for achieving near SD3-level reconstruction metrics while using roughly 32% fewer parameters, which translates into faster processing and lower VRAM usage. Its MIT license also permits commercial and derivative use with minimal restrictions.

Q: What are the recommended use cases?

The model is primarily intended for developers who want a lightweight VAE in their projects. It must first be trained into a diffusion network (or paired with an adapter) before practical use, and it is particularly suitable for SD 1.5, SDXL, and potentially PixArt implementations.
