Mitsua Diffusion One

Property	Value
License	Mitsua Open RAIL-M
Model Type	Text-to-Image Diffusion
Training Data	11M Public Domain/CC0 Images
Architecture	Latent Diffusion with OpenCLIP ViT-H/14

What is mitsua-diffusion-one?

Mitsua Diffusion One is a text-to-image latent diffusion model trained entirely from scratch on public domain and properly licensed content only. It succeeds Mitsua Diffusion CC0 and is designed to support the creative activities of the AI VTuber Elan Mitsua.

Implementation Details

The model was trained with progressive resolution scaling, starting at 256x256 and moving up to 512x512, 768x512, and 512x768. The Diffusion With Offset Noise technique was applied in the final training stage, over the last 12k steps with p=0.02. The text encoder is OpenCLIP ViT-H/14, which is released under the MIT License.

Trained on approximately 11M images with data augmentation
Uses multiple aspect ratio training for versatility
Implements Diffusion With Offset Noise for enhanced quality
Available in both fine-tuned and base versions
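
The offset noise idea mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not the model's actual training code: in addition to independent per-pixel Gaussian noise, one extra Gaussian value per channel is drawn and added to every pixel of that channel. The function name offset_noise and its parameters are hypothetical. The sketch passes 0.02 as the offset strength, which is an assumption; the source does not say whether p=0.02 is a strength or a probability.

```python
import random


def offset_noise(channels, height, width, strength, rng):
    """Return noise as nested lists [channel][row][col].

    Each value is standard Gaussian noise plus a per-channel offset that is
    shared by every pixel in that channel. A real training loop would use
    tensors; plain lists keep this sketch dependency-free.
    """
    noise = []
    for _ in range(channels):
        # One scalar per channel, scaled by the (assumed) offset strength.
        offset = strength * rng.gauss(0.0, 1.0)
        channel = [
            [rng.gauss(0.0, 1.0) + offset for _ in range(width)]
            for _ in range(height)
        ]
        noise.append(channel)
    return noise


# Hypothetical usage: 4 latent channels on an 8x8 latent grid.
rng = random.Random(0)
sample = offset_noise(channels=4, height=8, width=8, strength=0.02, rng=rng)
```

Because the offset is constant across a channel, it shifts the mean of the noise for that channel, which is what distinguishes this from plain per-pixel Gaussian noise.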

Core Capabilities

Text-to-image generation from a model trained only on public domain and licensed data
Multi-resolution output support
Specialized for artistic and creative applications
Full compatibility with the diffusers pipeline
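
Since the model is described as compatible with the diffusers pipeline, loading it might look like the sketch below. Several things here are assumptions rather than facts from the source: the repository id "Mitsua/mitsua-diffusion-one", the reading of the listed resolutions as (width, height) pairs, and the helper names check_size and generate. The pipeline class that DiffusionPipeline.from_pretrained resolves to depends on the repository's own configuration.

```python
# Resolutions named in the text, read here as (width, height).
# That reading is an assumption; the source does not state the order.
TRAINED_SIZES = [(512, 512), (768, 512), (512, 768)]


def check_size(width, height):
    """Return (width, height) if it is one of the sizes listed above."""
    if (width, height) not in TRAINED_SIZES:
        raise ValueError(f"{width}x{height} is not in {TRAINED_SIZES}")
    return width, height


def generate(prompt, width=512, height=512,
             model_id="Mitsua/mitsua-diffusion-one"):  # assumed repo id
    """Generate one image; downloads the model weights on first use."""
    check_size(width, height)
    # Imported lazily so check_size works without diffusers installed.
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_id)
    result = pipe(prompt, width=width, height=height)
    return result.images[0]


# Hypothetical usage (requires diffusers, torch, and network access):
# image = generate("a watercolor landscape", width=768, height=512)
# image.save("sample.png")
```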

Frequently Asked Questions

Q: What makes this model unique?

The model is trained from scratch using only public domain and properly licensed content, rather than being fine-tuned from an existing model. Its reported cosine similarity scores with existing models are low (0.07-0.20), which is consistent with independent training.
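
For readers unfamiliar with the metric, cosine similarity between two vectors can be computed as in the sketch below. The source does not say which vectors were compared or how they were extracted from the models, so this shows only the metric itself, on made-up inputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Made-up example vectors, purely for illustration.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate little alignment.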

Q: What are the recommended use cases?

The model is primarily designed for creative applications, particularly AI VTuber activities. Its output is currently described as relatively low in quality and lacking in diversity, so it is better suited to experimental or development use.