japanese-cloob-vit-b-16

Maintained By
rinna

Property          Value
Parameter Count   197M
License           Apache 2.0
Paper             CLOOB Paper
Author            rinna
Architecture      ViT-B/16 + BERT-12

What is japanese-cloob-vit-b-16?

japanese-cloob-vit-b-16 is a Japanese vision-language model developed by rinna Co., Ltd. and trained with CLOOB (Contrastive Leave One Out Boost). It pairs a ViT-B/16 vision transformer as the image encoder with a 12-layer BERT as the text encoder, so that images and Japanese text are embedded in a shared space and can be compared directly.

Implementation Details

The model has two components: a Vision Transformer (ViT-B/16) image encoder, initialized from the AugReg model, and a 12-layer BERT text encoder for Japanese. It was trained on the CC12M dataset with the captions translated into Japanese.

  • Vision Encoder: ViT-B/16 architecture with patch size 16
  • Text Encoder: 12-layer BERT for Japanese text
  • Training Dataset: CC12M with Japanese translations
  • Framework Support: PyTorch
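
To make the "patch size 16" figure concrete, the sketch below computes how many tokens a ViT-style encoder sees for one image. The 224x224 input resolution and the single class token are assumptions taken from the standard ViT-B/16 setup, not details stated on this page, and `vit_sequence_length` is a hypothetical helper name.

```python
def vit_sequence_length(image_size: int, patch_size: int = 16, class_token: bool = True) -> int:
    """Number of tokens a ViT sees for a square image of side `image_size`."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # Standard ViT prepends one learnable class token (assumed here).
    return num_patches + (1 if class_token else 0)


# Assumed input resolution of 224x224: 14 x 14 = 196 patches, plus 1 class token.
num_tokens = vit_sequence_length(224)
print(num_tokens)
```

Under these assumptions each image becomes a sequence of 197 tokens. This token count is unrelated to the 197M parameter count listed above.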

Core Capabilities

  • Japanese text-image alignment and understanding
  • Feature extraction for both images and text
  • Zero-shot image classification with Japanese labels
  • Cross-modal similarity computation
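
As a rough illustration of how zero-shot classification follows from cross-modal similarity, the sketch below scores one image embedding against several label embeddings using cosine similarity and a softmax. The three-dimensional vectors are toy stand-ins for encoder outputs, the scale of 100 is an assumed logit multiplier, and `zero_shot_probs` is a hypothetical helper; none of this is the model's own API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and each label."""
    img = l2_normalize(image_emb)
    logits = [
        scale * sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in text_embs
    ]
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (assumed values); real ones would come from the two encoders.
labels = ["犬", "猫", "象"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best_label = labels[probs.index(max(probs))]
print(best_label, probs)
```

With a real model, `image_emb` would come from the image encoder and `text_embs` from the text encoder applied to Japanese label prompts.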

Frequently Asked Questions

Q: What makes this model unique?

This model targets Japanese image-text tasks and is trained with the CLOOB method rather than the contrastive objective used by standard CLIP models. Both of its encoders are transformers: a ViT-B/16 for images and a 12-layer BERT for Japanese text.
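
One way to see the difference from CLIP is to compare the two training objectives on a small similarity matrix. CLIP uses the InfoNCE loss, while CLOOB uses InfoLOOB, which leaves the matched pair out of the denominator. The sketch below is a simplified, one-direction (image to text) version with toy similarity values and an assumed temperature `tau`; it omits the Hopfield retrieval step that CLOOB also uses, and the function names are hypothetical.

```python
import math


def info_nce(sim, tau=0.1):
    """InfoNCE: the matched pair also appears in the denominator."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = math.exp(sim[i][i] / tau)
        denom = sum(math.exp(sim[i][j] / tau) for j in range(n))
        total += -math.log(pos / denom)
    return total / n


def info_loob(sim, tau=0.1):
    """InfoLOOB: the matched pair is left out of the denominator."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = math.exp(sim[i][i] / tau)
        denom = sum(math.exp(sim[i][j] / tau) for j in range(n) if j != i)
        total += -math.log(pos / denom)
    return total / n


# sim[i][j]: toy similarity between image i and text j; the diagonal holds matched pairs.
sim = [
    [0.9, 0.1, 0.2],
    [0.0, 0.8, 0.1],
    [0.2, 0.1, 0.7],
]
print(info_nce(sim), info_loob(sim))
```

Because the positive term is removed from the denominator, InfoLOOB does not saturate at zero the way InfoNCE does once matched pairs are well separated, so it keeps pushing matched and unmatched pairs apart.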

Q: What are the recommended use cases?

The model is suited to Japanese-language applications that need image-text alignment, including image search with Japanese queries, zero-shot image classification with Japanese labels, and cross-modal content analysis in a Japanese context.
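
For the image-search use case, a minimal sketch could rank a small index of precomputed image embeddings by cosine similarity to a query embedding. The file names, vectors, and the `search_images` helper are all hypothetical; in practice the query vector would be the text encoder's output for a Japanese query and the index would hold image encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_images(query_emb, image_index, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in image_index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index of image embeddings (assumed values).
image_index = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.1],
    "elephant.jpg": [0.0, 0.2, 0.9],
}
# Stands in for the text embedding of a Japanese query such as "犬の写真".
query_emb = [1.0, 0.0, 0.1]

results = search_images(query_emb, image_index)
print(results)
```

For larger collections the same ranking would normally be done with batched matrix operations or an approximate nearest-neighbor index rather than a Python loop.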
