whisper-ja-anime-v0.1

Property	Value
Author	efwkjn
Training Duration	~160 hours on NVIDIA 3060
Model URL	HuggingFace Repository
Architecture	Modified Whisper with frozen turbo encoder + 2 decoder layers

What is whisper-ja-anime-v0.1?

whisper-ja-anime-v0.1 is a speech recognition model specialized for Japanese transcription, with a focus on anime-related audio. It was trained for 2^19 (524,288) steps at a batch size of 8 on several datasets, including OOPPEENN, Reazon, and Common Voice 19.
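To put the training figures in perspective, the arithmetic can be worked through as a rough sketch. It assumes one optimizer step per batch (no gradient accumulation) and treats the ~160 hours in the table as exact; neither assumption is confirmed by the source, and the function name is illustrative only.

```python
# Figures quoted in the text and the property table above.
TRAINING_STEPS = 2 ** 19   # 524,288 steps
BATCH_SIZE = 8
TRAINING_HOURS = 160       # approximate, per the table


def training_budget(steps, batch_size, hours):
    """Derive rough totals from the published training figures.

    Assumes one optimizer step per batch (no gradient accumulation).
    """
    seconds = hours * 3600
    return {
        "steps": steps,
        # Counts samples with repetition, not unique utterances.
        "samples_seen": steps * batch_size,
        "steps_per_second": steps / seconds,
    }


budget = training_budget(TRAINING_STEPS, BATCH_SIZE, TRAINING_HOURS)
print(f"steps:          {budget['steps']:,}")
print(f"samples seen:   {budget['samples_seen']:,}")
print(f"steps / second: {budget['steps_per_second']:.2f}")
```

Under these assumptions the run processes about 4.2 million samples at a little under one step per second.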

Implementation Details

The model pairs a frozen turbo encoder with two decoder layers. Training mixed example formats: 50% of examples included a prompt and 25% omitted timestamps. The model is noted as potentially undertrained, but it shows promising performance on domain-specific tasks.
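One way such a mixed strategy could be implemented is to draw the format of each training example at random. The sketch below is not the author's training code: it assumes the two proportions are applied as independent per-example draws, which the source does not specify, and all names are hypothetical.

```python
import random

# Proportions quoted in the text above.
PROMPT_RATE = 0.50        # share of examples that include a prompt
NO_TIMESTAMP_RATE = 0.25  # share of examples that omit timestamps


def sample_example_format(rng):
    """Pick the format of one training example.

    Assumption: the two choices are drawn independently of each other.
    """
    return {
        "with_prompt": rng.random() < PROMPT_RATE,
        "without_timestamps": rng.random() < NO_TIMESTAMP_RATE,
    }


def observed_rates(n, seed=0):
    """Sample n formats and report how often each option was chosen."""
    rng = random.Random(seed)
    formats = [sample_example_format(rng) for _ in range(n)]
    return {
        "with_prompt": sum(f["with_prompt"] for f in formats) / n,
        "without_timestamps": sum(f["without_timestamps"] for f in formats) / n,
    }


rates = observed_rates(20_000)
print(rates)
```

Over many examples the observed shares converge on the configured 50% and 25%.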

Optimized for Japanese transcription in anime-adjacent domains
Reduced hallucination compared to baseline models
Support for beam search decoding and for decoding without timestamps
Validated across multiple anime and general Japanese speech datasets
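The source does not say which metric was used for validation. Character error rate (CER) is a common choice for Japanese, since the language is not whitespace-delimited, so the following is a minimal sketch of how such a comparison might be scored under that assumption. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    ref = "".join(reference.split())   # drop whitespace before scoring
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


print(cer("こんにちは", "こんにちわ"))   # one substitution in five characters
print(cer("はい", "はいはいはい"))       # repeated output adds insertions
```

Because insertions count as errors, repeated or invented text raises the score, and CER can exceed 1.0 when the output is much longer than the reference.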

Core Capabilities

Domain-specific transcription with improved accuracy for anime content
Flexible beam size options (1 and 5) for different accuracy requirements
Competitive performance against turbo models in domain-specific tasks
Effective handling of various Japanese speech patterns and accents
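The decoding options listed above can be collected into a small helper. This is a sketch under stated assumptions: the option names follow the `transcribe` parameters of the faster-whisper library (`language`, `task`, `beam_size`, `without_timestamps`), the checkpoint is assumed to be available in a format that library can load, and the helper names are hypothetical. The beam-size check simply mirrors the two values named in this document; other values may also work.

```python
SUPPORTED_BEAM_SIZES = (1, 5)  # the two settings named in this document


def build_decode_options(beam_size=5, timestamps=True):
    """Build keyword arguments for a Whisper-style transcribe call."""
    if beam_size not in SUPPORTED_BEAM_SIZES:
        raise ValueError(f"beam_size must be one of {SUPPORTED_BEAM_SIZES}")
    return {
        "language": "ja",
        "task": "transcribe",
        "beam_size": beam_size,
        "without_timestamps": not timestamps,
    }


def transcribe(audio_path, model_path, beam_size=5, timestamps=True):
    """Transcribe one file; model_path is a local checkpoint directory.

    Assumption: the checkpoint is in a format faster-whisper can load.
    """
    from faster_whisper import WhisperModel  # imported lazily; optional dependency

    model = WhisperModel(model_path)
    options = build_decode_options(beam_size, timestamps)
    segments, _info = model.transcribe(audio_path, **options)
    return "".join(segment.text for segment in segments)


fast = build_decode_options(beam_size=1, timestamps=False)
accurate = build_decode_options(beam_size=5)
print(fast)
print(accurate)
```

A beam size of 1 favors speed, while a beam size of 5 explores more candidate transcripts at higher cost.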

Frequently Asked Questions

Q: What makes this model unique?

The model specializes in anime-domain Japanese transcription while keeping hallucination rates low. It also remains a drop-in replacement for standard Whisper models.

Q: What are the recommended use cases?

This model is best suited for transcribing Japanese audio from anime and related content. It performs particularly well on domain-specific datasets such as Blue Archive, Genshin Impact, and various anime series. It can also handle general Japanese speech, but may show reduced performance on long-form content.