whisper-ja-anime-v0.1

Maintained By
efwkjn

Property            Value
Author              efwkjn
Training Duration   ~160 hours on NVIDIA 3060
Model URL           HuggingFace Repository
Architecture        Modified Whisper with frozen turbo encoder + 2 decoder layers

What is whisper-ja-anime-v0.1?

whisper-ja-anime-v0.1 is a speech recognition model for Japanese audio transcription, with a focus on anime-related content. It was trained for 2^19 (524,288) steps with a batch size of 8 on several datasets, including OOPPEENN, Reazon, and Common Voice 19.
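As a rough sanity check on the scale of training, the arithmetic implied by these figures can be worked out directly. The sketch below assumes that each step processes exactly one batch of 8 samples (no gradient accumulation) and that the ~160 hours quoted for this model covers the whole run; neither assumption is confirmed by the source.

```python
# Back-of-the-envelope training scale, using only figures quoted on this page.
# Assumptions (not confirmed by the source): one batch per step, no gradient
# accumulation, and the ~160 hours covers the whole 2^19-step run.

STEPS = 2 ** 19          # training steps
BATCH_SIZE = 8           # samples per step
TRAINING_HOURS = 160     # approximate wall-clock time on one NVIDIA 3060

samples_seen = STEPS * BATCH_SIZE
seconds_per_step = TRAINING_HOURS * 3600 / STEPS

print(f"steps:            {STEPS:,}")
print(f"samples seen:     {samples_seen:,}")
print(f"seconds per step: {seconds_per_step:.2f}")
```

Under these assumptions the run covers roughly 4.2 million samples (counting repeats) at about 1.1 seconds per step.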

Implementation Details

The model pairs a frozen turbo encoder with two decoder layers. Training used a mixed strategy: 50% with prompts and 25% without timestamps. Although noted as potentially undertrained, the model shows promising performance on domain-specific tasks.
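One way to picture the mixed strategy is as a pair of per-sample coin flips. The sketch below is illustrative only: the source does not say how the two percentages were applied, so it assumes they are drawn independently for each training sample. The names `TrainingMode` and `sample_training_mode` are hypothetical and do not come from the author's training code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingMode:
    """Hypothetical per-sample flags; names are illustrative only."""
    use_prompt: bool       # condition the decoder on preceding text
    use_timestamps: bool   # keep timestamp tokens in the target


def sample_training_mode(rng: random.Random,
                         prompt_rate: float = 0.50,
                         no_timestamp_rate: float = 0.25) -> TrainingMode:
    # Assumption: the two choices are drawn independently for every sample.
    use_prompt = rng.random() < prompt_rate
    use_timestamps = rng.random() >= no_timestamp_rate
    return TrainingMode(use_prompt, use_timestamps)


if __name__ == "__main__":
    rng = random.Random(0)
    modes = [sample_training_mode(rng) for _ in range(100_000)]
    prompt_share = sum(m.use_prompt for m in modes) / len(modes)
    no_ts_share = sum(not m.use_timestamps for m in modes) / len(modes)
    print(f"with prompt:        {prompt_share:.3f}")
    print(f"without timestamps: {no_ts_share:.3f}")
```

Mixing modes like this is a common way to keep a single model usable both with and without prompts or timestamps at inference time.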

  • Optimized for Japanese transcription in anime-adjacent domains
  • Reduced hallucination compared to baseline models
  • Support for both beam search and no-timestamp operations
  • Validated across multiple anime and general Japanese speech datasets

Core Capabilities

  • Domain-specific transcription with improved accuracy for anime content
  • Flexible beam size options (1 and 5) for different accuracy requirements
  • Competitive performance against turbo models in domain-specific tasks
  • Effective handling of various Japanese speech patterns and accents
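
The decoding options mentioned above (beam size 1 or 5, with or without timestamps) map onto ordinary generation settings. The sketch below uses the Hugging Face `transformers` speech recognition pipeline and rests on two assumptions that should be checked first: that the checkpoint is published in a `transformers`-compatible format, and that its repository id is `efwkjn/whisper-ja-anime-v0.1` (inferred from the author and model name, not stated in the source). `build_decode_options` and `transcribe` are hypothetical helpers.

```python
from typing import Any, Dict

# Assumption: the repository id below is inferred from the author and model
# name on this page; check the actual HuggingFace repository before use.
ASSUMED_MODEL_ID = "efwkjn/whisper-ja-anime-v0.1"


def build_decode_options(beam_size: int = 1,
                         timestamps: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: collect the call arguments for the ASR pipeline."""
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    return {
        "return_timestamps": timestamps,
        "generate_kwargs": {
            "language": "ja",
            "task": "transcribe",
            "num_beams": beam_size,
        },
    }


def transcribe(audio_path: str, beam_size: int = 1,
               timestamps: bool = False) -> str:
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=ASSUMED_MODEL_ID)
    result = asr(audio_path, **build_decode_options(beam_size, timestamps))
    return result["text"]


if __name__ == "__main__":
    print(build_decode_options(beam_size=5, timestamps=False))
```

Beam size 1 corresponds to greedy decoding and is the cheaper option; beam size 5 spends more compute on a wider search.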

Frequently Asked Questions

Q: What makes this model unique?

The model specializes in anime-domain Japanese transcription while keeping hallucination rates low. It does so while remaining a drop-in replacement for standard Whisper models.

Q: What are the recommended use cases?

This model is best suited for transcribing Japanese audio from anime and related content, showing particularly strong performance on domain-specific datasets like Blue Archive, Genshin Impact, and various anime series. It's also capable of handling general Japanese speech but may show reduced performance on long-form content.
