wav2vec2-large-xlsr-japanese-hiragana

Maintained By
vumichien


Base Model: facebook/wav2vec2-large-xlsr-53
Task: Japanese Speech Recognition
Author: vumichien
Model Hub: HuggingFace

What is wav2vec2-large-xlsr-japanese-hiragana?

This is a speech recognition model for Japanese, fine-tuned from Facebook's wav2vec2-large-xlsr-53. It is trained to output Japanese text in hiragana. The training data combines the Common Voice dataset with the Japanese speech corpus of Saruwatari-lab, University of Tokyo (JSUT).

Implementation Details

The model expects 16 kHz audio input. Japanese text goes through a preprocessing pipeline with three steps:

  • The MeCab tokenizer segments the text.
  • PyKakasi converts the characters to hiragana.
  • Regular expressions strip unwanted characters.

A CUDA-capable GPU is recommended for faster inference. On the test set, the model reaches a Word Error Rate (WER) of 24.74% and a Character Error Rate (CER) of 10.99%.
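WER and CER are usually computed as the Levenshtein edit distance between reference and hypothesis, divided by the reference length. WER counts words and CER counts characters. The sketch below shows that calculation; it is not the evaluation script behind the reported numbers.

  • The function names are hypothetical.
  • The sketch assumes words are already separated by spaces. Written Japanese has no spaces, so a segmenter such as MeCab would have to insert them first.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated words (assumed pre-segmented)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; every character, including spaces, counts."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution (は -> わ) out of four reference characters.
    print(cer("きょうは", "きょうわ"))  # 0.25
```

These functions return fractions, so the reported WER of 24.74% corresponds to 0.2474 on this scale.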

  • Requires a 16 kHz audio sampling rate
  • Integrates MeCab and PyKakasi for Japanese text processing
  • Supports batch processing for efficient inference
  • Implements custom preprocessing pipeline for Japanese text
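The text-cleaning step can be roughly illustrated using only the Python standard library. A regex removes punctuation, and katakana is shifted to hiragana by its fixed Unicode offset of 0x60. This is a hypothetical stand-in, not the model card's pipeline:

  • The punctuation set is assumed.
  • Kanji are left untouched, because converting them requires a reading dictionary. That dictionary lookup is what MeCab and PyKakasi supply.

```python
import re

# Hypothetical punctuation set for illustration; the model card's own
# regex may list different characters.
CHARS_TO_IGNORE = re.compile(r'[、。，．・「」『』（）()！？!?,.\-:;"“”‘’…〜～]')


def katakana_to_hiragana(text: str) -> str:
    """Map katakana U+30A1..U+30F6 to hiragana by subtracting 0x60.

    Other characters (kanji, the long-vowel mark ー, Latin) pass through.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def normalize_transcript(text: str) -> str:
    """Strip punctuation, convert katakana to hiragana, collapse whitespace."""
    text = CHARS_TO_IGNORE.sub("", text)
    text = katakana_to_hiragana(text)
    return " ".join(text.split())
```

For example, normalize_transcript("カタカナ、です。") yields "かたかなです", while a kanji string such as "東京" passes through unchanged.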

Core Capabilities

  • Direct speech-to-text conversion without an external language model
  • Automatic hiragana output generation
  • Batch processing support for multiple audio files
  • Handling of Japanese-specific characters and pronunciations
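Because no language model is involved, transcription comes down to greedy CTC decoding of the network's per-frame outputs. Decoding takes the best-scoring token in each frame, merges consecutive repeats, and drops the blank token. The sketch below shows that logic on plain Python lists.

  • The vocabulary, the blank id of 0 and the "|" word delimiter are assumptions for illustration.
  • With the Hugging Face transformers library, the usual equivalent is to take `torch.argmax(logits, dim=-1)` and pass the resulting ids to the processor's `batch_decode` method.

```python
from itertools import groupby
from typing import Sequence


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,          # assumed index of the CTC blank/pad token
    word_delimiter: str = "|",  # assumed word-boundary token
) -> str:
    """Greedy (best-path) CTC decoding without any language model."""
    # 1. Pick the highest-scoring token id for every audio frame.
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Collapse runs of the same id into a single id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blanks and map the remaining ids to characters.
    chars = [vocab[i] for i in collapsed if i != blank_id]
    # 4. Render the word-delimiter token as a space.
    return "".join(" " if c == word_delimiter else c for c in chars).strip()
```

A blank between two identical tokens is what lets CTC emit a genuinely doubled character. The frame ids こ, blank, こ decode to "ここ", whereas こ, こ collapses to "こ".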

Frequently Asked Questions

Q: What makes this model unique?

This model specializes in Japanese speech recognition with direct hiragana output. It pairs the multilingual wav2vec2-large-xlsr-53 base model with a Japanese-specific text pipeline, in which MeCab and PyKakasi convert Japanese text into hiragana.

Q: What are the recommended use cases?

The model suits Japanese speech transcription tasks where hiragana output is desired. Examples include voice assistants, transcription services, and Japanese language-learning tools that benefit from a phonetic representation of speech.
