wav2vec2-large-xlsr-japanese-hiragana
| Property | Value |
|---|---|
| Base Model | facebook/wav2vec2-large-xlsr-53 |
| Task | Japanese Speech Recognition |
| Author | vumichien |
| Model Hub | HuggingFace |
What is wav2vec2-large-xlsr-japanese-hiragana?
wav2vec2-large-xlsr-japanese-hiragana is a speech recognition model fine-tuned from Facebook's wav2vec2-large-xlsr-53 for Japanese. It is trained to output transcriptions in hiragana, using two datasets: Common Voice and JSUT, the Japanese speech corpus from Saruwatari-lab, University of Tokyo.
Implementation Details
The model expects audio sampled at 16 kHz. Japanese text is tokenized with MeCab, converted with PyKakasi, and cleaned with a set of regex patterns. A CUDA-capable GPU is recommended for best performance. On the test set, the model reports a Word Error Rate (WER) of 24.74% and a Character Error Rate (CER) of 10.99%.
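To make the two metrics concrete, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is not the author's evaluation script: the function names are illustrative, and it assumes word boundaries are already marked by spaces, as a tokenizer such as MeCab would produce.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty, space-separated reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces; assumes a non-empty reference."""
    ref_chars = reference.replace(" ", "")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


reference = "きょう は いい てんき です"
hypothesis = "きょう は いい てんき でした"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

One wrong word out of five gives a 20% WER, while the same error costs only two character edits out of eleven, which is why CER is usually the lower figure.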
- Requires audio sampled at 16 kHz
- Integrates MeCab and PyKakasi for Japanese text processing
- Supports batch processing for efficient inference
- Implements custom preprocessing pipeline for Japanese text
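The text side of such a pipeline can be sketched with the standard library alone. The example below is a simplified stand-in, not the model's actual preprocessing: the punctuation pattern is hypothetical (the regex patterns used by the model are not listed here), and the katakana-to-hiragana step only shifts Unicode code points, whereas PyKakasi can also convert kanji. MeCab tokenization is not reproduced.

```python
import re

# Hypothetical punctuation set, chosen for illustration only.
PUNCTUATION = re.compile(r"[、。,.!?!?「」『』・…()()\-]")

# Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana counterparts.
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KANA_OFFSET = 0x60


def katakana_to_hiragana(text):
    """Map katakana to hiragana; all other characters pass through unchanged."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


def normalize_transcript(text):
    """Strip punctuation, convert katakana to hiragana, collapse whitespace."""
    text = PUNCTUATION.sub("", text)
    text = katakana_to_hiragana(text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("コンニチハ、 セカイ。"))
```

Characters outside the katakana block, such as kanji or the long-vowel mark ー, are left as they are, so a real pipeline would still need a reading-aware converter for them.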
Core Capabilities
- Direct speech-to-text conversion without language model
- Automatic hiragana output generation
- Batch processing support for multiple audio files
- Handling of Japanese-specific characters and pronunciations
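Transcription without a language model usually means greedy decoding of the acoustic model's output. wav2vec2 models fine-tuned for recognition are typically trained with a CTC head; assuming that holds here, the decoding step can be sketched as follows. The vocabulary, blank index, and scores are made up for illustration and do not come from the model.

```python
def greedy_ctc_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding: pick the best token per frame, merge consecutive
    repeats, then drop blank tokens."""
    best_ids = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    decoded = []
    previous = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            decoded.append(vocab[token_id])
        previous = token_id
    return "".join(decoded)


# Hypothetical three-token vocabulary; index 0 is the CTC blank.
vocab = ["<pad>", "あ", "い"]
frame_ids = [1, 1, 0, 1, 2, 2, 0]
frame_scores = [[1.0 if k == i else 0.0 for k in range(len(vocab))] for i in frame_ids]
print(greedy_ctc_decode(frame_scores, vocab))
```

The blank frame between the two あ frames is what lets the decoder keep a genuinely repeated character instead of merging it away.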
Frequently Asked Questions
Q: What makes this model unique?
The model transcribes Japanese speech directly into hiragana. It builds on wav2vec2-large-xlsr-53 and relies on MeCab and PyKakasi for Japanese text processing, which makes it well suited to tasks that need a hiragana rendering of what was said.
Q: What are the recommended use cases?
The model is ideal for Japanese speech transcription tasks, particularly when hiragana output is desired. It's suitable for applications in voice assistants, transcription services, and Japanese language learning tools where accurate phonetic representation is important.