wav2vec2-large-xlsr-53-th-cv8-newmm

Maintained By
wannaphong

Property | Value
License | Apache 2.0
Paper | Thai Wav2Vec2.0 with CommonVoice V8
Language | Thai
Framework | PyTorch

What is wav2vec2-large-xlsr-53-th-cv8-newmm?

This is a Thai automatic speech recognition (ASR) model fine-tuned from the wav2vec2-large-xlsr-53 checkpoint on the Thai Common Voice V8 dataset, which extends the Common Voice V7 data used by earlier versions. Transcripts are pre-tokenized into words with PyThaiNLP's newmm tokenizer, and a language model is combined with the acoustic model at decoding time to lower recognition error rates.
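Thai is written without spaces between words, so a word error rate only exists once text has been split into words. PyThaiNLP's newmm engine is a dictionary-based maximal-matching segmenter. The sketch below illustrates the core idea of maximal matching (pick the segmentation that uses the fewest dictionary words). It is a simplified illustration, not PyThaiNLP's implementation: it omits newmm's Thai Character Cluster constraints and unknown-word handling, and the tiny dictionary used with it is hypothetical.

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional


def max_match_segment(text: str, dictionary: FrozenSet[str]) -> Optional[List[str]]:
    """Split text into the fewest dictionary words; return None if no split exists.

    Simplified maximal matching for illustration only (not PyThaiNLP's newmm).
    """
    max_len = max((len(w) for w in dictionary), default=0)

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[tuple]:
        # Best (fewest-word) segmentation of text[start:], or None if impossible.
        if start == len(text):
            return ()
        candidates = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in dictionary:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None
```

With the hypothetical dictionary {ผม, ชอบ, กิน, ข้าว}, the string ผมชอบกินข้าว segments into four words; adding the compound กินข้าว reduces it to three. Because the word count changes with the dictionary and tokenizer, a reported WER is only comparable when the tokenizer is named, as it is here.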

Implementation Details

The model is fine-tuned from Facebook's wav2vec2-large-xlsr-53 checkpoint. Key details of the training setup:

  • Pre-tokenization of transcripts with pythainlp.tokenize.word_tokenize (newmm engine)
  • Training data from Common Voice V8, which extends the V7 release
  • Fixes for bugs in the original training code
  • A language model combined with the acoustic model during decoding
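Assuming the standard CTC (connectionist temporal classification) head used when fine-tuning wav2vec2 for ASR, the acoustic model emits one score vector per audio frame. Without a language model, those scores are typically decoded greedily, as in the sketch below; the language model replaces this step with a beam search that also weighs likely word sequences. The toy vocabulary, the `<pad>` blank token, and the `|` word delimiter are assumptions modeled on common wav2vec2 conventions, not values read from this model's files.

```python
from itertools import groupby
from typing import List, Sequence

# Assumed conventions (hypothetical, not read from this model's vocabulary file):
BLANK = "<pad>"        # CTC blank token; wav2vec2 setups commonly reuse the pad token
WORD_DELIMITER = "|"   # token that marks a word boundary


def argmax(row: Sequence[float]) -> int:
    """Return the index of the highest score in one frame."""
    return max(range(len(row)), key=lambda i: row[i])


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Best-path CTC decoding: top token per frame, merge repeats, then drop blanks."""
    best_path = [vocab[argmax(row)] for row in frame_scores]
    # Merge consecutive duplicates first, so a blank between two identical
    # tokens keeps them as two separate characters.
    collapsed = [token for token, _ in groupby(best_path)]
    tokens = [t for t in collapsed if t != BLANK]
    text = "".join(" " if t == WORD_DELIMITER else t for t in tokens)
    return " ".join(text.split())  # trim and collapse repeated spaces


def one_hot_frames(indices: Sequence[int], vocab_size: int) -> List[List[float]]:
    """Build toy frame scores where each frame strongly prefers one token."""
    return [[1.0 if i == k else 0.0 for i in range(vocab_size)] for k in indices]


if __name__ == "__main__":
    vocab = [BLANK, WORD_DELIMITER, "ก", "า", "ร"]  # hypothetical toy vocabulary
    frames = one_hot_frames([2, 2, 0, 3, 1, 4, 4], len(vocab))
    print(ctc_greedy_decode(frames, vocab))  # prints: กา ร
```

Merging repeats before removing blanks is what lets CTC emit doubled characters: the path ก, blank, ก decodes to กก, while ก, ก decodes to a single ก.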

Core Capabilities

  • 12.58% WER (word error rate) on the CV8 test set, with words segmented by the newmm tokenizer
  • 3.27% CER (character error rate)
  • Transcribes Thai speech to text
  • Lower error rates than previous versions on both the CV7 and CV8 test sets
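WER and CER are both edit-distance rates: the minimum number of substitutions, insertions, and deletions that turn the hypothesis into the reference, divided by the reference length in words or characters. For Thai, the word lists must come from a tokenizer such as newmm. The sketch below is a plain-Python version of the standard definitions; it is not the evaluation script behind the reported numbers. It pools edits over the whole test set rather than averaging per-utterance rates (one common convention), and removing spaces before computing CER is an assumption.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance where substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: r is missing from the hypothesis
                curr[j - 1] + 1,         # insertion: h is extra in the hypothesis
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Total edits divided by total reference length over (reference, hypothesis) pairs."""
    edits = ref_len = 0
    for ref, hyp in pairs:
        edits += edit_distance(ref, hyp)
        ref_len += len(ref)
    if ref_len == 0:
        raise ValueError("references are empty")
    return edits / ref_len


def wer(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Word error rate over pairs of word lists (e.g. newmm-tokenized text)."""
    return corpus_error_rate(pairs)


def cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Character error rate over pairs of strings; spaces are ignored (an assumption)."""
    return corpus_error_rate(
        (list(r.replace(" ", "")), list(h.replace(" ", ""))) for r, h in pairs
    )
```

For the reference ผม ชอบ กิน ข้าว and the hypothesis ผม ชอบ กิน ขาว, one of four words is wrong (WER 0.25), but only one of twelve characters (the tone mark) is missing (CER 1/12). This is why CER is usually much lower than WER for Thai, as in the figures above.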

Frequently Asked Questions

Q: What makes this model unique?

This model improves on previous versions, reaching lower WER and CER by combining the wav2vec2 architecture with newmm tokenization and a language model. It is trained specifically for Thai.
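The source does not say exactly how the language model is combined with the acoustic model. A widely used approach for CTC models is shallow fusion: each candidate transcript is scored as its acoustic log-probability plus a weighted LM log-probability plus a per-word bonus. The sketch below rescores a fixed list of candidates this way. The unigram LM, its probabilities, and the weights `alpha` and `beta` are hypothetical stand-ins; practical decoders usually apply this kind of scoring inside a beam search, typically with an n-gram LM.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Hypothesis:
    words: List[str]          # candidate transcript as a word list (e.g. newmm tokens)
    acoustic_logprob: float   # log-probability assigned by the acoustic (CTC) model


def make_unigram_lm(probs: Dict[str, float], unk_prob: float = 1e-6) -> Callable[[Sequence[str]], float]:
    """Toy LM: sum of unigram log-probabilities (hypothetical stand-in for an n-gram LM)."""
    def logprob(words: Sequence[str]) -> float:
        return sum(math.log(probs.get(w, unk_prob)) for w in words)
    return logprob


def fused_score(h: Hypothesis, lm: Callable[[Sequence[str]], float], alpha: float, beta: float) -> float:
    """Shallow fusion: acoustic score + alpha * LM score + beta * word count."""
    return h.acoustic_logprob + alpha * lm(h.words) + beta * len(h.words)


def rescore(hyps: Sequence[Hypothesis], lm: Callable[[Sequence[str]], float],
            alpha: float = 0.5, beta: float = 1.0) -> Hypothesis:
    """Return the candidate with the highest fused score."""
    return max(hyps, key=lambda h: fused_score(h, lm, alpha, beta))
```

With a toy LM that gives ข้าว ("rice") probability 0.1 and ขาว ("white") 0.001, a candidate ending in ขาว with acoustic score -4.0 beats one ending in ข้าว with -4.5 when `alpha` is 0, but the ข้าว candidate wins once `alpha` is 0.5. A unigram model ignores context; an n-gram LM would additionally reward ข้าว specifically after กิน ("eat").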

Q: What are the recommended use cases?

The model suits Thai speech recognition tasks, particularly applications that need accurate transcription of Thai speech. It can be used both in academic research and in practical speech-to-text systems for Thai-language content.