dolphin-small

Maintained by: DataoceanAI

Property         Value
Parameter Count  372M
Model Type       ASR (Automatic Speech Recognition)
Architecture     Joint CTC-Attention with E-Branchformer encoder
License          Apache 2.0
Average WER      25.2%
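
For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it for a single pair; the function name `word_error_rate` and whitespace tokenization are illustrative assumptions, and the page does not say how the 25.2% average was aggregated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.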

What is dolphin-small?

Dolphin-small is a multilingual ASR model developed jointly by DataoceanAI and Tsinghua University. It targets Eastern languages: it supports 40 languages across East Asia, South Asia, Southeast Asia, and the Middle East, plus 22 Chinese dialects, and it was trained on more than 210,000 hours of data.

Implementation Details

The model uses a joint CTC-Attention architecture with an E-Branchformer-based encoder and a standard Transformer decoder. Its two-level language token system handles linguistic and regional diversity with two separate tokens, one for the language and one for the region.
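
In joint CTC-Attention systems, candidate transcripts are commonly ranked by a weighted sum of the CTC log-probability and the attention-decoder log-probability. The sketch below shows that interpolation on a toy n-best list. The `Hypothesis` class, the function names, and the default weight of 0.3 are illustrative assumptions, not values or code taken from Dolphin.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    text: str
    ctc_logp: float  # log-probability assigned by the CTC branch
    att_logp: float  # log-probability assigned by the attention decoder


def joint_score(hyp: Hypothesis, ctc_weight: float) -> float:
    """Interpolate the two log-probabilities: w * CTC + (1 - w) * attention."""
    return ctc_weight * hyp.ctc_logp + (1.0 - ctc_weight) * hyp.att_logp


def pick_best(hyps: Sequence[Hypothesis], ctc_weight: float = 0.3) -> Hypothesis:
    """Return the hypothesis with the highest joint score.

    ctc_weight=0.3 is an assumed default for illustration only.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return max(hyps, key=lambda h: joint_score(h, ctc_weight))


if __name__ == "__main__":
    candidates = [
        Hypothesis("a", ctc_logp=-2.0, att_logp=-10.0),
        Hypothesis("b", ctc_logp=-6.0, att_logp=-4.0),
    ]
    print(pick_best(candidates).text)
```

Setting `ctc_weight` to 1.0 ranks by the CTC branch alone and 0.0 by the attention decoder alone; intermediate values let the CTC score penalize attention hypotheses that are poorly aligned with the audio.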

  • 372M parameters, balancing accuracy against inference cost
  • Trained on both proprietary and open-source datasets
  • Requires FFmpeg to convert input audio to WAV format
  • Transcription only; the model does not perform translation
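
The FFmpeg conversion step can be sketched as a small wrapper around the `ffmpeg` command line. The helper names are hypothetical, and the 16 kHz mono 16-bit PCM target is an assumption based on common ASR practice; the page does not state the input format Dolphin expects.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_ffmpeg_command(src: PathLike, dst: PathLike, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg invocation that writes mono 16-bit PCM WAV.

    sample_rate=16000 is an assumed default, not a documented requirement.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input file in any format ffmpeg can decode
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src: PathLike, dst: PathLike, sample_rate: int = 16000) -> Path:
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True, capture_output=True)
    return Path(dst)


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("input.mp3", "output.wav")))
```

Keeping command construction separate from execution makes the arguments easy to inspect or log before any audio is touched.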

Core Capabilities

  • Speech Recognition across 40 languages
  • Voice Activity Detection (VAD)
  • Audio Segmentation
  • Language Identification (LID)
  • Support for 22 Chinese dialects

Frequently Asked Questions

Q: What makes this model unique?

The model is distinguished by its focus on Eastern languages and dialects and by its two-level language token system, which separates language from region. This focus makes it well suited to Asian-language speech recognition, with a reported average WER of 25.2%.

Q: What are the recommended use cases?

The model suits applications that need multilingual ASR for Eastern languages, such as transcription services, voice assistants, and automated content processing systems. It is particularly useful where Chinese dialects or several Asian languages must be handled.
