# Dolphin-base
| Property | Value |
|---|---|
| Model Size | 140M parameters |
| License | Apache 2.0 |
| Author | DataoceanAI |
| Architecture | Joint CTC-Attention with E-Branchformer |
| Training Data | 210,000+ hours |
## What is Dolphin-base?

Dolphin-base is a multilingual ASR (Automatic Speech Recognition) model developed through a collaboration between DataoceanAI and Tsinghua University. It is designed specifically for Eastern languages, supporting 40 languages across East Asia, South Asia, Southeast Asia, and the Middle East, plus 22 Chinese dialects.
## Implementation Details
The model implements a joint CTC-Attention architecture, utilizing an E-Branchformer encoder and a standard Transformer decoder. A notable innovation is its two-level language token system, which handles linguistic and regional diversity through separate language and region tokens (e.g., `<zh>` for language, `<CN>` for region).
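To make the two-level token idea concrete, here is a minimal sketch (not the actual Dolphin implementation) of how a language/region token pair could prefix a decoder input sequence. The `build_prefix` function and the token inventories are hypothetical and illustrative only.

```python
# Illustrative sketch of a two-level language token scheme: a language
# token (e.g. <zh>) followed by a region token (e.g. <CN>) prefixes the
# decoder sequence. Token sets below are small, hypothetical subsets.

LANG_TOKENS = {"zh", "ja", "ko", "th"}          # language-level tokens (illustrative)
REGION_TOKENS = {"CN", "TW", "JP", "KR", "TH"}  # region-level tokens (illustrative)

def build_prefix(lang: str, region: str) -> list[str]:
    """Return the decoder prompt prefix for a language/region pair."""
    if lang not in LANG_TOKENS:
        raise ValueError(f"unknown language token: {lang}")
    if region not in REGION_TOKENS:
        raise ValueError(f"unknown region token: {region}")
    return [f"<{lang}>", f"<{region}>"]

# Mandarin as spoken in mainland China:
print(build_prefix("zh", "CN"))  # ['<zh>', '<CN>']
```

Keeping language and region as separate tokens lets the model share acoustic knowledge across regions of one language while still conditioning on regional variation.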
- Base model size: 140M parameters, achieving a 33.3% average word error rate (WER)
- Trained on 210,000+ hours of proprietary and open-source data
- Implements voice activity detection, segmentation, and language identification
## Core Capabilities
- Multilingual ASR across 40 Eastern languages
- Support for 22 Chinese dialects
- Voice activity detection (VAD)
- Audio segmentation
- Language identification (LID)
- Regional accent handling through two-level token system
## Frequently Asked Questions
Q: What makes this model unique?

Its specialty is comprehensive coverage of Eastern languages and Chinese dialects, combined with the two-level language token system described above. This makes it particularly effective at handling the diverse language variations and regional accents found across Asia.
Q: What are the recommended use cases?
The model is ideal for applications requiring Eastern language speech recognition, particularly in multilingual environments. It's suitable for voice transcription services, language learning platforms, and applications requiring language identification or voice activity detection in Asian languages.