Paraformer-large

Author: funasr
Model Type: Non-autoregressive Speech Recognition
Training Data: 60,000 hours of Mandarin
Paper: Paraformer: Fast and Accurate Parallel Transformer (INTERSPEECH 2022)

What is Paraformer-large?

Paraformer-large is a non-autoregressive end-to-end speech recognition model. Unlike traditional autoregressive models, which decode one token at a time, it generates the transcription for an entire sentence in parallel, which substantially improves inference efficiency and reduces computational cost.

Implementation Details

The model is deployed through the funasr_onnx runtime and supports both CPU and GPU inference. It offers quantization options for faster inference and flexible batch processing, and it accepts several input formats, including file-path strings and numpy arrays. Installation is via pip; a minimal usage sketch appears after the feature list below.

  • Parallel text generation for entire sentences
  • Support for both CPU and GPU inference
  • Quantization options for optimized performance
  • Flexible batch processing capabilities
  • Easy deployment through pip installation
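
The following is a minimal usage sketch based on the funasr_onnx interface described above. The model directory and wav file names are placeholders, and the quantize and batch_size keyword arguments correspond to the quantization and batching options listed above; consult the FunASR documentation for the authoritative API.

```python
# Install the ONNX runtime package referenced above: pip install funasr_onnx
from funasr_onnx import Paraformer

# Placeholder model directory; substitute the Paraformer-large model path or
# model ID you have downloaded.
model_dir = "path/to/paraformer-large"

# quantize=True selects the quantized weights; batch_size controls how many
# utterances are decoded per forward pass (both options are listed above).
model = Paraformer(model_dir, batch_size=2, quantize=True)

# Inputs can be file-path strings; passing a list enables batch processing.
wav_paths = ["example_1.wav", "example_2.wav"]

results = model(wav_paths)
for res in results:
    print(res)
```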

Core Capabilities

  • State-of-the-art speech recognition performance (Top ranked on SpeechIO leaderboard)
  • 10x reduction in machine costs for cloud services
  • Efficient parallel inference using GPUs
  • Support for industrial-scale speech processing
  • Integration with comprehensive speech processing pipeline

Frequently Asked Questions

Q: What makes this model unique?

Paraformer-large's non-autoregressive architecture enables parallel processing of entire sentences, dramatically improving inference efficiency while maintaining accuracy comparable to traditional models. This makes it particularly valuable for large-scale deployment scenarios.

Q: What are the recommended use cases?

The model is ideal for industrial-scale speech recognition applications, particularly in Mandarin language processing. It's especially suitable for cloud services where computational efficiency is crucial, and for applications requiring real-time or near-real-time speech recognition.
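
For near-real-time or cloud pipelines where audio arrives as an in-memory buffer rather than a file on disk, the numpy-array input noted under Implementation Details can be passed instead of a path. The sketch below is illustrative only: it assumes 16 kHz mono audio, uses soundfile merely to produce an example array, and the exact dtype and shape handling should be checked against the funasr_onnx documentation.

```python
import soundfile as sf  # used here only to obtain an example 16 kHz mono waveform

from funasr_onnx import Paraformer

# Placeholder model directory, as in the earlier sketch.
model = Paraformer("path/to/paraformer-large", batch_size=1, quantize=True)

# In a real service this buffer might arrive from a network stream or message
# queue; here it is read from disk purely for illustration.
waveform, sample_rate = sf.read("example.wav", dtype="float32")
assert sample_rate == 16000, "Paraformer-large expects 16 kHz input"

# Pass the in-memory array directly, per the numpy-array support noted above.
result = model(waveform)
print(result)
```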
