Phi-4-mm-inst-zeroth-kor
| Property | Value |
|---|---|
| Base Model | microsoft/Phi-4-multimodal-instruct |
| Training Dataset | zeroth_korean |
| Training Steps | 174 steps (1 epoch) |
| Model Hub | HuggingFace |
What is Phi-4-mm-inst-zeroth-kor?
Phi-4-mm-inst-zeroth-kor is a Korean speech processing model fine-tuned from Microsoft's Phi-4-multimodal-instruct. It targets Korean automatic speech recognition (ASR) and speech translation, and delivers a marked improvement on the zeroth-test benchmark, where it reduces the error rate from 195.92 to 7.02.
Implementation Details
The model was fine-tuned on the zeroth_korean dataset for a single epoch (174 steps), showing that substantial gains are possible with minimal training. It uses Flash Attention 2 for faster attention computation and supports several speech-related tasks through task-specific prompt templates.
- Supports both ASR and speech translation tasks
- Implements flash_attention_2 for improved performance
- Uses specialized prompt templates for different tasks
- Trained on an NVIDIA A40 GPU
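The task is selected through the text prompt rather than separate model heads. A minimal sketch of how such prompt templates can be built, assuming the Phi-4-multimodal chat format (`<|user|>`, `<|audio_1|>`, `<|end|>`, `<|assistant|>`); the instruction wordings below are illustrative assumptions, not verbatim training templates:

```python
# Sketch of Phi-4-multimodal-style prompt templates. The special tokens follow
# the base model's chat format; the instruction texts are illustrative.
USER, ASSISTANT, END = "<|user|>", "<|assistant|>", "<|end|>"
AUDIO = "<|audio_1|>"  # placeholder the processor fills with audio features

def build_prompt(instruction: str) -> str:
    """Wrap a task instruction plus one audio placeholder in the chat format."""
    return f"{USER}{AUDIO}{instruction}{END}{ASSISTANT}"

# Different tasks share one model; only the instruction changes.
asr_prompt = build_prompt("Transcribe the audio clip into text.")
ast_prompt = build_prompt("Translate the audio to English.")
```

Keeping the template construction in one helper makes it easy to add further task instructions without touching the inference code.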
Core Capabilities
- Korean Speech Recognition (ASR) with significantly improved accuracy
- Korean-to-English speech translation
- English-to-Korean speech translation
- Chain-of-thought translation capabilities
- Multi-directional speech processing tasks
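As a rough sketch of how ASR inference with this model might look via the `transformers` library (the helper name is hypothetical, and the imports are deferred into the function body because actually running it needs a CUDA GPU with flash-attn, transformers, torch, and soundfile installed):

```python
def transcribe_korean(model_path: str, wav_path: str) -> str:
    """Hypothetical helper: transcribe one Korean audio file with the model.

    Imports are deferred so the function can be defined anywhere; calling it
    requires a GPU plus the transformers / torch / soundfile / flash-attn stack.
    """
    import soundfile as sf
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",
        trust_remote_code=True,
        _attn_implementation="flash_attention_2",  # as noted in the card
    ).cuda()

    audio, sr = sf.read(wav_path)
    prompt = "<|user|><|audio_1|>Transcribe the audio clip into text.<|end|><|assistant|>"
    inputs = processor(text=prompt, audios=[(audio, sr)], return_tensors="pt").to("cuda")

    out = model.generate(**inputs, max_new_tokens=256)
    out = out[:, inputs["input_ids"].shape[1]:]  # strip the prompt tokens
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```

Swapping the instruction string for one of the translation templates turns the same helper into a speech-translation call.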
Frequently Asked Questions
Q: What makes this model unique?
The model achieves a large improvement in Korean ASR, reducing the error rate on the zeroth-test benchmark by roughly 96% relative to the base model. It also handles speech translation tasks despite receiving no explicit translation training.
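The ~96% figure follows directly from the benchmark numbers quoted in this card:

```python
# Relative error-rate reduction on zeroth-test, from the numbers above.
base_err, tuned_err = 195.92, 7.02
reduction_pct = (base_err - tuned_err) / base_err * 100
print(f"{reduction_pct:.1f}%")  # ~96.4% relative reduction
```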
Q: What are the recommended use cases?
The model is particularly suited for Korean speech recognition, Korean-English speech translation, and can be used in applications requiring transcription or translation of Korean speech content. It's especially effective when chain-of-thought (CoT) processing is needed for complex translation tasks.
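For the CoT translation use case, a single prompt can ask the model to transcribe first and then translate, with a separator between the two parts. A small sketch, where the instruction wording and the `<sep>` separator are assumptions for illustration, not confirmed training templates:

```python
# Illustrative chain-of-thought (CoT) translation prompt: transcribe first,
# then translate, separated by an assumed "<sep>" marker.
SEP = "<sep>"
cot_instruction = (
    "Transcribe the audio to text, and then translate the audio to English. "
    f"Use {SEP} as a separator between the original transcript and the translation."
)
cot_prompt = f"<|user|><|audio_1|>{cot_instruction}<|end|><|assistant|>"

def split_cot_output(response: str) -> tuple[str, str]:
    """Split a CoT response into (Korean transcript, English translation)."""
    transcript, _, translation = response.partition(SEP)
    return transcript.strip(), translation.strip()
```

Exposing the intermediate transcript this way lets downstream code log or verify the recognition step before trusting the translation.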