Granite Speech 3.2-8B
| Property | Value |
|---|---|
| Developer | IBM |
| Release Date | April 2nd, 2025 |
| License | Apache 2.0 |
| Primary Tasks | ASR and AST |
| Model Size | 8B parameters |
| Training Infrastructure | 32 NVIDIA H100 GPUs |
What is granite-speech-3.2-8b?
Granite-speech-3.2-8b is IBM's speech language model for automatic speech recognition (ASR) and automatic speech translation (AST). It is built on granite-3.2-8b-instruct and adapted to speech input through modality alignment training on diverse public corpora.
Implementation Details
The model has three main components: a speech encoder with 10 conformer blocks, a speech-text modality adapter, and the base granite-3.2-8b-instruct language model. The speech encoder is trained with connectionist temporal classification (CTC) and uses block attention, while the modality adapter is a 2-layer window query transformer that downsamples the encoder output in time.
- Speech encoder with 1024 hidden dimensions and 8 attention heads
- Temporal downsampling factor of 10x for efficient processing
- LoRA adapters with rank=64 for query and value projections
- 128k context length capability
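To give a feel for the figures above, the following minimal sketch works through two pieces of arithmetic: how many acoustic embeddings a clip yields after 10x temporal downsampling, and how many trainable parameters a rank-64 LoRA adapter adds to one projection matrix. The 100 Hz input frame rate and the 4096-wide projection are illustrative assumptions, not values stated in this document, and the function names are hypothetical.

```python
import math


def acoustic_embeddings(duration_s: float,
                        frame_rate_hz: float = 100.0,  # assumed input frame rate
                        downsample: int = 10) -> int:
    """Approximate number of acoustic embeddings for a clip.

    The clip is split into feature frames at `frame_rate_hz`, then
    reduced in time by the `downsample` factor (10x in the list above).
    """
    frames = math.floor(duration_s * frame_rate_hz)
    return math.ceil(frames / downsample)


def lora_added_params(d_in: int, d_out: int, rank: int = 64) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in projection.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # A 30-second clip under the assumed 100 Hz frame rate.
    print(acoustic_embeddings(30))

    # Hypothetical 4096 x 4096 projection; the real dimensions may differ.
    d = 4096
    added = lora_added_params(d, d)
    print(added, added / (d * d))
```

Under these assumptions, the low-rank update is a small fraction of the full projection matrix, which is why adapting only the query and value projections keeps the number of trainable weights low.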
Core Capabilities
- English speech recognition with state-of-the-art accuracy
- Speech translation from English to French, Spanish, Italian, German, Portuguese, Japanese, and Mandarin
- Trained on over 60,000 hours of diverse speech data
- Optimized for enterprise applications
Frequently Asked Questions
Q: What makes this model unique?
The model combines speech and language processing in a relatively compact 8B-parameter design. Temporal downsampling shortens the acoustic sequence passed to the language model, and the modality adapter maps speech features into the language model's input space, which keeps processing efficient without retraining the base model from scratch.
Q: What are the recommended use cases?
The model is designed for enterprise applications that need speech processing, particularly English speech-to-text transcription and translation from English into the supported languages. It is not recommended for text-only tasks, where the standard Granite language models are more appropriate.