granite-speech-3.2-8b

Maintained By
ibm-granite

Granite Speech 3.2-8B

PropertyValue
DeveloperIBM
Release DateApril 2nd, 2025
LicenseApache 2.0
Primary TasksASR and AST
Model Size8B parameters
Training Infrastructure32 NVIDIA H100 GPUs

What is granite-speech-3.2-8b?

Granite-speech-3.2-8b is IBM's state-of-the-art speech language model designed specifically for automatic speech recognition (ASR) and automatic speech translation (AST). Built on the foundation of granite-3.2-8b-instruct, this model has been specially adapted for speech processing through modality alignment training on diverse public corpora.

Implementation Details

The model features a sophisticated architecture comprising three main components: a speech encoder with 10 conformer blocks, a speech-text modality adapter, and the base granite-3.2-8b-instruct language model. The speech encoder processes input using CTC with block-attention mechanism, while the modality adapter employs a 2-layer window query transformer for temporal downsampling.

  • Speech encoder with 1024 hidden dimensions and 8 attention heads
  • Temporal downsampling factor of 10x for efficient processing
  • LoRA adapters with rank=64 for query and value projections
  • 128k context length capability

Core Capabilities

  • English speech recognition with state-of-the-art accuracy
  • Speech translation to French, Spanish, Italian, German, Portuguese, Japanese, and Mandarin
  • Trained on over 60,000 hours of diverse speech data
  • Optimized for enterprise applications

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its efficient architecture that combines speech and language processing capabilities in a relatively compact 8B parameter model, while maintaining high performance through innovative temporal downsampling and modality adaptation techniques.

Q: What are the recommended use cases?

The model is specifically designed for enterprise applications requiring speech processing, particularly English speech-to-text transcription and translation to major languages. It's not recommended for text-only tasks, where the standard Granite language models would be more appropriate.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.