SpeechCombine brings instruction following to speech models

SpeechCombine is an instruction-following speech language model designed to avoid the complexity and scale demands of conventional speech instruction tuning. The work targets a core challenge in speech language models: adapting a text-based LLM to a new modality while supporting speech-specific instructions, without relying on the large synthetic datasets often used in text LLM training pipelines.

The method starts with a text LLM base model and uses continuous pre-training on speech utterances to produce a speech-adapted model. It then combines that model’s weights with the weight difference between the instruction-tuned and base versions of the original text LLM, transferring instruction-following capabilities directly into the speech domain.
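The weight combination described above can be illustrated with a toy sketch. It is not IBM's implementation: the function name `merge_instruction_delta`, the `scale` parameter, and the representation of a checkpoint as a dictionary of flat float lists are all assumptions made for illustration. The source only states that the speech-adapted weights are combined with the difference between the instruction-tuned and base text weights.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def merge_instruction_delta(
    speech: Weights,
    base: Weights,
    instruct: Weights,
    scale: float = 1.0,  # assumed knob; the article does not mention scaling
) -> Weights:
    """Return speech + scale * (instruct - base), parameter by parameter."""
    if not (speech.keys() == base.keys() == instruct.keys()):
        raise ValueError("all three checkpoints must share parameter names")

    merged: Weights = {}
    for name in speech:
        s, b, i = speech[name], base[name], instruct[name]
        if not (len(s) == len(b) == len(i)):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # (iv - bv) is the instruction-tuning weight difference; it is
        # added onto the speech-adapted weight.
        merged[name] = [sv + scale * (iv - bv) for sv, bv, iv in zip(s, b, i)]
    return merged


# Toy values chosen so the arithmetic is exact in floating point.
base = {"w": [1.0, 2.0]}      # original text LLM (base)
instruct = {"w": [1.5, 1.0]}  # instruction-tuned text LLM
speech = {"w": [0.0, 4.0]}    # base after speech pre-training

merged = merge_instruction_delta(speech, base, instruct)
print(merged)  # {'w': [0.5, 3.0]}
```

In this sketch the merge only makes sense when all three checkpoints share the same architecture and parameter names, which is why a mismatch raises an error rather than being silently skipped.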

IBM Research reports that SpeechCombine can be trained with only a single round of speech pre-training on as little as 30k hours of speech data. Results indicate that the approach preserves the knowledge and capabilities of the original text LLM while extending them to speech, pointing to a training path that reduces dependence on massive speech datasets.

Originally reported by research.ibm.com