Peking University and DeepSeek release DSpark for faster LLM inference

29 June 2026, 07:05

Peking University and DeepSeek have jointly open-sourced DSpark, a speculative decoding framework designed to improve the efficiency of large language model inference. The release focuses on accelerating model responses while maintaining performance under strict latency requirements.

According to the release, DSpark boosts LLM inference speed by 60-85% and can deliver a throughput gain of up to 661% under strict latency constraints. The framework positions speculative decoding as a practical route to faster deployment of language models where response time and serving capacity are critical.

Originally reported by pandaily.com
