Speculative decoding is extended to RL training rollouts, preserving output distributions while speeding up sampling. The result matters for agentic systems because rollout throughput is often a bottleneck in large-scale reinforcement learning.

Applied to RL rollouts, speculative decoding left output distributions unchanged while delivering up to 1.8x throughput gains, with projected end-to-end speedups of 2.5x at scale.
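The distribution-preserving property comes from the standard speculative-decoding accept/reject rule: a cheap draft model proposes several tokens, the target model scores them, and each draft token is accepted with probability min(1, p_target/p_draft); on rejection, a replacement is drawn from the normalized residual max(0, p − q). The sketch below illustrates that rule with toy, context-independent distributions; all names (`draft_probs`, `target_probs`, `speculative_step`) are illustrative and not from the work described above, where the distributions would be real model outputs conditioned on the prefix.

```python
import random

random.seed(0)

VOCAB = list(range(8))

def draft_probs(prefix):
    # Toy stand-in for a small draft model: uniform over the vocabulary.
    return [1.0 / len(VOCAB)] * len(VOCAB)

def target_probs(prefix):
    # Toy stand-in for the target model: skewed toward larger token ids.
    weights = [i + 1 for i in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def sample(probs):
    # Draw one token index from a categorical distribution.
    r, cum = random.random(), 0.0
    for tok, p in enumerate(probs):
        cum += p
        if r < cum:
            return tok
    return len(probs) - 1

def speculative_step(prefix, k=4):
    """Draft k tokens, then accept/reject so the emitted tokens are
    distributed exactly as if sampled from the target model alone."""
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q)
        drafted.append((tok, q))
        ctx.append(tok)

    accepted, ctx = [], list(prefix)
    for tok, q in drafted:
        p = target_probs(ctx)
        if random.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample from the normalized residual
            # max(0, p - q); this is what preserves the target distribution.
            resid = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(resid)
            accepted.append(sample([r / z for r in resid]))
            return accepted
    # All k drafts accepted: one free "bonus" token from the target model.
    accepted.append(sample(target_probs(ctx)))
    return accepted

out = speculative_step([], k=4)
```

Each call emits between 1 and k+1 tokens per target-model pass, which is where the rollout throughput gain comes from: the expensive model is invoked once per verified block rather than once per token.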