arXiv:2605.08776v1 Announce Type: new Abstract: Reasoning-centric large language models (LLMs) achieve strong performance by generating intermediate reasoning trajectories, but often incur excessive token usage and high inference-time decoding cost. We observe that, when solving the same problems, larger reasoning models can often produce more concise traces, whereas smaller reasoning models tend to generate longer and more redundant trajectories. This is especially problematic in real-world…