arXiv:2605.08134v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to auto-regressive (AR) models, offering greater expressive capacity and potential for parallel generation and faster inference. However, open-source dLLMs remain immature, lagging behind AR models in both efficiency and quality. We identify an underexplored property of dLLMs: *token-wise redundancy* in bi-directional self-attention. Self-attention activations are…
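The abstract is cut off before it defines *token-wise redundancy* precisely, so the following is a minimal, hypothetical sketch of one way such redundancy in self-attention activations could be quantified: cosine similarity between adjacent tokens' attention outputs. This is not the paper's method; the function name, tensor shapes, and threshold-free scoring are all illustrative assumptions.

```python
# Hypothetical sketch (not the paper's method): quantify "token-wise
# redundancy" in bi-directional self-attention activations as the mean
# cosine similarity between each token's attention output and its
# successor's. Shapes and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def adjacent_token_redundancy(activations: torch.Tensor) -> torch.Tensor:
    """activations: (seq_len, hidden_dim) self-attention outputs of one layer.

    Returns the mean cosine similarity between adjacent token activations;
    values near 1.0 suggest highly redundant (near-duplicate) tokens.
    """
    sims = F.cosine_similarity(activations[:-1], activations[1:], dim=-1)
    return sims.mean()

# Toy usage: nearly identical rows yield a redundancy score close to 1.0.
acts = torch.randn(1, 64).repeat(16, 1) + 0.01 * torch.randn(16, 64)
print(f"adjacent-token redundancy: {adjacent_token_redundancy(acts):.3f}")
```

Under this reading, a high score on real dLLM activations would indicate that many bi-directional attention outputs carry near-duplicate information, which is the kind of property one could exploit for faster inference.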