This paper proposes Variational Linear Attention, an online regularised least-squares formulation that stabilizes the linear attention memory with an adaptive penalty matrix. It targets a core bottleneck in long-context transformers: reducing memory interference while keeping attention linear-time.
arXiv:2605.11196v1 Announce Type: new Abstract: Linear attention reduces the quadratic $O(n^2)$ cost of softmax attention to linear $O(n)$, but its memory state's Frobenius norm grows with sequence length, causing progressive interference between stored associations. We introduce \textbf{Variational Linear Attention} (VLA), which reframes the memory update as an online regularised least-squares problem with an adaptive penalty matrix maintained via the Sherman-Morrison rank-1 formula. We prove that normalising the write direction to…
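The abstract does not spell out VLA's exact penalty or normalisation scheme, but the core mechanism it names, an online regularised least-squares memory whose inverse penalty matrix is maintained with the Sherman-Morrison rank-1 formula, is the standard recursive-least-squares recursion. A minimal NumPy sketch of that recursion (variable names, the ridge weight `lam`, and the unit-norm write direction are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def sherman_morrison_update(P, k):
    """Return (P^{-1} + k k^T)^{-1} given P, via the Sherman-Morrison rank-1 formula."""
    Pk = P @ k
    return P - np.outer(Pk, Pk) / (1.0 + k @ Pk)

d = 4
lam = 1.0                       # ridge (penalty) weight, assumed
P = np.eye(d) / lam             # inverse penalty matrix, starts at (lam * I)^{-1}
S = np.zeros((d, d))            # memory state: maps keys to values via S @ k

rng = np.random.default_rng(0)
ks, vs = [], []                 # kept only to check against the batch solution
for _ in range(16):
    k = rng.standard_normal(d)
    k /= np.linalg.norm(k)      # normalised write direction, as the abstract suggests
    v = rng.standard_normal(d)
    ks.append(k); vs.append(v)

    P = sherman_morrison_update(P, k)
    # Delta-rule-style correction: write only the residual, scaled by the gain P @ k.
    S = S + np.outer(v - S @ k, P @ k)
```

After the loop, `S` equals the batch ridge solution `argmin_S sum_i ||S k_i - v_i||^2 + lam * ||S||_F^2`, so the recursion never re-solves the full least-squares problem: each step costs one rank-1 update, O(d^2), instead of a fresh O(d^3) matrix inverse.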