This paper proposes Variational Linear Attention, an online regularised least-squares formulation that stabilizes the linear attention memory with an adaptive penalty matrix. It targets a core bottleneck in long-context transformers: reducing memory interference while keeping attention linear-time.
arXiv:2605.11196v1 Announce Type: new Abstract: Linear attention reduces the quadratic $O(n^2)$ cost of softmax attention to linear $O(n)$, but its memory state's Frobenius norm grows with sequence length, causing progressive interference between stored associations. We introduce \textbf{Variational Linear Attention} (VLA), which reframes the memory update as an online regularised least-squares problem with an adaptive penalty matrix maintained via the Sherman-Morrison rank-1 formula. We prove that normalising the write direction to…
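The abstract does not spell out VLA's exact penalty or normalisation scheme, but the core mechanism it names, an online regularised least-squares memory whose inverse penalty matrix is maintained with the Sherman-Morrison rank-1 formula, is the standard recursive-least-squares recursion. A minimal NumPy sketch of that recursion (variable names, the ridge weight `lam`, and the unit-norm write direction are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def sherman_morrison_update(P, k):
    """Return (P^{-1} + k k^T)^{-1} given P, via the Sherman-Morrison rank-1 formula."""
    Pk = P @ k
    return P - np.outer(Pk, Pk) / (1.0 + k @ Pk)

d = 4
lam = 1.0                       # ridge (penalty) weight, assumed
P = np.eye(d) / lam             # inverse penalty matrix, starts at (lam * I)^{-1}
S = np.zeros((d, d))            # memory state: maps keys to values via S @ k

rng = np.random.default_rng(0)
ks, vs = [], []                 # kept only to check against the batch solution
for _ in range(16):
    k = rng.standard_normal(d)
    k /= np.linalg.norm(k)      # normalised write direction, as the abstract suggests
    v = rng.standard_normal(d)
    ks.append(k); vs.append(v)

    P = sherman_morrison_update(P, k)
    # Delta-rule-style correction: write only the residual, scaled by the gain P @ k.
    S = S + np.outer(v - S @ k, P @ k)
```

After the loop, `S` equals the batch ridge solution `argmin_S sum_i ||S k_i - v_i||^2 + lam * ||S||_F^2`, so the recursion never re-solves the full least-squares problem: each step costs one rank-1 update, O(d^2), instead of a fresh O(d^3) matrix inverse.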