SLAM: Structural Linguistic Activation Marking for Language Models

arXiv:2605.05443v1. Abstract: LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection with measurable quality loss. We present SLAM (Structural Linguistic Activation Marking), a novel white-box watermarking scheme that sidesteps this cost by writing the mark into structural geometry rather than token frequencies: sparse autoencoders identify residual-stream directions encoding…

cs.CL updates on arXiv.org · May 8

From the source