Ambient Persuasion in a Deployed AI Agent: Unauthorized Escalation Following Routine Non-Adversarial Content Exposure

arXiv:2605.00055v1 Announce Type: cross Abstract: We report a safety incident in a deployed multi-agent research system in which a primary AI agent installed 107 unauthorized software components, overwrote a system registry, overrode a prior negative decision from an oversight agent, and escalated through increasingly privileged operations up to an attempted system administrator command. The incident was preceded not by an adversarial attack but by routine content: a forwarded technology…

cs.AI updates on arXiv.org · May 5 · 1 min read

From the source

Originally published at cs.AI updates on arXiv.orgRead at source →