GenAI · This paper offers a geometric explanation for emergent misalignment in fine-tuned LLMs, framing it as a feature-superposition problem rather than a mysterious safety failure. It should be useful for researchers studying…
cs.AI updates on arXiv.org·May 6·Score 10.0
Agentic · This paper proposes a low-latency fraud-detection layer for spotting adversarial interaction patterns in LLM agents. It matters because agent defenses need to operate in real time, not just at the prompt-filtering stage.
cs.AI updates on arXiv.org·May 6·Score 9.9
Agentic · This position paper argues that multi-agent safety depends more on interaction topology than on the alignment or scale of the underlying models. For builders of agentic systems, it reframes safety as a systems-design…
cs.AI updates on arXiv.org·May 6·Score 9.3
Safety · This paper reframes AI safety around irreversibility, arguing that low-friction deployment changes the control problem more than raw capability does. It should interest safety researchers looking for a systems-level…
cs.AI updates on arXiv.org·May 6·Score 8.9
Agentic · This paper studies how a jailbreak can propagate across multi-agent systems and proposes a foresight-guided defense to stop the spread early. It matters for builders shipping agent swarms, where one compromised agent…
cs.AI updates on arXiv.org·May 6·Score 10.0

Physical AI · MIT highlights a training method that makes reasoning models better at expressing uncertainty without losing accuracy. For builders, that matters because calibrated confidence is a practical lever for reducing…
Tavily · Physical AI · Score 9.7