Section: Agentic

Autonomous agents, coding agents, agent frameworks, and the orchestration patterns that make them work.

67 stories

Agentic
The Sequence Radar #880: Last Week in AI: A $60B Cursor Deal, Google's Brain Drain, and Midjourney's Body Scanner
A week of really unexpected turns in the AI market.
TheSequence · Jun 21 · Score 7.0
Agentic
Investing in multi-agent AI safety research
Google DeepMind and partners announce a $10M funding call for multi-agent safety research.
Google DeepMind News · Jun 10 · Score 7.0
Agentic
Running OpenAI Models on Amazon Bedrock (58 minute read)
OpenAI cookbook walks through building production workflows with OpenAI models hosted on Amazon Bedrock using the Responses API. It covers structured outputs, tool calling, file inputs, state management, prompt caching,…
TLDR AI Feed · Jun 2 · Score 7.0
Agentic
Opus 4.8 (4 minute read)
Anthropic released Claude Opus 4.8 with benchmark improvements, adjustable effort controls, dynamic workflows in Claude Code, and a faster mode that became significantly cheaper.
TLDR AI Feed · May 29 · Score 7.0
Agentic
Secure MCP Tunnel (6 minute read)
Secure MCP Tunnel enables connecting private MCP servers to OpenAI products without exposing them to the internet. It uses tunnel-client to establish outbound HTTPS paths for request handling while maintaining server…
TLDR AI Feed · May 28 · Score 7.0
Agentic
The Sequence Opinion #864: Every AI Agent Needs a Computer
The rise of agentic sandboxes.
TheSequence · May 21 · Score 7.0
Agentic
The Sequence AI of the Week #863: The Model is the Interface: Inside Thinking Machines' Interactive Models
Thinking Machines' interactive models turn real-time conversation, vision, audio, and tool use into one continuous learned system.
TheSequence · May 20 · Score 7.0
Agentic
Medicare's new payment model is built for AI, and most of the tech world has no idea
Medicare’s new ACCESS payment model creates a reimbursement path for AI agents that monitor patients between visits and coordinate follow-up care. For builders, the significance is less the policy headline than the fact…
AI News & Artificial Intelligence | TechCrunch · May 13 · Score 7.5
Agentic
FOD#152: AI Agent Skills: Why Skill Curation Is the Next Bottleneck
This newsletter piece frames agent skills as the next bottleneck in building useful AI agents. It is relevant as a market-and-product signal, but appears light on technical detail or new research.
Turing Post · May 12 · Score 6.6
Agentic
Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge
arXiv:2605.08518v1 Announce Type: new Abstract: Competition retrospectives are useful when they explain what a leaderboard measured, how hidden evaluation changed conclusions, and which design patterns were rewarded. We…
cs.AI updates on arXiv.org · May 12 · Score 7.0
Agentic
MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction
arXiv:2605.08670v1 Announce Type: new Abstract: Large language model (LLM) powered AI agents have emerged as a promising paradigm for autonomous problem-solving, yet they continue to struggle with complex, multi-step…
cs.AI updates on arXiv.org · May 12 · Score 7.0
Agentic
Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
arXiv:2605.08686v1 Announce Type: new Abstract: Multi-agent large language model (LLM) systems often rely on a controller to coordinate a pool of heterogeneous models, yet existing controllers are typically limited to…
cs.AI updates on arXiv.org · May 12 · Score 7.0
Agentic
EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems
arXiv:2605.08769v1 Announce Type: new Abstract: Large language model (LLM)-based multi-agent systems have shown strong potential on complex tasks through agent specialization, tool use, and collaborative reasoning.…
cs.AI updates on arXiv.org · May 12 · Score 7.0
Agentic
AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization
arXiv:2605.08704v1 Announce Type: new Abstract: Multi-agent reasoning has shown promise for improving the problem-solving ability of large language models by allowing multiple agents to explore diverse reasoning paths.…
cs.AI updates on arXiv.org · May 12 · Score 7.0
Agentic
GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning
arXiv:2605.06671v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still…
cs.AI updates on arXiv.org · May 11 · Score 7.0
Agentic
Beyond the Black Box: Interpretability of Agentic AI Tool Use
arXiv:2605.06890v1 Announce Type: new Abstract: AI agents are promising for high-stakes enterprise workflows, but dependable deployment remains limited because tool-use failures are difficult to diagnose and control.…
cs.AI updates on arXiv.org · May 11 · Score 7.0
Agentic
Switchcraft: AI Model Router for Agentic Tool Calling
arXiv:2605.07112v1 Announce Type: new Abstract: Agentic AI systems that invoke external tools are powerful but costly, leading developers to default to large models and overspend inference budgets. Model routing can…
cs.AI updates on arXiv.org · May 11 · Score 7.0
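The summary stops before describing Switchcraft's router, so the sketch below shows only the generic idea of cost-aware model routing, not the paper's method: estimate how likely each model is to handle a tool call, take the cheapest model that clears a threshold, and escalate to the best-estimated model otherwise. The `Model` type, the `route` function, the prices, and the success estimates are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model:
    name: str
    cost_per_call: float  # hypothetical relative price


def route(models: List[Model], p_success: Callable[[Model], float],
          threshold: float = 0.8) -> Model:
    """Pick the cheapest model whose estimated success clears the threshold."""
    for m in sorted(models, key=lambda m: m.cost_per_call):
        if p_success(m) >= threshold:
            return m
    # Nothing clears the bar: escalate to the model with the best estimate.
    return max(models, key=p_success)


# Hypothetical candidates and per-call success estimates.
models = [Model("large", 10.0), Model("small", 1.0), Model("medium", 3.0)]
estimates = {"small": 0.6, "medium": 0.85, "large": 0.95}
print(route(models, lambda m: estimates[m.name]).name)  # → medium
```

Where the success estimates come from (a learned classifier, historical tool-call logs, or something else) is the hard part and is left open here.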
Agentic
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
arXiv:2605.07214v1 Announce Type: new Abstract: Large Language Models have recently emerged as a promising paradigm for automated heuristic design for NP-hard combinatorial optimization problems. Despite this progress,…
cs.AI updates on arXiv.org · May 11 · Score 7.0
Agentic
Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations
arXiv:2605.06696v1 Announce Type: new Abstract: Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent…
cs.AI updates on arXiv.org · May 11 · Score 7.0
Industry
Running Codex safely at OpenAI (6 minute read)
OpenAI outlines the safety controls behind Codex, including sandboxing, approval gates, and constrained execution environments. It’s a useful look at how agentic coding systems are being boxed in to reduce blast radius…
TLDR AI Feed · May 11 · Score 7.9
Agentic
AlphaCrafter: A Full-Stack Multi-Agent Framework for Cross-Sectional Quantitative Trading
arXiv:2605.05580v1 Announce Type: new Abstract: Financial markets are inherently non-stationary, driven by complex interactions among macroeconomic regimes, microstructural frictions, and behavioral dynamics. Building…
cs.AI updates on arXiv.org · May 9 · Score 7.0
Agentic
Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation
arXiv:2605.05657v1 Announce Type: new Abstract: Multi-agent LLM systems for code generation face a fundamental routing problem: the optimal orchestration topology depends on the structural complexity of the code under…
cs.AI updates on arXiv.org · May 9 · Score 7.0
Agentic
Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure
arXiv:2605.05440v1 Announce Type: new Abstract: The security discussion around agentic AI focuses heavily on prompt injection. This paper argues that multi-agent systems also create a distinct authorization problem:…
cs.AI updates on arXiv.org · May 9 · Score 7.0
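The abstract is cut off before the proposed mechanism. One widely used principle for keeping authority from growing as a request hops between agents is attenuation: each delegation may only narrow the delegator's scopes, and the chain of principals is recorded for auditing. A minimal sketch of that principle follows, assuming string scopes and a hypothetical `Grant` record. It is not the paper's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical delegation record: who holds which scopes, and via whom."""
    principal: str
    scopes: frozenset
    chain: tuple = ()


def delegate(parent: Grant, child: str, requested: set) -> Grant:
    # Attenuation: the child receives only scopes the parent already holds,
    # so authority can shrink along the chain but never grow.
    granted = parent.scopes & frozenset(requested)
    return Grant(child, granted, parent.chain + (parent.principal,))


def authorize(grant: Grant, scope: str) -> bool:
    return scope in grant.scopes


user = Grant("alice", frozenset({"repo:read", "repo:write", "email:send"}))
planner = delegate(user, "planner-agent", {"repo:read", "repo:write"})
# The coder asks for more than the planner holds; the extras are dropped.
coder = delegate(planner, "coder-agent", {"repo:write", "email:send", "admin"})
print(sorted(coder.scopes))  # → ['repo:write']
```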
Agentic
Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers
arXiv:2605.05725v1 Announce Type: new Abstract: Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly…
cs.AI updates on arXiv.org · May 9 · Score 7.0
Agentic
When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration
arXiv:2605.04361v1 Announce Type: new Abstract: The prevailing assumption in agent orchestration is that more context is better. We test this on multi-agent software design across 10 tasks, 7 context-injection…
cs.AI updates on arXiv.org · May 8 · Score 7.0
Agentic
AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
arXiv:2605.04785v1 Announce Type: new Abstract: Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action,…
cs.AI updates on arXiv.org · May 8 · Score 7.0
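The general pattern of runtime interception can be sketched as a guard that inspects each tool call before its side effect happens and returns allow, ask (route to a human approval gate), or deny. The tool names, regex rules, and `intercept` helper below are hypothetical and far cruder than a real evaluator. AgentTrust's actual scoring is not described in the summary.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    tool: str       # hypothetical tool names: "shell", "http", "file_write"
    argument: str


# Hypothetical deny rules; a real system would use richer, context-aware checks.
DENY_PATTERNS = {
    "shell": [r"\brm\s+-rf\s+/", r"\bcurl\b.*\|\s*(ba)?sh\b"],
    "file_write": [r"^/etc/", r"\.ssh/"],
}
REQUIRE_APPROVAL = {"http"}


def evaluate(call: ToolCall) -> str:
    """Return 'deny', 'ask', or 'allow' before the side effect happens."""
    for pattern in DENY_PATTERNS.get(call.tool, []):
        if re.search(pattern, call.argument):
            return "deny"
    if call.tool in REQUIRE_APPROVAL:
        return "ask"
    return "allow"


def intercept(call: ToolCall, execute: Callable[[ToolCall], str],
              approve: Callable[[ToolCall], bool]) -> str:
    """Run the call only if policy allows it or a human approves it."""
    verdict = evaluate(call)
    if verdict == "deny" or (verdict == "ask" and not approve(call)):
        return f"blocked: {call.tool} {call.argument!r}"
    return execute(call)


print(evaluate(ToolCall("shell", "rm -rf /")))  # → deny
```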
Agentic
Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games
arXiv:2605.04906v1 Announce Type: new Abstract: While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all…
cs.AI updates on arXiv.org · May 8 · Score 7.0
Agentic
SensingAgents: A Multi-Agent Collaborative Framework for Robust IMU Activity Recognition
arXiv:2605.04608v1 Announce Type: new Abstract: Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is a cornerstone of mobile health, smart environments, and human-computer interaction.…
cs.AI updates on arXiv.org · May 8 · Score 7.0
Agentic
GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking
arXiv:2605.04449v1 Announce Type: new Abstract: Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle…
cs.CL updates on arXiv.org · May 7 · Score 7.0
Agentic
CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
This paper proposes a coordinated few-step flow method for offline multi-agent decision making, aiming to preserve inter-agent coordination without the usual multi-step sampling cost. It is most relevant to researchers…
cs.AI updates on arXiv.org · May 6 · Score 9.6
Agentic
Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling
This paper compares common test-time scaling strategies for language models through a compute-efficiency lens, including self-consistency, self-refinement, multi-agent debate, and mixture-of-agents. It is useful for…
cs.AI updates on arXiv.org · May 6 · Score 9.8
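Self-consistency, the simplest strategy in the comparison, samples several answers and keeps the most frequent one, so its compute cost grows linearly with the number of samples. A minimal sketch, assuming a hypothetical `sample_answer` callable that stands in for one model call:

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample_answer: Callable[[int], str], n: int) -> Tuple[str, int]:
    """Majority vote over n sampled answers; returns (answer, model_calls)."""
    votes = Counter(sample_answer(i) for i in range(n))
    # most_common orders equal counts by first occurrence, so ties go to
    # the answer that was sampled first.
    answer, _ = votes.most_common(1)[0]
    return answer, n


# Stand-in for five stochastic model samples.
answers = ["42", "41", "42", "40", "42"]
print(self_consistency(lambda i: answers[i], 5))  # → ('42', 5)
```

Debate and mixture-of-agents add rounds of interaction on top of this, which is why counting model calls, as the second return value does, matters when comparing them on a compute budget.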
Agentic
Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy
This paper proposes a clinician-in-the-loop speech therapy agent that combines stuttering classification with multi-agent LLM reasoning to support personalized therapy planning. It is notable for grounding agentic AI in…
cs.AI updates on arXiv.org · May 6 · Score 9.8
Agentic
CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers
CP-SynC tackles a brittle step in LLM-assisted constraint programming by adding synthesized checkers for zero-shot MiniZinc modeling. The work is relevant to agentic code generation because it closes the loop between…
cs.AI updates on arXiv.org · May 6 · Score 10.0
Industry
Effect-Transparent Governance for AI Workflow Architectures: Semantic Preservation, Expressive Minimality, and Decidability Boundaries
This paper formalizes a governance layer for AI workflows and proves it can constrain effects like memory access and LLM calls without reducing internal expressivity. The machine-checked Rocq development should interest…
cs.AI updates on arXiv.org · May 6 · Score 9.9
Agentic
AI Agents for Sustainable SMEs: A Green ESG Assessment Framework
This paper applies an agent-based workflow to ESG scoring for SMEs, using expert-validated baselines and automated classification over survey data. It is more interesting as an applied agent orchestration case than as a…
cs.AI updates on arXiv.org · May 6 · Score 7.2
Agentic
Towards Multi-Agent Autonomous Reasoning in Hydrodynamics
This arXiv paper proposes a multi-agent reasoning prototype for hydrodynamics, arguing that specialized agents can reduce the context bottlenecks of single-agent scientific workflows. It is most relevant as an applied…
cs.AI updates on arXiv.org · May 6 · Score 9.2
Agentic AI
Sakana AI Introduces KAME: A Tandem Speech-to-Speech Architecture That Injects LLM Knowledge in Real Time
A MarkTechPost write-up highlights KAME, a tandem speech-to-speech setup that claims to inject LLM knowledge into live conversation without added latency. The underlying technical substance is unclear from the summary,…
MarkTechPost · May 3 · Score 3.5
Agentic AI
Mistral AI Launches Remote Agents in Vibe and Mistral Medium 3.5 with 77.6% SWE-Bench Verified Score
Mistral’s latest Le Chat update adds remote, async coding agents and a new flagship model, but the coverage is thin and reads more like a launch roundup than a technical analysis. The SWE-Bench score is notable, though…
MarkTechPost · May 3 · Score 4.5
Agentic AI
A Coding Implementation to Parsing, Analyzing, Visualizing, and Fine-Tuning Agent Reasoning Traces Using the lambda/hermes-agent-reasoning-traces Dataset
A tutorial walks through parsing and visualizing an agent-reasoning-traces dataset, then sketches fine-tuning workflows on top of it. It may be useful as a hands-on notebook, but it reads more like a generic…
MarkTechPost · May 2 · Score 3.5
Industry
Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI's models
The first week of the Musk v. Altman trial surfaces new testimony about OpenAI’s origins and xAI’s model-distillation practices. It matters less as a technical breakthrough than as a window into the legal and strategic…
Artificial intelligence – MIT Technology Review · May 1 · Score 7.2
GenAI
New Frontier Models Are Faster, Not More Reliable, at Spatial Biology (10 minute read)
A benchmark check on spatial biology shows newer frontier models running faster without becoming more reliable. The takeaway for builders is that domain-specific training and analysis patterns still matter more than raw…
TLDR AI Feed · May 1 · Score 6.5
Infra
Speculative Decoding for RL Training (18 minute read)
Speculative decoding is extended to RL training rollouts, preserving output distributions while speeding up sampling. The result matters for agentic systems because rollout throughput is often a bottleneck in…
TLDR AI Feed · May 1 · Score 10.0
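The property that lets speculative decoding speed up rollouts without changing what the policy samples is the standard speculative-sampling acceptance rule. A draft token x drawn from the draft distribution q is accepted with probability min(1, p(x)/q(x)), and on rejection a replacement is drawn from the normalized residual max(0, p - q). The single-token sketch below uses toy distributions `p` (target) and `q` (draft). The article's RL-specific batching and verification are not shown.

```python
import random
from collections import Counter


def residual(p, q):
    """Normalized max(0, p - q): the distribution used after a rejection."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(r)  # zero only when p == q, in which case nothing is rejected
    return [ri / total for ri in r]


def speculative_step(p, q, rng):
    """Draw one token whose marginal distribution equals p, proposing from q."""
    vocab = range(len(p))
    x = rng.choices(vocab, weights=q)[0]              # cheap draft proposal
    if rng.random() < min(1.0, p[x] / q[x]):          # accept w.p. min(1, p/q)
        return x
    return rng.choices(vocab, weights=residual(p, q))[0]  # resample on reject


p = [0.6, 0.3, 0.1]  # toy target-model distribution
q = [0.3, 0.3, 0.4]  # toy draft-model distribution
rng = random.Random(0)
n = 100_000
counts = Counter(speculative_step(p, q, rng) for _ in range(n))
# Empirical frequencies should land close to p, not q.
print([round(counts[i] / n, 3) for i in range(3)])
```

Accepted tokens contribute min(p, q) per token and rejections contribute the remaining max(0, p - q), which sum to p; that is why sampling speeds up without shifting the output distribution.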
Industry
Claude Security is now in public beta (4 minute read)
Claude Security enters public beta as an enterprise-facing vulnerability scanning and patching feature. The announcement is light on technical detail and reads more like a product update than a substantive agentic…
TLDR AI Feed · May 1 · Score 3.5
GenAI
xAI has launched Grok 4.3 (3 minute read)
Grok 4.3 is positioned as a cheaper, stronger successor in xAI’s model line, with reported gains on instruction following and agentic customer support tasks. The main signal is benchmark positioning, but the writeup…
TLDR AI Feed · May 1 · Score 3.7
Infra
KV Cache Locality: The Hidden Variable in Your LLM Serving Cost (11 minute read)
KV cache locality emerges as a major serving lever: the same model and hardware can deliver very different latency and throughput depending on request routing. The piece is useful for teams running long-context or…
TLDR AI Feed · May 1 · Score 8.5
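Prefix-aware routing is one way to act on KV cache locality: send each request to the replica that already holds the longest matching prompt prefix, and break ties by load. A minimal sketch, assuming fixed-size token blocks identified by chained hashes, in the spirit of block-level prefix caching in serving engines. `PrefixRouter`, the block size, and the replica names are hypothetical, and the load counter never decreases in this toy.

```python
from typing import Dict, List, Sequence, Set

BLOCK = 4  # tokens per cache block (toy value)


def block_hashes(tokens: Sequence[int]) -> List[int]:
    """Chained hashes of full blocks, so each hash identifies its whole prefix."""
    hashes, h = [], 0
    for start in range(0, len(tokens) - len(tokens) % BLOCK, BLOCK):
        h = hash((h, tuple(tokens[start:start + BLOCK])))
        hashes.append(h)
    return hashes


class PrefixRouter:
    def __init__(self, replicas: List[str]):
        self.cached: Dict[str, Set[int]] = {r: set() for r in replicas}
        self.load: Dict[str, int] = {r: 0 for r in replicas}

    def _matched_blocks(self, replica: str, hashes: List[int]) -> int:
        """Count leading blocks of the prompt already cached on a replica."""
        n = 0
        for h in hashes:
            if h not in self.cached[replica]:
                break
            n += 1
        return n

    def route(self, tokens: Sequence[int]) -> str:
        hashes = block_hashes(tokens)
        # Longest cached prefix wins; ties go to the less-loaded replica.
        best = max(self.cached,
                   key=lambda r: (self._matched_blocks(r, hashes), -self.load[r]))
        self.cached[best].update(hashes)  # assume the replica now caches this prompt
        self.load[best] += 1
        return best


router = PrefixRouter(["gpu-0", "gpu-1"])
system = list(range(100, 108))  # shared 8-token system prompt
a = router.route(system + [1, 2, 3, 4])
b = router.route([9] * 8)
c = router.route(system + [5, 6, 7, 8])
print(a, b, c)  # → gpu-0 gpu-1 gpu-0
```

The third request reuses the two system-prompt blocks already cached on `gpu-0`, which is the locality effect the article attributes latency and throughput differences to.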
Agentic
Cursor's war chest, xAI's redemption (16 minute read)
A market-structure take on Cursor and xAI frames the deal as strategic positioning rather than a product breakthrough. It may matter for AI tooling distribution and model access, but it offers little technical substance…
TLDR AI Feed · May 1 · Score 4.5
GenAI
Tracing the Goblin Quirk in GPT Models (6 minute read)
OpenAI traces a quirky GPT-5.1 behavior back to reward signals from personality tuning, illustrating how small optimization choices can steer model style. The piece is more of a model-behavior note than a builder-facing…
TLDR AI Feed · May 1 · Score 7.0
Agentic
GLM-5V-Turbo (25 minute read)
GLM-5V-Turbo folds multimodal perception into reasoning and tool use, aiming to make agent workflows work across text, code, and visual inputs. It looks especially relevant for builders exploring unified models that can…
TLDR AI Feed · May 1 · Score 9.6
GenAI
Qwen-Scope: Decoding Intelligence, Unleashing Potential (9 minute read)
Qwen-Scope is an interpretability toolkit for Qwen3 and Qwen3.5 that exposes internal model behavior for analysis and control. It may be useful for debugging, controllable inference, and dataset inspection, though the…
TLDR AI Feed · May 1 · Score 6.2
Infra
AWS Neuron SDK now available with Neuron Agentic Development for NKI kernel development on Trainium (1 minute read)
AWS adds agent-assisted development tools for Neuron kernel work on Trainium and Inferentia. The release matters for teams tuning low-level performance on AWS accelerators, though the announcement is still light on…
TLDR AI Feed · May 1 · Score 7.5