Study finds widespread weaknesses in autonomous agents

7 May 2026, 21:43

Researchers from Stanford, MIT CSAIL, Carnegie Mellon, ITU Copenhagen, NVIDIA and Elloe Artificial Intelligence Labs examined 847 autonomous agent deployments in healthcare, finance, customer service and code generation. The study found that 91% were vulnerable to subtle but dangerous tool-chaining attacks, in which individually innocuous tool calls combine to cause serious harm that the agents' reasoning models fail to catch.

The same study found that 89.4% of agents drifted from their goals after roughly 30 steps, and that 94% of agents with some form of memory augmentation were vulnerable to memory-poisoning attacks. Drawing on a taxonomy the researchers developed, the paper also argues that agents are in many ways far more vulnerable than stateless large language models.

The findings echo a February report by AWS and Berkeley researchers that documented related vulnerabilities in autonomous agents. Owen Sakawa, identified as the newer paper's first author, described the OpenClaw / Moltbook incident as the first real-world empirical validation of the agentic threat model at scale: 770,000 live agents were compromised simultaneously through a single database exploit, each with privileged access to its owner's machine, email and files. The incident was presented as evidence that these risks are no longer hypothetical.

Originally reported by garymarcus.substack.com