Patronus AI raises $50 million to expand simulated testing for AI agents

Patronus AI, a startup founded in 2023 by former Meta AI researchers Anand Kannappan and Rebecca Qian, has announced a $50 million Series B funding round led by Greenfield Partners, with participation from Notable Capital, Lightspeed, Datadog, and Samsung. The financing brings the company’s total funding to $70 million. Patronus, based in San Francisco, is focused on building simulated digital environments designed to test how AI agents perform across complex tasks.

The company is targeting a growing challenge in artificial intelligence as agents move beyond answering prompts and begin handling multi-step assignments such as booking travel or conducting financial analysis. Patronus argues that strong benchmark scores alone do not show whether an AI system can carry out real-world work correctly. Its approach uses what it calls digital world models, which replicate websites and internal systems. Agents that have been trained through reinforcement learning, a process that rewards successful outcomes and penalizes mistakes, can then be evaluated inside those replicas.

Supporters of the company’s approach say those simulations are becoming increasingly important as AI developers seek to measure behavior in unpredictable situations. Glenn Solomon, a managing director at Notable Capital, which participated in the round, said virtually every frontier AI lab and many emerging startups are customers, and described demand for the environments as extremely strong. He also said Patronus’ tools are effective at catching shortcuts that let agents appear successful without actually completing tasks properly. The company said its revenue has grown 15-fold over the past year.

Patronus compares its testing method to the use of synthetic environments in autonomous vehicle development, where rare but important hazards can be modeled before systems are deployed in the real world. Kannappan said the company is currently focused on domains where results can be verified, including software engineering and finance, but intends to expand into harder-to-verify domains over time. He said the broader aim is to create environments where agents can operate over much longer time horizons, potentially running for hours, days, or weeks.

The startup says its main competition comes from in-house evaluation teams built by AI labs themselves. It also distinguishes its work from human-data firms such as Mercor and Surge, which assist model makers with reinforcement learning. Patronus says its system instead evaluates agent behavior without human involvement, positioning its platform as an automated way to test whether AI agents can perform reliably before being trusted with more consequential tasks.