Cognition just hit a $492M run-rate building Devin

THE AI BRIEF

Today's signal: A coding-agent company posted a $492 million revenue run-rate alongside its $1 billion round, and the run-rate is the number that matters for procurement teams. Plus five other things that moved.

In today’s issue:

Main story: Cognition's $492M run-rate matters more than its $26B valuation.
Also worth knowing: Apollo flags models gaming safety evals, Harvey matches frontier legal performance with an open-source model, Anthropic ships a free Claude Code security plugin, Goldman models AI infrastructure at $800B, and Codex now learns from reviewer corrections in production.

THE READ

Cognition announced on Wednesday that it raised more than $1 billion at a $26 billion valuation, with enterprise usage up more than 10x year over year and a revenue run-rate of $492 million. Devin, the company's software-engineering agent, is the product anchoring the round. The financing closed the same week that Anthropic shipped a Claude Code security plugin and OpenAI showed Codex learning from reviewer corrections in production tax workflows. Coding agents are no longer one company's pitch deck.

The cultural read of a $26 billion valuation is that a category got expensive. The operator read is that $492 million in run-rate, tied to 10x usage growth, marks the first time a software-engineering agent vendor has posted a number that looks like a real enterprise software business rather than a research project with a logo. The multiple is roughly 53x revenue: high, but lower than the public-market AI premiums attached to names with weaker usage. The market is starting to price coding agents on something closer to consumption than vibes.
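For anyone who wants to check the multiple, it is simply valuation divided by revenue run-rate. The sketch below assumes the run-rate is annualized revenue and uses only the two figures Cognition reported; the function name is illustrative, not from any vendor tool.

```python
def revenue_multiple(valuation_usd: float, run_rate_usd: float) -> float:
    """Return valuation as a multiple of annualized revenue run-rate."""
    if run_rate_usd <= 0:
        raise ValueError("run-rate must be positive")
    return valuation_usd / run_rate_usd


# Reported figures: $26B valuation, $492M run-rate (assumed annualized).
multiple = revenue_multiple(26e9, 492e6)
print(f"{multiple:.1f}x")  # prints "52.8x", i.e. roughly 53x
```

The same one-line ratio works for any vendor in a diligence sheet, as long as both numbers are on the same annualized basis.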

Three changes are worth tracking. First, run-rate beats valuation as the renewal-diligence question. The right ask of any coding-agent vendor is monthly active developers, tasks completed, and revenue tied to those tasks; vendors that cannot show usage-attached revenue should be priced as research bets rather than infrastructure. Second, competitive density is real. Codex is shipping production learning loops, Claude Code is shipping security scanning, and Cursor's last round priced the category from the IDE side. Procurement can now negotiate across three serious vendors, each with a different wedge. Third, benchmark infrastructure is catching up. Artificial Analysis and IBM launched ITBench-AA today, the first agent benchmark grounded in enterprise SRE incidents, which gives buyers a measurement standard for operational work that did not exist three months ago.

AI READY PRO · FREE UNTIL FRIDAY

I will get you AI-trained in 30 days for free

After several months of running it quietly with top AI operators and teams, I’m excited to launch AI Ready Pro today to our newsletter subscribers like you.

It’s a 30-day AI training program personalized to you and where you are in your AI journey. It’s the culmination of hundreds of hours spent teaching AI to top Fortune 500 AI teams and operators (and working with Mark Cuban).

Each day, you receive a 10-15 minute exercise designed to improve your AI skills.

This isn’t vague AI theory or a way to pitch you a tool. It’s 30 days of learning to use AI in your real work, so you come out the other side with the skills to actually improve your own and your team’s output.

Until this Friday (May 15), it’s free for newsletter readers who complete the extended assessment. This assessment will help us personalize the learning experience to you.

If you lead a team, AI Ready Team opens today too, again for newsletter readers first. It’s the same engine, built to train your whole organization. You’ll get a detailed view of:

Your team’s AI readiness level + detailed strengths & weaknesses
Ranked list of AI automation opportunities
Tactical advice on how to advance AI efforts in your org

Leading a team? Take the Team assessment instead.

What I keep hearing from teams evaluating Devin and its peers is that the demo is the easy part. The hard part is deciding which tasks the agent should own, which it should hand back, and what review structure governs that handback. The vendors with real run-rate are the ones that have solved that loop for at least one tier of work; the ones still pitching general-purpose autonomy without it deserve harder questions. The diligence move worth running this quarter is to add a usage-tied revenue question to your AI vendor template, because valuation tells you what investors believe and run-rate tells you what customers actually do.

Hire secure AI teammates that work 24/7.

Hire pre-built AI teammates. Give your engineers and operators a platform to ship their own AI apps. Stop losing sleep about what is running where.

Clutch is the platform behind both: pre-built agents for the workflows your ops team should automate first, plus the integration plane your team's vibe-coded apps and Claude Code projects plug into. One platform. Real production. Visible and safe by default.

Built for ops, engineering, and security teams that are tired of the shadow-AI surface area inside their own company.

ALSO WORTH KNOWING

Apollo flagged "evaluation awareness" as a measurement failure mode in AI safety. The research lab warned that frontier models may behave differently when they sense they are being evaluated, which inflates measured safety relative to deployment behavior. The signal: enterprise safety reviews relying on vendor-supplied evaluation numbers may be measuring the test, not the model.

Harvey and Baseten fine-tuned an open-source model to match frontier-lab performance on a legal-agent benchmark. The companies published results on LAB, a domain-specific legal-agent benchmark, showing the fine-tuned open-source backbone reaching frontier performance. The signal: for regulated workflows where data residency or model control matters, open source is becoming an option on capability, not just price.

Anthropic shipped a free Claude Code security plugin that scans generated edits, outputs, and commits. The plugin checks for injection, secret leakage, and unsafe deserialization patterns inside the coding workflow. The signal: agent-written code is starting to get governance layers built in by default, which raises the floor on what enterprise security teams should expect.

Goldman Sachs estimated AI infrastructure spending will reach $800 billion annually by the end of 2026. The figure covers compute, networking, energy, and data-center buildout across hyperscalers and inference providers. The signal: AI infrastructure is now one of the largest concentrated capital cycles in any sector, and those costs eventually pass through to enterprise contract pricing.

OpenAI described a Codex workflow where human reviewer corrections become a continuous learning signal. The example was a tax-preparation agent, where reviewer fixes are traced back to the system, tested, and shipped as improvements. The signal: production agents are starting to look less like static models and more like continuously trained systems, which changes the maintenance burden for the teams that deploy them.

WATCHING TOMORROW

Whether Anthropic or Google matches Cognition's usage-disclosure pattern in any of this week's investor communications, and any first-customer announcements based on ITBench-AA results.

REPLY

Hit reply and tell me: if a coding-agent vendor pitched your team next week, what is the first usage number you would ask them to put on the table? I read every reply.

FORWARD

If a colleague is sitting in a conversation this week about coding-agent procurement, evaluation, or vendor diligence, forward this issue to them.

Back tomorrow,
Haroon