Codex hit 4 million weekly developers

In this edition:

This week: Codex hit 4 million weekly developers, the buyer-side gap got measured by SaaS-Bench, and Anthropic is set to post its first profitable quarter
Under the radar: Meta's FAIR layoffs and what "your work traces become training data" means for every operator running agents on employee workflows
What's on the calendar: Microsoft Build, Nvidia earnings, and the OpenAI IPO filing window

THE WEEK IN AI

THE WEEK IN ONE SENTENCE

Coding agents stopped being prompts this week and started being workers. OpenAI, Google, and Anthropic each shipped a version of the same idea on the same five-day arc. The buyer-side gap got clearer at the same time, which is the part of the story most coverage missed.

THREE SIGNALS

01 • Agents

Coding agents finally got a long-running mode

OpenAI shipped Codex goal mode on Thursday across the app, IDE, and CLI. The shift is in the interface: you stop writing prompts and start assigning outcomes. Earlier in the week, OpenAI also put Codex inside the ChatGPT mobile app, so the agent keeps working after the laptop closes. Separately, OpenAI disclosed that Codex now has more than 4 million weekly developers and called it one of its fastest-growing enterprise products.

Anthropic moved on the same axis from the other side: Andrej Karpathy joined Anthropic to lead a team using Claude to speed up Claude's own training. Recursive AI research is now a department, not a side project. Google, meanwhile, shipped Gemini Spark at I/O on Tuesday, a personal agent that runs on its own cloud VM and keeps working after the user closes the laptop.

Three labs, one move. The agent surface is becoming a persistent worker. You hand it an outcome, and it goes away to finish the work. The planning question shifts with that. The old question was where to use Copilot. The new one is which jobs in your company can be handed off as an outcome, and which still need a person watching every step.

AI READY PRO · FREE UNTIL FRIDAY

I’ll get you AI-trained in 30 days, for free

After several months of running it quietly with top AI operators and teams, I’m excited to launch AI Ready Pro today for newsletter subscribers like you.

It’s a 30-day AI training program personalized to you and to where you are in your AI journey. It’s the culmination of hundreds of hours spent teaching AI to top Fortune 500 AI teams and operators (and working with Mark Cuban).

Each day, you get a 10-15 minute exercise that builds your AI skills.

This isn’t vague AI theory or a way to pitch you a tool. It’s 30 days of learning to use AI in your real work, so you come out the other side with the skills to actually improve your own output and your team’s.

Until this Friday (May 15), it’s free for newsletter readers who complete the extended assessment. This assessment will help us personalize the learning experience to you.

If you lead a team, AI Ready Team opens today too, with newsletter readers first in line. It’s the same engine, built to train your whole organization. You’ll get a detailed view of:

Your team’s AI readiness level + detailed strengths & weaknesses
Ranked list of AI automation opportunities
Tactical advice on how to advance AI efforts in your org

Leading a team? Take the Team assessment instead.

02 • Adoption

The buyer-side gap got measured, not just argued

The headline number this week came from SaaS-Bench, the academic benchmark released May 16 that runs real cross-app workflows in real SaaS systems. The strongest computer-using agent completed under 4% of tasks end to end. The failure modes were the boring kind: planning breakdowns, state loss across apps, and no recovery from mistakes.

The same week, METR released its Frontier Risk Report with head-to-head testing of internal agents from Anthropic, Google, Meta, and OpenAI. The framing matters more than the rankings. The bottleneck on shipping agents is now whether you can watch them, not how smart they are. Agents do well when the result is easy to check, and fall apart fast when checking gets hard.

Microsoft answered with infrastructure. RAMPART, released this week, turns agent red-team scenarios into pytest tests that run in CI. Agent safety stops being a one-time policy review and starts being a build check that runs on every commit. That is the kind of boring engineering move that decides whether the "agent as worker" frame survives a real production environment. Once an agent can send email, query a CRM, and run code, supervision stops being a theory question.

03 • Market

Anthropic crossed a financial line that changes the AI economy

Anthropic will post its first profitable quarter, the Wall Street Journal reported Wednesday. The same paper ran a separate piece saying OpenAI may file IPO paperwork as soon as this week. Friday brought a second proof of scale. SpaceX's IPO filing disclosed an Anthropic partnership worth up to $40 billion, the largest single commitment to Anthropic on the record. The Information also reported Anthropic is in talks to rent servers running Microsoft-designed AI chips for capacity beyond AWS and Google.

This changes the math for buyers. If Anthropic is profitable, the case that Claude pricing has to climb sharply to fund the next training run is weaker than it was six months ago. If Microsoft becomes Anthropic's third hyperscaler, the procurement picture changes again. Claude becomes available wherever the operator already has compute, and the "pick a model, pick a cloud" decision starts to decouple.

For a COO with multi-year SaaS contracts up for renewal in 2026 and 2027, this lines up with what The Information reported about enterprise buyers: the average contract length is shrinking. Finance teams want the option to walk if AI makes the underlying app less critical. Shorter contracts are how buyers are pricing in that option. Anthropic profitability and Microsoft chip access in the same week are how the labs are building the same option from the other side.

UNDER THE RADAR

The most under-reported story this week was the Meta layoffs at FAIR. The headline read is that Meta cut a layer of AI researchers. The under-the-radar read is what employees have alleged since. Internal monitoring data on the same teams was reportedly used to train systems built to do those employees' jobs. The signal is not the layoffs. The signal is that "your work traces become training data" is now a labor question, not a research-ethics edge case. Once AI replacement and employee monitoring show up in the same complaint, the regulatory frame shifts from "AI safety" to "labor and surveillance," and those regulators move faster.

For a mid-market operator, the practical version is narrower. What I keep hearing from operators is that the people side of the house was not looped in before the agent went live. The question to surface this quarter is who owns the resulting training data and whether the employees whose work is being captured were told. That is a five-minute conversation to have now, and a much longer one to have once HR has read about the Meta version.

QUOTE OF THE WEEK

❝

"Codex is becoming one of OpenAI's fastest-growing enterprise products. More than 4 million developers now use Codex every week."

OpenAI, in the Dell partnership announcement, May 18, 2026

Hire secure AI teammates that work 24/7.

Hire pre-built AI teammates. Give your engineers and operators a platform to ship their own AI apps. Stop losing sleep over what is running where.

Clutch is the platform behind both: pre-built agents for the workflows your ops team should automate first, plus the integration plane your team's vibe-coded apps and Claude Code projects plug into. One platform. Real production. Visible and safe by default.

Built for ops, engineering, and security teams that are tired of the shadow-AI surface area inside their own company.

WHAT’S ON THE CALENDAR

Microsoft Build runs June 2-3 at Fort Mason, San Francisco. First Build outside Seattle in years. Watch for what's genuinely net-new on Copilot agents versus repackaged features already shipping inside Microsoft 365.
Nvidia reports earnings Wednesday, May 27, after the bell. First print since HBM was disclosed as 63% of frontier-chip component spend. Data-center revenue is the cleanest read on whether enterprise AI workloads are pulling through real budget.
The OpenAI IPO filing window runs May 25 to June 1, per the WSJ. If the S-1 drops in that window, it's the first public disclosure of OpenAI's full financials. Every AI vendor pricing conversation between now and the roadshow gets benchmarked against what's in that document.

REPLY

Hit reply and tell me which of the three signals above matches what you are actually seeing inside your company this week. I read every reply, and the patterns from those replies are how I keep finding the next issue.

Have a good weekend,
Haroon