Karpathy joins Anthropic

THE AI BRIEF

Today's signal: Anthropic turned the recursive AI research loop from a research idea into an org structure, and the lab race will now favor whichever team can use its own model to build the next one fastest.

In today’s issue:

Main story: Anthropic just made AI research recursive.
Also worth knowing: OpenAI's reasoning model reportedly solved an 80-year-old Erdős math problem, Nvidia posted a record data-center quarter, Cursor is now embedded inside Jira, Ramp built finance agents on Google's Managed Agents, and GitHub disclosed a breach via a poisoned VS Code extension.

THE READ

Andrej Karpathy joined Anthropic to lead a team using Claude to accelerate Claude's own training. The model improving the model is now a department, not a thought experiment.

Andrej Karpathy announced he is joining Anthropic to return to frontier LLM research after Tesla, OpenAI, and his Eureka Labs education project. Axios reported he will lead a new team focused on Claude-assisted pretraining research, meaning Claude itself is being used to accelerate the work of training the next Claude. Techmeme made the story its top AI item on the day Google opened I/O. The signal carried through the rest of the week.

The talent move is the obvious headline. The org structure is the more useful read. Most labs already use their own models to help write code, run experiments, and draft research notes. Anthropic just gave that work a leader, a charter, and a name. When research velocity becomes an explicit product surface inside the lab, two things happen. The teams that are good at it pull away faster on the next generation, because every shipped capability feeds the work on the one after it. And the gap stops being about who has more researchers, and starts being about who has built the better internal loop.

AI READY PRO · FREE UNTIL FRIDAY

I will get you AI trained in 30 days for free

After several months of running it quietly with top AI operators and teams, I’m excited to launch AI Ready Pro today to our newsletter subscribers like you.

It’s a 30-day AI training program personalized to you and where you are in your AI journey. It’s the culmination of hundreds of hours spent teaching AI to top Fortune 500 AI teams and operators (and working with Mark Cuban).

Each day, you receive a 10-15 minute exercise to complete to improve your AI skills.

This isn’t vague AI theory or a way to pitch you a tool. It’s 30 days of learning to use AI in your real work, so you come out the other side with the skills to actually improve your own output and your team’s.

Until this Friday (May 15), it’s free for newsletter readers who complete the extended assessment. This assessment will help us personalize the learning experience to you.

If you lead a team, AI Ready Team is open today too, also for readers first. It’s the same engine, but to train your whole organization. You’ll get a detailed view of:

Your team’s AI readiness level + detailed strengths & weaknesses
Ranked list of AI automation opportunities
Tactical advice on how to advance AI efforts in your org

Leading a team? Take the Team assessment instead.

For a buyer, the category change matters more than the personnel. The frontier model market this year is going to be priced on perceived research velocity, not just on benchmark wins. Anthropic, OpenAI, Google, and xAI are now all racing on how quickly their own assistants can be wired into the work of training the next model. Procurement teams should expect the gap between labs to widen on capabilities that improve session-over-session, including coding, long-context retrieval, and agentic planning, because those are the surfaces where the recursive loop pays back first. The model you sign a contract on this quarter is not the model you will run next quarter, and the lab that compounds research fastest is going to set the pace on both.

What I keep hearing in conversations with technical leaders is that this is the moment to stop evaluating models on a single point-in-time benchmark and start evaluating labs on their rate of improvement. A vendor whose model is 80% as good but improving 3x faster is a different procurement decision than one whose model is 100% today and flat. Anthropic just told the market which side of that line it is trying to be on.
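To make that comparison concrete, here is a back-of-the-envelope sketch. It assumes quality is a single score that moves by a fixed number of points each quarter, which is a simplification. The function name, the scores, and the per-quarter gains are all illustrative assumptions, not figures from any vendor.

```python
import math


def quarters_to_catch_up(challenger_score, leader_score,
                         challenger_gain, leader_gain=0.0):
    """Whole quarters until the challenger matches the leader.

    Hypothetical linear model: each vendor's score changes by a fixed
    number of points per quarter. Returns None if the gap never closes.
    """
    gap = leader_score - challenger_score
    if gap <= 0:
        return 0  # challenger is already level or ahead
    closing_rate = challenger_gain - leader_gain
    if closing_rate <= 0:
        return None  # challenger is not gaining on the leader
    return math.ceil(gap / closing_rate)


# Assumed numbers: challenger at 80 gaining 6 points a quarter,
# leader at 100 and flat.
print(quarters_to_catch_up(80, 100, challenger_gain=6))  # 4
```

Under those assumed numbers the 20-point gap closes within four quarters, which is shorter than many enterprise contract terms. Change the gains and the answer moves quickly, which is the point: the rate matters as much as the starting score.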

ALSO WORTH KNOWING

OpenAI said a general-purpose reasoning model, not a math-specific system, found a proof for an Erdős problem from 1946. If the result holds up under peer review, the operator implication is that the capability ceiling on "knowledge work that requires real reasoning" moved up overnight, and the bar for what counts as a defensible human-in-the-loop process moves with it.

Nvidia posted a record data-center quarter. GB200 and GB300 deployments and edge-compute revenue drove the print, and the company's read on inference demand remains the most direct signal on whether AI usage is actually scaling at the enterprise tier. The number to track is data-center revenue trajectory; it is the cleanest macro indicator on whether AI workloads are pulling through real budget or still sitting in pilot.

Cursor is now embedded inside Jira. Engineers can assign Cursor to a ticket and the agent reads the ticket, comments, and repo, and opens a PR. The shift to watch is that coding agents are moving out of the IDE and into the system of record where work is assigned, which is also where the supervision arrangement, approval flow, and audit trail have to live.

Ramp built finance agents on Google's Managed Agents in the Gemini API. The interesting part is not the partner; it is the surface. Finance workflows have approvals, exceptions, and audit needs. A managed-agent layer that handles those without custom backend work is the first credible enterprise pattern for agentic finance ops.

GitHub said a poisoned VS Code extension compromised an employee device and exposed internal repositories. No movie-style attack on core infrastructure. The breach came through a trusted developer tool.

WATCHING TOMORROW

Week in AI ships Friday at 8 am ET with the week's pattern read. Worth watching alongside it: any Anthropic update on Karpathy's first research direction, and whether OpenAI or Google publicly names a comparable "model-assisted research" lead in response.

Back tomorrow,
Haroon