Gemini moved into every surface Google owns

THE AI BRIEF

Today's signal: Google made agents a default surface across Search, Android, Workspace, and shopping, and METR put real numbers on what those agents still can't verify on their own.

In today’s issue:

Main story: Google made the agent layer official. METR explained why that's still risky.
Also worth knowing: Karpathy joins Anthropic, METR releases the first head-to-head lab comparison on agent risk, Gemini 3.5 Flash tops a finance agent benchmark, Grafana discloses a GitHub repository attack, Cursor ships Composer 2.5, and Armada AI closes a $230M Series B.

THE READ

Two announcements from the same day tell one story: the runway for shipping agents got longer, and the runway for trusting them stayed where it was.

Yesterday at I/O 2026, Google made Gemini the default layer across Search, Android, Workspace, and Flow. The keynote ran long. The two pieces that matter most for operators were near the end. The first was Gemini Spark, a personal agent that runs on a dedicated cloud VM and keeps working after the user closes the laptop. The second was Universal Cart, a commerce layer for agents that ships on YouTube this summer with an Agent Payments Protocol underneath. Google also said the Gemini app has more than 900 million monthly users and that Gemini handles 3.2 quadrillion tokens per month across 8.5 million developers.

On the same day, METR released its Frontier Risk Report. METR tested internal agents from Anthropic, Google, Meta, and OpenAI on real engineering work. The main finding is plain. Agents do well when success is easy to check. They fall off fast when checking gets hard or unclear. METR's frame is that the bottleneck on shipping agents is now whether you can watch them, not how smart they are.

AI READY PRO · FREE UNTIL FRIDAY

I will get you AI trained in 30 days for free

After several months of running it quietly with top AI operators and teams, I’m excited to launch AI Ready Pro today for newsletter subscribers like you.

It’s a 30-day AI training program personalized to you and where you are in your AI journey. It’s the culmination of hundreds of hours spent teaching AI to top Fortune 500 AI teams and operators (and working with Mark Cuban).

Each day, you receive a 10-15 minute exercise to complete to improve your AI skills.

This isn’t vague AI theory or a way to pitch you a tool. It’s 30 days of learning to use AI in your real work, so you come out the other side with the skills to actually improve your own and your team’s output.

Until this Friday (May 15), it’s free for newsletter readers who complete the extended assessment. This assessment will help us personalize the learning experience to you.

If you lead a team, AI Ready Team is open today too, also for readers first. It’s the same engine, but to train your whole organization. You’ll get a detailed view of:

Your team’s AI readiness level + detailed strengths & weaknesses
Ranked list of AI automation opportunities
Tactical advice on how to advance AI efforts in your org

Leading a team? Take the Team assessment instead.

Read together, the two stories are the shape of the next year. Google is betting on reach: agents in front of billions of users, on the surfaces those users already use. METR is betting on measurement: agents will get handed harder jobs, and someone has to know when the job is being done badly. Both can be right at the same time.

For a COO or chief of staff, the read is narrow this week. Spark and the agent payment rails are mostly consumer-facing today. The work versions, the Workspace tie-ins, and the shopping flows will reach a mid-market company on Google's timeline, not tomorrow. What I keep hearing from operators in our 280+ interviews is closer to METR's question than Google's: for the agent work your team has already shipped, who checks whether the work got done? If the answer is "the user, when they notice," that is the gap the next twelve months will be about.

Hire secure AI teammates that work 24/7.

Hire pre-built AI teammates. Give your engineers and operators a platform to ship their own AI apps. Stop losing sleep about what is running where.

Clutch is the platform behind both: pre-built agents for the workflows your ops team should automate first, plus the integration plane your team's vibe-coded apps and Claude Code projects plug into. One platform. Real production. Visible and safe by default.

Built for ops, engineering, and security teams that are tired of the shadow-AI surface area inside their own company.

ALSO WORTH KNOWING

Karpathy joined Anthropic. Andrej Karpathy, formerly of OpenAI and Tesla, announced he is joining Anthropic. The signal that cut through I/O: where one of the most public AI researchers chooses to work points to where the frontier-research center of gravity is moving.

METR Frontier Risk Report covers labs head-to-head. METR evaluated internal agents from Anthropic, Google, Meta, and OpenAI on real engineering work, with a focus on monitorability. First public report that compares labs on operational agent risk rather than capability alone.

Vals AI puts Gemini 3.5 Flash first on a finance agent benchmark. Vals' Finance Agent Benchmark ranked Gemini 3.5 Flash number one, and the model placed third on the broader Vals Index. First public ranking since the I/O launch and a useful data point if you're scoring models for finance workflows.

Grafana disclosed a targeted GitHub repository attack. Grafana said an attacker gained unauthorized access to internal repositories and downloaded code. This is the second high-profile software supply chain incident this month and another reminder that the security surface around AI development now includes every system touching your code.

Cursor shipped Composer 2.5. Cursor released Composer 2.5, the next version of its coding agent aimed at longer, sustained work sessions. Useful context if your engineering org is evaluating coding agents this quarter.

Armada AI raised $230M Series B. Modular AI data center company Armada AI closed a $230M Series B. One more data point that capital is still flowing into infrastructure layers below the model layer, not just into model labs.

WATCHING TOMORROW

Google's I/O Day 2 sessions land on Wednesday and will fill in details on Spark for Workspace, the Agent Payments Protocol spec, and Antigravity 2.0 pricing. If a major lab responds to the Karpathy news, expect it before Friday.

Back tomorrow,
Haroon