
The week AI stopped being a product
GPT-5.5 set a new bar, the coding wars got ugly, and Sullivan & Cromwell showed operators exactly how AI velocity breaks.
THE WEEK IN AI
THE WEEK IN ONE SENTENCE
GPT-5.5 reset the model leaderboard on Thursday, but the more important story is that every layer underneath it (the cloud contracts, the coding tools, the legal filings) started behaving like critical infrastructure, with all the acquisition drama and liability exposure that entails.
THREE SIGNALS
01 • Models
GPT-5.5 landed with numbers that force another roadmap review
OpenAI shipped GPT-5.5 on Thursday, along with a GPT-5.5 Pro tier for Business and Enterprise plans. The headline benchmarks: 82.7% on Terminal-Bench 2.0 (Claude Opus 4.7: 69.4%, Gemini 3.1 Pro: 68.5%), 73.1% on OpenAI's internal Expert-SWE, and 51.7% on FrontierMath Tiers 1-3. OpenAI also reports that the model uses significantly fewer tokens than GPT-5.4 to complete the same Codex tasks. API availability is pending ("very soon," per the launch post); ChatGPT and Codex rollouts started Thursday.
The "pick a model and standardize" plan most companies put in place in the first half of this year was built on GPT-5.4 / Opus 4.7 / Gemini 3 Pro pricing and performance. Those assumptions are now stale by a full leaderboard position. If you run a production workflow pinned to a specific model, the question worth asking your vendor this week is whether your contract lets you opt in to the new tier and on what timeline. Most enterprise contracts I've seen lock you to a specific model generation for twelve months, which means a better model on Thursday turns into a Q3 conversation, not a this-week one.
02 • Distribution
The coding wars got ugly, and acquisitions are how
Three separate stories landed this week that, taken together, describe a land grab rather than a product category:
SpaceX signed a deal giving it the right to acquire Cursor for $60 billion by year-end, or pay $10 billion if it walks. (CNBC)
Anthropic pulled Claude from Windsurf after Windsurf's acquisition talks with OpenAI, a move widely read as signaling Anthropic will do the same to Cursor if SpaceX follows through.
OpenAI launched Workspace Agents in ChatGPT, bringing Codex-powered agents to Business, Enterprise, and Edu plans in production, not waitlist.
This is the pattern: every major lab wants to own the interface where developers write code, and they're willing to cut off competitors' models from competitors' tools to get there. If your engineering org standardized on Cursor six months ago because it was the best tool, your standardization now has an ownership question attached.
The move this week: ask your engineering lead which models your coding tool actually has access to today, and whether any of those access relationships are one-pulled-plug away from breaking. If the answer is "I'll check," that's the whole point.
03 • Velocity & Security
The velocity dividend showed up, and so did the interest payments
Sundar Pichai confirmed at Cloud Next '26 that 75% of new code at Google is AI-generated, up from 50% last fall. Intercom, a mid-market SaaS company, told Lenny Rachitsky's podcast that it doubled engineering throughput in nine months using Claude Code. That's the dividend.
The interest payments arrived in the same week. Security researcher @polsia disclosed CVE-2025-48757, a vulnerability in Lovable's AI code generator that had propagated across 170+ apps on the platform. Anthropic's Mythos was accessed through a breached third-party vendor. Vercel confirmed a breach via an OAuth token from an AI tool that one employee connected to. Lovable had a broken-authorization flaw exposing user credentials.
The companies shipping 2x faster aren't automatically shipping 2x better. The teams that win this cycle treat AI-generated code with the same security gates as human-written code, and audit every AI integration with OAuth scope into their stack. If your security team can't produce that inventory inside a week, you already have the answer.
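If your team wants to start that inventory rather than just ask for it, a minimal sketch in Python shows the shape of the check. Everything here is an assumption for illustration: the CSV export format, the column names, and the set of "broad" scope names (loosely modeled on GitHub/Google-style OAuth scopes) are hypothetical, not any vendor's real schema.

```python
import csv
import io

# Hypothetical export: one row per AI integration, OAuth scopes space-separated.
SAMPLE = """tool,owner,oauth_scopes
codegen-assistant,eng-platform,repo read:org
notes-summarizer,ops,drive.readonly
agent-runner,unknown,repo admin:org workflow
"""

# Scopes broad enough to warrant a security review (illustrative names only).
BROAD_SCOPES = {"repo", "admin:org", "workflow"}

def flag_integrations(csv_text):
    """Return (tool, reason) pairs that need follow-up from security."""
    flags = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        scopes = set(row["oauth_scopes"].split())
        broad = scopes & BROAD_SCOPES
        if broad:
            flags.append((row["tool"], f"broad scopes: {sorted(broad)}"))
        if row["owner"] in ("", "unknown"):
            flags.append((row["tool"], "no accountable owner"))
    return flags

if __name__ == "__main__":
    for tool, reason in flag_integrations(SAMPLE):
        print(f"{tool}: {reason}")
```

The point of the sketch is the two questions it encodes, not the code: which integrations hold write-level scopes, and which ones have no named owner. If your security team can answer both from an export, the week-long inventory becomes an afternoon.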
UNDER THE RADAR
A top law firm filed AI-hallucinated case law
Sullivan & Cromwell, one of the top-tier white-shoe law firms on the planet, filed a motion in federal court this week containing case citations that did not exist. The citations came from an AI tool the firm uses for legal research, and no attorney verified them before filing.
The hallucination failure mode has been public knowledge since Mata v. Avianca in 2023. What's new is the institution. If a 140-year-old firm with partner rates north of $2,000 an hour can file fabricated case law, the question for any operator with AI in a compliance-adjacent workflow is not "could this happen to us" but "what is our check between AI output and external delivery, and has anyone audited whether it actually runs?" For most companies the answer is some version of "the analyst reads it," and for most analysts reading AI output critically enough to catch a confident hallucination isn't yet a trained skill.
Two moves worth putting on a Monday list. First, identify which of your workflows currently depend on no one double-checking: expense memos, contract redlines, investment summaries, and outbound research are common examples. Second, decide whether the person doing the check has the domain knowledge to catch a plausible-sounding error; if they don't, either add someone who does or kill the workflow. Velocity without review is a new version of a problem every compliance function has managed before; the tools are new, but the governance answer isn't.
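One way to make "a check between AI output and external delivery" concrete is a hard gate that blocks anything citing a case a human hasn't verified. This is a sketch only: the allowlist set, the function names, and the deliberately naive case-name pattern are illustrative assumptions, not a real legal-citation parser.

```python
import re

# Hypothetical allowlist: citations a human has verified against a real reporter.
VERIFIED_CITATIONS = {
    "Mata v. Avianca, 678 F. Supp. 3d 443 (S.D.N.Y. 2023)",
}

# Deliberately naive pattern for "Name v. Name" case citations (illustrative).
CASE_PATTERN = re.compile(r"\b[A-Z][a-zA-Z]+ v\. [A-Z][a-zA-Z]+")

def unverified_citations(draft):
    """Return case-style citations in the draft absent from the verified set."""
    found = CASE_PATTERN.findall(draft)
    return [c for c in found if not any(c in v for v in VERIFIED_CITATIONS)]

def release_gate(draft):
    """Refuse to release a draft containing unverified citations."""
    bad = unverified_citations(draft)
    if bad:
        raise ValueError(f"blocked: unverified citations {bad}")
    return draft
```

The design choice that matters is the failure mode: the gate raises rather than warns, so an unverified citation stops delivery instead of riding along with it. The same shape works for any compliance-adjacent output, with citations swapped for figures, tickers, or contract clauses.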
Quote of the week
“AI-written code at Google has climbed from 50% last fall to 75% today, all of it reviewed by human engineers.”
Sundar Pichai, Google Cloud Next '26 keynote (source)
The quote is useful because it sets a public benchmark. If the largest technology company in the world has moved three-quarters of its code authorship to AI inside eighteen months, "our engineers aren't ready yet" stops being a defensible answer in a board meeting.
WHAT’S ON THE CALENDAR
GPT-5.5 API availability. OpenAI says "very soon." If it lands before mid-next week, expect vendor pricing sheets to move fast.
Anthropic Mythos post-mortem. Anthropic has committed to publishing a root-cause write-up on the breach. That document will set the template for how frontier labs disclose supply-chain failures.
ICLR 2026 paper awards. The Rio conference wraps Saturday. Worth a scan for the outstanding papers on energy-based transformers and image generators as generalist vision learners, both of which have eighteen-month product implications.
Alphabet earnings. Alphabet reports next Tuesday after the close. The read-through will be how much of the Cloud Next '26 momentum translates into actual cloud revenue, and whether the Gemini Enterprise Agent Platform ramp shows up in guidance.
See you Tuesday with the Ready Memo,
Haroon