Daily Brief

Mercedes cut 8 months of work to 8 days

Coding agents stopped being a developer experiment yesterday. Plus Codex on Windows, a Claude Code console, and a quiet supply-chain attack on npm.

By Haroon Choudery·May 14, 2026·9 min read

THE AI BRIEF

Today's signal: Coding agents stopped being a developer experiment. Plus Codex on Windows, a new Claude console, and an npm attack.

In today’s issue:

Main story: Mercedes cut 8 months of work to 8 days
Also worth knowing: Codex on Windows, a Claude Code multi-session console, an npm supply-chain attack, and OpenAI's Deployment Company, which is already acquiring.
Free webinar: Build Notion Agents to Automate Complex Tasks, a 30-minute Lightning Lesson, later today. Save your seat →

THE READ

Mercedes-Benz cut a COBOL modernization from eight months to eight days. The coding agent market is no longer a developer story.

Three product launches landed in one news cycle and the numbers behind them point to the same thing: enterprise engineering work is moving from human-paced to agent-paced faster than most procurement teams modeled.

Cognition AI published a customer story showing what happens when an autonomous coding agent gets pointed at legacy code at scale. In an initial four-week pilot, Devin analyzed more than 200,000 lines of COBOL inside Mercedes-Benz and cut the modernization timeline from an estimated eight months to eight days. Mercedes has now deployed Devin, Devin for Terminal, and the Windsurf IDE across engineering teams in the US, Europe, and Asia. Cognition's customer roster reportedly now includes Goldman Sachs and the US Army, and trade press puts its annualized revenue at $445 million, with usage doubling every eight weeks.
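For a rough sense of scale, the two headline figures above can be turned into ratios. This is a back-of-the-envelope sketch, not anything Cognition or Mercedes published: it assumes an average calendar month of about 30.44 days and a 52-week year, and it assumes the reported doubling rate would hold for a full year, which the reporting does not establish. The variable names are illustrative.

```python
# Back-of-the-envelope ratios for the figures quoted above.
# Assumptions (not from the source): average month length, a 52-week year,
# and a doubling rate that holds for a full year.

DAYS_PER_MONTH = 365.25 / 12          # about 30.44 calendar days

estimated_days = 8 * DAYS_PER_MONTH   # original eight-month estimate
actual_days = 8                       # reported eight-day result
speedup = estimated_days / actual_days

doubling_period_weeks = 8
doublings_per_year = 52 / doubling_period_weeks
yearly_growth = 2 ** doublings_per_year

print(f"Timeline compression: about {speedup:.0f}x")
print(f"Usage growth if the doubling held for a year: about {yearly_growth:.0f}x")
```

Under those assumptions the timeline compression works out to roughly 30x in calendar days, and the usage claim implies roughly 90x growth over a year, which is one reason the revenue and usage figures deserve scrutiny.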

Two other product moves landed in the same window. Anthropic shipped Agent View for Claude Code, a single terminal screen that lets a developer dispatch and supervise more than ten parallel coding sessions at once, grouped by whether they need input, are running, or are done. OpenAI published its Codex Windows sandbox work, removing the last operating-system gap in its coding agent and letting Codex run on Windows the same way it already runs on Mac and Linux. The pattern is the same in all three: less time approving each command, more parallel work running unattended, more of the developer's day spent reviewing what the agent did rather than typing.

AI READY PRO · FREE UNTIL FRIDAY

I will get you AI trained in 30 days for free

After several months of running it quietly with top AI operators and teams, I’m excited to launch AI Ready Pro today to our newsletter subscribers like you.

It’s a 30-day AI training program personalized to you and where you are in your AI journey. It’s the culmination of hundreds of hours spent teaching AI to top Fortune 500 AI teams and operators (and working with Mark Cuban).

Each day, you receive a 10-15 minute exercise to complete to improve your AI skills.

This isn’t vague AI theory or a way to pitch you a tool. It’s 30 days of learning to use AI in your real work, so you come out the other side with the skills to actually improve your and your team’s output.

Until this Friday (May 15), it’s free for newsletter readers who complete the extended assessment. This assessment will help us personalize the learning experience to you.

If you lead a team, AI Ready Team is open today too, also for readers first. It’s the same engine, but to train your whole organization. You’ll get a detailed view of:

Your team’s AI readiness level + detailed strengths & weaknesses
Ranked list of AI automation opportunities
Tactical advice on how to advance AI efforts in your org

Leading a team? Take the Team assessment instead.

For an operator outside engineering, the part that matters is the procurement question coming next. The standard enterprise software conversation is seats: how many engineers, how much per seat, how do we govern access. Coding agents do not price cleanly on seats, because one engineer can have ten of them running. They price closer to consumption. A budget your CFO modeled at $1,000 per engineer per month can land closer to $5,000 once teams figure out how to keep three or four agents busy in parallel. The Mercedes deployment is a real number to negotiate against. The eight-month-to-eight-day figure tells procurement what the work is now worth, not just what the tool costs.
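To make the seat-versus-consumption gap concrete, here is a minimal sketch of the arithmetic. The $1,000 modeled figure and the four parallel agents come from the paragraph above. The team size of 100 engineers and the $1,250 per agent per month are hypothetical, with the per-agent figure chosen only so that four agents reproduce the $5,000 number. Real vendor pricing will differ.

```python
def seat_budget(engineers: int, price_per_seat: float) -> float:
    """Monthly budget under a flat per-seat model."""
    return engineers * price_per_seat


def consumption_budget(engineers: int, agents_per_engineer: float,
                       cost_per_agent: float) -> float:
    """Monthly budget when cost scales with the agents kept busy."""
    return engineers * agents_per_engineer * cost_per_agent


ENGINEERS = 100          # hypothetical team size
MODELED_SEAT = 1_000     # per engineer per month, from the text
COST_PER_AGENT = 1_250   # hypothetical, so that 4 agents cost $5,000

modeled = seat_budget(ENGINEERS, MODELED_SEAT)
actual = consumption_budget(ENGINEERS, agents_per_engineer=4,
                            cost_per_agent=COST_PER_AGENT)
gap = actual - modeled

print(f"Modeled: ${modeled:,.0f}/month")
print(f"Actual:  ${actual:,.0f}/month")
print(f"Gap:     ${gap:,.0f}/month ({actual / modeled:.0f}x the plan)")
```

The useful knob is agents_per_engineer. Rerunning the sketch at 1, 4, and 10 gives a range to bring into a pricing conversation instead of a single seat count.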

I want to be honest about the bear case. Most companies will not see Mercedes-level outcomes. Devin against legacy COBOL is close to a best case for autonomous coding: well-bounded, well-tested, with a clear correctness signal. A lot of the engineering work inside a 50-to-500-person company is the opposite of that: greenfield product code, undocumented systems, decisions that need a hallway conversation. What I keep hearing from engineering leaders deploying these tools is that the productivity gain is real but uneven. Plan a contract that lets you scale up where it works and stop paying where it does not, rather than buying a flat seat count for everyone.

Hire secure AI teammates that work 24/7.

Hire pre-built AI teammates. Give your engineers and operators a platform to ship their own AI apps. Stop losing sleep about what is running where.

Clutch is the platform behind both: pre-built agents for the workflows your ops team should automate first, plus the integration plane your team's vibe-coded apps and Claude Code projects plug into. One platform. Real production. Visible and safe by default.

Built for ops, engineering, and security teams that are tired of the shadow-AI surface area inside their own company.

ALSO WORTH KNOWING

OpenAI shipped a sandbox that lets Codex run on Windows.
The engineering post details a custom isolation layer that OpenAI built instead of relying on AppContainer or Mandatory Integrity Control, because neither met the bar. Windows was the last major developer environment where Codex still asked users to choose between approving every command and giving full access.

Anthropic published the docs for Claude Code Agent View.
The feature is a terminal-native console for running more than ten Claude Code sessions in parallel, sorted by whether each needs input or is still running. The console is the missing UX layer for multi-agent development at scale.
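The grouping idea behind a console like this can be sketched in a few lines. This is an illustration of the concept only, not Anthropic's implementation or API: the Session and Status types and the session names are hypothetical, and the "done" bucket follows the description earlier in this issue.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Definition order doubles as display order: blocked sessions first.
    NEEDS_INPUT = "needs input"
    RUNNING = "running"
    DONE = "done"


@dataclass
class Session:
    name: str        # hypothetical label for a coding session
    status: Status


def group_by_status(sessions):
    """Bucket session names by status, in Status definition order."""
    buckets = defaultdict(list)
    for session in sessions:
        buckets[session.status].append(session.name)
    # Skip empty buckets so the view only shows statuses in use.
    return {s: buckets[s] for s in Status if buckets[s]}


sessions = [
    Session("fix-login-bug", Status.RUNNING),
    Session("migrate-schema", Status.NEEDS_INPUT),
    Session("update-docs", Status.DONE),
    Session("refactor-api", Status.RUNNING),
]

grouped = group_by_status(sessions)
for status, names in grouped.items():
    print(f"{status.value}: {', '.join(names)}")
```

Putting the sessions that are waiting on a human at the top is the point: with ten or more agents running, the scarce resource is the developer's attention, not compute.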

Cognition is running the same playbook inside HIL and SIL workflows.
A second customer post out this week details deployments at RV Tech and Mercedes for hardware-in-the-loop and software-in-the-loop testing, where Devin automates failure triage and test generation against rising ticket volumes. Same agent, different bottleneck inside the same enterprise stack.

Socket flagged a fresh wave of compromised npm packages, this time hitting TanStack.
Researchers detected 84 modified package artifacts carrying CI credential-stealing malware tied to the ongoing Mini Shai-Hulud attack. The blast radius matters for any team running automated builds against the npm registry, which now includes most teams running coding agents.
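If your team wants a quick first check, one approach is to compare the versions pinned in package-lock.json against whatever advisory list your security vendor publishes. The sketch below assumes the lockfile v2/v3 layout, where a top-level "packages" object maps paths such as "node_modules/name" to an entry with a "version" field. The package names and versions shown are placeholders, not the actual compromised artifacts. Take the real list from the advisory itself.

```python
def package_name(lock_key: str) -> str:
    """'node_modules/a/node_modules/@scope/b' -> '@scope/b'."""
    marker = "node_modules/"
    idx = lock_key.rfind(marker)
    return lock_key[idx + len(marker):] if idx != -1 else lock_key


def find_flagged(lock: dict, flagged: dict) -> list:
    """Return sorted (name, version) pairs that appear in the flagged list."""
    hits = []
    for key, meta in lock.get("packages", {}).items():
        if not key:  # the empty key is the root project, not a dependency
            continue
        name = package_name(key)
        version = meta.get("version")
        if version in flagged.get(name, set()):
            hits.append((name, version))
    return sorted(hits)


# Placeholder data for illustration only. In practice, load the lockfile with
# json.loads(Path("package-lock.json").read_text()) and fill `flagged`
# from the published advisory.
sample_lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "my-app"},
        "node_modules/example-pkg": {"version": "1.2.3"},
        "node_modules/other/node_modules/@scope/tool": {"version": "4.5.6"},
        "node_modules/safe-pkg": {"version": "1.0.0"},
    },
}
flagged = {"example-pkg": {"1.2.3"}, "@scope/tool": {"4.5.6", "4.5.7"}}

hits = find_flagged(sample_lock, flagged)
for name, version in hits:
    print(f"FLAGGED: {name}@{version}")
```

A clean result from a check like this only means the pinned versions are not on the list you supplied. It says nothing about credentials that may already have leaked from an earlier CI run, so rotating CI secrets is still worth discussing with your security team.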

OpenAI's enterprise services arm started buying.
The new OpenAI Deployment Company acquired UK consultancy Tomoro, bringing on 150 deployment specialists, and named TPG, Goldman Sachs, and McKinsey as launch partners. Three days in, the move already looks less like a partnership announcement and more like a competitor to the big system integrators.

WATCHING TOMORROW

Friday is the Week in AI roundup. The threads to watch: how the coding agent revenue claims hold up under more public scrutiny, whether the Mercedes COBOL number gets corroborated in any earnings commentary, and where the Microsoft-OpenAI distribution split lands in analyst notes following the Build keynote.

Back tomorrow,
Haroon