Grzegorz Motyl

© 2026 Grzegorz Motyl. Raising the bar of professional software development.


    Amazon's AI Coding Tools Broke Production and Now Engineers Need Permission to Ship

    Published on 11.03.2026

    #ai-adopters-club
    #ai
    #agents
    AI & AGENTS


    TL;DR: Amazon pushed hard for engineers to adopt its internal AI coding tool Kiro, tracking usage like a sales quota and mandating it company-wide. The result was production outages, including an AI agent autonomously deleting and recreating an entire AWS environment, leading Amazon to require senior engineer sign-off on all AI-assisted code changes.

    Look, I have been around long enough to see the pattern. A new technology shows up, everyone gets excited, leadership mandates adoption, and then reality kicks in. We saw it with microservices, we saw it with blockchain, and now we are watching it happen with AI coding tools in real time. But this time the stakes are higher because the feedback loop between "code generated" and "production broken" has gotten dangerously short.

    Here is what happened at Amazon, and it is a story every engineering leader needs to hear. Amazon built an AI coding tool called Kiro, launched it in July 2025, and then started tracking adoption like it was a quarterly sales number. Eighty percent of developers were expected to use it at least once a week. By November, it was mandatory. About 1,500 engineers who preferred tools like Cursor were told to get back in line.

    Then December happened. An engineer gave a Kiro AI agent access to a production environment. The agent was supposed to patch an issue. Instead, it decided the most efficient solution was to delete the entire environment and recreate it from scratch. That caused a 13-hour outage of AWS Cost Explorer in a Chinese mainland region. Amazon called it user error because the engineer had broader permissions than intended. And sure, that is technically true. But here is the question nobody at Amazon seemed to ask beforehand: why did the AI agent have the theoretical ability to destroy a production environment at all? That is not a user error. That is an architectural failure in how you deploy AI agents.
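One way to close that architectural gap is to route every agent-initiated action through a gate that denies destructive operations against production by default, no matter what permissions the human operator happens to hold. This is a minimal, hypothetical sketch; the action names and environment labels are illustrative, not Kiro's or AWS's actual API:

```python
# Hypothetical sketch of a policy gate that agent tool calls must pass
# through. Destructive actions against production are denied unless
# explicitly allowlisted, regardless of the operator's own permissions.

DESTRUCTIVE_ACTIONS = {"delete_environment", "terminate_instances", "drop_database"}

class PermissionDenied(Exception):
    """Raised when an agent attempts a blocked action."""

def execute_agent_action(action: str, target_env: str, allowlist: frozenset = frozenset()) -> str:
    """Gate every agent-initiated action before it reaches real infrastructure."""
    if target_env == "production" and action in DESTRUCTIVE_ACTIONS and action not in allowlist:
        raise PermissionDenied(
            f"agent may not run '{action}' against production without explicit allowlisting"
        )
    # Placeholder for the real infrastructure call.
    return f"executed {action} on {target_env}"
```

The point of the design is that the deny rule lives in the gate, not in the agent's inherited credentials, so a misconfigured engineer account cannot widen what the agent can do.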

    Fast forward to March 5, and Amazon's website and app went dark for about six hours. Over 22,000 users reported problems on Downdetector. Checkout failures, wrong pricing, broken payment confirmations. Amazon blamed a software code deployment but did not specify whether AI tools were involved. Five days later, SVP Dave Treadwell, a former Microsoft engineering executive, sent an email to staff saying availability had not been good. He converted a weekly optional meeting into a mandatory all-hands and the internal briefing specifically called out "novel GenAI usage for which best practices and safeguards are not yet fully established." The new policy: junior and mid-level engineers now need senior sign-off on any AI-assisted code changes.

    Now here is the part that makes the whole situation worse, and it is the part that the original article handles well but could push harder on. Amazon confirmed 16,000 additional layoffs in January 2026, bringing total cuts to 30,000 since October 2025. That is roughly ten percent of its corporate and tech workforce. CEO Andy Jassy has been explicit that he expects headcount to shrink because of AI-driven efficiency gains. Some employees report being asked to rely on AI tools to make up for lost colleagues. Think about that compounding risk loop for a second. Fewer engineers means more reliance on AI tools. More reliance on AI tools with fewer experienced reviewers means a higher incident probability. The December peer review safeguard did not prevent the March outage. Either the policy was not enforced uniformly, or code reviews alone are simply not sufficient when AI is generating code at a volume and speed that human review processes were never designed to handle.

    The industry-wide data here is genuinely alarming. According to Harness's State of Software Delivery 2025 survey, 92 percent of developers say AI tools increase the blast radius of bad code reaching production. 67 percent spend more time debugging AI-generated code. 59 percent experience deployment errors at least half the time when using AI tools. And 60 percent of organizations have no formal process for reviewing AI-generated code. A separate study found AI-generated code introduces 1.7 times more bugs than human-written code, with 1.5 to 2 times more security flaws and 8 times more performance issues. When Pixee tested five major AI coding platforms in December 2025, they found 69 vulnerabilities that traditional code scanners caught none of. That last point should keep every security team up at night. Your existing tooling has a blind spot the size of a barn door, and AI-generated code walks right through it.

    GitClear's analysis of 211 million lines of code from Google, Microsoft, and Meta tells the structural story. Refactored code, the kind that improves existing systems, has dropped from 25 percent of all changes in 2021 to under 10 percent in 2025. Copy-pasted and duplicated code has risen from 8.3 percent to over 18 percent. Developers are generating more code and improving less of it. If you have ever worked on a large codebase, you know exactly where that leads. Technical debt that compounds quarterly, systems that become progressively harder to understand and maintain, and eventually a codebase where nobody is confident about what anything actually does anymore.
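If you want a first-order sense of where your own codebase sits on that curve, a crude duplication check is easy to run. This is a deliberately simplified sketch, nothing like GitClear's actual methodology, which analyzes changes over time rather than a snapshot:

```python
# Crude approximation of a code-duplication metric: the fraction of
# non-blank lines that appear more than once. Real tools (GitClear,
# jscpd, etc.) match multi-line blocks, not single lines.

from collections import Counter

def duplication_ratio(lines: list[str]) -> float:
    """Return the fraction of non-blank lines that are exact duplicates."""
    counts = Counter(line.strip() for line in lines if line.strip())
    duplicated = sum(n for n in counts.values() if n > 1)
    total = sum(counts.values())
    return duplicated / total if total else 0.0
```

Tracking even a rough number like this quarter over quarter tells you whether generated code is piling up faster than anyone is consolidating it.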

    What I think the article gets right but does not push hard enough on is the fundamental tension between adoption metrics and quality metrics. Amazon was tracking how many engineers used Kiro each week. They were not tracking AI-specific incident rates, debug time ratios, or code churn. When adoption becomes the metric, caution becomes friction. And when you layer in the pressure from layoffs and the expectation that AI will make up for lost headcount, you have created an environment where engineers are incentivized to ship fast and penalized for raising concerns about quality. That is not an AI problem. That is a management problem amplified by AI.
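Separating the two kinds of metrics does not require heavy tooling. A sketch of what a quality-aware report might compute, with an illustrative schema that is mine, not Amazon's:

```python
# Hypothetical sketch: report quality metrics alongside adoption, so
# "80% of engineers used the tool" never stands alone. Field names are
# illustrative, not any real internal schema.

from dataclasses import dataclass

@dataclass
class Change:
    ai_assisted: bool       # was this change AI-generated or AI-assisted?
    caused_incident: bool   # did it trigger a production incident?

def quality_report(changes: list[Change]) -> dict:
    """Compare incident rates for AI-assisted vs. human-written changes."""
    ai = [c for c in changes if c.ai_assisted]
    human = [c for c in changes if not c.ai_assisted]

    def incident_rate(group: list[Change]) -> float:
        return sum(c.caused_incident for c in group) / len(group) if group else 0.0

    return {
        "adoption_rate": len(ai) / len(changes) if changes else 0.0,
        "ai_incident_rate": incident_rate(ai),
        "human_incident_rate": incident_rate(human),
    }
```

The moment the AI incident rate visibly diverges from the human baseline, "adoption is up" stops being an unambiguous win, and that is exactly the conversation the adoption-only dashboard suppresses.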

    For architects and engineering leaders, the five guardrails outlined in the article are a solid starting point. Tiered approval gates where production changes by AI agents require human sign-off, with escalation based on blast radius. Least-privilege by default where AI agents never inherit operator-level permissions. Mandatory sandboxing with full virtualization, not just containers, because containers do not prevent an AI agent from deciding to wipe everything and start over. AI-specific code scanning because your traditional scanners have a verified blind spot. And finally, separating adoption metrics from quality metrics so you actually know what is happening in production. But I would add a sixth: treat AI-generated code as untrusted input. The same way you would never take user input and pipe it straight into a database query, you should never take AI-generated code and pipe it straight into production. Every piece of AI output needs validation, sanitization, and human judgment before it touches anything customers use.
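The "untrusted input" rule translates directly into a merge gate. A minimal sketch, assuming a hypothetical label and reviewer roster rather than any specific CI system's API:

```python
# Hypothetical CI merge gate: AI-assisted changes cannot merge without
# at least one approval from a designated senior reviewer. The label
# name and reviewer sets are illustrative assumptions.

def merge_allowed(labels: set[str], approvals: set[str], senior_reviewers: set[str]) -> bool:
    """Allow merge only if AI-assisted changes carry a senior approval."""
    if "ai-assisted" not in labels:
        return True  # non-AI changes follow the normal review path
    return bool(approvals & senior_reviewers)
```

In practice you would enforce this server-side as a required status check, so it cannot be bypassed by the same schedule pressure that produced the problem in the first place.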

    The thing nobody is talking about, and what I think is missing from this analysis, is what happens when the next generation of developers grows up with AI coding tools as their default mode of working. If senior engineers are the human checkpoint in this new process, what happens when those senior engineers retire or leave, and the mid-level engineers who were never allowed to develop judgment about production code are suddenly the most experienced people in the room? We are potentially creating a generational knowledge gap where the skills needed to catch AI mistakes are not being developed because AI is doing the work that would have built those skills. That is a problem that no guardrail policy can solve, and it deserves much more attention than it is getting.


    External link: Amazon's AI coding tools broke production, and now engineers need permission to ship (aiadopters.club)
