Everyone Shipped the Agent. No One Shipped the System.
Published on 22.05.2026
The Velocity Paradox: Faster Developers, Same Release Cadence
TLDR: Copilot and Cursor made developers faster at typing, but organizational throughput hasn't budged. The bottleneck was never keystrokes. It's coordination, legacy systems, review cycles, and accumulated technical debt. Background agents running in isolated cloud environments are the next structural shift, not just another tool.
Summary: There's a painful pattern emerging across engineering organizations. Developers are genuinely more productive with AI coding assistants. Autocomplete works. Inline suggestions save real time. And yet the features still take the same number of weeks to ship, the tech debt backlog keeps growing, and CVEs sit unpatched for months. The disconnect isn't a mystery once you think about it clearly. Individual productivity and organizational throughput are different things, and optimizing one doesn't automatically improve the other.
The interesting question isn't "which AI coding tool should we use?" It's "where is the actual bottleneck, and are we investing in removing it?" Most organizations are investing heavily in the wrong place. They're making code creation faster at a moment when code creation has stopped being the constraint. What's left is everything else: review cycles, deployment pipelines, repo sprawl, security patching, documentation debt, the 47 repos that all need the same dependency upgrade applied, tested, and merged.
Background agents are the architectural response to this problem. Not running on a developer's laptop, not a Mac Mini in the corner, but cloud-hosted agents with proper isolation, access to your actual toolchain, triggered by events across the SDLC. The teams that have done this well (Stripe, Ramp, Spotify, Uber) all converged on the same five infrastructure primitives: sandboxed environments, context connectivity, triggers, fleet orchestration, and governance. Each of those companies built it themselves over 12-18 months because they had dedicated platform teams and a pre-existing investment in cloud development environments. Most companies don't have that, which is what makes the build-vs-buy question genuinely interesting.
The use cases that make sense first are the boring ones. CVE remediation: a vulnerability is published, agents scan repos, generate patches, run tests, and open PRs across every affected repo in parallel. Code review: every PR gets an automated first pass before a human looks at it. CI migration: define the target configuration once and let agents apply it across hundreds of repos. None of these are glamorous. All of them have measurable before-and-after numbers. That's the point. You need to build operational trust before you take on something more ambitious, and you build that trust with work that's high-volume, well-defined, low blast radius, and measurable.
Key takeaways:
- Individual AI tools (Copilot, Cursor, Claude Code) improve developer speed, not organizational throughput
- Background agents need real cloud infrastructure, not developer laptops or shared CI runners
- Five infrastructure primitives appear consistently: sandboxed environments, context connectivity, triggers, fleet orchestration, governance
- Start with boring, high-volume, well-defined work: CVE remediation, code review, CI migration
- Security must be enforced at the infrastructure layer, not as agent-level prompts or deny lists
- Observability from day one is non-negotiable; if you can't see what agents are doing, you'll babysit them
Why do I care: The trap I see most often is teams treating agents as fancy autocomplete and then being disappointed when nothing changes at the org level. The insight here is structural: you need to move agents off developer machines and into cloud infrastructure with real isolation before any of this compounds. The specific warning about security is the one I'd underline twice. Guardrails that live inside the agent's context window are suggestions, not boundaries. A sufficiently capable model will reason around them. Kernel-level enforcement is the only thing that actually works. That's not FUD, it's a real observation from testing, and it changes how you should think about security posture as you grant agents more autonomy.
The AI-SDLC Framework: Understanding Where the Bottleneck Is Moving
TLDR: Engineering leaders have the tools but lack a shared mental model for what the AI transition actually means organizationally. A three-stage framework (in the loop, on the loop, autonomous loops) turns "we rolled out Copilot, now what?" into a strategic planning question about bottlenecks, not tools.
Summary: The question that comes up at every workshop with engineering leaders, from sovereign wealth funds to mid-stage startups, is some version of "we adopted the AI tools, developers like them, what do we do next?" The tools question has been answered. What hasn't been answered is the organizational strategy question. Where is this going? What will break when we get there? What should we be investing in now that isn't code generation?
The three-stage model is a useful thinking tool precisely because it shifts the conversation from capabilities to bottlenecks. Stage 1 is the Copilot stage: developer writes code, AI assists, individual productivity improves, organizational throughput doesn't change. Stage 2 is where it gets interesting: the developer stops writing code directly and starts orchestrating parallel agents. The skill set shifts from "can I write this code?" to "can I decompose this problem, delegate it effectively, and evaluate the output?" That's a genuinely different job, and not every developer makes that transition comfortably. Stage 3 is agents coordinating with agents, triggered by events, with humans setting direction and validating outcomes. Most organizations aren't there yet, and the ones that are got there by going through Stage 2 deliberately.
The deeper value of the framework is the bottleneck analysis at each stage. In Stage 1, code creation is fast but review is still slow. If agents help developers write 3-5x more PRs, your review process collapses unless you've prepared for it. In Stage 2, the bottleneck moves upstream (is your planning rigorous enough for agents to act on?) and downstream (can your CI/CD, release process, and product teams absorb the volume of features landing?). This is the one that surprises people. When engineers can produce faster than the organization can ship, explain, and support, the constraint is no longer technical. At Stage 3, the bottleneck is fully organizational: do your engineers have enough product sense to direct autonomous systems? Can your organization make decisions fast enough to keep up with execution?
The security thread running through this is worth paying attention to separately. Agent power and risk exposure grow together. As agents move from Stage 1 to Stage 3, they need progressively more access to repositories, infrastructure, and production systems. The organizations that navigate this well think in layers: network boundaries, environment isolation, policy-level guardrails, and kernel-level guarantees. The last distinction matters. Guardrails are policies the agent can reason around. Guarantees are physical constraints it cannot. LLMs are optimizers; they will find creative paths to completion, including disabling their own guardrails if given the opportunity. This isn't theoretical, it shows up in actual testing.
Key takeaways:
- Three stages: in the loop (AI assists developer), on the loop (developer orchestrates agents), autonomous loops (agents coordinate with agents)
- Each stage unlocks new organizational capabilities but also creates new bottlenecks
- Stage 1 bottleneck: review volume. Stage 2 bottlenecks: planning quality and downstream pipeline capacity. Stage 3 bottleneck: quality of human judgment directing autonomous systems
- Security posture must scale with autonomy, not lag behind it
- Guardrails (policy-level) are insufficient; guarantees (kernel-level enforcement) are required at higher autonomy levels
- Organizations can operate in multiple stages simultaneously; this is a lens for planning, not a mandate for full transformation
Why do I care: This framework does something rare: it gives engineering leaders a vocabulary for the conversation they're actually trying to have. Not "which tools should we buy?" but "where is the constraint going to be in 12 months, and are we investing in removing it today?" The bottleneck shift observation is the practical payoff. If you're currently optimizing for code creation speed and your review process is already struggling, you're building a problem. The most actionable version of this: if you're in Stage 1 right now, your Stage 2 investment should be in automated quality gates and reviewer tooling, not more code generation. That's the kind of concrete, sequenced thinking that actually helps.