Long-Running Agents: The Engineering Reality Behind AI That Works for Days

Published on 30.04.2026

AI & AGENTS

Long-Running Agents: What Changes When AI Works for Days, Not Minutes

TLDR: The familiar single-session AI agent paradigm has a hard ceiling. Long-running agents that span multiple context windows, sandboxes, and sessions require solving three distinct engineering problems: finite context, lack of persistent state, and the model's inability to honestly grade its own work.
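To make the persistent-state problem concrete, here is a minimal sketch, assuming progress is kept in a JSON file that each new session reads before doing anything else. The file layout and the names `load_state`, `save_state`, and `run_session` are hypothetical illustrations, not taken from any of the posts summarized here.

```python
import json
from pathlib import Path
from typing import List, Optional


def load_state(path: Path) -> dict:
    """Return saved progress, or a fresh state if no earlier session wrote one."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": [], "notes": ""}


def save_state(path: Path, state: dict) -> None:
    """Persist progress so the next session, which starts with empty context, can resume."""
    path.write_text(json.dumps(state, indent=2))


def run_session(path: Path, todo: List[str]) -> Optional[str]:
    """One session: pick the first unfinished step, record it as done, and save."""
    state = load_state(path)
    remaining = [step for step in todo if step not in state["completed"]]
    if not remaining:
        return None  # nothing left to do
    step = remaining[0]
    # Assumption: a real agent would perform the work for `step` here.
    state["completed"].append(step)
    state["notes"] = f"last finished: {step}"
    save_state(path, state)
    return step
```

Each call to `run_session` stands in for a fresh context window: nothing carries over except what was written to disk.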

Long-running Agents


Anthropic's Harness Approach: Initializers, Coders, and the Brain/Hands Split

TLDR: Anthropic published two engineering posts that are worth reading end to end. The first describes a two-agent harness for autonomous full-stack development. The second introduces an architectural separation of brain, hands, and session that cuts time-to-first-token by 60 percent at the median and by over 90 percent at the 95th percentile.



Cursor's Planner/Worker/Judge Pattern for Autonomous Coding at Scale

TLDR: Cursor's approach to long-running autonomous coding introduces a three-role pipeline: a planner that defines the work, workers that execute in parallel, and a judge that evaluates completion. The separation of generation from evaluation is what makes the whole system honest.
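The three roles can be sketched as follows. This is a minimal illustration of the separation between generation and evaluation, not Cursor's implementation: the `plan`, `work`, and `judge` functions are hypothetical stubs standing in for model calls, and the thread pool is just one assumed way to run workers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    task_id: int
    description: str


@dataclass
class Result:
    task_id: int
    output: str


def plan(goal: str) -> List[Task]:
    """Planner (stub): split a ';'-separated goal into independent tasks."""
    parts = [p.strip() for p in goal.split(";") if p.strip()]
    return [Task(i, p) for i, p in enumerate(parts)]


def work(task: Task) -> Result:
    """Worker (stub): stands in for a model call that executes one task."""
    return Result(task.task_id, f"done: {task.description}")


def judge(task: Task, result: Result) -> bool:
    """Judge (stub): checks the result independently of the worker that produced it."""
    return bool(result.output) and task.description in result.output


def run_pipeline(
    goal: str,
    worker: Callable[[Task], Result] = work,
    max_workers: int = 4,
) -> Dict[int, bool]:
    """Plan the work, run the workers in parallel, then judge each result."""
    tasks = plan(goal)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, tasks))  # map preserves input order
    return {t.task_id: judge(t, r) for t, r in zip(tasks, results)}
```

Because the judge never shares code with the worker, a worker that returns an empty or off-topic output is marked as failed rather than grading itself as complete.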



Google's Agent Platform: The Brain/Hands Split at Infrastructure Scale

TLDR: Google's Vertex AI Agent Platform productizes the same patterns Anthropic and Cursor describe, adding a persistent Memory Bank layer, Agent Identity, Agent Registry, and a full observability stack. The new risk it introduces is memory drift, where agents learn procedural shortcuts from atypical interactions and apply them too broadly.



What to Actually Do This Week: A Practical Decision Tree

TLDR: The author distills the landscape into a three-branch decision tree. If you're extending your IDE workflow, start with Claude Code and commit progress often. If you're building a hosted agent product, pick a managed runtime. If you're doing autonomous operational work, you need Memory Bank-style persistence.
