Build, Configure, or Use As-Is: How Agentic Harnesses Are Becoming the New Commodity
Published on 09.06.2026
Build, Configure, or Use As-Is: How Agentic Harnesses Are Becoming the New Commodity
TLDR: Agentic harnesses are converging on a shared blueprint of five layers, with about 80% of the stack commoditized across tools like Claude Code, Codex, and OpenCode. The real question is no longer how to build an agent but rather which specific layer is worth owning yourself. The answer, according to this analysis, is your context layer.
The framing here is genuinely useful and I want to spend time on it before diving into the architecture. Maxime Labonne and the author were putting together a book and kept noticing the same pattern: just as LLMs became interchangeable commodities, the harnesses wrapping them are following the same trajectory. They're hardening into a handful of standard frameworks with recognizable structure. Once something is a commodity, the strategic question shifts. You stop asking "how do I build this" and start asking "for each piece, do I build, configure, or just use it?" Overbuild and you burn weeks reimplementing a tool loop that already exists for free. Under-build and you end up relying on defaults forever, never constructing the one layer that's distinctly yours.
The five-layer blueprint that emerges across Claude Code, OpenCode, Codex, and pi is worth understanding concretely. The agent itself is the innermost piece: a ReAct loop where the model reasons then acts, wrapped around an LLM and its tools. Strip away compaction, task budgets, and thinking modes and the core is roughly 150 lines of code. Around that sits the harness proper, which includes message queues with priority gating, sandboxes, hooks, LLM gateway services, memory, LSP servers, MCP clients, skills, a permission system, an agents catalog, and subagent management. Below that is the runtime layer for durable execution (tools like Prefect, Temporal, or Kitaru), which gives you non-blocking human-in-the-loop interactions, scheduling, and caching. The presentation layer handles how the agent surfaces to users, either through a pub/sub bus where a headless server streams events to clients over HTTP and SSE (the OpenCode approach) or through a custom services bridge into a single in-process loop (the Claude Code approach). Finally, observability rounds it out with tracing, logging, metrics, and evals via tools like Opik, Langfuse, or Braintrust.
The tools component deserves specific attention because it illustrates how far commoditization has gone. Claude Code ships around 40 built-in tools across 10 families covering file I/O, bash execution, orchestration, task management, web access, MCP, and scheduling. The agents catalog similarly comes predefined: build agents, plan agents, general-purpose subagents, explore agents that run on cheaper models with read-only search, and code reviewer subagents. Permission scopes are narrowing-only, meaning a child agent can never out-permission the parent that spawned it. Skills are markdown recipes with an instructions block and an allowed-tool set, loaded on demand from bundled sources, user-defined files, or MCP prompts. They're capped at roughly 1% of the context window via progressive disclosure. Writing a skill is described here as the cheapest way to teach the harness a new workflow, and that rings true to me. The skill mechanism is genuinely accessible.
The permission layer analysis is where I think the author earns some credibility through honesty. There is no AI in the permission system. Per tool call, it allows, asks the user, or denies. Plan mode is enforced prompt-side via a system reminder. The author calls this outright: it's a fragile mechanism that hopes the model follows the instruction. That's a refreshingly direct assessment of something that gets treated with more confidence than it deserves in most agentic system discussions. The sandbox reframing is also worth noting: the author suggests thinking about sandboxes not just as security boundaries but as distributed workers. Each sandbox is a worker running jobs in parallel, and a single harness can manage many of them. That mental model changes how you architect for scale.
The memory architecture is where the build-vs-configure-vs-use question gets its sharpest answer. The current approach splits memory into user-defined markdown files (AGENTS.md always loaded, per-directory AGENTS.md files loaded contextually) and LLM-extracted files with an index capped at around 200 lines and 25KB. A small-model side-query ranks topic files by frontmatter description. The defaults get you started, and AGENTS.md is worth configuring. But the author's conclusion is that the highest-leverage investment is a custom memory layer behind an MCP server, one that is harness-independent and fully yours. That's the real moat. The 80% shared blueprint is what you use as-is. The context layer is where differentiation actually lives.
Key takeaways:
- The agentic harness stack is converging on a five-layer blueprint shared across major tools, with the core agent loop being roughly 150 lines once you strip away extras.
- The permission system is prompt-enforced and explicitly fragile; harnesses trust the model to comply rather than enforcing constraints mechanically.
- Your context layer, specifically a custom memory system behind an MCP server that works across harnesses, is the one piece worth building rather than configuring or accepting as a default.
Why do I care: As a senior frontend developer or architect, the build-vs-configure-vs-use framework maps cleanly onto decisions you already make every day with component libraries, state management, and backend APIs. The observation that matters most here is about the presentation layer: the pub/sub bus pattern where a headless server streams events to clients over HTTP and SSE is directly relevant to frontend architecture. If you're integrating an agentic backend into a web application, understanding whether your harness uses that model or the in-process loop model affects how you design your client, how you handle streaming state, and how you think about resilience. The memory-as-MCP-server conclusion also has frontend implications: if the context layer is the moat and it's harness-independent by design, the frontend that exposes and edits that context layer becomes a meaningful surface to own.