AI Code Black Boxes, the Art of Metaphor, and Composable Frontend Architecture
Published on 20.04.2026
The Black Box Problem: Why AI-Generated Code Stops Being Maintainable
TLDR: AI coding tools produce code that works initially but becomes increasingly difficult to maintain as structural problems accumulate. The root cause is that AI generates code without real-time architectural feedback, resulting in monolithic outputs with hidden dependencies and missing contracts.
Summary: There is a pattern forming across engineering teams, and it is worth paying close attention to. The first month after adopting AI coding tools is almost always euphoric. Velocity doubles, features ship, stakeholders applaud. Then, by month three, a different number starts climbing: the time it takes to safely change anything. The code keeps getting technically better with each model improvement, but the teams generating the most code are increasingly the ones requesting the most rewrites. That contradiction should bother you.
The core problem is structural, not qualitative. When a developer opens a module generated in a single AI session, they often discover that the only thing which understood the relationships inside that code was the context window that produced it. Function signatures do not document their assumptions. Three services call each other in a specific order, but the reason for that ordering lives nowhere in the codebase. The author calls this the black box problem, and it is a fair name.
AI-generated code has predictable failure modes. It tends toward monoliths because AI has a bias toward the fast path. Ask for a checkout page, and you get cart rendering, payment processing, form validation, and API calls bundled into one file. That file works, but you cannot touch any part of it without touching all of it. Worse, AI wires things together based on what appeared in the same context window, producing circular dependencies that nobody declared and nobody knows about until something breaks.
The structural difference between maintainable and unmaintainable AI output is composability. The author compares two implementations of a notification system: one sprawling 600-line module where everything is coupled, and one decomposed into independent subdirectories each with declared interfaces. Both produce identical runtime behavior. The difference is entirely in structure, and that structural difference determines whether the system is still workable six months later. This is not a new idea. Component-based architecture, microfrontends, plugin systems all express some version of this. What is new is the scale at which AI generates code faster than anyone can manually structure it.
The practical advice here is concrete. Before prompting an AI, define what the component is responsible for, what it depends on, and what its public interface is. Treat every generation as a boundary decision. For code already generated, audit for implicit coupling, mixed responsibilities, and modules that cannot be tested in isolation. The author also argues that generation environments need to provide structural feedback in real time, the same way type errors and linting signals constrain human developers. Without that feedback loop, even the best models produce architecturally incoherent output.
I want to push back on one thing here. The author's company, Bit Cloud, is positioned as the solution to exactly this problem, and Towards Data Science's parent company is an investor in Bit Cloud. That disclosure is at the bottom of the article, which is not ideal. The diagnosis is accurate and well-argued. The prescription should be read with that context in mind. The broader point stands on its own merits regardless of who sells the tooling.
Key takeaways:
- AI-generated code is structurally risky by default, not because the code is bad but because it lacks architectural intent
- Monolithic outputs, implicit circular dependencies, missing contracts, and implementation-only documentation are the four main failure modes
- Composability must be a constraint during generation, not something added in review afterward
- Prompt with architectural intent: define boundaries, dependencies, and public interfaces before generating
- Evaluate AI tools by what happens after generation, not just how fast they produce code
Why do I care: This is the most honest diagnosis of the AI coding productivity trap I have read. The velocity numbers are real. The debt is also real. The teams I see struggling most with AI tooling are the ones who measured success by lines generated rather than by whether they could change those lines three months later. Structure is not bureaucracy. It is what makes change possible without archaeology. If you are shipping AI-generated code into production today without auditing for implicit coupling, you are borrowing against future sprints in ways that will not be visible until something genuinely urgent needs changing.
The Black Box Problem: Why AI-Generated Code Stops Being Maintainable
The Mighty Metaphor
TLDR: Metaphors are not decorative language for architects, they are the primary mechanism for translating technical constraints into business reasoning. Choosing the right metaphor determines whether your audience grasps trade-offs or walks away confused.
Summary: Gregor Hohpe makes a case that metaphors are essential tools for architects, specifically for the moment when you need to communicate something real and technical to people whose job is not to understand the technical details. The goal is not to simplify until things become untrue. The goal is to find a frame that lets the audience reason about trade-offs without needing the vocabulary of distributed systems.
Good metaphors do something specific. They transfer structure from a familiar domain to an unfamiliar one. When that structural mapping is accurate, the audience can make correct inferences on their own without you having to spell out every consequence. When the mapping is wrong or loose, the audience draws incorrect conclusions and you have made things worse. Hohpe's point is that the architect's job is to find metaphors where the structure genuinely maps, not just ones that sound clever in a slide deck.
The article also addresses something that often goes unspoken: metaphors enable two-way conversations. A well-chosen metaphor transforms a presentation into a collaborative thinking session. When a stakeholder understands the frame, they can push back, ask questions, and surface constraints you had not considered. A one-way explanation produces nodding heads and no genuine alignment. The metaphor is not there to make you look good, it is there to create shared understanding that survives the meeting.
There is a warning embedded here worth taking seriously. Overextended metaphors become misleading. Every analogy breaks down somewhere, and the architect needs to know exactly where their metaphor stops mapping to reality. The path to the mountain of gold leads through the swamp is the kind of metaphor that communicates something true about risk and reward in architecture without requiring any technical vocabulary. But you have to own where the analogy stops.
Key takeaways:
- A good metaphor transfers structural understanding, not just surface-level familiarity
- Metaphors that enable audience inference are more valuable than ones that just sound good
- Know where your metaphor breaks down before you use it in a high-stakes conversation
- The goal is collaborative reasoning, not one-way explanation
- Precision matters more than elegance when the trade-offs are real
Why do I care: Most technical communication fails not because the content is wrong but because the frame is missing. I have sat through countless architecture presentations where every slide was technically accurate and nobody in the room changed how they thought about the problem. The metaphor work is not soft skills decoration, it is the actual mechanism by which decisions get made at the level where budget and timeline live. If you are an architect who thinks communication is someone else's job, you are leaving your own decisions vulnerable to misinterpretation by people who write the checks.
"Clean Code": a Timeless Truth or a Myth We Keep Telling Ourselves?
TLDR: Kevin Henney and Daniel Terhorst-North revisit whether "clean code" is a durable principle or a set of dogmatic rules that have calcified past their usefulness. The answer depends heavily on context, and the conversation traces the term's evolution from 1970s programming texts to Robert C. Martin's 2008 book.
Summary: This is a 33-minute video discussion, so the format rewards patience, but the core argument is worth engaging with directly. Henney and Terhorst-North are not dismissing the idea of clean code. They are questioning whether the specific rules that travel under that label, the ones from the book, the ones repeated in code reviews and job postings, are actually principled guidelines or accumulated dogma that has been separated from the contexts that made them sensible.
The historical framing is useful. The concern for readable, maintainable code goes back to the 1970s, well before Uncle Bob's formulation became canonical. The 2008 book codified a particular set of practices in a way that made them transferable, which is also what made them brittle. Rules travel better than reasoning. When someone applies the single responsibility principle or insists on method length limits without understanding why those rules exist or what problems they solve, they are following dogma, not engineering.
What Henney and Terhorst-North argue is that cleanliness is fundamentally contextual. Code that is clean for a small team moving fast is not the same as code that is clean for a large organization where the cost of mistakes is high. Code that is clean during exploration is not the same as code that is clean in a production critical path. The rules were never meant to be universal, but they got taught as if they were, and now we have developers apologizing in code reviews for functions that are technically longer than some arbitrary limit.
The more interesting question the discussion opens is what we actually mean when we call code clean. Is it readable? Testable? Changeable? Correct? These are related but not identical properties, and optimizing hard for one often creates tension with another. A codebase that is maximally readable might not be maximally changeable. A codebase optimized for testability might introduce abstractions that obscure intent. The honest answer is that clean is always relative to a purpose, and that purpose changes.
Key takeaways:
- "Clean code" as a term has roots predating the 2008 book, and the book codified context-specific practices as universal rules
- Rules without the reasoning behind them become dogma
- Cleanliness is relative to team size, risk profile, and the lifecycle stage of the code
- Optimizing for one quality (readability, testability, changeability) often creates tension with others
- Context shapes what quality means, and pretending otherwise produces cargo cult engineering
Why do I care: The clean code rules are taught early and questioned late, if ever. I have seen teams spend real time in review debates about function length while genuine architectural problems accumulate unchallenged. The dogma is sticky because it is easy to apply mechanically. But the actual goal, code that a team can reason about, change safely, and trust in production, requires judgment that no ruleset can fully encode. Henney and Terhorst-North are asking the right question. The answer is not that clean code is a myth, it is that you have to do the harder work of understanding what clean means for your specific situation.
"Clean Code" a Timeless Truth OR a Myth We Keep Telling Ourselves?
Orchestrating Scalable Frontends: The Power of the Composition Root
TLDR: Frontend applications built with isolated feature modules face a real coordination problem: how do features communicate without coupling to each other? Giuseppe Ciullo argues the answer is treating the Pages layer as a Composition Root, borrowed from Clean Architecture, where features are instantiated and their interactions orchestrated in one explicit place.
Summary: The coordination problem in modular frontend architecture is one of those problems that looks solved until the application grows past a certain size. You build isolated feature modules, which is good. Then those features need to talk to each other, and the naive solution is to have them import from each other directly, which quietly destroys the isolation you built.
The Composition Root concept from Clean Architecture offers a cleaner answer. The idea is to designate a specific layer, in this case the Pages layer, as the place where features are instantiated and their dependencies are wired together. Features do not know about each other. They only know about the interfaces they consume. The Pages layer knows about everything and is responsible for connecting the pieces. This is an explicit inversion: instead of features reaching outward for what they need, the composition layer pushes dependencies in.
This pattern has a real benefit for team scale. When multiple teams own different feature modules, a clear Composition Root means each team can reason about their module in isolation. The page is the meeting point, not an implicit contract negotiated through shared imports. You can change how features are wired without touching the features themselves. You can test each feature module independently without needing the rest of the application running.
The practical challenge the article acknowledges is that this requires discipline about what counts as the Pages layer and what counts as a feature. The moment a feature starts knowing about the page it lives in, or another feature it shares a page with, the pattern breaks down. It also requires thoughtful interface design at the feature boundary, which is not free effort.
Key takeaways:
- Features should communicate through the Pages layer acting as a Composition Root, not through direct imports from each other
- Composition Root is a Clean Architecture concept that separates instantiation from feature logic
- This pattern allows teams to own feature modules independently without coordination overhead
- The Pages layer becomes the single place where feature interactions are wired and visible
- Requires disciplined interface design at feature boundaries to work correctly
Why do I care: This is the kind of pattern that solves a problem teams usually discover by accident rather than by design. The coupling-through-feature-imports anti-pattern is extremely common in React applications, especially when features start sharing state or triggering each other's side effects. Moving that coordination explicitly into the page layer makes the application's actual dependency graph visible rather than inferred. The connection to the Black Box Problem article earlier in this issue is worth noting: both are fundamentally about making structure explicit rather than letting it accumulate implicitly.
Orchestrating Scalable Frontends: The Power of the Composition Root
I Had No Weekend Plans, So I Let Earth Tell Its Story
TLDR: A developer used a free weekend to build an interactive Earth Day storytelling webpage where Earth narrates its own 4.5-billion-year history from formation to the present. The project covers biodiversity, environmental statistics, and ends with a pledge section.
Summary: Sometimes the best projects start with no plan at all. A developer with an open weekend and an Earth Day prompt built something that is part creative writing exercise, part web project, and part environmental reflection. The webpage is structured as a first-person narrative from Earth's perspective, covering its geological origins, the emergence of life, the rise of biodiversity, and the current human moment.
The technical choices lean toward a modern frontend stack with interactive sections covering distinct chapters of Earth's history. What makes this worth noting is less the technology and more the framing. Using the narrative voice of Earth as the content structure forces a different kind of engagement than a conventional infographic or article. The pledge section at the end asks visitors to commit to something specific, which is a choice about what the project is actually for.
Projects like this sit at the intersection of web craft and purpose, and there is something genuinely refreshing about a developer spending discretionary time on something that is not a productivity tool or a SaaS side project. The Earth Day hook is obvious, but the execution as a storytelling experience rather than a data dashboard shows real creative judgment.
Key takeaways:
- Interactive storytelling as a web format can carry environmental content more effectively than statistics alone
- First-person narrative voice from a non-human perspective is an underused creative technique in web projects
- Weekend projects with a clear thematic constraint often produce more focused work than open-ended builds
Why do I care: Web development sometimes forgets that browsers are capable of delivering experiences that feel genuinely different from a brochure or a dashboard. This project is a small reminder that the craft of building for the web includes choosing what story to tell and how to tell it, not just which framework to reach for. It is also a refreshing counterexample to the assumption that developers only build what they can monetize.