AI Agents Need Better Memory, Harper 5.0 Goes Open Source, and Karpathy's AutoResearch Ran Overnight
Published on 13.04.2026
Why Your AI Agent Keeps Forgetting (Even With 1M Tokens)
TLDR: Bigger context windows don't solve the memory problem. Sreekanth Ramakrishnan argues that treating the context window as a memory system is a fundamental design mistake, and proposes layered memory architectures for reliable agent behavior.
The author's core insight landed with me: the problem wasn't that the model ran out of tokens. The problem was assuming a context window works like RAM. It does not. When you put an LLM into an action-observation loop, the pressure on that context grows from two directions simultaneously. Every tool call result, every observation, every intermediate thought competes for space. The window fills up, old information falls off, and the agent forgets what it was doing three steps ago.
The article examines practical reasons why this happens. The context window has no notion of importance. It cannot decide that the original user goal matters more than the output of step four of a twelve-step process. It just has positions. When new tokens arrive, old ones get pushed out regardless of their relevance to the task at hand.
The author references OpenClaw's approach with two different memory layouts, suggesting that structured memory systems, not bigger windows, are the actual solution. This is the kind of architectural insight that comes from building real agents rather than just chatting with models in a browser tab.
I keep thinking about how this mirrors a pattern we have seen before. Database connection pools did not get better by making the connection object bigger. They got better by introducing tiers, eviction policies, and cache strategies. Agent memory needs the same treatment.
The article could have gone deeper on what those memory layers actually look like in production. What is the eviction policy? Who decides what gets compressed and what stays hot? The OpenClaw reference is tantalizing but feels more like a teaser than a blueprint. I would have liked to see actual code or at least a concrete architecture diagram.
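In the absence of that blueprint, the tiered idea is easy to sketch. The following is a toy illustration, not OpenClaw's design: a small "hot" tier that actually reaches the prompt, an archive tier for demoted items, a pinned flag for the user's goal, and an importance score as the eviction policy. All names here (`MemoryItem`, `LayeredMemory`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    importance: float   # higher = keep longer
    pinned: bool = False  # e.g. the original user goal never gets evicted

@dataclass
class LayeredMemory:
    """Toy two-tier agent memory: a small hot window plus a larger archive.

    When the hot tier overflows, the least important unpinned item is
    demoted to the archive instead of silently falling off the end.
    """
    hot_capacity: int = 4
    hot: list = field(default_factory=list)
    archive: list = field(default_factory=list)

    def add(self, item: MemoryItem) -> None:
        self.hot.append(item)
        while len(self.hot) > self.hot_capacity:
            evictable = [i for i in self.hot if not i.pinned]
            victim = min(evictable, key=lambda i: i.importance)
            self.hot.remove(victim)
            self.archive.append(victim)

    def context(self) -> list[str]:
        # What actually goes into the model's prompt.
        return [i.text for i in self.hot]

mem = LayeredMemory(hot_capacity=3)
mem.add(MemoryItem("Goal: refactor the billing module", 1.0, pinned=True))
mem.add(MemoryItem("Step 1 output: found 12 call sites", 0.4))
mem.add(MemoryItem("Step 2 output: tests pass", 0.3))
mem.add(MemoryItem("Step 3 output: migration plan drafted", 0.5))
print(mem.context())
```

The point of the sketch is the contrast with a raw context window: here the goal survives by policy, not by luck of position, and low-value observations are the first to go.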
Harper Launches 5.0: Fully Open-Source Runtime for Building and Deploying Cost-Efficient Agents
TLDR: Harper 5.0 ships as a fully open-source unified runtime that combines database, cache, API, and real-time pub/sub into a single process built on Node.js and RocksDB.
Harper takes the approach that agent infrastructure should not require a diagram with twelve boxes and eleven arrows. Instead of stitching together a database, a cache layer, a message broker, a vector store, and a blob store, Harper packs all of these into a single process, serving from memory and backed by RocksDB for persistence. It is purpose-built for agentic engineering where AI agents need fast, reliable infrastructure without navigating a maze of microservices.
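To make the consolidation concrete, here is a minimal stdlib sketch of the pattern, not Harper's actual API: one object that owns a durable store (SQLite standing in for RocksDB), a hot cache, and key-level pub/sub, so reads, writes, and notifications all happen in a single process.

```python
import sqlite3
import queue
from collections import defaultdict

class MiniRuntime:
    """Illustrative single-process runtime: durable store + cache + pub/sub.

    This is NOT Harper's API -- just a sketch of collapsing a database,
    a cache, and a message broker into one process.
    """
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)  # durable store (stand-in for RocksDB)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        self.cache = {}                   # hot in-memory layer
        self.subs = defaultdict(list)     # key -> subscriber queues

    def put(self, key, value):
        self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        self.db.commit()
        self.cache[key] = value
        for q in self.subs[key]:          # notify subscribers on write
            q.put((key, value))

    def get(self, key):
        if key in self.cache:             # cache hit: no disk touch
            return self.cache[key]
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row:
            self.cache[key] = row[0]
        return row[0] if row else None

    def subscribe(self, key):
        q = queue.Queue()
        self.subs[key].append(q)
        return q

rt = MiniRuntime()
inbox = rt.subscribe("agent:status")
rt.put("agent:status", "running")
print(rt.get("agent:status"), inbox.get_nowait())
```

Even at toy scale, the appeal is visible: a write is simultaneously persisted, cached, and published, with no network hop between the three concerns.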
The claims are ambitious: 100,000 requests per second per node, enterprise deployments at Verizon, Lufthansa, and Ubisoft. Going fully open-source with version 5.0 is a significant shift that should lower the barrier for developers who want to evaluate the runtime without procurement conversations.
The RocksDB storage engine choice is interesting. It gives you persistence without the operational overhead of a separate database server. For agents that need to store conversation state, tool results, and intermediate artifacts, having that baked in rather than bolted on is a genuine simplification.
What I am not seeing discussed is how this scales horizontally. One process with RocksDB works well until it does not. What happens when you need to shard? How does replication work? The enterprise names are reassuring but they also suggest this has been battle-tested at a scale that most developers will not hit. The open-source release should answer some of these questions as more people start poking at it.
AI Didn't Take Your Job, It Took the Part That Made It Yours
TLDR: The real crisis with AI at work is not job loss but job hollowing. Elhadj_C argues that AI strips away the creative and meaningful parts of work while leaving the shell behind.
This piece uses Studs Terkel's 1974 book Working as its lens, and that is a good choice. Terkel understood something we keep forgetting: work is not just about earning a living. It is about finding daily meaning in what you do. The argument here is that AI does not just automate tasks. It automates the parts of work that give people a sense of ownership and craft.
The author frames this as part five of a six-part series using science fiction to understand AI, work, and power. That framing works because the best science fiction has always been about the present, not the future. It gives us language for patterns we are living through but cannot yet name.
What gets me is the specificity of the claim. It is not the usual vague worry about AI replacing humans. It is the observation that the work that remains after AI automation is often the work nobody wanted to do in the first place. The creative decisions, the judgment calls, the moments where craft meets execution, those get absorbed into the model. What is left is supervision, compliance, and maintenance.
The piece does not fully explore what happens when enough jobs get hollowed out simultaneously. If every profession loses its most meaningful layer, what replaces it? The author gestures toward meaning but stops short of offering a concrete answer. I suppose that is what part six is for.
I Let Karpathy's AutoResearch Agent Run Overnight!
TLDR: Raviteja Nekkalapu runs Andrej Karpathy's autoresearch agent autonomously and documents what happens when an AI agent optimizes a neural network while you sleep.
There is something almost magical about starting a process, going to bed, and waking up to results. This article captures that experience with Karpathy's autoresearch repository, which lets an AI agent autonomously design, run, and optimize neural network experiments without human intervention.
The hands-on review format works well here. Rather than just describing what autoresearch does conceptually, the author actually ran it and shared the results. That is the kind of writing I want to read more of. Less speculation, more empirical evidence from someone who pressed the button and waited.
The interesting question this raises is about the role of the researcher when the research runs itself. If an agent can design experiments, execute them, analyze results, and iterate on hypotheses overnight, what does the human researcher actually do? The answer seems to be framing the question, interpreting the broader significance, and deciding which direction to point the agent next.
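The propose-execute-analyze-iterate loop the article describes can be sketched in a few lines. This is a hedged toy, not Karpathy's autoresearch code: `run_experiment` is a fake loss surface standing in for an actual training run, and `propose` mutates the current best configuration, which is the "hypothesis" step.

```python
import random

def run_experiment(config):
    """Stand-in for training a model and returning a score (hypothetical)."""
    lr, width = config["lr"], config["width"]
    # A synthetic objective peaked at lr=0.01, width=64, so the sketch
    # is self-contained; a real agent would launch a training job here.
    return -((lr - 0.01) ** 2) * 1e4 - ((width - 64) ** 2) / 1e3

def propose(best_config, rng):
    """Mutate the current best config -- the hypothesis-generation step."""
    return {
        "lr": max(1e-4, best_config["lr"] * rng.choice([0.5, 1.0, 2.0])),
        "width": max(8, int(best_config["width"] * rng.choice([0.5, 1.0, 2.0]))),
    }

def overnight_loop(iterations=50, seed=0):
    rng = random.Random(seed)
    best = {"lr": 0.1, "width": 16}
    best_score = run_experiment(best)
    log = []
    for step in range(iterations):
        candidate = propose(best, rng)
        score = run_experiment(candidate)
        log.append((step, candidate, score))
        if score > best_score:        # keep improvements, discard the rest
            best, best_score = candidate, score
    return best, best_score, log

best, score, log = overnight_loop()
print(best, round(score, 3))
```

The human's remaining job sits outside this loop: choosing the objective, reading the log in the morning, and deciding what the next `propose` should even search over.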
The article could have gone deeper on the quality of the results. Did the agent find anything a human researcher would have missed? Were there surprising optimizations? The overnight experiment setup is compelling but the actual findings deserve more space than the format allowed.
The ER Bill You Might Never Have to Pay
TLDR: David Deal examines how consumer-facing AI health tools are helping people avoid unnecessary emergency room visits, potentially saving thousands of dollars in medical bills.
The piece looks at four consumer-facing products that are bringing AI health guidance to everyday users. The framing is practical: people are already using AI for health questions whether the medical establishment approves or not. These tools are catching up to that reality.
The financial angle is the hook. A single emergency room visit in the United States can cost thousands of dollars. If AI-powered triage can help someone decide whether their symptoms warrant a visit or can be managed at home, the savings are immediate and substantial.
The author's approach of examining distinct personalities across different products is useful. Not all AI health tools are the same, and they serve different needs. Some focus on symptom checking, others on ongoing health monitoring, and still others on connecting users with human professionals when the AI reaches the edge of its confidence.
What the article avoids, perhaps wisely, is any claim that these tools replace medical professionals. The framing is about avoiding unnecessary visits, not replacing necessary care. That is an important distinction that keeps the piece grounded.
Market Timing and Relevance: The Factor You Can't Fully Control
TLDR: Proof of Usefulness argues that timing shapes how hard a problem is to solve but does not determine whether the solution has value.
This is part of the Proof of Usefulness series on HackerNoon, which scores projects based on real-world utility rather than pitch deck promises. The piece examines market timing as one of the uncontrollable factors in startup success.
The core argument is clean: you cannot control when the market is ready for your solution. What you can control is whether the solution is actually useful. Timing might make the problem harder or easier, but it does not change the fundamental value of solving it well.
This feels like advice that is easy to agree with and hard to act on. Yes, build useful things. Yes, you cannot control timing. But the tension between these two truths is where most founders live. Build something useful too early and you run out of runway; build it too late and the market is already saturated.
The article stops short of offering a framework for navigating that tension. It identifies the problem clearly enough but does not give you a way to think about timing risk in your own decisions. A more complete treatment would examine how to assess market readiness without falling into the trap of timing paralysis.
Adversarial Machine Learning and Its Role in Fooling AI
TLDR: An exploration of how adversarial techniques can deceive machine learning models and what this means for AI security in production systems.
This piece examines adversarial machine learning, the practice of crafting inputs designed to fool AI models into making wrong predictions. It is the AI equivalent of finding a SQL injection vulnerability, except instead of a database, you are probing a neural network for blind spots.
The implications for production AI systems are substantial. If a model can be fooled by carefully crafted inputs, any system that depends on that model for decision-making inherits the vulnerability. This applies to image classification, natural language processing, and increasingly to agent-based systems where models make decisions about what actions to take.
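The classic demonstration of this is the Fast Gradient Sign Method (FGSM): perturb an input by a small step in the direction that increases the model's loss. Here is a minimal NumPy sketch against a logistic-regression stand-in for a model (the article does not specify a particular attack or model; this is one well-known instance of the technique).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM against logistic regression: x_adv = x + eps * sign(dL/dx).

    For the cross-entropy loss of p = sigmoid(w.x + b) against label y,
    the gradient with respect to the input is dL/dx = (p - y) * w.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=20)
b = 0.0
x = w / np.linalg.norm(w)   # a point the model classifies confidently as 1
y = 1.0

p_clean = sigmoid(w @ x + b)
x_adv = fgsm(x, y, w, b, eps=0.3)
p_adv = sigmoid(w @ x_adv + b)
print(f"clean p(y=1)={p_clean:.3f}  adversarial p(y=1)={p_adv:.3f}")
```

The unsettling part is that each coordinate moves by at most `eps`, a change that can be imperceptible in an image, yet the model's confidence drops sharply because every dimension is nudged in the adversarially worst direction at once.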
The article connects adversarial ML to the broader AI security landscape. As we deploy models in more critical roles, understanding their failure modes becomes a security requirement, not just an academic curiosity. The techniques used to attack models also inform how to defend them.