Your File System Isn't Agent Memory: Inside Neo4j's Knowledge Graph Architecture

Your File System Isn't Agent Memory

TLDR: Most AI agents rely on flat files or vector indexes for memory, and both approaches fall apart at scale. Neo4j's open-source agent-memory library offers a three-tier knowledge graph architecture that gives agents durable, relationship-aware memory. The patterns are portable enough to implement on Postgres or MongoDB if you don't want a full graph database in production.

Summary:

There's a question that keeps coming up in every serious AI engineering conversation I have: how does your agent actually remember things? Not just retrieve them from a vector store with fuzzy semantic matching, but truly remember, in the way that you'd expect a thoughtful collaborator to remember that Andrej Karpathy is the same person you mentioned three sessions ago, not some new ambiguous entity to be embedded and filed away. Most teams never get there, and the reason is that they're reaching for the wrong tools.

The article from Decoding AI walks through neo4j-labs/agent-memory, a library that the author spent two days studying after building their own second-brain setup with Obsidian, Readwise, NotebookLM, and Claude Code. The problem they hit is one I recognize immediately: once your knowledge base grows past roughly 50 documents, a file-based system starts rotting. You end up with the same entity mentioned in a dozen places with no canonical version, no relationship map, and no way to surface how frequently something appears. A vector index gives you fuzzy recall but no identity, no merge logic, and no sense of whether today's "Anthropic" is the same node as yesterday's.

The neo4j-labs architecture solves this with one graph and three memory tiers. Short-term memory holds the linear message sequence as ordered Message nodes. Long-term memory holds deduplicated typed entities with vector embeddings. Reasoning memory, which is the genuinely novel piece here, stores a tree of each agent run, capturing thoughts and tool calls so the agent can recognize successful patterns and avoid repeating mistakes. The whole thing is stitched together with three typed relationships: MENTIONS joins conversations to entities, INITIATED_BY links reasoning traces back to conversations, and TOUCHED connects reasoning steps to the long-term entity nodes. Cross-tier queries become one-hop lookups in Cypher rather than multi-store join nightmares.

What makes the ontology work is its deliberate constraint. Every entity is exactly one of five types borrowed from intelligence-analysis practice: Person, Object, Location, Event, or Organization. This is the POLE+O model, and having a closed top-level vocabulary means queries are efficient and predictable. Subtypes are open, so you can extend it for your domain, but you never lose the ability to search by type without scanning the entire graph. Beyond entities, the architecture adds Fact nodes for claims that don't fit neatly into any entity type, and Preference nodes with a SUPERSEDED_BY relationship so the agent can learn and update what it knows about you over time.

Entity extraction runs as a cost-ordered ladder: spaCy for fast statistical NER, GLiNER and GLiREL for zero-shot extraction, and an LLM stage only when ambiguity or relationship extraction is genuinely needed. The deduplication logic is where I think the real engineering care shows. Resolution and deduplication are treated as separate problems, which is the right call. A score above 0.95 triggers an auto-merge. Below 0.85 creates a new node. The middle band, 0.85 to 0.95, creates a SAME_AS edge with a pending status for a human or downstream agent to resolve. The reason is simple and correct: a false merge is silent and unrecoverable, while a false split is noisy but fixable.

Key takeaways:

File systems and vector indexes both fail as agent memory at scale, for different reasons: append-only logs fragment context, while vector indexes lack identity and merge logic
The neo4j-labs/agent-memory library uses one Neo4j graph with three tiers: short-term (conversations), long-term (typed entities), and reasoning traces
Three typed edges, MENTIONS, INITIATED_BY, and TOUCHED, make every cross-tier question a one-hop Cypher query
The POLE+O ontology (Person, Object, Location, Event, Organization) provides a closed five-type vocabulary for efficient, predictable entity querying
Entity extraction uses a speed-versus-accuracy ladder: spaCy first, then GLiNER/GLiREL, then LLM only for ambiguous cases
Deduplication uses a scored approach with a pending SAME_AS edge in the 0.85-0.95 range, because false merges are unrecoverable and false splits are not
The architecture is portable: the patterns work on Postgres or MongoDB if a full graph database in production isn't practical

Why do I care:

I've been thinking about agent memory as a solved problem for too long, and this article is a useful correction. The insight that reasoning memory, stored at the database level rather than baked into weights, behaves like a lightweight form of reinforcement learning is something I want to sit with. The SAME_AS pending edge pattern for uncertain deduplication is exactly the kind of practical, recoverable-by-default design I wish I saw more often in AI systems. The whole architecture is a reminder that the boring infrastructure decisions, ontology design, merge strategies, relationship types, are what separate agents that actually accumulate intelligence from agents that just seem smart until they forget everything three sessions later.

Your File System Isn't Agent Memory