Keeping an AI Agent's Knowledge Graph Clean: Why Naming and Identity Are Two Different Problems

How to Keep Your AI Agent's Knowledge Graph Clean

TLDR: Most people building memory layers on knowledge graphs treat entity resolution and deduplication as one fuzzy step, and that is exactly what corrupts the graph. The fix is to separate naming ("what do we call this?") from identity ("is this the same real-world thing?"). Get the split wrong and two different entities silently merge, trust evaporates, and the memory layer you paid to build goes unused.

Summary: The author spent four months building unified memory layers on top of knowledge graphs, and the question that kept coming back from readers was how to handle entity resolution and deduplication without wrecking the graph. So rather than guess, the author went and studied how mem0, cognee, and Neo4j actually do it. The conclusion is blunt. Almost everyone collapses two separate decisions into one, and that single confusion is what causes graphs to quietly rot. A name that looks similar is not proof that two things are the same. When the system acts as if it is, two genuinely different real-world entities get fused, and the failure stays invisible until it becomes expensive to undo.

The pipeline the author describes goes from raw text to a clean graph node in a few stages. An LLM extractor reads a document or a conversation turn and emits entities and relationships as triplets, anchored to an ontology so it only pulls out the entity types you actually care about. A sentence about someone working at a company becomes a person-works-at-organization triplet because the ontology told the extractor those are the types that matter. If running everything through an LLM gets too expensive, the suggested alternative is a cost-tiered cascade: start with fast statistical models like spaCy for common entities, move to zero-shot models like GLiNER for domain-specific cases, and fall back to an LLM only for the hard ones. That is a sensible nod to the fact that extraction at scale is a budget problem as much as an accuracy one.
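To make the cascade concrete, here is a minimal sketch of the control flow only. It is not the article's code. The three tier functions are hypothetical stand-ins for spaCy, GLiNER, and an LLM call, and the escalation rule (move to the next tier when the current one returns nothing) is an assumption made for illustration.

```python
from typing import Callable, Optional

# A tier takes text and returns (entity_text, entity_type) pairs,
# or None when it has nothing confident to offer.
Entities = list[tuple[str, str]]
Extractor = Callable[[str], Optional[Entities]]


def cascade_extract(text: str, tiers: list[tuple[str, Extractor]]) -> tuple[str, Entities]:
    """Try tiers cheapest-first and return the first non-empty result."""
    for tier_name, extract in tiers:
        result = extract(text)
        if result:  # None or [] means "escalate to the next tier"
            return tier_name, result
    return "none", []


# Hypothetical stand-ins. A real pipeline would call the actual models here.
KNOWN = {"NVIDIA": "Organization", "Jensen Huang": "Person"}


def statistical_tier(text: str) -> Optional[Entities]:
    """Stands in for a fast statistical NER model: only common entities."""
    found = [(name, etype) for name, etype in KNOWN.items() if name in text]
    return found or None


def zero_shot_tier(text: str) -> Optional[Entities]:
    """Stands in for a zero-shot model: here it never finds anything."""
    return None


def llm_tier(text: str) -> Optional[Entities]:
    """Stands in for the expensive LLM fallback: always answers."""
    return [(text.strip(), "Unknown")]


TIERS = [
    ("statistical", statistical_tier),
    ("zero_shot", zero_shot_tier),
    ("llm", llm_tier),
]
```

Calling cascade_extract("Jensen Huang runs NVIDIA", TIERS) stops at the first tier, while a text with no known entity falls through to the LLM stand-in. A production version would likely escalate per entity or per confidence score rather than per document.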

Then comes the part the author cares most about. Resolution decides what to call a node. It normalizes the name against existing nodes of the same type using exact, then fuzzy, then semantic matching, in a short-circuit chain that only escalates when no confident match is found. "NYC" resolves to "New York City". "JP Morgan" resolves to "JPMorgan Chase". Critically, it only ever compares against names of the same type, so a person never gets matched against an organization. And at this stage nothing merges. The system just updates a canonical name property and optionally tracks aliases to speed up future lookups. Naming is the noisy, forgiving step that absorbs typos, casing, and whitespace.
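The short-circuit chain can be sketched roughly as follows. This is an illustration under assumptions, not the article's implementation: the function and parameter names are hypothetical, fuzzy matching uses the standard library's difflib rather than whatever the author used, the 0.85 fuzzy cutoff is a placeholder, and the semantic step is a pluggable callable because it depends on an embedding model.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def normalize(name: str) -> str:
    """Absorb casing and whitespace noise before comparing."""
    return " ".join(name.lower().split())


def resolve_name(
    name: str,
    entity_type: str,
    canonical: dict[str, list[str]],        # entity type -> canonical names
    aliases: dict[tuple[str, str], str],    # (type, normalized alias) -> canonical name
    semantic_match: Optional[Callable[[str, list[str]], Optional[str]]] = None,
    fuzzy_threshold: float = 0.85,          # assumed cutoff, tune per dataset
) -> Optional[str]:
    """Return a canonical name, or None if no confident match exists.

    Only names of the same entity type are ever compared, and nothing
    is merged here: the result is just a name.
    """
    candidates = canonical.get(entity_type, [])
    key = normalize(name)

    # 1. Exact match, including previously tracked aliases.
    if (entity_type, key) in aliases:
        return aliases[(entity_type, key)]
    for candidate in candidates:
        if normalize(candidate) == key:
            return candidate

    # 2. Fuzzy match, only reached when the exact step found nothing.
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, key, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= fuzzy_threshold:
        return best

    # 3. Semantic match, only reached when fuzzy was not confident.
    if semantic_match is not None and candidates:
        return semantic_match(name, candidates)
    return None
```

With canonical = {"Location": ["New York City"], "Organization": ["JPMorgan Chase"]} and an alias entry for "nyc", a lookup of "NYC" as a Location returns at the first step, a typo such as "JPMorgan Chse" is caught by the fuzzy step, and "JP Morgan" falls through to the semantic callable. The same string looked up as a Person returns None because no Person names exist to compare against.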

Deduplication is the riskier decision, and it answers whether two nodes are the same real-world entity. Here the system embeds the full context of a node, not just its bare name, so it can tell apart two people who happen to share a name and a type. The example used is Jensen Huang the CEO of NVIDIA versus a doctor in Taipei with the same name. The combined score is an explicit weighted blend, embedding similarity at seventy percent and fuzzy similarity at thirty percent, sorted into three bands. A high score auto-merges, a medium score flags the pair for a human, and a low score just creates a new node. The middle band is the dangerous gray zone, and the whole point of flagging rather than merging is that undoing a merge means re-ingesting the source data, which is painful. Flagged pairs get a pending same-as edge written into the graph itself, and the review queue is just a Cypher query over those edges ordered by confidence.
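The weighted blend and the three bands fit in a few lines. The sketch below is hedged: the 70/30 weights come from the summary above, but the function names are hypothetical, difflib stands in for the fuzzy scorer, the embeddings are plain lists of floats supplied by the caller, and the default cutoffs of 0.95 and 0.85 assume those two figures (mentioned later in this summary) are the auto-merge and review boundaries.

```python
import math
from difflib import SequenceMatcher


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(
    name_a: str,
    name_b: str,
    context_emb_a: list[float],   # embedding of the node's full context, not just its name
    context_emb_b: list[float],
    w_embed: float = 0.7,
    w_fuzzy: float = 0.3,
) -> float:
    """Blend context-embedding similarity with fuzzy name similarity."""
    fuzzy = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return w_embed * cosine(context_emb_a, context_emb_b) + w_fuzzy * fuzzy


def decide(score: float, merge_at: float = 0.95, review_at: float = 0.85) -> str:
    """Sort a score into one of three bands. Cutoffs are assumed defaults."""
    if score >= merge_at:
        return "merge"
    if score >= review_at:
        return "flag_for_review"   # would become a pending same-as edge
    return "create_new"
```

The same-name case shows why context matters: two nodes both named "Jensen Huang" get a fuzzy score of 1.0, but if their context embeddings are orthogonal the blend is only 0.3, which lands in the create-new band rather than merging the CEO with the doctor.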

The author also covers two safety nets most tutorials skip. The first is tombstoning, where a merged source node stays queryable for forensics but is excluded from future matching, so you keep an audit trail. The second is the "dream pipeline", a nightly pass that re-runs deduplication only on recently ingested nodes. This catches duplicates that slipped through because two entities were processed in parallel and neither existed in the graph when the other was written. Because the embeddings were already computed at ingest, the dream pass is mostly database reads and writes rather than fresh model calls, so it runs cheaply when organic traffic is low. One thing the author passes over too quickly is how to tune the band thresholds for a specific domain, and whether the 0.95 and 0.85 cutoffs the article uses are anything more than reasonable starting guesses. The honest answer is that those numbers are dataset-dependent, and the article would be stronger if it said so explicitly.
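Both safety nets can be illustrated with an in-memory sketch. Everything here is an assumption made for illustration: the Node fields, the merged_into pointer used as the tombstone marker, the one-day window, and the rule that the older node survives. A real system would run this against the graph database and reuse stored embeddings inside the score function. The review band is left out to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Node:
    node_id: str
    name: str
    ingested_at: datetime
    merged_into: Optional[str] = None   # set once the node is tombstoned

    @property
    def tombstoned(self) -> bool:
        return self.merged_into is not None


def merge(source: Node, target: Node) -> None:
    """Tombstone the source: it stays in the store for forensics, pointing at the survivor."""
    source.merged_into = target.node_id


def dream_pass(
    nodes: list[Node],
    now: datetime,
    score: Callable[[Node, Node], float],   # e.g. a blend over stored embeddings
    window: timedelta = timedelta(days=1),
    merge_at: float = 0.95,
) -> list[tuple[str, str]]:
    """Re-run deduplication for recently ingested nodes only.

    Returns (tombstoned_id, survivor_id) pairs. Tombstoned nodes are never
    considered as candidates on either side of a comparison.
    """
    recent = [n for n in nodes if not n.tombstoned and now - n.ingested_at <= window]
    merged: list[tuple[str, str]] = []
    for node in recent:
        if node.tombstoned:          # merged earlier in this same pass
            continue
        for other in nodes:
            if other is node or other.tombstoned:
                continue
            if score(node, other) >= merge_at:
                # Assumed policy: the older node survives.
                if other.ingested_at <= node.ingested_at:
                    source, target = node, other
                else:
                    source, target = other, node
                merge(source, target)
                merged.append((source.node_id, target.node_id))
                if source is node:   # this node is gone, stop comparing it
                    break
    return merged
```

Running the pass twice in a row should be harmless: the second run finds the tombstone, skips it, and returns an empty list, which is the property that makes a nightly schedule safe.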

Key takeaways:

Resolution and deduplication are two distinct jobs: one assigns a canonical name, the other decides real-world identity, and merging them is what corrupts graphs.
Embed the full node context, not just the name, so you can separate two same-named, same-type entities like a famous CEO and an unrelated doctor.
Use confidence bands with a deliberate gray zone: auto-merge only when near-certain, flag the uncertain middle for human review, and create a new node when evidence is weak.
Keep an audit trail with tombstoned nodes and a pending same-as edge, since undoing a bad merge is far more expensive than preventing one.
Run a nightly "dream" pass over recently ingested nodes to catch duplicates created by parallel ingestion, and keep it cheap by reusing stored embeddings.

Why do I care: If you are an architect putting an agent memory layer into production, this is the part that decides whether the thing is trustworthy six months from now, and it is exactly the part teams skip because it is unglamorous plumbing. The naming-versus-identity split maps cleanly onto something we already know from data modeling, which is that a display label and a primary key are not the same thing, and conflating them has always caused pain. What I like here is the operational honesty: the human-review gray zone, the tombstones, the nightly reconciliation job. Those are the boring details that separate a demo from a system you can actually query and bill against. If you own the data layer behind an AI feature, budget for this maintenance work up front, because a graph nobody trusts is just an expensive way to store noise.
