Stop Chasing the Perfect Ontology: Ship a Knowledge Graph Memory Layer in 5 Minutes

TLDR: Building a knowledge graph memory layer stalls because people try to design the perfect ontology before touching real data. The POLE+O model gives you five generic base types to start with immediately, then you grow the schema through a lightweight data-exploration loop rather than upfront theory.

I keep thinking about the number of side projects I have abandoned somewhere between "I need to model my domain properly" and "okay, maybe next weekend." The knowledge graph is one of those perpetual ambitions. You read about GraphRAG outperforming plain RAG, you want a memory layer that models entities and relationships instead of raw files, and then you hit the ontology wall. What nodes do you create? What edges? Which properties matter? The question spirals and the project dies on a whiteboard.

The author of this piece had the same experience. Everything lived in Obsidian, files piling up, and the dream was to shift from file primitives to entity primitives: people, locations, topics, facts, preferences. Real knowledge graph stuff. The resistance was always the same. Without the right ontology your graph lies about the world. You extract the wrong things, your GraphRAG performs worse than the dumb vector search you were trying to replace, and you've spent weeks getting there.

The insight that breaks the deadlock is looking at what real, shipped ontologies actually look like. Neo4j published 22 domain-specific ontologies through the create-context-graph catalog, and every single one lands between 10 and 12 entity types. Not 80. Not 200. Ten to twelve. They all share a 5-noun base and add 5 to 7 domain-specific extensions. The lesson is that small is a deliberate design choice, not laziness. A good ontology is a narrow funnel from reality into something queryable.

POLE+O is that funnel. Person, Object, Location, Event, Organization. Five fixed base types, each extensible with subtypes you discover only after you run real extractions on real data. You start with nothing extended. You extract against your actual corpus. You watch where the generic labels lie: Claude Code tagged as a Person, a conference tagged as an Event when you wanted an Organization. Each mislabeling is a signal to add one subtype, not to redesign the whole thing. It is exactly how you would iterate on any other AI application, and that framing matters. The ontology is not a schema you design once; it is a living artifact you grow.

Two extra primitives complete the picture. Preferences capture stances: a person likes Italian food, prefers dark mode, dislikes long meetings. They hang off Person by default and act as the personalization layer that makes every future response feel tailored. Facts are the fallback for everything that does not cleanly fit a noun or preference. A Fact is an atomic triplet: subject, predicate, object. "Eiffel Tower / is / 330m tall" stored as one granular unit with an embedding for semantic search. Facts wire to nothing in the graph. They degrade gracefully. Early on you lean on them heavily, and as the schema matures some claims migrate into typed entities and edges. Nothing blocks the build.

Key takeaways:

The POLE+O base model (Person, Object, Location, Event, Organization) gives you a queryable foundation to ship immediately without schema analysis paralysis.
Real production ontologies are intentionally small, 10 to 12 entity types max, discovered through data exploration rather than upfront design.
Preferences and Facts fill the gaps: Preferences capture user stances for personalization, Facts store atomic triplets for anything that does not fit the typed graph yet.

Why do I care: As someone who builds frontend architectures, the knowledge graph problem maps uncomfortably well to state management debates. We spent years arguing about the perfect global store before just shipping something and iterating. POLE+O is the equivalent of "use a simple store, extend when you have evidence." The data-exploration loop, run extraction, inspect mislabels, add one subtype, repeat, is the scientific method applied to schema design. I am genuinely interested in plugging this into a personal assistant context, and the Neo4j agent-memory SDK with POLE+O as a swappable default looks like the right starting point. Stop waiting for the perfect model. Ship the mediocre one and let reality correct it.

Stop Chasing the Perfect Ontology