Semantic Layers Are Overrated
March 10, 2026

Someone told me recently that “Claude.md is the ontology for AI-assisted coding.” It was meant as a compliment to the practice of writing comprehensive context files for coding agents. But it made me think about ontologies. And then about semantic layers. And then about every other time the industry convinced itself that one big abstraction layer would tame complexity.
It never works. Not the way we hope.
The Silver Bullet Pattern
Every generation of enterprise technology produces its own version of the same idea: build one comprehensive thing, and the complexity goes away.
Monolithic data warehouses were supposed to be the single source of truth. Load everything in, model it once, and every question gets a governed answer. What actually happened was a years-long implementation that couldn’t keep pace with business change. By the time the warehouse modeled last quarter’s org structure, the org had restructured. The business needed answers faster than the warehouse could evolve, so shadow spreadsheets and rogue Access databases filled the gap.
Report factories were going to eliminate ad-hoc requests. Standardize every metric, build every report, and the business never has to ask IT for anything again. The catalog grew to thousands of reports. Nobody could find the one they needed. People kept asking for new ones anyway.
Monolithic enterprise software suites promised to unify everything under one roof. One vendor, one platform, every department aligned. The implementation took three years, cost four times the estimate, and required so many customizations that upgrades became impossible.
Each time, the pitch was the same: centralize, standardize, and the problem dissolves. Each time, the organization outran the abstraction. That’s why distributed approaches like data mesh emerged. The industry learned, painfully, that the answer wasn’t one big thing. It was many coordinated smaller things.
Now it’s happening again.
The Evidence Is Already Here
A recent study from ETH Zurich (Gloaguen et al., 2026) tested whether repository-level context files (the AGENTS.md and Claude.md files that developers write to help coding agents understand their codebase) actually improve performance. These files describe architecture, conventions, testing requirements, the works. The idea is that if the agent understands the big picture, it'll make better decisions on specific tasks.
The results were counterintuitive. LLM-generated context files reduced task completion by about 3% on average while increasing computational costs by over 20%. The agents spent more time exploring, testing, and reasoning about information that was largely redundant with what they could discover from the code itself. Even carefully hand-written context files yielded only a marginal 4% improvement, and only when they focused on minimal, non-obvious requirements.
The key finding: the codebase is its own best documentation. Summarizing it into a separate abstraction layer added noise, went stale, and created overhead that outweighed the benefit. The only context that helped was small, specific, and focused on things the model consistently got wrong on its own.
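To make that finding concrete, here is a hypothetical sketch of the kind of minimal, hand-written context file the study suggests is worth keeping. Every rule in it is invented for illustration; the point is the shape: short, specific, and limited to things the agent cannot infer from the code.

```
# AGENTS.md (kept deliberately short)

Only non-obvious rules the agent keeps getting wrong:

- Run `make test-unit`, not `pytest` directly; the suite needs generated fixtures.
- Never edit files under `src/generated/`; they are overwritten at build time.
- Every database migration must be reversible: each `upgrade()` needs a matching `downgrade()`.
```

No architecture overview, no restatement of conventions visible in the code itself. Anything the agent can discover on its own is noise here.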
Now apply that same logic to the semantic layer conversation in data.
The Semantic Layer Pitch
The semantic layer pitch sounds almost identical to the context file pitch. Define all your metrics, business rules, and relationships in one comprehensive layer. Your BI tools pull from it. Your AI agents query it. Everyone gets consistent, governed answers. Build this one thing, and your data becomes “AI-ready.”
Consistent metric definitions are genuinely important. When finance and sales disagree on what “revenue” means, that’s a real problem worth solving. Nobody is arguing against shared definitions for shared metrics.
The problem is the leap from “consistent metric definitions are good” to “let’s build a comprehensive semantic layer that covers the entire organization.” That’s the boil-the-ocean move. And it runs into three compounding problems.
It’s expensive to build and maintain. A comprehensive semantic layer across every department, every metric, every business rule is a massive undertaking. It needs dedicated ownership, constant updates, and cross-functional alignment that most organizations struggle to sustain.
It goes stale. The business evolves faster than the semantic layer gets updated. New products launch. Teams reorganize. KPIs shift. The layer that was supposed to be the source of truth becomes another artifact that’s slightly out of date, which is arguably worse than having no layer at all, because people trust it when they shouldn’t.
It can’t hold multiple valid perspectives. This is the deeper issue. Finance looks at revenue differently than sales looks at revenue differently than the board looks at revenue. These aren’t errors. They’re legitimate, necessary perspectives shaped by different contexts and different decisions. A single ontology can’t hold all of them without becoming so complex it defeats its own purpose.
Documenting every job in the entire company doesn't mean anyone in the company can now do every job. And that mass of documentation would be incredibly hard to manage or search effectively.
What Actually Works: Org Structure, Not Encyclopedias
What works in organizations isn’t an encyclopedia of everyone’s role. It’s org structure. Specialists with deep domain knowledge, clear boundaries, and a coordination layer that routes work to the right people.
The same principle applies to how we should architect AI systems for data.
Build small, focused agents. An HR reporting agent that deeply understands headcount metrics, compliance rules, and the specific nuances of how your organization tracks people data. A finance agent that knows the chart of accounts, understands accrual vs. cash, and can navigate the ERP. A sales agent that knows your pipeline stages, your territory model, and your commission structure.
Each of these specialists can have its own focused semantic context. A small, maintainable set of definitions scoped to its domain. The HR agent doesn’t need to know about COGS. The finance agent doesn’t need to know about candidate pipeline stages. This isn’t a limitation. It’s what makes each agent good at its job.
Then build a routing layer that delegates questions to the right specialist. “What’s our revenue?” goes to finance. “How’s hiring going?” goes to HR. “How is hiring affecting our margin?” gets routed to both, and their perspectives get composed into an answer.
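The specialist-plus-router pattern above can be sketched in a few dozen lines. This is a deliberately minimal illustration, not a production design: the agent names, metric definitions, and keyword-based routing are all invented for the example, and a real system would route with an LLM classifier and have each agent query its own warehouse or ERP rather than echo its definitions.

```python
from dataclasses import dataclass

@dataclass
class SpecialistAgent:
    """A domain specialist carrying a small, scoped semantic context."""
    name: str
    # The only "semantic layer" this agent owns: a handful of metric
    # definitions scoped to its domain, maintained by the people who use them.
    semantic_context: dict[str, str]
    # Keywords the router uses to decide whether a question is in-domain.
    keywords: set[str]

    def answer(self, question: str) -> str:
        # A real agent would run governed queries here; this sketch just
        # reports which scoped definitions it would apply to the question.
        relevant = [m for m in self.semantic_context if m in question.lower()]
        used = ", ".join(relevant) if relevant else "general domain knowledge"
        return f"[{self.name}] answered using: {used}"

def route(question: str, agents: list[SpecialistAgent]) -> list[str]:
    """Delegate a question to every specialist whose domain it touches."""
    q = question.lower()
    matched = [a for a in agents if any(k in q for k in a.keywords)]
    if not matched:
        return ["No specialist matched; escalate to a human analyst."]
    # Cross-domain questions fan out to several specialists, and their
    # perspectives get composed into one answer.
    return [a.answer(question) for a in matched]

# Hypothetical specialists with invented, scoped definitions.
finance = SpecialistAgent(
    name="finance",
    semantic_context={"revenue": "recognized revenue, accrual basis",
                      "margin": "gross margin = (revenue - COGS) / revenue"},
    keywords={"revenue", "margin", "cogs", "accrual"},
)
hr = SpecialistAgent(
    name="hr",
    semantic_context={"headcount": "active FTEs as of period end",
                      "hiring": "offers accepted in period"},
    keywords={"hiring", "headcount", "attrition"},
)
```

Note that neither agent's context mentions the other's domain: the HR agent carries nothing about COGS, and the cross-domain question "How is hiring affecting our margin?" is handled by fan-out and composition at the routing layer, not by one global ontology.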
The semantic layer becomes a local feature of each specialist, not a global infrastructure project. Consistent definitions still exist. They’re just scoped, maintained by the people who use them, and loaded only when relevant. This is the same organizational design principle I explored in Architecting the Factory: distributed authority works better than centralized control when you pair it with clear ownership and coordination mechanisms.
Feature, Not Foundation
The semantic layer isn’t bad. It’s a feature of a mature data platform. A useful tool for enforcing consistent metric definitions in specific, well-scoped contexts.
But it’s being positioned as the foundation, the thing you need to build before AI can work with your data. That framing is the trap. It delays value indefinitely while you try to model an organization that won’t hold still long enough to be modeled. It’s the same “get your house in order first” advice that has stalled more data programs than any technical limitation ever has.
Start with a specific problem. Build one specialist agent that’s genuinely good at one domain. Learn what your users actually ask and what context the agent actually needs. Let the architecture emerge from what you learn, not from what you imagine you’ll eventually need. Starting small and finishing beats starting big and stalling.
The gap between “AI-ready data” and actual AI value isn’t a completeness gap. It’s a sequencing gap. And the organizations closing it fastest are the ones willing to start small and get specific.
Are you building specialist agents or wrestling with a comprehensive semantic layer? I’d love to hear what’s working for you. Reach out on LinkedIn or Bluesky.