Azure HorizonDB Folds Knowledge Graphs Into Postgres With AI Functions and Apache AGE

Microsoft's new HorizonDB tutorial builds a queryable knowledge graph from raw text using nothing but SQL, collapsing the usual stack of NLP pipelines and standalone graph databases into one managed Postgres engine. For teams weighing cloud data platforms, the move signals where the consolidation pressure is heading.

Microsoft published a hands-on tutorial on Microsoft Learn showing how to turn unstructured incident tickets into a connected, queryable knowledge graph inside Azure HorizonDB using AI functions and the Apache AGE extension. The headline is not the graph itself. It is what the workflow no longer requires: no external NLP service, no separate graph database, no orchestration glue between them. Everything runs as SQL against a single managed Postgres engine.

That consolidation is the part worth examining if you are deciding where your data and AI workloads should live over the next few years.

What changed

Knowledge graphs have been an architecture-diagram staple for years, but building one has traditionally meant assembling a pipeline. You extracted entities with an NLP model, resolved duplicates with custom logic, loaded the results into a dedicated graph store like Neo4j or Amazon Neptune, and kept all of those stages in sync with the source data. The graph was valuable; the assembly cost was the reason most teams never shipped one.

HorizonDB's tutorial compresses that into five steps, the first four of which run entirely inside the database:

Extraction. azure_ai.extract() parses services, teams, root causes, and relationship triples directly from ticket text in a single SQL call, calling an LLM behind the function.
Deduplication. azure_ai.generate() with structured JSON output collapses variants like "API gateway," "api-gateway," and "the gateway service" into one canonical node.
Loading. Cypher MERGE statements run inside PL/pgSQL loops to build service nodes, team nodes, incident hubs, relationship edges, and a chronological timeline chain.
Querying. Variable-length Cypher path patterns like *1..3 trace cascading failure chains up to three hops deep, the kind of question relational tables answer only with recursive CTEs or manual correlation.
Visualization. The PostgreSQL extension for VS Code renders Cypher output as an interactive node-edge graph.
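
The deduplication and loading steps are easier to reason about with a small model in hand. The Python sketch below is not the tutorial's code and does not call azure_ai.generate() or Apache AGE. It assumes a hypothetical alias table and simple string normalization in place of the LLM, and an in-memory Graph class whose merge_node and merge_edge methods imitate the create-if-absent behavior of Cypher MERGE. All names and sample triples are illustrative.

```python
import re

# Hypothetical alias table standing in for the LLM deduplication step.
ALIASES = {"gateway service": "api gateway"}


def canonical(name: str) -> str:
    """Map a raw entity mention to one canonical node key."""
    key = re.sub(r"[-_\s]+", " ", name.strip().lower())  # unify separators
    key = re.sub(r"^the ", "", key)                      # drop a leading article
    return ALIASES.get(key, key)


class Graph:
    """In-memory stand-in for a property graph with MERGE-like upserts."""

    def __init__(self):
        self.nodes = {}     # canonical key -> label
        self.edges = set()  # (source key, relationship, target key)

    def merge_node(self, label: str, name: str) -> str:
        key = canonical(name)
        self.nodes.setdefault(key, label)  # create only if absent
        return key

    def merge_edge(self, src: str, rel: str, dst: str) -> None:
        self.edges.add((src, rel, dst))    # set membership makes this idempotent


def load_triples(graph: Graph, triples) -> None:
    """Load (subject, relationship, object) triples, merging as we go."""
    for subj, rel, obj in triples:
        s = graph.merge_node("Service", subj)
        o = graph.merge_node("Service", obj)
        graph.merge_edge(s, rel, o)


# Illustrative triples, as an extraction step might emit them.
triples = [
    ("API gateway", "DEPENDS_ON", "auth-service"),
    ("api-gateway", "DEPENDS_ON", "Auth Service"),
    ("the gateway service", "CALLS", "checkout"),
]

g = Graph()
load_triples(g, triples)
load_triples(g, triples)  # loading twice must not create duplicates
```

The three gateway spellings end up as a single node, and running the load a second time changes nothing, which is the property that makes MERGE safe to run inside a loop over tickets.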

The underlying argument is one most data teams will recognize once it is named. Your relational tables store facts. They do not store the connections between those facts. A postmortem question like "which upstream services most commonly trigger failures that reach checkout?" requires tracing relationships across tickets, teams, services, and root causes. That is a graph problem hiding inside a relational schema, and most teams have one without realizing it.
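
To see why that question is a traversal rather than a join, here is a minimal Python sketch of a bounded path search. It assumes a hypothetical list of failure-propagation edges and is not HorizonDB or AGE code. It walks backward from the target service for one to three hops, which approximates what a variable-length pattern such as *1..3 asks for. One simplification to note: the sketch refuses to revisit a node, whereas Cypher's variable-length matching only forbids reusing a relationship within a path.

```python
from collections import Counter

# Hypothetical failure-propagation edges: (upstream, downstream).
TRIGGERS = [
    ("dns", "api gateway"),
    ("api gateway", "auth"),
    ("auth", "checkout"),
    ("cache", "checkout"),
    ("api gateway", "checkout"),
]


def paths_to(target, edges, max_hops=3):
    """Return every simple path of 1..max_hops edges that ends at target."""
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append(src)

    found = []
    stack = [[target]]  # each partial path is stored target-first
    while stack:
        path = stack.pop()
        for src in incoming.get(path[-1], []):
            if src in path:
                continue  # skip cycles
            extended = path + [src]
            found.append(list(reversed(extended)))  # report upstream-first
            if len(extended) - 1 < max_hops:
                stack.append(extended)              # room for another hop
    return found


chains = paths_to("checkout", TRIGGERS)

# Count how often each service sits at the start of a chain reaching checkout.
origins = Counter(chain[0] for chain in chains)
```

With relational tables alone, the same result needs a recursive CTE or one self-join per hop; in a graph the hop bound is a single parameter.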

The same pattern extends well past incident management. The tutorial maps it onto contract conflict detection, fraud rings connected through shared devices, medication interaction checks, codebase dependency analysis, supply chain single points of failure, and transitive access auditing in IAM. In every case the shape is identical: extract entities, deduplicate them, store and traverse the graph. What differs is only the domain vocabulary.

Provider comparison

The strategic question is not whether graph-augmented retrieval is useful. It clearly is, especially for RAG workloads where vector search alone fails because the answer requires connecting facts across multiple documents rather than retrieving the single most similar passage. The question is where you run it, and the major clouds are taking visibly different routes.
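
The failure mode of similarity-only retrieval is easy to reproduce with a toy example. The Python sketch below uses three hypothetical tickets, a word-overlap score in place of real vector embeddings, and hand-labeled entity sets in place of an extraction step. None of it reflects any provider's API. It shows that the best-matching passage alone misses the root cause, while one hop of expansion over shared entities pulls it in.

```python
# Hypothetical tickets with hand-labeled entities.
DOCS = {
    "t1": {"text": "checkout latency spiked after auth timeouts",
           "entities": {"checkout", "auth"}},
    "t2": {"text": "auth timeouts traced to expired gateway certificate",
           "entities": {"auth", "gateway"}},
    "t3": {"text": "billing export job delayed",
           "entities": {"billing"}},
}


def overlap(query: str, text: str) -> float:
    """Jaccard word overlap, a crude stand-in for embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t)


def top_passage(query: str) -> str:
    """Similarity-only retrieval: return the single best-matching ticket."""
    return max(DOCS, key=lambda d: overlap(query, DOCS[d]["text"]))


def expand(seed: str, hops: int = 1) -> set:
    """Graph-style expansion: add tickets that share an entity with the frontier."""
    selected = {seed}
    frontier = {seed}
    for _ in range(hops):
        entities = set().union(*(DOCS[d]["entities"] for d in frontier))
        frontier = {d for d in DOCS if DOCS[d]["entities"] & entities} - selected
        selected |= frontier
    return selected


query = "why did checkout latency spike"
seed = top_passage(query)   # the ticket that mentions checkout latency
context = expand(seed)      # also reaches the certificate ticket via "auth"
```

The certificate ticket shares no words with the query, so no similarity threshold would surface it; it is reachable only through the entity the two tickets have in common.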

Microsoft Azure is betting on consolidation inside Postgres. HorizonDB bundles vector search, the azure_ai function family, and Apache AGE graph capabilities into one engine. The pitch to a buyer is operational simplicity: one database to secure, back up, scale, and bill. You trade some best-of-breed depth for a dramatically shorter integration path, and you stay on standard Postgres surface area, which limits lock-in at the query-language level even as the AI functions tie you to Azure.

AWS takes the specialized-service approach. Amazon Neptune is a purpose-built graph database with its own Neptune Analytics engine, and graph-augmented RAG is wired together through Bedrock Knowledge Bases plus separate vector stores like OpenSearch or Aurora pgvector. The components are mature individually, but you are integrating three or four managed services rather than calling functions in one. That suits teams that want each layer independently tunable and are comfortable owning the orchestration.

Google Cloud sits between the two. Spanner Graph added property-graph queries on top of Spanner's distributed relational core, pairing graph traversal with vector search in a single globally consistent store, while Vertex AI supplies the model layer. The differentiator is Spanner's horizontal scale and strong consistency, which matters more for graphs that grow into the billions of edges than for the incident-ticket scale most teams start at.

The practical contrast for a buyer comes down to integration surface. Azure's approach minimizes the number of moving parts at the cost of being newer and Azure-specific. AWS maximizes component maturity and independent scaling at the cost of integration overhead. Google leans on Spanner's scale story but carries Spanner's pricing model and operational profile with it.

On cost, the consolidation play has a quieter advantage that does not show up in a per-service price sheet. Running extraction, deduplication, vector search, and graph traversal in one engine removes the data-movement and duplicate-storage costs of keeping a relational store, a vector store, and a graph store aligned. Cross-service data transfer and the engineering time to maintain sync pipelines are real line items, and they tend to be underestimated in early architecture decisions.

Business impact

For an organization already standardized on PostgreSQL, the migration calculus is unusually gentle. Apache AGE is an open-source extension and Cypher is a portable query language, so the graph logic itself is not Azure-proprietary. The azure_ai functions are the lock-in surface, and they sit at the edges of the pipeline rather than at its core. A team could prototype the entire workflow on HorizonDB and, if needed later, swap the extraction and deduplication calls for an external model service while keeping the AGE graph and Cypher queries intact. That portability profile is worth weighing against the deeper integration but heavier coupling of a Neptune-plus-Bedrock or Spanner-plus-Vertex stack.
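
One way to keep that swap cheap is to put the extraction step behind a narrow interface from the start. The Python sketch below is an assumption about how a team might structure this, not something the tutorial prescribes. Extractor, KeywordExtractor, and build_edges are hypothetical names, and the keyword matcher is a toy stand-in for whichever model-backed service does the real extraction.

```python
from typing import Protocol

Triple = tuple[str, str, str]  # (subject, relationship, object)


class Extractor(Protocol):
    """Anything that turns ticket text into relationship triples."""

    def extract(self, text: str) -> list[Triple]: ...


class KeywordExtractor:
    """Toy extractor: links every pair of known services named in a ticket."""

    def __init__(self, services: list[str]):
        self.services = services

    def extract(self, text: str) -> list[Triple]:
        lowered = text.lower()
        hits = [s for s in self.services if s in lowered]
        return [
            (a, "MENTIONED_WITH", b)
            for i, a in enumerate(hits)
            for b in hits[i + 1:]
        ]


def build_edges(tickets: list[str], extractor: Extractor) -> set[Triple]:
    """Graph-loading side: depends only on the Extractor interface."""
    edges: set[Triple] = set()
    for ticket in tickets:
        edges.update(extractor.extract(ticket))
    return edges


edges = build_edges(
    ["Checkout failed after auth outage", "Billing export delayed"],
    KeywordExtractor(["auth", "checkout", "billing"]),
)
```

Because build_edges never names a specific extractor, replacing the model provider means writing one new class with an extract method; the graph loading and the queries over it stay as they are.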

The more immediate impact is on time to value. The reason knowledge graphs stayed on slide decks was never that the queries were hard to imagine. It was that standing up the pipeline took a specialized team and weeks of integration. Collapsing that into SQL statements a data engineer already knows how to write changes who can build one and how fast. A team that can express "show me every cascading failure chain that touched the payment service in the last 90 days" as a single Cypher query, without provisioning a second database, is a team far more likely to actually ship the capability into production.

For cloud strategy specifically, this is a signal about direction. The database vendors are absorbing AI and graph workloads that used to justify separate specialized platforms, and they are doing it inside engines teams already operate. The decision in front of most organizations is shifting from "which graph database do we adopt" to "how much of our AI retrieval stack can our primary database already absorb." Microsoft's tutorial is a concrete answer to that second question, and the other providers are working toward their own versions of it. Evaluate the consolidation against your existing data gravity: if you are already on managed Postgres, HorizonDB's approach removes more friction than it adds, and the portable pieces leave you room to change your mind.

Teams can share feedback on the PostgreSQL Hub developer forum.