Building Postgres Compatibility in Rust: pgwire and DataFusion as Modern Database Infrastructure
#Rust

Tech Essays Reporter

GreptimeDB demonstrates how implementing Postgres protocol compatibility through Rust libraries pgwire and DataFusion enables modern databases to join the PostgreSQL ecosystem while maintaining architectural flexibility. This 'top-down' approach unlocks the mature PostgreSQL tooling and driver ecosystem without being constrained by PostgreSQL's implementation.

PostgreSQL gained renewed prominence in 2025 following the acquisitions of Neon and Crunchy Data. This resurgence has highlighted two distinct approaches to integrating with the PostgreSQL ecosystem: the 'bottom-up' extension approach and the 'top-down' protocol compatibility approach. The former, exemplified by projects like ParadeDB, uses pgrx to bring Rust ecosystem libraries into PostgreSQL itself. The latter, which GreptimeDB follows, emulates PostgreSQL's protocol and interfaces to build 'Postgres-like' databases that can use PostgreSQL's extensive tooling without being PostgreSQL.

"Postgres protocol" can be read in a narrow or a broad sense. In the narrow sense, it is the application-layer wire protocol running over TCP, comprising five main components: Startup (handshake and authentication), Simple Query (text-based queries), Extended Query (prepared statements with parameterized queries), Copy (data import/export), and Cancel (query termination). Meaningful compatibility, however, requires addressing the protocol in the broad sense, which also covers PostgreSQL's SQL dialect, data type system, and pg_catalog metadata system.
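To make the wire-level part concrete, here is a minimal sketch of how a Simple Query message is framed in protocol 3.0: a one-byte tag 'Q', a four-byte big-endian length that counts itself and the payload but not the tag, and the query text terminated by a NUL byte. The function names are illustrative assumptions and do not come from pgwire or GreptimeDB.

```rust
/// Encode a frontend Simple Query ('Q') message.
/// Layout: tag byte, 4-byte big-endian length (excludes the tag,
/// includes itself), query text, trailing NUL.
fn encode_simple_query(sql: &str) -> Vec<u8> {
    let payload_len = sql.len() + 1; // query bytes + NUL terminator
    let mut buf = Vec::with_capacity(1 + 4 + payload_len);
    buf.push(b'Q');
    buf.extend_from_slice(&((4 + payload_len) as u32).to_be_bytes());
    buf.extend_from_slice(sql.as_bytes());
    buf.push(0);
    buf
}

/// Decode a complete Simple Query frame; returns None if it is malformed.
fn decode_simple_query(frame: &[u8]) -> Option<&str> {
    // Smallest valid frame: tag + length + NUL for an empty query.
    if frame.len() < 6 || frame[0] != b'Q' {
        return None;
    }
    let len = u32::from_be_bytes([frame[1], frame[2], frame[3], frame[4]]) as usize;
    // The length field does not count the tag byte.
    if len != frame.len() - 1 {
        return None;
    }
    let (last, text) = frame[5..].split_last()?;
    if *last != 0 {
        return None;
    }
    std::str::from_utf8(text).ok()
}
```

Encoding "SELECT 1" this way yields a 14-byte frame whose length field is 13. The other message types follow the same tag-plus-length framing, with the startup packet as the notable exception because it carries no tag byte.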

Layers of Postgres protocol compatibility

Implementing Postgres compatibility offers substantial advantages over designing a new application-layer (layer 7) protocol from scratch. The current version 3.0 protocol, introduced with PostgreSQL 7.4 in 2003, has proven its reliability over more than two decades of production use, supporting TLS, various authentication methods, and client negotiation. Perhaps most significantly, protocol compatibility unlocks a wealth of existing tools, including programming language drivers, database management utilities, and BI tools. A compatible database can also serve as a Foreign Data Wrapper (FDW) data source for native PostgreSQL, enabling federated queries that JOIN data between PostgreSQL and the compatible database.

However, the Postgres protocol has limitations. Its row-oriented design requires conversion when the underlying data structures are column-oriented. The query cancellation mechanism is tied to PostgreSQL's process model: the client opens a separate connection and identifies the target backend by a process ID and secret key, which adds complexity and raises security concerns for systems that are not built around one process per connection. Additionally, the Extended Query protocol requires type inference without query execution, a capability that databases like SQLite and DuckDB lack.
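The row-versus-column mismatch can be sketched in a few lines. The example below pivots a small columnar batch into one DataRow ('D') message per row, using the text format. `TextColumn` is a hypothetical stand-in for a real columnar array such as an Arrow array, and the function names are assumptions made for illustration; this is not how arrow-pg is implemented.

```rust
/// A nullable text column, standing in for a columnar array (hypothetical type).
struct TextColumn {
    values: Vec<Option<String>>,
}

/// Encode row `row` across all columns as a backend DataRow ('D') message.
/// Layout: tag, Int32 length (excludes the tag), Int16 column count, then
/// per column an Int32 byte length (-1 for NULL) followed by the value bytes.
fn encode_data_row(columns: &[TextColumn], row: usize) -> Vec<u8> {
    let mut body = Vec::new();
    body.extend_from_slice(&(columns.len() as i16).to_be_bytes());
    for col in columns {
        match &col.values[row] {
            Some(v) => {
                body.extend_from_slice(&(v.len() as i32).to_be_bytes());
                body.extend_from_slice(v.as_bytes());
            }
            // NULL is signalled by length -1 with no value bytes.
            None => body.extend_from_slice(&(-1i32).to_be_bytes()),
        }
    }
    let mut msg = Vec::with_capacity(5 + body.len());
    msg.push(b'D');
    msg.extend_from_slice(&((4 + body.len()) as i32).to_be_bytes());
    msg.extend_from_slice(&body);
    msg
}

/// Pivot a column batch into one DataRow message per row.
/// Assumes every column has the same number of values.
fn columns_to_rows(columns: &[TextColumn]) -> Vec<Vec<u8>> {
    let num_rows = columns.first().map_or(0, |c| c.values.len());
    (0..num_rows).map(|r| encode_data_row(columns, r)).collect()
}
```

Each output row touches every column once, so a column-oriented engine pays this pivot cost for every result set it sends over the wire.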

Connecting to GreptimeDB with a database management tool via Postgres protocol

GreptimeDB implements Postgres compatibility through two key Rust components: pgwire and the DataFusion ecosystem. pgwire, authored by the team, serves as the foundational layer for protocol implementation, analogous to how hyper or axum function in the HTTP ecosystem. It provides two levels of APIs: a low-level API focused on message-level processing (SimpleQueryHandler::on_query) and a high-level API concerned with data-level semantics (SimpleQueryHandler::do_query). This modular design allows implementers to selectively support various protocol features based on their requirements.

A simple query handler demonstrates pgwire's flexibility. An EchoHandler, for example, can receive incoming queries and return them unchanged as results, which shows that the protocol does not even require incoming queries to be SQL. This flexibility enables diverse applications beyond traditional database scenarios.
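The two-level idea can be illustrated without depending on pgwire itself. The sketch below defines a hypothetical `SimpleHandler` trait whose method names echo the ones mentioned above, but whose signatures are simplified assumptions (synchronous, plain strings, no client handle) and are not pgwire's actual API. The low-level hook has a default implementation that decodes the payload and delegates to the high-level hook, so an implementer only writes `do_query` unless message-level control is needed.

```rust
/// Data-level result: column names plus text rows (hypothetical type).
#[derive(Debug, PartialEq)]
struct QueryResult {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Hypothetical two-level handler; NOT pgwire's real trait definition.
trait SimpleHandler {
    /// High-level hook: turn query text into a result set.
    fn do_query(&self, query: &str) -> Result<QueryResult, String>;

    /// Low-level hook: receives the raw message payload. The default
    /// strips the NUL terminator, validates UTF-8, and delegates to
    /// `do_query`; an implementer could override it for full control.
    fn on_query(&self, payload: &[u8]) -> Result<QueryResult, String> {
        let text = payload.strip_suffix(&[0u8]).unwrap_or(payload);
        let query = std::str::from_utf8(text).map_err(|e| e.to_string())?;
        self.do_query(query)
    }
}

/// Returns whatever text it receives as a one-row, one-column result.
struct EchoHandler;

impl SimpleHandler for EchoHandler {
    fn do_query(&self, query: &str) -> Result<QueryResult, String> {
        Ok(QueryResult {
            columns: vec!["echo".to_string()],
            rows: vec![vec![query.to_string()]],
        })
    }
}
```

Nothing in the handler parses SQL: the payload is treated as opaque text, which is what lets non-database services sit behind the same protocol.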

GreptimeDB's Postgres compatibility stack — pgwire, datafusion-pg-catalog, arrow-pg, and DataFusion

While pgwire handles the network protocol level, comprehensive compatibility requires metadata support through pg_catalog. GreptimeDB addresses this through datafusion-postgres, an adapter that bridges pgwire, the DataFusion query engine, and the Arrow data format. This component includes support for pg_catalog as well as arrow-pg, which handles conversion from Arrow data to Postgres data. The implementation focuses on key system tables including pg_database, pg_namespace, pg_tables, pg_class, and pg_attribute, which database management tools need in order to function properly.
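One small piece of that conversion is mapping engine types to the type OIDs that Postgres clients expect in result metadata. The sketch below uses a hypothetical `ColumnType` enum as a stand-in for Arrow's data types; the OIDs and type names are the built-in values from PostgreSQL's pg_type catalog, while the choice of which engine type maps to which Postgres type is an assumption made for illustration rather than arrow-pg's actual mapping.

```rust
/// Simplified engine-side column types (hypothetical stand-in for Arrow types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Boolean,
    Int16,
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
    TimestampMicros,
}

/// Return the built-in PostgreSQL type name and OID for a column type.
fn pg_type(ty: ColumnType) -> (&'static str, u32) {
    match ty {
        ColumnType::Boolean => ("bool", 16),
        ColumnType::Int16 => ("int2", 21),
        ColumnType::Int32 => ("int4", 23),
        ColumnType::Int64 => ("int8", 20),
        ColumnType::Float32 => ("float4", 700),
        ColumnType::Float64 => ("float8", 701),
        ColumnType::Utf8 => ("text", 25),
        // Timestamp without time zone; Postgres stores microsecond precision.
        ColumnType::TimestampMicros => ("timestamp", 1114),
    }
}

/// Render a boolean in Postgres text format, which uses 't' and 'f'.
fn bool_to_pg_text(value: bool) -> &'static str {
    if value { "t" } else { "f" }
}
```

Drivers use these OIDs to pick a decoder for each column, so a wrong OID tends to surface as a client-side parsing error rather than a server-side one.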

The complexity of pg_catalog compatibility should not be underestimated. Tools like DataGrip generate intricate queries during startup that may involve unsupported functions or UDFs. This adaptation work requires dedicated effort and represents an ongoing challenge for full compatibility.

What emerges from GreptimeDB's implementation is a decoupling of protocol from implementation. By implementing Postgres compatibility as reusable libraries, the approach enables any modern data infrastructure to quickly adapt to this protocol and build a 'Postgres-like' ecosystem. This creates a new kind of 'Postgres'—one not bound to a single codebase but to a shared protocol and compatibility layer.

The broader impact of this approach extends beyond GreptimeDB. The pgwire library has been adopted by several other projects including PeerDB (acquired by ClickHouse), SpacetimeDB (designed for real-time online games), corrosion (by fly.io), and db9.ai. This growing ecosystem suggests a pattern where protocol compatibility becomes a key differentiator in database development.

Looking forward, the integration of geoarrow and geodatafusion with DataFusion presents opportunities to build a PostGIS-compatible ecosystem within this framework. The approach opens possibilities for specialized databases to maintain compatibility with established tooling while optimizing for specific workloads.

The implementation of Postgres compatibility in Rust represents more than a technical achievement; it reflects a broader trend in database development where protocol and interface compatibility become as important as the underlying implementation. As data systems continue to diversify, the ability to participate in established ecosystems without sacrificing architectural innovation may become a critical factor in database adoption and success.
