DuckDB’s new Quack protocol adds HTTP‑based client‑server capabilities, letting several DuckDB instances share a catalog and query data concurrently. The article compares Quack with Arrow Flight, PostgreSQL and managed analytics services, outlines migration steps and pricing implications, and explains how the change reshapes data‑engineering workflows.

DuckDB Quack Brings Client‑Server over HTTP – What It Means for Multi‑User Analytics

What changed

DuckDB announced Quack, a native HTTP protocol that turns the traditionally embedded, in‑process engine into a multi‑user service. The extension ships as a core module in DuckDB v1.5.3 and can be autoloaded on both client and server nodes. Unlike the earlier Arrow Flight SQL integration, Quack uses DuckDB’s own columnar format for transport, so a query and its complete result set can travel in a single round‑trip. The DuckDB team reports up to 3.5× faster data movement than Arrow Flight and a noticeable edge over PostgreSQL for typical analytical workloads.
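
The single round‑trip claim is easiest to see with a toy. The source does not document Quack’s wire format or client API, so the sketch below is not Quack: it is a minimal HTTP query endpoint built from the Python standard library, with sqlite3 standing in for DuckDB and JSON standing in for DuckDB’s columnar encoding. The names `QueryHandler`, `start_server` and `query` are hypothetical. What it illustrates is the request shape: the SQL text goes out in one POST and the whole result comes back in that same response.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy stand-in, NOT Quack: sqlite3 plays the engine and JSON plays the wire format.
# check_same_thread=False because the HTTP handler runs on the server's thread.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE trips (city TEXT, km REAL)")
DB.executemany("INSERT INTO trips VALUES (?, ?)",
               [("Rome", 12.5), ("Oslo", 3.0), ("Rome", 7.5)])
DB_LOCK = threading.Lock()


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The SQL text arrives in the body of a single POST request (SELECT only) ...
        length = int(self.headers.get("Content-Length", 0))
        sql = self.rfile.read(length).decode("utf-8")
        with DB_LOCK:
            cur = DB.execute(sql)
            names = [d[0] for d in cur.description]
            rows = cur.fetchall()
        # ... and the complete result goes back in that same response, column by column.
        columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
        body = json.dumps(columns).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in the demo


def start_server():
    """Serve on a free local port in a background thread; return (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/"


def query(url, sql):
    """Send one SQL statement and get the whole result back in one HTTP request/response."""
    req = urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server, url = start_server()
    print(query(url, "SELECT city, SUM(km) AS km FROM trips GROUP BY city ORDER BY city"))
    server.shutdown()
    server.server_close()
```

Here the query returns `{'city': ['Oslo', 'Rome'], 'km': [3.0, 20.0]}`. Returning results column by column mirrors the columnar idea; a real server would add authentication, TLS, streaming for large results and error handling.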

Key capabilities introduced by Quack:

Standard HTTP(S) transport – works behind firewalls, load balancers and cloud‑native ingress controllers.
Shared catalog support via the upcoming DuckLake catalog server, enabling multiple writers to see the same metadata.
Autoloadable extension – no custom binaries required for most deployments.
MIT‑licensed implementation, keeping the project free for commercial use.
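
Letting several writers share one catalog means deciding what happens when two of them commit at once. The article characterizes the model as optimistic MVCC but does not describe DuckLake’s commit protocol, so the following is a generic optimistic‑concurrency sketch with a hypothetical in‑memory `Catalog`: each writer records the catalog version it started from, and a commit succeeds only if that version is still current; otherwise the writer aborts and retries from a fresh snapshot.

```python
import threading
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when another writer committed first; the transaction must retry."""


@dataclass
class Catalog:
    # Hypothetical in-memory catalog: a version counter plus table -> list of data files.
    version: int = 0
    tables: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def snapshot(self):
        """Return the version a transaction starts from and a copy of the metadata."""
        with self._lock:
            return self.version, {t: list(files) for t, files in self.tables.items()}

    def commit(self, base_version, table, new_file):
        """Append a data file only if nobody has committed since base_version."""
        with self._lock:
            if self.version != base_version:
                raise ConflictError(f"catalog moved from v{base_version} to v{self.version}")
            self.tables.setdefault(table, []).append(new_file)
            self.version += 1
            return self.version


def write_with_retry(catalog, table, new_file, max_attempts=3):
    """Optimistic write loop: take a snapshot, try to commit, retry on conflict."""
    for _ in range(max_attempts):
        base, _metadata = catalog.snapshot()
        try:
            return catalog.commit(base, table, new_file)
        except ConflictError:
            continue  # another writer won the race; start again from a fresh snapshot
    raise ConflictError(f"gave up after {max_attempts} attempts")


if __name__ == "__main__":
    cat = Catalog()
    v0, _ = cat.snapshot()
    cat.commit(v0, "events", "part-1.parquet")      # first writer wins
    try:
        cat.commit(v0, "events", "part-2.parquet")  # second writer used a stale snapshot
    except ConflictError as err:
        print("aborted:", err)
    print("committed at version", write_with_retry(cat, "events", "part-2.parquet"))
```

This uses the coarsest possible conflict rule (any intervening commit aborts the transaction); real systems usually abort only when the changes actually overlap, which keeps abort rates lower under concurrent load.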

Provider comparison

Feature	DuckDB Quack	Arrow Flight SQL (Apache Arrow)	PostgreSQL (PG‑14+)	Snowflake (managed)
Transport	Plain HTTP/1.1 or HTTP/2, optional TLS	gRPC‑based Flight RPC, requires protobuf runtime	TCP with optional SSL, pg‑protocol	HTTPS REST/ODBC/JDBC, managed network
Data format	DuckDB native columnar (no conversion)	Arrow IPC/Flight format	PostgreSQL binary or text	Snowflake internal columnar, automatic conversion
Query latency (single small query)	~1 round‑trip, < 5 ms on LAN	2‑3 round‑trips, ~12 ms on LAN	2‑3 round‑trips, ~15 ms on LAN	Managed latency, ~30 ms typical
Large‑result throughput	3.5× faster than Arrow Flight in benchmark (10 GB Parquet)	Baseline	1.2× slower than Quack on same hardware	Depends on warehouse size, often higher cost
Concurrency model	Multi‑writer via DuckLake catalog, optimistic MVCC	Read‑only or limited write support	Full ACID, row‑level locking	Fully ACID, auto‑scaling
Licensing / cost	Open source, free	Open source, free	Open source, free (self‑hosted)	Pay‑as‑you‑go, usage‑based pricing
Cloud‑native integration	Works with Kubernetes Ingress, AWS ALB, GCP Cloud Run	Requires custom Flight server deployment	Requires managed DB instance or self‑hosted VM	Native connectors for most clouds
Ecosystem support	Python, R, Julia, Rust, Java, C++ via DuckDB driver	Java, C++, Python via Arrow libraries	All major drivers, ORMs	Native connectors, Snowpark APIs
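
The latency row mostly reflects how many round‑trips a protocol needs before results arrive. A back‑of‑envelope model makes the effect of network distance visible; the RTT and server‑time values below are illustrative assumptions, not measurements from the benchmark, and the function name is hypothetical.

```python
def estimated_latency_ms(round_trips: int, rtt_ms: float, server_ms: float) -> float:
    """Back-of-envelope latency: one network RTT per protocol round-trip, plus server time.

    Ignores TCP/TLS setup, bandwidth and queuing, so it only shows how the
    round-trip count scales with network distance.
    """
    if round_trips < 1:
        raise ValueError("a query needs at least one round-trip")
    return round_trips * rtt_ms + server_ms


if __name__ == "__main__":
    # Assumed RTTs: ~0.5 ms inside a LAN, ~40 ms between distant cloud regions.
    for network, rtt in [("LAN", 0.5), ("cross-region", 40.0)]:
        for trips in (1, 3):
            ms = estimated_latency_ms(trips, rtt, server_ms=2.0)
            print(f"{network:<12} {trips} round-trip(s): {ms:6.1f} ms")
```

In this model two extra round‑trips add about 1 ms on a LAN but 80 ms across regions, which is why the round‑trip count matters more as clients move away from the server.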

Why Quack can be faster than Arrow Flight

Single‑hop serialization – DuckDB writes results directly in its own columnar layout, avoiding the extra Arrow IPC wrapper that Flight adds.
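HTTP/2 multiplexing – multiple concurrent queries share the same TCP connection, reducing handshake overhead. (Flight’s gRPC transport also runs over HTTP/2, so this point narrows the gap rather than explaining the speed‑up on its own.)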
No external protobuf dependency – the server can stream binary buffers without the extra reflection step required by Flight.
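
The single‑hop serialization point is that result columns already sit in contiguous buffers, so they can be written to the socket without a per‑row conversion step. The sketch below illustrates that idea with Python’s `array` buffers and a made‑up header layout; it is not DuckDB’s actual format, and the `encode_columns` / `decode_columns` helpers are hypothetical.

```python
import struct
from array import array

# Hypothetical wire layout, NOT DuckDB's real format:
# [u32 column count] then, per column:
# [u16 name length][name utf-8][1-byte array typecode][u64 byte length][raw values]


def encode_columns(columns: dict) -> bytes:
    """Serialize {name: array} by copying each column's contiguous buffer as-is."""
    out = bytearray(struct.pack("<I", len(columns)))
    for name, values in columns.items():
        raw_name = name.encode("utf-8")
        data = values.tobytes()  # whole column in one copy, no per-row conversion
        out += struct.pack("<H", len(raw_name)) + raw_name
        out += values.typecode.encode("ascii")
        out += struct.pack("<Q", len(data)) + data
    return bytes(out)


def decode_columns(payload: bytes) -> dict:
    """Rebuild {name: array} from the layout written by encode_columns."""
    (ncols,) = struct.unpack_from("<I", payload, 0)
    pos = 4
    columns = {}
    for _ in range(ncols):
        (name_len,) = struct.unpack_from("<H", payload, pos)
        pos += 2
        name = payload[pos:pos + name_len].decode("utf-8")
        pos += name_len
        typecode = chr(payload[pos])
        pos += 1
        (nbytes,) = struct.unpack_from("<Q", payload, pos)
        pos += 8
        col = array(typecode)
        col.frombytes(payload[pos:pos + nbytes])  # buffer copied straight back
        pos += nbytes
        columns[name] = col
    return columns


if __name__ == "__main__":
    result = {"id": array("q", [1, 2, 3]), "price": array("d", [9.5, 10.0, 0.25])}
    blob = encode_columns(result)
    print(len(blob), decode_columns(blob) == result)
```

Each column costs one buffer copy regardless of row count. Values are written in the machine’s native byte order, which a real protocol would pin down explicitly.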

When PostgreSQL still wins

Transactional workloads that need strong row‑level locking and complex stored procedures.
Environments where a mature ecosystem of extensions (PostGIS, pg_partman) is required.
Organizations that already run PostgreSQL as a managed service and prefer a single vendor contract.

When Snowflake remains attractive

Unlimited auto‑scaling compute with separate storage billing.
Global data sharing across regions without additional networking configuration.
Built‑in security and governance features that are hard to replicate in a self‑hosted stack.

Business impact

Migration considerations

Step	Action	Typical effort	Risks
1. Assess data size	Identify tables that exceed 10 GB or are stored in object storage (S3, GCS).	1‑2 days	Under‑estimating network egress costs.
2. Deploy a Quack server	Spin up a small VM or container with DuckDB v1.5.3, enable the extension, point the catalog to DuckLake if needed.	1 day	Misconfiguration of TLS may expose data.
3. Update clients	Replace local DuckDB driver calls with the Quack client library (Python `duckdb.connect('quack://host:port')`).	1‑3 weeks depending on code base size	Incompatible query patterns (e.g., reliance on file‑system‑only functions).
4. Validate concurrency	Run a load test with 10‑50 concurrent users, monitor latency and transaction abort rates.	2‑3 days	Unexpected contention if MVCC settings are too aggressive.
5. Cutover	Switch production traffic to the Quack endpoint and keep the local fallback for a week.	1 day	Data loss if the server’s storage is not backed up, since replication is still on the roadmap.
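
Step 4 can start from a small harness like the one below. `run_query` is a placeholder for whatever client call the team uses against the Quack endpoint, and `TransactionAborted` is a hypothetical exception standing in for the client’s real conflict error; `fake_query` only simulates roughly 1 ms of work and an abort on every tenth call so the harness can run on its own.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


class TransactionAborted(Exception):
    """Hypothetical error raised when a write loses an optimistic-concurrency conflict."""


def percentile(sorted_ms, p):
    """Nearest-rank percentile of a non-empty, already sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def load_test(run_query, users=10, queries_per_user=20):
    """Run `users` concurrent workers and report latency percentiles and abort rate."""

    def worker(user_id):
        latencies, aborts = [], 0
        for i in range(queries_per_user):
            start = time.perf_counter()
            try:
                run_query(user_id, i)
            except TransactionAborted:
                aborts += 1  # counted separately; aborted calls are kept out of the percentiles
                continue
            latencies.append((time.perf_counter() - start) * 1000.0)
        return latencies, aborts

    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(worker, range(users)))

    latencies = sorted(ms for lats, _ in results for ms in lats)
    aborts = sum(n for _, n in results)
    return {
        "p50_ms": percentile(latencies, 50) if latencies else None,
        "p95_ms": percentile(latencies, 95) if latencies else None,
        "abort_rate": aborts / (users * queries_per_user),
    }


def fake_query(user_id, i):
    """Stand-in for a real client call: ~1 ms of work, and every 10th call aborts."""
    time.sleep(0.001)
    if i % 10 == 9:
        raise TransactionAborted()


if __name__ == "__main__":
    print(load_test(fake_query, users=10, queries_per_user=20))
```

Swapping `fake_query` for a real client call and raising `users` from 10 toward 50 gives the latency and abort‑rate trend the step asks for.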

Pricing implications

Compute – DuckDB runs on the same hardware you already provision for batch jobs; no extra license cost is added.
Network – Egress charges apply whenever the server sits in a different cloud region than its clients; they follow from cross‑region traffic, not from the choice of HTTP. At AWS’s commonly quoted inter‑region rate of about $0.02/GB, a 10 TB monthly transfer costs roughly $200 (traffic leaving AWS for the internet is billed higher, around $0.09/GB), still far lower than the per‑second compute charges of a Snowflake warehouse of comparable performance.
Storage – DuckDB stores data in Parquet or native files on object storage; you pay only the storage tier price (e.g., $0.023 per GB‑month on S3 Standard).
Operational overhead – You need to monitor the Quack server for memory pressure; however, the lightweight footprint (≈ 300 MB RAM for a 20 GB dataset) keeps operational cost modest.
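
The pricing bullets reduce to simple arithmetic. The sketch below assumes commonly quoted AWS list prices (about $0.02/GB for inter‑region transfer and $0.023 per GB‑month for S3 Standard) and an illustrative 2 TB of stored data; neither the rates nor the volumes come from a real bill, the function name is hypothetical, and compute is left out because the bullets assume existing hardware.

```python
def monthly_cost_usd(transfer_gb, storage_gb, transfer_rate=0.02, storage_rate=0.023):
    """Rough monthly network + storage bill in USD (compute excluded).

    Default rates are assumptions based on commonly quoted AWS list prices:
    ~$0.02/GB for inter-region transfer and ~$0.023 per GB-month for S3 Standard.
    Real prices vary by region, tier and volume.
    """
    transfer = transfer_gb * transfer_rate
    storage = storage_gb * storage_rate
    return {
        "transfer": round(transfer, 2),
        "storage": round(storage, 2),
        "total": round(transfer + storage, 2),
    }


if __name__ == "__main__":
    # 10 TB (taken as 10,000 GB) moved between regions, 2 TB kept on S3 Standard.
    print(monthly_cost_usd(10_000, 2_000))
    # Same volumes if clients sit outside AWS (internet egress, assumed ~$0.09/GB).
    print(monthly_cost_usd(10_000, 2_000, transfer_rate=0.09))
```

With these assumptions, 10 TB of inter‑region transfer plus 2 TB of storage comes to about $246 a month; moving the clients outside AWS raises it to about $946.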

Strategic outcomes

Reduced vendor lock‑in – Teams can keep their existing DuckDB notebooks while adding a central service layer, avoiding a migration to a heavyweight RDBMS.
Faster prototyping – Data scientists can spin up a shared DuckDB endpoint in minutes, collaborate on the same dataset, and still run heavy analytical queries locally when needed.
Simplified stack – By using the same engine for both embedded and server modes, organizations eliminate the need for a separate OLAP warehouse for ad‑hoc analysis.
Path to production – Quack’s roadmap includes replication and higher transaction throughput, meaning the same code that powers a notebook can later serve a low‑latency API for downstream services.

Looking ahead

The DuckDB team plans a production‑ready Quack release with DuckDB 2.0 later in 2026, adding features such as custom protocol extensions (e.g., authentication plugins), write‑ahead logging for durability, and read‑replica scaling. Early adopters can experiment today with the autoloadable extension, but they should design their architecture to allow a smooth transition to the upcoming replication layer.

“Quack removes the ‘single‑process only’ myth and gives us a pragmatic way to share analytical state without moving to a heavyweight warehouse,” says Mehdi Ouazza, data engineer at MotherDuck.

For teams that already rely on DuckDB for ETL, model training or exploratory analysis, Quack offers a low‑cost bridge to multi‑user scenarios. The decision now rests on whether the organization values speed and simplicity over the full transactional guarantees of a traditional RDBMS.

Renato Losio is a principal cloud architect, AWS Data Hero and regular contributor to InfoQ.
