How ClickHouse Sidesteps the Update Bottleneck: Turning Mutations Into Inserts
Columnar databases excel at analytical queries but often stumble when faced with row-level updates. Traditional row stores like PostgreSQL handle mutations efficiently because each row's values sit together on disk, but that layout comes at the cost of analytical speed. ClickHouse, optimized for blistering-fast inserts and aggregations, initially avoided the trade-off by leaving row-level mutations out of scope, until real-world demands for updates in IoT sensor feeds, inventory systems, and financial records forced innovation. The solution? Treat updates as inserts.
The Insert-Only Philosophy: Why Updates Are a Column Store Nightmare
In column stores, data is split into per-column files. An update affecting multiple columns requires rewriting fragments across these files, creating I/O bottlenecks. ClickHouse flips this model by exploiting its core strength: insert performance. With no global indexes to lock, inserts scale linearly, handling over 1 billion rows per second in some deployments. By recasting updates and deletes as new row inserts, ClickHouse defers the heavy lifting to background merges—leveraging the continuous consolidation already inherent in its MergeTree engine.
"We sidestepped the update problem by treating updates as inserts. This isn’t a workaround; it’s a deliberate design choice," explains the ClickHouse team. The approach avoids random I/O, instead using sorted, immutable data parts that merge efficiently in linear passes.
Foundation: Parts, Merges, and Sorted Efficiency
Every ClickHouse insert creates a self-contained part—a folder of column files, sorted by the table’s key (e.g., (order_id, item_id)). For example, an initial insert into an orders table:
INSERT INTO orders VALUES
(1001, 'kbd', 10, 45.00, 0.00),
(1001, 'mouse', 6, 25.00, 0.00);
This produces a part such as all_1_1_0, with each column's values stored contiguously in sorted order. Background merges combine parts through single-pass interleaving (no re-sorting needed), making consolidation cheap and enabling lightweight handling of mutations during merges.
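Parts and merges can be observed directly. As a minimal sketch (assuming the orders table above), the system.parts system table shows each insert landing as a new active part:
-- Inspect the active parts backing the table
SELECT name, rows
FROM system.parts
WHERE table = 'orders' AND active;
-- e.g. all_1_1_0 after the first insert; a second insert adds all_2_2_0,
-- and a background merge later replaces both with all_1_2_1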
Purpose-Built Engines: Three Paths to Lightning-Fast Mutations
1. ReplacingMergeTree: Updates as New Versions
Define a table with ENGINE = ReplacingMergeTree and insert a new row with the same sorting key to update. Merges retain only the latest version:
CREATE TABLE orders (
order_id Int32,
item_id String,
quantity UInt32,
price Decimal(10,2),
discount Decimal(5,2)
) ENGINE = ReplacingMergeTree ORDER BY (order_id, item_id);
-- Initial insert
INSERT INTO orders VALUES (1001, 'mouse', 6, 25.00, 0.00);
-- Update: Insert increased quantity
INSERT INTO orders VALUES (1001, 'mouse', 60, 25.00, 0.20);
During merges, outdated rows are discarded.
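For demonstration purposes, the merge can be forced and the result inspected (in production, background merges handle this on their own; this sketch assumes the two inserts above):
-- Force an immediate merge (demo only; avoid on large production tables)
OPTIMIZE TABLE orders FINAL;
SELECT * FROM orders WHERE order_id = 1001 AND item_id = 'mouse';
-- One row remains: quantity 60, discount 0.20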
alt="Article illustration 2"
loading="lazy">
CREATE TABLE orders (
order_id Int32,
item_id String,
quantity Nullable(UInt32),
price Nullable(Decimal(10,2)),
discount Nullable(Decimal(5,2))
) ENGINE = CoalescingMergeTree ORDER BY (order_id, item_id);
-- Partial update: Only quantity and discount change
INSERT INTO orders VALUES (1001, 'mouse', 60, NULL, 0.20);
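Assuming the full row for this key was inserted earlier (as in the previous examples), a FINAL read shows the per-column coalescing at work:
SELECT * FROM orders FINAL WHERE order_id = 1001 AND item_id = 'mouse';
-- quantity and discount come from the newest row (60, 0.20),
-- while the untouched price (25.00) is carried over from the original insert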
alt="Article illustration 3"
loading="lazy">
CREATE TABLE orders (
order_id Int32,
item_id String,
quantity UInt32,
price Decimal(10,2),
discount Decimal(5,2),
is_valid Int8
) ENGINE = CollapsingMergeTree(is_valid) ORDER BY (order_id, item_id);
-- Delete: insert a "cancelling" row with the same key and sign -1
-- (values of the non-key columns are irrelevant for collapsing)
INSERT INTO orders VALUES (1001, 'mouse', 0, 0, 0, -1);
Merges collapse matching +1 and -1 rows.
alt="Article illustration 4"loading="lazy">
Immediate Consistency with FINAL: No Merge Wait Needed
Background merges introduce eventual consistency. For real-time accuracy, use FINAL in queries:
SELECT * FROM orders FINAL;
This performs in-memory consolidation during reads, applying the engine's merge logic (e.g., keeping the latest rows or collapsing deletes) without waiting for on-disk merges. Optimizations such as skipping the merge logic for parts whose key ranges don't overlap keep FINAL fast even on large datasets.
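In recent ClickHouse versions, FINAL can also be switched on for a whole session or query via the final setting, which applies it to every table that supports it:
SET final = 1;           -- apply FINAL to all subsequent SELECTs
SELECT * FROM orders;    -- now behaves like SELECT * FROM orders FINAL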
The Trade-Off: Power vs. Complexity
While these engines handle high-throughput scenarios elegantly, they demand ClickHouse-specific modeling. Developers must understand sorting keys, merge semantics, and engine quirks—ReplacingMergeTree can’t update keys, and schemas often require Nullable columns or sign flags. Yet, the payoff is profound: updates become as cheap as inserts, sustaining ClickHouse’s analytical prowess. As the team notes, this insert-and-merge rhythm mirrors traditional row stores in outcome but achieves it through scalable, lock-free parallelism.
In Part 2, ClickHouse evolves further, wrapping this ingenuity into SQL-standard UPDATE syntax via patch parts—proving performance and familiarity can coexist.
Source: ClickHouse Blog