
The Real Cost of Random I/O in PostgreSQL: Why Default Settings May Be Misleading

Tech Essays Reporter

A deep dive into PostgreSQL's random_page_cost parameter reveals that the default value significantly underestimates the real cost of random I/O, with real-world measurements yielding values of roughly 25 to 35, several times the default 4.0 setting, particularly on modern storage systems.

Tomas Vondra's comprehensive analysis of PostgreSQL's random_page_cost parameter reveals a critical disconnect between the default value of 4.0 and the actual performance characteristics of modern storage systems. This parameter, introduced approximately 25 years ago, has remained unchanged despite dramatic shifts in storage technology and PostgreSQL's own evolution.

The fundamental question Vondra explores is whether the default random_page_cost of 4.0 accurately reflects the real-world cost of random I/O operations compared to sequential reads. Many administrators have been advised to reduce this value on SSD storage, with some recommendations suggesting setting it as low as 1.0—the same as seq_page_cost. But is this intuition correct?

Experimental Methodology

To measure the true cost, Vondra designed a controlled experiment using a 4.4GB table with 10 million rows. The setup deliberately eliminates caching effects by setting shared_buffers to 128MB, making the table an order of magnitude larger than available memory, and enabling direct I/O. This ensures that all reads are physical I/O operations, providing accurate measurements of random versus sequential access costs.
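A comparable setup can be sketched as follows. The exact schema and padding width are illustrative assumptions, not Vondra's actual script; `debug_io_direct` is the stock parameter for enabling direct I/O and requires PostgreSQL 16 or later:

```sql
-- postgresql.conf: keep the buffer cache small and bypass the OS page cache,
-- so that every page read is a physical I/O operation
shared_buffers = '128MB'
debug_io_direct = 'data'

-- Build a table roughly an order of magnitude larger than shared_buffers
-- (10 million rows with wide padding lands in the ~4GB range)
CREATE TABLE t (id bigint, padding text);
INSERT INTO t
SELECT i, repeat('x', 400) FROM generate_series(1, 10000000) AS s(i);
CREATE INDEX ON t (id);
```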

The experiment compares two access patterns: a sequential scan reading the entire table, and an index scan that performs nearly perfect random I/O. By measuring the time per page for each access pattern, Vondra calculates the actual random_page_cost:

  • Sequential page time: ~1.56 microseconds
  • Random page time: ~39.3 microseconds
  • Calculated random_page_cost: ~25.2

This result is more than six times the default value of 4.0, directly contradicting the common recommendation to lower random_page_cost on SSDs.
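The derivation is simply the ratio of the two per-page timings; as a quick sanity check using the measured averages above:

```sql
-- random_page_cost ≈ time per random page / time per sequential page
SELECT round(39.3 / 1.56, 1) AS implied_random_page_cost;  -- ≈ 25.2
```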

Cross-System Validation

Vondra repeated the experiment across three different systems: two with local SSDs (Ryzen and Xeon processors) and one on Azure using remote SSD storage. The results consistently showed random_page_cost values between 25 and 35 for local SSDs, with even higher values for remote storage. This consistency across different hardware configurations strengthens the conclusion that the default value is significantly underestimated.

Impact on Query Planning

The implications for query planning are substantial. PostgreSQL uses these cost parameters to decide between different execution strategies, particularly between sequential and index scans. When the random_page_cost is set too low, the planner may choose index scans when sequential scans would be more efficient.

Consider a query selecting a variable fraction of table rows. With random_page_cost = 4.0, the planner switches from index to sequential scan at approximately 2.2% selectivity. However, the actual performance crossover point occurs at only 0.2% selectivity. This means queries with selectivities between 0.2% and 2.2% may execute up to 10x slower than optimal.

When random_page_cost is set to the experimentally determined value of 30.0, the cost and performance curves align almost perfectly, with the planning decision matching the actual performance crossover point.
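One way to observe this crossover on your own data is to compare plans for the same query under both settings. The table, column, and the 1% selectivity predicate below are hypothetical stand-ins:

```sql
-- With the default, the planner may still pick an index scan at ~1% selectivity
SET random_page_cost = 4.0;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id < 100000;

-- With the experimentally measured value, it switches to a
-- sequential scan at a much lower selectivity
SET random_page_cost = 30.0;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id < 100000;
```

Comparing the `ANALYZE` timings, not just the estimated costs, shows which decision was actually correct for the hardware at hand.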

The Role of Bitmap Scans

Bitmap scans provide a crucial mitigating factor. These index-based access methods combine the benefits of index scans with more sequential I/O patterns, making them consistently more efficient than both pure index scans and sequential scans across most selectivities. This effectively reduces the window where the planner might choose suboptimal plans.

Prefetching Complications

Prefetching adds another layer of complexity to the cost model. While it can dramatically improve performance—sometimes by an order of magnitude—it's not currently factored into PostgreSQL's cost calculations. This creates an asymmetry: enabling prefetching changes query durations without affecting costs, while changing random_page_cost affects costs without changing durations.

This disconnect becomes particularly problematic when considering that different scan types support prefetching differently. Bitmap scans have long supported prefetching, sequential scans gained it with AIO, but index scans still lack prefetching support (though this may change in PostgreSQL 19). Disabling prefetching makes sequential reads slower, which paradoxically makes random I/O appear more expensive in relative terms.
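For bitmap heap scans, the prefetch depth is governed by the `effective_io_concurrency` parameter, so its performance impact can be probed directly. The values below are illustrative; 0 disables prefetching for the session:

```sql
-- Bitmap heap scan prefetching off vs. on for the same query
SET effective_io_concurrency = 0;    -- no read-ahead
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id BETWEEN 1 AND 50000;

SET effective_io_concurrency = 256;  -- aggressive read-ahead, suits SSDs
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id BETWEEN 1 AND 50000;
```

Note that the estimated costs in both plans will be identical; only the measured durations change, which is exactly the asymmetry described above.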

When Lowering random_page_cost Might Be Justified

Despite the experimental evidence showing random I/O is more expensive than the default suggests, there are legitimate reasons why lowering random_page_cost might improve real-world performance:

Cached Data Workloads: Many production systems have high cache ratios, particularly OLTP workloads where the "active set" of frequently accessed data fits comfortably in memory. In these cases, most random reads hit in cache and are very cheap, while sequential scans might evict hot data from memory.
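For such mostly-cached workloads, a common adjustment (a rule of thumb, not a value endorsed by the analysis) is to bring the two page costs close together rather than all the way to parity:

```sql
-- If the active set is known to live in RAM, random and sequential page
-- reads cost about the same; 1.1 keeps a slight bias toward sequential I/O
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```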

Planning in Isolation: PostgreSQL plans queries in isolation without considering the broader system workload. A sequential scan that benefits one query might evict data needed by other queries, creating a net performance loss. Index scans tend to be more "focused" and cause less cache pressure.

Poor Estimates: Real-world data rarely follows uniform distributions. Correlated data, skewed distributions, and complex relationships can lead to poor cardinality estimates, causing the planner to choose suboptimal plans. Adjusting cost parameters can sometimes compensate for these estimation errors.

Recommendations and Future Directions

Vondra's analysis suggests that blindly lowering random_page_cost based solely on the "SSDs are fast" intuition is misguided. The actual cost of random I/O on modern storage is significantly higher than the default value suggests.

However, the complexity of real-world workloads means that tuning this parameter may still be beneficial in specific scenarios. Vondra strongly recommends monitoring performance when adjusting this or any other cost parameter, using tools like pg_stat_statements to evaluate the impact.
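A before-and-after comparison can be as simple as watching execution times in `pg_stat_statements` around the configuration change (the timing columns shown exist in PostgreSQL 13 and later):

```sql
-- Top statements by total time; re-check after changing random_page_cost
SELECT query, calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms
  FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 10;
```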

Several areas for improvement emerge from this analysis:

  1. Separating non-IO costs: The random_page_cost currently conflates I/O costs with other overheads like I/O combining and CPU costs. Separating these into dedicated parameters would provide more granular control.

  2. Improving cached data statistics: The effective_cache_size parameter provides only a rough estimate of cache behavior. Better statistics on which data is actually cached would improve planning decisions.

  3. Incorporating prefetching into costing: Given prefetching's significant impact on performance, it should be reflected in the cost model, though predicting its effects remains challenging.

This analysis demonstrates that PostgreSQL's cost model, while sophisticated, requires ongoing refinement to keep pace with evolving hardware and workload characteristics. The gap between theoretical costs and actual performance highlights the importance of empirical measurement and continuous tuning in database administration.
