A detailed analysis of breaking down infrastructure costs into per-request metrics to drive data-driven optimization in distributed systems.
Cost Per Request Modeling: Optimizing Distributed System Economics
In complex distributed systems, infrastructure costs can spiral out of control without clear visibility into what drives expenses. Cost per request modeling provides a granular approach to understanding and optimizing these costs by decomposing them into the cost of serving individual requests. This metric enables data-driven optimization: if a request costs $0.001 and you serve 100 million requests per month, a 20% reduction saves $20,000 monthly. More importantly, understanding per-request costs reveals which features, endpoints, or user segments are profitable and which may need rethinking.
The Problem with Monolithic Cost Metrics
Traditional infrastructure monitoring focuses on aggregate costs—total monthly bill, instance utilization, or bandwidth consumed. While these metrics provide high-level visibility, they fail to reveal how specific features or user behaviors impact costs. A single expensive feature can disproportionately increase costs across all users, while a popular but efficient feature might appear costly simply because of high volume.
Without per-request cost attribution, optimization efforts become guesswork. Teams might optimize the wrong components or implement changes that reduce costs for low-volume features while missing the true cost drivers in high-traffic areas.
Breaking Down Per-Request Costs
Compute Costs
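Compute cost is calculated from the resources consumed during request processing. Consider a containerized service allocated 500 millicores of CPU and 512 MB of memory on an instance that costs $50/month, with a capacity of 100 requests per second. The formula treats the 500 millicores as half of the instance (the 500/1000 factor) and assumes the service runs at full capacity for a 30-day month. Under those assumptions, its $25 share of the instance is spread over 259.2 million requests. The per-request compute cost is therefore approximately $0.0000001 (50 / 30 / 86400 / 100 * 500/1000 ≈ 9.6 × 10^-8). At lower utilization, the same monthly cost is spread over fewer requests, so the real per-request cost rises accordingly.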
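The arithmetic above can be checked with a short Python sketch. It assumes, as the formula does, that the 500-millicore allocation is half of the instance and that the service serves its full 100 requests per second for a 30-day month. The function name is illustrative, and the 512 MB memory allocation is not priced in this simplified model.

```python
SECONDS_PER_MONTH = 30 * 86_400  # 30-day month, matching the formula in the text


def compute_cost_per_request(instance_cost_per_month: float,
                             share_of_instance: float,
                             requests_per_second: float) -> float:
    """Per-request compute cost for a service that pays for a share of an instance.

    Assumes the service serves `requests_per_second` continuously all month;
    memory is not priced separately in this simplified model.
    """
    monthly_cost = instance_cost_per_month * share_of_instance
    monthly_requests = requests_per_second * SECONDS_PER_MONTH
    return monthly_cost / monthly_requests


# 500 millicores treated as half of a single-vCPU instance (an assumption).
cost = compute_cost_per_request(50.0, 500 / 1000, 100)
print(f"{cost:.2e}")  # prints 9.65e-08

# At half the throughput, the same bill covers half as many requests.
print(f"{compute_cost_per_request(50.0, 0.5, 50):.2e}")
```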
This small per-request cost accumulates across the chain of services involved: the API gateway, multiple backend services, and background job processors all contribute. A single user request might trigger compute costs across five services, roughly multiplying the base cost by five if each service costs about the same.
Storage Costs
Storage cost includes database capacity, object storage, and caching layers. A request that reads 10 KB of product data, writes 2 KB of order data, and caches the result for 60 seconds has a direct storage cost plus the amortized cost of the storage infrastructure.
Database IOPS and provisioned throughput costs often exceed raw storage costs. For relational databases, each request's storage cost must account for the total database cost divided across all served requests, not just the marginal cost. As a result, a small database serving many requests has lower per-request storage costs than a large database serving fewer requests.
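A minimal Python sketch of this amortization follows. The monthly costs and request volumes are hypothetical figures, chosen only to contrast a small, busy database with a large, lightly used one.

```python
def amortized_cost_per_request(total_monthly_cost: float, monthly_requests: int) -> float:
    """Full monthly cost of a shared database spread across every request it served."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return total_monthly_cost / monthly_requests


# Hypothetical figures: a small, busy database and a large, lightly used one.
small_busy = amortized_cost_per_request(400.0, 200_000_000)
large_quiet = amortized_cost_per_request(4_000.0, 20_000_000)
print(f"small, busy:  ${small_busy:.8f} per request")
print(f"large, quiet: ${large_quiet:.8f} per request")
```

Because the whole bill is divided rather than only the marginal cost, idle capacity shows up as a higher per-request cost instead of disappearing from the model.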
Network Costs
Network cost is often the easiest to quantify. Cloud providers charge for inter-zone, inter-region, and egress traffic. Consider a request that enters through the load balancer, hits three services across two availability zones, and returns a 50 KB response. It incurs network charges on each hop that crosses a zone boundary, plus egress charges for the response itself.
Network costs scale roughly linearly with response size and the number of cross-zone hops. Optimizing network cost involves reducing response sizes (compression, partial responses), colocating services in the same availability zone, and using internal load balancers. For example, compressing a response from 50 KB to 20 KB cuts the transfer charge for that response by 60%. The CPU spent on compression, however, must be factored into the total per-request cost.
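The hop-by-hop accounting and the compression trade-off can be sketched in Python as follows. The per-GB prices, the hop list, and the CPU cost of compression are hypothetical placeholders, not any provider's published rates. GB is taken as 10^9 bytes.

```python
from dataclasses import dataclass

BYTES_PER_GB = 1e9  # decimal gigabytes, an assumption for pricing


@dataclass
class Hop:
    name: str
    bytes_sent: int
    price_per_gb: float  # hypothetical rate; 0.0 for traffic that stays in one zone


def network_cost(hops: list[Hop]) -> float:
    """Transfer charges summed over every hop a request's data crosses."""
    return sum(h.bytes_sent / BYTES_PER_GB * h.price_per_gb for h in hops)


def compression_net_saving(original_bytes: int, compressed_bytes: int,
                           price_per_gb: float, cpu_cost_per_request: float) -> float:
    """Transfer saved by compressing one response, minus the CPU spent compressing it."""
    saved = (original_bytes - compressed_bytes) / BYTES_PER_GB * price_per_gb
    return saved - cpu_cost_per_request


hops = [
    Hop("load balancer -> service A (same zone)", 2_000, 0.0),
    Hop("service A -> service B (cross zone)", 8_000, 0.01),
    Hop("service B -> service C (same zone)", 3_000, 0.0),
    Hop("50 KB response egress", 50_000, 0.09),
]
print(f"network cost per request: ${network_cost(hops):.8f}")
# 50 KB -> 20 KB cuts the egress bytes, and that hop's charge, by 60%.
print(f"net saving from compression: ${compression_net_saving(50_000, 20_000, 0.09, 1e-7):.8f}")
```

With these placeholder figures, the egress hop dominates. Compression pays off because its assumed CPU cost is smaller than the egress it saves. A cheaper transfer rate or a costlier codec could reverse that result.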
Database Costs
Database per-request cost depends on query complexity and data volume. A simple primary key lookup costs less than a full-text search or a join across multiple tables. Write operations generally cost more than reads—they require transaction log writes, index updates, and replication.
The database cost per request should include query CPU time, IOPS consumed, data transfer, and a proportional share of the database instance cost. For databases billed per operation, such as DynamoDB in on-demand mode, per-request costs are directly observable. The service charges for each read and write rather than for provisioned capacity, which makes cost attribution straightforward but can become expensive for high-volume workloads. Capacity-billed serverless databases such as Aurora Serverless charge for the capacity consumed over time, so their cost still has to be apportioned across requests.
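One way to combine these components is sketched below in Python. Apportioning the instance cost by each request's share of total query CPU time is an assumption; other allocation keys, such as I/O share, are equally defensible. All prices and usage figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryUsage:
    cpu_seconds: float      # query CPU time attributed to this request
    io_operations: int      # I/O operations consumed
    bytes_transferred: int  # result data sent back to the application


def db_cost_per_request(usage: QueryUsage,
                        instance_cost_per_month: float,
                        total_cpu_seconds_per_month: float,
                        price_per_million_io: float,
                        transfer_price_per_gb: float) -> float:
    """Instance cost apportioned by CPU-time share, plus I/O and transfer charges."""
    instance_share = (instance_cost_per_month * usage.cpu_seconds
                      / total_cpu_seconds_per_month)
    io_cost = usage.io_operations / 1e6 * price_per_million_io
    transfer_cost = usage.bytes_transferred / 1e9 * transfer_price_per_gb
    return instance_share + io_cost + transfer_cost


# Hypothetical prices and usage for a primary-key lookup vs. a multi-table join.
prices = dict(instance_cost_per_month=1_000.0, total_cpu_seconds_per_month=1_000_000.0,
              price_per_million_io=0.20, transfer_price_per_gb=0.01)
pk_lookup = QueryUsage(cpu_seconds=0.0005, io_operations=2, bytes_transferred=1_000)
join = QueryUsage(cpu_seconds=0.02, io_operations=150, bytes_transferred=40_000)
print(f"pk lookup: ${db_cost_per_request(pk_lookup, **prices):.8f}")
print(f"join:      ${db_cost_per_request(join, **prices):.8f}")
```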
Optimization Strategies and Trade-offs
Caching Strategies
Caching can reduce every cost dimension at once: a cached response eliminates the compute, storage, and network costs of the downstream services. Savings scale with the cache hit ratio. At a 99% hit ratio, the full request cost is paid for only 1% of requests.
The cache layer itself has a cost (Redis nodes, CDN bandwidth), but this is typically far smaller than the cost of serving requests from origin. However, caching introduces trade-offs: increased latency for cache misses, potential stale data, and complexity in cache invalidation strategies. For applications requiring strong consistency, caching may not be viable, forcing higher costs but ensuring data accuracy.
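The effect of the hit ratio can be expressed as: effective cost = cache cost + (1 - hit ratio) × origin cost. The Python sketch below uses hypothetical dollar figures. It treats the cache as a flat per-request cost, which simplifies how Redis nodes or CDN bandwidth are actually billed.

```python
def effective_cost_per_request(origin_cost: float, hit_ratio: float,
                               cache_cost: float) -> float:
    """Every request pays for the cache lookup; only misses also pay the origin cost."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_cost + (1.0 - hit_ratio) * origin_cost


ORIGIN_COST = 0.001     # hypothetical full cost of serving from origin, in dollars
CACHE_COST = 0.000002   # hypothetical cache-layer cost per request, in dollars

for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    cost = effective_cost_per_request(ORIGIN_COST, hit_ratio, CACHE_COST)
    print(f"hit ratio {hit_ratio:.0%}: ${cost:.6f} per request")
```

At a 0% hit ratio, the cache only adds cost. The savings come from the hit ratio, not from the mere presence of a cache.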
Request Batching
Request batching reduces per-request overhead. Instead of making 20 individual requests to fetch related data, a single batch request reduces network round trips, database queries, and serialization overhead. The per-unit cost decreases as batch size increases, subject to diminishing returns at very large batch sizes.
Batch endpoints should have reasonable maximum sizes to prevent memory and latency issues. For example, batching 100 requests might reduce per-request costs by 70%, but batching 1,000 requests might only provide an additional 10% reduction while significantly increasing response latency and memory pressure.
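A simple amortization model shows why the gains flatten. If each call carries a fixed overhead and each item a variable cost, the per-item cost is fixed/n + variable, which approaches the variable cost as n grows. The Python sketch below uses hypothetical figures and a hypothetical `split_into_batches` helper to enforce a maximum batch size. It does not model the latency and memory growth that also argue for a cap.

```python
def per_item_cost(batch_size: int, fixed_overhead: float, per_item: float) -> float:
    """Cost per item when one call's fixed overhead is shared by the whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return fixed_overhead / batch_size + per_item


def split_into_batches(items: list, max_batch_size: int) -> list[list]:
    """Chunk work so no single batch exceeds the endpoint's maximum size."""
    return [items[i:i + max_batch_size] for i in range(0, len(items), max_batch_size)]


# Hypothetical costs: $0.00007 fixed per call (round trip, query setup,
# serialization) and $0.00003 of work per item.
for n in (1, 10, 100, 1000):
    print(f"batch of {n:>4}: ${per_item_cost(n, 0.00007, 0.00003):.8f} per item")

print([len(b) for b in split_into_batches(list(range(250)), 100)])  # [100, 100, 50]
```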
Right-Sizing Infrastructure
Right-sizing infrastructure is the fundamental cost optimization. Over-provisioned services waste money on idle capacity. Under-provisioned services waste money on performance-related customer churn.
Autoscaling policies should target 60-70% utilization during peak—low enough to handle traffic spikes, high enough to avoid over-provisioning. The trade-off here is between cost and availability: higher utilization reduces costs but increases risk of performance degradation during unexpected traffic spikes.
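The following Python sketch shows how a utilization target translates into an instance count. The peak load, per-instance capacity, and instance price are hypothetical. The function rounds up so that peak utilization stays at or below the target.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float,
                     target_utilization: float) -> int:
    """Smallest instance count that keeps peak utilization at or below the target."""
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_rps / (rps_per_instance * target_utilization))


PEAK_RPS = 1_000          # hypothetical peak load
RPS_PER_INSTANCE = 100    # hypothetical capacity of one instance
INSTANCE_COST = 50.0      # hypothetical monthly price per instance

for target in (0.65, 0.9, 1.0):
    n = instances_needed(PEAK_RPS, RPS_PER_INSTANCE, target)
    print(f"target {target:.0%}: {n} instances, ${n * INSTANCE_COST:.0f}/month")
```

With these figures, a 65% target needs 16 instances where running flat out would need 10. That difference is the cost of headroom for unexpected spikes.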
Implementation with Distributed Tracing
Cost attribution requires distributed tracing metadata. Each trace span should carry cost-related attributes: service name, instance type, data size processed, cache hit status, and database query cost. The tracing system can then sum costs across spans to compute end-to-end per-request cost.
This correlation enables architects to identify the most expensive path for any request and target optimization efforts effectively. For example, traces might reveal that product detail pages cost 3x more than category pages due to expensive recommendation queries, even though both endpoints serve similar traffic volumes.
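A minimal Python sketch of summing span costs follows. It assumes each span already carries a computed `cost_usd` value and the root request's `endpoint` as propagated attributes. These attribute names and the `Span` type are hypothetical, not part of any particular tracing library's API. The sample figures are made up to mirror the product-versus-category example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    service: str
    endpoint: str    # root endpoint of the trace, propagated to every span (assumed)
    cost_usd: float  # hypothetical cost from instance type, data size, cache status, query cost


def cost_per_trace(spans: list[Span]) -> dict[str, float]:
    """End-to-end cost of each request: the sum of its spans' costs."""
    totals: dict[str, float] = defaultdict(float)
    for s in spans:
        totals[s.trace_id] += s.cost_usd
    return dict(totals)


def average_cost_by_endpoint(spans: list[Span]) -> dict[str, float]:
    """Mean end-to-end cost per request for each endpoint."""
    cost: dict[str, float] = defaultdict(float)
    traces: dict[str, set[str]] = defaultdict(set)
    for s in spans:
        cost[s.endpoint] += s.cost_usd
        traces[s.endpoint].add(s.trace_id)
    return {ep: cost[ep] / len(traces[ep]) for ep in cost}


def costliest_service(spans: list[Span], trace_id: str) -> tuple[str, float]:
    """Service contributing the most cost to one trace."""
    per_service: dict[str, float] = defaultdict(float)
    for s in spans:
        if s.trace_id == trace_id:
            per_service[s.service] += s.cost_usd
    return max(per_service.items(), key=lambda kv: kv[1])


spans = [
    Span("t1", "gateway", "/product", 0.00001),
    Span("t1", "catalog", "/product", 0.00002),
    Span("t1", "recommendations", "/product", 0.00006),
    Span("t2", "gateway", "/category", 0.00001),
    Span("t2", "catalog", "/category", 0.00002),
]
print(average_cost_by_endpoint(spans))
print(costliest_service(spans, "t1"))  # ('recommendations', 6e-05)
```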
Practical Cost Attribution
Instrument every request with cost attribution tags (feature, user segment, endpoint, service) and aggregate costs along each of these dimensions. The Pareto principle often holds here: a small share of features, often around 20%, drives roughly 80% of infrastructure cost.
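The following Python sketch aggregates tagged costs and finds the smallest set of features that accounts for most of the spend. The record layout, tag values, and dollar amounts are hypothetical. In practice, the records would come from trace or billing data.

```python
from collections import defaultdict


def cost_by(records: list[dict], dimension: str) -> list[tuple[str, float]]:
    """Total cost per value of one tag, sorted from most to least expensive."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[r[dimension]] += r["cost_usd"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def top_cost_drivers(ranked: list[tuple[str, float]], share: float = 0.8) -> list[str]:
    """Smallest prefix of a ranked list that covers `share` of the total cost."""
    total = sum(cost for _, cost in ranked)
    drivers, running = [], 0.0
    for name, cost in ranked:
        drivers.append(name)
        running += cost
        if running >= share * total:
            break
    return drivers


records = [
    {"feature": "search", "endpoint": "/search", "cost_usd": 40.0},
    {"feature": "search", "endpoint": "/suggest", "cost_usd": 5.0},
    {"feature": "recommendations", "endpoint": "/product", "cost_usd": 37.0},
    {"feature": "checkout", "endpoint": "/checkout", "cost_usd": 10.0},
    {"feature": "profile", "endpoint": "/profile", "cost_usd": 6.0},
    {"feature": "help", "endpoint": "/help", "cost_usd": 2.0},
]
ranked = cost_by(records, "feature")
print(ranked)
print(top_cost_drivers(ranked))  # ['search', 'recommendations']
```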
Common high-cost patterns include:
- Expensive database queries in hot paths
- Excessive logging for high-traffic endpoints
- Unnecessary API calls in critical request flows
- Large response payloads with unused fields
By identifying these patterns, teams can prioritize optimization efforts where they'll have the most impact. For example, optimizing a single expensive query used in a popular feature might save more than optimizing ten minor queries used in rarely accessed features.
Conclusion
Cost per request modeling transforms infrastructure optimization from art to science. By breaking down costs into granular components and attributing them to specific features and endpoints, teams can make data-driven decisions that maximize efficiency without compromising performance or user experience.
The most effective optimization strategies often involve multiple techniques—caching to reduce compute and network costs, batching to reduce overhead, and right-sizing to eliminate waste. The key is continuous measurement and iteration: implement cost attribution, identify optimization opportunities, measure the impact, and repeat the process.
In distributed systems where every component interaction adds cost, understanding per-request economics isn't just about saving money—it's about building more efficient, scalable, and sustainable systems that can grow without cost becoming a limiting factor.
