Agoda engineers built Storefront, a Rust-based reverse proxy that solves uneven load distribution in S3-compatible storage by replacing DNS round-robin with intelligent request routing and credential-less authentication.
Agoda engineers have developed an internal S3-compatible proxy called Storefront to improve load balancing, reliability, and operational control for large-scale object storage traffic in the company's data platform. The proxy sits between internal services and backend object storage systems, routing requests while addressing limitations the team observed with DNS-based load distribution used by S3-compatible endpoints.

Agoda relies on object storage for data processing and analytics workloads, including pipelines that read and write large volumes of files. According to the engineering team, the S3 endpoints exposed by their storage provider, VAST Data, use DNS round-robin to distribute traffic across multiple virtual IP addresses. Application clients often cache DNS responses, which can result in repeated requests to the same backend node. This behavior led to uneven load distribution, creating hotspots in which certain nodes received disproportionate traffic while others remained underutilized.
The DNS Round-Robin Problem
DNS round-robin load balancing works by returning multiple IP addresses for a single hostname, typically rotating their order between responses so that different clients start with different addresses. However, this approach only balances load if clients re-resolve the hostname regularly and spread their requests evenly across the returned addresses. In practice, several factors break this assumption:
- DNS caching: Operating systems and applications cache DNS responses, causing clients to repeatedly connect to the same IP addresses
- Connection persistence: HTTP clients often maintain persistent connections to specific backends
- Client-side connection pooling: Some applications maintain their own connection pools, which keep reusing established connections and ignore DNS rotation
- Geographic affinity: Clients may prefer certain endpoints based on network topology
These factors combine to create the uneven distribution pattern Agoda observed, where some backend nodes became overwhelmed while others sat idle.
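The effect of DNS caching can be illustrated with a small simulation. This is a minimal sketch, not Agoda's measurement: the virtual IPs, client names, and request counts are invented for illustration, and `cached_dns_load` and `per_request_load` are hypothetical helper names. It compares clients that pin themselves to one cached address against a balancer that spreads individual requests.

```python
from collections import Counter
from itertools import cycle

# Hypothetical virtual IPs returned for one S3 hostname.
VIPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# Hypothetical clients and the number of requests each one sends.
# Heavy pipelines send far more traffic than small services.
CLIENT_REQUESTS = {
    "etl-a": 9000,
    "etl-b": 7000,
    "api-c": 300,
    "api-d": 200,
    "cron-e": 50,
}


def cached_dns_load(vips, client_requests):
    """Each client resolves once, caches the first address, and reuses it."""
    load = Counter({vip: 0 for vip in vips})
    rotation = cycle(vips)  # the resolver rotates which address comes first
    for requests in client_requests.values():
        pinned = next(rotation)  # the client sticks to this address
        load[pinned] += requests
    return load


def per_request_load(vips, client_requests):
    """A proxy spreads individual requests, whichever client sent them."""
    load = Counter({vip: 0 for vip in vips})
    rotation = cycle(vips)
    for requests in client_requests.values():
        for _ in range(requests):
            load[next(rotation)] += 1
    return load


if __name__ == "__main__":
    print("cached DNS :", dict(cached_dns_load(VIPS, CLIENT_REQUESTS)))
    print("per request:", dict(per_request_load(VIPS, CLIENT_REQUESTS)))
```

With these made-up numbers, the two heavy clients land on two addresses and carry almost all of the traffic, while per-request distribution keeps every address within one request of the others.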
Storefront's Architecture
To address this limitation, the team introduced Storefront as a reverse proxy that actively distributes S3 requests across backend nodes. The service is implemented in Rust and built on top of Pingora, an open-source proxy framework developed by Cloudflare.
Instead of relying on DNS resolution to balance traffic, Storefront evaluates backend availability and request load in real time before routing requests. Early implementations used a least-in-flight requests algorithm, which the team later refined with latency-aware scoring to improve distribution under production workloads.
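A least-in-flight policy can be sketched in a few lines. The example below is an illustrative Python sketch under stated assumptions, not Storefront's Rust code: the `Backend` and `LeastInFlightBalancer` names are hypothetical, and a real proxy would update the counters from its request lifecycle hooks with proper synchronization.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    address: str
    in_flight: int = 0   # requests currently being served by this backend
    healthy: bool = True


class LeastInFlightBalancer:
    """Route each request to the healthy backend with the fewest open requests."""

    def __init__(self, backends):
        self.backends = list(backends)

    def acquire(self):
        candidates = [b for b in self.backends if b.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends available")
        # min() returns the first backend among ties, so ordering is stable.
        chosen = min(candidates, key=lambda b: b.in_flight)
        chosen.in_flight += 1
        return chosen

    def release(self, backend):
        # Call when the response has been fully relayed (or has failed).
        backend.in_flight -= 1


if __name__ == "__main__":
    balancer = LeastInFlightBalancer(Backend(f"10.0.0.{i}") for i in range(1, 4))
    first = balancer.acquire()
    second = balancer.acquire()
    balancer.release(first)
    print(first.address, second.address, balancer.acquire().address)
```

Unlike DNS rotation, the decision is made per request and reflects what each backend is doing at that moment, so a backend stuck on slow requests naturally receives less new work.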

The proxy also introduced operational safeguards to improve reliability:
- IO timeouts: Handle cases where certain S3 clients fail to fully consume HTTP responses, which could otherwise exhaust backend connection pools
- Cross-data-center isolation: Separate traffic into dedicated backend pools to prevent cross-data-center bottlenecks
- HTTP Expect: 100-continue optimization: Reduce latency for object uploads by streamlining the handshake in which the client waits for the server's interim 100 Continue response before sending the request body
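The IO timeout safeguard can be illustrated with a short asyncio sketch. This is an assumption about the general technique rather than a description of Storefront's implementation: `relay_body`, `read_chunk`, `write_chunk`, and `io_timeout` are hypothetical names. Each read from the backend and each write to the client is bounded, so a client that stops consuming the response cannot hold a backend connection indefinitely.

```python
import asyncio


async def relay_body(read_chunk, write_chunk, io_timeout):
    """Copy a response body chunk by chunk, bounding every read and write.

    read_chunk:  async callable returning bytes, or b"" at end of body
    write_chunk: async callable that sends bytes to the client
    io_timeout:  maximum seconds allowed for any single read or write
    """
    total = 0
    while True:
        chunk = await asyncio.wait_for(read_chunk(), timeout=io_timeout)
        if not chunk:
            return total
        # If the client stops reading, this write stalls and times out,
        # letting the caller close and release the backend connection.
        await asyncio.wait_for(write_chunk(chunk), timeout=io_timeout)
        total += len(chunk)
```

On timeout, `asyncio.wait_for` cancels the stalled operation and raises a timeout error, which the caller can treat as a signal to tear down both sides of the connection.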
Credential-Less Authentication
Storefront integrates credential-less authentication by automatically identifying the calling Kubernetes pods and applying access controls internally. This approach centralizes permission management, so services can access object storage securely without handling credentials themselves. The result is lower operational complexity, less risk of credential leaks, and simpler compliance across large-scale distributed workloads.
As emphasized by Desmond Xu, Technical Lead at Agoda: "Storefront evolved from a simple reverse proxy into a core component of our data infrastructure. In addition to request routing, Storefront incorporates operational capabilities such as credential-less authentication by identifying calling Kubernetes pods and applying access controls internally, reducing the need for services to manage storage credentials directly."
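The idea can be sketched as a two-step lookup: resolve the caller to a pod identity, then check that identity against a central policy. The source does not describe how Storefront identifies pods, so everything below is an assumption for illustration: matching on source IP, the `PodIdentity` type, the `POD_BY_IP` and `POLICIES` tables, and the `authorize` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodIdentity:
    namespace: str
    service_account: str


# Hypothetical view of running pods, e.g. kept in sync from the cluster API.
POD_BY_IP = {
    "10.42.1.17": PodIdentity("analytics", "etl-writer"),
    "10.42.3.8": PodIdentity("reporting", "dashboard"),
}

# Hypothetical central policy: which (bucket, S3 action) pairs each pod may use.
POLICIES = {
    PodIdentity("analytics", "etl-writer"): {
        ("datalake", "GetObject"),
        ("datalake", "PutObject"),
    },
    PodIdentity("reporting", "dashboard"): {
        ("datalake", "GetObject"),
    },
}


def authorize(source_ip, bucket, action):
    """Return True if the pod behind source_ip may perform action on bucket."""
    pod = POD_BY_IP.get(source_ip)
    if pod is None:
        return False  # unknown caller: deny by default
    return (bucket, action) in POLICIES.get(pod, set())


if __name__ == "__main__":
    print(authorize("10.42.1.17", "datalake", "PutObject"))
    print(authorize("10.42.3.8", "datalake", "PutObject"))
```

Because the client never presents a secret in this model, the proxy would sign or authorize the backend request itself once the check passes, and permissions can be changed in one place.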
Telemetry and Observability
The proxy also exposes telemetry via OpenTelemetry, including metrics on performance, resource utilization, traffic patterns, and S3 API usage. This comprehensive monitoring allows the team to:
- Track request latency distributions across backend nodes
- Identify emerging hotspots before they impact performance
- Monitor authentication patterns and access control effectiveness
- Analyze traffic patterns to optimize backend capacity planning
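As one example of what such metrics enable, per-backend request counts can be checked for hotspots. The sketch below is hypothetical and does not use the OpenTelemetry SDK: it assumes the counts have already been exported and aggregated into a dictionary, and the `find_hotspots` name and the 1.5x threshold are illustrative choices.

```python
from statistics import mean


def find_hotspots(requests_per_backend, threshold=1.5):
    """Return backends whose request count exceeds threshold times the mean."""
    if not requests_per_backend:
        return []
    average = mean(requests_per_backend.values())
    if average == 0:
        return []
    return sorted(
        backend
        for backend, count in requests_per_backend.items()
        if count > threshold * average
    )


if __name__ == "__main__":
    # Made-up counts for one scrape interval.
    skewed = {"10.0.0.1": 9050, "10.0.0.2": 7000, "10.0.0.3": 300, "10.0.0.4": 200}
    print(find_hotspots(skewed))
```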
Production Impact
While specific performance metrics aren't publicly disclosed, the Storefront approach addresses several critical production concerns:
- Improved resource utilization: By actively balancing requests rather than relying on client-side DNS rotation, Storefront ensures more even backend utilization
- Enhanced reliability: The proxy can detect and route around unhealthy backends, improving overall system availability
- Simplified security: Credential-less authentication reduces the attack surface and operational overhead of managing storage credentials
- Better observability: Centralized metrics provide visibility into storage usage patterns that would be difficult to aggregate from distributed clients
Technical Implementation Details
The choice of Rust and Pingora reflects Agoda's focus on performance and reliability. Rust provides memory safety guarantees without garbage collection overhead, while Pingora provides a framework for building high-performance HTTP proxies. Capabilities that matter for a proxy like Storefront include:
- Zero-copy parsing: Minimize memory allocations for HTTP request processing
- Connection pooling: Efficiently manage backend connections across multiple requests
- Backpressure handling: Gracefully manage load spikes without overwhelming backends
- Protocol compliance: Full HTTP/1.1 and HTTP/2 support for S3 compatibility
The latency-aware scoring algorithm likely incorporates factors such as:
- Current connection count per backend
- Recent request latency distributions
- Backend health status
- Geographic proximity for multi-region deployments
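One way these factors could be combined is a single score per backend, with the lowest score winning. Agoda has not published its formula, so this is a sketch under assumptions: the score `(in_flight + 1) * ewma_latency_ms`, the smoothing factor `alpha`, and the `BackendStats`, `score`, and `pick_backend` names are all illustrative, and geographic proximity is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class BackendStats:
    address: str
    in_flight: int = 0
    ewma_latency_ms: float = 1.0  # exponentially weighted moving average
    healthy: bool = True

    def record_latency(self, sample_ms, alpha=0.2):
        """Fold a new latency sample into the moving average."""
        self.ewma_latency_ms = alpha * sample_ms + (1 - alpha) * self.ewma_latency_ms


def score(backend):
    """Lower is better: estimated cost of adding one more request."""
    return (backend.in_flight + 1) * backend.ewma_latency_ms


def pick_backend(backends):
    """Choose the healthy backend with the lowest score."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends available")
    return min(healthy, key=score)


if __name__ == "__main__":
    pool = [
        BackendStats("10.0.0.1", in_flight=2, ewma_latency_ms=10.0),
        BackendStats("10.0.0.2", in_flight=4, ewma_latency_ms=4.0),
        BackendStats("10.0.0.3", in_flight=0, ewma_latency_ms=1.0, healthy=False),
    ]
    print(pick_backend(pool).address)
```

In this example the second backend wins despite having more requests in flight, because its recent latency is much lower. A pure least-in-flight policy would have chosen the first.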
Broader Implications
Storefront represents a pattern that's increasingly relevant as organizations scale their cloud-native infrastructure. DNS-based load balancing, while simple, often proves insufficient for production workloads where:
- Traffic patterns are bursty and unpredictable
- Backend capacity must be fully utilized
- Security and compliance requirements are strict
- Observability is critical for operations
By moving load balancing logic into a dedicated proxy layer, Agoda gains fine-grained control over request distribution while maintaining compatibility with existing S3 clients. This approach could be applied to other storage systems, databases, or any service where DNS-based distribution proves inadequate.
Future Considerations
The Storefront team may consider additional enhancements as the system matures:
- Predictive routing: Using machine learning to anticipate traffic patterns and pre-warm backend connections
- Multi-cloud support: Extending the proxy to balance across different storage providers
- Advanced caching: Implementing intelligent content caching to reduce backend load
- Service mesh integration: Integrating with existing service mesh infrastructure for unified traffic management
The success of Storefront demonstrates how targeted infrastructure investments can solve specific operational challenges while providing broader benefits in terms of security, observability, and reliability. As data platforms continue to scale, such specialized components will likely become increasingly common in production environments.
