The article walks through three implementations of a simple HTTP file server—blocking thread‑per‑request, event‑driven epoll, and modern io_uring—showing the shared code, core differences, and the trade‑offs each model presents for network and disk I/O on Linux.
Thesis
A tiny HTTP file server is an ideal laboratory for contrasting three Linux I/O strategies. By starting with a naïve thread‑per‑request design, then refactoring to an epoll‑based event loop, and finally embracing io_uring’s submission‑completion ring, we can see how each approach handles concurrency, system‑call overhead, and the fundamental asymmetry between network sockets (which can be polled) and regular files (which cannot). The progression illustrates why io_uring is rapidly becoming the default for high‑performance servers, while also exposing the practical realities that keep epoll and even plain blocking I/O relevant.
1. Shared Foundations
All three versions need a small set of utilities:
- listen_socket() – creates a non‑blocking listening socket.
- parse_http_get() – extracts the request path from a GET line, falling back to /index.html.
- mime_for() – maps common extensions to MIME types.
- build_ok_headers() / build_404() – format minimal HTTP responses.
These helpers live in common.h and keep the three servers focused on I/O strategy rather than request parsing.
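The article names these helpers but does not show their bodies, so the following is only a minimal sketch of what they might look like. The signatures, the extension table, the `Connection: close` header, and the choice to apply the /index.html fallback only to a bare `/` path are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical signature: copies the request path into out (capacity cap).
 * Returns 0 on success, -1 if the line is not a GET or the path does not fit. */
int parse_http_get(const char *request, char *out, size_t cap)
{
    if (strncmp(request, "GET ", 4) != 0)
        return -1;
    const char *start = request + 4;
    size_t len = strcspn(start, " \r\n");   /* path ends at a space or line end */
    if (len == 0)
        return -1;
    if (len == 1 && start[0] == '/') {      /* assumed fallback for a bare "/" */
        start = "/index.html";
        len = strlen(start);
    }
    if (len + 1 > cap)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Maps a few common extensions to MIME types (illustrative table only). */
const char *mime_for(const char *path)
{
    const char *dot = strrchr(path, '.');
    if (!dot)                        return "application/octet-stream";
    if (strcmp(dot, ".html") == 0)   return "text/html";
    if (strcmp(dot, ".css") == 0)    return "text/css";
    if (strcmp(dot, ".js") == 0)     return "application/javascript";
    if (strcmp(dot, ".png") == 0)    return "image/png";
    if (strcmp(dot, ".txt") == 0)    return "text/plain";
    return "application/octet-stream";
}

/* Formats a minimal 200 response header. Returns its length, or -1 if
 * the buffer is too small. */
int build_ok_headers(char *buf, size_t cap, const char *mime, long length)
{
    int n = snprintf(buf, cap,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "Connection: close\r\n\r\n",
                     mime, length);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Because none of these functions touch a file descriptor, all three servers can call them unchanged; only the code that moves bytes differs.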
2. Synchronous Thread‑Per‑Request Server
Core Argument
The simplest way to serve many clients is to spawn a detached thread for each accepted connection. The thread reads the request, opens the file, streams it with read()/write(), and then exits.
Evidence & Code Walk‑through
- main() creates a listening socket and loops on accept(). Every new descriptor is handed to pthread_create(), which runs serve().
- serve() calls parse_request() (a blocking read() loop) and then send_response(), which performs a blocking open(), a header write, and a while (read()/write()) copy.
- A tiny helper write_all() guarantees that partial writes are retried until the buffer is exhausted.
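The retry-until-done behaviour of write_all() and the blocking copy loop can be sketched as follows. The article does not list the helper's body, so the return convention and the EINTR handling are assumptions, and `copy_fd` is a hypothetical name for the read()/write() loop.

```c
#include <errno.h>
#include <unistd.h>

/* Writes exactly len bytes, retrying after partial writes and EINTR.
 * Returns 0 on success, -1 on error (errno is whatever write() set). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        p += n;                     /* skip past the bytes that were written */
        len -= (size_t)n;
    }
    return 0;
}

/* Blocking copy: read a chunk from in_fd, push all of it to out_fd,
 * repeat until end of file. Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[16384];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;               /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
    }
}
```

Both calls may sleep inside the kernel, which is acceptable here only because the thread serves a single connection.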
Implications
- Simplicity – the code mirrors the textbook “one thread per client” model; no special kernel interfaces are required.
- Scalability limits – each thread consumes stack space and scheduling overhead. Under high concurrency the kernel may thrash, and the process quickly hits the per‑process thread limit.
- Disk I/O is still blocking – even though each network socket is owned by a single thread, the read() of the file blocks that entire thread, so it can do no other work until the disk access completes.
3. Epoll‑Based Event Loop
Core Argument
Epoll lets a single thread monitor many sockets for readiness, eliminating the thread‑per‑connection explosion. However, epoll only works for pollable descriptors; regular files cannot be added to an epoll set.
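The restriction on regular files is easy to observe directly: on Linux, epoll_ctl() fails with EPERM when asked to add a descriptor that does not support polling. The sketch below wraps that check; `epoll_can_watch` and `epoll_can_watch_path` are hypothetical helper names, not part of the article's code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 1 if fd can be registered with epoll, 0 if the kernel refuses
 * with EPERM (the result for regular files), -1 on any other error. */
int epoll_can_watch(int fd)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = fd;
    int rc = epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
    int saved = errno;              /* close() below may overwrite errno */
    close(ep);
    if (rc == 0)
        return 1;
    errno = saved;
    return saved == EPERM ? 0 : -1;
}

/* Convenience wrapper: opens path read-only and runs the same check. */
int epoll_can_watch_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = epoll_can_watch(fd);
    close(fd);
    return rc;
}
```

A pipe or socket yields 1, while a file on disk yields 0, which is why the epoll server has no readiness signal to wait on for file reads.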
Evidence & Code Walk‑through
- The listening socket is set O_NONBLOCK and added to an epoll instance with EPOLLIN.
- epoll_step() calls epoll_wait() and distinguishes two cases:
  - NULL user data – the event is the listening socket; accept all pending connections, make each new socket non‑blocking, allocate a struct conn, and register it for EPOLLIN.
  - Pointer to struct conn – the event belongs to a client connection; on_readable() accumulates request bytes until \r\n\r\n is seen, then switches the interest to EPOLLOUT.
- on_writable() writes any pending response headers, then reads from the file descriptor (still blocking) and writes the body until the socket would block.
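The client-connection branch can be sketched as below. The layout of struct conn, the 4 KiB request buffer, the return codes, and the `watch_conn` helper are assumptions for illustration; the article only describes the behaviour. The sketch also assumes the socket is already non-blocking and that the request contains no NUL bytes, since it searches with strstr().

```c
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Hypothetical per-connection state. */
struct conn {
    int    fd;
    size_t req_len;
    char   req[4096];
};

/* Registers a client for readability; data.ptr lets the event loop tell
 * clients (non-NULL pointer) from the listener (NULL). */
int watch_conn(int epfd, struct conn *c)
{
    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.ptr = c;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Drains readable bytes into c->req. Returns 1 once the blank line that
 * ends the headers has arrived (interest is switched to EPOLLOUT), 0 if
 * more data is needed, -1 on error, peer close, or an oversized request. */
int on_readable(int epfd, struct conn *c)
{
    for (;;) {
        size_t room = sizeof c->req - 1 - c->req_len;
        if (room == 0)
            return -1;                          /* request too large */
        ssize_t n = recv(c->fd, c->req + c->req_len, room, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;                       /* socket drained for now */
            return -1;
        }
        if (n == 0)
            return -1;                          /* peer closed early */
        c->req_len += (size_t)n;
        c->req[c->req_len] = '\0';              /* keep buffer a C string */
        if (strstr(c->req, "\r\n\r\n")) {
            struct epoll_event ev = { .events = EPOLLOUT };
            ev.data.ptr = c;
            return epoll_ctl(epfd, EPOLL_CTL_MOD, c->fd, &ev) == 0 ? 1 : -1;
        }
    }
}
```

Returning 0 on EAGAIN is what hands control back to the event loop; the same connection is simply revisited when epoll_wait() reports it again.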
Implications
- Network I/O becomes non‑blocking – the server can handle thousands of sockets with a single thread.
- Disk I/O remains synchronous – because regular files cannot be polled, the code still performs a blocking read() inside on_writable(). To avoid blocking the event loop, a separate thread pool would be required, re‑introducing complexity.
- State machine complexity – each connection now carries its own read/write offsets, and the epoll loop must carefully toggle interest flags.
4. io_uring‑Based Server
Core Argument
io_uring unifies network and disk I/O under a single asynchronous interface. By submitting batches of operations to a kernel‑managed ring buffer, a single thread can drive both sockets and files without blocking on any individual accept, read, or write; it waits only when no completions are ready.
Evidence & Code Walk‑through
- io_uring_queue_init() creates a submission/completion ring.
- A multishot accept (io_uring_prep_multishot_accept) is submitted once; the kernel repeatedly generates accept completions, each carrying a new client fd.
- Each completion carries a cb_ctx structure with a callback pointer and opaque user data. The main loop iterates over completions, invoking the appropriate callback (on_accept, on_recv, on_write_headers, on_read_file, on_write_file, on_close).
- The callbacks chain the logical steps:
  - on_accept → schedule recv.
  - on_recv → accumulate request, then call start_response.
  - start_response → prepare headers, open the file, schedule a send of the headers.
  - on_write_headers → once headers are flushed, schedule a read from the file.
  - on_read_file → schedule a send of the file chunk.
  - on_write_file → loop back to another read until EOF, then close.
- All file reads (io_uring_prep_read) are truly asynchronous; the kernel may perform the I/O on a worker thread, but the application never blocks.
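The completion-dispatch step can be illustrated without a live ring. The sketch below relies only on the kernel UAPI header for the CQE layout and the IORING_CQE_F_MORE flag; the cb_ctx layout, the helper names, and the `add_result` callback are assumptions, since the article describes cb_ctx only as a callback pointer plus opaque user data. In a real server the CQE would come from liburing (for example io_uring_wait_cqe()) rather than being built by hand.

```c
#include <linux/io_uring.h>
#include <stdint.h>
#include <stdlib.h>

struct cb_ctx;
typedef void (*cb_fn)(struct cb_ctx *ctx, int res);

/* Hypothetical layout: a callback plus opaque user data. */
struct cb_ctx {
    cb_fn fn;
    void *data;
};

struct cb_ctx *cb_ctx_new(cb_fn fn, void *data)
{
    struct cb_ctx *ctx = malloc(sizeof *ctx);
    if (ctx) {
        ctx->fn = fn;
        ctx->data = data;
    }
    return ctx;
}

/* Value to store in the SQE's user_data field at submission time. */
uint64_t cb_ctx_token(struct cb_ctx *ctx)
{
    return (uint64_t)(uintptr_t)ctx;
}

/* Runs the callback for one completion. Returns 1 if the context was
 * freed, 0 if it was kept because more completions will follow. */
int dispatch_cqe(const struct io_uring_cqe *cqe)
{
    struct cb_ctx *ctx = (struct cb_ctx *)(uintptr_t)cqe->user_data;
    ctx->fn(ctx, cqe->res);
    if (cqe->flags & IORING_CQE_F_MORE)
        return 0;                   /* multishot operation is still armed */
    free(ctx);
    return 1;
}

/* Example callback: adds the completion result to an int counter. */
void add_result(struct cb_ctx *ctx, int res)
{
    *(int *)ctx->data += res;
}
```

A multishot accept is the case where the flag matters: one cb_ctx serves every accepted connection, so freeing it after the first completion would leave later completions pointing at freed memory.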
Implications
- Universal async I/O – sockets and regular files are handled through the same mechanism, removing the need for a separate thread pool.
- Reduced syscall overhead – up to the queue depth (commonly 256 or more) submissions are handed to the kernel by a single io_uring_submit() call, cutting per‑operation cost dramatically.
- Back‑pressure handling – the kernel reports completion status; if a read returns 0 or an error, the corresponding callback can close the connection immediately.
- Complexity shift – the mental model moves from explicit state machines to a callback‑driven pipeline. Managing the lifetime of cb_ctx objects and ensuring they are freed only when IORING_CQE_F_MORE is not set adds subtle bookkeeping.
- Ring‑buffer limits – if io_uring_get_sqe() returns NULL, the submission queue is full. The article notes that a production server would need a fallback queue or a retry loop; the sample code omits this for brevity.
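One possible shape for the fallback queue mentioned in the last point is a small FIFO of deferred operations. Everything in this sketch is hypothetical: the pending_op fields, the fixed capacity, and the `try_submit` function pointer, which stands in for "call io_uring_get_sqe() and prepare the operation". The `fake_ring` type exists only so the logic can be exercised without a real ring.

```c
#include <stddef.h>

#define PENDING_CAP 64

/* Hypothetical description of an operation that could not be submitted. */
struct pending_op {
    int   fd;
    void *ctx;
};

struct pending_queue {
    struct pending_op ops[PENDING_CAP];
    size_t head;
    size_t count;
};

/* Stand-in for "get an SQE and fill it": 0 on success, -1 if the ring is full. */
typedef int (*try_submit_fn)(void *ring, const struct pending_op *op);

static int pending_push(struct pending_queue *q, struct pending_op op)
{
    if (q->count == PENDING_CAP)
        return -1;
    q->ops[(q->head + q->count) % PENDING_CAP] = op;
    q->count++;
    return 0;
}

/* Returns 1 if submitted now, 0 if deferred, -1 if the fallback queue is
 * full too. New work queues behind older deferred work to keep ordering. */
int submit_or_defer(struct pending_queue *q, void *ring,
                    try_submit_fn try_submit, struct pending_op op)
{
    if (q->count == 0 && try_submit(ring, &op) == 0)
        return 1;
    return pending_push(q, op) == 0 ? 0 : -1;
}

/* Call after reaping completions: resubmits deferred work until the ring
 * fills up again. Returns the number of operations submitted. */
size_t drain_pending(struct pending_queue *q, void *ring,
                     try_submit_fn try_submit)
{
    size_t n = 0;
    while (q->count > 0 && try_submit(ring, &q->ops[q->head]) == 0) {
        q->head = (q->head + 1) % PENDING_CAP;
        q->count--;
        n++;
    }
    return n;
}

/* Test double that models a ring with a fixed number of free slots. */
struct fake_ring {
    int free_slots;
};

int fake_try_submit(void *ring, const struct pending_op *op)
{
    struct fake_ring *r = ring;
    (void)op;
    if (r->free_slots == 0)
        return -1;
    r->free_slots--;
    return 0;
}
```

The alternative named in the article, a retry loop, would instead flush the ring with io_uring_submit() and ask for an SQE again; the queue approach trades a little memory for never spinning.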
5. Comparative Implications
| Aspect | Thread‑per‑request | epoll | io_uring |
|---|---|---|---|
| Concurrency model | One OS thread per client | Single thread multiplexing sockets | Single thread multiplexing sockets and files |
| System‑call cost | accept, read, write, open per request (blocking) | epoll_wait + non‑blocking read/write (still many syscalls) | Batch io_uring_submit + completions, far fewer syscalls |
| Disk I/O handling | Blocking read on the worker thread | Must offload to a thread pool or accept blocking reads | Native asynchronous reads via the kernel |
| Memory overhead | Stack per thread (glibc typically reserves 8 MiB of address space; only touched pages are resident) | Minimal per‑connection state (struct conn) | Similar per‑connection state; plus ring buffers (few KiB) |
| Scalability ceiling | Limited by thread limits and scheduler overhead | Scales to tens of thousands of sockets, but disk I/O becomes the bottleneck | Scales best for high‑concurrency workloads that involve both network and storage |
| Implementation complexity | Low – straightforward procedural code | Moderate – state machine, edge‑trigger handling | |
| Portability | Works on any POSIX system | Linux‑specific, but widely available | |
| Future‑proofing | Hard to evolve without major redesign | Epoll + thread pool could approximate io_uring performance for disk I/O | Already handles both domains; kernel may still use worker threads internally |
When to Choose Which?
- Low traffic, simple deployments – the blocking version is acceptable; its clarity outweighs performance concerns.
- High connection count, disk‑light workloads – epoll gives excellent network throughput with a single thread, and a modest thread pool can handle disk reads.
- Heavy storage traffic, high concurrency – io_uring shines because it eliminates the need for a separate thread pool and reduces syscall overhead, delivering better latency and throughput.
6. Counter‑Perspectives
- Kernel maturity – io_uring is relatively new; older kernels lack some features (e.g., multishot accept, fixed buffers). Deployments on legacy distributions may need to fall back to epoll.
- Debugging difficulty – the callback‑centric flow can be harder to trace than a linear state machine. Tools such as trace-cmd or perf become essential.
- Library support – many high‑level frameworks still expose epoll‑style APIs (e.g., libevent, libuv). Integrating io_uring may require additional wrappers or waiting for ecosystem adoption.
- Resource contention – io_uring’s internal worker threads still compete for CPU with the application thread; in CPU‑bound scenarios the theoretical advantage narrows.
7. Closing Thoughts
By walking through a concrete file server, the article demonstrates that the choice of I/O primitive is not merely a performance tweak but a redesign of how an application thinks about work. Synchronous code is easy to read but does not scale; epoll introduces an event‑driven mindset that solves network concurrency but leaves disk I/O as a lingering blocker; io_uring unifies the model, allowing truly asynchronous handling of both sockets and files while dramatically cutting syscall overhead. The trade‑offs—complexity, kernel version requirements, and debugging ergonomics—mean that the “best” solution depends on workload characteristics and operational constraints. Nevertheless, for any new Linux service that expects moderate to high concurrency and non‑trivial disk access, io_uring presents a compelling default that future‑proofs the codebase.
Further reading
- Jens Axboe, Efficient I/O with io_uring – a deep dive into kernel design.
- The liburing project: https://github.com/axboe/liburing
- Epoll man page: https://man7.org/linux/man-pages/man7/epoll.7.html
- A comparison of thread‑per‑request vs. event‑driven servers: https://www.scs.stanford.edu/~dm/blog/epoll.pdf
Phil Eaton is the founder of The Consensus and a former Postgres contributor. He maintains the Software Internals Discord and co‑runs NYC Systems.