An in-depth analysis of the engineering principles behind building financial exchanges with extreme reliability and performance, focusing on single-threaded architectures, consensus algorithms, and deterministic systems.
Building High-Performance Financial Exchanges: Determinism, Consensus, and Sub-Millisecond Latency

Financial exchanges represent some of the most demanding distributed systems in existence. They require perfect correctness, absolute fairness, 24/7 availability, and sub-millisecond response times—all while handling massive transaction volumes. Frank Yu, Director of Engineering at Coinbase, shared the engineering philosophy behind building such systems in his recent presentation at QCon San Francisco.
The Challenge: Financial Infrastructure with Extreme Requirements
Exchanges serve as financial infrastructure where participants submit orders to buy and sell assets, with the exchange providing up-to-date prices and executing trades. As Yu explains, "Everyone who participates in finance and markets, they come to us to get the up-to-date price of things and also to print trades, assign prices when folks want to trade something."
The reliability requirements are staggering. "My mind, it goes really fast, and if it goes wrong then by the time PagerDuty rings it's over," Yu shares. "The potential losses we risk are multiple orders of magnitude more than any revenue we might get from any transaction."
Core Requirements for Exchanges
- Correctness: The system must maintain accurate state at all times
- Fairness: All participants must have an equal opportunity to execute
- Availability: 24/7 operation is non-negotiable
- Consistency: Predictable performance (flat P99s, P50s, and averages)
- Auditability: Ability to reconstruct market state at any microsecond in the past
These requirements stem from exchanges being trusted third parties in financial markets. "We focus so much on reliability that people in financial markets effectively outsource their reliability to the exchange because they figure, if the exchange goes down, we're toast anyways," Yu notes.
Functional Requirements: Simplifying Complexity
When building an exchange, Yu approaches the problem by first defining functional requirements through a test harness. "The last four or five times I've built an exchange, what I do is open up the IDE, build up a black box harness of user API actions or user stories to specify the behavior that I want."
A key simplification is the "price, time, priority" model that condenses pages of functional specifications into three concepts:
- Price: The value at which a participant is willing to buy or sell
- Time: When the order was submitted
- Priority: The order's position in the queue for execution
This model handles complex order matching logic. For example, when multiple orders overlap in price, the system determines execution order based on time and priority. When Bob submits a sell order at 99 that overlaps with existing buy orders at 100, the system executes trades starting with the highest priority buy order, ensuring price improvement for Bob while maintaining fairness.
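Bob's example can be sketched as a toy price-time-priority match. The heap layout and the `match_sell` helper below are illustrative only, not the exchange's actual data structures; resting bids are ordered best price first, ties broken by arrival time:

```python
import heapq

def match_sell(book_bids, sell_price, sell_qty):
    """Match an incoming sell against resting bids using price-time
    priority. book_bids is a heap of (-price, time, qty) tuples:
    best price first, ties broken by earliest arrival."""
    trades = []
    while sell_qty > 0 and book_bids and -book_bids[0][0] >= sell_price:
        neg_price, t, qty = heapq.heappop(book_bids)
        fill = min(qty, sell_qty)
        trades.append((-neg_price, fill))  # trade prints at the resting bid's price
        sell_qty -= fill
        if qty > fill:                     # partial fill keeps its time priority
            heapq.heappush(book_bids, (neg_price, t, qty - fill))
    return trades

# Bob sells 10 at 99 into two resting bids at 100 (6 each, different times):
bids = []
heapq.heappush(bids, (-100, 1, 6))  # 6 @ 100, arrived first
heapq.heappush(bids, (-100, 2, 6))  # 6 @ 100, arrived second
print(match_sell(bids, 99, 10))     # → [(100, 6), (100, 4)]
```

Note that both fills print at 100, not 99: the earlier bid fills completely first, and Bob gets price improvement, exactly the fairness property described above.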
Architecture Choice: Single-Threaded Determinism
The most controversial yet critical design choice is the single-threaded architecture. "If I want to handle a lot of different orders, why don't I assign different CPUs to handle orders? Then each of them can go find other matches, and this and that. That way maybe I can take advantage of parallelism and all that. Could work, for sure," Yu explains before presenting the counter-argument.
The problem with parallelism is determinism: "If you let things be concurrent, that's where you lose determinism. What we talked about, 1, 2, 3, 4, 5, all those actions really should result in the same output every single time."
Benefits of Single-Threaded Architecture
- Determinism: Same inputs always produce same outputs
- Simplified Scaling: Performance directly tied to single-thread speed
- Easier Optimization: Clear hot path to optimize
- Simplified Debugging: Entire system state fits in memory on one machine
- Perfect Reproduction: Production logs can be replayed exactly for debugging
"If I just handle all those orders in order, like I was clicking through these buttons, I just put it all in one thread. One thread, what does that mean? That means I have one program running on one CPU core, not reordering anything, just handling orders and doing stuff," Yu explains.
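The single-threaded model Yu describes can be sketched as one loop applying inputs to in-memory state, strictly in arrival order. This is a minimal illustration (the event shape and `MatchingEngine` name are invented here), but it shows the property that matters: the same input sequence always produces the same state:

```python
class MatchingEngine:
    """One logical thread applies inputs in arrival order to
    in-memory state. With no concurrency, the same input
    sequence always yields the same state -- which is what
    makes exact replay possible."""
    def __init__(self):
        self.seq = 0
        self.balances = {}

    def apply(self, event):
        # Every input is handled to completion before the next one.
        self.seq += 1
        kind, account, amount = event
        delta = amount if kind == "credit" else -amount
        self.balances[account] = self.balances.get(account, 0) + delta
        return self.seq

def run(inputs):
    engine = MatchingEngine()
    for event in inputs:   # strictly in order: no reordering, no locks
        engine.apply(event)
    return engine.balances

log = [("credit", "alice", 5), ("debit", "alice", 2), ("credit", "bob", 7)]
assert run(log) == run(log)  # determinism: replay reproduces identical state
print(run(log))              # → {'alice': 3, 'bob': 7}
```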
Consensus and Durability: Raft in the Hot Path
While the matching engine runs single-threaded, durability is achieved through the Raft consensus algorithm. "Normally what happens is your web server receives an API call, it has to do the work, and then it sends its state over Postgres' network protocol to Postgres. Postgres has to do some computation, compute the result, send out some write-ahead log stuff, but also send out the acknowledgment to you. All of that is happening blocking."
With Raft, the process changes dramatically:
- Order comes into the system
- Before processing, ensure quorum (e.g., 3 out of 5 machines) has received the request
- Process the order deterministically
- Replicate the result to followers asynchronously
"If you've got a really fast consensus implementation, then you can basically handle orders as they come in, only deal with persisting those, and then your business logic can basically be processing concurrently with the durability process," Yu explains.
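The replicate-before-process flow might be sketched like this. The `ReplicatedLog` and `Node` classes are toy stand-ins for a real Raft implementation (which adds leader election, terms, and log repair); the point is only the ordering: the input becomes durable on a quorum before the business logic runs:

```python
def quorum(n):
    return n // 2 + 1

class Node:
    def __init__(self):
        self.log = []
    def store(self, request):
        self.log.append(request)
        return 1  # acknowledge

class ReplicatedLog:
    """Toy stand-in for a Raft log: an entry counts as durable
    once a majority of nodes has acknowledged it."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.entries = []
    def append(self, request):
        acks = sum(node.store(request) for node in self.nodes)
        if acks < quorum(len(self.nodes)):
            raise RuntimeError("no quorum; request not durable")
        self.entries.append(request)

def handle(log, state, request):
    # 1. Make the *input* durable first (quorum ack).
    log.append(request)
    # 2. Only then run the deterministic business logic; followers
    #    can regenerate the same output from the replicated input.
    state.append(f"processed:{request}")

nodes = [Node() for _ in range(5)]
rlog, state = ReplicatedLog(nodes), []
handle(rlog, state, "order-1")
print(state)   # → ['processed:order-1']
```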
Why Five Nodes, Not Three?
"We've definitely been very happy that we ran with five, and not three. Because if you ran with three, you can only suffer the failure of one machine before you lose that replication. If you run with five, two machines have to die for you to be scared. You can suffer the loss of two, and then three machines have to die for you to be broken."
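The arithmetic behind this is simple majority-quorum math: a cluster of n nodes needs ⌊n/2⌋ + 1 acknowledgments, so it tolerates n minus that many failures:

```python
def quorum(n):
    """Smallest majority of an n-node cluster."""
    return n // 2 + 1

def failures_tolerated(n):
    """Nodes that can die while the cluster still reaches quorum."""
    return n - quorum(n)

for n in (3, 5):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failures")
# 3 nodes: quorum 2, tolerates 1 failure
# 5 nodes: quorum 3, tolerates 2 failures
```

With five nodes, two can die and the cluster keeps committing; only a third failure breaks quorum, matching Yu's "three machines have to die for you to be broken."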
System Layout: Components and Interactions
The Coinbase International Exchange architecture consists of several key components:
- API Gateways: Stateless components that handle incoming requests
- Matching Engine Cluster: Raft cluster of 5 nodes running the core matching logic
- Replicas: Strongly consistent replicas that can serve queries without perturbing the writer
- Disaster Recovery Site: Can be promoted to primary when needed

Message Flow and Optimization
Yu highlights an important optimization enabled by determinism:
"What if, instead of doing that, my system is deterministic, why don't I just replicate the request, have the request percolate to my downstream systems that care about the output, and then I just rerun my matching logic? Now I can send that mouthful of stuff over IPC, no network cost."
This approach significantly reduces network egress costs. Instead of replicating the output events (which can be large), the system replicates the small input requests and regenerates outputs locally.
"It also allows us to have what's being transported as our request stream and not the output. I'll give you another example here. What if that was a SQL and the red stuff we were sending was the write-ahead log? What if someone submitted a request to update blah, blah, blah where ID equals anything? One request comes in and may create thousands or tens of thousands of write-ahead log entries. If your system is deterministic, I'll just replicate that really annoying query instead of suddenly having a spike in my change data capture."
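The asymmetry Yu describes, with a small input fanning out into a large output, can be shown with a deterministic toy `process` function (the function and its fan-out are invented for illustration): replicating the input and rerunning the logic on each replica yields the identical output without shipping it over the network:

```python
def process(request):
    """Deterministic logic: one small request fans out into many
    output events (like one UPDATE producing thousands of
    write-ahead-log entries)."""
    n = request["rows"]
    return [f"row-{i}-updated" for i in range(n)]

request = {"rows": 10_000}

# Output replication would ship every event over the network.
output = process(request)

# Input replication ships only the tiny request; each replica
# reruns the same deterministic logic locally.
replica_output = process(request)

assert replica_output == output  # determinism guarantees identical results
print(len(str(request)), "bytes of input vs", len(str(output)), "bytes of output")
```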
Performance Optimization: Making the Hot Path Fast
The single-threaded architecture means that scalability is directly tied to the performance of the hot path. "The faster your single thread runs, the more orders you can handle," Yu emphasizes.
Key Optimization Techniques
- Avoid Blocking the Hot Path: Minimize any reason for the processor to wait
- Co-locate Components: Place all hot path components as close as possible
- Simple Binary Encoding: Use flat binary formats instead of nested protocols
- Pre-allocation: Avoid dynamic memory allocation in the hot path
- CPU Pinning: Prevent the OS from moving the critical thread
- Avoid Unbounded Operations: Use indexed access instead of full scans
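The last point, indexed access instead of full scans, is easy to illustrate with a hypothetical order-cancel path: the index is built off the hot path, so the hot path never pays for the size of the book:

```python
# Build the index once, off the hot path.
orders_list = [{"id": i, "qty": i} for i in range(100_000)]
orders_by_id = {o["id"]: o for o in orders_list}

def cancel_scan(order_id):
    # Unbounded: worst case walks the whole book.
    for o in orders_list:
        if o["id"] == order_id:
            return o

def cancel_indexed(order_id):
    # O(1): cost is independent of how many orders are resting.
    return orders_by_id[order_id]

assert cancel_scan(99_999) is cancel_indexed(99_999)
```

The scan's latency grows with the book and creates exactly the kind of P99 outliers the "flat percentiles" requirement forbids; the indexed lookup stays constant.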
Data Representation
"Our internal representation of messages, we use simple binary encoding. It's basically just a bunch of bytes next to each other. The CPU knows that 6 bytes after the front of the message, that's the message type. It's super quick to decide if I even care about this message. The next 8 bytes, that's the ID of the thing I want to trade. The next 8 bytes are the price. The next 8 bytes are the quantity."
This approach contrasts with more complex protocols like Protocol Buffers: "Don't even bother with UUIDs, just use Snowflake IDs. You can fit a bunch of globally unique IDs that are sortable by timestamp in 64 bits. This is much better than having deeply nested protobufs, where to even figure out if you should handle that message, you've got to remarshal it, build up this whole palace of Legos that represent your highly nested message before you even can tell what the thing is."
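A flat layout along the lines Yu describes can be sketched with fixed byte offsets. The exact layout below is hypothetical (following his description of a type field 6 bytes in, then three 8-byte fields); the key property is that a reader can peek one field without decoding the whole message:

```python
import struct

# Hypothetical flat layout:
#   bytes 0-5   : header (ignored here)
#   bytes 6-7   : message type (uint16)
#   bytes 8-15  : instrument ID (uint64)
#   bytes 16-23 : price (uint64)
#   bytes 24-31 : quantity (uint64)
MSG_NEW_ORDER = 1

def encode(msg_type, instrument, price, qty):
    return struct.pack("<6xH3Q", msg_type, instrument, price, qty)

def msg_type(buf):
    # Peek a single field at its fixed offset -- no full decode,
    # no object graph to build.
    return struct.unpack_from("<H", buf, 6)[0]

def decode(buf):
    return struct.unpack_from("<3Q", buf, 8)

msg = encode(MSG_NEW_ORDER, 42, 100_00, 7)
assert msg_type(msg) == MSG_NEW_ORDER
print(decode(msg))   # → (42, 10000, 7)
```

Deciding whether to handle the message costs one two-byte read at a known offset, versus fully deserializing a nested protobuf first.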
Memory Management
"The garbage collectors on JVMs pause for milliseconds at a time. How are you building these super responsive systems on the JVM? Your garbage collector won't run if you never call new. What we're saying is, provision all of your stuff ahead of time. You should not be allocating new memory on an order by order basis."
Yu recommends using off-heap data structures: "We do things like basically put SBE structs in an off-heap map, so that we don't need to go back and forth and deal with spooky action from a distance from the garbage collector."
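The "never call new on the hot path" pattern is an object pool: allocate every slot up front and only recycle thereafter. Coinbase does this on the JVM with SBE structs in off-heap maps; the Python sketch below (with an invented `OrderPool`) just illustrates the recycling discipline:

```python
class OrderPool:
    """Pre-allocate every order slot up front; the hot path only
    recycles existing objects and never allocates new ones --
    the moral equivalent of 'never call new' on the JVM."""
    def __init__(self, capacity):
        self._slots = [{"id": 0, "price": 0, "qty": 0} for _ in range(capacity)]
        self._free = list(range(capacity))

    def acquire(self, order_id, price, qty):
        idx = self._free.pop()     # O(1), no allocation on the hot path
        slot = self._slots[idx]
        slot["id"], slot["price"], slot["qty"] = order_id, price, qty
        return idx

    def release(self, idx):
        self._free.append(idx)     # recycle the slot for reuse

pool = OrderPool(4)
i = pool.acquire(1, 100, 5)
print(pool._slots[i])   # → {'id': 1, 'price': 100, 'qty': 5}
pool.release(i)
```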
Deployment and Operations: Zero-Downtime Rolling Updates
The deterministic architecture enables zero-downtime deployments through rolling updates. "If you're running in a cluster like this, you can do rolling deployments. You can deploy your code without downtime that impacts your persistence SLA."
The Process
- Deploy new code to a follower node
- Once verified, promote it to leader
- Repeat for other nodes
- Finally, update the original leader
"This effectively unifies your resiliency with your deployment. It makes the unexpected normal. You can think of this as like, ok, let's shut this down, start a new version up. There's a new leader. Then I can shut another follower down, and then I'll restart the leader last. That's how you can do blue-green deployments and have effectively no downtime on your exchange."
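The four steps above can be sketched as a plan generator (node names and the `rolling_deploy` helper are illustrative): restart followers one at a time, promote an upgraded follower, and restart the old leader last, so at most one node is ever down and a five-node quorum of three is never at risk:

```python
def rolling_deploy(nodes, leader, new_version):
    """Emit the rolling-update steps: upgrade followers one at a
    time, hand leadership to an upgraded node, then upgrade the
    old leader last."""
    steps = []
    for name in nodes:
        if name != leader:
            steps.append(f"restart {name} on {new_version}")
    new_leader = next(n for n in nodes if n != leader)
    steps.append(f"promote {new_leader} to leader")
    steps.append(f"restart {leader} on {new_version}")
    return steps

for step in rolling_deploy(["n1", "n2", "n3", "n4", "n5"], "n1", "v2"):
    print(step)
```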
Why This Matters
"We have to get to 24/7. It's just so funny, not being 24/7 seems like such an anachronism in backend computing. If you're not dealing with hot trading, presumably your systems run 24/7. Any sort of website, plenty of things with much less engineering investment run 24/7."
Long maintenance windows create financial discontinuities: "When you have these long downtimes, you create weird financial discontinuities. Users who are trying to use the exchange aren't able to submit orders and do stuff, and then they have to rely on some weird side channels or whatever to manage their own financial risk. It adds complication and adds financial risk for everyone participating on the market."
Production Results and Benefits
With this architecture, Coinbase has achieved impressive results:
- Traffic spikes of six-figure transactions per second handled with no issues
- P99 response times under one millisecond
- Zero downtime deployments
- Perfect reproduction of production issues for debugging
Debugging Superpower
Yu highlights a unique benefit of the deterministic architecture: "My favorite is, if something's weird, I can do full reproduction of the functional issues because everything fits on one thread. That means the entire memory set fits on one machine. It means I can run it on my laptop. If I go ahead and find something weird in the market, I can go download the logs, the request log, replay it, and replay all of that production logic in a debugger. I will literally see what each register is and all of that."
Testing and Experimentation
"The other really cool thing, I'm going to belabor this because I really like the log replay stuff, is you can stream your request log to another stack and perturb that thing. You can run experiments on your production data and do pre-production testing of your rolling deployments, pre-production testing of your configuration changes, all because you put all your logic on one nice little request stream."
This enables safe experimentation with complex topology changes: "This gives us the ability to do complicated topology changes in the cloud, which is terrifying to anybody, but we're able to literally run that with live production streaming load. That has been a superpower for a lot of us."
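Because the request log is the single source of truth, a shadow stack can replay it under a candidate change and be diffed against the baseline before any rollout. The toy `run_engine` below (with an invented fee parameter as the "perturbation") sketches the idea:

```python
def run_engine(requests, fee_bps=0):
    """Deterministic toy engine: total notional traded, with an
    optional fee -- the 'perturbation' under test."""
    total = 0
    for price, qty in requests:
        total += price * qty
    return total + total * fee_bps // 10_000

# The production request log is the single source of truth...
request_log = [(100, 5), (101, 3), (99, 7)]
baseline = run_engine(request_log)

# ...so a shadow stack can replay the identical stream under a
# candidate configuration and compare outcomes offline.
candidate = run_engine(request_log, fee_bps=10)
print(baseline, candidate)
```

Determinism is what makes the comparison meaningful: any divergence between the two runs is attributable to the perturbation, not to timing or ordering noise.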
Trade-offs and Considerations
While this architecture delivers exceptional performance and reliability, it comes with important trade-offs:
Resource Utilization
"The one that's hot is pegged, yes," Yu confirms when asked about CPU utilization. "The only thing that needs to spin is the core logic that is handling orders as fast as possible. Everything else is just doing what normal CPUs do, which is sit idle."
This approach prioritizes predictable latency over efficient resource utilization, which is appropriate for financial systems where latency outliers can have significant financial consequences.
Vertical Scaling
Yu advocates for vertical scaling over horizontal scaling for the core matching engine. Asked whether to "go big on number of CPUs and memory" and "go beefy on the machines to reduce the number of machines you need to run," he answered: "Yes. You should vertically scale your core logic boxes."
This contrasts with the horizontal scaling favored by many large internet companies: "Which is a bit of an antipattern from how Google did things. Also, CPUs are super cheap now, and so is memory."
The reasoning is simple: "What you can do is you care about clock speed, but you don't care about a lot of cores. You care about clock speed, so you can just have a small number of really hot machines, more or less."
System Complexity
While the core matching engine is intentionally simple, the overall system includes multiple components that add complexity:
- Raft consensus implementation
- API gateways with rate limiting
- Multiple replicas for different purposes
- Cross-region replication for disaster recovery
The key is to keep the hot path simple while allowing flexibility in supporting components.
Conclusion: The Engineering Ideal of Simplicity
"Everything you get here is chasing the engineering ideal of simplicity. Because if you're simple, it's really easy to make that simple thing stable because you can just do a lot of different things with it. You don't have to worry about all these different variables," Yu concludes.
The presentation demonstrates how focusing on determinism through single-threaded architecture, combined with consensus for durability, creates a foundation for building extremely reliable and performant financial systems. By simplifying the core logic and carefully managing the hot path, the system achieves sub-millisecond latencies while maintaining 24/7 availability.
"The thing you get is if you want to just remove a lot of the stuff in your architecture diagram, there's probably easy 10X opportunities in your system right now. That's really it. Simplify your systems, everybody."