Form3's journey to active/active/active multi-cloud reveals both the technical breakthroughs and business realities that determine when triple-cloud architectures make sense.
Form3's Multi-Cloud Journey: Technical Innovation Meets Market Reality
At QCon London 2026, Kevin Holditch and Ross McFarlane from Form3 delivered a candid presentation about their experience running a payments platform simultaneously across AWS, Google Cloud, and Azure. The talk revealed both the technical innovations required for triple-cloud architectures and the market realities that sometimes make them unnecessary.
The Regulatory Push That Started It All
The UK's banking regulator raised concerns about cloud concentration risk around 2021, worried that too many financial institutions depending on a single cloud provider could create systemic vulnerabilities. One of Form3's largest banking customers demanded a multi-cloud strategy, and that requirement cascaded down to Form3.
Form3 processes account-to-account payments for major UK banks, handling billions of pounds in annual transaction volume. Their original architecture was deeply coupled to AWS, relying on ECS, SQS, and RDS. The team had intentionally embraced that coupling when they were a handful of engineers who needed to ship fast.
Building V2: The Triple-Active Platform
Their V2 platform runs Kubernetes clusters independently in each of the three clouds, connected via private network links. The team chose NATS JetStream as a cross-cloud message broker and CockroachDB for distributed data storage, both selected specifically because they could operate as single logical clusters spanning all three environments.
They also migrated from Java to Go for their microservices, citing smaller deployment footprints and better readability across repositories.
Three Engineering Challenges That Proved Stubborn
Holditch highlighted three technical hurdles that required creative solutions:
Bootstrapping CockroachDB Across Independent Kubernetes Clusters
The team invented a clever DNS hack: a pseudo-suffix scheme that inserts the cloud name into Kubernetes DNS addresses, with forwarding and rewrite rules to route queries between clusters. This allowed the distributed database to discover and connect across cloud boundaries.
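The talk didn't walk through the exact rules, but the idea can be sketched as a small routing function: strip a trailing cloud pseudo-suffix from the queried name and forward the rewritten query to that cluster's resolver. The suffixes and resolver addresses below are illustrative assumptions, not Form3's actual configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// clouds maps a pseudo-suffix to the (hypothetical) resolver that
// serves that cluster's Kubernetes DNS zone.
var clouds = map[string]string{
	"aws":   "10.1.0.10",
	"gcp":   "10.2.0.10",
	"azure": "10.3.0.10",
}

// route inspects a queried name for a trailing cloud pseudo-suffix.
// If one is present, it returns the resolver to forward to and the
// rewritten name with the suffix stripped, so the remote cluster
// sees an ordinary in-cluster DNS name.
func route(name string) (resolver, rewritten string, ok bool) {
	for suffix, r := range clouds {
		if strings.HasSuffix(name, "."+suffix) {
			return r, strings.TrimSuffix(name, "."+suffix), true
		}
	}
	return "", name, false // a local name; resolve as usual
}

func main() {
	r, n, _ := route("cockroachdb-0.cockroachdb.crdb.svc.cluster.local.gcp")
	fmt.Println(r, n)
}
```

In practice this kind of scheme would live in the cluster DNS layer (rewrite and forward rules) rather than application code; the sketch only shows the name-mangling trick that lets each CockroachDB node advertise an address other clusters can resolve.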
Protecting Database Quorum During Node Maintenance
They built a custom operator called XPDB (cross-cluster pod disruption budget) that enforces disruption limits across all three clouds rather than within each one individually. This ensures that node maintenance in one cloud doesn't accidentally break the database quorum.
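The core decision XPDB has to make can be reduced to one check: is it safe to take down one pod in one cloud given what is ready everywhere else? The sketch below is a guess at that logic under stated assumptions (a global `minAvailable` floor and per-cloud ready counts); the real operator's API and reconciliation loop are not described in the talk.

```go
package main

import "fmt"

// safeToEvict reports whether evicting one pod in the given cloud
// still leaves at least minAvailable ready pods across ALL clouds,
// which is the cross-cluster twist on a normal PodDisruptionBudget
// that only counts pods in its own cluster.
func safeToEvict(readyByCloud map[string]int, cloud string, minAvailable int) bool {
	total := 0
	for _, n := range readyByCloud {
		total += n
	}
	if readyByCloud[cloud] == 0 {
		return false // nothing evictable in that cloud
	}
	return total-1 >= minAvailable
}

func main() {
	// Hypothetical 8-pod CockroachDB cluster, one node already drained in Azure.
	ready := map[string]int{"aws": 3, "gcp": 3, "azure": 2}
	fmt.Println(safeToEvict(ready, "azure", 7)) // 8-1 >= 7
	fmt.Println(safeToEvict(ready, "aws", 8))   // 8-1 < 8: would risk quorum
}
```

The point of the global view is visible in the second call: each cloud's local budget might individually allow the eviction, but the cross-cloud total says no.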
Keeping Node Pools Updated Across Multiple Clouds
A painful day-two problem led them to build the Cluster Lifecycle Operator, which consolidated hundreds of pull requests into one per platform. This operator handles node pool updates across multiple clouds, environments, and geographies.
The Real-World Payoff
During a major Google Cloud outage last summer, Holditch described checking his laptop and finding only a low-severity alert about some crash-looping pods in GCP, while payments continued to flow through the other clouds without interruption. The triple-active architecture worked exactly as designed.
When Multi-Cloud Falls Flat: The US Expansion
The second half of the talk took an unexpected turn. When Form3 expanded into the US market, their state-of-the-art triple-active setup found no takers.
American customers expected geographic resilience (an East Coast primary with West Coast disaster recovery) and found the multi-cloud pitch unfamiliar. Latency was also a hard constraint: spreading a CockroachDB quorum across the continent would consume the latency budget on every write.
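The latency constraint follows from how quorum writes work: a Raft-style commit must be acknowledged by a majority of replicas, so every write waits on at least one cross-country round trip once replicas span the continent. A back-of-the-envelope estimate, with illustrative (not measured) RTT figures:

```go
package main

import (
	"fmt"
	"sort"
)

// quorumCommitEstimate gives a rough lower bound on a Raft-style
// commit: the leader acks itself and then needs quorum-1 follower
// acks, so the commit waits for the (quorum-1)-th fastest follower
// round trip. RTTs are in milliseconds.
func quorumCommitEstimate(followerRTTms []float64, replicas int) float64 {
	quorum := replicas/2 + 1
	sorted := append([]float64(nil), followerRTTms...)
	sort.Float64s(sorted)
	return sorted[quorum-2]
}

func main() {
	// 3 replicas in one region: single-digit milliseconds per write.
	fmt.Println(quorumCommitEstimate([]float64{1, 2}, 3))
	// 3 replicas spread coast-to-coast: tens of milliseconds per write,
	// paid on every transaction before it can commit.
	fmt.Println(quorumCommitEstimate([]float64{65, 70}, 3))
}
```

With payment-scheme SLAs measured in a handful of milliseconds end to end, a mandatory ~65 ms floor on every write makes a continent-spanning quorum a non-starter, which is why Form3 chose a regional primary with a standby instead.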
So Form3 stepped backward. They built an active-standby architecture with AWS on the East Coast and GCP on the West Coast, relying on backup-and-restore rather than real-time replication.
Learning from Failure
Their first real incident came just two weeks after go-live, when an AWS outage knocked out their VPN connection to the payment scheme. The team debated failing over but ultimately waited for AWS to recover; in hindsight that was the right call, though McFarlane admitted it didn't feel like it at the time.
They're now working to close the gap: adding CockroachDB logical replication between clouds and replicating NATS event streams to the standby site, which should dramatically reduce recovery time. They're also building per-customer failover capability so individual tenants can rehearse disaster recovery without disrupting others on the shared platform.
Three Pillars That Made Multi-Cloud Work
Holditch closed with three pillars that made multi-cloud work in the UK:
- Cloud-agnostic technology choices
- Single logical data stores across clouds
- Treating each cloud provider as an availability zone
But he was equally direct about when not to bother. If your market doesn't value it, if your budget can't sustain it, or if you lack a strong platform engineering team to run it, triple-active multi-cloud is probably not worth the effort.
As he put it: "bankruptcy is kind of incompatible with uptime."
The Bottom Line
Form3's journey demonstrates that multi-cloud architectures are achievable with the right technology choices and engineering discipline. However, the US expansion story serves as a crucial reminder that architectural decisions must align with market expectations and business realities, not just technical ideals.
The team's willingness to share both their successes and their missteps provides valuable lessons for any organization considering multi-cloud strategies: sometimes the most sophisticated architecture isn't the right one for your customers.