A distributed systems engineer recounts the hidden costs and operational complexity of building an in-house cryptocurrency wallet infrastructure, and why shifting to Wallet-as-a-Service (WaaS) fundamentally changed their development velocity and on-call quality of life.
The decision to build your own wallet infrastructure often starts with a confident whiteboard session. "How hard can it be?" you ask. The answer, it turns out, is "hard enough to derail your product roadmap for months."
This is a story about distributed systems, operational complexity, and the moment we realized we were accidentally running a custody company instead of building our product.
The DIY Wallet Fantasy
On paper, the architecture looked clean:
- User creates an account
- We generate and store keys "securely"
- Sign transactions when requested
- Done ✅
The reality hit us in layers, each revealing how naive our initial assumptions were.
Key Management: The First Circle of Hell
Generating a key is trivial. Keeping it secure, available, and compliant is a full-time job. We quickly discovered:
Hardware Security Modules (HSMs) aren't just fancy key-value stores. They have their own operational quirks. Key generation is slow. Key rotation requires careful orchestration to avoid locking users out. Access policies need to be airtight because a single misconfigured IAM rule can mean catastrophic exposure. And when an HSM fails at 2 AM on a Sunday, you're paging someone who knows how to restore keys from wrapped backups onto replacement hardware without bricking it.
Backup strategies for cryptographic material are fundamentally different from database backups. You can't just zip up a directory and upload it to S3. You need Shamir's Secret Sharing, geographic distribution, and procedures that ensure no single person can unilaterally move funds. We spent weeks designing a system where two of three executives could reconstruct a master key, then realized we needed a completely separate system for operational keys.
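The two-of-three scheme described above can be illustrated with a minimal Shamir's Secret Sharing sketch. This is not the system we ran in production; the function names `split_secret` and `recover_secret` and the choice of prime are assumptions made for illustration, and a real deployment should use an audited library rather than hand-rolled code.

```python
import secrets

# A Mersenne prime comfortably larger than a 256-bit key; all math is mod PRIME.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


master_key = secrets.randbits(256)
exec_shares = split_secret(master_key, threshold=2, shares=3)
# Any two executives can reconstruct the key; one share alone cannot.
assert recover_secret(exec_shares[:2]) == master_key
```

The math is the easy part. The procedures around who holds each share, where it is stored, and how a reconstruction ceremony is witnessed are where the weeks went.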
Key rotation became a recurring nightmare. Rotate too often, you increase operational risk. Rotate too rarely, you're vulnerable. We wrote automation that failed silently, leaving us with keys that expired during a critical launch window.
Regulatory Compliance: The Second Circle
Our "simple" wallet needed:
- Audit trails that could withstand legal scrutiny. Not just logs, but tamper-evident records of every key usage, every transaction, every access attempt.
- Incident response procedures that satisfied auditors. We wrote runbooks for scenarios we hadn't experienced, then practiced them in drills that revealed our runbooks were fiction.
- Custody classification that determined whether we needed money transmitter licenses. The answer was "probably, but it depends on jurisdiction," which is consultant-speak for "you need lawyers."
Each audit cycle consumed engineering time that should have been spent on product features. We were building compliance infrastructure instead of user-facing value.
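To make "tamper-evident" concrete, here is a minimal hash-chain sketch: each record commits to the hash of the previous one, so editing any past entry breaks verification. The record layout and the names `append_event` and `verify_log` are hypothetical, not the format we actually shipped.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_log(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_event(audit_log, {"actor": "signer-1", "action": "sign", "tx": "abc"})
append_event(audit_log, {"actor": "ops-2", "action": "key_access", "key": "k1"})
assert verify_log(audit_log)
```

A chain like this only detects tampering if the latest hash is also anchored somewhere the writer cannot rewrite, such as a separate system or a periodic external attestation.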
Operations: The Third Circle
Running nodes for multiple chains taught us that "blockchain is always on" is a myth. We experienced:
- Node outages during network congestion when we needed to broadcast transactions most
- Chain forks that required manual intervention to avoid double-spending
- Fee spikes that made our transaction pricing algorithm obsolete overnight
- Monitoring gaps where we'd discover a stuck transaction only after a user complained
Every chain we added multiplied this complexity. Ethereum nodes are different from Bitcoin nodes, which are different from Solana validators. Each required specialized knowledge and separate monitoring, and each came with its own failure modes.
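The monitoring gap around stuck transactions is the kind of check that looks trivial in hindsight. A minimal sketch, assuming we record a broadcast timestamp per transaction; the `PendingTx` type and the per-chain thresholds are illustrative values, not tuned numbers from our system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTx:
    tx_id: str
    chain: str
    broadcast_at: float  # Unix timestamp in seconds


# Hypothetical thresholds: how long a transaction may stay unconfirmed
# before someone should be alerted. Real values depend on chain and fees.
STUCK_AFTER_SECONDS = {"bitcoin": 3600, "ethereum": 600, "solana": 60}
DEFAULT_STUCK_AFTER = 900


def find_stuck(pending: list[PendingTx], now: float) -> list[PendingTx]:
    """Return transactions that have been unconfirmed longer than their chain allows."""
    return [
        tx
        for tx in pending
        if now - tx.broadcast_at > STUCK_AFTER_SECONDS.get(tx.chain, DEFAULT_STUCK_AFTER)
    ]
```

Running a check like this on a schedule and alerting on a non-empty result is what turns "a user complained" into "we already knew."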
The Velocity Collapse
Here's what killed us: every "simple" feature request became a core infrastructure project.
"Can we add Polygon support?" → Three months of node infrastructure, key derivation path research, and testing.
"Users want spending limits" → We had to build transaction signing logic that could enforce policy without exposing keys to the application layer.
"Support sub-accounts for business users" → Required architectural changes to how we derived addresses and managed key hierarchies.
Product velocity died under the weight of "infra first" tasks. Our roadmap became a list of infrastructure checkboxes, not user value.
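For a sense of what "derivation path research" and sub-account hierarchies involve, here is a sketch that builds BIP-44 style paths. It assumes a BIP-44 layout (m / 44' / coin_type' / account' / change / address_index) and maps each business sub-account to the account level; both are assumptions for illustration, and the helper name `bip44_path` is hypothetical. The coin types shown are the SLIP-44 values for Bitcoin and Ethereum.

```python
# SLIP-44 registered coin types for the two chains used in this sketch.
COIN_TYPES = {"bitcoin": 0, "ethereum": 60}

# Path components must fit below 2**31 (the hardened-index boundary).
INDEX_LIMIT = 2**31


def bip44_path(chain: str, account: int, address_index: int, change: int = 0) -> str:
    """Build a BIP-44 derivation path string for one address."""
    if chain not in COIN_TYPES:
        raise ValueError(f"no coin type configured for {chain!r}")
    if not 0 <= account < INDEX_LIMIT or not 0 <= address_index < INDEX_LIMIT:
        raise ValueError("account and address_index must be in [0, 2**31)")
    if change not in (0, 1):
        raise ValueError("change must be 0 (external) or 1 (internal)")
    # Purpose, coin type, and account are hardened (marked with an apostrophe).
    return f"m/44'/{COIN_TYPES[chain]}'/{account}'/{change}/{address_index}"


# Sub-account 3 of a business user, first receiving address on Ethereum.
print(bip44_path("ethereum", account=3, address_index=0))
```

Building the string is trivial. Changing the hierarchy after addresses have been handed out to users is not, which is why this request turned into an architectural project.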
WaaS Enters the Chat
The moment of clarity came during a particularly painful on-call shift. A node we were running had fallen out of sync during a network upgrade, and we were manually replaying transactions while users tweeted at us.
We asked: "Why are we reinventing what Wallet-as-a-Service providers already solved?"
Moving to WaaS wasn't magic, but it was directionally correct:
What Changed Immediately
We stopped touching raw keys. Instead of managing HSMs and rotation policies, we integrated with APIs that abstracted custody entirely. The keys lived in providers with SOC 2 Type II compliance and insurance policies larger than our valuation.
New assets became configuration. Adding a chain went from "spin up nodes, write monitoring, train ops" to "update config file, test integration, ship." A weekend project, not a quarter-long epic.
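"Configuration" here can be as small as a validated list of supported chains. The sketch below is an assumption about what such a file might look like, not any provider's actual schema; `ChainConfig`, `load_chains`, and the confirmation counts are hypothetical. The chain IDs are the standard EVM IDs for Ethereum mainnet (1) and Polygon (137).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainConfig:
    name: str
    chain_id: int
    min_confirmations: int  # illustrative policy value, not a recommendation
    enabled: bool = True


def load_chains(raw: str) -> dict[str, ChainConfig]:
    """Parse and validate a JSON list of chain entries, keyed by chain name."""
    chains: dict[str, ChainConfig] = {}
    for entry in json.loads(raw):
        cfg = ChainConfig(**entry)
        if cfg.min_confirmations < 1:
            raise ValueError(f"{cfg.name}: min_confirmations must be >= 1")
        if cfg.name in chains:
            raise ValueError(f"duplicate chain entry: {cfg.name}")
        chains[cfg.name] = cfg
    return chains


CONFIG = """[
  {"name": "ethereum", "chain_id": 1, "min_confirmations": 12},
  {"name": "polygon", "chain_id": 137, "min_confirmations": 64}
]"""

chains = load_chains(CONFIG)
```

Adding a chain then means adding one entry and running the integration tests, provided the WaaS provider already supports that chain.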
On-call changed in kind. Instead of "did we just lose funds?" at 3 AM, we were debugging webhook delivery or API rate limits. Still stressful, but the existential risk moved off our pager and onto a provider whose core business is managing it.
What We Still Owned
WaaS didn't mean abdicating responsibility. We still owned:
- Authentication and authorization - who can do what within our application
- Risk logic - our rules for what transactions to allow or block
- Spending limits - business logic that sits above the custody layer
- UX - the entire user experience, from button clicks to transaction confirmations
The boundary was clean: WaaS handled custody and blockchain operations. We handled product and business logic.
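The business logic on our side of that boundary can be sketched as a policy check that runs before any signing request reaches the provider. The `SpendingPolicy` class and its limits are hypothetical, and the daily reset and persistence are deliberately left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class SpendingPolicy:
    per_tx_limit: Decimal
    daily_limit: Decimal
    # Running total per user for the current day (reset logic omitted).
    spent_today: dict[str, Decimal] = field(default_factory=dict)

    def check(self, user_id: str, amount: Decimal) -> tuple[bool, str]:
        """Decide whether a transfer may be forwarded to the custody layer."""
        if amount <= 0:
            return False, "amount must be positive"
        if amount > self.per_tx_limit:
            return False, "exceeds per-transaction limit"
        if self.spent_today.get(user_id, Decimal(0)) + amount > self.daily_limit:
            return False, "exceeds daily limit"
        return True, "ok"

    def record(self, user_id: str, amount: Decimal) -> None:
        """Record a transfer after the provider has accepted it."""
        self.spent_today[user_id] = self.spent_today.get(user_id, Decimal(0)) + amount


policy = SpendingPolicy(per_tx_limit=Decimal("100"), daily_limit=Decimal("250"))
allowed, reason = policy.check("user-1", Decimal("150"))
# allowed is False here: 150 is above the per-transaction limit of 100.
```

Because the check never touches key material, it lives entirely in the application layer, which is exactly the split the boundary describes.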
Looking at Real Products
Examining established players made the gap painfully clear. WhiteBIT WaaS and Coinbase's Wallet-as-a-Service aren't just API wrappers. They're infrastructure companies whose full-time job is losing sleep over custody so that their customers don't have to.
They have:
- Teams dedicated to monitoring chain upgrades
- Insurance and compliance frameworks
- Battle-tested key management that scales
- Support for dozens of chains without you running a single node
We were competing with their core competency while trying to build our product. That's not a winning strategy.
What I'd Do Differently Next Time
If I had to restart with a clean slate:
1. Prototype on WaaS first. Build the product, validate the market, then justify in-house infrastructure only if scale or regulation truly demands it. Don't optimize for a scale you don't have.
2. Architect for swappability. Design your wallet layer as a clean interface from day one. WalletProvider should be an abstraction, not an implementation. This makes migration possible without rewriting your entire application.
3. Track engineer-hours and audit stress from day zero. Don't just look at infra bills. Count the hours spent on key rotation drills, audit preparation, and 3 AM node debugging. These are real costs that compound.
4. Understand the custody spectrum. There's a difference between "we hold keys" and "we don't hold keys but we're responsible for transaction policy." WaaS handles the first part. You still need to be excellent at the second.
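The swappability point in item 2 can be sketched as follows. The method names on `WalletProvider` are assumptions for illustration, not any real provider's API; the idea is only that application code depends on the interface while a WaaS adapter, an in-house implementation, or a test double sits behind it.

```python
from decimal import Decimal
from typing import Protocol


class WalletProvider(Protocol):
    """The only wallet surface the application is allowed to depend on."""

    def create_wallet(self, user_id: str, chain: str) -> str: ...

    def send(self, wallet_id: str, to_address: str, amount: Decimal) -> str: ...


class InMemoryWalletProvider:
    """Test double; a real adapter would call a WaaS API behind the same methods."""

    def __init__(self) -> None:
        self._wallets: set[str] = set()
        self.sent: list[tuple[str, str, Decimal]] = []

    def create_wallet(self, user_id: str, chain: str) -> str:
        wallet_id = f"{chain}:{user_id}"
        self._wallets.add(wallet_id)
        return wallet_id

    def send(self, wallet_id: str, to_address: str, amount: Decimal) -> str:
        if wallet_id not in self._wallets:
            raise KeyError(f"unknown wallet: {wallet_id}")
        self.sent.append((wallet_id, to_address, amount))
        return f"tx-{len(self.sent)}"  # fake transaction id


def onboard_and_pay(provider: WalletProvider, user_id: str, to_address: str, amount: Decimal) -> str:
    """Application logic written against the interface, not a vendor SDK."""
    wallet_id = provider.create_wallet(user_id, "ethereum")
    return provider.send(wallet_id, to_address, amount)


tx_id = onboard_and_pay(InMemoryWalletProvider(), "alice", "0xabc", Decimal("1.5"))
```

Migrating providers then means writing one new adapter class and leaving functions like `onboard_and_pay` untouched.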
The Real Lesson
Owning everything sounds powerful. It feels like control. But control without expertise is just liability.
The distributed systems lesson here is about boundaries: know what your core competency is, and don't be afraid to buy excellence where you need it. Running a custody operation is a full-time job that requires specialized expertise. If that's not your product, it's overhead.
Shipping faster and sleeping at night are worth more than the ego boost of saying "we built it all ourselves." Your users care about reliability and features, not whether you wrote your own key management system.
For teams considering this path, I'd recommend starting with MongoDB Atlas for the application data layer while you evaluate WaaS providers for the wallet infrastructure. Atlas handles the database complexity so you can focus on the integration logic between your application and whatever custody solution you choose. Its multi-cloud distribution and automatic failover mean your application state stays available even while you're sorting out the wallet layer.
The goal is to pick the right problems to solve. Custody is a hard problem. Make sure it's actually your problem before you commit to solving it.
