A senior data consultant's postmortem on consolidating 40 legacy systems onto a single EKS platform reads less like a victory lap and more like a field manual for the failures nobody puts in the case study. The hard lessons sit in contract tests, API gateways, and the gap between a migration plan and a migration.

Most legacy modernization stories follow the same arc. A company drowning in decades of accumulated systems hires consultants, picks a shiny target platform, and emerges months later with a clean architecture diagram and a quote about agility. The numbers always go one direction. Costs down, velocity up, everyone happy.
The more honest version, the one dr.blmk tells in a recent HackerNoon piece, is the one where you replace 40 systems and spend the next year discovering what you broke. That framing matters because the failures in this kind of work are not exotic. They repeat across organizations, and they cluster around a few predictable decisions that look correct on a whiteboard and turn expensive in production.
The single-platform trap
The instinct when you inherit 40 legacy systems is to pick one target and force everything onto it. In this case the target was Amazon EKS, the managed Kubernetes service. On paper this is the sensible move. One platform means one operational model, one deployment pipeline, one set of skills to hire for. Consolidation is the whole point.
The problem is that 40 systems built over many years do not share assumptions. Some were stateless web apps that fit Kubernetes naturally. Others were batch jobs with runtimes measured in hours, stateful services with local disk dependencies, or scheduled processes that assumed a single long-lived host. Kubernetes can run all of these, but "can run" and "should run" are different claims. Put a nightly batch job that processes a multi-gigabyte file into a pod subject to aggressive autoscaling and eviction, and you inherit a class of failure the original system never had: the job can be terminated partway through its run.
The lesson here is not that EKS was the wrong choice. It is that a single-platform target is a goal, not a constraint. The teams that succeed treat the platform as the default and budget explicitly for the exceptions, rather than pretending the exceptions do not exist until they page someone at 3 AM.
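One way to make "budget explicitly for the exceptions" concrete is to triage the inventory before any migration work starts. The sketch below is illustrative only: the `Workload` fields, the 60-minute threshold, and the reason labels are assumptions chosen to mirror the workload types mentioned above, not anything taken from the postmortem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical inventory record for one legacy system."""
    name: str
    stateless: bool
    max_runtime_minutes: int
    needs_local_disk: bool
    assumes_single_host: bool


# Assumed cutoff for "long-running"; tune it to your own eviction tolerance.
LONG_RUNNING_MINUTES = 60


def platform_exceptions(w: Workload) -> list[str]:
    """Return the reasons a workload does not fit the platform default."""
    reasons = []
    if not w.stateless:
        reasons.append("stateful")
    if w.max_runtime_minutes >= LONG_RUNNING_MINUTES:
        reasons.append("long-running")
    if w.needs_local_disk:
        reasons.append("local-disk")
    if w.assumes_single_host:
        reasons.append("single-host")
    return reasons


def triage(workloads):
    """Split workload names into the default path and the exception budget."""
    plan = {"default": [], "exception": []}
    for w in workloads:
        key = "exception" if platform_exceptions(w) else "default"
        plan[key].append(w.name)
    return plan
```

The point of the exercise is the size of the "exception" list: if it is known up front, it can be staffed and scheduled instead of discovered by pager.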
Contract tests written too late

When you split a monolith or replace a system that 39 other systems talk to, the integration points become the entire risk surface. The standard answer is contract testing, where each consumer of an API declares what it expects and each provider verifies it can deliver that shape. Tools like Pact exist for exactly this.
The mistake described in the postmortem is a familiar one. Contract tests got written, but only after the new services were already deployed, as a way to document existing behavior rather than to constrain it. A contract test that ratifies whatever the service currently does will pass on day one and tell you nothing. It catches regressions only if it encodes intent before the implementation drifts. Written after the fact, it becomes a snapshot of your bugs.
The fix is sequencing. Contracts belong at the boundary before the replacement service exists, derived from what consumers actually need, not reverse-engineered from what the new code happens to return. That ordering is unglamorous and it slows the first few weeks of a migration, which is precisely why it gets skipped.
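To illustrate the ordering, here is a minimal hand-rolled sketch of a consumer-declared contract. It deliberately does not use Pact's API; the `ORDER_CONTRACT` fields and the `contract_violations` helper are hypothetical names invented for this example. What matters is that the expectation is written from the consumer's needs and exists before any provider response does.

```python
# Declared by the consumer before the replacement service is built.
# Field names and types are hypothetical.
ORDER_CONTRACT = {
    "order_id": str,
    "total_cents": int,
    "currency": str,
}


def contract_violations(contract, response):
    """List every way a provider response fails the consumer's contract.

    Extra fields in the response are tolerated; only fields the consumer
    declared are checked.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

Run against a replacement service's output, a non-empty result is a failed negotiation to resolve, not a test to update until it passes.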
The API gateway as a place to hide complexity
An API gateway sits in front of your services and handles cross-cutting concerns: routing, authentication, rate limiting, and request transformation. In a migration it earns its keep by letting you redirect traffic from old systems to new ones without consumers noticing, which is the strangler fig pattern in practice.
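Stripped to its core, the strangler fig routing decision can be sketched as a lookup on the request path. This is an assumption-laden illustration rather than any particular gateway's configuration: `choose_backend`, the `"new"` and `"legacy"` labels, and the path prefixes are all hypothetical.

```python
def choose_backend(path, migrated_prefixes):
    """Send a request to the new service only if its path has been migrated.

    A prefix matches the exact path or anything beneath it, so "/orders"
    covers "/orders" and "/orders/42" but not "/orders-archive".
    """
    for prefix in migrated_prefixes:
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return "new"
    return "legacy"
```

Migration then proceeds by growing the prefix list one route at a time, and rollback is removing an entry.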
The failure mode is treating the gateway as a dumping ground. Every time a new service did not quite match what a legacy consumer expected, the temptation was to patch the difference at the gateway with a transformation rule. One rule is fine. A hundred rules turn the gateway into a second undocumented system, one with its own deployment risk and no owner. You set out to retire legacy complexity and quietly rebuilt it in a layer nobody monitors.
Gateways should normalize protocols and enforce policy. When they start reshaping business payloads to paper over a service that does the wrong thing, the real fix is in the service. The gateway is just deferring the bill.
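One hedge against gateway sprawl is to audit the rule set mechanically. The sketch below assumes rules can be exported as dictionaries with a `name` and a `kind`; that shape, the `ALLOWED_KINDS` set, and the budget parameter are hypothetical and would need mapping onto whatever your gateway actually exposes.

```python
from collections import Counter

# Assumed taxonomy: rule kinds that normalize protocols or enforce policy.
ALLOWED_KINDS = {"route", "auth", "rate_limit", "protocol"}


def audit_gateway_rules(rules, payload_rule_budget=0):
    """Count rules by kind and flag those that reshape business payloads."""
    counts = Counter(rule["kind"] for rule in rules)
    payload_rules = [r["name"] for r in rules if r["kind"] not in ALLOWED_KINDS]
    return {
        "counts": dict(counts),
        "payload_rules": payload_rules,
        "over_budget": len(payload_rules) > payload_rule_budget,
    }
```

Run in CI, a check like this turns "one rule is fine" into an explicit budget, and every rule over it into a ticket against the owning service.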
What the pattern reveals
These mistakes share a common thread. Each one substitutes a structural decision for the harder work of understanding what the old systems actually did. Pick one platform so you do not have to reason about workload differences. Write contracts afterward so you do not have to negotiate them upfront. Patch the gateway so you do not have to fix the service. Every shortcut trades a small amount of early effort for a large amount of late, distributed, hard-to-attribute pain.
This is why modernization budgets blow up in ways that surprise the people who approved them. The migration plan accounts for moving the systems. It rarely accounts for the discovery work, the long tail of behaviors that were never documented because they lived in the heads of people who left years ago. A system that has run for a decade has been shaped by every production incident it survived. Replacing it means re-learning those lessons or re-living them.
None of this argues against modernization. Forty legacy systems carry a real cost, in security exposure, in hiring, in the compounding drag of technical debt. The argument is for honesty about where the work actually lives. The platform choice is the easy 20 percent. The contracts, the workload exceptions, and the discipline to resist every tempting gateway hack are the 80 percent that determines whether you end up with one clean system or 40 problems wearing a Kubernetes hat.
The value in a postmortem like this is that it names the traps before you fall into them. Anyone staring down a consolidation project can read it as a checklist of what to defend against, which is considerably cheaper than learning the same lessons one outage at a time.
