Sergiu Petean explains how a regulated European insurer transformed a legacy DevOps organization into a cloud‑native platform engineering group, using dynamic reference architectures, stakeholder‑driven KPIs, and open‑source tooling to improve safety, performance and innovation sovereignty.

From Legacy to Sovereignty: Platform Engineering Lessons for Insurance

Why platform engineering matters for safety and performance

In heavily regulated domains such as European insurance, the cost of a single security breach or a compliance miss can dwarf any development budget. Moving from a monolithic DevOps setup to a platform engineering model gives teams three concrete safety/performance benefits:

Isolation of risk – By exposing self‑service APIs and encapsulating shared services (IAM, observability, compliance checks) behind a well‑defined contract, the platform limits accidental configuration drift. The platform owns the guardrails; developers only consume them.
Predictable delivery – Platform‑wide DORA metrics (lead time, change failure rate, MTTR, deployment frequency) become a shared baseline. When the platform enforces a minimum test‑coverage threshold and automated security scanning on every change, reliability varies far less from squad to squad.
Cost transparency – A per‑change cost model (e.g., €49 → €13 per change) ties resource consumption directly to business outcomes, allowing finance leaders to audit spend without sacrificing safety.
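As a rough illustration of the shared DORA baseline mentioned above, the four metrics can be derived from plain deployment records. This is a minimal sketch, not the insurer's implementation: the `Deployment` record, its field names and the `dora_metrics` function are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                   # first commit of the change
    deployed_at: datetime                    # when the change reached production
    caused_failure: bool = False             # did it trigger a production incident?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four classic DORA metrics over a reporting window."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.caused_failure]
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    # MTTR is undefined when no failed deployment has been restored yet.
    mttr = sum(restore_times, timedelta()) / len(restore_times) if restore_times else None

    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore": mttr,
    }
```

Computing the metrics from one record type per deployment keeps the baseline identical across squads, which is what makes squad-to-squad comparison meaningful.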

Dynamic reference architectures replace static blueprints

Petean described the failure of a single, static reference diagram. Instead, the organization built clustered reference architectures that vary by:

Regulatory zone (GDPR, Solvency II, local data‑residency rules)
Talent profile (in‑house Rust expertise vs. outsourced Go teams)
Cloud topology (public‑cloud, private‑cloud, hybrid)

Each cluster is expressed as a set of reusable Helm charts and Terraform modules stored in a version‑controlled platform catalog. The catalog is consumed via a GitHub‑based CI pipeline that validates every change against a policy engine (OPA) before it reaches production. This approach keeps the architecture evolvable while guaranteeing that every new service inherits the same safety checks.
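The validation step can be pictured as a gate that runs every catalog change through a list of policy checks and blocks it on any violation. The sketch below is a plain-Python stand-in for that idea, not OPA or Rego; the manifest fields, the allowed regions and the registry prefix are all hypothetical values chosen for illustration.

```python
from __future__ import annotations

# Hypothetical policy data; a real catalog would load this from configuration.
ALLOWED_REGIONS = {"eu-central", "eu-west"}
APPROVED_REGISTRY = "registry.example.internal/"


def check_residency(manifest: dict) -> list[str]:
    """Workloads must declare a region inside the allowed residency zones."""
    region = manifest.get("region")
    if region not in ALLOWED_REGIONS:
        return [f"region {region!r} is outside the allowed residency zones"]
    return []


def check_images(manifest: dict) -> list[str]:
    """Container images must come from the approved registry."""
    return [
        f"image {image!r} is not from the approved registry"
        for image in manifest.get("images", [])
        if not image.startswith(APPROVED_REGISTRY)
    ]


def check_privileged(manifest: dict) -> list[str]:
    """Privileged containers are rejected outright."""
    if manifest.get("privileged", False):
        return ["privileged containers are not allowed"]
    return []


POLICIES = [check_residency, check_images, check_privileged]


def evaluate(manifest: dict) -> list[str]:
    """Return every violation; an empty list means the change may proceed."""
    violations: list[str] = []
    for policy in POLICIES:
        violations.extend(policy(manifest))
    return violations
```

A CI job would call `evaluate` on each changed manifest and fail the pipeline when the returned list is non-empty. Collecting all violations instead of stopping at the first lets a developer fix a change in one pass.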

Stakeholder‑driven KPIs bridge the board and the codebase

Traditional DevOps KPIs focus on engineering output. Petean extended the set to include:

Stakeholder	KPI	Safety/Performance impact
COO	Cost per change	Drives FinOps discipline, forces teams to eliminate waste
CFO	Compliance coverage ratio	Tracks the share of releases that pass mandatory audits against a 100 % target
CISO	Vulnerability‑to‑remediation time	Reduces exposure window for critical CVEs
CTO	Feature‑to‑market latency	Encourages rapid, yet safe, experimentation

The platform surfaces these metrics on a unified dashboard (Grafana + Prometheus) and feeds them into the quarterly business review. Because the numbers are tied to concrete platform services, the board can ask “What would happen to our compliance ratio if we cut the security scanning budget?” and receive an immediate, data‑driven answer.
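Two of the KPIs in the table reduce to simple arithmetic over release and vulnerability records. The following sketch shows one plausible way to compute them; the record fields (`audits_passed`, `found_at`, `fixed_at`) and both function names are assumptions for the example, not the platform's actual schema.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def compliance_coverage(releases: list[dict]) -> float:
    """Share of releases that passed every mandatory audit check."""
    if not releases:
        raise ValueError("no releases in the reporting period")
    passed = sum(1 for release in releases if release["audits_passed"])
    return passed / len(releases)


def mean_remediation_time(vulnerabilities: list[dict]) -> timedelta:
    """Average time from detection to fix, over vulnerabilities already fixed."""
    durations = [
        v["fixed_at"] - v["found_at"]
        for v in vulnerabilities
        if v.get("fixed_at") is not None
    ]
    if not durations:
        raise ValueError("no remediated vulnerabilities in the reporting period")
    return sum(durations, timedelta()) / len(durations)
```

Open vulnerabilities are excluded from the average here, which flatters the number; a dashboard would probably report the count of open findings alongside it.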

Open‑source tooling as the backbone of innovation sovereignty

To limit vendor lock‑in, the team built the stack largely from open‑source projects, many of them hosted by the CNCF, and added commercial tools only where needed:

Kubernetes for workload orchestration
Tekton for CI/CD pipelines (later replaced by GitHub Actions, which proved more stable)
OPA for policy‑as‑code
Prisma Cloud and Wiz (commercial scanners) for vulnerability scanning, wrapped in a custom vulnerability‑management service that de‑duplicates alerts and pushes actionable tickets to Opsgenie.
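The de‑duplication step in such a vulnerability‑management service could look roughly like the sketch below: findings that refer to the same vulnerability in the same artifact are merged, and the highest reported severity wins. The `Finding` type, the severity scale and the merge key are assumptions for illustration, and the ticket push to Opsgenie is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical severity scale; higher rank means more urgent.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    """One raw scanner result (illustrative shape)."""
    cve_id: str
    artifact: str
    severity: str
    scanner: str


def deduplicate(findings: list[Finding]) -> list[dict]:
    """Merge findings per (CVE, artifact) and sort the result by urgency."""
    merged: dict[tuple[str, str], dict] = {}
    for finding in findings:
        key = (finding.cve_id, finding.artifact)
        entry = merged.setdefault(
            key,
            {
                "cve_id": finding.cve_id,
                "artifact": finding.artifact,
                "severity": finding.severity,
                "scanners": set(),
            },
        )
        entry["scanners"].add(finding.scanner)
        # Keep the most severe rating any scanner reported for this pair.
        if SEVERITY_RANK[finding.severity] > SEVERITY_RANK[entry["severity"]]:
            entry["severity"] = finding.severity
    return sorted(
        merged.values(),
        key=lambda e: (-SEVERITY_RANK[e["severity"]], e["cve_id"], e["artifact"]),
    )
```

Each merged entry would become at most one ticket, so two scanners flagging the same issue no longer page the same team twice.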

Because most of the tooling is open source and runs on any conformant Kubernetes cluster, the organization retains the ability to relocate workloads from a hyperscaler to a private cloud mainly through configuration changes rather than rewrites. This innovation sovereignty is a strategic asset when regulatory pressure demands data‑locality.

Reducing cognitive load with custom team topologies

Petean introduced a Distributed DevOps (DDO) model that rotates engineers through short‑term, cross‑functional squads (API, observability, security, storage). Each DDO rotation lasts three to six months, after which the engineer moves to a new domain. Benefits include:

Focused expertise – Engineers avoid the “jack‑of‑all‑trades” trap by concentrating on a single subsystem for the length of a rotation.
Knowledge diffusion – Rotations spread best practices across the organization, lowering the overall cognitive load.
Clear ownership – The platform catalog records the responsible DDO squad for every service, simplifying incident triage.

Federated SRE as a service (OrgOps)

When the organization grew to 2 000 engineers, a single centralized SRE team became a bottleneck. The solution was a federated SRE model:

A core OrgOps team defines standards, provides shared tooling, and runs the compliance‑as‑code pipeline.
Individual squads adopt the standards and run their own “SRE‑as‑a‑service” instances, reporting back via shared Service Level Objectives (SLOs).
A production manager role aggregates incident data, ensuring that only alerts with runbooks reach the on‑call rotation.

This hierarchy preserves high reliability (SLO breach rate < 0.1 %) while keeping the cost of on‑call duty proportional to the value delivered.
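The rule that only alerts with runbooks reach the on‑call rotation amounts to a small routing step. Here is a minimal sketch under assumed names: the alert dictionaries, the `runbook_url` field and the `route_alerts` function are hypothetical and not taken from the talk.

```python
from __future__ import annotations


def route_alerts(alerts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split alerts into pageable ones and a backlog that still needs a runbook.

    An alert counts as pageable only when it carries a non-empty runbook link.
    """
    pageable: list[dict] = []
    needs_runbook: list[dict] = []
    for alert in alerts:
        if alert.get("runbook_url"):
            pageable.append(alert)
        else:
            needs_runbook.append(alert)
    return pageable, needs_runbook
```

Alerts in the second list are not discarded; they form a work queue for the owning squad to either write a runbook or delete the alert.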

Measuring impact: the extended DORA framework

Beyond the classic four DORA metrics, the team added two financial dimensions:

Cost per change – calculated from the cloud‑resource usage and CI minutes attributed to each deployed change.
Revenue impact per change – derived from A/B test results on new pricing models.

The extended dashboard showed a steady decline in cost per change (from €49 to €13) while the number of daily deployments rose from 3 to 12. This quantitative evidence convinced the CFO that the platform was a profit centre rather than a cost centre.
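The arithmetic behind these figures is straightforward. The sketch below assumes a simple cost model (cloud spend plus CI minutes at a flat rate, divided by the number of changes); the function names and the flat CI rate are illustrative assumptions, since the talk does not spell out the exact formula.

```python
def cost_per_change(
    cloud_cost_eur: float,
    ci_minutes: float,
    ci_rate_eur_per_minute: float,
    changes: int,
) -> float:
    """Total attributable spend divided by the number of deployed changes."""
    if changes <= 0:
        raise ValueError("changes must be positive")
    return (cloud_cost_eur + ci_minutes * ci_rate_eur_per_minute) / changes


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (negative means a decrease)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


if __name__ == "__main__":
    # Figures reported in the article: EUR 49 -> EUR 13 per change, 3 -> 12 deployments per day.
    print(f"cost per change: {relative_change(49, 13):+.1%}")
    print(f"daily deployments: {12 / 3:.0f}x")
```

Applied to the reported numbers, the cost per change fell by roughly 73 % while deployment frequency quadrupled.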

Key takeaways for Rust‑focused insurers

Treat the platform as a safety contract – expose only verified, Rust‑compiled binaries through the platform’s API gateway; for safe Rust, the compiler rules out whole classes of memory‑safety bugs before code ever reaches production, while any unsafe blocks still need review.
Automate compliance at build time – use cargo-audit to check dependencies against the RustSec advisory database, and add custom lint rules to embed regulatory checks into the build pipeline.
Expose platform KPIs as Rust metrics – instrument services with a metrics crate and expose an endpoint for Prometheus to scrape, allowing the same safety‑first mindset to extend to observability.
Invest in open‑source – the ability to move a Rust‑based microservice from AWS to an on‑prem Kubernetes cluster without rewriting code is the core of innovation sovereignty.

Closing thoughts

The journey from legacy DevOps to a sovereign, safety‑centric platform engineering organization is not a single project but a series of incremental experiments. By anchoring every decision in stakeholder‑driven KPIs, dynamic reference architectures, and open‑source tooling, an insurer can achieve both regulatory compliance and the performance needed to compete in the AI‑driven future.

From Legacy to Sovereignty: Driving the Future of Insurance through Platform Engineering - InfoQ

#Platform Engineering #Insurance #regulation #Open Source #Rust
