CI/CD Expert Robert Erez on Kubernetes, GitOps, and the Future of Software Delivery

DevOps Reporter

Robert Erez of Octopus Deploy joins The Pragmatic Engineer to break down why continuous deployment overshoots most teams' needs, how GitOps misleads organizations into cramming secrets into repos, and why AI will flip CI/CD priorities from speed to risk.

Robert Erez spent years building deployment infrastructure at Skype, where canary releases taught him to ship changes to small user slices before full rollouts. He watched deployments shift from manual processes to automated pipelines, then to Kubernetes-driven systems that pull desired state from version control. Now a principal engineer at Octopus Deploy, Erez sees the same patterns repeating across the industry: teams chasing continuous deployment when continuous delivery would serve them better, adopting GitOps dogma without questioning whether Git belongs at the center, and overlooking the on-prem realities that institutions like banks and governments still face.

He sat down with Gergely Orosz to talk through what actually works in production. The conversation covers Kubernetes, feature flags, progressive delivery, ephemeral environments, and the ways AI agents will reshape deployment priorities.

Roll Forward, Never Backward

When Skype ran Kubernetes clusters for canary deployments, Erez learned a hard lesson about stateful systems. Rollbacks sound safe until you realize the code at v2 expects a different database schema than v1. You cannot roll code backward and hope the schema reverts with it. The fix: roll forward to v3, patch the bug, and keep the schema intact.

This principle holds across most systems that carry state. Erez recommends treating every production failure as a signal to advance, not retreat. Feature flags make this easier. Toggle a broken feature off instead of redeploying an entire release at 2 a.m. You stop the bleeding, diagnose calmly, and ship the fix when you are ready.

GitOps Is Not About Git

Erez pushes back on the industry's obsession with cramming everything into repositories. GitOps rests on four pillars: declarative configuration, versioned and immutable state, pulled (not pushed) delivery, and continuous reconciliation. Git satisfies these constraints, but nothing in them requires Git. The term itself has created dogma. Teams force secrets into Git repos where they do not belong, and they treat Git as the only source of truth when other systems fit the model just as well.
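
The four pillars can be illustrated without any Git at all. The sketch below is a minimal, hypothetical reconciliation loop: `Cluster`, `diff`, `reconcile`, and `desired_from_store` are names invented for this example, and the "store" is a plain function standing in for any versioned source of desired state. It is not how any particular GitOps tool is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for live cluster state: service name -> version."""
    running: dict[str, str] = field(default_factory=dict)


def diff(desired: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Compare declared state with live state and list the drift."""
    return {
        "create": sorted(k for k in desired if k not in actual),
        "update": sorted(k for k in desired if k in actual and desired[k] != actual[k]),
        "delete": sorted(k for k in actual if k not in desired),
    }


def reconcile(fetch_desired, cluster: Cluster) -> dict[str, list[str]]:
    """One pass of the loop: pull desired state, then converge the cluster on it."""
    desired = fetch_desired()  # pulled by the agent, not pushed by a pipeline
    drift = diff(desired, cluster.running)
    for name in drift["create"] + drift["update"]:
        cluster.running[name] = desired[name]
    for name in drift["delete"]:
        del cluster.running[name]
    return drift


def desired_from_store() -> dict[str, str]:
    """Assumed source of truth; could be Git, an object store, or a database."""
    return {"api": "v3", "worker": "v2"}


cluster = Cluster(running={"api": "v2", "legacy": "v1"})
first = reconcile(desired_from_store, cluster)   # fixes the drift
second = reconcile(desired_from_store, cluster)  # nothing left to do
```

A real agent would run `reconcile` on a timer or in response to change notifications. The point of the sketch is that the loop only needs something it can pull a declared, versioned state from.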

At scale, Git becomes a bottleneck. Some companies run thousands of independent Kubernetes clusters pulling state from a single repo. The repo gets throttled. Workarounds multiply. Pull-based GitOps does not scale infinitely for free.

Continuous Delivery Beats Continuous Deployment

Shipping every commit to production is overkill for most teams. Erez argues continuous delivery holds more value. Changes flow through automated tests, the deployment pipeline itself gets validated, and a human decides when to push to production. You can click a button once a week or automate the push. The choice stays yours.

Continuous deployment makes sense only when your test suite and monitoring are mature enough to catch issues before users do. Most teams lack that confidence, and pretending otherwise invites incidents.
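
The difference between the two models comes down to a single gate at the end of the pipeline. As a rough sketch, with `Candidate`, `is_releasable`, and `promote` all being illustrative names rather than any tool's API:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A build that has gone through the pipeline (hypothetical model)."""
    commit: str
    tests_passed: bool
    staging_deploy_ok: bool  # the deployment process itself was exercised


def is_releasable(c: Candidate) -> bool:
    """Continuous delivery: every candidate is validated up to this point."""
    return c.tests_passed and c.staging_deploy_ok


def promote(c: Candidate, approved: bool, auto_promote: bool = False) -> str:
    """Decide what happens at the production gate.

    auto_promote=True models continuous deployment: no human in the loop.
    auto_promote=False models continuous delivery: a person clicks the button.
    """
    if not is_releasable(c):
        return "rejected"
    if auto_promote or approved:
        return "deployed"
    return "waiting"


good = Candidate(commit="abc123", tests_passed=True, staging_deploy_ok=True)
bad = Candidate(commit="def456", tests_passed=False, staging_deploy_ok=True)
```

Everything before the gate is identical in both models. Switching from delivery to deployment is a policy change, not a rebuild of the pipeline.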

Kubernetes Won for a Reason

Kubernetes succeeded because it gave teams a standard interface for orchestrating containers across machines. Before Kubernetes, each cloud provider and each orchestrator had its own API. Developers wrote adapters for every platform. Kubernetes became the Linux of orchestration: a common layer that abstracted the underlying infrastructure.

Erez notes Kubernetes did not win because it was simple. It won because it solved real problems at scale, and enough companies contributed to its development that it stayed vendor-neutral. On-prem institutions, including banks and government agencies, adopt Kubernetes because they need control over hardware, upgrades, and downtime windows. Cloud-native SaaS does not fit their regulatory requirements.

Platform Teams Earn Their Keep at Scale

Platform teams make sense when multiple engineering groups share infrastructure. In a startup with one team, a platform team is overhead. In a company with dozens of teams building microservices, a platform team provides standardized tooling, deployment pipelines, and guardrails.

Erez sees platform teams as force multipliers. They reduce duplicated effort, enforce consistency, and free product teams to focus on business logic. The catch: platform teams must ship things that product teams actually want to use. Forcing adoption breeds resentment and workarounds.

Feature Flags: Addictive and Necessary

Feature flags let you decouple deployment from release. You ship code to production behind a flag, then enable it for specific users or cohorts. When something breaks, you flip the flag off instead of rolling back an entire release.
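
One common way to enable a flag for a cohort is to hash each user into a stable bucket and compare it against a rollout percentage. The sketch below assumes that approach; `bucket` and `is_enabled` are hypothetical helpers, not the API of any flag service.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int, killed: bool = False) -> bool:
    """Return True if this user should see the feature.

    killed=True is the kill switch: it turns the feature off for everyone
    without a redeploy.
    """
    if killed:
        return False
    return bucket(flag, user_id) < rollout_percent
```

Because the bucket depends only on the flag name and the user ID, a user who sees the feature at 10% still sees it at 50%, and flipping `killed` turns it off for everyone at once.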

The danger: flags accumulate. Teams add flags for every small change, then forget to remove them. Dead code piles up behind dormant flags, and the codebase becomes harder to reason about. Erez compares flag cleanup to gardening. You must pull weeds from the codebase regularly or they overrun the garden.
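
The weeding can be partly automated. As a sketch, assume a flag counts as stale once it has sat fully on or fully off for longer than some threshold; the `Flag` record, the 30-day default, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Flag:
    """Hypothetical flag record: name, rollout state, and last change date."""
    name: str
    rollout_percent: int
    last_changed: date


def stale_flags(flags: list[Flag], today: date, max_age: timedelta = timedelta(days=30)) -> list[str]:
    """List flags that are fully on or fully off and have not changed recently.

    A flag stuck at 0% or 100% no longer gates a decision, so the code
    behind it is a candidate for removal.
    """
    return [
        f.name
        for f in flags
        if f.rollout_percent in (0, 100) and today - f.last_changed > max_age
    ]


flags = [
    Flag("new-checkout", 100, date(2024, 3, 1)),  # fully rolled out months ago
    Flag("dark-mode", 25, date(2024, 1, 1)),      # old, but still mid-rollout
    Flag("beta-search", 0, date(2024, 5, 20)),    # off, but changed recently
]
report = stale_flags(flags, today=date(2024, 6, 1))
```

Running a report like this on a schedule turns flag cleanup from a memory exercise into a recurring task.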

Ephemeral Environments Replace Static Staging

A few testers used to fight over a handful of static test environments. Today, spinning up a full environment for a feature branch takes minutes. The environment exists while you develop, runs automated tests, and gets torn down after merge.

Ephemeral environments speed up feedback loops. You test against real infrastructure instead of shared staging servers that other developers may be modifying. The tradeoff is cost: running a parallel environment for every branch adds to the compute bill. Teams must set policies around environment lifetime and cleanup.
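
A lifetime policy can be as simple as "tear down on merge, or after a fixed time-to-live, whichever comes first." The sketch below assumes exactly that rule; `Environment`, `should_tear_down`, and the 48-hour default are illustrative choices, not a recommendation from the interview.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Environment:
    """Hypothetical record of a per-branch environment."""
    branch: str
    created_at: datetime
    merged: bool


def should_tear_down(env: Environment, now: datetime, ttl: timedelta = timedelta(hours=48)) -> bool:
    """Tear down once the branch has merged or the environment outlives its TTL."""
    return env.merged or now - env.created_at > ttl


now = datetime(2024, 6, 3, 12, 0)
envs = [
    Environment("feature/a", datetime(2024, 6, 3, 9, 0), merged=False),   # fresh, keep
    Environment("feature/b", datetime(2024, 5, 30, 9, 0), merged=False),  # expired
    Environment("feature/c", datetime(2024, 6, 3, 11, 0), merged=True),   # merged
]
to_remove = [e.branch for e in envs if should_tear_down(e, now)]
```

The TTL acts as a safety net for abandoned branches, which are often where the unexpected cost comes from.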

AI Flips CI/CD Priorities from Speed to Risk

Today, shaving ten minutes off a CI build matters because human developers sit idle waiting for results. AI agents do not context-switch. They write code and babysit slow pipelines without complaint. Speed becomes less important.

Risk becomes the priority. A bug shipped by an AI agent can cascade through systems faster than the same mistake made by a human developer. Erez predicts teams will run slower, more thorough test suites, including tests that take hours, because the cost of missing a bug outweighs the cost of waiting. The calculus changes when the actor writing code does not care about wait times.

Getting Started with Progressive Delivery

Erez recommends starting small. Pick one service, add feature flags, and deploy changes behind a toggle. Monitor the service for errors. Gradually expand the rollout to more users. Once you trust the process, add canary deployments: route a percentage of traffic to the new version and compare metrics.
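
The canary step, routing a percentage of traffic and comparing metrics, can be sketched as a small decision function. The step sizes, the error-rate tolerance, and the name `next_weight` are assumptions chosen for illustration; real rollouts usually compare more than one metric.

```python
# Assumed traffic steps for the canary, in percent.
STEPS = [5, 25, 50, 100]


def next_weight(
    current: int,
    baseline_error_rate: float,
    canary_error_rate: float,
    tolerance: float = 0.01,
) -> int:
    """Return the next canary traffic percentage.

    If the canary's error rate exceeds the baseline by more than the
    tolerance, send it no traffic (0). Otherwise advance to the next step.
    """
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0
    for step in STEPS:
        if step > current:
            return step
    return 100  # already fully rolled out
```

For example, `next_weight(5, 0.01, 0.01)` advances to 25, while `next_weight(5, 0.01, 0.05)` drops the canary to 0 because its error rate is well above the baseline.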

Do not adopt progressive delivery across your entire fleet at once. Teams that try to transform everything overnight burn out and revert to old habits. Incremental adoption builds muscle memory and trust in the tooling.

Self-Hosted CI/CD Still Matters

Some organizations cannot use cloud-based CI/CD services. Banks, government agencies, and defense contractors require full control over their infrastructure. Data cannot leave their networks. Self-hosted tools like Octopus Deploy and Jenkins give these teams the control they need.

Erez expects this segment to persist. Cloud providers will not meet every regulatory requirement, and some teams will always prefer owning their deployment stack.


Key Takeaways

  • Roll forward, not backward. When systems carry state, patching ahead beats reverting code and hoping the database follows.
  • The four GitOps pillars do not require Git. Declarative, versioned, pulled, and reconciled describe a pattern, not a specific tool.
  • Continuous delivery is more practical than continuous deployment. Validate the pipeline, decide when to push, and avoid the overhead of shipping every commit.
  • Feature flags decouple deployment from release. Use them to stop the bleeding, but clean up dead flags regularly.
  • Ephemeral environments speed feedback. Spin up per-branch, tear down after merge, and watch your cost.
  • AI agents change the CI/CD equation. Speed matters less when machines wait; risk mitigation matters more.
