Industry experts at QCon London 2026 reframe data engineering as rigorous software engineering applied to data systems, emphasizing foundational principles over tools in evolving architectures.

At QCon London 2026, panelists Fabiane Nardon (TOTVS), Matthias Niehoff (codecentric AG), Adi Polak (Confluent), and Sarah Usher articulated a paradigm shift in data engineering: it is software engineering applied to data, not a distinct discipline. The distinction matters for system reliability and long-term maintainability: when organizations treat data pipelines as software artifacts, with version control, testing, and CI/CD, they are far better placed to prevent costly data corruption and processing failures.
Software Rigor in Data Systems
Matthias Niehoff emphasized that data engineering requires the same discipline as software development: "Your practices from software engineering – testing, CI/CD – are actually valuable in the data world." Sarah Usher described how she applies test-driven development (TDD) to data pipelines: "I do outside-in TDD for data pipelines, applying the same logical rigor as service development." Working test-first helps catch schema drift and data-quality regressions before they cascade through downstream systems.
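Usher did not share code, but a minimal sketch, assuming a hypothetical pandas-based `clean_orders` transformation, shows what a test written before the implementation might look like:

```python
# Hypothetical illustration, not code from the panel: a pytest-style test
# written ahead of the transformation it exercises, in the spirit of
# outside-in TDD for pipelines.
import pandas as pd


def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing order_id and normalise amounts to floats."""
    cleaned = raw.dropna(subset=["order_id"]).copy()
    cleaned["amount"] = cleaned["amount"].astype(float)
    return cleaned[["order_id", "amount"]]


def test_clean_orders_enforces_schema_and_quality():
    raw = pd.DataFrame(
        {"order_id": ["a1", None, "a3"], "amount": ["10.5", "3.0", "7"]}
    )
    result = clean_orders(raw)

    # Schema contract: downstream consumers depend on exactly these columns.
    assert list(result.columns) == ["order_id", "amount"]
    assert result["amount"].dtype == float

    # Data-quality contract: no null keys may leak downstream.
    assert result["order_id"].notna().all()
```

Running such tests in CI gives a pipeline the same regression safety net a product service would have.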
The Fallacy of Role Separation
Panelists challenged rigid role definitions. Usher stated: "I believe in data engineering, not data engineers. Distributed systems principles apply whether building product services or data pipelines." Attempts to silo "platform engineers" from "data modelers" create architectural fragility. Adi Polak noted that production emergencies reveal why deep system understanding matters: "Who solves a Kafka broker failure at 2 AM? The engineer who understands distributed systems internals, not just pipeline configuration."

AI's Impact on Data Foundations
The AI boom intensifies pressure on data platforms. Polak observed: "AI applications demand low-latency data access – suddenly five-minute SLAs aren't good enough." This forces reevaluation of indexing, caching, and streaming architectures. However, panelists cautioned against chasing AI hype without data fundamentals. Fabiane Nardon warned: "Without quality data, agents hallucinate. Regulated industries especially must prioritize encryption and governance before generative AI."
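The panel did not prescribe an implementation, but the SLA point can be made concrete with a small freshness check; the event layout and thresholds below are assumptions for illustration only:

```python
# Illustrative sketch: check data freshness against an assumed latency budget.
# The five-minute figure mirrors the SLA mentioned on the panel; everything
# else here is an assumption for this example.
from datetime import datetime, timedelta, timezone

FRESHNESS_BUDGET = timedelta(minutes=5)


def is_fresh(event_timestamps: list[datetime]) -> bool:
    """Return True if the newest event arrived within the freshness budget."""
    if not event_timestamps:
        return False
    lag = datetime.now(timezone.utc) - max(event_timestamps)
    return lag <= FRESHNESS_BUDGET


# A batch whose newest record is 12 minutes old fails a 5-minute budget; gaps
# like this are what push teams from periodic batch loads toward streaming.
stale_batch = [datetime.now(timezone.utc) - timedelta(minutes=12)]
assert not is_fresh(stale_batch)
```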
Architectural Guardrails for Evolution
When evaluating technologies, Niehoff advocated for standards-based flexibility: "Use open formats like Apache Iceberg so you're not locked into vendors." For hybrid cloud environments, he emphasized cost-aware design: "Cloud egress fees can bankrupt poorly architected systems. Query pushdown to on-prem nodes often beats bulk data transfer."
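Niehoff named Apache Iceberg only as an example of an open table format. As a rough illustration, the PySpark sketch below, with an assumed catalog name and warehouse path and the Iceberg Spark runtime on the classpath, creates a table that any Iceberg-compatible engine can later read:

```python
# Rough sketch, not a recommended production setup: create an Iceberg table
# through Spark. Because Iceberg is an open table format, other engines
# (Trino, Flink, and others) can read the same table, which is what keeps
# the data vendor-neutral. The catalog name ("local") and warehouse path
# are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# The table format, not the processing engine, becomes the contract between systems.
spark.sql(
    """
    CREATE TABLE IF NOT EXISTS local.db.orders (
        order_id STRING,
        amount DOUBLE,
        created_at TIMESTAMP
    ) USING iceberg
    """
)
```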
In regulated sectors (healthcare, finance), Polak noted tradeoffs: "Open source lacks embedded governance controls. Enterprise platforms offer column masking and RBAC, but require careful vendor due diligence." The point echoes the philosophy behind memory-safe languages such as Rust: explicit tradeoffs yield safer systems than implicit "magic."
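Column masking itself is a platform feature rather than something the panel demonstrated, but the intent is easy to sketch; in this hedged example the column name and hashing scheme are assumptions, not any vendor's API:

```python
# Illustration of the idea behind column masking, not a specific product
# feature: hash a PII column so downstream consumers can still join on it
# without seeing raw values. The column name ("email") and SHA-256 choice
# are assumptions for this sketch.
import hashlib

import pandas as pd


def mask_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    masked = df.copy()
    masked[column] = masked[column].map(
        lambda value: hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    )
    return masked


customers = pd.DataFrame({"customer_id": [1, 2], "email": ["a@x.io", "b@y.io"]})
print(mask_column(customers, "email"))
```

Managed platforms apply this kind of policy centrally and tie it to RBAC rather than leaving it to ad hoc pipeline code, which is the governance gap Polak was pointing at.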
Sustainable Skill Investment
With rapid tool proliferation, the panel advocated for foundational skills over specific frameworks. Usher stated: "Upskilling existing teams beats chasing scarce specialists. Teach distributed systems principles, not just Spark APIs." Polak reinforced this with career advice: "After junior level, impact matters more than toolkit depth. Can you navigate organizational constraints to deliver business value?"
Conclusion: Data as Core Architecture
The panel's consensus: data can no longer be an application byproduct. Niehoff predicted convergence: "Data platforms become central, with applications and agents connecting to them." This demands engineers who transcend traditional role boundaries, equally comfortable optimizing SQL queries, debugging Kubernetes networking, and implementing row-level security. As with memory-safe languages, the future belongs to those who architect systems where correctness is inherent rather than bolted on.