Virtual Panel - AI in the Trenches: How Developers Are Rewriting the Software Process
#AI

Serverless Reporter

A panel of engineering leaders discusses how AI tools are fundamentally changing software development roles, from code authorship to orchestration, and explores the practical realities of integrating AI into production workflows.


The integration of AI-assisted tools into software development has moved beyond hype into daily practice, fundamentally reshaping how engineering teams build, review, and maintain systems. A recent virtual panel featuring Mariia Bulycheva (Intapp), Phil Calçado (Outropy), Andreas Kollegger (Neo4j), and May Walter (Hud.io) revealed that the most significant changes aren't in the code itself, but in the roles, processes, and architectural thinking required to make AI a productive partner rather than a source of technical debt.

From Authors to Orchestrators: A Fundamental Role Shift

The panelists unanimously observed that developers are transitioning from being sole authors of code to becoming orchestrators of AI-generated output. This shift introduces a new architectural concern: "context engineering"—designing the inputs, scaffolding, and guardrails that generative agents need to produce production-ready code.

May Walter, CTO of Hud.io, explains this evolution: "Pre-AI, architecture was about ownership between teams and scalable interfaces. This introduces a new dimension: context architecture. Designing the inputs, scaffolding, and guardrails an agent needs to generate production-ready code is becoming a core part of the system, which streamlines the ability to build fast in complex environments like distributed and event-based systems."

This architectural shift becomes particularly evident in large-scale systems. Phil Calçado, CEO of Outropy, shared a concrete example from their consumer engagement platform: "We needed to change how we handle time zones in scheduling. The code change itself was maybe ten lines, but the real work was spelunking through hundreds of places that touch scheduling, figuring out each one's assumptions, and slapping unit tests to assert that the call site wasn't going to break with the change in behavior."

With tools like Cursor and Claude Code, the effort originally projected at six months shrank dramatically. The AI helped surface all impacted locations, generate unit tests for each, and split the rollout into small, subsystem-grouped PRs with context-aware descriptions for each owning team.
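
The panelists didn't share code, but the pattern Calçado describes, pinning each call site's current behavior with a small test before the cross-cutting change lands, can be sketched in a few lines. The scheduling function, values, and time zones below are illustrative assumptions, not taken from Outropy's system:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical call-site behavior to pin down before changing how the
# scheduler handles time zones; next_send_time() is illustrative only.
def next_send_time(local_time: str, tz_name: str) -> datetime:
    """Interpret a wall-clock time in the user's time zone and return it in UTC."""
    naive = datetime.fromisoformat(local_time)
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(ZoneInfo("UTC"))

def test_scheduling_call_site_keeps_utc_contract():
    # Characterization test: 09:00 New York time on 2024-01-15 (EST, UTC-5)
    # must keep resolving to 14:00 UTC after the refactor.
    result = next_send_time("2024-01-15T09:00:00", "America/New_York")
    assert result == datetime(2024, 1, 15, 14, 0, tzinfo=ZoneInfo("UTC"))
```

A suite of such tests, one per call site, is what lets an AI-generated change be split into small PRs that each owning team can review with confidence.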

The Context Problem: AI's Fundamental Limitation

A recurring theme across the panel was AI's struggle with specialized, domain-specific code. Andreas Kollegger, Senior Developer Advocate at Neo4j, notes: "Large language models (LLMs) struggle with highly specialized code requiring deep domain expertise and a holistic view of global architecture. Our codebase alone exceeds the capacity of any LLM context window, and the models themselves haven't been trained on the unique complexity within it. Simply put, AI cannot invent what it doesn't understand."

This limitation creates a new architectural pattern: context curation. Mariia Bulycheva, Senior Machine Learning Engineer at Intapp, emphasizes that "AI is only as effective as the context you provide (codebase, documentation, architecture, experimental setup for the online test). In large systems, this means curating not just code snippets but model performance data, logs, and experiment history to guide AI tools effectively."

The solution isn't better models, but better context engineering. May Walter argues: "What most teams underestimate is that the models are already good enough (and getting better) - the missing ingredient is organizational context. Waiting for 'better models' is a distraction. The real challenge is designing systems that provide the context needed to generate production-grade code: your architecture, coding standards, data boundaries, and business priorities."
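
None of the panelists detailed their internal tooling, but the kind of context curation Bulycheva and Walter describe can be sketched as a small assembly step that packages organizational knowledge alongside the code being changed. The file paths and section names below are assumptions for illustration only:

```python
from pathlib import Path

# Illustrative context sources; the paths are assumptions, not any
# panelist's actual repository layout.
CONTEXT_SOURCES = {
    "Architecture overview": "docs/architecture.md",
    "Coding standards": "docs/coding-standards.md",
    "Data boundaries": "docs/data-boundaries.md",
}

def build_agent_prompt(task: str, snippet: str) -> str:
    """Bundle the task and code under change with curated organizational context."""
    sections = [f"# Task\n{task}", f"# Code under change\n{snippet}"]
    for title, path in CONTEXT_SOURCES.items():
        source = Path(path)
        if source.exists():  # skip context the repository doesn't provide
            sections.append(f"# {title}\n{source.read_text()}")
    return "\n\n".join(sections)
```

The mechanism matters less than the habit: whatever the agent generates is only as good as the curated material a step like this feeds it.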

Onboarding Acceleration and Its Hidden Costs

AI tools have dramatically lowered the barrier to contribution, particularly for junior developers. Phil Calçado observed this with their summer interns: "Dropping into a decade-old Rails codebase with thousands of moving parts is intimidating. But being able to say to Cursor or Claude Code, 'I'm a third-year student who knows Python and C++, explain this Rails code to me using parallels to what I know' meant they could get productive in weeks instead of burning them just figuring out the basics."

However, this acceleration comes with trade-offs. The panelists noted that while AI helps juniors navigate unfamiliar codebases faster, trust and long-term skill growth depend on mentorship, runtime feedback, and a strong ownership culture. The risk is creating "shallow" understanding—developers who can generate code without understanding why it behaves the way it does.

May Walter counters this concern: "When code generation is paired with runtime feedback, junior developers gain exposure to systems thinking from the start: how architecture behaves under load, how dependencies interact, and how changes ripple into business outcomes. Instead of spending months grinding through low-value work, they're now able to tackle more of the team's load. Done well, this doesn't skip steps - it accelerates them."

Measuring Productivity: The Vanity Metric Trap

One of the most significant insights from the panel was how AI inflates traditional productivity metrics, making them misleading. May Walter explains: "Accepted lines, commits, PRs: AI inflates those instantly, but they're vanity metrics for engineering productivity. The real signals live downstream. Release stability, incident frequency, time spent on-call, and even code churn tell us whether we're actually moving faster or just generating more fragility."

Asked whether his team formally measures AI-driven productivity, Phil Calçado was even more direct: "Not formally. And frankly, I don't buy most of the 'productivity' numbers being thrown around. In software you can massage metrics until they say whatever you want, and the AI hype cycle has made that worse. The fact that people are seriously counting lines of code again just to juice a funding round or goose a stock price is embarrassing."

The panelists agreed that meaningful metrics must be rethought: AI pushes speed to the front of the pipeline, and unless validation loops are tight, the cost surfaces later in bugs, regressions, and burned-out teams.
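
The panel didn't prescribe specific instrumentation, but one of the downstream signals Walter names, code churn, can be approximated straight from version control. The sketch below totals lines added per file over a recent window so hotspots float to the top; the window and the proxy are arbitrary choices, not a metric anyone on the panel endorsed:

```python
import subprocess
from collections import defaultdict

def lines_added_by_file(since: str = "30 days ago") -> dict[str, int]:
    """Rough churn proxy: total lines added per file in the given window."""
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    totals: dict[str, int] = defaultdict(int)
    for line in log.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit():  # skips binary files ("-")
            added, _deleted, path = parts
            totals[path] += int(added)
    return dict(sorted(totals.items(), key=lambda item: -item[1]))

if __name__ == "__main__":
    for path, added in list(lines_added_by_file().items())[:10]:
        print(f"{added:6d}  {path}")
```

Files that keep reappearing at the top of a list like this are candidates for the fragility Walter warns about, regardless of how many AI-assisted PRs were merged.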

Cultural and Ethical Guardrails

The technical integration of AI tools requires equally important cultural shifts. Mariia Bulycheva notes: "The biggest change was in mindset. Teams had to move away from expecting AI suggestions to be 'correct' and instead treat them as starting points that require thorough validation, discussion, and testing."

This cultural shift extends to accountability. Phil Calçado emphasizes: "Culturally, we set expectations early: just because an AI tool wrote the change doesn't mean it isn't your code. You still own it, and you need to treat every line as if you typed it yourself."

Ethical considerations are also paramount. Andreas Kollegger shares that Neo4j established an AI Ethics Board early on: "All technology can be a force of good, yet it also requires intentional thought, action, and guidance. Because we're trusted with customer data, our developers need to apply a heightened sensitivity to any area where AI is introduced as an assistant."

Trust Through Runtime Validation

Perhaps the most important insight is how trust in AI output is established. May Walter argues: "Trust in AI output has to be earned, and the only way to earn it is with context. Every AI-generated change goes through the same standards as human-written code - reviews, tests, validation - but with one extra bar: it has to prove itself once it runs."

This runtime validation creates a feedback loop where AI becomes a partner that can be trusted because it's reasoning with the same signals engineers rely on. The panelists emphasized that AI should be treated as "first draft code" that always goes through unit tests and peer review, with the developer who submits the code remaining accountable regardless of AI assistance.

The Underutilized Frontier: Runtime-Aware Tooling

While raw code generation gets most of the attention, the panelists identified runtime-aware tooling as the underutilized frontier. May Walter explains: "The underutilized frontier isn't writing code faster, it's building validation loops and runtime-aware tooling that increase certainty before those changes ever get deployed."

This includes AI-assisted debugging, experiment setup, and documentation of complex workflows. Mariia Bulycheva notes that these areas could "drastically reduce long-term maintenance costs" but remain underutilized compared to flashy code generation demos.
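
The panel stopped short of naming tools, but the "prove itself once it runs" bar can start as something very small: compare a candidate's error rate against the current baseline before promoting it. The tolerance and the example numbers in this Python sketch are assumptions used for illustration, not panel guidance:

```python
# Minimal post-deploy gate in the spirit of runtime validation; the 10%
# relative tolerance and the sample error rates are arbitrary examples.
def passes_runtime_check(
    baseline_error_rate: float,
    candidate_error_rate: float,
    relative_tolerance: float = 0.10,
) -> bool:
    """True if the candidate's error rate stays within tolerance of the baseline."""
    return candidate_error_rate <= baseline_error_rate * (1.0 + relative_tolerance)

if __name__ == "__main__":
    baseline = 0.012   # 1.2% errors on the current release
    candidate = 0.013  # 1.3% errors on a canary serving a slice of traffic
    print("promote" if passes_runtime_check(baseline, candidate) else "roll back")
```

Wiring a check like this into the deployment pipeline is what turns AI-generated "first draft code" into changes the team can trust at the same speed they are produced.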

Conclusion: Architecture as the Sustaining Pillar

The panel's conclusion is clear: AI tools are a multiplier, not a silver bullet. They amplify productivity only when paired with strong organizational context, clear architectural patterns, and robust validation processes. The winners in the AI race will be those who integrate it into team-level processes with accountability, trust, and systems that can evolve together in a responsible manner.

As May Walter summarizes: "AI shifts velocity to the front of the pipeline, but unless validation loops are tight, the debt surfaces later. The lesson is that AI productivity requires a learning curve and iterative approach. Once measured, adoption can be improved iteratively to capture the upside - while avoiding the trap of shipping faster but suffocating with stability issues."

The craft of software development isn't being replaced—it's being enhanced. Critical thinking and architecture awareness are more important than ever. The developers who thrive will be those who master the art of context engineering, runtime validation, and orchestration, turning AI from a productivity hack into a sustainable architectural advantage.
