Modern software engineering requires observability that evolves with serverless and event-driven architectures. OpenTelemetry provides a vendor-agnostic approach to collecting high-quality telemetry data, while tools like Weaver help establish shared vocabularies. When implemented as a core development practice rather than an operations task, observability significantly improves system reliability, debugging speed, and developer productivity.

How Observability and Telemetry Can Enhance the Practice of Software Engineering

Modern software architectures have evolved significantly from traditional monolithic systems to serverless, event-driven, and cell-based designs. This evolution demands a corresponding transformation in how we approach observability and telemetry. As Martin Thwaites explained in his talk at GOTO Copenhagen, "We're now building Serverless, Event Driven, Cell-based architectures, therefore the way we think about the telemetry, and ultimately observability around them, should also change."

The Evolution of Observability

Observability must adapt to match the complexity of modern systems. Where traditional monitoring focused primarily on infrastructure metrics, modern observability needs to capture the intricate interactions between distributed services, event flows, and dynamic scaling patterns. The shift from static, predictable systems to dynamic, responsive architectures requires new approaches to understanding system behavior.

"Modern observability is tightly coupled to the definitions [of] 'modern' systems, 'modern' development processes, and 'modern' architecture," Thwaites noted. This connection means that observability practices can't remain static; they must evolve alongside our architectural patterns and development methodologies.

OpenTelemetry: Vendor-Agnostic Telemetry Collection

OpenTelemetry has emerged as a critical component in modern observability stacks. It serves as "the glue that sits between your systems, documenting what's happening (emitting their telemetry), and the system (or potentially systems plural) that help you make sense of that data," according to Thwaites.

The key advantage of OpenTelemetry is its vendor neutrality: the code that emits telemetry is decoupled from the backend that stores and analyzes it. According to Thwaites, this decoupling makes it "a developer-focused tool" that allows teams to "concentrate on producing the best telemetry you can, instead of tailoring it to make it work within your current product." Developers are not locked into a specific observability vendor, and the same instrumentation produces consistent data regardless of which backend the organization chooses.
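The decoupling can be illustrated with a small, stdlib-only Python sketch. It is deliberately not the OpenTelemetry API: `Span`, `Tracer`, `InMemoryExporter`, `ConsoleExporter`, `checkout`, and the `app.cart.item_count` attribute are hypothetical names chosen to mirror OpenTelemetry's separation between instrumentation in application code and the exporters that ship data to a backend. The business function only talks to the tracer; which destination receives the spans is decided when the tracer is constructed.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Protocol
import time


@dataclass
class Span:
    """Hypothetical, simplified span: a named unit of work plus attributes."""
    name: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0


class Exporter(Protocol):
    """Anything that can receive finished spans (a vendor backend, a file, a test buffer)."""
    def export(self, span: Span) -> None: ...


class InMemoryExporter:
    """Keeps finished spans in a list; useful in tests."""
    def __init__(self) -> None:
        self.spans: list = []

    def export(self, span: Span) -> None:
        self.spans.append(span)


class ConsoleExporter:
    """Prints finished spans; stands in for a vendor-specific backend."""
    def export(self, span: Span) -> None:
        print(f"{span.name} {span.attributes} {span.duration_s:.3f}s")


class Tracer:
    """Instrumentation depends only on this class, never on a specific exporter."""
    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter

    @contextmanager
    def start_span(self, name: str) -> Iterator[Span]:
        span = Span(name)
        start = time.perf_counter()
        try:
            yield span
        finally:
            # Record the duration and hand the finished span to whichever exporter was configured.
            span.duration_s = time.perf_counter() - start
            self._exporter.export(span)


def checkout(tracer: Tracer, cart_items: int) -> None:
    # Business code describes what happened; it does not know where the data goes.
    with tracer.start_span("checkout") as span:
        span.attributes["app.cart.item_count"] = cart_items


# Swapping the backend is a configuration change, not an instrumentation change.
memory = InMemoryExporter()
checkout(Tracer(memory), cart_items=3)
checkout(Tracer(ConsoleExporter()), cart_items=5)
```

In OpenTelemetry the same idea appears as the split between the API used in application code and the SDK's exporters (OTLP being the protocol most backends accept), which are configured at startup rather than inside the business logic.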

What Constitutes Good Telemetry?

Good telemetry goes beyond simple metrics and logs. It should focus on "describing how the system 'works' in production," Thwaites explained. By "works" in this context, we're referring to how each service serves a particular request or interaction within the system.

Effective telemetry should help you:

Understand what makes one interaction different from another
Trace the specific database calls or unique code paths executed
Identify patterns in system behavior that aren't apparent during development

When implemented consistently, this approach makes debugging production issues "amazingly simple and quick," Thwaites concluded. The ability to trace a request through the entire system and understand exactly what happened at each step dramatically reduces mean time to resolution (MTTR).
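A minimal sketch of why this works, assuming each interaction is emitted as one wide event that carries both technical detail and business context. The attribute names (`app.customer.tier`, `app.code_path`, `app.db_call_count`) and the four sample events are invented for illustration. Once those fields exist, the question "what makes the slow requests different?" becomes a group-by over attributes rather than a hunt through logs.

```python
from collections import defaultdict
from statistics import mean

# Invented sample data: one wide event per /checkout request, carrying business
# context (customer tier), the code path taken, and how many DB calls were made.
events = [
    {"app.customer.tier": "free", "app.code_path": "cached_price", "app.db_call_count": 1, "duration_ms": 40},
    {"app.customer.tier": "free", "app.code_path": "cached_price", "app.db_call_count": 1, "duration_ms": 45},
    {"app.customer.tier": "enterprise", "app.code_path": "contract_price", "app.db_call_count": 7, "duration_ms": 900},
    {"app.customer.tier": "enterprise", "app.code_path": "contract_price", "app.db_call_count": 6, "duration_ms": 850},
]


def mean_duration_by(attribute: str, rows: list) -> dict:
    """Group interactions by one attribute and return the mean duration per value."""
    groups: dict = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row["duration_ms"])
    return {value: mean(durations) for value, durations in groups.items()}


# Slicing by code path shows that the slow requests all took the contract-pricing path.
by_path = mean_duration_by("app.code_path", events)
print(by_path)
```

The same function can slice by `app.customer.tier` or any other attribute, which is the practical payoff of recording what makes one interaction different from another.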

The Importance of Consistency

As systems grow more complex, consistent telemetry matters more. "The lack of consistency in how people talk about their systems performance has become more important as the complexity of those systems has increased," Thwaites noted.

Inconsistent terminology and approaches to observability create friction when teams need to collaborate on debugging or performance optimization. When different services use different naming conventions, attribute structures, or semantic meanings for similar metrics, it becomes difficult to build a comprehensive understanding of system behavior.

Weaver: Establishing Shared Vocabularies

Weaver addresses this challenge by providing a way to document telemetry with shared vocabularies. It "goes beyond the standard attributes you might expect like HTTP or gRPC" and "allows teams to define a shared vocabulary of telemetry in a way that observability backends, AI tooling, and ultimately humans, can use to understand that complex system."

Weaver offers several key capabilities:

Live checking of emitted telemetry, flagging exceptions where it deviates from the approved conventions
Code generation to make adoption easier
Documentation of telemetry that can be understood by both humans and automated systems

By establishing consistent terminology and semantics across the organization, Weaver creates a common language for discussing system behavior, which improves collaboration and accelerates problem resolution.
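Code generation is what makes a shared vocabulary cheap to adopt: developers import a constant instead of retyping an attribute name. The sketch below is a hand-rolled Python stand-in for that idea, not Weaver's API or registry format (Weaver itself works from YAML registry files). `REGISTRY`, `constant_name`, `generate_constants`, and the `app.*` attribute names are hypothetical.

```python
# Hypothetical, simplified registry: attribute id -> (type, description).
# It only illustrates the idea of a single source of truth for attribute names.
REGISTRY = {
    "app.customer.tier": ("string", "Commercial tier of the customer making the request."),
    "app.cart.item_count": ("int", "Number of items in the cart at checkout."),
}


def constant_name(attribute_id: str) -> str:
    """Turn 'app.cart.item_count' into 'APP_CART_ITEM_COUNT'."""
    return attribute_id.replace(".", "_").upper()


def generate_constants(registry: dict) -> str:
    """Emit Python source with one documented constant per registered attribute."""
    lines = ['"""Generated from the telemetry registry. Do not edit by hand."""', ""]
    for attribute_id in sorted(registry):
        attr_type, description = registry[attribute_id]
        lines.append(f"# {description} (type: {attr_type})")
        lines.append(f'{constant_name(attribute_id)} = "{attribute_id}"')
    return "\n".join(lines) + "\n"


source = generate_constants(REGISTRY)
print(source)

# Normally the output would be written to a module file; here it is exec'd for demonstration.
namespace: dict = {}
exec(source, namespace)
```

Because instrumentation code refers to `APP_CART_ITEM_COUNT` rather than a string literal, a typo becomes a name error at development time instead of a silently divergent attribute in production.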

Telemetry as a Development Practice

Perhaps the most significant shift in observability thinking is treating telemetry as a core development practice rather than an operations concern. "Producing good telemetry is the single greatest thing that will move the needle in how your team can support the production systems," Thwaites argued.

The most effective teams "have spent as much time curating the telemetry they output as they have writing the code that performs the business outcome." This represents a fundamental change in how organizations approach software development: observability data is treated as a critical output of the development process, not an afterthought.

When teams embrace telemetry as part of their development workflow, the benefits extend across multiple dimensions:

Reduced mean time to detection (MTTD)
Faster mean time to resolution (MTTR)
Improved developer happiness
Lower defect rates
Better system reliability

Observability for AI Applications

Observability takes on additional importance in AI applications, where system behavior can be unpredictable and context-dependent. "Observability is designed as a means to ask questions of your production system that you didn't know that you needed to ask while you were writing the code, which is exactly what we need when a system can use AI to perform tasks," Thwaites explained.

With AI systems, we often don't know how the system will react to a given input, and these inputs can change as users interact with the system. This unpredictability makes robust telemetry even more critical. "It's now even more important that we get robust telemetry, that includes our unique business context, out of our systems so that we answer those weird and wonderful questions," Thwaites noted.

Telemetry and Test-Driven Development

Telemetry and test-driven development (TDD) are closely related. "Telemetry is a core output of our applications; it's how we understand how an action from a user did the right thing," Thwaites explained.

When teams adopt TDD and incorporate telemetry into their tests, they produce code that's designed to be observable from the start. This approach ensures that observability isn't an afterthought but an integral part of the development process. By writing tests that verify both functional correctness and telemetry quality, teams build systems that are easier to understand and maintain throughout their lifecycle.
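A minimal illustration of asserting on telemetry in a unit test, using only Python's `unittest`. `RecordingTracer` is a hypothetical test double rather than an OpenTelemetry class (the OpenTelemetry Python SDK includes an `InMemorySpanExporter` that can play the same role), and `apply_discount` with its `WELCOME10` coupon is an invented example. The tests treat the recorded span as part of the function's contract, alongside its return value.

```python
import unittest
from dataclasses import dataclass, field


@dataclass
class RecordedSpan:
    name: str
    attributes: dict = field(default_factory=dict)


class RecordingTracer:
    """Hypothetical in-memory tracer used only by tests (not an OpenTelemetry class)."""
    def __init__(self) -> None:
        self.spans: list = []

    def record(self, name: str, **attributes) -> None:
        self.spans.append(RecordedSpan(name, dict(attributes)))


def apply_discount(tracer: RecordingTracer, order_total: float, coupon: str) -> float:
    """Business logic that reports *why* it produced a result, not just the result."""
    applied = coupon == "WELCOME10"
    tracer.record("apply_discount", coupon=coupon, discount_applied=applied)
    return round(order_total * 0.9, 2) if applied else order_total


class ApplyDiscountTest(unittest.TestCase):
    def test_valid_coupon_discounts_and_is_observable(self):
        tracer = RecordingTracer()
        self.assertEqual(apply_discount(tracer, 100.0, "WELCOME10"), 90.0)
        # The telemetry is part of the contract: assert on it like any other output.
        [span] = tracer.spans
        self.assertEqual(span.name, "apply_discount")
        self.assertTrue(span.attributes["discount_applied"])

    def test_unknown_coupon_is_recorded_as_not_applied(self):
        tracer = RecordingTracer()
        self.assertEqual(apply_discount(tracer, 100.0, "BOGUS"), 100.0)
        self.assertFalse(tracer.spans[0].attributes["discount_applied"])
```

Run it with `python -m unittest` against the file. Writing the telemetry assertion first, in TDD style, forces a decision about what the span should say before the branching logic exists.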

Practical Implementation Strategies

Implementing effective observability requires deliberate effort and strategic planning. Here are some practical approaches:

1. Start with Business Context

Begin by identifying the key business transactions and user journeys that matter most. Design telemetry specifically to capture the data needed to understand these interactions. This focus on business context ensures that observability efforts align with organizational priorities.

2. Implement Semantic Conventions

Adopt semantic conventions like those defined by OpenTelemetry. These standardized approaches to naming and structuring telemetry data ensure consistency across services while providing meaningful context about system behavior.
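As a concrete case, OpenTelemetry's HTTP semantic conventions renamed several attributes when they were stabilized; for example, `http.method` became `http.request.method` and `http.status_code` became `http.response.status_code`. The sketch below, with a hypothetical `normalize_attributes` helper and a hypothetical custom `app.customer.tier` attribute, shows how agreeing on the conventional names lets telemetry from two services be compared directly.

```python
# Legacy (pre-stable) attribute names mapped to the names used by the stable
# OpenTelemetry HTTP semantic conventions.
LEGACY_TO_SEMCONV = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "http.user_agent": "user_agent.original",
}


def normalize_attributes(attributes: dict) -> dict:
    """Rename legacy keys; leave already-conformant and custom keys untouched."""
    normalized: dict = {}
    for key, value in attributes.items():
        # If both a legacy and a new key are present, the later one in the input wins.
        normalized[LEGACY_TO_SEMCONV.get(key, key)] = value
    return normalized


# Two services describing the same request with different vocabularies.
service_a = {"http.method": "GET", "http.status_code": 200, "app.customer.tier": "free"}
service_b = {"http.request.method": "GET", "http.response.status_code": 200, "app.customer.tier": "free"}

print(normalize_attributes(service_a) == normalize_attributes(service_b))
```

Normalizing after the fact is a stopgap; emitting the conventional names at the source avoids the translation layer entirely.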

3. Integrate Observability into CI/CD

Make telemetry quality gates part of your continuous integration pipeline. Automated checks can verify that new services emit the expected telemetry with proper attributes and structure before they're deployed to production.
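A minimal sketch of such a quality gate, assuming spans captured during an integration-test run have been written out as JSON lines. The file format, the `REQUIRED_ATTRIBUTES` policy, and the `app.*` attribute names are hypothetical; Weaver's live checking is a more complete option for validating against a shared registry. The gate reports every missing attribute and returns a non-zero status so the pipeline step fails.

```python
import json
import sys

# Hypothetical policy: every span with this name must carry these attributes.
REQUIRED_ATTRIBUTES = {
    "checkout": {"app.customer.tier", "app.cart.item_count"},
}


def find_violations(spans: list) -> list:
    """Return one human-readable message per missing required attribute."""
    violations = []
    for span in spans:
        required = REQUIRED_ATTRIBUTES.get(span["name"], set())
        missing = required - set(span.get("attributes", {}))
        for attribute in sorted(missing):
            violations.append(f"span '{span['name']}' is missing '{attribute}'")
    return violations


def main(captured_jsonl: str) -> int:
    """Quality gate: a non-zero return value should fail the CI job."""
    spans = [json.loads(line) for line in captured_jsonl.splitlines() if line.strip()]
    violations = find_violations(spans)
    for message in violations:
        print(message, file=sys.stderr)
    return 1 if violations else 0


# Example input: two checkout spans, the second missing the cart size.
captured = "\n".join([
    json.dumps({"name": "checkout", "attributes": {"app.customer.tier": "free", "app.cart.item_count": 2}}),
    json.dumps({"name": "checkout", "attributes": {"app.customer.tier": "free"}}),
])
exit_code = main(captured)
```

In a pipeline, the script would read the captured file, pass its contents to `main`, and call `sys.exit` with the return value so the job status reflects the result.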

4. Create a Telemetry Champion Network

Designate team members as observability champions who can help spread best practices and ensure consistency across services. This network can provide guidance, review telemetry designs, and share lessons learned.

5. Invest in Training and Documentation

Ensure that developers understand not just how to emit telemetry, but why it matters and how to design effective instrumentation. Create documentation that explains your organization's observability practices, standards, and tools.

Conclusion

Observability and telemetry have evolved from operational necessities to core development practices that significantly impact system reliability and developer productivity. By adopting modern approaches like OpenTelemetry, establishing consistent vocabularies with tools like Weaver, and treating telemetry as a fundamental part of the development process, teams can build systems that are easier to understand, debug, and maintain.

As software systems continue to grow in complexity and dynamism, observability will only become more critical. Organizations that invest in observability as a core competency will be better positioned to deliver reliable, performant systems while maintaining high developer satisfaction and efficiency.

About the Author

Ben Linders runs a one-person business in Agile, Lean, Quality and Continuous Improvement. Author of several books including "Getting Value out of Agile Retrospectives" and "What Drives Quality", he helps organizations by deploying effective software development and management practices. He focuses on continuous improvement, collaboration and communication, and professional development. Ben is an active member of networks on Agile, Lean and Quality, and a frequent speaker and writer. He shares his experience in a bilingual blog (Dutch and English) and as an editor for Agile at InfoQ. Follow him on Twitter: @BenLinders.
