Prometheus Co-Founder Warns: OpenTelemetry Metrics Come at a Cost
As OpenTelemetry (OTel) gains momentum as a unified observability framework, Prometheus co-founder Julius Volz cautions against using its SDKs for metrics collection in Prometheus environments. In a detailed analysis, Volz outlines why Prometheus' native client libraries—not OTel—deliver superior reliability, performance, and usability for teams invested in the Prometheus ecosystem.
The Monitoring Model Clash
At the heart of Volz's argument is a philosophical divide: Prometheus is a full monitoring system, while OTel focuses on telemetry generation and transport. This distinction manifests critically in target health monitoring. Prometheus' pull-based model combined with service discovery generates an up metric, enabling instant detection of missing or failing targets:
alert: TargetDown
expr: up{job="demo"} == 0
for: 5m
OTLP's push-based approach severs this feedback loop. "You lose the ability to detect if expected metrics sources vanish," warns Volz. Teams must manually correlate OTel data with infrastructure state—a complex and often neglected safeguard.
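To make the pull model concrete, here is a minimal sketch (not from Volz's article) of a Go service exposing metrics for Prometheus to scrape via the client_golang library; the port and endpoint path are illustrative assumptions:
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Expose the default registry at /metrics. Prometheus discovers and
	// scrapes this endpoint itself, so a failed scrape is immediately
	// visible as up{job="demo"} == 0, with no extra plumbing.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}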
Naming, Labels, and Query Headaches
Translating OTel metrics to Prometheus introduces syntactic friction:
- Character Set Incompatibility: OTel allows dots and dashes in metric names (e.g., http.server.duration), which pre-3.0 Prometheus must rewrite with underscores (http_server_duration). While Prometheus 3.0 supports UTF-8 names, querying them becomes cumbersome:
{"http.server.duration", "http.method"="GET"}
- Mandatory Suffixes: OTel omits units and types from metric names, so the translation layer appends them (e.g., k8s.pod.cpu.time → k8s_pod_cpu_time_seconds_total). Native Prometheus instrumentation avoids this indirection, as the sketch after this list shows.
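As an illustration of that point, a hedged sketch of native Go instrumentation in which the unit and counter suffix are chosen at definition time; the metric and label names are assumptions, not code from the article:
package instrumentation

import "github.com/prometheus/client_golang/prometheus"

// The base unit ("seconds") and the counter suffix ("_total") are part of
// the name from the start, so no translation layer has to rewrite it later.
var cpuTime = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "k8s_pod_cpu_time_seconds_total",
		Help: "Cumulative CPU time consumed by the pod, in seconds.",
	},
	[]string{"pod"},
)

func init() {
	prometheus.MustRegister(cpuTime)
}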
Label semantics also diverge. OTel's verbose "resource attributes" (e.g., SDK versions) are relegated to a sparse target_info metric, requiring joins for contextual queries:
rate(http_request_count[5m])
* on(job, instance) group_left(k8s_cluster_name)
target_info
Prometheus' target labels, derived from service discovery, attach directly to all metrics.
Operational and Performance Tax
Adopting OTel demands Prometheus-side compromises:
- Security/Config Overhead: Enabling OTLP ingestion (--web.enable-otlp-receiver) exposes a new attack surface and requires permitting out-of-order writes:
storage:
  tsdb:
    out_of_order_time_window: 30m
- SDK Performance Penalties: Benchmarks of the Go SDKs reveal stark differences when incrementing a cached counter with labels under load:

SDK                  Throughput (ops/ns)
Prometheus Native    0.35
OpenTelemetry        0.0066

Prometheus was ~53x faster in this test. Volz attributes the gap to OTel's abstraction layers and per-operation allocations. A minimal sketch of the measured pattern follows.
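The measured operation is the hot-path increment of a label child that has been resolved once and cached. A client_golang sketch of that pattern (metric and label names are illustrative; this is not Volz's benchmark code):
package main

import "github.com/prometheus/client_golang/prometheus"

var httpRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests served.",
	},
	[]string{"method"},
)

// Resolving the label values once and caching the child avoids a map
// lookup and allocations on every call; the increment is an atomic add.
var getRequests = httpRequests.WithLabelValues("GET")

func main() {
	prometheus.MustRegister(httpRequests)
	getRequests.Inc() // the measured operation
}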
The Open Standard Misconception
While OTel is touted as vendor-neutral, Volz notes Prometheus' protocols (PromQL, remote write) are de facto standards with simpler integrations: "Prometheus' text format can be implemented in a bash script. OTLP requires Protocol Buffers and SDK complexity."
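For context, the text exposition format he refers to is plain, line-oriented output; an illustrative scrape response (values made up) looks like this:
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="GET"} 1027
http_requests_total{method="POST"} 3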
The Pragmatic Path Forward
Volz concedes OTel excels for traces/logs or multi-backend pipelines. But for metrics in Prometheus-centric environments, native instrumentation preserves core benefits:
- Automatic target health checks
- Consistent naming and labeling
- Optimized performance
- Minimal configuration
"You risk throwing away features that define Prometheus," he concludes. For teams prioritizing operational clarity and efficiency, the native path remains compelling—even amid OTel's rising tide.
Source: Why I recommend native Prometheus instrumentation over OpenTelemetry by Julius Volz