Prometheus Co-Founder Warns: OpenTelemetry Metrics Come at a Cost
As OpenTelemetry (OTel) gains momentum as a unified observability framework, Prometheus co-founder Julius Volz cautions against using its SDKs for metrics collection in Prometheus environments. In a detailed analysis, Volz outlines why Prometheus' native client libraries—not OTel—deliver superior reliability, performance, and usability for teams invested in the Prometheus ecosystem.
The Monitoring Model Clash
At the heart of Volz's argument is a philosophical divide: Prometheus is a full monitoring system, while OTel focuses on telemetry generation and transport. This distinction manifests critically in target health monitoring. Prometheus' pull-based model combined with service discovery generates an up metric, enabling instant detection of missing or failing targets:
alert: TargetDown
expr: up{job="demo"} == 0
for: 5m
OTLP's push-based approach severs this feedback loop. "You lose the ability to detect if expected metrics sources vanish," warns Volz. Teams must manually correlate OTel data with infrastructure state—a complex and often neglected safeguard.
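To make the pull model concrete, here is a minimal sketch (not from Volz's article) of a Go service exposing metrics for Prometheus to scrape via the client_golang library; the port and endpoint path are illustrative assumptions:
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Expose the default registry at /metrics. Prometheus discovers and
	// scrapes this endpoint itself, so a failed scrape is immediately
	// visible as up{job="demo"} == 0, with no extra plumbing.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}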
Naming, Labels, and Query Headaches
Translating OTel metrics to Prometheus introduces syntactic friction:
- Character Set Incompatibility: OTel allows dots and dashes in metric names (e.g., http.server.duration), which pre-3.0 Prometheus must rewrite with underscores (http_server_duration). While Prometheus 3.0 supports UTF-8 names, querying them becomes cumbersome:
{"http.server.duration", "http.method"="GET"}
- Mandatory Suffixes: OTel omits units and types from metric names, so the translation layer appends them (e.g., k8s.pod.cpu.time → k8s_pod_cpu_time_seconds_total). Native Prometheus instrumentation avoids this indirection, as the sketch after this list shows.
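As an illustration of that point, a hedged sketch of native Go instrumentation in which the unit and counter suffix are chosen at definition time; the metric and label names are assumptions, not code from the article:
package instrumentation

import "github.com/prometheus/client_golang/prometheus"

// The base unit ("seconds") and the counter suffix ("_total") are part of
// the name from the start, so no translation layer has to rewrite it later.
var cpuTime = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "k8s_pod_cpu_time_seconds_total",
		Help: "Cumulative CPU time consumed by the pod, in seconds.",
	},
	[]string{"pod"},
)

func init() {
	prometheus.MustRegister(cpuTime)
}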
Label semantics also diverge. OTel's verbose "resource attributes" (e.g., SDK versions) are relegated to a sparse target_info metric, requiring joins for contextual queries:
rate(http_request_count[5m])
* on(job, instance) group_left(k8s_cluster_name)
target_info
Prometheus' target labels, derived from service discovery, attach directly to all metrics.
Operational and Performance Tax
Adopting OTel demands Prometheus-side compromises:
- Security/Config Overhead: Enabling OTLP ingestion (--web.enable-otlp-receiver) exposes a new attack surface and requires permitting out-of-order writes:
storage:
  tsdb:
    out_of_order_time_window: 30m
- SDK Performance Penalties: Benchmarks of the Go SDKs reveal stark differences when incrementing a cached counter with labels under load:

SDK                  Throughput (ops/ns)
Prometheus Native    0.35
OpenTelemetry        0.0066

Prometheus was ~53x faster in this test. Volz attributes the gap to OTel's abstraction layers and per-operation allocations. A minimal sketch of the measured pattern follows.
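The measured operation is the hot-path increment of a label child that has been resolved once and cached. A client_golang sketch of that pattern (metric and label names are illustrative; this is not Volz's benchmark code):
package main

import "github.com/prometheus/client_golang/prometheus"

var httpRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests served.",
	},
	[]string{"method"},
)

// Resolving the label values once and caching the child avoids a map
// lookup and allocations on every call; the increment is an atomic add.
var getRequests = httpRequests.WithLabelValues("GET")

func main() {
	prometheus.MustRegister(httpRequests)
	getRequests.Inc() // the measured operation
}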
The Open Standard Misconception
While OTel is touted as vendor-neutral, Volz notes Prometheus' protocols (PromQL, remote write) are de facto standards with simpler integrations: "Prometheus' text format can be implemented in a bash script. OTLP requires Protocol Buffers and SDK complexity."
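For context, the text exposition format he refers to is plain, line-oriented output; an illustrative scrape response (values made up) looks like this:
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="GET"} 1027
http_requests_total{method="POST"} 3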
The Pragmatic Path Forward
Volz concedes OTel excels for traces/logs or multi-backend pipelines. But for metrics in Prometheus-centric environments, native instrumentation preserves core benefits:
- Automatic target health checks
- Consistent naming and labeling
- Optimized performance
- Minimal configuration
"You risk throwing away features that define Prometheus," he concludes. For teams prioritizing operational clarity and efficiency, the native path remains compelling—even amid OTel's rising tide.
Source: Why I recommend native Prometheus instrumentation over OpenTelemetry by Julius Volz