Polar Signals has added Off-CPU profiling to its eBPF-based Parca Agent, letting developers see why processes are off the CPU during I/O waits, lock contention, or scheduling delays. The feature adds stack filtering and runtime-specific presets to cut noise from runtimes such as Go's garbage collector and Rust's Tokio. The result is visibility into latency sources that CPU profiling alone does not show.

For performance engineers optimizing latency-sensitive systems, understanding what happens when processes aren't running on CPUs has long been a blind spot. Polar Signals tackles this gap with Off-CPU profiling for its eBPF-based Parca Agent, finally letting developers quantify time spent waiting for I/O, network responses, and other non-CPU bottlenecks.
Why Off-CPU Matters
While On-CPU profiling reveals compute-bound inefficiencies, it ignores critical latency sources like:
- Disk I/O contention
- Network call delays
- Lock synchronization waits
- Scheduling pauses
"When optimizing latency, it's crucial to understand why our process isn't performing work on the CPU," notes Polar Signals' engineering team. Without this data, engineers miss systemic slowdowns invisible to traditional profilers.
Under the Hood: Kernel Tracing & Sampling
Implementing Off-CPU profiling required three pieces of instrumentation:
- Tracepoint Hooks: Leveraging Linux's sched:sched_switch to detect when tasks leave the CPU
- Kprobe Tracking: Using finish_task_switch.isra.0 to measure off-CPU duration
- Sampling Throttle: The --off-cpu-threshold flag controls overhead by sampling events (e.g., 50/1000) to avoid flooding systems
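The bookkeeping behind these three pieces can be modeled in user space. The following is a minimal Go sketch, not the agent's actual eBPF code: the offCPUTracker type, its methods, and the reading of the threshold as "N draws kept out of 1000" are all assumptions made for illustration. The random draw is passed in explicitly so that the example stays deterministic.

```go
package main

import "fmt"

// sampleBase is an assumed denominator, taken from the "50/1000" example.
const sampleBase = 1000

// shouldSample keeps an event when its draw in [0, sampleBase) is below threshold.
func shouldSample(draw, threshold uint32) bool {
	return draw < threshold
}

// offCPUTracker is a hypothetical model: it remembers when a sampled task
// left the CPU and reports the gap once the task runs again.
type offCPUTracker struct {
	threshold     uint32
	switchedOutAt map[int]uint64 // task ID -> timestamp (ns) at switch-out
}

func newOffCPUTracker(threshold uint32) *offCPUTracker {
	return &offCPUTracker{threshold: threshold, switchedOutAt: make(map[int]uint64)}
}

// onSwitchOut models the sched_switch side: the task is leaving the CPU.
// Only sampled events are recorded, which bounds the tracking overhead.
func (t *offCPUTracker) onSwitchOut(tid int, nowNs uint64, draw uint32) {
	if shouldSample(draw, t.threshold) {
		t.switchedOutAt[tid] = nowNs
	}
}

// onSwitchIn models the finish_task_switch side: the task is running again.
// It returns the off-CPU duration, or false if the switch-out was not sampled.
func (t *offCPUTracker) onSwitchIn(tid int, nowNs uint64) (uint64, bool) {
	start, ok := t.switchedOutAt[tid]
	if !ok || nowNs < start {
		return 0, false
	}
	delete(t.switchedOutAt, tid)
	return nowNs - start, true
}

func main() {
	tr := newOffCPUTracker(50)
	tr.onSwitchOut(42, 1_000, 7)   // draw 7 < 50: sampled
	tr.onSwitchOut(43, 1_200, 900) // draw 900 >= 50: skipped

	for _, tid := range []int{42, 43} {
		if d, ok := tr.onSwitchIn(tid, 4_500); ok {
			fmt.Printf("task %d was off-CPU for %d ns\n", tid, d)
		} else {
			fmt.Printf("task %d was not sampled\n", tid)
		}
	}
}
```

In a real profiler the draw would come from a random source at each scheduling event, so a threshold of 50 would keep roughly 5% of events under this interpretation.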
"Kernel scheduling events can occur thousands of times per second. Without sampling, overhead becomes prohibitive," explains developer Florian Lehner, who spearheaded the data collection.
Cutting Through Runtime Noise
Initial deployments revealed a surprise: Runtime systems dominated off-CPU traces. In Go, runtime.usleep and garbage collection pauses appeared as top offenders, while Rust's Tokio runtime generated similar noise.
Polar Signals responded with a filtering toolkit:
- Stack Exclusion: "Not contains" filters remove known runtime patterns
- Multi-Filter Support: Combine exclusions (e.g., GC + timers)
- Runtime Presets: Preconfigured filters for Go and Tokio
// Example: Applying Go runtime preset
offcpu.FilterPreset("go-runtime-expected")
After filtering, the true culprits emerged, such as network I/O stalls in Prometheus servers, where EpollWait and syscall.Write dominated latency.
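The "not contains" exclusion described above amounts to dropping every stack in which some frame matches a known runtime pattern. Here is a minimal Go sketch of that idea. The stack type, the excludeStacks helper, and the pattern list are hypothetical and chosen only for illustration; they are not the Parca API or the contents of its presets.

```go
package main

import (
	"fmt"
	"strings"
)

// stack is a call stack as a list of function names, leaf first.
type stack []string

// containsFrame reports whether any frame name contains the pattern.
func containsFrame(s stack, pattern string) bool {
	for _, frame := range s {
		if strings.Contains(frame, pattern) {
			return true
		}
	}
	return false
}

// excludeStacks keeps only the stacks that match none of the patterns,
// which is how several "not contains" filters combine.
func excludeStacks(stacks []stack, patterns []string) []stack {
	var kept []stack
	for _, s := range stacks {
		drop := false
		for _, p := range patterns {
			if containsFrame(s, p) {
				drop = true
				break
			}
		}
		if !drop {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	stacks := []stack{
		{"runtime.usleep", "runtime.sysmon"},
		{"syscall.Write", "net.(*conn).Write"},
		{"runtime.gcBgMarkWorker"},
	}
	// Illustrative patterns only; a real preset would define its own list.
	patterns := []string{"runtime.usleep", "runtime.gcBgMarkWorker"}

	for _, s := range excludeStacks(stacks, patterns) {
		fmt.Println(strings.Join(s, " <- "))
	}
}
```

With these inputs only the syscall.Write stack survives, mirroring how the runtime noise drops away and the I/O wait becomes visible.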
The New Optimization Workflow
- Capture On-CPU profile to optimize compute paths
- Run Off-CPU analysis to identify I/O/scheduling bottlenecks
- Apply runtime presets to eliminate noise
- Triage remaining stacks (e.g., database calls, filesystem syncs)
"We now see how allocation-heavy code triggers both CPU costs and scheduling penalties via GC," observes the team. This dual visibility is revolutionary for tuning cloud-native systems.
Future Extensions
Polar Signals invites community input to expand presets for additional runtimes (Java, Node.js, .NET). Early adopters can test the feature in Parca v0.22.0+ and share feedback via Discord.
