Detecting Hidden Loops in LLM‑Driven Agentic Systems: An Unsupervised Cycle‑Detection Framework
The Problem: Invisible Cycles in Agentic LLM Applications
Agentic applications—those that autonomously plan, decide, and act—are increasingly powered by large language models. While these systems promise unprecedented flexibility, they also introduce non‑deterministic execution paths that can form hidden cycles. Unlike traditional programming loops, these cycles may not be represented by explicit while or for constructs; instead, an LLM can repeatedly generate similar content or re‑enter a state through semantic similarity, quietly consuming compute and memory.
Conventional observability stacks (logs, traces, metrics) focus on explicit control‑flow anomalies and often miss these subtle, content‑based loops. The result is silent degradation: a user-facing interface that lags, a cloud bill that spikes, and a debugging nightmare.
A Hybrid, Unsupervised Approach
Felix George, Harshit Kumar, Divya Pathak, Kaustabha Ray, Mudit Verma, and Pratibha Moogi tackle the problem in their 2025 arXiv paper Unsupervised Cycle Detection in Agentic Applications (arXiv:2511.10650). Their method marries two complementary signals:
- Temporal Call‑Stack Analysis – A lightweight, structural scan that flags explicit loops by tracking repeated stack traces across an agent’s trajectory.
- Semantic Similarity Analysis – A content‑centric layer that compares generated text snippets using embeddings (e.g., Sentence‑Transformers) to detect redundant or looping output that the call stack alone would miss.
By combining these layers, the framework can surface both overt and covert cycles without requiring labeled training data.
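To make the structural signal concrete, here is a minimal, hypothetical sketch (not the authors' implementation): each step's call stack is reduced to a signature, and any signature that recurs within a trajectory is reported as a candidate loop. The step format (a dict carrying a "call_stack" list) and the min_repeats parameter are assumptions made for illustration.
# Illustrative sketch of the structural layer: flag any call-stack signature
# that recurs within a trajectory. Step format and min_repeats are assumed.
from collections import defaultdict

def detect_stack_cycles(trajectory, min_repeats=2):
    occurrences = defaultdict(list)   # signature -> step indices where it appeared
    for idx, step in enumerate(trajectory):
        signature = tuple(step["call_stack"])   # e.g. ("agent.plan", "tool.search")
        occurrences[signature].append(idx)
    return [
        {"signature": sig, "indices": idxs}
        for sig, idxs in occurrences.items()
        if len(idxs) >= min_repeats
    ]
A purely structural pass like this catches explicit re-entry, but it misses loops where the agent rephrases the same content along different code paths, which is exactly what the semantic layer is there to cover.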
How It Works
# Pseudocode for the hybrid cycle detector
for trajectory in trajectories:
    # Structural signal: repeated call-stack patterns
    stack_cycles = detect_stack_cycles(trajectory)
    # Semantic signal: near-duplicate LLM outputs above a similarity threshold
    semantic_cycles = detect_semantic_cycles(trajectory, threshold=0.85)
    # Consolidate overlapping detections from the two signals
    cycles = merge(stack_cycles, semantic_cycles)
    report(cycles)
The detect_semantic_cycles function computes cosine similarity between consecutive LLM outputs; if similarity exceeds a tuned threshold, a potential cycle is flagged. The merge step consolidates overlapping detections, reducing false positives.
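As a rough illustration of that step (again, not the authors' implementation), the sketch below compares consecutive outputs using Sentence-Transformers embeddings. The model name, the assumption that outputs is a list of LLM-generated strings extracted from one trajectory, and the return format are all choices made for this example.
# Illustrative sketch of the semantic layer. Model choice and data shapes
# are assumptions; the 0.85 threshold mirrors the pseudocode above and
# should be tuned per domain.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

def detect_semantic_cycles(outputs, threshold=0.85):
    # Embed every output once, then compare each step with its predecessor.
    embeddings = model.encode(outputs, convert_to_tensor=True)
    flagged = []
    for i in range(1, len(embeddings)):
        similarity = util.cos_sim(embeddings[i - 1], embeddings[i]).item()
        if similarity >= threshold:
            flagged.append({"steps": (i - 1, i), "similarity": round(similarity, 3)})
    return flagged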
Empirical Results
Evaluated on 1,575 trajectories from a LangGraph‑based stock‑market trading bot, the hybrid model achieved:
- Precision: 0.62
- Recall: 0.86
- F1 Score: 0.72
For context, the structural‑only baseline scored an F1 of 0.08, while the semantic‑only baseline scored 0.28. The dramatic lift underscores the necessity of a multi‑signal approach.
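For reference, F1 is the harmonic mean of precision and recall: 2 × 0.62 × 0.86 / (0.62 + 0.86) ≈ 0.72, consistent with the reported figures.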
Why It Matters for Developers
- Resource Efficiency – Detecting silent cycles allows teams to prune unnecessary compute, directly impacting cloud spend.
- Robust Observability – Integrating this detector into existing tracing pipelines can surface hidden bugs that traditional logs miss.
- Model Agnosticism – The method relies on stack traces and text embeddings, so it works across LLMs (GPT‑4, Llama‑2, etc.) and agentic frameworks (LangGraph, ReAct, etc.).
Limitations and Future Work
The authors acknowledge that the semantic similarity threshold is tuned per domain; a one‑size‑fits‑all threshold may not generalize. Moreover, the approach currently operates post‑hoc on recorded trajectories, which limits real‑time intervention. Future research could explore online cycle detection and adaptive thresholds driven by reinforcement signals.
Takeaway
Hidden execution cycles are a growing threat to the reliability and cost‑efficiency of LLM‑powered agentic systems. By fusing structural call‑stack analysis with semantic similarity, the framework presented in arXiv:2511.10650 offers a pragmatic, unsupervised path forward. As agentic applications proliferate, such hybrid observability tools will become essential for maintaining performance and predictability.
Source: arXiv:2511.10650 – "Unsupervised Cycle Detection in Agentic Applications" by George et al., 2025.