It started as a speculative conversation over drinks. Geoffrey Huntley, known for his technical cynicism about AI's limits, found himself in San Francisco discussing LLMs and eBPF-based security tooling like Falco. The discussion turned to a curious observation: Large Language Models (LLMs) seemed unusually adept at understanding and working with eBPF (extended Berkeley Packet Filter) traces. This sparked an impromptu experiment, one whose result Huntley says "should not be possible."

The Improbable Experiment

The premise was deliberately audacious:

  1. Capture a Trace: Run a simple command (ls) and capture its system calls using strace, redirecting the output to a file:
    strace ls 1>trace 2>&1
  2. Obfuscate the Origin: To prevent the LLM from simply recognizing the ls command and regurgitating a known solution, he edited the trace file in Vim, replacing every instance of ls with lol:
    :%s/ls/lol/g
  3. The Impossible Prompt: Feed the modified trace into an LLM (referred to humorously as "Ralph Wiggum") with the instruction:
    read the TRACE
    reimplement a program in rust that reimplements what this trace does
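For a sense of what the model is actually handed, a trace of a directory listing is just a flat sequence of syscall lines. The excerpt below is illustrative, not taken from Huntley's actual trace file: it shows standard strace output for a directory listing with ls already rewritten to lol (addresses, counts, and filenames are placeholders):

```
execve("/usr/bin/lol", ["lol"], 0x7ffc... /* 23 vars */) = 0
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
getdents64(3, 0x5555..., 32768)         = 112
getdents64(3, 0x5555..., 32768)         = 0
close(3)                                = 0
write(1, "a.txt  b.txt\n", 13)          = 13
exit_group(0)                           = ?
```

Nothing in those lines names the program or explains its intent; the model has to infer both from the open/read-entries/write pattern alone.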

The Jaw-Dropping Result

Against all expectations, the LLM succeeded. It analyzed the sequence of system calls – opening directories, reading file entries, writing output, handling errors – captured in the obfuscated strace output and generated a Rust program that functionally replicated the behavior of the original ls command.
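The generated program itself is not reproduced here, so as a rough sketch only: the behavior being recovered from the trace (open a directory, read its entries, write names to stdout) reduces to something like the following minimal Rust. Names and structure are illustrative assumptions, not Huntley's LLM's output:

```rust
use std::fs;
use std::io;

// Collect the visible (non-dotfile) entry names of a directory,
// sorted, mirroring the openat -> getdents64 -> write sequence a
// trace of plain `ls` shows.
fn list_dir(path: &str) -> io::Result<Vec<String>> {
    let mut names: Vec<String> = fs::read_dir(path)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.file_name().to_string_lossy().into_owned())
        .filter(|name| !name.starts_with('.')) // ls hides dotfiles by default
        .collect();
    names.sort(); // ls sorts entries lexicographically
    Ok(names)
}

fn main() -> io::Result<()> {
    // One name per line, as ls does when its stdout is not a terminal.
    for name in list_dir(".")? {
        println!("{name}");
    }
    Ok(())
}
```

A faithful reimplementation would also replicate details visible in the trace, such as error handling for unreadable directories, but the core loop is this small.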

"From that point forward, things just got weird, really fast. You see, I've never been a fan of proprietary firmware blobs in the Linux kernel, and perhaps if this information reaches the right people, this category of problem will be forever solved thanks to AI." - Geoffrey Huntley

Why This "Shouldn't" Be Possible (And Why It Matters)

Conventional wisdom holds that:
1. strace is Incomplete: It captures only system calls, not the internal logic, memory state, or control flow of the traced program. A trace is a lossy record, so reconstructing the original application from it alone is, in theory, underdetermined.
2. LLMs Lack Deep Execution Understanding: While powerful for code generation based on descriptions or examples, inferring program behavior and intent solely from a low-level system call trace was considered beyond their capabilities.

Huntley's experiment directly challenges these assumptions. The LLM demonstrated an ability to synthesize a higher-level understanding of program behavior from the raw system call sequence, effectively reverse-engineering functionality.

Implications: A New Frontier

This capability opens intriguing, potentially revolutionary, doors:

  • Reverse Engineering Obfuscated Code: Analyzing proprietary software or malware via its system call traces becomes significantly more accessible.
  • Tackling Proprietary Firmware Blobs: As Huntley alludes, this could be a powerful tool for understanding and potentially replacing closed-source kernel modules or firmware by analyzing their interaction with the system.
  • Automated Debugging & Analysis: Generating potential implementations or explanations from crash traces or anomalous system call patterns.
  • AI-Assisted System Programming: Rapid prototyping or understanding of low-level system interactions in languages like Rust.

Proceed with Caution (and Deliberate Practice)

Huntley emphasizes deliberate practice and healthy skepticism. While the result is astonishing, it's crucial to recognize the limitations:

  • This was a toy example (ls). Scaling to complex, stateful applications is an open question.
  • The generated Rust code, while functional, may not be idiomatic, secure, or efficient.
  • Reliability for critical reverse engineering tasks requires rigorous validation.

Nevertheless, this experiment serves as a potent reminder: the boundaries of what we believe AI cannot do are constantly shifting. The ability to reconstruct program logic from system call traces, once deemed implausible, now appears within reach, forcing a reevaluation of AI's potential role in understanding complex system behaviors. The tools and the experiment are documented on Huntley's GitHub for those willing to explore this frontier.

Source: Based on Geoffrey Huntley's account: ghuntley.com