
In the evolving landscape of software development, artificial intelligence continues to push boundaries, with the latest frontier being automated documentation generation. A recent experiment by a seasoned programmer offers valuable insights into the current capabilities and limitations of this emerging technology.

The journey began with a serendipitous discovery of DeepWiki, a service promising to generate comprehensive documentation for any GitHub repository. The programmer, who writes the blog "The Boston Diaries" (and admits to caring little about Boston itself), was intrigued by the potential of AI to tackle one of software development's most persistent challenges: maintaining accurate documentation.

"I can't say I've been impressed with LLMs generating code, but what about documentation?" the programmer mused. "I haven't tried that yet. Let's see how well Roko's basilisk dances!"

Initial Impressions: Surprisingly Detailed

The first test subject was mod_blog, a codebase the programmer had maintained for 26 years—an ideal candidate for evaluating documentation accuracy given their intimate familiarity with the system.

"After submitting the repository URL and waiting for the notification email, I was quickly amazed!" the programmer reported. "Nearly 30 pages of documentation, and the overview was impressive. It picked up on tumblers, the storage layout, the typical flows in adding a new entry. It even got the fact that cmd_cgi_get_today() returns all the entries for a given day of the month throughout the years."

The system demonstrated an ability to understand complex code structures and relationships, identifying architectural patterns and data flows that might elude human reviewers unfamiliar with the codebase.
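
To make the quoted description concrete, here is a minimal C sketch of one plausible reading of the behavior attributed to cmd_cgi_get_today(): gathering the entries stored for today's date across every year of the archive. The directory layout, year range, and helper names are assumptions for illustration, not mod_blog's actual implementation.

```c
/* Hypothetical sketch of the behavior attributed to cmd_cgi_get_today():
 * collect every entry stored for today's month and day across the years.
 * The YYYY/MM/DD layout, the year range, and the function names are
 * assumptions for illustration -- this is not mod_blog's actual code. */
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

/* assume one directory per entry day, named YYYY/MM/DD */
static int entry_exists(int year, int month, int day)
{
  char path[32];
  struct stat sb;

  snprintf(path, sizeof(path), "%04d/%02d/%02d", year, month, day);
  return stat(path, &sb) == 0;
}

static void show_today_through_the_years(int first_year, int last_year)
{
  time_t     now = time(NULL);
  struct tm *tm  = localtime(&now);

  for (int year = first_year; year <= last_year; year++)
    if (entry_exists(year, tm->tm_mon + 1, tm->tm_mday))
      printf("entry: %04d/%02d/%02d\n", year, tm->tm_mon + 1, tm->tm_mday);
}

int main(void)
{
  show_today_through_the_years(1999, 2025);  /* assumed archive range */
  return 0;
}
```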

The Devil in the Details

However, the initial enthusiasm gave way to more critical evaluation as the programmer examined the documentation more closely.

"But there was one bit that was just a tad bit off," they noted. "It stated '[t]he system consists of three primary layers' but the following diagram showed five layers, with no indication of what three were the 'primary layers.' I didn't have a problem with the layers it did identify—just that it seems to have a problem counting to three."

This discrepancy highlights a fundamental challenge with current AI systems: they can generate plausible-sounding content that contains factual errors, potentially misleading developers who rely on the documentation.

Interface and Usability Concerns

Beyond the content itself, the programmer raised concerns about DeepWiki's user interface:

"The menu on the left is longer than it appears, given that scroll bars seem oh so last century. Also, the diagrams are very inconsistent, and often times, way too small to view properly, even when selected. Then you'll get the occasionally gigantic diagram. The layouts seem arbitrary—some horizontal, some vertical, and some L-shaped. And it repeats itself excessively."

These usability issues compound the technical challenges, potentially hindering developers' ability to effectively utilize the generated documentation.

A More Complex Test Case

To further evaluate DeepWiki's capabilities, the programmer applied it to a09, their 6809 assembler—a project of similar size (9,500 lines compared to mod_blog's 7,400) but with higher complexity and less evolutionary refinement.

"This, in my mind, is a much worse job than it did for mod_blog," the programmer concluded. "I suspect it's due to the cyclomatic complexity being a bit higher in a09 than in mod_blog due to the cross-cutting nature of the code. And that probably causes the LLM to run up to, if not a bit over, its context window, thus causing the confabulations."

The results suggest that as code complexity increases, the reliability of AI-generated documentation may decrease significantly, potentially leading to more serious errors and misunderstandings.
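
The size half of that hypothesis can be made concrete with a rough back-of-envelope calculation. The tokens-per-line figure and the context-window size below are assumptions for illustration; only the line counts come from the post.

```c
/* Back-of-envelope sketch of the context-window concern. The average
 * tokens-per-line and the window size are assumed values for illustration;
 * neither the post nor DeepWiki publishes these figures. */
#include <stdio.h>

int main(void)
{
  const double tokens_per_line = 10.0;     /* assumed average for C source   */
  const double context_window  = 128000.0; /* assumed model context window   */
  const int    lines_mod_blog  = 7400;     /* line counts quoted in the post */
  const int    lines_a09       = 9500;

  printf("mod_blog: ~%6.0f tokens, %3.0f%% of the window\n",
         lines_mod_blog * tokens_per_line,
         100.0 * lines_mod_blog * tokens_per_line / context_window);
  printf("a09:      ~%6.0f tokens, %3.0f%% of the window\n",
         lines_a09 * tokens_per_line,
         100.0 * lines_a09 * tokens_per_line / context_window);
  return 0;
}
```

Under those assumed figures, a09 alone would fill roughly three quarters of the window, leaving noticeably less headroom for prompts, cross-references, and the generated wiki itself.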

The Legacy Code Dilemma

The programmer expressed particular concern about applying such technology to legacy systems:

"I fear that this is meant to be used for legacy code with little or no documentation, and if it does this poorly on a moderately complex but small codebase, I don't want to contemplate what it would do for a larger, older, and gnarlier codebase."

This raises a critical question: if AI documentation generation struggles with moderately complex modern code, can it be trusted to reverse-engineer and document decades-old legacy systems where even the original developers may have moved on?

The Maintenance Challenge

Beyond initial generation, the programmer identified a significant maintenance challenge:

"Another issue are updates to the repo. The site sells itself as a wiki, so I suppose another aspect to this is you spend the time going through the generated 'documentation' and fixing the errors, and then keep it up to date as the code changes."

The prospect of continuously updating AI-generated documentation as code evolves presents a substantial burden, potentially negating much of the time-saving benefits of automated generation in the first place.

Unexpected Benefits

Despite the criticisms, the experiment yielded some unexpected benefits. The programmer discovered two issues in their codebase—one actual bug and one area where a literal constant was used instead of a defined constant—through the process of reviewing the AI-generated documentation.

"At least I'm glad for finding those two issues, even if they haven't been an actual exploitable bug yet (as I think I'm the only one using mod_blog)," they noted.

This serendipitous discovery highlights a potential secondary benefit of AI documentation tools: serving as a form of automated code review that can identify issues developers might otherwise overlook.
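
To make the second of those findings concrete, here is a minimal C sketch of what "a literal constant used instead of a defined constant" typically looks like. The buffer name, size, and timestamp format are assumptions for illustration, not mod_blog's actual code.

```c
/* Illustrative only -- not mod_blog's actual code. The same buffer size
 * appears once as an unexplained literal and once as a named constant that
 * documents the intent and keeps the value in a single place. */
#include <stdio.h>
#include <string.h>

#define TIMESTAMP_BUFSIZ 20  /* "YYYY-MM-DD HH:MM:SS" plus the NUL terminator */

int main(void)
{
  char before[20];              /* magic number: why 20? easy to get wrong    */
  char after[TIMESTAMP_BUFSIZ]; /* named constant: intent is self-documenting */

  strcpy(before, "2025-12-02 00:00:00");
  strcpy(after,  "2025-12-02 00:00:00");
  printf("%s\n%s\n", before, after);
  return 0;
}
```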

The Verdict

In conclusion, the programmer's assessment of DeepWiki was cautiously optimistic yet ultimately skeptical:

"Overall, this was less obnoxious than having the LLMs write code, but I feel it's still too inaccurate to be let loose on unfamiliar codebases, which I suspect is the selling point."

As AI continues to infiltrate various aspects of software development, the documentation frontier represents both promise and peril. While the technology shows potential for generating useful overviews and identifying code issues, its current limitations in accuracy, particularly with complex systems, suggest that human oversight remains essential.

For developers working with legacy codebases or complex architectures, AI documentation tools may serve as supplementary aids rather than replacements for human expertise—at least for the foreseeable future.

Source: The Boston Diaries (https://boston.conman.org/2025/12/02.1)