Gabriella Gonzalez introduces a prototype semantic navigator that clusters and labels code/files by meaning, demonstrating how non-chat interfaces can provide more intuitive exploration of complex repositories.

The dominance of chat-based AI interfaces in developer tools has obscured alternative approaches that might better leverage large language models. Gabriella Gonzalez's semantic-navigator prototype offers a compelling vision: replacing hierarchical file browsing with meaning-based navigation through recursive clustering and semantic labeling.
Beyond Chat Limitations
Current chat-based developer tools suffer from inherent limitations when navigating codebases:
- Information overload requiring prose sifting
- Interaction friction from manual query formulation
- Verification challenges in assessing completeness
The semantic navigator addresses these by visualizing repository structure through automatically generated clusters. When run against Gonzalez's Grace programming language repository, it organizes files by conceptual relationships rather than directory paths—revealing connections invisible in traditional explorers.
Technical Implementation
At its core, the tool employs:
- Semantic vectorization of all files
- Recursive spectral clustering to build hierarchical groups
- Context-aware labeling using LLMs
Spectral clustering was chosen because it needs few parameters and its behavior is mathematically well understood. Unlike distance-based algorithms, it can suggest a natural cluster count (for example, from gaps in the eigenvalue spectrum) and has variants that need no tuning, which keeps configuration overhead low.
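To make the recursive step concrete, here is a minimal sketch in Python with NumPy. It is not the prototype's code. It assumes embeddings are already computed as rows of an array, and it simplifies cluster-count selection to a two-way split on the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the normalized Laplacian). The names fiedler_split and cluster_tree are hypothetical, and treating the 20-file threshold as "do not split groups of 20 or fewer" is an assumption.

```python
import numpy as np

# Assumption: groups at or below this size are kept as leaves.
MIN_CLUSTER_SIZE = 20


def fiedler_split(vectors: np.ndarray) -> np.ndarray:
    """Return a boolean mask that splits the rows into two groups."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    # Non-negative cosine similarity as the graph's edge weights.
    affinity = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.clip(degree, 1e-12, None))
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    laplacian = np.eye(len(vectors)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1] >= 0


def cluster_tree(vectors, indices=None, min_size=MIN_CLUSTER_SIZE):
    """Recursively bisect rows; leaves are lists of row indices."""
    if indices is None:
        indices = np.arange(len(vectors))
    if len(indices) <= min_size:
        return indices.tolist()
    mask = fiedler_split(vectors[indices])
    if mask.all() or not mask.any():
        # Degenerate split: everything landed on one side, so stop here.
        return indices.tolist()
    return [
        cluster_tree(vectors, indices[mask], min_size),
        cluster_tree(vectors, indices[~mask], min_size),
    ]
```

Calling cluster_tree(embeddings) returns a nested list whose shape mirrors the hierarchy a navigator would display. A fuller version would pick the number of children per level from the eigenvalue spectrum instead of always splitting in two.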
Labeling Innovations
Early labeling attempts produced generic, repetitive cluster names. Significant improvements came from:
- Sibling-aware labeling: Presenting all sibling clusters simultaneously to the LLM yields distinctive comparative labels
- Structured reasoning: Requiring LLMs to generate overarchingTheme and distinguishingFeature fields before producing the final label creates better conceptual separation
- Length constraints: Enforcing 2-word cluster labels and 3-7-word file labels forces meaningful compression
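The three techniques above can be sketched together in Python. This is an illustration under assumptions, not the prototype's implementation: the prompt wording, the JSON-array response shape, and the helper names build_sibling_prompt and parse_labels are all hypothetical, and no particular LLM API is assumed. Only the field names overarchingTheme and distinguishingFeature and the 2-word limit come from the article.

```python
import json
from dataclasses import dataclass


@dataclass
class ClusterLabel:
    # Reasoning fields come first so the model commits to them before the label.
    overarchingTheme: str
    distinguishingFeature: str
    label: str


def build_sibling_prompt(siblings: list[list[str]]) -> str:
    """Describe every sibling cluster in one prompt so labels can contrast."""
    lines = [
        "Label each cluster below so that the labels distinguish the clusters",
        "from one another. Respond with a JSON array, one object per cluster,",
        "in the order given. Each object has the fields overarchingTheme,",
        "distinguishingFeature, and label (exactly 2 words), in that order.",
        "",
    ]
    for number, files in enumerate(siblings, start=1):
        lines.append(f"Cluster {number}:")
        lines.extend(f"  - {name}" for name in files)
    return "\n".join(lines)


def parse_labels(raw: str, expected: int) -> list[ClusterLabel]:
    """Validate the model's reply against the structure and length constraint."""
    items = json.loads(raw)
    if len(items) != expected:
        raise ValueError(f"expected {expected} labels, got {len(items)}")
    labels = [ClusterLabel(**item) for item in items]
    for entry in labels:
        if len(entry.label.split()) != 2:
            raise ValueError(f"label is not 2 words: {entry.label!r}")
    return labels
```

The string from build_sibling_prompt would be sent to whichever model is in use, and the reply passed to parse_labels. Rejecting replies that break the word limit gives the caller a natural point to retry.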
Unexpectedly, displaying path patterns (like */Condition.dhall) proved doubly valuable. Beyond aiding users, feeding these patterns back into the labeling process significantly improved the LLM's accuracy—demonstrating how human-centric design benefits model performance.
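One way such a pattern could be derived from a cluster's file paths is sketched below in Python. The article does not say how the prototype computes its patterns, so this approach (keep the path components shared at the start and end, replace the differing middle with a single *) and the function name path_pattern are assumptions. The example paths in the usage note are made up.

```python
def path_pattern(paths: list[str]) -> str:
    """Summarize paths as shared leading/trailing components around a '*'."""
    parts = [path.split("/") for path in paths]
    shortest = min(len(p) for p in parts)

    # Count leading components shared by every path.
    prefix = 0
    while prefix < shortest and len({p[prefix] for p in parts}) == 1:
        prefix += 1

    # Count trailing components shared by every path, without overlapping the prefix.
    suffix = 0
    while suffix < shortest - prefix and len({p[-1 - suffix] for p in parts}) == 1:
        suffix += 1

    head = parts[0][:prefix]
    tail = parts[0][len(parts[0]) - suffix:]
    if all(len(p) == prefix + suffix for p in parts):
        # Every path is identical, so no wildcard is needed.
        return "/".join(head + tail)
    return "/".join(head + ["*"] + tail)
```

For instance, path_pattern(["types/Condition.dhall", "schemas/Condition.dhall"]) returns "*/Condition.dhall", a string short enough to show next to a cluster label and to include in the labeling prompt.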
Scaling and Applications
The tool handles ≈10,000 files within minutes on modern hardware. While optimized for code, it generalizes elegantly to other domains:
- Gonzalez's blog repository organized posts by content themes
- A personal meme library (converted to text via multimodal AI) revealed unexpected thematic groupings
This flexibility suggests applications in:
- IDE integration replacing traditional file trees
- Document management systems for legal or research repositories
- Multimedia indexing using multimodal embeddings
Counterpoints and Future Directions
Two limitations merit consideration:
- The 20-file minimum cluster threshold risks overlooking granular relationships in small projects
- Ultra-large repositories (>10k files) require algorithmic optimizations
Potential enhancements include incremental clustering for versioned repositories and hybrid filesystem-semantic navigation. Crucially, this prototype demonstrates that thoughtfully designed interfaces can surpass chat-based interactions for spatial understanding of complex information landscapes.
The project exemplifies how constraining LLMs within structured workflows, rather than freeform conversation, yields more reliable and navigable representations of knowledge. As Gonzalez notes: "We can still use large language models, but we can build much better interfaces to them."
