Haskell's Cabal Evolves: Preserving Comments with Exact Parsing
For years, Haskell developers have wrestled with a frustrating limitation in Cabal: any modification that round-tripped a .cabal package manifest through Cabal's parser and printer would obliterate comments, reformat whitespace, and desugar complex conditionals. This hindered automation tools and forced manual edits for basic tasks such as adding modules or dependency bounds. As Léana Jiang details in their original blog post, a comment-preserving parser now underway is set to change this.
The Exact Printing Imperative
The core of the solution is exact printing: parsing a file and printing it back must reproduce it byte for byte. Formally, for every valid file:
forall cabalFile.
IsValid cabalFile =>
exactPrint (exactParse cabalFile) == cabalFile
This requires preserving every syntactic detail: comments, comma styles, whitespace, and conditional structures. Without it, tools that automatically generate dependency bounds or update module lists remain impractical.
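To make the property concrete, here is a minimal sketch on a deliberately tiny, hypothetical representation: each line is classified as a comment, a blank line, or anything else, and its raw text is kept. The names exactParse, exactPrint, Line and roundTrips are illustrative only, not Cabal's API. Because nothing is discarded, the property holds trivially here; the real difficulty lies in keeping it true while building Cabal's much richer syntax tree.

```haskell
module Main where

import Data.Char (isSpace)
import Data.List (intercalate, isPrefixOf)

-- Hypothetical line-level syntax tree: every constructor keeps the raw
-- text, so no information is lost between parsing and printing.
data Line
  = CommentLine String -- first non-space characters are "--"
  | BlankLine String   -- whitespace only (the exact whitespace is kept)
  | OtherLine String   -- anything else, kept verbatim
  deriving (Show, Eq)

-- Split on '\n' while keeping a final empty piece, so that
-- intercalate "\n" . splitLines == id (unlike lines/unlines).
splitLines :: String -> [String]
splitLines s = case break (== '\n') s of
  (l, []) -> [l]
  (l, _ : rest) -> l : splitLines rest

-- Classify each line without normalising its text.
exactParse :: String -> [Line]
exactParse = map classify . splitLines
  where
    classify l
      | all isSpace l = BlankLine l
      | "--" `isPrefixOf` dropWhile isSpace l = CommentLine l
      | otherwise = OtherLine l

-- Print the lines back, restoring the newline separators.
exactPrint :: [Line] -> String
exactPrint = intercalate "\n" . map raw
  where
    raw (CommentLine s) = s
    raw (BlankLine s) = s
    raw (OtherLine s) = s

-- The exact-printing property, checked for one input.
roundTrips :: String -> Bool
roundTrips src = exactPrint (exactParse src) == src

sample :: String
sample =
  unlines
    [ "cabal-version: 3.0"
    , "-- the package name"
    , "name:    my-package"
    , ""
    , "library"
    , "  build-depends: base >=4 && <5"
    ]

main :: IO ()
main = print (roundTrips sample)
```

In practice such a property would be tested over many inputs (for example with a property-based testing library) rather than a single sample.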
Engineering the Transformation
Cabal's parser comprises three key components:
1. Alex Lexer: Generates tokens from source
2. Field Parser: Builds Field ann structures from tokens
3. Field Grammar Parser: Applies semantic rules, turning fields into a package description
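The division of labour can be sketched in miniature. This is a hypothetical, heavily simplified model, not Cabal's code: Token, lexer, fieldParser and packageName are invented names, line numbers stand in for Cabal's Position annotations, sections are omitted, and the lexer drops comments the way Cabal's lexer historically did. The Field ann shape loosely mirrors the Field, Name and FieldLine types in Cabal's Distribution.Fields.Field module.

```haskell
{-# LANGUAGE DeriveFunctor #-}
module Main where

import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Annotation used in this sketch: a 1-based line number.
type Pos = Int

-- Stage 1 output: hypothetical line-level tokens.
data Token
  = TField Pos String String -- "key: value" starting in column 1
  | TCont Pos String         -- indented continuation of the previous field
  deriving (Show)

-- Stage 2 output: a simplified Field ann (no sections).
data Name ann = Name ann String deriving (Show, Functor)
data FieldLine ann = FieldLine ann String deriving (Show, Functor)
data Field ann = Field (Name ann) [FieldLine ann] deriving (Show, Functor)

-- Stage 1: the lexer. Blank lines and comment lines produce no tokens,
-- which is how comments used to disappear.
lexer :: String -> [Token]
lexer src = concat (zipWith tok [1 ..] (lines src))
  where
    tok n l
      | all isSpace l = []
      | "--" `isPrefixOf` dropWhile isSpace l = []
      | otherwise = case l of
          (c : _) | isSpace c -> [TCont n (dropWhile isSpace l)]
          _ ->
            let (key, rest) = break (== ':') l
             in [TField n key (dropWhile isSpace (drop 1 rest))]

-- Stage 2: group each field with its continuation lines.
fieldParser :: [Token] -> [Field Pos]
fieldParser (TField n key v : rest) =
  let (conts, rest') = span isCont rest
      firstLine = [FieldLine n v | not (null v)]
   in Field (Name n key) (firstLine ++ [FieldLine m s | TCont m s <- conts])
        : fieldParser rest'
fieldParser (TCont _ _ : rest) = fieldParser rest -- stray continuation: skipped
fieldParser [] = []

isCont :: Token -> Bool
isCont TCont {} = True
isCont _ = False

-- Stage 3: a field "grammar" that gives one field its meaning.
packageName :: [Field ann] -> Either String String
packageName fields =
  case [unwords [s | FieldLine _ s <- ls] | Field (Name _ key) ls <- fields, key == "name"] of
    [n] -> Right n
    [] -> Left "missing name field"
    _ -> Left "duplicate name field"

main :: IO ()
main = do
  let src = unlines ["-- my package", "name: my-package", "version: 0.1"]
  print (packageName (fieldParser (lexer src)))
```

Keeping the annotation as a type parameter is what lets later stages ignore it entirely, as packageName does here.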
Previously, the lexer silently discarded comments. The overhaul makes two critical changes:
1. Annotations That Carry Comments
Instead of adding a disruptive Comment constructor to the Field type, Jiang's team redesigned the annotation parameter:
data WithComments ann = WithComments
{ justComments :: ![Comment ann]
, unComments :: !ann
}
This wraps the existing Position annotations, storing comments alongside them. Because the Field type itself gains no new constructor, existing code that pattern-matches on Field structures is not forced to handle a comment case.
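A minimal sketch of why wrapping the annotation is non-disruptive. The WithComments record is the one shown above; the Comment type, the Position stand-in, the simplified Field types, and the helpers fieldName and stripComments are hypothetical. Code written against a polymorphic Field ann keeps working when ann becomes WithComments Position, and code that needs the old, comment-free type can recover it with fmap unComments.

```haskell
{-# LANGUAGE DeriveFunctor #-}
module Main where

-- Stand-in for Cabal's source position (line, column).
data Position = Position Int Int deriving (Show, Eq)

-- Hypothetical comment representation: comment text plus its annotation.
data Comment ann = Comment String ann deriving (Show, Eq)

-- The wrapper described in the article.
data WithComments ann = WithComments
  { justComments :: ![Comment ann]
  , unComments :: !ann
  }
  deriving (Show, Eq)

-- Simplified Field type, parameterised by its annotation.
data Name ann = Name ann String deriving (Show, Eq, Functor)
data FieldLine ann = FieldLine ann String deriving (Show, Eq, Functor)
data Field ann = Field (Name ann) [FieldLine ann] deriving (Show, Eq, Functor)

-- Existing downstream code that is polymorphic in the annotation...
fieldName :: Field ann -> String
fieldName (Field (Name _ n) _) = n

-- ...and code that wants the old type can strip the wrapper.
stripComments :: Field (WithComments Position) -> Field Position
stripComments = fmap unComments

-- A field whose name carries the comment line that preceded it.
example :: Field (WithComments Position)
example =
  Field
    (Name (WithComments [Comment "-- the package name" (Position 1 1)] (Position 2 1)) "name")
    [FieldLine (WithComments [] (Position 2 7)) "my-package"]

main :: IO ()
main = do
  putStrLn (fieldName example)
  print (stripComments example)
```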
2. Lexer State Management
The Alex lexer uses state transitions ("start codes") to handle different contexts. Jiang carefully mapped these states in a diagram in the original post, marking those in which comments can appear. In those states, the lexer now emits comment tokens instead of dropping them.
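Start codes can be imitated with an explicit state in a hand-written lexer. This is a hypothetical sketch, not Alex-generated code or Cabal's actual set of states: it uses two invented states, AtLineStart and InValue, and treats "--" as a comment only when it begins a line (after optional indentation), following the .cabal convention of whole-line comments. The change described above corresponds to the TComment branch, where the text is emitted as a token rather than skipped.

```haskell
module Main where

-- Hypothetical lexer states, in the spirit of Alex start codes.
data StartCode = AtLineStart | InValue
  deriving (Show, Eq)

data Token
  = TComment String -- emitted instead of being discarded
  | TText String    -- any other run of text up to the end of the line
  | TNewline
  deriving (Show, Eq)

lexWith :: StartCode -> String -> [Token]
lexWith _ [] = []
-- A newline always returns the lexer to the line-start state.
lexWith _ ('\n' : rest) = TNewline : lexWith AtLineStart rest
-- At the start of a line, "--" after optional indentation is a comment.
lexWith AtLineStart s =
  let (indent, rest) = span (\c -> c == ' ' || c == '\t') s
   in case rest of
        ('-' : '-' : _) ->
          let (body, rest') = break (== '\n') rest
           in TComment (indent ++ body) : lexWith AtLineStart rest'
        _ -> lexWith InValue s
-- Inside a value, "--" is ordinary text (e.g. a GHC flag).
lexWith InValue s =
  let (body, rest) = break (== '\n') s
   in TText body : lexWith InValue rest

main :: IO ()
main =
  mapM_ print $
    lexWith AtLineStart "-- comment\nghc-options: -Wall --foo\n  -- indented comment\n"
```

The same two characters are lexed differently depending on the current state, which is exactly why identifying the comment-capable states matters.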
Navigating Hidden Pitfalls
The solution revealed subtle complexities. Field ann is a Functor, so mapping over a Field rewrites every annotation nested inside it, not just the outermost one. As Jiang notes:
"If we fmap and attach comments to a Field, its first and second arguments will all have the same comments attached!... You must have a clear understanding of the instances of the data types you're using."
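The pitfall is easy to reproduce with a derived Functor instance. In this hypothetical sketch (simplified types; the helpers addComment, attachEverywhere, attachToName and commentedAnns are invented), mapping a comment-attaching function over a whole Field copies the comment onto the field name and onto every field line. Attaching it to one annotation requires applying fmap to the specific sub-structure instead.

```haskell
{-# LANGUAGE DeriveFunctor #-}
module Main where

-- Simplified Field ann with derived Functor instances.
data Name ann = Name ann String deriving (Show, Functor)
data FieldLine ann = FieldLine ann String deriving (Show, Functor)
data Field ann = Field (Name ann) [FieldLine ann] deriving (Show, Functor)

-- Annotation: a line number paired with the comments attached to it.
type Ann = (Int, [String])

-- Attach a comment to a single annotation.
addComment :: String -> Ann -> Ann
addComment c (pos, cs) = (pos, cs ++ [c])

-- Pitfall: fmap reaches every annotation inside the Field.
attachEverywhere :: String -> Field Ann -> Field Ann
attachEverywhere c = fmap (addComment c)

-- Intended: only the field name's annotation receives the comment.
attachToName :: String -> Field Ann -> Field Ann
attachToName c (Field name ls) = Field (fmap (addComment c) name) ls

-- Count the annotations in a field that carry at least one comment.
commentedAnns :: Field Ann -> Int
commentedAnns = length . filter (not . null . snd) . annotations
  where
    annotations (Field (Name a _) ls) = a : [b | FieldLine b _ <- ls]

field :: Field Ann
field =
  Field
    (Name (2, []) "build-depends")
    [FieldLine (2, []) "base", FieldLine (3, []) "containers"]

main :: IO ()
main = do
  print (commentedAnns (attachEverywhere "-- deps" field))
  print (commentedAnns (attachToName "-- deps" field))
```

Here the first call marks three annotations and the second marks one, which is the duplication Jiang warns about.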
Future Horizons
While comment preservation is complete, full exact printing requires further work:
- Preventing common stanza merging
- Preserving trivia (like spaces in dependency bounds)
- Finalizing the AnnotatedGenericPackageDescription type
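Trivia preservation means keeping the whitespace around meaningful tokens, not only the tokens themselves. The following is a hypothetical sketch of the idea for a dependency such as base  >= 4   && < 5: each token records the whitespace that preceded it, so the bound can be printed back byte for byte, while a normalising printer collapses the spacing. Tok, tokenize, exactRender and normalRender are invented names, not part of Cabal.

```haskell
module Main where

import Data.Char (isSpace)

-- A token together with the whitespace ("trivia") that preceded it.
data Tok = Tok {leading :: String, text :: String}
  deriving (Show, Eq)

-- Split on whitespace, remembering the exact whitespace before each word.
-- Trailing whitespace becomes a final token with empty text.
tokenize :: String -> [Tok]
tokenize s = case span isSpace s of
  (ws, []) -> [Tok ws "" | not (null ws)]
  (ws, rest) ->
    let (word, rest') = break isSpace rest
     in Tok ws word : tokenize rest'

-- Exact printer: trivia is emitted verbatim.
exactRender :: [Tok] -> String
exactRender = concatMap (\t -> leading t ++ text t)

-- Normalising printer: trivia is discarded and replaced by single spaces.
normalRender :: [Tok] -> String
normalRender = unwords . filter (not . null) . map text

main :: IO ()
main = do
  let bound = "base  >= 4   && < 5"
      toks = tokenize bound
  print (exactRender toks == bound)
  putStrLn (normalRender toks)
```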
This work, supported by the Haskell Foundation and building on Jappie Kloosterman's prototype, lays the foundation for reliable tooling that edits Cabal files. As Jiang emphasizes: "Adding an exact printer will allow Cabal to modify cabal files automatically and losslessly!"
Source: Based on the original technical deep dive by Léana Jiang on the Haskell Blog.