Mark Pilgrim, creator of chardet, claims maintainers illegally relicensed the project after an AI-assisted rewrite, violating LGPL terms.
The chardet/chardet repository is facing a licensing dispute after maintainers released version 7.0.0 under the MIT license, claiming it was a "complete rewrite" of the original code. Mark Pilgrim, the original author of chardet and creator of "Dive Into Python," has publicly objected to this relicensing, stating that the maintainers have no legal right to change the license from LGPL to MIT.
In an issue opened on GitHub, Pilgrim argued that the "complete rewrite" claim is irrelevant because the maintainers had ample exposure to the original LGPL-licensed code. He emphasized that adding a code generator to the process does not grant them additional rights, and that modified versions of the licensed code must be released under the same LGPL terms.
"I respectfully insist that they revert the project to its original license."
- Mark Pilgrim
The controversy centers on version 7.0.0, which was reportedly created with AI assistance. Multiple commenters on the issue have pointed out that the agent involved in the rewrite read some of the original files during the process. One user shared a code snippet showing the agent treating the original charsets.py file from chardet 6.0.0 as an authoritative reference.
Some community members have suggested that a fork of the project before the AI-assisted rewrite might be necessary. Others have noted that all versions before v7.0.0 are still accessible through the repository's tags, with version 6.0.0 being specifically mentioned as available.
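For readers who want to know which side of the rewrite their own environment sits on, the check can be sketched in a few lines of Python. This is a minimal illustration, not anything published by the project: the helper names `predates_rewrite` and `installed_chardet_status` are hypothetical, and the only fact it relies on is the article's statement that 7.0.0 is the first release under the new license.

```python
from importlib import metadata

# Assumption taken from the article: 7.0.0 is the first relicensed release.
REWRITE_MAJOR = 7


def predates_rewrite(version: str) -> bool:
    """Return True if a version string such as '6.0.0' is older than 7.0.0."""
    # Only the major component matters for this comparison.
    major = version.split(".")[0]
    return int(major) < REWRITE_MAJOR


def installed_chardet_status() -> str:
    """Describe the locally installed chardet, if there is one."""
    try:
        version = metadata.version("chardet")
    except metadata.PackageNotFoundError:
        return "chardet is not installed"
    side = "pre-rewrite" if predates_rewrite(version) else "7.0.0 or later"
    return f"chardet {version} ({side})"


if __name__ == "__main__":
    print(installed_chardet_status())
```

Projects that prefer to stay on the earlier line while the dispute plays out could express that with a standard requirement specifier such as `chardet<7`.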
The legal implications of this case have sparked discussion about copyright law and fair use. One commenter referenced the Google LLC v. Oracle America, Inc. case, suggesting that relicensing an API-compatible "rewrite" without the copyright holder's permission could be illegal in the US unless it qualifies as fair use. However, others have argued that this situation is more about derivative works than APIs specifically.
Some developers have expressed support for having an MIT-licensed implementation of chardet for practical reasons, while acknowledging the legal complexities. One commenter stated they "welcome an MIT (or public domain) implementation of chardet" but also noted the need for evidence that this is truly a clean room implementation.
The dispute highlights the growing challenges around AI-assisted development and open source licensing, particularly when AI tools are trained on copyrighted code. It raises questions about what constitutes a "clean room" implementation in the age of AI and whether exposure to original source code during AI training creates derivative works.
The chardet project, originally created as a "Universal Character Encoding Detector," has been a widely used tool in the Python ecosystem for detecting the encoding of text files. The outcome of this licensing dispute could have significant implications for how open source projects handle AI-assisted rewrites and license changes in the future.

Featured image: The chardet repository's main page showing the licensing controversy
