A dispute over the chardet Python library's license change, enabled by AI-assisted development, has sparked debate about the future of software licensing and copyright in the age of AI.
The open source community is grappling with a fundamental question about the future of software licensing after a dispute over the chardet Python library exposed how AI-assisted development could render traditional copyright protections obsolete.

Earlier this week, Dan Blanchard, maintainer of chardet—a Python character encoding detection library—released version 7.0 under an MIT license, replacing the previous GNU Lesser General Public License (LGPL). This seemingly routine license change has ignited a firestorm of debate about whether AI-generated code can bypass copyleft requirements.
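For context, chardet's public API centers on a single call that guesses a character encoding from raw bytes. A minimal usage sketch (the sample inputs here are illustrative, not from the dispute):

```python
import chardet

# Pure ASCII bytes are identified with full confidence.
result = chardet.detect(b"hello world")
print(result)  # typically {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

# Valid multibyte UTF-8 sequences are usually recognized from their byte patterns.
sample = "Licence libre : liberté, égalité, fraternité".encode("utf-8")
print(chardet.detect(sample)["encoding"])  # typically 'utf-8'
```

It is this detection logic whose reimplementation, and whose license, the dispute is about.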
The controversy centers on Blanchard's claim that he used Anthropic's Claude AI to create what amounts to a "clean room" implementation of chardet. Clean room implementations involve rewriting code without directly copying the original, typically to avoid copyright infringement. Blanchard argues that the AI-generated version bears minimal similarity to previous releases—less than 1.3 percent maximum similarity according to JPlag analysis—making it a fundamentally new work.
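JPlag itself performs token-based structural comparison designed to catch plagiarism. As a much simpler illustration of the underlying idea, scoring how alike two pieces of source text are, Python's standard-library difflib can compute a similarity ratio (the code snippets compared below are invented for the example):

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Return a rough textual similarity ratio between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Two hypothetical implementations of the same behavior, written differently.
old_impl = "def detect(data):\n    return scan_byte_patterns(data)\n"
new_impl = "def sniff(buf):\n    state = analyse(buf)\n    return state.best_guess()\n"

print(f"similarity: {similarity(old_impl, new_impl):.2f}")
```

A low score on a measure like this is the kind of evidence Blanchard cites, though textual dissimilarity alone does not settle the legal question of derivation.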
However, Mark Pilgrim, the original creator of chardet, disputes this characterization. An individual claiming to be Pilgrim opened an issue on GitHub arguing that the maintainers had no right to change the license, stating that "licensed code, when modified, must be released under the same LGPL license." The argument hinges on whether AI assistance constitutes sufficient transformation to create new copyrightable material.
Blanchard defends the license change on multiple grounds. First, he needed to improve chardet's performance and accuracy to get it added to the Python standard library—a goal he'd pursued for over a decade. Second, he claims the AI-generated version achieved a 48x increase in detection speed, benefiting millions of users who download the package approximately 130 million times monthly. Third, he argues that the LGPL requirement to maintain the same license doesn't apply to what he considers a complete rewrite.
This dispute highlights a broader existential crisis for software licensing. Armin Ronacher, creator of Flask and longtime open source developer, suggests that copyleft licenses like GPL and LGPL depend heavily on copyright enforcement mechanisms that may no longer be viable. "Copyleft code like the GPL heavily depends on copyrights and friction to enforce it. But because it's fundamentally in the open, with or without tests, you can trivially rewrite it these days."
The question of AI's role in copyright creation remains unsettled. The US Supreme Court recently declined to hear Thaler v. Perlmutter, leaving intact the appeals court's ruling that AI-generated images cannot be copyrighted. That precedent raises questions about how much human involvement is required for AI-assisted code to qualify for copyright protection.
Zoë Kooyman, executive director of the Free Software Foundation, expressed concern about the implications. "There is nothing 'clean' about a Large Language Model (LLM) which has ingested the code it is being asked to reimplement." She argues that undermining copyleft through AI assistance is "highly antisocial" and that free software communities need stronger protections as machine learning creates new ways to circumvent copyright.
Bruce Perens, who wrote the original Open Source Definition, has issued what he calls a "fire alarm" about the broader implications. "The entire economics of software development are dead, gone, over, kaput!" Perens argues that AI's ability to clone software so easily threatens both proprietary and open source business models. He describes using AI to create an SRE platform in minutes after training it on existing platforms' documentation—a process that previously would have taken weeks or months.
The economic consequences could be profound. Proprietary software companies that rely on keeping their code secret may find their competitive advantages evaporating when AI can reproduce functionality from documentation alone. Open source projects face similar challenges, as every open source program potentially becomes AI training data that can be used to generate competing implementations.
Perens suggests we may be at an inflection point similar to the printing press's impact on knowledge dissemination or the scientific method's effect on research. "I wonder if knowledge got to a critical mass, and this is the inflection point where all of the processes around it changes."
The chardet dispute may be just the beginning of what promises to be a long and contentious debate about software ownership in the AI era. As AI models become increasingly capable of generating code, the traditional foundations of software licensing—built on copyright law and human authorship—face an uncertain future. The question isn't just whether AI can help write code faster, but whether it can effectively nullify the licensing frameworks that have governed software development for decades.
