Simon Willison Confronts Ethical Questions in LLM-Assisted Code Porting
#LLMs

AI & ML Reporter

Simon Willison addresses ethical implications of using LLMs to port open source projects, examining copyright, ecosystem impact, and responsible publishing practices.

Simon Willison has published his responses to ethical questions raised by his experiment porting the JustHTML library from Python to JavaScript using large language models. In a detailed follow-up to his original post, Willison examines the legal and philosophical implications of using AI coding assistants such as OpenAI's Codex CLI with GPT-5.2 and Anthropic's Claude Opus 4.5 to create derivative works.

Willison's original experiment demonstrated how an LLM could port a substantial codebase, validated by 9,200 tests, in hours rather than weeks. This prompted him to confront several critical questions:

  1. Copyright Violation Concerns: Willison concludes that retaining the original Python library's open source license and copyright notice makes the ported work legally compliant as a derivative work under open source principles.

  2. Ethical Considerations: He argues the practice is ethical when proper attribution is given, drawing parallels to students forking GitHub projects. "Open source allows and encourages derivative works," Willison states, though he acknowledges language ports feel distinct from feature additions.

  3. Ecosystem Impact: Willison identifies complex tradeoffs. While some maintainers might abandon open source over LLM training concerns, he suggests this might be offset by increased contributions from those who previously lacked time: "If 'they might train on my code' drives you away, your open source values differ enough from mine that I'll invest in welcoming newcomers instead."

Willison raises a more significant concern about demand erosion, citing Tailwind's reported decline in documentation traffic as an example. "LLMs make building 'good-enough' versions of components easy enough that people bypass existing libraries," he observes, noting his own shift from searching for a Go cron parser to simply generating one.

  4. Copyright Over AI Output: Willison admits uncertainty about copyright claims on LLM-generated code but hypothesizes that his "creative control in directing models" likely constitutes sufficient human intervention under US law.

  5. Responsible Publishing: He advocates publishing such projects with clear expectations, introducing "alpha slop" as a provisional classification: "I'll remove the alpha label once I've used them in production enough to stake my reputation on them."

  6. Quality Comparison: Responding to his own provocative question about expert-crafted alternatives, Willison notes that for well-specified tasks like HTML parsing with comprehensive test suites, LLM results may rival hand-crafted implementations without the "hundreds of thousands of dollars" cost.

Willison's analysis highlights fundamental tensions in the evolving open source landscape. While LLMs dramatically lower barriers to creating derivative works, they simultaneously reduce demand for specialized libraries and introduce novel questions about authorship and value attribution. His pragmatic approach—prioritizing transparency through licensing, attribution, and version labeling—offers a potential framework for navigating these changes.
