Some interesting reflections from the Fediverse post of d@nny disc@ (@hipsterelectron@circumstances.run):
curious points about LLM bans:
- it’s important for the policy to specify the basis of the ban (on copyright alone, or copyright+ethical grounds) so that it can be applied to other technologies. this in fact sets a precedent, and you are acting as judges in your project’s courtroom
as a result, it’s ideal if the ban directly results from appeals to previously adjudicated concerns. this is why copyright is effective: by selecting a license for the project, you are already requiring that all contributions fall under a compatible license!
if you want to introduce new grounds for rejection such as ethical concerns, it’s worth codifying those explicitly, and marking them as a new policy of the project. this will make it easier to litigate borderline cases in the future
but i’d also encourage considering the gnulib approach https://circumstances.run/@hipsterelectron/116121894991198670 where the enforcement of rejecting LLM contributions is stated in terms of your existing policies (in particular, copyright and review procedures).
i argue this is not a “cop-out” by any means, although it does avoid a direct ban. i’ll explain why i believe this to be the ideal approach:
first, explicitly stating your requirements to avoid copyright liability helps to shield your project later if someone contributes code that was taken from elsewhere (LLM or not). if you have precise requirements that each contribution must conform to, then the contributor generally becomes liable for misrepresenting their contribution, instead of “poisoning” your whole project. this is related to why the fsf famously requires copyright assignment statements
that may not be terribly interesting to most people, but i argue that this reframing is actually where you can do the work of convincing anyone on the fence (contributors, users, lurkers) that LLMs are harmful to them. if you go through the process (like gnulib) of describing the necessary caveats to make an LLM contribution allowable, i believe you also produce a persuasive document enumerating all the liabilities LLM users incur.
1) Code included in this package that comes from a single LLM prompt must be limited in size: it must be at most 5 lines long.
Rule 1 guarantees that the LLM generated code size is smaller than the “legally significant for copyright purposes” threshold, see
https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html
this is not an outright ban! openslopware might write nasty things about you! but it is obviously and inarguably a de facto ban, and it goes even further by identifying LLM output as an inherent copyright liability, which is one of the actual dangers to your project that your policies protect against.
i’m aware most people aren’t interested in copyright. here’s the part that’s much more interesting:
2) As a submitter, you assert that you have reviewed such code that you submit.
Rule 2 encourages you to not submit unreviewed garbage.
LLM contributions of any size are impossible to review, because they weren’t actually written and the submitter has no theory of their operation. this makes LLMs part of the DDoS technique against unpaid maintainers, a technique that can be weaponized by nation-state actors, as in the xz-utils backdoor discovered in 2024.
the review angle has two important corollaries:
(a) this gives you an opportunity to describe your policies around code review, including:
- what prerequisites contributors must be able to provide,
- what constitutes grounds for closing a diff thread,
- how contributors should escalate if they can’t understand how to proceed.
(b) it completely circumvents and discards any process for identifying LLM output not declared as such. this is incredibly important because debates over whether something is LLM output can be wielded manipulatively and disingenuously. in general, anyone accusing someone of “secretly” using an LLM should be understood as engaging in toxic behavior, and moderated as such. the LLM DDoS on maintainers and contributors also includes infighting spurred by distrust.
if someone “secretly” generates LLM output, and then goes through the review process and incorporates feedback while explaining behavior, then the result will likely end up constituting their own original work by the end. i strongly doubt any LLM user would be willing to do that work for code they didn’t write, but even if they do, you will have obviated the latent concerns about copyright as well as incorrect behavior.
if they can’t do that, and instead start whining, then their contribution won’t be accepted. in order to make it easy for maintainers to handle these situations, your policy for code reviews should give them clear conditions that justify closing a proposed contribution. this is a necessary defense against concern trolls. contributors should have an escalation path in case a mistake is made, but this should be extremely rare.
the final point i’d like to make on all this, and the reason i made this thread, is that the question of enforcement of an LLM ban can be contentious (thanks @mildsunrise for raising this here https://tech.lgbt/@mildsunrise/116582201650521369). in particular, i believe it’s ridiculously important to avoid the misconception that LLM output can be identified by looking at it. in fact, that is LLM industry propaganda, and snake oil LLM output recognizers form another product vertical for many LLM vendors!
the distinction between LLMs and other tooling to synthesize or generate code is specifically that there’s no theory of its operation, because it was always created as a complex way to steal someone else’s labor. like when astral stole credit for my zip file tricks without hiring me and created a CVE, the violence and the risk of labor theft comes from decoupling the skilled expert from the application domain.
since LLMs can and do just output verbatim copyrightable work, it’s not really true to say their output is distinguishable from real code: it is real code, just taken from somewhere else. so enforcement of LLM bans can’t expect to identify LLM output at all, and permitting accusations of LLM usage enables toxicity and even xenophobia.
instead, this is an opportunity to identify processes for code review that make maintainers’ and contributors’ jobs easier, which is broadly applicable beyond LLMs.
personally, i don’t consider it a benefit that the strategy described here avoids taking a stance on LLMs as an ethical matter. in fact, i would amend this to incorporate a non-binding statement in project policies describing LLM vendors as “immoral”, “extractive”, “imperialist”, and perhaps even “rapacious”, because i don’t want to have to deal with contributors who get upset by that.
the main issue (evidenced by the wildly toxic and irresponsible openslopware list) is that accusations of LLM usage waste your maintainers’ time and can deeply hurt potential contributors. these should not be allowed whatsoever.
the point of policies and processes is to avoid having to adjudicate each individual contribution. when considering the LLM ban, you should use it as an opportunity to strengthen your governance model, so that you can approach new problems in the future without a BDFL.
I copied it here because far too often things tend to be deleted “out there”.