It is mentioned later in the article, but I think it's important to clearly draw a distinction between cases where (a) the "offender" is using the licensed work within the letter of the license but not the spirit, and (b) the "offender" has broken both the letter and the spirit of the license.
I've licensed multiple repositories under MIT, written under CC-BY, and published games under ORC. All of those licenses require attribution, something that AI, for example, explicitly ignores. In those situations "Wait, no, not like that" isn't "I didn't expect you'd use it this way"; it's "you weren't authorized to use it this way."
Listing all of the creators from whose attribution-licensed works an LLM (potentially) derived an output would seem to satisfy the letter of such licenses, but it is not clear that such would satisfy the spirit (which seems to assume a stronger causal link and a more limited number of attributions). If creators can be grouped outside of the creator naming explicitly associated with the works, this could degrade into "this work is derived from the works of humanity"; however, listing all human beings individually does not seem _meaningfully_ different and seems to satisfy the attribution requirement of such licenses.
From what little I understand of LLMs, the weight network developed by training on a large collection of inputs is similar to human knowledge in that some things will be clearly derived (at least in part) from a limited number of inputs but others might have no clear contributor density. If I wrote a human "superiority" science fiction story, I could be fairly confident that Timothy Zahn and Gordon R. Dickson "contributed"; however, this contribution would not be considered enough to violate copyright and require licensing. Some LLM outputs clearly violate copyright (e.g., near verbatim quotation of significant length), but other outputs seem to be more broadly derived.
If the law treats LLMs like humans ("fairly"), then broad derivation would not seem to violate copyright. This seems to lead toward "AI rights". I cannot imagine how concepts of just compensation and healthy/safe working conditions would apply to an AI. Can a corporation own a computer system that embodies an AI, or is that slavery?
If the law makes special exceptions for LLMs, e.g., adjusting copyright such that fair use and natural learning only apply to human persons, then licensing would be required for training. However, attribution licenses seem to have the above-mentioned loophole. (That this loophole is not exploited may be laziness or concern about admitting that following the license is required — which makes less openly licensed/unlicensed works poisonous.)
If the purpose of copyright is to "promote the useful arts", then the law should reflect that purpose. Demotivating human creators and the sharing of their works seems destructive to that purpose, but LLMs are also enabling some creativity. Law should also incorporate general concepts such as equality under the law. LLMs also seem to have the potential for power concentration, which is also a concern for just laws.
Perhaps I am insufficiently educated on the tradeoffs to see an obvious solution, but this seems to me like a difficult problem.
From Holub on Patterns (2004): patterns are discovered, not invented. The implementation of a pattern is an idiom, which may or may not be idiomatic within a given community of practice.
If it can be shown that multiple people independently created something, the artifact is not the pattern itself--rather, the pattern is recognized because of the many similar implementations.
LLMs create (or re-create) and derive idioms, based on weights that are themselves idioms (probabilistic patterns). So the most we can say is that an AI may understand patterns of color theory, or idiomatic execution (art style)--but that is all.
---
1. They are willful, purposeful creatures who possess selves.
2. They interpret their behavior and act on the basis of their interpretations.
3. They interpret their own self-images.
4. They interpret the behavior of others to obtain a view of themselves, others, and objects.
5. They are capable of initiating behavior so as to affect the views that others have of them and that they have of themselves.
6. They are capable of initiating behavior to affect the behavior of others toward them.
7. Any meaning that children attach to themselves, others, and objects varies with respect to the physical, social, and temporal settings in which they find themselves.
8. Children can move from one social world to another and act appropriately in each world.
-- The Private Worlds of Dying Children (1978)
Think of it in a positive way: they also don't attribute all-rights-reserved works.
Big tech has no respect for licenses, or for the law itself, as Uber and the like have shown.
People talking about licenses like they have some courtroom legitimacy is hilarious. Licenses are like patents: they are weapons of the large corporation, to be used against other large corporations or against the people.
Of course you can try to litigate for ten years against a big corporation with lawyers on retainer; good luck. You might even get an "Erin Brockovich" movie script out of it, but we're seeing the rule of law and the legitimacy of the courts degrade rapidly and become increasingly corrupt: what once happened over years now happens by the day.
This reminds me of “Fuck you, pay me” a talk[0] given by Mike Monteiro on contract work (I believe the title is based on a quote from Goodfellas[1]).
[0] https://m.youtube.com/watch?v=jVkLVRt6c1U
[1] https://m.youtube.com/watch?v=P4nYgfV2oJA
Great post. I love the vampirism metaphor.
The internet as an open resource has been a tremendous boon to society as a whole. AI, likewise, acts as a force multiplier on top of this knowledge. You can learn anything. But AI also depletes the resources on which it builds.
An obvious way to counteract this is for the AI companies to give back generously through monetary donations or, at the very least, attributions. But unfortunately we see exactly the opposite.
> But AI also depletes the resources on which it builds.
The article author is clearly assuming that, but I'm not sure whether it's even true to any meaningful extent. How many contributors to free/open content resources are even bothered that their work might end up being used for AI training?
Also, maybe AI firms should pay for the scanning and OCR text extraction of existing paper-backed resources. There's a lot of, e.g., old academic research that's still not meaningfully available online, and much of it is even free of copyright worldwide. If you care about ensuring that your AIs are adequately well-read, this is a bottleneck that might be especially easy to address.
> How many contributors to free/open content resources are even bothered that their work might end up being used for AI training?
StackOverflow’s response is a clear counterpoint.
I don’t think that’s what this would look like. It’s more like: people won’t even know Wikipedia exists or that there is something to contribute to, or that the model gained its knowledge from this resource. That’s why this pattern is so disingenuous in my opinion. When I was young we were surprised that Wikipedia existed. Future generations might not know that Wikipedia exists, or that you can contribute to it.
Current frontier models couldn’t exist in this form or wouldn’t exist at all without people putting in the work of writing down what they knew for free. And the model creators not even paying lip service to this, and instead saying they will replace Wikipedia, is hubristic and a very clear example of the tragedy of the commons.
> And the model creators not even paying lip service to this, and instead saying they will replace Wikipedia
Which model creators are saying this? It's very easy to put this assertion to the test, btw: just pick your favorite Wikipedia article, ask an LLM to "improve" the writing, and check how much stuff it ends up rewriting in a confusing way, or even gets outright wrong. Compared to your average high-quality Wikipedia article, an LLM is just a very confused parrot.
I think that within a year, LLMs will be better at writing Wikipedia articles than the p90-quality Wikipedia page is today.
They have access to _huge_ volumes of information—books, research papers, PhD dissertations, newsletters from experts, peer-reviewed articles. They're getting better every day at organizing it. Claude has a (beta) citations API, which forces the LLM to directly cite sources and use direct quotations.
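For the curious, here's a minimal sketch of what using that citations feature looks like from the Python SDK, based on my reading of the beta docs; the model name, the exact field names, and the sample document are my own assumptions and may have drifted since.

```python
# Rough sketch of Anthropic's (beta) citations feature; field names are from my
# reading of the docs and may have changed, so treat this as illustrative only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

source_text = "Photosynthesis converts light energy into chemical energy. ..."  # hypothetical source document

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumption: any current model name works here
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain", "data": source_text},
                "title": "Photosynthesis (source article)",
                "citations": {"enabled": True},  # ask the model to cite spans of this document
            },
            {"type": "text", "text": "Summarize this article in two sentences, citing your sources."},
        ],
    }],
)

# With citations enabled, each text block in the reply can carry a list of
# citation objects pointing back at character ranges of the supplied document.
for block in response.content:
    if block.type == "text":
        print(block.text)
        for cite in (getattr(block, "citations", None) or []):
            print("  cited:", cite.cited_text)
```

The interesting part is that the citations point at character ranges of documents you supplied, so they're cheap to verify, unlike a free-form "according to Wikipedia" in the generated prose.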
They'll get better and better at managing translations: maybe not translating themselves, but finding resources in other languages, running them through a translation tool, then pulling in that information.
Will they be perfect? No, but good lord, Wikipedia isn't either. Will they hallucinate? Probably sometimes, but, again, the median Wikipedia author does not have a New Yorker fact checker at their side.
I think we overestimate how good humans are at this. And it's far easier to validate cited sources than it is to consume and craft the original writing.
If you train your AI on the commons, everything it generates should be in the commons. And all your profits should be shared with everyone.
If running your AI is incompatible with respecting copyright and intellectual property then you should not get to own a single bit of its output.
It's not trained on "the commons", it's trained on a large number of individual products with their own ownership and licensing. If copyrights are being violated the restitution will go to those owners, not to "everyone".
>If you train your AI on the commons, everything it generates should be in the commons.
Isn't AI output not copyrightable already?
This is a perfect modern example of the Tragedy of the Commons, where the absence of governance mechanisms or social contracts around a shared resource (open knowledge) leads to exploitation that threatens the resource's sustainability for everyone.
If implementing a full game-theory solution is challenging, a minimum viable approach could combine:
- Wikimedia's Enterprise API model for high-volume users
- Technical measures to identify and throttle non-contributing scrapers (see the sketch after this comment)
- Public transparency reports on AI company usage and contributions
- An industry certification program for "commons-friendly" AI development
This hybrid approach uses game theory principles to realign incentives while being practically implementable with current technologies and organizational structures.
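To make the throttling bullet above concrete, here is a toy sketch of what identifying and rate-limiting non-contributing scrapers could look like at the edge; the user-agent substrings, the partner allowlist, and the thresholds are all made-up placeholders, not a real policy.

```python
# Toy sketch of "identify and throttle non-contributing scrapers". Everything
# here - bot user-agent substrings, the partner allowlist, the rate limits -
# is a made-up placeholder, not a real policy.
import time
from collections import defaultdict

KNOWN_AI_CRAWLERS = {"GPTBot", "CCBot", "ClaudeBot"}      # hypothetical UA substrings to watch
CONTRIBUTING_PARTNERS = {"enterprise-key-123"}            # hypothetical keys of paying/attributing users
READER_LIMIT_PER_MIN = 30                                 # ordinary clients
CRAWLER_LIMIT_PER_MIN = 5                                 # non-contributing AI crawlers

_request_log: dict[str, list[float]] = defaultdict(list)  # client key -> recent request timestamps

def should_throttle(user_agent: str, api_key: str | None, client_ip: str) -> bool:
    """Return True if this request should get a 429 instead of content."""
    # Contributing partners (e.g. Wikimedia Enterprise-style customers) are never throttled.
    if api_key in CONTRIBUTING_PARTNERS:
        return False

    # Identify likely AI scrapers by user agent; everyone else is limited per IP.
    is_ai_crawler = any(bot in user_agent for bot in KNOWN_AI_CRAWLERS)
    key = f"bot:{user_agent}" if is_ai_crawler else f"ip:{client_ip}"

    # Keep a one-minute sliding window of request timestamps for this client.
    now = time.time()
    window = [t for t in _request_log[key] if now - t < 60.0]
    window.append(now)
    _request_log[key] = window

    limit = CRAWLER_LIMIT_PER_MIN if is_ai_crawler else READER_LIMIT_PER_MIN
    return len(window) > limit

# Example: a non-contributing crawler hammering the endpoint trips the limit quickly.
for _ in range(10):
    throttled = should_throttle("Mozilla/5.0 (compatible; GPTBot/1.0)", None, "203.0.113.7")
print("throttle the 10th request?", throttled)
```

The point isn't the specific mechanism; it's that the carrot (never throttling contributing partners) and the stick (tight budgets for known non-contributing crawlers) realign incentives without blocking ordinary readers.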
> absence of governance mechanisms or social contracts around a shared resource (open knowledge) leads to exploitation that threatens the resource's sustainability for everyone.
An AI training on a source does not deplete the source in the digital realm. It's not like bits run out or rot.
The training could potentially remove a source of commercial revenue, such as advertising, which is only tangentially related to the knowledge/source. And it is not the commercial revenue that drove the initial development of said knowledge, even if the creator(s) of the knowledge got paid after the fact (let alone not paid at all, as with Wikipedia editors).
Therefore, it's incorrect to associate the tragedy of the commons with AI training draining the commons.
I always appreciate a good post by Molly White :D
This is such good writing, and manages to offer a nuanced and informative new angle on an issue which has already been discussed at great length.
If you care about software freedom, you need to make all your software AGPL.
MIT is the "wait, no, not like that" license and GPL is a half-measure.
Non-commercial licenses are fine if you also provide a commercial option - who cares what the OSI thinks. (And you might want to look up who's a member of the OSI)
GenAIs don't care whether your scrapables were licensed under AGPLv3 or GFDLv1 or were even proprietary; they ingest it all and become at most MIT, then spit out a lesser parody into the public domain.
Under some versions of copyright law, and interpretations conceived before the emergence of generative AI, this is either considered fair use and/or legally exempted as scientific research, because the use of even copyrighted data was highly transformative; an AI even capable of regurgitating its dataset was previously unheard of.
This aspect of generative AI is controversial, but the discussion of whether that interpretation will ultimately be upheld has been moving at a glacial pace (as in, slowly).
Restricting what people can do with the software you write doesn't sound very free
Restricting people's freedom to restrict other people's freedom ensures maximum freedom overall. That's why we have prisons for thieves and killers.
>Restricting what people can do with the software you write doesn't sound very free
It depends on the POV: user freedom or developer/publisher freedom.
To make it easy to understand, I remember a quote from some book where a Frenchman and an American were debating which country is more free, and the American says, "We are more free, since we are free to own slaves."
If you care about software, why would non-commercial licenses be fine under any condition?
First rule of free software is any goal, including commercial ones.
> MIT is the "wait, no, not like that" license
I don't get this. Someone who releases their work under MIT, GPL, or AGPL allows selling a copy on Amazon, and could think "wait, no, not like that" regardless.
I think the reasonable stance is to (kindly, humanely) ask that one not do such-and-such a thing even if it's legal.
"First rule of free software is any goal, including commercial ones."
Yeah, and that's what we disagree with. :P If Creative Commons has NC licences for art, why shouldn't software have them? Apart from programmers being too meek to assert their rights.
And yes, the OSI is mostly hyperscalers; that's fairly commonly known. They aren't interested in protecting the commons in the slightest.
Even the JSON licence ("don't be evil") and stuff like that are better than MIT and friends - they give bigcorp lawyers enough of a headache to make them think twice before stealing your stuff.
> before stealing your stuff.
You were at least consistent up until this point, but this phrasing runs directly counter to the rest of your thesis. It's not stealing to use software that you put out as FOSS to be used by anyone for any purpose. If you didn't intend to do that, it's on you to have picked a license that reflected your intentions and to accept the consequences of that choice (most likely less interest in using it from everyone, not just megacorps).
There's no moral imperative to respect terms of use that existed only in the head of the developer—on the contrary, it's an immoral bait-and-switch to release your code as FOSS and then throw a fit when someone uses it to make money.
It's a derivative work and stripping the license violates it. Why do people repeat this stupid corporate propaganda?
I'm really confused—are you talking about AI training? I'm talking about corporations building systems on top of FOSS.
Also, why did you feel the need to create a throwaway for this comment?
MIT explicitly gives the right to sublicense. That means you give certain rights to Alice, who gives fewer rights to Bob. Bob isn't allowed to give away copies of the software Alice gives him, because Alice, being a smart businesswoman, made sure Bob's license agreement is a proprietary one. She's not violating MIT, because she put your name and a copy of the MIT license in notices.txt, but that license doesn't actually govern the software Bob received.
I think when you take something from the public domain and make it proprietary, that is close enough to stealing that it's appropriate to use the word colloquially.
Only if what you took from the public domain is actually taken—as in, no longer available to the public. The scarcity model that we use for the literal commons (the shared fields in a village) doesn't apply when we're talking about bits.
I think TFA is onto something by pointing out that there are very real ways in which some of the recent use of the commons falls into the realm of abuse that harms the commons. But most of the kinds of usage that have people giving up on FOSS don't actually fall into that category.
You are entirely right about my word choice, good point! (it's a bit ironic in retrospect)
However, let's not pretend that choosing a licence is a fully informed decision free of any kind of pressure. If you pick a non-OSI licence, that has social costs (as you said, less interest from other developers, for example).
The problem is two-sided: both the companies exploiting the FOSS landscape, and the participants in the FOSS landscape concerning themselves more with upholding the status quo than with trying to do something about the problem.
P.S.: The OSI are not even sellouts - they mostly consist of exactly those corporations themselves. The FSF is much better, but the FSF's philosophy was mostly informed by RMS not being able to fix his printer in the '80s... times change, y'know? A philosophy/worldview that worked for the '80s and '90s might not be appropriate for the 2020s.
> If you pick a non-OSI licence, that has social costs (as you said, less interest from other developers for example)
Yes. People make a decision to choose from the FOSS licenses because being seen as FOSS is valuable to them. That comes with tradeoffs, and it's unethical to expect to receive the benefits but not the drawbacks of your chosen license.
> P.S.: The OSI are not even sellouts - they mostly consist of exactly those corporations themselves.
I know. I find the bellyaching about the "spirit of Open Source" to be pretty ironic given that Open Source exists to "dump the moralizing and confrontational attitude that had been associated with "free software" in the past and sell the idea strictly on the same pragmatic, business-case grounds that had motivated Netscape" [0].
> those companies exploiting the FOSS landscape
Exploiting how? What harm does $MEGACORP using a FOSS project do to the FOSS project? Does it hurt that it gives them more credibility? For there to be exploitation there has to be a quantifiable harm to the exploited victim—it has to be win-lose, not win-neutral or win-win. So where's the loss to the FOSS maintainers as victims?
[0] http://web.archive.org/web/20071115150105/https://opensource...
If it's GPL, or AGPL, someone can sell a copy on Amazon, but it's pretty pointless because the person who buys it is allowed to give away more copies for free.
If it's MIT, the person who buys it ISN'T allowed to give away more copies for free if the seller doesn't want them to. MIT means SOMEONE ELSE can (for all practical purposes) copyright YOUR software, and all you get is your name in a notices.txt file.
Maybe that's a rule you have but don't speak for the rest of us.
They should have put Free Software in capitals to be more clear, but they're referring to the Four Essential Freedoms as defined by the Free Software Foundation [0]:
> The freedom to run the program as you wish, for any purpose (freedom 0). ... “Free software” does not mean “noncommercial.” On the contrary, a free program must be available for commercial use, commercial development, and commercial distribution. This policy is of fundamental importance—without this, free software could not achieve its aims.
You're welcome to come up with a different set of coherent rules for an ethical model of software development, but to avoid confusion it would be best to use a different label than "free software" so we don't overload the term with conflicting definitions.
[0] https://www.gnu.org/philosophy/free-sw.en.html#four-freedoms
What I worry about is this: once the robots become the main source of information, how will the corps restrict what information is fed into them to support whatever bias they have?
For example, just yesterday someone posted "noam chomsky is a genocide denier", so I went internet sleuthing to see what they were talking about. I first asked Google and then ended up on the "Bosnian genocide denial" Wikipedia page. I read the argument, checked the sources, and concluded that, maybe, someone could make that claim.
Today, in response to TFA, I asked DeepSeek and received a well-rounded and, IMHO, unbiased response to the same question, which summarized the arguments from both sides. The only problem is that it cites no sources, so you just have to trust the response, or do as I did yesterday and go to Google.
Personally, if someone makes an extraordinary claim I'm going to go digging to find out what they're talking about, if their argument is based on fact and if you can draw their conclusion from the facts. Take that ability away and we're just a bunch of sheep for the Silicon Valley Billionaires Club to fleece.