and we should probably look at alcoholic liver disease as an expression of capitalism.
data is bytes. period. your suggestion rests on someone else recognizing your use case and building the abstractions you want for you. but there is an infinity of possible abstractions – while the virtual memory model is a single solid ground anyone can rest upon. you’re modeling your problems on a machine – have some respect for it.
in other words – most abstractions are a front-end to operations on bytes. it’s ok to have various designs, but making lower layers inaccessible is just sad.
i say it’s the opposite – it’s 2025, we should stop stroking the imaginaries of the 80s and return to the actual. just invest in making it as ergonomic and nimble as possible.
i find it hard to understand why some programmers are so intent on hiding from the space they inhabit.
these comparisons of llms with human artists copying are just ridiculous. it’s like saying “well humans are allowed to break twigs and damage the planet in various ways, so why not allow building a fucking DEATH STAR”.
abstracting llms from their operators and owners and possible (and probable) ends and the territories they trample upon is nothing short of eye-popping to me. how utterly negligent and disrespectful of fellow people must one be at heart to give any credence to such arguments.
The problem isn't that people aren't aware that the scale and magnitude differences are large and significant.
It's that the space of intellectual property LAW does not handle the robust capabilities of LLMs. Legislators NEED to pass laws to reflect the new realities or else all prior case law relies on human analogies which fail in the obvious ways you alluded to.
If there was no law governing the use of death stars and mass murder, and the only legal analogy is to environmental damage, then the only crime the legal system can ascribe is mass environmental damage.
Why do you think the obvious analogy is LLM=Human, and not LLM=JPEG or LLM=database?
I think you're overstating the legal uniqueness of LLMs. They're covered just fine by the existing legal precedents around copyrighted and derived works, just as building a death star would be covered by existing rules around outer space use and WMDs. Pretending they should be treated differently is IMO the entire lie told by the "AI" companies about copyright.
The Google news snippets case is, in my non-lawyer opinion, the most obvious touch point. And in that case, it was decided that providing large numbers of snippets in search results was non-infringing, despite being a case of copying text from other people at scale... And the reasons this was decided are worth reading and internalizing.
There is not an obvious right answer here. Copyright rules are, in fact, Calvinball, and we're deep in uncharted territory.
Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material. And the output directly competes against the copyrighted source materials.
The fact they're smudgy and non-deterministic doesn't change how they relate to the rights of authors and artists.
Nothing in copyright law talks about 'semantic meaning' or 'character of the source material'. Really, quite the opposite - the 'idea-expression dichotomy' says that you're copyrighting the expression of an idea, not the idea itself.
https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...
(Leaving aside whether the weights of an LLM do actually encode the content of any random snippet of training text. Some stuff does get memorized, but how much and how exactly? That's not the point of the LLM, unlike the jpeg or database.)
And, again, look at the search snippets case - these were words produced by other people, directly transcribed, so open-and-shut from a certain point of view. But the decision went the other way.
>Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material.
That sounds like you're arguing that they should be legal. Copyright law protects specific expressions, not handwavy "smudgy and non-deterministic" things.
I'll remind you that all fanart is technically in a gray area of copyright infringement. Legally speaking, companies can take down and charge infringement for anything using their IP that's not under fair use. Collages don't really pass that benchmark.
Yoinking their IP and mass producing slop sure is a line to cross, though.
I'm not an expert, but I thought fan art that people try to monetize in some form is explicitly illegal unless it's protected by parody, and any non-commercial "violations" of copyright are totally legal. Disney can't stop me from drawing Mickey in the privacy of my own house, just from monetizing/getting famous off of them.
The difference is we're humans, so we get special privileges. We made the laws.
If we're going to be giving some rights to LLMs for convenient for-profit ventures, I expect some in-depth analysis on whether that is or is not slavery. You can't just anthropomorphize a computer program when it makes you money but then conveniently ignore the hundreds of years of development of human rights. If that seems silly, then I think LLMs are probably not like humans and the comparisons to human learning aren't justified.
If it's like a human, that makes things very complicated.
Scales of effect always come into play when enacting law. If you spend a day digging a hole on the beach, you're probably not going to incur much wrath. If you bring a crane to the beach, you'll be stopped, because we know the hole that can be made will disrupt the natural order. A human can do the same thing eventually, but does it so slowly that it's not an issue to enforce 99.9% of the time.
That's just the usual hand-wavy, vague "it's different" argument. If you want to justify treating the cases differently based on a fundamental difference, you need to be more specific. For example, they usually define an amount of rainwater you can collect that's short of disrupting major water flows.
So what is the equivalent of "digging too much" on a beach for AI? What fundamentally changes when you learn hyper-fast vs just reading a bunch of horror novels to inform better horror novel-writing? What's unfair about AI compared to learning from published novels about how to properly pace your story?
These are the things you need to figure out before making a post equating AI learning with copyright infringement. "It's different" doesn't cut it.
If they were a database, they would be unquestionably legal, because they're only storing a tiny fraction of one percent of the data from any document, and even that data is not any particular replica of any part of the document, but highly summarized and transformed.
> these comparisons of llms with human artists copying are just ridiculous.
I've come to think of this as the "Performatively failing to recognize the difference between an organism and a machine" rhetorical device that people employ here and elsewhere.
The person making the argument is capable of distinguishing the two things, they just performatively choose not to do so.
>The person making the argument is capable of distinguishing the two things, they just performatively choose not to do so.
I think that sort of assumption of insincerity is worse than what you're accusing them of. You might not like their argument, but it's not inherently incorrect for them to argue that because humans have the right to do something, humans have the right to use tools to do that something and humans have the right to group together and use those tools to do something at a large scale.
My issue is that your rhetoric of "performatively conflating an organism and a machine" doesn't address the core issue of "humans can learn from art, why can't machines". You're essentially saying that you don't like the question, so you're refusing to answer it. There is nothing inherently wrong with training machines on existing data; if you want us to believe there is, you need some argument for why that would be the case.
Is your argument simply about your interpretation of copyright law and your mentality being that laws are good and breaking them is bad? Because that doesn't seem to be a very informed position to take.
My stated opinion is that anyone who comes to an AI conversation and says "I can't tell the difference between organisms and computers" or some variation thereof does in fact have no trouble in practice distinguishing between their child/mom/dad/BFF and ChatGPT, and is in fact questioning from a position of bad faith.
"There is nothing inherently wrong with training machines on existing data..." doesn't really conflate a machine with an organism and isn't what I'm talking about.
If you instead had written "I can read the Cat in the Hat to teach my kid to read, so why can't I use it to train an LLM?"
Then I do think you would be asking with a certain degree of bad faith; you are perfectly capable of distinguishing those two things, in practice, in your everyday life. You do not in fact see them as equivalent.
Your rhetorical choice to be unable to tell the difference would be performative.
You seem to think I'm arguing copyright policy. I really am discussing rhetoric.
It's a very consistently Silicon Valley mindset. Seems like almost every company that makes it big in tech, be it Facebook and Google monetizing our personal data, or Uber and Amazon trampling workers' rights, makes money by reducing people to objects that can be bought and sold, more than almost any other industry. No matter the company, all claimed prosocial intentions are just window dressing to convince us to be on board with our own commodification.
That's also why I'm really not worried about the "AI singularity" folks. The hype is IMO blatantly unsubstantiated by the actual capabilities, but gets pushed anyway only because it speaks to this deep-seated faith held across the industry. "AI" is the culmination of an innate belief that people should be replaceable, fungible, perfectly obedient objects, and such a psychosis blinds decision-makers to its actual limits. Only trouble is whether they have the political power to try to force it anyway.
> That's also why I'm really not worried about the "AI singularity" folks. The hype is IMO blatantly unsubstantiated by the actual capabilities, but gets pushed anyway only because it speaks to this deep-seated faith held across the industry. "AI" is the culmination of an innate belief that people should be replaceable, fungible, perfectly obedient objects, and such a psychosis blinds decision-makers to its actual limits. Only trouble is whether they have the political power to try to force it anyway.
I'm worried because decision-makers genuinely don't seem to be bothered very much by actual capabilities, and are perfectly happy to trade massive reductions in quality for cost savings. In other words, I don't think the limits of LLMs will actually constrain the decision-makers.
It will when it inevitably hits their wallets, be it via public rejection of a lower-quality product or court orders. But both sentiments move slowly, so we're in it for a while.
Even with NFTs, it still took a full year+ of everyone trying to shill them before the sentiment turned. Machine learning, meanwhile, is actually useful but is being shoved into every hole.
there are 16 contiguously stored pointers to 16 non-contiguously stored tstructs (they may be contiguously stored, but you can’t make this assumption from this type). there are 16 contiguously stored tstructcontainingnodes.
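a minimal zig sketch of the distinction, with hypothetical TStruct / TStructContainingNode definitions since the originals aren’t shown:

    const TStruct = struct { x: u32 };
    const TStructContainingNode = struct { node: usize, t: TStruct };

    // [16]*TStruct: the 16 pointers sit back-to-back in the array,
    // but each pointed-to TStruct may live anywhere in memory.
    var ptrs: [16]*TStruct = undefined;

    // [16]TStructContainingNode: the 16 structs themselves are
    // stored back-to-back in one contiguous block.
    var nodes: [16]TStructContainingNode = undefined;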
a little good faith and a little less grump would go a long way. not the first, nor second time you’re being harsh on your peers for no apparent reason, reading shit into what they say in a most inhospitable way, presenting yourself as some clairvoyant into intentions of others. it’s just not healthy, man. no one’s attacking you or your precious visions here.
it’s not at all “clear” from the comment that the author thinks what you say he does. what was said in the original comment aligns well with zig’s “only one (obvious) way to do things” and its explicitness. other languages offer much broader vocabularies and higher-level overlapping toolsets, while zig is much more constrained and requires the user to do the work herself, hence “Zig fans like to wrestle with the features of Zig to figure out how to fit their solutions within the constraints of the language”, which is an objective fact.