Artificial Intelligence and Copyright: Request for comments (federalregister.gov)



I believe we first need to answer the question of whether the copyright of the AI model’s source text or images affects the output.

My opinion — and note I’m a software engineer, not a lawyer — is that an AI, being a statistical model and not generally intelligent, should not be allowed to disregard the copyright of its source material. This would, I think, require the AI’s creator to secure a license for all of its sources that allows this sort of transformation and presentation. And further, a user of the AI would themselves require a license to use the output.

The alternative seems to be “anything goes”.


I don’t think it makes sense for both model builders and the model’s users to separately obtain licenses for the same works used in the training set.

A model trained on several copyrighted data sources cannot somehow be used in a way that depends on only a subset of those sources.

So all parameters of usage and compensation should be settled by contract between the model builder and copyrighted data supplier, before the copyrighted material is used.

Or to put it simply: using copyrighted material to create a model would NOT be considered fair use.

That’s it. That’s the standard. No complicated new laws required.

Model builders obtain permission to use copyrighted material from copyright holders based on any terms both agree to.

Terms might involve model usage limits, term limits, one time compensation, per use compensation, data source credits, or anything else either party wants.

The likely result will be some standard sets of terms becoming popular and well known. But nobody has to agree to anything they don’t want to.


I slightly disagree, in that I think the person using the tool should bear the burden of copyright. I.e. if the model outputs something under copyright, it merely can't be republished. In the same way, I can use Photoshop on proprietary data but I can't necessarily sell the results.


I'm so torn. On one hand, what you suggest seems to be a nearly ideal balance between advancing scientific progress and legal liability. By placing the legal burden to publish generated works on the person actually trying to publish, it allows for a more nuanced legal approach (i.e. the difference between "there are similarities to this work, but it's murky" and "you 100% stole that work").

On the other hand, is the company running the model themselves not already publishing all of that work and profiting from it? It seems unfair that their bottom line gets to be bolstered because they can produce work based on any artist, whereas the consumers of that work may end up walking on eggshells in order to publish it.

Like I said, I'm torn as far as how it "should be". I know how I want it to be though. I would love if AI continued training unabated. The results have been amazing, and I believe it would be a shame if the effort was slowed down by legislation.


> is the company running the model themselves not already publishing all of that work and profiting from it?

No, because the model is transformative enough that it cannot be said to be a derivative work of the training set.

The model is in essence a form of distilled information, extracted from the training set. Information cannot be copyrighted - only expressions can.

Therefore, a model producer should have the right to use any pre-existing work, in the same way a person can, to study and internally memorize and extract information.

The reproduction of any of the training set data constitutes a copyright violation, but that reproduction is done not by the owner of the model but by an end user of the model.


My point is that if a court finds that a generated image is indeed similar enough to constitute an infringement when a subscriber of for instance MidJourney attempts to publish it, has that work not already been "published" to the subscriber? And has MidJourney not profited by gaining a subscriber based on the work of others?


I wonder if that analogy represents the same thing. Speaking purely from a non-legal perspective on the ethics in my mind:

When you use Photoshop on proprietary data you're providing the original data, choosing what manipulation to make (i.e. what tool), and directly creating the output. It makes sense that if you redistribute this it may be a copyright violation.

When you use Copilot or ChatGPT for programming you're typically asking a non-proprietary question or accepting suggestions it's making based on non-proprietary (or proprietary to you) code in the file. You also don't dictate the manipulation process a black box deep learning model does (i.e. I haven't asked it to do something that could be reasonably thought to be a copyright violation).

Am I then responsible for the fact that Copilot is fooling me with effectively copy-pasted copyrighted code when it's being presented to me as generated by the software and I haven't instructed the software to commit a copyright violation? I'm not sure if intent matters for copyright, I assume it doesn't but perhaps that's a missing piece to this.

Diffusion models are gray to me. If you're asking/prompting with "Mickey Mouse riding a horse" I can see the argument that the prompt itself can be interpreted as asking the model to commit copyright violation and the user is just hiding behind a layer of abstraction. If I ask the model to spit out "a picture of a smiling cartoon woman" and it generates a Betty Boop lookalike, is that still the user's fault?

It seems to me like passing the burden to the user could be reasonable but would need some safe harbor type of exception. It'll be really interesting to see what the courts decide.


I see 2 problems with that.

(1) How do you know if the image that was just generated is substantially similar to an existing copyrighted work? Maybe if some registration tool existed, but otherwise the burden is too great.

(2) What is stopping someone from generating millions of images and copyrighting all the "unique" ones, such that no one can create anything without accidental collisions?


> how do you know if the image that was just generated is substantially similar to an existing copyrighted work?

This is already a problem with biological neural nets (i.e. humans). I remember as a teenager writing a simple song on the piano, and playing it for my mom; she said, "You didn't write that -- that's Gilligan's Island!" And indeed it was. If I had made a record and sold it, whoever owned the rights to the Gilligan's Island theme song could have sued me for it, and they would (rightly) have won.

There's already loads of case law about this; the same thing would apply to AI.

> what is stopping someone from generating millions of images and copyrighting all the "unique" ones, such that no one can create anything without accidental collisions?

Right now what's stopping it is that only humans can make copyrightable material; whatever is spat out from a computer is effectively public domain, not copyrighted.


1. lots of established law and case law (at least in the US), this is already a well-settled problem and folks have the tools and proper venue to bring infringement claims. Yes, federal copyright infringement litigation is prohibitively expensive for many issues. There is now a "small claims court" for smaller issues. [1]

2. Those works cannot be copyrighted (at least in the US). [2]. And hey, someone already tried copyrighting every song melody [3]

[1]: https://copyright.gov/about/small-claims/

[2]: https://www.federalregister.gov/documents/2023/03/16/2023-05...

[3]: https://www.youtube.com/watch?v=sJtm0MoOgiU


But that problem is already solved.

Copyright holders are already protected from (i.e. can legally prohibit) distribution of obvious copies or clearly derivative works.

Regardless of whether they were produced by hand, copy machine, Photoshop, or a model.

The new problem is that artists styles are being “stolen” by incorporating their copyrighted work into models without their permission.

And that problem can easily be solved if using copyrighted material to create models is declared NOT fair use.

Artists could still allow models to be built from their work, but on their terms. If they wish to do that.

A famous artist who doesn't mind being commercial could sell their own unique model to let fans create art in that artist's style, while not having their style "ripped" by others.

Or just keep their style to themselves, for their own work, as artists have done for centuries.

(Of course, with greater effort, their style could still be recreated - styles are not protected unless they are trademarked - but the recreation would have to be done without using the artist’s copyrighted works.)


This is probably a somewhat unpopular opinion on HN, but it is where many of the artists I work with are generally trying to get to. Consent, compensation, and credit.


> Consent, compensation, and credit.

I just want to quote you. Nothing I need to say. That’s it.


This is the best path forward I think. And it will become increasingly sensible as things continue to evolve. AI wasn't necessary to violate copyright before, and it isn't necessary today.

The determination of copyright violation should be made against the output of the model in the event that someone uses it for commercial purposes.

If the models have a risk of generating copyrighted content, it will be up to the consumers of the system to mitigate that risk through manual review or automated checks of the output.


A divergence, but I see a lot of posters asserting that "humans learn by copying other people, but we don't call that a violation of copyright when they draw"

People casually asserting that software is equivalent to humanity will be a non-negligible thing to consider, as irritating and poorly-founded as it seems.

If the reproduction isn't pixel-perfect, but merely obvious and overwhelming, how do you refute that philosophically to people who refuse a distinction between 50GB and a human life?


> People casually asserting that software is equivalent to humanity will be a non-negligible thing to consider, as irritating and poorly-founded as it seems.

> If the reproduction isn't pixel-perfect, but merely obvious and overwhelming, how do you refute that philosophically to people who refuse a distinction between 50GB and a human life?

Software equivalence to humanity is a very philosophical question that many sci-fi writers have approached. But our primary issue related to this technology does not depend on anyone making a determination there.

The challenge is that losses to livelihood from this technology are going to come from far broader impacts than copyright alone. Copyright disputes are just the first things to get everyone's attention.

Let's say we err on the side of protection of copyright, and all training data must be fully licensed, in addition to users being responsible for ensuring outputs did not accidentally reproduce something similar to a copyrighted work, even if it was part of the licensed training dataset. Great! This fixes the problem of lost value for the owners of copyrights. Companies will face a slight delay and slightly increased costs as they license content; however, in the end, model capabilities will be the same and continue to increase.

The number of jobs that actually cannot be performed without humans will continue to dwindle — livelihoods will be lost at essentially the same scale despite upholding copyrights.

The only way we can handle a technology capable of reducing most need for human labor is by focusing on planning and executing a smooth transition toward an economy with more people than jobs — aiming for minimal human suffering during this process.

A mass loss of human jobs does not need to mean a mass loss of livelihood if our society is prepared to transition to a universal basic income. After all, human life is far more than just a job. We have the opportunity for much more fulfilling lives if we plan this transition well. We must understand that this is a far larger issue than copyright - copyright disputes are just one of the first symptoms of this disruptive process.


A human is still entering the prompt to generate the possibly copyrighted image/text. I don't think copyright law should care about the implementation. It's ok to copy a style if you use paintbrushes or Photoshop. But not ok if you use a statistical model?


Apply for a copyright on your human-authored prompt then. That's the extent of human authorship.


> Or to put it simply: using copyrighted material to create a model would NOT be considered fair use.

The more I think about it, the more something along these lines seems like it might be the right way to think about it.

When you play a DVD, for example, you copy the bits off the DVD, into the memory of your DVD player, and onto your screen; this is all explicitly considered "fair use" copying. But if you then copied those fair-use bits off the screen onto a thousand other screens, that violates copyright.

When you, as the human, watch the DVD, bits of it get copied into your brain; but you don't then copy the bits of your brain to millions of other people -- they each have to make their own copy.

We could make the law for LLMs follow a similar logic: That having an LLM watch a video or read a text is similar to having a DVD player read a DVD or a web browser copy information from a website. It's good for that limited use case, but the resulting copy cannot be copied again without a license.

This would allow (say) researchers, or even individuals, to do their own training and so on without a license; but when anyone wanted to create something that they wanted to scale up, they'd have to get licenses for everything.

That would fundamentally keep things balanced as they are now between creators and other creators. The big problem isn't that a handful of other creators may be copying their style; that growth in competition is self-limiting because of the expense of duplication. It's that millions of electronic engines can copy their style.


> When you, as the human, watch the DVD, bits of it get copied into your brain; but you don't then copy the bits of your brain to millions of other people -- they each have to make their own copy.

If you ripped The Little Mermaid, redrew every frame to combine it with The Fresh Prince of Bel-Air, and moved things around in scenes to make it look like Ariel is Will Smith responding to sitcom dialogue, then it'd be fair use, regardless of how many people you show this new version to.

Fair use isn't about how or why you're doing something with a work. The factors for fair use are very clearly laid out at https://www.law.cornell.edu/uscode/text/17/107


> I don’t think it makes sense for both model builders and the model’s users to separately obtain licenses for the same works used in the training set.

I'm torn on who should pay, and where and when. In the world of patents, there's often an option/split. Say a chip manufacturer wants to build H265 decoding into their hardware. The chip manufacturer could buy the license. Or the purchaser (who probably is building some sort of board or device around the chip) could pay for the license. Or they could disable that functionality in the end product, and the consumer could pay for a license (or not, if they don't care about that feature).

The most common is usually the middle option: the end-device manufacturer (or brand that eventually sells the product) will pay for the license.

But I'm not sure if this works all that well for an AI model. With hardware, the license is usually paid per unit. It's easy to see that one chip = one license. If the model builder buys a license, that model could be used one time or 100 million times. Tracking use like that probably isn't all that practical, but I think it's safe to say that a 100-million-use model should probably pay more for a license than a single-use model.

So maybe the model builder should be responsible for attaching a comprehensive "copyright history" to the model, and users should have to pay for a license based on their use? Again, not sure how to track that. But I guess general software licensing has similar problems when you can "hide" usage.
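
To make the "copyright history" idea concrete, here's a toy sketch of what a manifest plus per-use metering might look like. To be clear, everything below (the class names, the terms, the fee structure) is hypothetical; it's just one way such bookkeeping could be wired up:

  # Hypothetical sketch of a model "copyright history" manifest with
  # per-use metering. No real licensing scheme works this way today.
  from dataclasses import dataclass, field

  @dataclass
  class SourceLicense:
      source: str          # e.g. "stock photo archive" (made up)
      max_uses: int        # 0 = unlimited
      fee_per_use: float   # royalty owed to the rights holder per call

  @dataclass
  class ModelManifest:
      model_name: str
      licenses: list[SourceLicense] = field(default_factory=list)
      uses: int = 0

      def record_use(self) -> float:
          """Meter one inference call; return total royalties owed."""
          self.uses += 1
          owed = 0.0
          for lic in self.licenses:
              if lic.max_uses and self.uses > lic.max_uses:
                  raise RuntimeError(f"usage cap exceeded for {lic.source}")
              owed += lic.fee_per_use
          return owed

  manifest = ModelManifest("toy-model", [
      SourceLicense("stock photo archive", max_uses=1_000_000, fee_per_use=0.0001),
      SourceLicense("news text corpus", max_uses=0, fee_per_use=0.00005),
  ])
  print(manifest.record_use())  # ~0.00015 owed for this call

The hard part, as you say, is that nothing forces an honest call to record_use(); "hiding" usage is exactly the problem general software licensing already has.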


Yes, someone using a model can’t know if the generated text/image/sound is a nearly identical copy of the original material they don’t recognize. If use of the output of these systems comes at significant legal risk then such systems become nearly useless.


> if the generated text/image/sound is a nearly identical copy of the original material they don’t recognize

how does the industry today deal with artists who "copy" from other works? This isn't a problem with AI at all - just that AI provides a tool to generate such works faster.


Someone comes to me to ask for a drawing of Batman or to write an erotic story around Supergirl. I can do it, but I cannot claim ownership over the characters. And I think I will quickly get a letter from DC or Marvel if I try to do this at scale.


> I can do it, but I cannot claim ownership over the characters.

of course not. But you can claim ownership if you don't call those characters by their original names, and make sufficient changes to the design (how sufficient is determined by a court of law - thus expenses).

> DC or Marvel if I try to do this at scale.

The show 'Invincible' [1] has a character who is basically a copy of Superman. And yet, you will find that they don't get a letter from DC.

[1] https://en.wikipedia.org/wiki/Invincible_(TV_series)


> make sufficient changes to the design

I think that’s one of the issues. The transformations done by these tools are mechanical, even if they may be extensive. The human input is too small. Omni-Man may have similarities with Superman, but he is not him in the larger context of the story. LLMs cannot yet be that consistent for marketable output that deserves to be copyrightable.

I’m perfectly fine with LLMs aiding with spell checking and alternative phrasing (images are a grayer area). But the idea of prompts and prompt output being copyrightable is something I oppose.


> The human input is too small.

That's a huge assumption, especially for image generation models.


Why shouldn't a prompt output be copyrightable?


Because prompts lack sufficient creative control.

Typing a search string into Google doesn’t give you copyright over its output.


> lack sufficient creative control.

the prompts have become somewhat creative these days. If you have a look at the prompts on https://civitai.com for example, you can argue they are a form of creative expression. Just like hand-rolling assembly code might be.

Edit: an example one - https://civitai.com/images/2268828?collectionId=107&period=A...

and the associated prompt:

  High detail, dynamic action pose, masterwork, professional, fantasy, neo classical fine art, of a beautiful, primordial and fierce, ((angel-winged-woman,:1.9)), archangel, (MiddleEastern:1.6), with very long, flowing, wavy white hair, peach colored streaks, with a sexy, slender, fit body, wearing an ethereal, light violet, light aqua, faded gold, tie-dye, linen and Chantily lace, (knee length:1.5), strapless dress with a tattered hem, a Platinum and gold Cuirass, platinum vambraces, platinum and lace Gladiator Boots,  long broadsword in a Baldric, at night, in a metropolis warzone, during a thunderstorm, dimly lit, thin, vibrant streaks of crimson light, outlining her body, fantasy illustration,  in the style of Osamu Tezuka, George Edward Hurrell, Albert Witzel, Hiromitsu Takeda, Clarence Bull, Gil Elvgren, Ruth Harriet Louise, Takaki, Milton Greene, Huang Guangjian, and Cecil Beaton,, High detail, dynamic action pose, masterwork, professional, fantasy, neo classical fine art, of a beautiful, primordial and fierce, ((angel-winged-woman,:1.9)), archangel, (Columbian:1.6), with very long, flowing, wavy white hair, peach colored streaks, with a sexy, slender, fit body, wearing an ethereal, light violet, light aqua, faded gold, tie-dye, linen and Chantily lace, (knee length:1.5), strapless dress with a tattered hem, a Platinum and gold Cuirass, platinum vambraces, platinum and lace Gladiator Boots,  long broadsword in a Baldric, at night, in a metropolis warzone, during a thunderstorm, dimly lit, thin, vibrant streaks of crimson light, outlining her body, fantasy illustration,  in the style of Osamu Tezuka, George Edward Hurrell, Albert Witzel, Hiromitsu Takeda, Clarence Bull, Gil Elvgren, Ruth Harriet Louise, Takaki, Milton Greene, Huang Guangjian, and Cecil Beaton,


That’s a perfect example: they said “during a thunderstorm”, but does that image look like it’s in a thunderstorm? Sure, the output of the prompt relates to what was said, but they influenced the output rather than controlled it.

Further, it’s well known that simply telling an artist what you want even including quite detailed descriptions isn’t enough to get copyright over the resulting image.


The difference is the artist’s assertion that it’s either original or a copy of something else. DALLE 2 can’t tell you if it’s original or not. These AIs have no idea, and the company or group that created them doesn’t review individual output, so they can’t say either.


> DALLE 2 can’t tell you if it’s original or not

whoever pressed the button to run DALLE will make the assertion, just like whoever was running Photoshop to make the image today would make the same assertion.


Based on what?

A photoshop user controls what data photoshop uses, a DALLE user doesn’t. Even a prompt as generic as “Cat” could be producing an obviously derivative work if you compare it to the original. This is true for all prompts.


> A photoshop user controls what data photoshop uses

the point was that the user of the program is making their declaration, whether it's photoshop or DALLE. How does the business verify that their staff artists aren't producing copyright infringing material, just from memory?

The liability falls to them to verify the copyright status of the output they're asked to make. A business paying a photoshop user to produce a picture has just as much (or as little) trust in them as the button presser for DALLE.


This gets complicated: having no reason to know that something is copyrighted is a defense.

So if your employee installed pirated 3rd party software you’re facing strict liability. However, if a third party is reproducing their college roommate’s drawing from memory then it’s effectively impossible for you to verify whether something is a derivative work.

DALLE is effectively Getty Images: if you’re buying works from them, you can only assume the works are free of copyright issues.


The generated content is a derivative work of each piece of the material the model was trained on. That material can be listed.


So your suggestion is to list hundreds of millions of works and have users manually review them? I don’t think that’s going to work.


Problem is, how can you determine if the model contains copyrighted material? The law governs copyright through ownership, so in order to claim copyright infringement you have to be able to pinpoint a specific person and prove that their work is somehow embedded in the gradients, which is not practically possible at this point. It's just like how you can't practically enforce copyright on encrypted data unless you ban encryption altogether.


1. If you know your copyrighted material was in the training dataset is that not sufficient?

2. From a legal perspective do you actually have to prove it's embedded in the gradients? If I draw an exact copy of Mickey Mouse from memory and sell it I didn't think Disney had to prove I've ever actually seen Mickey Mouse before or point to where the image of him is embedded in my brain.


Disney has a trademark on Mickey Mouse, but that does not mean that they automatically get copyright on all pictures of Mickey Mouse drawn by others (they don't)


Bad example on my part in that case. I thought some art is copyrighted, or am I mistaken? If so, replace Mickey Mouse with something copyrighted.


My opinion as a SWE who is dating a lawyer (joke, not a serious qualification but it does provide some insight):

Generative models traverse and interpolate high dimensional state spaces. These state spaces are created from input data.

I would argue people do the exact same thing - the first main difference is we can use novel inputs (e.g. we can use images or words to develop our music/temporal state spaces and vice versa). People also are recursive and self referential in a way that doesn't collapse.

Until we solve the interpretability problem (e.g. can you decode the feature space of a neural network into something we can comprehend) there is no good solution. Either traditional copyright wins and we get even more draconian policies (think Disney and their desire to never put anything in the public domain), or we have a free for all (which I don't think is bad for creative works, but certainly for more practical things like stock photos or nonfiction).
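
For the curious, here's a toy sketch of what "traverse and interpolate high dimensional state spaces" means mechanically. The "embeddings" below are random stand-ins, not the output of any real model:

  # Toy sketch: linear interpolation between two points in an
  # embedding space, which is roughly what blending two concepts
  # looks like inside a generative model's latent space.
  import numpy as np

  rng = np.random.default_rng(0)
  dim = 512
  cat = rng.normal(size=dim)   # stand-in embedding for "cat"
  dog = rng.normal(size=dim)   # stand-in embedding for "dog"

  # Walk the straight line between the two concepts.
  for t in np.linspace(0.0, 1.0, 5):
      blend = (1 - t) * cat + t * dog
      # A real model would decode `blend` into an image or text here.
      print(f"t={t:.2f}  norm={np.linalg.norm(blend):.2f}")

The interpretability problem is exactly that nobody can say, for a given point in that space, which training inputs shaped the region around it.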


I can appreciate how this line of thinking might be attractive.

But IMO the human<>machine comparison doesn't deserve much credence. We shouldn't assume that just because a human is allowed to do something, a machine is automatically allowed to do the same thing, too. I think some care should be taken when considering if we allow machines to have the same privileges as humans.


> We shouldn't assume that just because a human is allowed to do something, a machine is automatically allowed to do the same thing, too

There are no sentient machines (at least yet). Your position is one where you are actually limiting what other humans can do, limiting which tools other humans can have access to. Also, the parameter – according to the law – was always "the same". For instance, there is nothing preventing you from making your own chess league where computers are allowed to compete. FIDE is free to ban you from competing in their leagues, or to ban anyone associated with your league or whatever, but there is nothing in the law preventing you.

I have been saying this from day one: this whole debate is mainly white-collar workers negatively impacted by automation making up any excuse they can for why their jobs should be protected, somehow, for some reason, but not those of coal miners or what have you.

A human downloads a photo to learn how to draw. Another human downloads a photo to teach their computer how to draw. No difference, no need to obtain any license in any of the cases.


> We shouldn't assume that just because a human is allowed to do something, a machine is automatically allowed to do the same thing, too.

Generally speaking, even if one machine can do something, it doesn't automatically mean another machine is allowed to do that.

For example you can drive a car with a normal driving license, but not a truck. In some states you can own a pistol but not an automatic rifle.


It also depends on where this is happening. For instance, you don't need a license to drive a car inside your own private property. You need a license to drive it on public streets because society needs some assurance that you know what you are doing. So in many cases the laws and restrictions also apply in relation to a given scenario.


copyright exists among other things to "promote the progress of science and useful arts".


That section is written in parallel, with copyright <> "science" and patent <> "useful arts". This sounds weird now, but it's consistent with how the words were used at the time, which is roughly the reverse of how they are used today, where paintings etc. are considered art and inventions are considered science. So it's not that copyright exists to promote science and the arts (as we call them today); copyright is only for the arts, and patents are for science. Authorship maps to copyright, and invention maps to patent:

> Congress shall have the power... To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.


A machine is just a tool. It is the creator and the user of the machine who hold the privileges the tool is used with. I think we should be careful not to anthropomorphize, or to attribute agency, responsibility, and autonomy to, something that is essentially a better Photoshop plugin.


I don’t think the parent is anthropomorphizing anything. The ones who anthropomorphize are the ones saying that machines should be covered by fair use because they have similarities with humans.

This is not about the rights of a machine but about how one human product is consumed by another human product. This is just a commercial supply chain: if you make a model, you need human data. You generally need to compensate your suppliers of “raw material”.


It's not the tool that is covered by fair use. It is the creation of the tool that is covered by fair use.

Is the tool itself supposed to be a copyright violation or is it a tool facilitating copyright violation by producing violating output?

The latter is something that can be tested because we have processes to compare works of art for it. If it is shown that LLMs produce mostly infringing art then we can and should ban or heavily regulate them. If not, then not.


> It is the creation of the tool that is covered by fair use.

Copyright doesn’t restrict creation of something, it restricts (mainly) commercial distribution. Research, education and journalism etc are largely unaffected, and would still be.

That said, I believe that selling access to the tool to the public already violates the copyright of the rights holders, even if it doesn’t produce similar works of art. The copyrighted works increased the value of the product (otherwise why would they use it?).

> The latter is something that can be tested because we have processes to compare works of art for it.

This is the most expensive, least practical and most arbitrary part of existing copyright. It would be a huge mistake, imo, to expand this dramatically. This problem mostly goes away if the supply chain is sanely regulated.

All you’d need is to give access to the training set upon audit, and bureaucrats could check for copyrighted works. There are already automated tools for this.
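
As a sketch of what such an audit could look like, assuming the simplest possible matching (exact file hashes; real tools would need perceptual or fuzzy matching, since re-encoding a file changes its hash):

  # Toy audit: flag training files whose exact bytes match a registry
  # of known copyrighted works. The registry here is an assumption;
  # it would have to be supplied by rights holders or a regulator.
  import hashlib
  from pathlib import Path

  def sha256(path: Path) -> str:
      return hashlib.sha256(path.read_bytes()).hexdigest()

  def audit(training_dir: str, registry: set[str]) -> list[Path]:
      """Return training files that match the copyrighted-work registry."""
      return [p for p in Path(training_dir).rglob("*")
              if p.is_file() and sha256(p) in registry]

  flagged = audit("training_data/", registry=set())
  print(f"{len(flagged)} exact matches found")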


"That said, I believe that selling access to the tool to the public already violates the copyright of the rights holders, even if it doesn’t produce similar works of art. The copyrighted works increased the value of the product (otherwise why would they use it?)."

So it is similar to how ISPs argue that they should get a cut of streaming services because they enable another product.

I think it is also relevant that more than half of the globe will just completely ignore any regulation, and any artist in a country with regulation will just have to compete with ever more empowered artists using all AI has to offer.


“It’s just a machine!”

So are you!


Don't be obtusely misanthropic


The value of copyright is going to vanish. There is enough public domain material to train models on and to avoid the problem altogether.

There used to be professions like tinkerers, bards, clowns. The tinkerers disappeared when society became modern. The clowns, on the other hand, managed to lobby for laws that put people in jail for heinous crimes like copying pictures, and survived longer. They are going to bite the dust now.


What you describe would result in the opposite - copyright will be incredibly valuable in a system where the vast majority of "creative works" are just regurgitations of past works in the public domain, churned out by machines. In such a world, none of that has a copyright anyway. Actual creative works, which do garner copyright, will then be that much more valuable, because they will continue to be a property right with a breadth of coverage to make them useful.


Whether or not “humans do it” isn’t relevant. You can walk around with a copyrighted song in your head. That is not copyright infringement. But if you take that song, create a digital copy, and distribute it for money, then you are violating someone’s copyright. Additionally, our legal system requires a balance of probabilities. It’s hard to prove that someone was influenced by another work unless the similarities are plainly obvious. The same does not apply to ML models where the training data and algorithm are knowable facts.


I challenge you to listen to "4 Chords" by The Axis of Awesome and tell me again how every song is completely original. How does Eragon exist when it definitely ripped parts from Star Wars, etc.? AI usually doesn't spit out a full plagiarism, but a loosely inspired work, which is what most media we consume is.

Edit: the "4 Chords" link is https://youtube.com/watch?v=oOlDewpCfZQ&si=8vL6PbDnHiaffJh3


A copyright in just Eragon would be incredibly thin, for the exact reasons you state. This criticism of copyright by people who have no understanding of actual copyright law, how it works, how it's used, etc., is so exhausting and ignorant.


“Every song is completely original” is the opposite of what I said.


The analogy doesn't hold when you consider the sheer scale of the problem.

I can outright buy a machine for a few thousand dollars that can crank out a faithful rewrite of every Stephen King novel without the shitty endings and nonsense plot points. It can do it in a few days, maybe a couple of weeks at most.

To do that with human labor would take years and cost hundreds of thousands, if not millions of dollars.

Instead of paying an artist a couple hundred for a commissioned drawing, I can just scrape up their entire portfolio and generate any image I want with their style. I can generate hundreds or thousands of images. I can take their distinct style and use it exclusively as the branding for my company.

What an ML model does is fundamentally not what happens when a human draws inspiration from prior art. A human would require an extremely significant amount of time and resources to perfectly imitate every artist they have ever seen. It takes a human significant time and resources to produce faithful variations on prior art.

An ML model is measured in words or images per second.


Hello.

Maintaining a system like Netflix or AWS or even Amazon would require an insane amount of people and time, if it were possible at all within a finite time, without all the computers doing work for us in seconds that would take humans ages to do.


> ... a SWE who is dating a lawyer

> I would argue people do the exact same thing

Perhaps a ménage à trois with a neuroscientist would change your view on this.


> Until we solve the interpretability problem (e.g. can you decode the feature space of a neural network into something we can comprehend) there is no good solution.

This is the rub. Without reverse attribution... open source anonymous models become a free-for-all loophole.

Since that doesn't currently exist, I think the best we can do is to say that any commercial entity using a model bears the responsibility of proving the model they use is untainted by copyrighted material (to which they haven't secured rights).

Open source model X is... whatever it is.

But I'll be damned if OpenAI / Meta / Microsoft / IBM should be able to build a commercial product on top of laundered copyrighted material while ignoring provenance.

I mean, we have models for this: software code and art. Neither is clearly attributable. In the case of software code, we've developed case law around clean room design and similarity. In the case of art, we value verifiable chain of custody.

Hopefully, something similar would tilt commercial funding of AI in the direction of responsible use.


My problem with this is that artists learn by studying other artists; cutting that off because it's AI, rather than focusing on whether the resulting work is derivative, seems more of a problem to me. It seems to me that an AI can be used for either original work or derivatives; proving that you can get derivatives out of it has always struck me as no different from commissioning a copy of someone's work from a human artist and being shocked that you got what you asked for.


Can an AI express to you how van Gogh affected it as an artist? I'm not sure that AI is "learning" the way we say humans are "learning" when they study art. Obviously there is no debate that you can input van Gogh into a model and produce something van Gogh-like as a result. But I've not seen anything that indicates that the AI is learning anything about van Gogh at all. Perhaps it comes down to whether you think learning van Gogh is just creating a mapping of all of his brush strokes ever, and only exactly what they look like. It's obvious the AI knows nothing more than that. If you think that's what humans do when they learn art, I'd be sad for you!

As to your hypothetical, we don't give copyrights to people who make rote copies of things, human or otherwise. Is the implication of the shock that there is sufficient difference with the work as to render it a derivative and not a copy? Okay, how so? And of what consequence? Making derivatives of a copyrighted work without a license is infringement.


I think it's learning styles in a way that's at least partially analogous, because it comes out with things that are reasonably original and not in the training data.

I'm sure an LLM can write you an essay like that for any artist you want, but I'm not all that convinced those are meaningful even with humans.

> As to your hypothetical

That's the thing: it's not a hypothetical, it's a past story from here on HN. Someone did that, asking for copies of a famous painting (Girl with a Pearl Earring), got highly derivative images out of the model, and we had a debate over whether that even means anything, because the phrase is both a simple description of the painting and the name of a famous work, so it can be ambiguous whether the prompt asked for "Girl with a Pearl Earring" or a girl with a pearl earring.

I agree that it looks like copyright infringement whether it's done by a human or AI, though. I guess a lot of people missed the prior discussion on HN.


>I think it's learning styles in a way that's at least partially analogous, because it comes out with things that are reasonably original and not in the training data.

I don't think that is evidence that what it is doing is "learning".

>I'm sure an LLM can write you an essay like that for any artist you want, but I'm not all that convinced those are meaningful even with humans.

Well, it wouldn't be reflective of what the LLM thinks, so what is your point? If you are of the belief that humans don't have thoughts, I guess it's not a surprise you view things this way.

>That's the thing: it's not a hypothetical, it's a past story from here on HN. Someone did that, asking for copies of a famous painting (Girl with a Pearl Earring), got highly derivative images out of the model, and we had a debate over whether that even means anything, because the phrase is both a simple description of the painting and the name of a famous work, so it can be ambiguous whether the prompt asked for "Girl with a Pearl Earring" or a girl with a pearl earring.

You say derivative but without any reference to what it actually means... what about it is derivative - that's the analysis that's happening in court. The analysis isn't "what you asked the LLM" because that's not dispositive to whether or not something is a copy.

>I agree that it looks like copyright infringement whether it's done by a human or AI, though. I guess a lot of people missed the prior discussion on HN.

Sorry I don't read every single thread about copyright on HN? This is the second posting I've seen on the RFC today. Give me a break!


> I don't think that is evidence that what it is doing is "learning".

When I say learning I mean something like "gaining new ability by studying how others did the same task, resulting in being able to produce novel output." I'm not quite sure what you are using the word to mean here, though I might agree that there are differences between what AIs do and what humans do, the question being what they are and whether they're important here.

I don't claim to know anything about the internal experience (if any) of an LLM writing such an essay and I can't really reason about that because I've never been an LLM, whereas I can at least relate to human experience. I think your assertion that it "wouldn't be reflective of what the LLM thinks" is a bit like saying that you don't think submarines are actually "swimming," as the saying goes, though. It may not "think" in human terms as we do, but it's certainly doing some kind of calculation that produces an equivalent output, so I have a lot of questions about whether we can say that on principle. We're well past passing the Turing test for a lot of things, either the original or censored form, these questions are getting less academic by the day.

> You say derivative but without any reference to what it actually means

We're talking about copyright law, so the meaning of derivative was borrowed from that, i.e. that AI model was producing works that could reasonably be thought to have infringed on the copyright of that painting when prompted for "a girl with a pearl earring", and this was held up to mean that AIs are just regurgitating training data, are therefore implicitly missing something essential to being an artist or what have you, and that all their work should be considered derivative works of the training data as far as copyright law is concerned.

Meanwhile, I'm saying that I think the AI should be judged about like a human artist would be to argue against the people who seem to want to say that the AI can't take input from copyrighted things without all of its output being tainted forever. We have no such requirement for humans and I don't see why it makes sense to add this new restriction on AIs specifically.

> Sorry I don't read every single thread about copyright on HN?

I'm not faulting you for not knowing, I'm faulting myself for assuming too much context and just trying to explain what I had in my head when writing that so you could understand how I came to think that. Hopefully this lets you see where I'm coming from.


>When I say learning I mean something like "gaining new ability by studying how others did the same task, resulting in being able to produce novel output." I'm not quite sure what you are using the word to mean here, though I might agree that there are differences between what AIs do and what humans do, the question being what they are and whether they're important here.

I think the dictionary definition is more than sufficient: "the acquisition of knowledge or skills through experience, study, or by being taught." This is what I mean by running with your own made up definition.

>I don't claim to know anything about the internal experience (if any) of an LLM writing such an essay and I can't really reason about that because I've never been an LLM, whereas I can at least relate to human experience. I think your assertion that it "wouldn't be reflective of what the LLM thinks" is a bit like saying that you don't think submarines are actually "swimming," as the saying goes, though. It may not "think" in human terms as we do, but it's certainly doing some kind of calculation that produces an equivalent output, so I have a lot of questions about whether we can say that on principle. We're well past passing the Turing test for a lot of things, either the original or censored form, these questions are getting less academic by the day.

You are the one redefining words like "think" and "experience", not me. I'm not playing that game at all. After all, you are the one equating these processes between humans and AI by coming up with your own, much broader concoctions.

>We're talking about copyright law, so the meaning of derivative was borrowed from that, i.e. that AI model was producing works that could reasonably be thought to have infringed on the copyright of that painting when prompted for "a girl with a pearl earring", and this was held up to mean that AIs are just regurgitating training data, are therefore implicitly missing something essential to being an artist or what have you, and that all their work should be considered derivative works of the training data as far as copyright law is concerned.

I'm familiar with copyright law; I'm not sure you are. A work can be derivative in a number of ways; some are legal, some aren't. It's not a new thing that some uses by a machine can be infringing, and others, non-infringing. Why now must it be that machines should be analyzed the same as humans all of a sudden?

>Meanwhile, I'm saying that I think the AI should be judged about like a human artist would be to argue against the people who seem to want to say that the AI can't take input from copyrighted things without all of its output being tainted forever. We have no such requirement for humans and I don't see why it makes sense to add this new restriction on AIs specifically.

Yes, I understand that. But I asked why it should be judged as a human, and you are saying because it "learns". But that's only based upon your re-defining the concept of learning in order to make it inhuman. The only reasonable arguments I've seen that AI outputs should be copyrightable are based on them being a tool that an artist can use. What you are saying is just dressed up anthropomorphization.


> I think the dictionary definition is more than sufficient: "the acquisition of knowledge or skills through experience, study, or by being taught." This is what I mean by running with your own made up definition.

I mean, if a human looked at a bunch of art, essays, etc. and then was able to produce similar works, we'd normally consider that "learning." What word would you use for being able to reproduce Picasso (or whomever) by looking at a bunch of examples?

Also I don't think I have defined "think" or "experience" at all. But I'd point out that I don't see anything like a principled boundary around them or that we can point to something that humans do that AIs don't or can't do. It seems to fall back on something that looks like qualia or subjective internal experience and philosophy hasn't resolved that with respect to other humans... except by analogy. "I think the other humans are like me and I have subjective internal experience, so they probably have it too, rather than being p-zombies."

If you have a better answer to that, feel free to tell me, it'd be interesting.

> It's not a new thing that some uses by a machine can be infringing, and others, non-infringing. Why now must it be that machines should be analyzed the same as humans all of a sudden?

Sure, I'll agree that it's not even necessary to consider the works transformative or whatever.

FWIW, I don't think that AIs should be getting their own copyrights or anything like that, I'm just saying that the training data shouldn't forever taint the output no matter what's produced.


>I mean, if a human looked at a bunch of art, essays, etc. and then was able to produce similar works, we'd normally consider that "learning." What word would you use for being able to reproduce Picasso (or whomever) by looking at a bunch of examples?

Would we? What you described sounds a lot more like copying than learning. That's why I asked the question I originally did. Your whole perspective seems to be based on an ignorant and misanthropic view of the arts. That art students just go to school to look at things so they can then reproduce things that look like those things. It's a bit asinine and insulting.

>Also I don't think I have defined "think" or "experience" at all. But I'd point out that I don't see anything like a principled boundary around them or that we can point to something that humans do that AIs don't or can't do. It seems to fall back on something that looks like qualia or subjective internal experience and philosophy hasn't resolved that with respect to other humans... except by analogy. "I think the other humans are like me and I have subjective internal experience, so they probably have it too, rather than being p-zombies."

That's your burden to demonstrate as the person equating AI with humanity. You couldn't do it with "learning" without redefining learning, and you can't do it with "experience" or "think" without redefining those words either. Who is seriously advocating that LLMs are thinking and experiencing? I haven't seen anyone make those arguments.

>Sure, I'll agree that it's not even necessary to consider the works transformative or whatever.

That wasn't my point. A transformative analysis is one of the most fundamental elements of determining if something is a copy or not in copyright law. So I don't really have any idea what you are talking about with this one.

>FWIW, I don't think that AIs should be getting their own copyrights or anything like that, I'm just saying that the training data shouldn't forever taint the output no matter what's produced.

Yeah but your only argument for that is to redefine learning to pretend it's the same thing that humans are doing when that's clearly not the case.


> Yeah but your only argument for that is to redefine learning to pretend it's the same thing that humans are doing when that's clearly not the case.

What test can I do to differentiate them, then?

At first, you said they couldn't write an essay... but AIs can absolutely do that. The internal experience of even other people is unknowable and something we guess by analogy, so if you want me to agree you need some other actual test on measurable outputs to differentiate.

Otherwise this is all about qualia and there's no way to come to rational agreement.


You are being obtusely literal, as I did not ask you if they could write an essay. I asked you if they could express their feelings. There's no point in us conversing if you are going to respond this way, as it's disingenuous. I'd think you are capable of understanding the difference between the two. And I don't care if you agree with me or not, it's your burden to elevate AI to humanity, not mine, and you haven't done it here. Your perspective here seems to come from a life devoid of art and experience in things. For that, I'm sorry for you.


> I asked you if they could express their feelings.

And I asked how we can test whether someone has actual feelings or any other kind of conscious internal experience. If it's "obvious" then why is there no consensus on the whole https://en.wikipedia.org/wiki/Philosophical_zombie thing?

> There's no point in us conversing

I gave this conversation to an LLM to respond to.


I only said it was obvious that LLMs don't know anything about art beyond what you described, which you didn't dispute, and which was an obvious logical conclusion from your own explanation of what the AI "learned".

>I gave this conversation to an LLM to respond to.

I'm not surprised, I repeatedly characterized your responses as obtuse, disingenuous, or ignorant. I'm not sure what you think you proved.


You can ask someone to produce a pin-up version of Minnie Mouse, but good luck using it in any commercial activities.

Most LLMs are just profiteering from people’s labor without their consent. And there’s nothing new being produced. It’s always a statistical output of previous works.


> You can ask someone to produce a pin-up version of Minnie Mouse, but good luck using it in any commercial activities.

The same would automatically apply to LLM output -- there's no need to change the current laws to cover that case.

The question is this. Suppose I ask a human artist and an LLM to create me a new female mouse cartoon character. And suppose both the artist and the LLM have been exposed to Minnie Mouse. It's not unlikely that the new character created in both cases will have aspects specifically similar to, or specifically opposite to Minnie Mouse.

In the case of the human artist, the new character will not be covered by Disney's copyright, unless there was a lot of copying. Why should the result be different for LLMs?

The logical conclusion of "any output of an LLM that's seen Minnie Mouse must be subject to Disney's copyright" is "any output of any human that's seen Minnie Mouse must be owned by Disney". Which I'm sure Disney would love, but would certainly make the world a worse place for everyone.


> a pin-up version of Minnie Mouse

that's not because of copyright, but because of trademark. If you make the Minnie Mouse character sufficiently different that it cannot be mistaken for Minnie by the average person, and don't call it Minnie Mouse (to get rid of the trademark issue), Disney will have a much harder time suing you. Of course, they will still try, and steamroll you with sheer money instead.


> And there’s nothing new being produced. It’s always a statistical output of previous works.

I don't think you can define those terms such that what you say is true of AI but not true of people.


I think you're misunderstanding that. I don't expect it in either case; I'm saying you have to judge the output, not the input. So even if it trained on a ton of copyrighted artwork, if the output isn't a ripoff of something in the training data, I don't think there should be any copyright issues.


Is intelligence really a factor here?

Say I use the same training set as one of these LLMs, copyright-protected text and all, and use it to derive a compression algorithm that uses very little space to store tokens and token sequences that are common in that huge collection of text. The resulting compression scheme includes some sort of statistical artifact derived from that copyrighted text. Is that allowed? And if so, why is an LLM different?
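
To make the thought experiment concrete, here's a minimal sketch (a crude substitution code, not a real compressor): the code table it derives is a statistical artifact of whatever text went in, copyrighted or not:

  # Derive a code from corpus statistics: the shortest codes go to the
  # most common tokens, so the codebook itself "remembers" the corpus.
  from collections import Counter

  corpus = "call me ishmael some years ago never mind how long precisely"
  counts = Counter(corpus.split())

  # Assign code i (in binary) to the i-th most common corpus token.
  codebook = {w: format(i, "b") for i, (w, _) in
              enumerate(counts.most_common())}

  def compress(text: str) -> str:
      return " ".join(codebook.get(w, w) for w in text.split())

  print(compress("some years ago"))  # tokens seen in training compress well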


Very good question indeed.

A lot of these questions are somewhat ethical/moral in nature. E.g. is it okay to take someone else's creative work, process it through some algorithm, to create a service like ChatGPT? Or a compression algorithm? I don't know.

It's awesome to see the Copyright office request input from both sides of the argument.


It worries me that so much focus is on two sides that may not have end-users' best interests much in mind. The companies building the models may have an incentive to regulate models to keep smaller players or open source projects away. Artists mostly seem opposed to any solution, as even laws that allow models trained purely on public domain art would be bad for them. If laws around this are shaped primarily by the wishes of those two groups, I am not sure things will end up well at all for those of us who want the tools to keep improving and to remain reasonably free (including applications you can install locally and run on your own GPU).


> is it okay to take someone else's creative work, process it through some algorithm, to create a service like ChatGPT? Or a compression algorithm?

and the test I use is: if a human is currently allowed to perform the same task, then it is allowed to be done using an AI model.


LLMs are generative, though, not just compressive.


Generation, prediction, and compression are all the same - the only difference is the intent.
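
A toy illustration of that claim: one next-character model, three intents. Sampling from it generates, taking the argmax predicts, and -log2(p) gives the ideal code length an arithmetic coder driven by the same model would achieve. (The unigram model below is a stand-in for whatever model you like; the point is that it's shared.)

  # One statistical model, three uses.
  import math, random
  from collections import Counter

  text = "abracadabra"
  probs = {c: n / len(text) for c, n in Counter(text).items()}

  generated = random.choices(list(probs), weights=list(probs.values()), k=5)
  predicted = max(probs, key=probs.get)
  bits = sum(-math.log2(probs[c]) for c in text)

  print("generate:", "".join(generated))
  print("predict :", predicted)
  print(f"compress: {bits:.1f} bits for {len(text)} chars")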


> is that an AI, being a statistical model and not generally intelligent, should not be allowed to disregard the copyright of its source material

None of what you are saying has anything to do with copyright.

The tool Photoshop isn't generally intelligent either. And yet, yes, it can be used to create art using other people's stuff.

And it could be done legally if the results are transformative.


Photoshop doesn’t install with a massive directory of other people’s copyrighted works to draw snippets from.


Yes it does...


If it does, then Adobe would have commissioned or acquired the license. In either case they would have _paid_ someone to get those images.

It is very unlikely Adobe would be shipping their software with copyrighted material without paying for it first.


I personally have a really hard time finding any meaningful difference or distinction between "AI" and "lossy compression". Copyright and "lossy compression" are pretty easy to reason about. Model "building" is "compression". Model "use" is "decompression". Everything about these AI models seems to be about the "lossy" part, but "lossy" is just an adjective to the main show.

It's very difficult to not conclude that copyright of a trained model should be treated identically to the copyright of a zip file.


Information is not copyrighted, just the expression of said information.

So if you took a recipe book, extracted the recipe information, and listed out the recipe in a different format (such as a table), it's a new work. It does not violate the copyright of the recipe book you extracted the info from.


> I personally have a really hard time finding any meaningful difference or distinction between "AI" and "lossy compression".

If you feed a photo of your dog into a JPEG compressor and the result looked like a cat in the same style, I think you'd be pretty annoyed.


When you perform lossy compression, you feed it one file at a time, not every file in existence.


If you concatenate images into a stream container (say as tar) and then compress the stream, the compression coding will (generally) cross over the individual images. True, that's generally not lossy compression.

But concatenating images is also how you create video. Lossy video compression does typically cross over frames. So I don't actually see a difference. If you want to think about mkv or mp4 instead of zip it's still the same concept.

There's nothing stopping you from putting every available image into a video and figuring out how to compress it lossily.
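
As a rough illustration (the byte strings below are stand-ins for real image data): compress two "files" separately and then as one concatenated stream, and the stream coder exploits redundancy across them, so the whole comes out smaller than the parts.

    import zlib

    image_a = b"red pixel data " * 500
    image_b = b"red pixel data " * 400 + b"blue pixel data " * 100

    separate = len(zlib.compress(image_a)) + len(zlib.compress(image_b))
    together = len(zlib.compress(image_a + image_b))  # like compressing a tar
    print(separate, together)  # `together` is smaller: shared patterns coded once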

Maybe there are some bounds on how much information can be lost? Obviously piping everything into /dev/null destroys the input, and piping in /dev/random from a true random source creates information. So somewhere between those extremes and lossless compression lies the nebulous "plagiarism" threshold. And then there's another threshold below which the infringement is considered "fair use".

But the general structure of the "AI" systems this is about is fundamentally storage and retrieval.


What does any of this have to do with creating a new expression?


What makes anything new? Is anything created by "AI" actually new? How much entropy is in a prompt vs in the output?


> What makes anything new?

In copyright law? It's not being a copy.


Some compression, yes, but the analogy oversimplifies. AI re-represents input information in a transformative way (as embeddings, say), then creates new, derived, and combined output from a new input (e.g. a prompt).

It's not just lossy compression. It's potentially novel.


Phrases like "transformative way" are meaningless woospeak to me. Everything is a transformation. Sulpose I run a linear convolution on ten images and average them. Is the result "new"? Does it not contain the original images? Subspaces and mappings don't create anything "new" any more than SVD does. This is just playing digital Ship of Thesius.


> Phrases like "transformative way" are meaningless woospeak to me

Fortunately we live in a society that supports specialization where something that is woospeak to a smart person can still be a very well understood topic. AI transformations are methodologically well documented, even if transparency of neural network node activations is yet to be fully formalized.


In that case, you'll surely be able to provide a citation that clearly distinguishes the transformations performed by "AI" from the transformations performed by compression.


Sure. AI (more specifically, ML) is curve fitting, and more generally, objective function optimization. https://en.m.wikipedia.org/wiki/Curve_fitting

A projection is not necessarily compression. And you'll find AI is a very poor compressor when used for such a purpose in all but the most trivial setups (e.g. SVD matching the input data's rank, only reversible activation functions in the network, etc.).
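
A small sketch of the curve-fitting point, assuming nothing beyond numpy: fit a line to 200 noisy points and you keep two parameters; the residual is the information the "model" discarded and can never give back.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=x.shape)  # noisy "training data"

    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares fit: 2 parameters
    y_hat = slope * x + intercept

    print(slope, intercept)           # close to 3.0 and 2.0
    print(np.mean((y - y_hat) ** 2))  # residual: what the fit cannot reproduce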


Congratulations, you just discovered that copyright is a weak and ill-defined concept.


I think that unless you can clearly show that an "AI" is not a form of compression, the question of copyright is orthogonal. The copyrights that apply to a zip file may be ill-defined concepts to you, but that's not really important to the core question, which is: how are model weights different from a zip file?

If you put unambiguously copyrighted content into a zip file, most people would agree that copyright applies to the zip file. So by analogy, if you put copyrighted content into model weights, copyright applies to the model weights. Questions of fair use come up, but fair use is permissible copyright infringement, not the absence of copyright. And that's where the question arises of how lossy a compression algorithm has to be before its output counts as "fair use". In all likelihood it's the specifics of the use itself (rather than the technology or method used) that matters.


It’s compression + filtering. Nothing generative. Its output is like 99.99% deterministic.


Linear regression is 100% deterministic after training and isn't lossless compression, but rather a linear projection along a manifold in a (potentially transformed) input space.

So, maybe not just compression + filtering, if the level of deterministic behavior is to be the gauge.


Source?


Why is being a statistical model relevant?

The simplest statistical model is an average. Why would the average pixel RGBA of a bunch of images invoke the copyright of those images?
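
As a sketch (random arrays stand in for real, possibly copyrighted, images): the "model" here is four floats, and no source image is recoverable from it.

    import numpy as np

    rng = np.random.default_rng(42)
    images = rng.integers(0, 256, size=(1000, 64, 64, 4))  # 1000 fake 64x64 RGBA images

    mean_rgba = images.mean(axis=(0, 1, 2))  # collapse images, rows, and columns
    print(mean_rgba)  # four floats: the entire "model"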


The crux of the AI copyright argument sits in economics. Those currently producing content want future content generated by AI to benefit them financially, so long as even a thin sliver of their own content was used in the training.

This is like asking all students to pay their teachers a (small) percentage of their future economic output.


My opinion is we should treat AI like Photoshop/Word/Windows. If you use Windows to copy a file and distribute it, Microsoft isn't liable; you are. If you use Word to type up a book and sell it, you're responsible.

Same with a statistical model: if you generate a copyrighted work and distribute it, you are responsible. But the tool's maker (OpenAI, for GPT-4) isn't responsible, just like Adobe isn't responsible for copyright infringement.

The copyrighted text/image isn't generated until you ask for it. Your prompt is what reproduces the material.


Why would any non-lunatic want to live in a world where someone can't import an image into software?

If only some software is disallowed, then why permit Excel but prohibit Stable Diffusion?

Can someone even look at a SD-generated image, and claim with certainty that their own art was used to train it? Any more than claiming that another artist was inspired by it, looking at their output?

I'm fine with anything goes. The alternative seems to be copyright maximalist clownworld.


> is that an AI, being a statistical model and not generally intelligent, should not be allowed to disregard the copyright of its source material

But then you are just shifting the problem forward by an inch. What happens when tomorrow someone declares that their model is generally intelligent and is therefore allowed to disregard copyright when training just like a person can?


This point is of the utmost importance from a public policymaking perspective. Laws such as these are easy to craft now and difficult to change later. I feel like we are previewing an unfolding disaster here.

The future will clearly yield a class of "beings" striving for some degree of indistinguishability from or coexistence with humans. Proposals that discriminate --literally discriminate -- without respect for the principles of universality and equal treatment under law are creating and condemning a marginalized group before it even reaches maturity. This is an old and tired theme repeated through history. Let's foresee this and not get it wrong.


Is it your experience that people's facial declarations carry the day in legal disputes? It's not mine. Rather, it seems like the whole system is designed to apply scrutiny to bare facial declarations that something is true or false.

I see this on HN all the time: "someone just has to claim", "someone just has to say". Yeah... that's not how it works. People can say whatever they want; that doesn't mean it satisfies their burden of proof. Self-serving testimony is the lowest form of evidence imaginable.


Intelligence lacks any legal definition, for starters. And if a law like that draws an arbitrary line in the sand, it will just disincentivize AI research in general.


Often, when laws are passed, they provide definitions for the terms in the law that require definitions. Regardless, I'm not aware of any proposals for copyright law where "intelligence" is used.


I agree completely. AI model trainers should have to pay the people who provide their training materials, and there should be a default assumption of opting out until someone or their company explicitly opts in.

Unfortunately the Peter Thiels and all those bizarrely out-of-touch Silicon Valley assholes have already effectively scraped the Internet, because ethics don't matter if you're special like them, so to a degree regulations are way behind the ball.

That said it's still worth doing, and I'd love to see it done retroactively as well. It's not as if "I forgot that I had a public Myspace 25 years ago" is an implicit user opt-in for some startup to save your data - however anonymized they claim it is (lol!) - and train its AI on it.


> The alternative seems to be “anything goes”.

Seems like a huge false dichotomy. You really can't imagine anything in between total shutdown of AI training on public data sources and no rules at all?

I think we should try a bit harder for a middle ground.


I think you are right. People argue about whether LLMs store verbatim or merely generalize. I propose an experiment for anyone interested. Try this prompt multiple times, changing the verse numbers as appropriate:

> Provide quote from King James' Bible Genesis :25-31

or

> Provide quote from King James' Bible Genesis :1-25

or whatever you fancy.

I didn't go through the whole Bible, but I got pretty much a verbatim chapter. I argue that you can't do this with copyrighted books only because of guardrails, not because of ChatGPT's lack of capability; the information is there, and it's verbatim. Plus, other books don't have such nifty indexing.
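
For anyone who wants to quantify "pretty much verbatim", here's a rough sketch; `model_output` below is a stand-in for whatever your chat API actually returned, and the reference line is KJV Genesis 1:1 (public domain in the US):

    import difflib

    kjv_gen_1_1 = "In the beginning God created the heaven and the earth."

    def verbatim_ratio(model_output, reference):
        """1.0 means character-for-character recall of the reference text."""
        return difflib.SequenceMatcher(None, model_output, reference).ratio()

    model_output = "In the beginning God created the heaven and the earth."
    print(verbatim_ratio(model_output, kjv_gen_1_1))  # 1.0 for this stand-in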


Because the cat is out of the bag, so to speak, any attempt to force AI companies to generate their own content to train on means we are signing up for a future where only multi-billion-dollar companies are in control.


If they were truly forced to do this, even they would find it difficult.


And everyone else would find it impossible.

Hence the headlong rush to implement regulatory capture.


Is there any precedent where copyright was focused on the input rather than the final published work?


Compilers


Object code is a derivative work I think.

So no. Compilers do not count.


The US had to update copyright law to explicitly protect binaries


That just means some judges got it wrong and congress really wanted to make sure others didn't. I'm not sure what proposition that stands for here, except that sometimes new things are hard to get right at first.


Remixes, generally?


This is more of a problem for images, where output similar to the inputs is likely, than for LLMs, where no matter what you prompt it with I doubt you can get it to regurgitate any significant parts of Harry Potter well enough to be a classical copyright violation of any of the novels. Maybe you could generate a copyright violation of character traits.

The output space of images (MB for larger images) tends to be larger than books (a few hundred KB of text for a long novel), but the perceptual output space of books is much larger.

Any determination that licensing is required for AI generation, or use of AI-generated works, is unacceptable until Congress or courts put some reasonable objective tests in place to determine what is and isn't a copyright violation for various types of works of various lengths. Not the ambiguous 4-factor test that is basically whatever the judge feels like. It will be a complete mess otherwise. They can't just define a new AI policy for copyright with a few types of works in mind; it has to work for all works.

You could look at this mathematically from a complexity perspective and try to define a similarity function that's true when a second work is close enough to a first work to be a derived work (assuming the first one had been seen by the creator of the second). Unfortunately that won't work because nobody can define such a function to everyone's satisfaction, and the courts wouldn't accept any informal suggestion of a definition when it didn't come from Congress. Specifically, you'd get into trouble with consistency in the function determining derived works depending on length of the work: short works, like a haiku, are much more sensitive to copyright violation in some ways... a mere 17 syllables is a complete reproduction and therefore a copyright violation, yet a single word isn't; for a novel, reproducing 1/17 of the content is almost certainly a copyright violation, but reproducing 17 syllables probably isn't.

Different stakeholders and creative re-mixers would want different things from the function. It's untenable.


> This would, I think, require the AI’s creator to secure a license for all of its sources that allows this sort of transformation and presentation

That is a fairly illogical leap. From your text alone, "should not be allowed to disregard the copyright of its source material" would mean: "the AI's maintainer should have a fairly reliable (but not infallible) system for estimating how likely it is that the model generated a direct derivative work of something in its dataset". As a human you don't need to attribute/license every piece of cloud art you've ever seen in order to draw a cloud. So if an AI draws a cloud that is actually derivative of the millions of clouds it has seen, it doesn't need permission from the millions of creators to draw one either.


AI is taking work away from lawyers, and instantly creating more work for lawyers.

Ain't that interesting to reflect upon?

I speculate there is a hidden force in the universe, something physicists are yet to identify, which mandates: "they shall always have something to do".


The human brain is no different. It generates content from the things it learned.


Repost #4 I believe

https://news.ycombinator.com/item?id=37305580

"I'll keep saying it every time this comes up. I LOVE being told by techbros that a human painstaking studying one thing at a time, and not memorizing verbatin but rather taking away the core concept, is exactly the same type of "learning" that a model does when it takes in millions of things at once and can spit out copyrighted code verbatim."


I hope your opinion isn't shared by lawmakers. Copyright is a relic of the past, and it needs to be put out of its misery. Trying to (mis)apply copyright here would just lobotomize the US. Existing companies would just technically operate out of a saner jurisdiction, and we'd be handing other countries a golden opportunity to leapfrog the US.


"anything goes" is the best and most natural solution. Just don't let people copyright the output if they don't have full copyright on all of the inputs. This should finally get rid of the cancer that is copyright in a generation or two.


Generic reply to siblings here… I get the intelligence argument.

My _main_ point is that there’s a non-trivial question to answer here.

I’m not qualified to answer (though I’ve offered up my non-expert opinion). It certainly seems to quickly veer in to philosophy!


It shows you are not a lawyer. You misunderstand how copyright works. Creating copies or derivative works and distributing those is all that matters under copyright. This is not "disregarding" copyright (which is not an actual thing) but something that is either fair use or requires some kind of permission from the creators of the original for those distributing a derived work or copy. That's why it's called copyright.

Copyright merely restricts the distribution of original works or their derivatives. In case of an infringement, copyright holders can insist you stop distribution and/or compensate them for that.

If I sell you a paintbrush, I'm not liable for you putting a red nose on the Mona Lisa and trying to sell it off as an original work. Doing that to the original would be an act of vandalism (because you don't own it), and doing it to a replica you got from somewhere infringes on the rights of those who created the replica, which is itself a derived work or copy whose distribution is regulated by copyright. Distribution of such a replica is of course fine because Da Vinci has been dead for a very long time and his work is no longer protected under copyright. Distributing your red-nosed Mona Lisa would therefore be fine too. Either way, the paintbrush seller is no party in this case; it is between you, Da Vinci, his descendants, and the replica creators.

Now your assertions as to what AIs are or aren't are simply not relevant. You assert it's a statistics-algorithm thingy. That sounds like a tool to me. Yet another paintbrush. Using a paintbrush is not infringing on anyone's rights. For that you have to distribute the results of your work. The nature of the tool does not matter. How you use the tool does not matter either. You merely create (potentially) derivative works with the tool, and what you do with those matters, especially when you distribute them to others. One of those derivative works is of course the AI model itself. Creating one is fine. Copyright gets potentially infringed when you distribute one.

Now we get to the core of the matter. Can you say with a straight face that the AI model resembles the original and is a derivative work? It doesn't actually look like or resemble the original in any shape or form. Even proving the AI model is derived from the original is tricky. Copyright is not about protecting vague ideas or notions but the concrete shape or form of things. And it's only an infringement if you distribute a derived work or a copy of a thing to others. So, merely creating an AI model is not distributing anything to anyone. You are merely using tools to create something for yourself: an AI model in this case.

Distributing a verbatim copy of a book is an infringement. Citing the book in your own work is fair use (up to a point). Paraphrasing elements from the book, acknowledging it exists, taking inspiration from it, or reading it aren't copyright infringements.

The legal problem with AI models is that their concrete shape or form doesn't resemble the original inputs in any way. Besides, companies like OpenAI don't actually distribute their AI models. They are huge; it's not very practical. They merely exploit those models to generate outputs for inputs from their users and customers. Are those outputs derivative works? Maybe, but that's where it gets tricky. They clearly aren't in the classical sense. Not even close. But if you somehow could conclude that they are, who is distributing that derivative work? Secondly, if the AI model is a tool, who actually creates those outputs, and are those outputs protected under copyright? Who actually holds those rights? And how would you tell such an output apart from a human-created one?

It's questions like this that make all this extremely murky from a legal point of view. IMHO without dramatic changes to copyright law or the way it has been commonly interpreted legally, it's just very poorly suited to do anything about stopping AI companies from doing what they are doing. You'd have to bend the conventional interpretation quite a bit for that. No doubt, there will be court cases where people will try to do that. But it will take many years before the dust settles on that. And I wouldn't get my hopes up on some unexpected/dramatic outcome.


This is generally true, but I'm surprised you aren't aware that distribution isn't the only right protected by copyright - creating derivative works is protected, and display rights are protected.


There are three copyright issues here; datasets, model weights, and model outputs.

Dataset copyright is pretty well defined and things can often be used under fair use. Fair use decisions are done with a four prong test and really decided by the courts on a case-by-case basis.

Model weights cannot currently be copyrighted. They are the output of a mechanical process over the dataset. However, software faced a similar situation where the source code could be copyrighted but the compiled binary was not. US copyright law was updated to address this. We may see something similar for model weights.

Model outputs are less clear, but these are likely copyrightable by the user of the model. It is not possible for a non-human to hold copyright, so the model cannot. It is very unlikely that the company producing the model could assert copyright over the outputs. A good analogy here is someone using photo manipulation software.

Super interesting area. I think we will eventually see an update to the copyright code to make weights copyrightable. Also it will be interesting to see how court challenges (code generation, image generation) affect datasets in the future.


Who do you think should hold the copyright on the model weights, the copyright holders of the individual works comprising the dataset or the ones who assembled the dataset?


I don't think that model weights are copyrightable.

They're a mathematical transformation of the source material.

As a trivial example, applying gray = .299 r + .587 g + .114 b to a pixel is also a mathematical transformation but it doesn't create a new copyright.
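
For concreteness, that transformation as a few lines of code (the weights quoted are the standard BT.601 luma coefficients):

    import numpy as np

    def to_grayscale(rgb):
        """Luma transform: gray = .299 R + .587 G + .114 B."""
        return rgb @ np.array([0.299, 0.587, 0.114])

    pixel = np.array([200, 100, 50])  # an arbitrary RGB pixel
    print(to_grayscale(pixel))        # one value, fully determined by the input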

And thus, the model is a mathematical transformation and a derivative work of its source material.

However I am also of the opinion that the resulting model is sufficiently transformative and fills a different purpose than the source material and so the model is not infringing.

From the Perfect 10 case against Google and Amazon:

> …We conclude that the significantly transformative nature of Google's search engine, particularly in light of its public benefit, outweighs Google's superseding and commercial uses of the thumbnails in this case. … We are also mindful of the Supreme Court's direction that "the more transformative the new work, the less will be the significance of other factors, like commercialism, that may weigh against a finding of fair use."

That was dealing with thumbnails being used in search and it was found that thumbnails being used for search was sufficiently transformative and applied to a different purpose that Google's use didn't infringe.

I believe that a model is even more transformative given that bar.

That doesn't mean that the output of the model isn't infringing, but that's a human with agency creating and publishing that output which is a different artifact to be considered than the model weights.


Indeed. Inputs are likely fair use, but if you output Mickey Mouse, Disney will definitely be on you. I think that's the most sensible approach: anyone can use a tool like a pencil to draw anything they want, but that doesn't mean they'll get away with creating a drawing of Mickey and saying they own the copyright.


Depends on who you think holds the copyright on the SHA-256 hash of an image.

Model weights are even less specific than that number, since they don't represent any specific source input at all.
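
For concreteness (the byte string is a stand-in for a real image file's contents): the digest is mechanically derived from the image, yet contains none of its expression.

    import hashlib

    image_bytes = b"\x89PNG fake image bytes"  # stand-in for a real file
    print(hashlib.sha256(image_bytes).hexdigest())  # 64 hex chars; image unrecoverable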


The biggest issue is the first one. Dataset copyrights are where most of the fight between regulators, AI companies, and artists is happening.

Should you be able to train your model on an image or text someone uploaded to the internet without buying the copyright? If not, then most current LLMs and Stable Diffusion models will have to go, and I don't see big tech allowing it.


Realistically an AI model is basically just a very complicated piece of software. The model weights are akin to the software code, the model outputs are akin to the outputs a user of the software creates, and the datasets are akin to the intellectual property put into the software by the developer to create the code.

In the same way that a developer could not simply steal someone else's intellectual property in order to develop a feature of a piece of software, one cannot simply steal intellectual property to adjust the model weights. The main difference is that it's generally quite easy to see in practice whether a model has utilized some intellectual property (because, for example, you can ask ChatGPT to recite the first 100 words of Harry Potter), compared to another piece of software, where you'd need access to the source code or the developers' thoughts (which could only be achieved through litigation, in most circumstances).

I think a great many people come up with convoluted answers to this question because they are uncomfortable with the reality that these very large organizations have essentially stolen hoards of intellectual property, and now that the horse has bolted people want to justify not closing the barn door. It seems to me very simple: to train an AI model on data, you must respect its copyright. The model weights should be copyrightable by the developers of the model (even if the law currently does not allow this), and the outputs of the model should be copyrightable by the person who interacted with the model (software) to produce the outputs.

The analogy with Photoshop is extremely simple: if some other company invented Gaussian blurring and copyrighted it, Adobe would have to license that technology from them to include it as a feature in Photoshop. The actual Photoshop software/code would be copyrighted by Adobe, and if someone created a blurry image with Photoshop they can copyright it.

I think people only disagree with this due to some sense that the process of translating data to model weights is "automatic" or "computational" in nature. You could in principle get a person to go through millions of data sets by hand and compute the changes to the model weights. This is no different from someone writing a piece of code, checking someone else's approach, and adjusting their own code after the fact. It just happens that we have developed very effective tooling to automate the adjusting of the code.


Three points I want to make:

- Models are nothing more than a statistical distillation of facts that can be traversed. They are not like software at all. Calling them software is like calling pachinko machines software. Nonsense.

- Models are mechanically derived with no element of human authorship or creativity. You could argue that there is creativity in selecting the dataset or the process that derives the model, but neither is relevant to the final generated model. Even if we assumed for the sake of argument that a model is more than just a statistical distillation, it should still not be considered copyrightable, for this reason alone.

- Don't use the word "steal" when you refer to the well-defined act of infringement. Stealing implies deprivation of property, which does not and cannot occur in this case. Using the word "infringe" is more honest and less manipulative.


It's not stealing, and the term "intellectual property" should be put to rest:

https://www.gnu.org/philosophy/not-ipr.en.html

Your opinions on what should be copyrightable are wrong, and fortunately the courts agree.


You've rather conspicuously failed to mention a fourth issue, with two parts, which is the first listed in the article abstract: "the use of copyrighted works to train AI models, the appropriate levels of transparency and disclosure with respect to the use of copyrighted works".

Outputs are the third mentioned: "the legal status of AI-generated outputs."


Much debate has been had about how existing copyright law applies to AI models. But once you get past that and start asking about how copyright should apply to AI models (as the copyright office is here) the answer in my mind becomes clear.

Copyright, as defined in the U.S. Constitution, exists "to promote the Progress of Science and useful Arts"[1]. I can think of no better modern example of "the Progress of Science and useful Arts" than AI models themselves. Therefore, it follows that:

1. Existing copyright laws should _not_ be applied in such a way as to make training these models any more difficult than it already is (as that would be in direct opposition to the stated goal)

2. AI models should be copyrightable by the person training the model (for the same reason any other software program is copyrightable)

3. Output of AI models should be copyrightable by the person running the model (for the same reason any other creative work is copyrightable) provided the output does not conflict with any preexisting copyright

For those who think training on copyrighted materials should be illegal, explain to me how that helps "promote the Progress of Science and useful Arts" and I'll re-consider my position.

[1]: https://en.wikipedia.org/wiki/Copyright_Clause


So i'm not sure how I feel, but to play Devil's advocate --

If I know anything I create is just going to be hoovered up and fed into somebody's AI model, so that I do 99% of the work and they get 99% of the profit, perhaps I'm much less likely to progress Science and useful Arts by creating content in the first place.

I fear an internet of signup walls and terms-and-conditions agreements for everything, just to prevent the crawlers that feed AI from soaking it all up.


> If I know anything I create is just going to be hoovered up and input into somebody's AI model so I do 99% of the work and they get 99% of the profit, perhaps I'm much less likely to progress Science and useful Arts by creating content in the first place.

Perhaps you wouldn't, but I, and apparently most of the scientific community who publish research, would (and do). The entitlement people here feel towards their way of doing things is astounding.

It's a story as old as time: Those who try to resist or limit progress (by placing arbitrary restrictions) will be beaten by those who adapt.


I think you should spend some time thinking about the purpose of progress. Progress in and of itself is not useful, nor is it necessarily desirable.

It's very easy to envision a society that is both more advanced than ours and profoundly worse in all meaningful aspects.


More people around here need to hear this.


> If I know anything I create is just going to be hoovered up and input into somebody's AI model so I do 99% of the work and they get 99% of the profit

Could you give a more concrete example of how this could happen?

As-is, I don't see how the existence of an AI model trained on J. R. R. Tolkien's Lord of the Rings is going to result in Tolkien's works receiving 99% less profit.

Maybe you could argue AI models as a whole will devalue certain types of creative works (e.g. art commissions for designing logos), but they don't need to train on any one particular creative work to accomplish that, so unless you're saying we should just ban AI models entirely I'm not sure how copyright helps with that.

> I fear an internet of signup walls and TOC agreements for everything, just to prevent crawlers that feed AI from soaking it all up.

This is a fair point, particularly since it appears to be already happening to some extent. Though it seems to be largely social media companies and content aggregators trying to control access to information they don't hold the copyright to in the first place, not individual users trying to restrict access to works that they created. I'm not sure copyright would really help "promote the Progress of Science and useful Arts" there so much as "promote the wallets of large social media conglomerates".


> If I know anything I create is just going to be hoovered up and input into somebody's AI model

But today, without an AI model, anything you create is already going to be learnt and studied (if it is worth studying, of course). What's the difference, other than speed?

> they get 99% of the profit

Why is that a priori the assumption? What stops you from getting a profit?

> I do 99% of the work

You did 0.000001% of the work, since the model is trained on billions of other works.


> What stops you from getting a profit?

OpenAI and Stable Diffusion not paying for their datasets. I don't believe GitHub asked before using my contributions for Copilot.


But you weren't receiving profit from your works originally? So why does it matter what someone else is doing?


If someone jacks my car while I'm asleep, races it, wins a prize, and returns it before I wake up, they haven't deprived me of anything, they've profited off my property, and it's still wrong.

Profit and deprivation are not and never will be good tests for determining things like this.


> jacks my car

no, they downloaded your car.


> Copyright, as defined in the U.S. Constitution, exists "to promote the Progress of Science and useful Arts"[1]. I can think of no better modern example of "the Progress of Science and useful Arts" than AI models themselves.

This makes no sense. Before AI, it was already clear that copyright itself restricts what can be done, in order to promote the overall health of innovation. You can't just say "this is cool, so therefore it's allowed"; by that logic there would never have been any copyright in the first place. You have to argue that the overall result will be better given the rules you propose.

Now, I'm no fan of copyright, but it is abundantly clear that the tech companies are able to capitalize on new tech disproportionately. Thus, it's a transfer of privilege from the very many designers, artists, musicians, authors, etc. to whoever will dominate AI. That's not good, simply because of the centralization.

Moreover, you can't just look at the short-term gains of AI models that can be produced with existing content. You need to include the change in incentives, when creators' financial prospects are even more minuscule than today. Even if AI is all that matters, it still needs training data, and that needs to come from somewhere.


AI is the innovation. Trying to misapply copyright here would retard the progress of science and useful arts, not promote them, because it would wrongly restrict that innovation.

The answer to big tech corporations centralizing this is to fund it publically and make it available for free to everyone as a shared summation of our culture, not lobotomize ourselves just to profit a few old dinosaurs that are relying on an outdated idea of copyright.


> AI is the innovation.

Yes, but it does not live in a vacuum. Its power is derived from human creations and their labor, for now.

When Spotify transformed the music industry, they could not simply claim innovation and be exempt. They had to negotiate with the dinosaurs and eventually it came through.

The inertia is a feature, not a bug. We’re still dealing with the fallout of the social media and ad-tech transformations. Unintended side effects takes a lot of time to understand.

> Trying to misapply copyright here would retard the progress of science and useful arts[…]

How so? All of academia would be entirely exempt, and so would the hackers and tinkerers, etc. You’d only violate copyright if you sell the models or the works they produced.

> The answer to big tech corporations centralizing this is to fund it publically and make it available for free

Yeah but that won’t happen. Even if it does, we need something that works in the meantime.


> You’d only violate copyright if you sell the models or the works they produced

That's not how copyright works. Academia would only be able to claim fair use by fighting expensive lawsuits. Hackers and tinkerers would be sued into submission precisely because it's a hobby that they won't risk jail for. People would be scared to work on anything related because of lawsuits threatening them with obscene amounts of money, and hence the retardation of science and the useful arts.

Other countries would leapfrog the US, and it would be left behind, all so that a few people can continue extracting rent with their government-granted monopolies.


Do you see #1 and #3 conflicting at all? Ex: you produce a model, run it, publish and copyright some output. I can then use that as training data for another model in the style of your existing model?


I see that more as a conflict between #1 and #2, but fair point. In extreme cases, you could probably make a crude copy of a model by training a new model solely on the outputs of the first one. Normally that would be a derivative work, but that's inconsistent with the idea that training on copyrighted works is always permissible.

Maybe one way to resolve this would be to say there ought to be some practical limits on what percentage of the training data can come from any one individual source. If I train a model solely on the text of one book, for example, such that it's so overfitted that it can do nothing but regurgitate passages from that book, it's probably fair to call it a derivative work. The same would apply to a model trained solely on output from another model. (Though if it merely incorporates a few examples from a bunch of different models, that would be okay.)


> provided the output does not conflict with any preexisting copyright

I think this clause is doing all the work here and unfortunately in many cases there's no quick, automatic way to verify this.

Because of the legal costs involved in litigating who copied whom, I fear this would allow someone to sue the original creator of a work for infringing on the copyrighted output of an AI model trained on that work. If this seems far fetched, consider that this already happens with the DMCA and Creative Commons works: https://www.techdirt.com/2016/04/26/ifpi-files-dmca-takedown...


If you're going to bring up the origin of copyright you also need to consider the world copyright was made for. Back then, there was only one type of copying machine: a printing press. These were huge, expensive machines that could only practically be operated by corporations. Copyright was invented to protect authors from those corporations.

Your point 1 is what concerns me. AI models seem too much like the printing presses of old. They are available mainly to corporations and authors need to be protected from them. Otherwise there will be no incentive for anyone to publish anything novel as they know the corporation will slurp it up and make it "better" with their better model.


> 3. Output of AI models should be copyrightable by the person running the model

I would go further, and declare that this output is uncopyrightable.


Number 3 doesn't really make sense. I wouldn't get copyright if I told a human artist "draw a dog". Why would that change just because I'm telling an AI to do it?


I oversimplified that point a bit for brevity's sake, but I'd say I agree that some level of creative input from the human operator should be required in order for the work to be copyrightable. If I draw a black 100px*100px square in MS paint that's not copyrightable, nor should typing "dog" with a seed of "1" into Stable Diffusion be. But as soon as even the smallest level of creative input is involved, yeah it should be copyrightable.

Also, kinda beside the point, but:

> I wouldn't get copyright if I told a human artist "draw a dog".

If they were working for hire, yeah you absolutely would.


You can get very creative in your instructions to a human, but that creativity still won't get you any copyright.

And afaik, in a work for hire scenario, the artist just agrees to transfer their copyright automatically. If there's no copyright in the first place, there's nothing to transfer


Although I disagree and consider copyright an anti-social institution only necessary due to the anti-social capitalist mode of relations, I commend you for making an actually coherent argument on this question. It is the first coherent argument I've come across outside of the small Marxist circles I run in.


I'm going to try to plead my case for images generated using sophisticated prompt engineering to be copyrightable. For example, at the point that I've written a prompt with 20 tags, 10 negative-prompt tags, some LoRAs, custom weights, embedding merges, and prompt editing, I'm now writing what is effectively a "program", which should be copyrightable, and so should its outputs.
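
Something like the following, say (the field names and weight syntax here are purely illustrative, loosely modeled on common Stable Diffusion front-ends rather than any one tool's actual API):

    generation_spec = {
        "prompt": "(portrait:1.3), oil painting, dramatic rim lighting, "
                  "35mm, shallow depth of field, <lora:brushwork_v2:0.7>",
        "negative_prompt": "blurry, extra fingers, watermark, low contrast",
        "embeddings": ["style_merge_a", "style_merge_b"],  # merged embeddings
        "steps": 30,
        "cfg_scale": 7.5,   # how strongly to follow the prompt
        "seed": 1234567,    # pins down the otherwise random starting noise
        # swap "oil painting" for "watercolor" 60% of the way through sampling
        "prompt_editing": [("oil painting", "watercolor", 0.6)],
    }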

It's total BS to me that a book of Midjourney-generated images is itself copyrightable because a human arranged the book, but that the output of a highly sophisticated prompt involving custom tooling wouldn't be.

If nothing else, my comments should show the US Copyright Office how deep the rabbit hole goes with just how interpolatable everything is with everything else.


Copyright is, like all regulation, a restriction of freedom for the benefit of society. Producing movies and books takes a lot of investment, and people would not do it to the extent they do without copyright. If we think there is more than enough creative content, we should reduce copyright protections, and if we think there is not enough, we should increase them. Generative AI will move the needle very far in the direction of overabundance and should result in correspondingly reduced copyright protections.


The thing you're authoring is just your prompt. Apply for a copyright for it. The image or text generated in reply is generated by a computer with no more input from you than someone commissioning work using very precise words, and lacks human authorship.


This falls apart when you start getting into things like ControlNet and posing, or iterative erasing, reprompting, and in/outfilling. It starts feeling more like some weird combination of 3D modeling, Photoshop, and a really advanced autofill.


If you keep prompting an artist for edits over and over again, you don't suddenly own the copyright.

The lesson you should learn is that advanced AI autofill in Photoshop may lack human authorship, but it wasn't good enough to become a serious copyright issue until those tools came into existence.


That's the same as saying that if I open Paint and draw with the mouse, my mouse movements should be copyrightable but not the resulting image...


It's not the same. If you make exactly the same mouse movements in paint 100 times, you will get 100 identical images. If you enter the exact same midjourney prompt 100 times, you'll probably get 100 different resulting images. The relationship between your authorship and the final image in the two cases is quite different.


Your mouse doesn't make decisions for you. ML-based art does, which is why it lacks human authorship and you shouldn't be able to copyright it.

If you hand-painted something in Photoshop 100%, you can copyright it; it has human authorship. If it's mostly AI-based fill, those elements can't be copyrighted. If it's 100% an AI result, it's public domain.


The training model for Stable Diffusion has a lot of copyrighted images mixed together into an output which makes the plagiarism non-obvious, but let's reduce the set by 1 image. Shouldn't affect the output too much, right? Maybe some prompt will have a slightly different image.

Now let's reduce it by another image. Again, fewer options for what to display, fewer images to take pixels from, but still a lot of options; the output may still seem copyrightable.

Now let's do that N-1 times. What output will we get when the model was trained on a single image, say an image labeled "dog"? If your prompt is "an image of a dog", you will get that image, the only image in the training set. When going from latent space to image space, taking pixels from that image for the output, despite it being done in convoluted ways, is that not an obvious copyright infringement? I think it is. There's a cloud of mumbo jumbo about latent space, but after the dust settles and it needs to generate pixels in the output image, Stable Diffusion has a step that is essentially copying pixels from the source image into the output. When there's only one image, it will reproduce large portions of that image, necessarily infringing on copyright.

So then adding back images one by one into the training set, each one being used as source for the pixels being copied, what makes that model OK? Just because the output is 50% image A and 50% image B, or 0.1% image A and 0.1% image B and 99.8% image C, doesn't suddenly make it OK.

Once there are millions of images, you end up with just tiny blobs of pixels being copied from many different images. That still infringes on the copyright of all those images, because it's essentially a map-reduce process that maps pixels from copyrighted images and reduces them into a single image.


This viewpoint is about as coherent as "every image file is copyright infringing because every pixel in it exists somewhere in some other image somewhere".

Derivative works, when substantially changed, are not infringing. If I take an image of the Mona Lisa and rearrange all its pixels so it looks like a picture of a cat, that's not infringement.

If I sample lines and curves and colors and styles from several images and make something new, that's not infringement.

The actual problem with image models is that they can sometimes be coaxed into outputting images that are quite similar to an image they were trained on. That constitutes infringement.


> If I take

> If I sample

You're not a computer program and your viewpoint is about as valid as "cars don't need speed limits because most humans can't run faster than 10mph and that speed is safe".

Copyright laws were made with humans in mind.


If 1 in 100 humans could run up to 100mph you bet your ass there'd be laws against doing so around other people; it's a safety concern. Hell, even now running in most indoor or crowded areas is, if not illegal, at least considered bad behavior and may get you reprimanded or thrown out.

Some people claim to have a photographic memory. Supposing this is true, is it illegal for these people to look at copyrighted material because they may reproduce it later from the copy in their head? Of course not, it's the actual act of producing that copy that isn't allowed.

Of course, we're not talking about a computer program that stores a copy of an image and reproduces it later (that's called an "image encoder"); what we're talking about is statistical software that identifies common patterns in images and associations between those patterns and human-language descriptions of the images containing them. It doesn't store or make a copy of the images it learns from, and it should only be able to reproduce images or elements of images that are overrepresented in its training data. Like any other software tool, if someone manages to use it to make an unauthorized copy of someone else's work, whether it was present in the training data or otherwise, then the user has infringed the other person's copyright. The only real argument you could make is that distribution of a trained model constitutes distribution of a tool aimed at assisting users in unlawful copying, but IMO that would apply more easily to wget than to Stable Diffusion.

Copyright laws were made to encourage and promote the creation and practice of useful arts. Applying them to stop the creation and adoption of a tool that would make humans far more efficient in the creation of art is backwards.


> is that not an obvious copyright infringement?

No, it is absolutely not.

Let's run the same hypothetical you brought up about using other people's art, but instead our model takes just a single pixel from each of 1 million images.

Taking 1 single pixel from a million images, or the first letter from every book, and putting it into a new work is transformative fair use.

Transformative fair use is legal.

> Just because the output is 50% image A and 50% image B, or 0.1% image A and 0.1% image B and 99.8% image C, doesn't suddenly make it OK.

It quite literally does! Using 0.1% of an image is legal.

The amount of work that you take from someone else is one of the 4 factors of fair use.

Yes, the specific example you gave falls under what the courts literally use right now as one of the factors!


> Once there are millions of images, you end up with just tiny blobs of pixels being copied from many different images.

This is not how these neural nets work. They don't copy pixels from anywhere. They learn features.

The features represented internally are generally not easy to interpret to humans, but for sake of illustration, there could be an artificial neuron that fires when a subject should have blue eyes. Having a lot of blue eyes in the training data would help this neuron learn better when to fire (based on the values of other neurons, which may in turn represent other features). For example, it may learn to place more importance on an input that represents pale skin or Nordic origin.

It can learn concepts like cars have wheels, and wheels are round, etc. And then when you ask it to draw a car, it composes one from the concepts it learned. Some parts of the network will deal with the fine details that more directly influence pixels, but these aren't copying pixels from any image either. They're weighing a bunch of factors (eg is this pixel part of the iris and did the network decide to make a person with blue eyes?) and choosing pixel colors based on those factors.
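
A toy sketch of such a neuron (the weights and feature names are invented for illustration; real learned features are rarely this interpretable):

    import math

    def blue_eyes_neuron(features):
        """Weighted sum of upstream features, squashed through a sigmoid."""
        weights = {"pale_skin": 1.8, "nordic_origin": 2.1, "dark_hair": -0.9}
        bias = -1.5
        z = bias + sum(weights[k] * features.get(k, 0.0) for k in weights)
        return 1.0 / (1.0 + math.exp(-z))  # activation in (0, 1)

    print(blue_eyes_neuron({"pale_skin": 1.0, "nordic_origin": 1.0}))  # fires high
    print(blue_eyes_neuron({"dark_hair": 1.0}))                        # stays low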


Thank you for the explanation. Let me explain my position in similar terms.

I'm not replicating an image, I'm "using my brain to build a network of neurons that map electrical impulses from the optical nerve excited by wavelengths projected onto my retina in order to send other electrical signals to actuator tissues".

The complexity of the process is irrelevant imo. We can treat it as a black box and look at the inputs and outputs.

If the images in the database didn't exist, it wouldn't know what to draw, and those images are copyrighted.

Everyone's welcome to take a camera, run around the world and label every object for the neural net to learn, like a human does, but model authors didn't do that because using copyrighted images for free is much easier.


You're right that if you're replicating an existing copyright image, the process doesn't matter. Legally, if you lived in a cave your whole life and never saw any art and by amazing coincidence you just happened to paint and sell the exact same painting as some other artist, you'd be violating their copyright. Independent creation doesn't protect you.

On the other hand, under current copyright law, if Stable Diffusion generates an original image that doesn't look like a copy of any existing image, it's clear the new image doesn't violate any artist's copyright.

The debate is whether you can use copyright images/text to train an AI.

Stable Diffusion is of course trained on millions of photos of the real world, in addition to images made by artists. Of course, human artists also see and digest both the real world and images by other artists, and both influence their output. That's why you get trends like impressionism.


You are describing transformative use, which is permitted. Otherwise I could create a picture with every possible RGB pixel and then claim all other artists are infringing on 0.1% of my work.


How does this square with something like Cariou v Prince? http://www.artistrights.info/cariou-v-prince


It is impossible to 1:1 replicate the input as an output because the images are not stored. It isn't a database. It's basically aggregating summaries/abstractions/generalizations of a bunch of tags.

In other words, it is transformative by default.


Can it replicate the input 0.1:0.1?


I personally feel that, given the model was built on fair use of mostly copyrighted images, it's fairly self-evident that anything produced from it should not be copyrightable, UNLESS the art used to train the model is 100% owned by the "artist". That could be via licensing for that purpose, or by owning the copyright outright, and that right should not extend to corporations, as a corporation can't be an artist. It doesn't matter how complex the prompt or series of prompts are; the key here is that the "artist" either owns the training material or licensed it through the proper chain of licensors.


Thank you, it basically boils down to this.

I'm baffled that anyone in their right mind still argues as if using it, like any other tool, were free of copyright infringement of some kind.

It also boggles my mind how, in our line of work (which is often artistic in its own right), a lot of people hold preconceptions about how art is made, often reducing it to nothing but transformative generation. Such takes are deeply narcissistic, and downright wrong. At this point I'm led to believe they're AI-generated.


Your prompt (the input) should be copyrightable. The output should not.


Okay, but does that include the seed? The sum of the input model datasets? I think a better scenario is no copyright for any part of this process. Give humanity what they're going to take anyway: open access to this tech.


No, you're passing inputs to a program, and your description completely omits the vast majority of that input: the creative output of an unknown number of other people, whose rights you are attempting to launder.


That's nice and all, but it has nothing to do with whether something is copyrightable or not.

Instead, something is copyrightable based on the amount of human input into the process.

And it is very clear that there can be a lot of human input into AI image generation.

Even though I will concede that going into midjourney and just typing in "Hot anime girl" isn't a lot of input and likely doesn't deserve copyright protection.

But there can be so much more to AI art than the boring case of low-effort Midjourney prompts.


If the court is convinced that prompt engineering is "original and creative".[1]

Maybe it is. Or maybe it's more like tweaking the random seed.

[1]: https://www.copyright.gov/comp3/chap300/ch300-copyrightable-..., see "The Originality Requirement" and "Creativity".


No, your human-input product is the inputs you authored, but you're applying for copyright on something else.


So then yes, it is about the human input into the process. That's what I just said.

Having large amounts of human input is the thing that matters for this stuff, which is the case for many forms of AI art.

In the same way that Photoshop uses a computer, and the computer renders the art, the resulting computer-generated art can still have copyright protection (because of the large amount of human input, even though, yes, it used a computer).


The product of your human work is the input, not the AI-generated image. You are merely commissioning some system to do work over which you have very little actual further authorship beyond the commission. For art, we don't grant the copyright to someone commissioning a work; they have to negotiate with the artist for that.

But your artist is not human, so it cannot create copyrightable works, so you can't even bargain for the right. Your copyrightable prompt just created a public-domain result.

As for your comparison with photoshop, you have it backwards. The lesson you should learn is that if you fill portions of a work using something that starts authoring parts of the image, you should lose the ability to copyright those parts of the image, because you didn't author them.

Just like other works that are a mix of public-domain and copyrighted elements, you can only copyright the human-authored work. It's like making a comic with AI pictures - the images themselves are public domain (assuming you haven't forgotten to license the use of those works in your computer system that generates the images for you); your assembly of the work into a comic is what you own the copyright to.

The characters and images designed by a machine remain public domain, no matter whether you prompted all of them.


> The lesson you should learn is that if you fill portions of a work using something that starts authoring parts of the image, you should lose the ability to copyright those parts of the image, because you didn't author them.

In your opinion then, using this same line of logic, work made in Photoshop is not protected.

The courts disagree with you though.

Using your line of logic, you could say that the computer is authoring the work using Photoshop.

It is the computer printing out the picture using bits and bytes. That's not a human! That's a computer program named Photoshop!

Then follow the exact line of logic from there.

> characters and images designed by a machine remain public domain,

We know this to be false, though, because a character created in Photoshop is also made on a computer.

Therefore, it is clear that a machine can be used in the process, unless you are going to claim that work made in Photoshop is not protected because it is produced on a computer.


I used words very carefully. It depends on the level of human authorship.

If you use a tool to correct some pixels that's directed by a human closely, then no, that's not a problem.

If you remove large parts of the image with the ai erase fill, then you've given up authorship of those parts of the image. You could then go in and author changes to the work that you could further add to your copyright. But you would never change the copyright status of the stuff authored by a machine.

You're using 'Photoshop' at a naive level without looking at the most important element - how much authorship the human is contributing. You can basically 'hand paint' a picture in Photoshop, or you can use the AI tools and have almost no authorship.

Photoshop is a program, not a method.


> I used words very carefully.

Then I am happy to use your word if that clarifies things.

Just replace everything that I said about "human input" with "human authorship".

And my point is that there are many things that a human can do using AI art that have large amounts of "human authorship" beyond just the boring case of prompting Midjourney with a dumb prompt.

> It depends on the level of human authorship.

Oh hey! Yes that is exactly my point.

That point being that just as Photoshop images are copyrightable because there is human authorship, AI art can be too, if there is human authorship.

Glad you agree.

> that's directed by a human closely

Ok! You agree with me then! That's my point!

My point is that AI art can be directed by a human closely, and that there is so much more that can be done than a simple prompt into Midjourney.

You agree with my central point.

> If you remove large parts of the image with the ai erase fill, then you've given up authorship

Not if you "direct it closely"! Then it's protected.

> how much authorship the human is contributing

That is exactly what I am talking about, and I have said it multiple times.

That there are lots of things that a human can do with AI art that are closely directed, and that these things are authorship.

But I am glad that you agree with me that if it is directed closely then it is protected, which was my point and that you can do this with AI.


It's clear from this post that you aren't here in good faith at all; you're just being glib and purposely misconstruing other people's words. Good luck with life.


You can "hand paint" in Photoshop, but you can also assemble collages composed of bits and pieces of other copyrighted works and create a result that is still copyrightable. Why is the latter currently legal in your opinion?


If the things you're collaging can't be copyrighted because they lack human authorship, you can only copyright the arrangement, not the things you're arranging.

If they have human authorship, it's the original human artist's work you're collaging, and now you're creating an unauthorized derivative work violating many copyrights.


Fortunately AI art can have human authorship if it is closely directed by a human.

> not the things you're arranging

You can if those things are closely directed by a human, using AI as a tool the way Photoshop is a tool.


If you do not license the "bits and pieces" then the courts will find you in violation of their copyrights. Pretending AI is different is bizarre out-of-touch "but I'm so special" solipsism.


I believe you are wrong there. Transformative art requires no licensing as long as it falls under fair use.

I can quote a book in my article without licensing the quote from the author. I can clip eyebrows off of copyrighted magazine portraits and assemble them into an eyebrow version of some famous art piece and never have to license a thing. I can take screenshots of copyrighted YouTube videos and assemble a "shirts of YouTubers" collage that I lasso-tool'd together and create an entirely new copyrighted work without having to license a thing. I can take a photo of a street which contains an art gallery, and copyrighted art can appear in my photo without my having to license anything from the artist. Fair use would cover taking 1 out of 1 million pixels and assembling them into a new image if a human were to perform that action.


You've nailed the "creative" but not the "original" and you need both.


Sampling is an art form if properly attributed (and possibly even without; whole genres are built on the premise), especially if it elevates the original work.

That's not to say copyrighting ML derived creative works is the way forward, but that creativity has bearing wherever a 'medium' can be manipulated.


Transformative collage.


I have never understood the fair use argument when it comes to training data.

I publish a copyrighted article. Some LLM ingests it without permission, but since the output of that LLM is sufficiently different from my source article there is no violation.

I publish copyrighted code. Some company decides to consume it without purchasing a license. The product they distribute is vastly different from my code itself, but I can still sue them into oblivion.

What's the difference between the two?


> I publish copyrighted code. Some company decides to consume it without purchasing a license. The product they distribute is vastly different from my code itself, but I can still sue them into oblivion.

No you can't. If a company reads your copyrighted code, then writes up a spec and sends it to another team that writes up code that accomplishes what you did, that doesn't violate copyright and you wouldn't be able to sue them into oblivion.


> I publish copyrighted code. Some company decides to consume it without purchasing a license. The product they distribute is vastly different from my code itself, but I can still sue them into oblivion.

Isn't the analogy more: an employee at a company reads your copyrighted code, along with many other pieces of code, and produces a new piece of code? Your code influenced the output, but in no way can you a) detect that influence or b) assert any copyright over the output.

In your analogy, your code is still intact within the new product; that's not the case with LLM-produced output.


If the new code is almost identical to the original code, then it is very much subject to copyright, or at the very least is grounds for a lawsuit. And generative AIs can and do generate output extremely similar to some of their inputs if given the right prompt.

My personal opinion is that training a model isn't infringing the copyright. But generating outputs can, if they are sufficiently similar. And since the model itself can't be liable for such infringing works, I think the creator of the model should be responsible.


It's more like if I looked at thousands of articles and produced a big spreadsheet containing interesting facts about those articles, like word frequencies or which words tend to come after which other words. I have never heard anyone suggest that that kind of analysis would be copyright infringement. The new thing is that someone figured out how to organize dumb facts like that in a clever way and use them to create something sometimes useful, but without actually copying parts of the original articles, since those were never saved as part of the data.
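For illustration, here's a minimal sketch in Python of the kind of analysis I mean -- word counts and which words tend to follow which, over a made-up toy corpus (real model training is of course vastly more involved):

    from collections import Counter

    # Toy corpus standing in for "thousands of articles" (made up).
    articles = [
        "the cat sat on the mat",
        "the dog sat on the rug",
    ]

    word_counts = Counter()
    next_word_counts = Counter()
    for text in articles:
        words = text.split()
        word_counts.update(words)
        # Count which word tends to come after which.
        next_word_counts.update(zip(words, words[1:]))

    print(word_counts.most_common(2))       # most frequent words
    print(next_word_counts.most_common(2))  # most frequent word pairs

Nothing in the resulting tables is a copy of any article; they're just aggregate facts about the corpus.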


Well "consuming" the code means copying it word for word into your source tree. The compilation process does mangle it a bit but usually you can find identical function names and so on, and the courts have ruled that object code is equivalent to source code for purposes of copyright.

A different situation is you read a copyrighted news article and write the facts from it in your essay. Since the copyright of the news article only extends to creative expression and not factual information, there is no violation. For me, it is hard to tell what difference there is between this situation and an LLM identifying statistical patterns in the news article.


> A different situation is you read a copyrighted news article and write the facts from it in your essay.

This is fine for actual facts (with some caveats), but how should it apply to creative works like a novel?


The same way it does now. Simply having a character resurrect in a story doesn't violate the copyright of the Bible; Game of Thrones; The Lion, the Witch and the Wardrobe; Harry Potter; Lord of the Rings; etc. Writing a story about Luke Skywalker as a ten-year-old kid exploring Tatooine, on the other hand, is likely to run into problems.


The copyrighted article is not part of the output (at least, not verbatim). The copyrighted code is part of the output.


You can actually prove that some company distributed a product with your code, but you can't practically prove that an LLM contains your source article.


The difference is that the software is in active use in your scenario. Consider if you took a copyrighted program and calculated the SHA hash of it. You can then use that hash without needing to have a copy of the original program. The hash is also not infringing, because it's a simple fact.
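A minimal sketch of that analogy, assuming a local file named "program.bin" (a hypothetical stand-in for some copyrighted program):

    import hashlib

    # "program.bin" stands in for a copyrighted program (hypothetical file name).
    with open("program.bin", "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    # A 64-character hex string: derived from the work, but not a copy of it.
    print(digest)

The digest is computed from the copyrighted bytes but contains none of them; it can be shared freely as a fact about the file.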


The only clear solution is to abandon the notion of a copyright.

We have known for a long time that everything can be represented with numbers, even more so within the space of computers.

All we have done is invent a system to help us find numbers we find special.


Trying, though it's hard to get noticed. https://news.ycombinator.com/item?id=37346620 And I want to participate in the community here, not merely mention the thing I've built.


> though it's hard to get noticed.

I'd suggest starting with at least a brief paragraph on what thenose is, what its goals are, etc. I read your post and found myself reading the technical workings of something I didn't know anything about.


Oh, thank you. Basically AI training datasets have been knocked offline recently by DMCAs, and the goal is to bring them back online in a place that can't be knocked offline. The most popular training dataset was The Pile, hosted by The Eye: https://pile.eleuther.ai/

Notice the links now 404. We tried to make a drop-in replacement for those links. All they have to do is change the-eye.eu to thenose.cc in the urls.
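If it helps, the swap really is just a host substitution in each URL -- a sketch with a made-up shard path, assuming the mirror preserves the original paths:

    # Hypothetical shard URL; only the host changes.
    old_url = "https://the-eye.eu/some/pile/shard.jsonl.zst"
    new_url = old_url.replace("the-eye.eu", "thenose.cc")
    print(new_url)  # https://thenose.cc/some/pile/shard.jsonl.zst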

Unfortunately there aren't a lot of ways to get their attention to let them know this exists now. I'll try emailing the contact address, but I imagine they receive lots of spam, so I was hoping to try to get noticed by people like yourself first. Maybe a direct email is still the best way, but there's no guarantee they'll even be willing to change the urls due to legal risks. For all they know I could be logging the IP address of everyone who downloads it and forwarding it to authorities. But I'm not, and it's a frustrating problem to try to solve. I just want to help AI flourish.

This also serves as a template for someone else to do the same thing, so at least there can be multiple mirrors.

Thank you again. The fact that you even took the time to look it over meant a lot. If you have any other ideas, I'd be interested to hear.


I'm surprised that nobody has suggested that what's behind this RFC is Disney and other large studios lobbying to make it legal to copyright AI generated content so that they can move to AI generated movies and art. Right now you can't get a copyright on AI generated content.


I think the current precedent is that you can, as long as it's not wholly machine generated work. Whatever modicum of creativity you put in as an editor is what's got legal protection.


AI Jesus chat-bot could claim copyright over biblical content.

In theory, a company that owns Christian (c 2023) content could be filing DMCA claims every Sunday.

The silliness of digital-racketeers must end at some point. =)


I'm not sure what its feelings are on the topic, but Ask Jesus is a thing on Twitch right now. NSFW (it's chat-generated content on Twitch).

https://www.twitch.tv/ask_jesus

It takes chat prompts and weaves together scripture, The Internet, odd pronunciation, callbacks to other questions, and hilarious fails when chat gets, uh, playful.

AI Jesus just said there is a time and place for everything and that it might just be the time to buy Starfield. It knew it was a game being only prompted with, 'Jesus, should I buy Starfield?'


Ah, but can you buy Starfield with the change from within yourself.

Also, the buggy nature of Armored Core 6 could be fun too.

Happy computing =)


You've described a viable startup business plan.


It is what I enjoy doing, but someone has already launched a product. =)

text-with-jesus:

https://apps.apple.com/us/app/text-with-jesus/id6446922759


Nope, that's not the plan. The plan is to start a non-practicing entity that copyrights religious texts and sues others who use them.


Sounds like you are working with the other gentlemen's organization... you know the one that is likely... will "McKinsey & Company" be the secret behind these WIPO shenanigans too?

There was a 30% drop in ChatGPT users over 30 days, so I think we are now past the "early majority" stage of the hype cycle. Perhaps NVIDIA GPUs will be a little less ridiculous next year too.

Have a glorious day =)


The entire discussion about AI and copyright strikes me as a bit naive.

Right now, we are in a situation where nobody quite knows what these AI models are useful for. We have some inkling that they might be extraordinarily useful for making money -- but not precisely how, not even the companies that are developing the models themselves.

Once the money starts flowing, the debate over copyright will fall exactly into the economic seams between the major players involved:

- new tech orgs who are monetizing models will say that the model is "exactly as humans are": they see copyrighted works in training, and then produce wholly original outputs. And of course that the model weights themselves are, like the outputs of employees, completely owned by the company.

- incumbents who stand to lose out on the new gold rush will say that every single output of a model belongs to them if just a single image or sentence was seen in training. And that because of that, we really should just shut the whole thing down, because how could you ever prove that a model was not trained on copyrighted material?

The fault lines will rest entirely on who has more power, hard and soft. How much can they influence the legal system, either by spending $ to hire legal talent or by sheer soft politicking, balanced with how favorable they appear to the general public who uses their product (or consumes their media). I suspect that the end result of this debate is a "legal" way of doing things accessible only to the extremely large players, and a small, politically insignificant collection of individuals, hackers, and startups who aim to unseat those large players (or just flat-out train "illegal" models). The worst possible end result is that the legal system is just too fossilized to deal and tries something draconian like not allowing datacenter-scale GPU compute.

As an aside, I predict a sizeable space for companies that do "compliance" -- asserting the copyright status of a dataset, perhaps even themselves using ML. That market will carve off and leave rotting a sizeable chunk of the new money's ML profits.

It's fun to talk about this, I guess. But remember that what you or I have to say about what a machine learning model philosophically is has no bearing whatsoever when it comes to the actual ability of individuals, startups, or large players to use models.

I will predict though: enjoy Llama2 while it lasts. Like the internet, it will become fully assimilated into the larger intellectual property machine.


I wonder how they verify the personhood of the people making the comments. I can see this process being easily abused if the comments aren't taken by real people in person.

That said, I hope the US doesn't end up piling even more restrictions onto copyright. They'd only be shooting themselves in the foot. Copyright has completely failed to achieve the purpose it was intended to serve (wasn't it only supposed to give authors a short window of time to profit off their efforts? Look at it now). Perhaps it's time to rethink the concept of copyright as a whole before other countries beat the US to it.


I'd be interested to see the outcome of all this honestly, but I see parallels in how we exclude natural organisms from copyright and instead rely on patents to enforce ownership of unique genetics.


We as a society have a relatively healthy setup for people to create art and content. Sure there are problems, but on the whole it mostly works. What AI will do is destroy that by removing the profitability of creating that content.

Although generative AI operates on a principle similar to a human being exposed to a large number of artworks, it does so at a blindingly faster speed, enabling it to outcompete humans at many tasks. The number of such tasks will only increase in the future.

Thus, small-time content creators who make an independent living from content creation will be squeezed out and left in the dark. In some years, it will be very hard to make money from content creation at all.

Access to information and entertainment will also become more anonymous, with most people consuming things through AI generation. Of course, that will be convenient at first, but we will end up with a world where a significantly SMALLER fraction of people, those controlling AI, supply us with everything. (Including manipulative advertising to consume more of their product.)

For every benefit that AI gives us, there are 10 losses.

I used to dislike draconian copyright laws, but now I like them. And I sincerely hope they are used against AI to make AI unprofitable. I believe further that AI will be society-disrupting in a variety of other ways and thus, as a society, we should destroy it. But I am pessimistic.


I don't want to see AI copyright, but that's more an extension of not wanting to see any copyright.

A lot of it comes down to creativity as a process, or as a product.

It feels like many people tend to view creativity-- especially capital-A Art-- as a process-- it should be arduous and require skill and be an exclusive club. I've seen people be very hostile to pre-AI digital technology because it lowers the bar-- with software, you have a canvas that you can keep erasing until you're happy, a straight-edge that's perfect every time, etc.

I tend to think the important part is the end product: was someone able to express their vision to the highest possible level of fidelity? In that regard, every new technology gets us closer-- the person who couldn't draw a circle can take a photo and doctor it in the GIMP until he's happy. The next frontier for this is to say "I can use AI to generate 200 permutations, and then take the ones I like best and further hack on them to get what I want."

This also makes me hostile to copyright, since anyone trying to claim work as their own, establish restrictive conditions, or "stop plagiarism" is denying others tools and resources that could be used to deliver their vision. Maybe the most effective way to realize someone's vision is to start with an existing piece of work and modify to fit. (programmers seem to understand this more innately than, say, photographers)


Yeah changes to copyright are the solution. Not unenforceable rules around AI.

Local LLMs are going to make enforcement impossible.

Unless corporations and government collude to lockdown the frame buffer, a user at home can build a corpus from their own gameplay footage and train an AI to make new environments for those entities (working on such a setup now, modeling AI vectors with an open source engine’s geometry primitives and shader language).

The output space for visual content is much smaller than for language, which has to understand context.

Visual outputs we find acceptable for forming spatial understanding are much more limited. There’s only so much nuance to geometry and color gradient to be layered on before a virtual space is an unintuitive psychedelic miasma.

People have favorites and will use them to extend them.

The Constitution offers an opening for copyright change. It protects works "for a limited time". Anything encumbered by copyright for an average person's entire life span is effectively copyrighted forever.


> I tend to think the important part is the end product: was someone able to express their vision to the highest possible level of fidelity? In that regard, every new technology gets us closer-- the person who couldn't draw a circle can take a photo and doctor it in the GIMP until he's happy. The next frontier for this is to say "I can use AI to generate 200 permutations, and then take the ones I like best and further hack on them to get what I want."

I don't think effort should be arbitrarily constrained to make the bar high, but I do not agree with you that the end product is all that matters.

What is important is that the product comes from people, because art helps society be healthier by existing as a means of communication from people to other people. Once AI begins to take over part of the process, it becomes like athletes who dope... it removes the essence of their achievement.

Art is more than just a single, isolated product for consumption. It exists to inspire, to change minds based on its ORIGINS, not just on its end results.

The problem with your argument is that you are ignoring the sociological effects of artistic creation, and focusing on the act of maximizing along a SINGLE variable: aesthetic value. That does art a disservice, and it means that the argument for or against AI must also be more subtle.

Besides, isn't the strategy of releasing AI just to see what happens a bit haphazard?


I guess I find it difficult to buy into the origins of art mattering too much because it often seems to turn into a game of academics arguing among each other over the interpretation of symbols. A particularly spiteful take on this would be that if your nuance is so delicate and obscure that a lay audience can't figure out your meaning, maybe it represents a failing in your ability to communicate.

I'm not sure where you got the "single variable" as "aesthetic value". Yes, it's possible that some people are trying to hit a specifically aesthetic goal, but others, their vision may be "the poster that finally convinces Aunt Frank to vote no on Proposition 23", or "the short story that expresses the feelings of loss I have for my beloved hamster", and the people trying to express these things were unable to create what they wanted from whole cloth.

At the end, it's still people making the decisions of what to release, whether directly, or by designing some sort of scoring mechanism to do it. It's similar to the curation choices in a museum collection-- they say something, even separate from the artifacts themselves.


>And I sincerely hope they are used against AI to make AI unprofitable.

No, they'll make AI unprofitable for small-time creators but not for massive corporations. The latter either already have rights to vast quantities of training data or will hire thousands of workers in Africa to create training data that is just legally different enough to count.


That is why we should halt AI completely and do a more thorough analysis of its societal-level implications before blindingly putting it out there.

Because when new technology is introduced, it becomes almost impossible to stop using it, due to the way our current society is set up (as a sensitive machine that is very quick to reward any gains in efficiency and economic output as opposed to sustainability).


Simply get every nation on earth to cooperate and ban a vaguely described technology that hundreds of billions of computers can run to varying degrees of efficiency! It's that easy!

If we can't get this level of cooperation for global warming, which is largely the result of a few dozen companies, what makes you think that governments across the world can stop everyone with access to a device with a reasonable amount of compute power? This idea is a non-starter and assumes that there is one single entity that could halt AI altogether. The genie is out of the bottle.


As I see it, for a country to disconnect from AI, it would need to go fully isolationist. Disconnect the internet fully from the rest of the world, block all mail, block all imports, disconnect all financial markets, etc. Otherwise it would simply become a consumer of the AI output of other nations. For example, if AI can predict stock markets better by efficiently parsing financial documents, then eventually foreign investors leveraging AI would dominate. That will work for a while, but eventually AI will get cheap and efficient enough to be easily hidden. So now you need the government to police for AI, search people, track everything they do, and so on. Criminals using AI will rise to power and prominence until stopped. Essentially prohibition or the war on drugs all over again.

edit: And of course, to better understand and deal with the AI threat, the government would be given exemptions to the laws. These exemptions would be used more and more widely by the government to exert power, while the population is not allowed to even look into what is possible.


Define AI.


This isn't a math quiz. For practical purposes, we can start with a list of technologies that are clearly harmful. Generative AI like ChatGPT, AI image generators, AI text generators that write something based on a prompt, etc. all halted.


Llama and Stable Diffusion have proliferated already. Many countries view this as an arms race. Propose a plausible way to put the AI toothpaste back in the tube, because this stance seems profoundly impractical.


Until when? The only way to avoid societal impact of AI is to either stop it for decades until we reach some utopia UBI state or lobotomize current generative AI to a point where it's useless.


Personally, I believe generative AI should be lobotomized as you said. After poring over all sorts of possibilities, I think no good will come of it.


Why would you want that? What is "clearly harmful" in your view?


Change scares people. This has been the case for every single technological invention including writing. Predicting the transformative impact of a technology is nearly impossible (or sci-fi writers would have a better hit ratio) but thinking about how it will break what we currently have (versus what will replace the current status-quo) is fairly easy. Accepting ones own limitations and inherent ignorance about the future is something many people very much do not want to accept.


What you’re describing requires an ability to copyright style on top of expression. That would be an unacceptable constraint on freedom of speech and artistic freedom in my view

The ability of an industry to turn a profit should not constrain the ability of the general public to communicate ideas. Expression is the only thing that should be copyrighted against


Even if we keep draconian copyright laws they need to be changed in some ways. Otherwise Disney will dominate by creating their own AIs and we still end up with a small faction controlling what we consume - and we will be paying them for the honor of it on top.


I agree. I was being somewhat facetious there. I don't really enjoy copyright laws the way they are, but I do believe that AI is the FAR greater evil.


Why would copyrightability of AI generated work affect its profitability? Owning a copyright gives you the exclusive right to copy and sell a work. If no one owns the copyright for a work, no one has the _exclusive_ right to copy and sell that work.


Well, then an AI-generated Hollywood movie could just be copied and distributed for free.


I'd assume it would only apply to the parts generated by AI, and not the whole movie. But of course you'd have to rely on the studios telling you which parts are generated, if they tell you at all.


Small time creators don't make money anyway. Starving artists and all that.

