Hacker News
I consider this a smoking gun for Midjourney's flagrant copyright infringement (twitter.com/rahll)
48 points by kranke155 on Dec 28, 2023 | hide | past | favorite | 66 comments


Just some more detail about this.

Reid Southen has been testing out Midjourney V6. He quickly found out that you could easily generate images of Thanos that are direct copies of images from the Avengers films. https://twitter.com/Rahll/status/1738286342390374882

In response to his discoveries and others like them, Midjourney updated its ToS to place the blame on users for infringing copyright. https://twitter.com/Rahll/status/1739155446726791470

Reid Southen then had all his content erased without notice, for the grievous transgression of showing just how easy it is to make copyright-infringing material using Midjourney V6. This came after they updated their ToS to persecute him and people like him who post this kind of imagery on X. https://twitter.com/Rahll/status/1738408490543054949

He (and others) moved on and made more examples, using the film Joker: https://twitter.com/Rahll/status/1739331983992406508 And the film Dune: https://twitter.com/Rahll/status/1739003201221718466 And Star Wars Rogue One: https://twitter.com/KatieConradKS/status/1739765791996711034

A new account called AI Piracy has also shown up that's collecting examples of this: https://twitter.com/ai_piracy

Ed Newton-Rex, a former Stability AI employee, has chimed in here: https://twitter.com/ednewtonrex/status/1738141349172453785



Double whammy: you're posting Nitter, which is stealing content from X/Twitter.


x/twitter doesn't own that content.

It's arguably stealing from the people who posted on twitter (because twitter got permission and nitter didn't) but most people who post on x/twitter want their content to spread as much as possible, and don't benefit from twitter advertising anyway.


that was why I said it was ironic


Perhaps worth noting: That picture comes up in a Google search on numerous websites. It was apparently one of the images distributed to generate publicity for the film.

This prompt is very specific: it references not only a film, but also two actors with unique names. It's not really a surprise that it generates an image out of its training data - what else is it going to do? In this kind of situation, I think Midjourney should just give a reference, something like "Due to an overly specific prompt, this image is very similar to XXX".

As I have posted elsewhere, I see no copyright problem using such images for training. If an image is online, it is going to be looked at, maybe by people, maybe by automated systems. If you don't want your image looked at, don't put it online. The only problem here is the lack of a reference.


> If you don't want your image looked at, don't put it online. The only problem here is the lack of a reference.

This is an incredibly toxic take. If I, as an artist, want to share something I made with the world, but don't want others to take something I care about, claim it as their own creation, and use their popularity to sell my work, is my only option to not share it with anyone?

It's the artist's fault that Midjourney is duplicating their work! They should never have created anything if they didn't want someone else to do <bad thing>!

I'm of the strong opinion that all information should be free (secrets excluded). I also agree with you that there's nothing wrong with using public information for training data, but lack of attribution isn't the only problem here. And victim blaming isn't the answer to the question.


> If I, as an artist, want to share something I made with the world, but don't want others to take something I care about, claim it as their own creation, and use their popularity to sell my work, is my only option to not share it with anyone?

Don’t share it publicly with no viewing restrictions, yes. If you want to limit who can view it, then limit who can view it.

It’s like driving around with a banner on your car and then complaining because a certain subset of the population looked at it.

There are many ways to restrict access to content. The simplest is to create a login and TOS that restricts usage. Just do that and it won’t be used for any training.

But putting it on a server with anonymous http access means that the whole world (including robots) can see it.

It’s not victim blaming, it’s just reason. If you don’t want people to see something, don’t show it to them.


The issue isn't models viewing it, it's models reproducing and selling it, IMO.

>Don’t share it publicly with no viewing restrictions, yes.

They did share it with legal restrictions (copyright) and everyone is already bound by the TOS (law) so there is no need to sign. Viewing is fine, resale is not.

Do you think there is something special about the internet that changes this? Don't let anyone read copies of your book if you don't want them to sell copies of it? Don't let people listen to your music if you don't want them to be able to resell it?


Training doesn’t violate copyright (although this may change in time).

And TOS doesn’t apply to people who don’t agree. Court cases have ruled that you can’t prevent people scraping with a TOS as the data are publicly available and don’t require acceptance to access.

If someone buys my book, I can’t limit how they read it, or who they resell it to. If I give my book away for free, I can’t limit either.

It’s the same with giving out access to my web content.

The restriction point is selling. If I don’t want someone to read my book, I don’t sell it to them. If I don’t want someone to view my web content, I restrict their access and don’t let them view it.


The model is a tool, like Photoshop. You as the author of the prompt produced the image by explicit intent. If you then take it and distribute it, you are violating the copyright with intent that's demonstrable by the prompt itself. According to some friends who are professors of law, that's likely the clearest violation.

The consumption of copyrighted material to produce a commercial product without a license is likely a civil copyright violation, but it's unclear that's true. Browser caches, for instance, were litigated as copyright violations, and so were CDNs. They were not found to violate copyright because, while they handle, copy, and manipulate copyrighted material, they do so incidentally, without intention, and the systems function without the copyrighted material. A lot of the argument on the AI side, which is likely legitimate legally, is that these images are incidentally ingested, the models function fine without the copyrighted material, and they don't produce the copyrighted material without explicit intent from the user. The strongest cases are those where the training data is known to contain pirated works or large copyrighted libraries and is used regardless.

Finally, by publicly posting samples of your work you implicitly grant limited fair use rights over those samples. You do not retain full rights prohibiting reproduction of them. That's why artists usually will not post full portfolios. Obviously there are limits, but they're much more egregious than what you see in models, where the imagery is consumed into a lossy model that mathematically aggregates it with every other image it ingests.


I think we basically agree. It seems to me that a lot of the confusion stems from the connotation of “training" on copyrighted materials, and if that connotation is commercial or not.

An individual reading and learning from a book on woodworking isn't commercial use. However, if a business includes the same book as part of a literal corporate training program, that would be commercial use.


How is ingesting an image, laundering it via a prompt, and then spitting it back out without attribution fair use?


One major test used by the Supreme Court is whether the work is an economic competitor to the original. Similarly, infringement depends on the use, not the object.

An image coming out of Midjourney may or may not be used in an infringing way.

I can download movie stills from Google or Midjourney without breaking copyright. If I print and sell posters of them, that is different.


If that were the primary intended use then it wouldn’t be. However the product incidentally has that behavior if you specifically choose to use it in that way, which is not the intended use or the common use.


> If I as an artist, want to share something I made with the world, but don't want others to take something I care about and claim it as something that they created, so they using their popularity can sell my work.

Nitpick: Claiming authorship over an output from an AI model is as infringing as falsely claiming authorship over a work produced by a human (or group of humans), but fraudulent claims of authorship have only minimal relation to the question of whether the model itself infringes on copyright.


If you put your image on a web server that serves it to anybody then it's public information as far as my morals are concerned. I don't see what's wrong with training on it.


Generally speaking, I agree. But it's often more nuanced than that. For example, what if I upload my photo to Facebook such that only my friends can see it, but then one of my friends posts it to a publicly accessible URL?

The argument that "if it's accessible to me, then I have the moral right to read it" assumes that the person who made it accessible to you had the moral right to publish it. It also implies you can bypass moral responsibility by deputizing your moral transgressions to someone less scrupulous.


Is there anything besides 'training' you think would be wrong?

What about resale?


> If you don't want your image looked at, don't put it online.

Looking at things doesn't produce nearly identical copies, so you may be glossing over some important bits.


The keyword is nearly. It's still off enough that a talented artist could paint it from memory after looking at it.


An artist producing it from memory, and then distributing the copy without permission, is clearly a copyright violation.


That's a separate issue. The act of producing a new image is a very different process than looking at one, and it comes with an exciting new set of legal consequences.


With that in mind, isn’t the legal burden on the model output rather than the model weights and training?

An artist has the capability to reproduce something copyrighted, but that’s generally only a problem if they try to sell that reproduction.

In that regard, Midjourney probably is in violation of copyright because they profited from making a close reproduction of an existing work. But a solo person running Stable Diffusion locally probably isn’t, even if they made the same output.


The process of looking at an image online produces many identical and nearly identical copies of it. The DMCA just declared that those copies don't count for the purposes of copyright law, because it would not otherwise be feasible for any copyrighted material to live on the Internet. If you think LLMs are valuable and can't avoid this kind of thing fully, it seems reasonable to argue they do or should fall under a similar exemption.


> If an image is online, it is going to be looked at, maybe by people, maybe by automated systems.

You cannot use a copyrighted image to produce derivatives without obtaining authorisation from the copyright owner of the original work. Even if you are a human, copying a random image from the internet into Photoshop and starting to modify it could lead you into trouble (it is usually not enforced, because it is hard to detect, but AI systems would simplify scanning for such violations).

Your understanding of copyright and actual copyright are two different things. Or at least you cannot share the produced image - which Midjourney does here.


> You cannot use a copyrighted image to produce derivatives without obtaining authorisation from the copyright owner of the original work.

You can produce derivatives as long as the derivative doesn't contain anything substantially similar to the expression in the original work [1]:

> To win a claim of copyright infringement in civil or criminal court, a plaintiff must show he or she owns a valid copyright, the defendant actually copied the work, and the level of copying amounts to misappropriation.

...

> In the first context, it refers to that level of similarity sufficient to prove that copying has occurred, once access has been demonstrated.

No similarity means no copying means no infringement. Simply using someone else's work is not infringement.

An AI model can produce outputs similar to the original works, but that's almost always, if not always, because the prompt contains a reference to an author, an existing work, or a characteristic/style strongly associated with an author or an existing work. So there is infringement, but most of the liability should belong to the person who chose the prompt, assuming the model creator responds to notifications of such situations from the relevant copyright holders by preventing the model from responding to those prompts. Think DMCA.

[1] https://en.wikipedia.org/wiki/Substantial_similarity#Substan...


> copying the random image from the internet into photoshop and start modifying it could lead you into trouble

That's only if you intend to sell the resulting image, right? If you just post it online for free then it should be fine


No. Non-commercial uses are not automatically fair use. Commercial uses are not automatically infringing. And the purpose of the use is only one of the four prongs of fair use.


The user may be asking for it, but that doesn't matter. Every piracy website is producing pirated content at user request. They're still breaking the law.

They basically built a piracy website by accident.


> If you don't want your image looked at, don't put it online.

Same with writing. The inability to effectively prevent stuff being used to train AI is a serious problem. It's why I (and at least several other people I know of) have removed our content from the open web, and likely won't put new content up until/unless there is some effective way to control bot access.

I find it incredibly sad that this issue is making people stop using the web in this way.


> It was apparently one of the images distributed to generate publicity for the film.

And we assume it didn't understand this context? If it did, it arguably did well by not creating anything new.

I can't imagine the copyright owner being very upset that, when asked about their movie, it shows a promotional image.


How would you implement your solution technically though? There isn't a reference set of images in the model, it's just weights.


and data on disk is just ones and zeros.


> If you don't want your image looked at, don't put it online

Classic victim blaming. Absolutely gross...


It's perhaps one of many smoking guns that the argument "training does not include the ability to recognizably reproduce individual source images" is false, but that's neither a necessary nor a sufficient basis to conclude that MJ, as such, is a violation of copyright. (Nor is it necessary or sufficient for the lesser argument that MJ can be used for copyright infringement.)

But it might rebut one of a large number of arguments that have been made against the MJ-itself-is-infringing position, so, great, I guess?


Midjourney is just a tool. It looks to me like the author is the one using the tool for copyright infringement.


> Midjourney is just a tool

I’ve seen this framing and don’t agree with it.

What does it mean for something to be “just a tool”? I think most people are trying to indicate that the tool itself is not of consequence, and that the issue is all with users who misuse it.

But by reducing generative AI to a low resolution category like “tool”, the way this tool works - arguably the subject of consequence - gets obscured.

It seems to me like these are “just tools” the way a highly sophisticated manufacturing facility capable of producing products is “just a tool”.

We’re not talking about paint brushes and screw drivers here.


Yes but it is "just a tool". You can choose to generate copyrighted images all day long in private and it doesn't take away from the profits of the company who owns the copyright. You can even do this in public and not make money off of it and that would still be true.


> Yes but it is "just a tool"

In your mind, what distinguishes something that is "just a tool" from something that is "more than just a tool"?

To me, this framing is just not useful. It doesn't tell me anything about the implications of the existence of the tool. If I described an ICBM as "just a weapon" in a discussion about weapons that also includes bows and swords, I don't think it fits in the same category. To me, "just a tool" is fundamentally reductionist. It's anti-curious. It papers over the interesting parts of what's going on and leaves us with nothing to assess the situation. It forces some kind of artificial equivalence between things that are entirely unlike each other.


I think it is an important and insightful distinction, if you are curious to engage.

There are several implications of being a tool that can be explored, if one is interested. A printer manufacturer isn't liable if I use the printer to print and sell copies of Harry Potter or some other copyrighted material. And just because I own the printer doesn't mean I own unrestricted rights to anything that comes out of it.

Another interesting question is what commercial use of Midjourney looks like if you can't trust it to produce content unencumbered by copyright. Do you have to run everything it produces through a similarity engine? What are the appropriate thresholds that have been established for similarity?
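As a toy illustration of what such a similarity engine might look like (purely hypothetical; real systems would use perceptual-hash libraries or learned embeddings, with empirically tuned thresholds), here's a minimal average-hash comparison in Python:

```python
def average_hash(pixels):
    # pixels: 2D list of grayscale values (0-255), assumed already
    # downscaled to a small fixed grid. Returns one bit per pixel:
    # 1 where the pixel is brighter than the image's mean.
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(h1, h2):
    # Number of differing bits between two hashes; small distances
    # (relative to hash length) suggest near-duplicate images.
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

# Two tiny 2x2 "images": identical except one region changed.
a = [[10, 200], [200, 10]]
b = [[10, 200], [10, 10]]

d = hamming_distance(average_hash(a), average_hash(b))  # d == 1 here
```

Whether any fixed threshold on such a distance maps onto the legal test of substantial similarity is, of course, exactly the open question.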

The Orange Prince Supreme Court case was a fascinating exploration of the topic [1]. The case is quite an approachable read or listen, and even humorous at times [2].

1) https://en.wikipedia.org/wiki/Andy_Warhol_Foundation_for_the...

2) https://www.oyez.org/cases/2022/21-869


> I think it is an important and insightful distinction, if you are curious to engage.

Can you clarify which thing is important/insightful? I want to make sure I’m understanding your comment.

I should also clarify that I do think AI is a tool. And I do think that the questions raised about other tools are interesting to ask relative to AI.

My main issue is with the “just a” framing, which usually comes with a strong implication that this tool status means AI should be treated no differently than a paintbrush.

> Another interesting question is what commercial use of midjourney looks like if you cant trust it to produce content unencumbered by copyright?

This is one of the key things that establishes a new category of concerns that is not present in other common tools. e.g. if I use a paintbrush to make counterfeit artwork, it is clear that the act of creating the specific output was deliberate (it would be hard to argue that I didn’t know what I was doing over 40 hours of detailed painting). I can’t be sure what will or won’t come of a generative image model.


>Can you clarify which thing is important/insightful? I want to make sure I’m understanding your comment.

I pointed out several thoughts it led me to in the post. Another is that the tool framing delineates between infringement occurring during the manufacture of the tool and infringement occurring during its use. It also leads to other interesting points, like your last one about the paintbrush. I have never heard of intent being used as a criterion for counterfeiting. I don't think this is a new class, however; I can use Google Images and a printer to quickly produce infringing material. What conclusions can be drawn from viewing generative art models through this lens?


The tool doesn’t do anything by itself. The operator of the tool tells it what to do. You could do the same thing with photoshop, or with paint if you have enough skill.


> The operator of the tool tells it what to do

The fact that you can now tell a tool what to do illustrates my point. You can’t “tell” your paintbrush what to do, and what your paintbrush does is wholly based on your personal ability. If you use your paintbrush to create counterfeit currency, there is no question as to whether or not you deeply participated in the creation of the output.

Prior to the introduction of generative AI features in photoshop, you couldn’t “tell” photoshop what to do either.

There are myriad deep and fundamental differences between manipulating these earlier generations of tools directly and verbally instructing a black box to execute its algorithms.

I think this difference is similar to the difference that would exist if we could build Star Trek style replicators and if we were comparing the act of preparing a meal from scratch vs. asking the replicator to print a meal.

Is the replicator a tool? Yes, but a tool for which the mental models associated with a kitchen full of cooking and baking implements do not apply.


Photoshop does not require that the Mona Lisa be included as a foundational base layer for its paintbrush to brush digital paint. AI does. That is the difference.


Whoever's using Photoshop to paint, however, does need to have seen existing artwork. But the point you raised demonstrates that comparing an ML model to Photoshop is an inappropriate comparison. The relevant comparison is between a person using an ML model and a person using Photoshop. A person using an ML model doesn't need to have seen existing works to ask for a generic output, but does need to have seen existing works to ask for an output similar to an existing work (outside of innocent name coincidences in the prompt like "a painting of a starry night"). If the prompt - which, I remind, comes from the person using the model - were to contain no reference to the existing work, the author, or a characteristic/style strongly associated with the work, then the output would almost never if not never be similar to an existing work.

Similarity is a prerequisite for infringement. No similarity means no infringement [1]. If a person painting with Photoshop produces something similar to an existing work and publishes the work to anyone else in a way not covered by fair use, then the painter bears the liability for the copyright infringement. Analogously, if a person using an ML model gives a prompt with a reference to an existing work, author, or strongly associated characteristic/style, gets an output similar to an existing work, and publishes the output to anyone else in a way not covered by fair use, then the user of the model should bear most of the liability for the copyright infringement. The model maker should take reasonable steps to prevent users from giving such prompts (or the model from accepting such prompts), but otherwise should bear no more than a minority of the liability (unless the model regularly gives infringing outputs for prompts with no references to any works, authors, or characteristics/styles).

[1] https://en.wikipedia.org/wiki/Substantial_similarity#Substan...


“Whoever's using Photoshop to paint, however, does need to have seen existing artwork.”

If this were true there would be no art, my friend.


My bad. I failed to catch the assumptions that were in my head.

Theoretically a painter can make art after having seen nothing but the real world or unedited photographs. Before people had the means to make and share paintings, they couldn't learn from paintings. They learned from the real world and their art styles. (One way of defining a painter's art style is "the opposite of what the painter can't paint".)

Painters today generally don't learn how to paint by restricting themselves to looking at just the real world and realistic depictions, and the exceptions are not relevant to the question of what is copyright infringement. This is an assumption I didn't realize I was making, and now I am making it explicitly: The painter using Photoshop has seen paintings before (complete or in progress, doesn't matter), and the painter's creations are influenced by those past paintings.

So I retract the following sentences from my previous comment:

>> Whoever's using Photoshop to paint, however, does need to have seen existing artwork.

>> A person using an ML model doesn't need to have seen existing works to ask for a generic output, but does need to have seen existing works to ask for an output similar to an existing work (outside of innocent name coincidences in the prompt like "a painting of a starry night").

And replace them with the following:

In order to use Photoshop to produce a painting similar to an existing one, a painter usually needs to have seen existing paintings. There may be coincidences, such as when the painter's art style produces results similar to another painter's.

In order to make a model produce an output similar to an existing work, the user usually needs to have seen existing works and subsequently give the model prompts referencing an existing work, author, or a characteristic/style strongly associated with a work/author. By strongly associated, I mean that a user who hasn't seen the Salvador Dalí painting "The Persistence of Memory" but has seen unrelated oil paintings might coincidentally give the prompt "oil painting of melting clocks" and get an output similar to that Dalí painting.


I really respect and appreciate your willingness to interrogate your own assumptions in public — it’s a beautiful way to be, and I hope I present similarly.

My response would be that much hinges on the word “usually”, as in, the artist must “usually” have seen existing Dali work to recreate a similar aesthetic.

While that’s certainly true, it’s also true that similar art evolves in different places at similar times. Which is to say, while “usually” is accurate, “always” is not.

AI requires “always”. In the same way a photocopier, or a collage of existing works, requires the prior art to exist inside it in order to produce similar output.

I don’t think it’s unreasonable to ask for compensation, if a tool is unable to produce the desired result without including your existing oeuvre of completed, copyrighted work.


> I don’t think it’s unreasonable to ask for compensation, if a tool is unable to produce the desired result without including your existing oeuvre of completed, copyrighted work.

I agree, but I think there should be bounds to how much compensation the authors of the existing works are entitled to, because there is a well-established context in which relying on someone else's work doesn't entitle that person to compensation.

Suppose that I review a book I bought legally and post my review online for anyone to see. I couldn't have come up with the review without reading the book. I would've been unable to produce my book review without putting the author's expression in the existing, copyrighted book into my mind. I don't need to claim that human minds and machine learning models are any more similar than the fact that they can take in information. Without specific information from a particular copyrighted work, I couldn't have made the book review. But as long as I legally obtained the information in the book (by buying the book), I don't need to pay the author for the book review itself because a book review likely is fair use. Due to the first sale doctrine, I don't owe the author anything for my book review even if I bought the book secondhand at a higher price than the original seller charged i.e. 1. I didn't compensate the original author for the book and 2. the person I bought the book from profited from selling the book to me.

I've read the argument that the very scale of what an AI process can do (e.g. be trained on 1000s of books, then summarize just as many books) means that humans using AI models should compensate authors in cases where a human manually performing the same kind of task doesn't have to. That's an important argument, but figuring out the bounds for compensation with respect to scale is not clean either. Suppose that I make a training set consisting entirely of OCRed [1] scans I make of all of the books I own, including 100 or so books that my friends gave to me. Here are a few versions of how I use the training set, in increasing order of complexity:

EX1. I make a non-ML word counter, and the "training" is counting the frequency of each word across the sum of the books. I can make the program predict the next word in a given prompt: pick a random word from the sum of the training set and the prompt; a word gets additional weight in the random selection according to its frequency within the training set, and additional weight according to its frequency within the prompt.

EX2. Like EX1, except the program also takes into account the frequency of pairs of adjacent words in the text.

EX3. Like EX2, except the program rates pairs of possibly non-adjacent words in the text according to frequency and the number of words between every two given words in the text.

And so on. At a much later stage such as EX58, I switch to making machine learning versions of the program, which started as a word counter and became more of a word predictor. I think EX1 shouldn't require compensation to the authors of the books, but I'm not sure about EX57. How complex can the capabilities and accuracy (two separate but important factors) of a program trained on copyrighted works be until I need to pay compensation? There is definitely a difference between a sand grain (EX1) and a sand pile (EX??), but a law regulating AI has to consider where to place the boundaries to balance freedom of expression (which includes writing/sharing code and machine learning weights) and obligation to compensate authors. (I also think this is a problem beyond the scope of copyright. Copyright in the US doesn't reward the "sweat of the brow" [2] and doesn't apply in cases without substantial similarity, so I think AI compensation should be treated as completely separate from, not even ancillary to, copyright.)
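For concreteness, EX1's weighted word picker can be sketched in a few lines of Python (a hypothetical illustration, assuming the training set is just a flat list of words):

```python
import random
from collections import Counter

def train(corpus_words):
    # EX1 "training" is nothing but counting word frequencies.
    return Counter(corpus_words)

def predict_next(freqs, prompt_words):
    # Pick a random word; each word is weighted by its frequency in
    # the training set plus its frequency in the prompt, per EX1.
    prompt_counts = Counter(prompt_words)
    candidates = sorted(set(freqs) | set(prompt_counts))
    weights = [freqs[w] + prompt_counts[w] for w in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

corpus = "the cat sat on the mat the cat slept".split()
freqs = train(corpus)
word = predict_next(freqs, "the dog".split())
# "the" is the likeliest pick, but any corpus or prompt word can appear.
```

It seems hard to argue that EX1 owes anyone compensation; the question is at which EX-step that stops being obvious.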

I don't have answers, but at the very least I don't want a hard line of "no AI model of any complexity without compensating authors". Predictive LLM that can write books (dubious quality, but possibly coherent)? The expected uses of such a model scale very easily in complexity and size, so the authors should get compensation, unless whoever makes the LLM only uses it for private personal purposes. A predictive LLM that can't remember more than 100 words at a time? Probably can't write a coherent book, so no compensation even if the training set includes entire books.

[1] https://en.wikipedia.org/wiki/Optical_character_recognition

[2] https://en.wikipedia.org/wiki/Sweat_of_the_brow#United_State...


>”But as long as I legally obtained the information in the book (by buying the book), I don't need to pay the author for the book review itself because a book review likely is fair use.”

Agreed. A review is generally a personal reaction to a piece of art, and therefore inherently new work.

However, if you’re selling summarized versions of the book as a service, it’s a bit trickier.

When AI exists that is capable of a personal reaction to a work of art, along with the capability of earning money and therefore being able to purchase the book, these questions will be more pertinent. Until that time, we are dealing with people/companies who have included whole cloth existing work into the technology they’re selling to others as a service — a situation well within the realm of licensing laws, which appear to have been utterly ignored in an attempt to force convenient legislation (similarly to the various scooter companies who littered cities with their product and demanded legislation be enacted to their wishes, rather than those of the existing, law-abiding society).

NB: the AI producing books is not the main concern I’m addressing. Most books sold to humans are simply read, not used as fodder for further work — but they still require compensation for merely ingesting the material in the first place. AI is not a human, it is a product. The fact that it can produce further data in no way negates the need to license the data upon which its foundations are built.


The "we're just a tool!" defense didn't work out in the end for Napster.


But it works for google search. I can find all of these images quite easily.


Google doesn’t pretend it generated the image.


Absolutely true. Would you agree that the whole issue is resolved with a disclaimer that not all images are newly generated?


No, but it would be resolved if the tool listed all of the images it used as part of constructing the new image.


I don't think that is really possible. The list would be every image the model was trained on.


LOL @ "nearly 1:1"

The anti-LLM panic is going to lead to corporations like Disney owning not just the likenesses of the characters in their movies but anything that even vaguely looks like something in their movies.


Disney is in such a different place politically now than it used to be that they didn't even bother trying to get copyright extended again, which is something they used to be able to get with ease. If IP rights are going to be expanded to be restrictive against big tech, someone other than Disney is going to have to lead the charge.


Isn't that still image considered 'fair use'? So what copyright is involved?


An image is not fair use. It's just an image. Fair use is a test of how someone uses expression copied from someone else's existing work. Publishing the output to report on problematic prompt patterns (as in the Twitter post by Reid Southen @Rahll) is probably fair use. (Can't know for sure unless a court rules on the specific case in question.) Publishing the output while saying "you should try this for a free poster" is infringement.


Depends on what you do with it, based on my understanding. For example, you couldn't sell posters of it.

In this sense, I think the point isn't that Midjourney can't use it in training, but that not all outputs are free of copyright restrictions.

I can find tons of images on Google that carry various forms of copyright, and Google is fine searching and displaying them. I just don't own them.


So the title is hyperbole then. Midjourney has no part in infringement. It's all on the users' shoulders.

Just as I thought.


The output itself seems to be a nothingburger, but it does demonstrate that full images are digested and can be stored largely intact in the model.

This may demonstrate that Midjourney didn't just see the copyrighted materials once during training, but continually uses them largely intact as part of its operations. That is the more interesting implication.


I suppose. It means they should deduplicate or cluster their training data to avoid learning any one thing too strongly?

The same thing happens on the web. Try to search for something that has taken on some edgy or meme-like meaning, and good luck finding the original thing. The algorithm is poisoned by the popularity of the other thing.



