Interestingly, Ed Newton-Rex, the person hired to build Stable Audio, quit shortly after it was released due to concerns around copyright and the training data being used.
For generative models, if the model authors do not publish the architecture, and the model transforms text into another kind of media, you can assume they have delegated part of the model to a text encoder or similar component trained on data they do not have an express license to.
Even for rightsholders with tens or hundreds of millions of library items like images or audio snippets, a text encoder (or similar component) trained only on the less-than-a-billion tokens of text in those repositories performs too poorly to power a text-to-X generative model. This includes Adobe's Firefly.
It is also a misconception that large amounts of similar data, of the kind found in these libraries, are especially useful. Without a powerful text encoder, the net result is that most text-to-X models create things that look or sound very average.
The simplest way to dispel such issues is to publish the architecture of the model.
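To make the delegation concrete, here is a minimal sketch, assuming the Hugging Face diffusers library and the open runwayml/stable-diffusion-v1-5 checkpoint; because that architecture is published, you can inspect exactly which separately trained text encoder the pipeline leans on:

    # Sketch only: inspect the components of an open text-to-image pipeline.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

    # The prompt understanding is delegated to a CLIP text encoder that was
    # trained separately on large web-scraped data, then reused for conditioning.
    print(type(pipe.text_encoder))  # CLIPTextModel
    print(type(pipe.tokenizer))     # CLIPTokenizer
    print(type(pipe.unet))          # UNet2DConditionModel (the diffusion model itself)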
But anyway, even if it were all true, the only reason we are talking about diffusers, and the only reason we are paying attention to this author's work Fairly Trained, is because of someone training on data that was not expressly licensed.
> If you require licensing fees for training data, you kill open source ML.
And likely proprietary ML as well, hopefully.
(To be clear, I think AI is an absolutely incredible innovation, capable of both good and harm; I also think it's not unreasonable to expect it to play a safer, slower strategy than the Uber "break the rules to grow fast until they catch up to you" playbook.)
I'm all for eliminating copyright. Until that happens, I'm utterly opposed to AI getting a special pass to ignore it while everyone else cannot.
Fair use was intended for things like reviews, commentary, education, remixing, non-commercial use, and many other things; that doesn't make it appropriate for "slurp in the entire Internet and make billions remixing all of it at once". The commercial value of AI should utterly break the four-factor test.
Here's the four-factor test, as applied to AI:
"What is the character of the use?" - Commercial
"What is the nature of the work to be used?" - Anything and everything
"How much of the work will you use?" - All of it
"If this kind of use were widespread, what effect would it have on the market for the original or for permissions?" - Directly competes with the original, killing or devaluing large parts of it
Literally every part of the four-factor test is maximally against this being fair use. (Open Source AI fails three of four factors, and then many users of the resulting AI fail the first factor as well.)
> If they lose, they’ll survive.
That seems like an open question. If they lose these court cases, setting a precedent, then there will be ten thousand more on the heels of those, and it seems questionable whether they'd survive those.
> To be clear, I don’t like the idea of companies profiting off of people’s work. I just like open source dying even less.
You're positioning these as opposed because you're focused on the case of Open Source AI. There are a massive number of Open Source projects whose code is being trained on, producing AIs that launder the copyrights of those projects and ignore their licenses. I don't want Open Source projects serving as the training data for AIs that ignore their license.
It’s not so clear cut. Many lawyers believe all that matters is whether the output of the model is infringing. As much as people love to cite ChatGPT spitting out code that violates copyright, the vast majority of the outputs do not. Those that do are quickly clamped down on — you’ll find it hard to get DALL-E to generate an image of anything Nintendo related, unless you’re using crafty language.
There’s also the moral question. Should creators have the right to prevent their bits from being copied at all? Fundamentally, people are upset that their work is being used. But "used" in this case means "copied, then transformed." There’s precedent for such copying and transformation. Fair use is only one example. You’re allowed to buy someone’s book and tear it up; that copy is yours. You can also download an image and turn it into a meme. That’s something that isn’t banned either. The question hinges on whether ML is quantitatively different, not qualitatively different. Scale matters, and it’s a difference of opinion whether the scale in this case is enough to justify banning people from training on art and source code. The courts’ opinion will have the final say.
The thing is, I basically agree with you in terms of what you want to happen. Unfortunately the most likely outcome is a world where no one except billion dollar corporations can afford to pay the fees to create useful ML models. Are you sure it’s a good outcome? The chance that OpenAI will die from lawsuits seems close to nil. Open source AI, on the other hand, will be the first on the chopping block.
>Those that do are quickly clamped down on — you’ll find it hard to get DALL-E to generate an image of anything Nintendo related, unless you’re using crafty language.
Really, it seems more like someone was afraid of angering Nintendo, a corporate adversary one does not want to fight, so the model has a bunch of blocks to keep it from generating anything that offends Nintendo. That does not translate to quickly and easily blocking infringing generations across every copyrighted work in the world.
DALL-E on Bing is happy to generate Mario and Luigi and Sonic and basically everybody from every franchise without using crafty language, so I'm unsure of what you're talking about.
It would be interesting to see if courts agree that training+transforming = copying.
If I paint a picture inspired by Starry Night (Van Gogh) - does that inherently infringe on the original? I looked at that painting, learned the characteristics, looked at other similar paintings and painted my own. I basically trained my brain. (and I mean the copyright, not the individual physical painting)
And I mean cases where I am not intentionally trying to recreate the original, but doing a derivative (aka inspired) work.
Because it's already settled that recreating the original from memory will infringe on copyright.
> Many lawyers believe all that matters is whether the output of the model is infringing.
What I don't understand (as a European with little knowledge of court decisions on fair use): with the same reasoning you might make software piracy a case of 'fair use', no?
You take stuff someone else wrote - without their consent - and use it to create something new. The output (e.g. the artwork you create with Photoshop) is definitely not copyrighted by the manufacturer of the software.
But in the case of software piracy, it is not about the output. With software, it seems clear that the act of taking something you do not have the rights for and using it for personal (financial) gain is not covered by fair use.
Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works?
What am I missing?
That's not a good example. Making a copy of a record you own (for example, ripping an audio CD to MP3) is absolutely fair use. Giving your video game to your neighbor to play - that's also fair use.
Fair use is limited when it comes to transformative/derivative work. Similar laws are in place all over the world, just in the US some of those come from case law.
> With software, it seems clear that the act of taking something you do not have the rights for and using it for personal (financial) gain is not covered by fair use.
> Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works?
That's not a good analogy. The argument, that is not settled yet, is that a model doesn't contain enough copyrightable material to produce an infringing output.
Take your software example - you legally acquire Civ6, you play Civ6, you learn the concepts and the visuals of Civ6... then you take that knowledge and create a game that is similar to Civ6. If you're a copyright maximalist - then you would say that creating any games that mimic Civ6 by people who have played Civ6 is copyright infringement. Legally there are definitely lower limits to copyright - like no one owns the copyright to the phrase "Once upon a time", but there may be a copyright on "In a galaxy far far away".
> Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works? What am I missing?
If Photoshop was hosted online by Adobe, you would be free to do so. It's copyrighted, but you'd have an implied license to use it by the fact it's being made available to you to download. Same reason search engines can save and present cached snapshots of a website (Field v. Google).
In other situations (e.g: downloading from an unofficial source) you're right that private copying is (in the US) still prima facie copyright infringement. However, when considering a fair use defense, courts do take the distinction into strong consideration: "verbatim intermediate copying has consistently been upheld as fair use if the copy is ‘not reveal[ed] . . . to the public.’" (Authors Guild v. Google)
If you were using Photoshop in some transformative way that gives it new purpose (e.g: documenting the evolution of software UIs, rather than just making a photo with it as designed) then you may* be able to get away with downloading it from unofficial sources via a fair use defense.
Bear with me here. Rushed and poorly articulated post incoming...
In the broadest sense, generative AI helps achieve the same goals that copyleft licences aim for. A future where software isn't locked away in proprietary blobs and users are empowered to create, combine and modify software that they use.
Copyleft uses IP law against itself to push people to share their work. Generative AI aims to assist in writing (or generating) code and make sharing less necessary.
I argue that if you are a strong believer in the ultimate goals of copyleft licences you should also be supporting the legality of training on open source code.
The obvious difference is that copyleft is voluntary, while having your art style stolen isn't.
If an artist approached a software developer, created a painting of them using their Mac, and said "There, I've done your job for you" you'd think they were an idiot.
This is the same from the other side. The inability to understand why that's a realistic analogy does not change the fact that it is.
"> The obvious difference is that copyleft is voluntary, while having your art style stolen isn't."
This is why it is important whether you consider that infringement occurs upon ingestion or output. If it only matters for outputs, then artists have a problem, since copyright doesn't protect styles at all, see for example the entire fashion industry.
There is a saving grace though: Artists can make a case that the association of their distinctive style with their name is at least potentially a violation of trademark or trade dress, especially if that association is being used to promote the outputs to the public. This is a fairly clear case of commercial substitution in the market for creating new works in that artist's style and creating confusion concerning the origin of the resulting work.
Note that the market for creating new works in a particular artist's distinctive and named style kind of goes away upon the artist's passing. What remains is the trademark issue of whether a particular work was actually created by the artist or not, which existing trademark law is well suited to policing, as long as the trademark is defended, even past the expiration of the copyright.
Meanwhile, trademark (and copyright) also apply to the subjects of works, like Nintendo's Mario or Disney's Mickey Mouse or Marvel's Iron Man. But we don't really want models to simply be forbidden from producing them as outputs, or they become useless as tools for the purpose of parody and satire, not to mention the ability to create non-commercial fan art. The potential liability for violating these trademarks by publishing works featuring those characters rests with the users rather than the tools, though, and again existing law is fairly well suited to policing the market. Similarly, celebrities' right of publicity probably shouldn't prevent models from learning what they look like or from making images that include their likeness when prompted with their name, but users better be prepared to justify publishing those results if sued.
You can also make the (technical) argument that if you just ask for an image of Wonder Woman, and you get an image that looks like Gal Gadot as Wonder Woman, that the model is overfitting. That's also the issue with the recent spate of coverage of Midjourney producing near-verbatim screenshots from movies.
It might be appropriate though to regulate commercial generative AI services to the extent of requiring them to warn users of all the potential copyright/trademark/etc. violations, if they ask for images of Taylor Swift as Elsa, or Princess Peach, or Wonder Woman, for example.
The majority of AI models out there (at least by popularity / capability) are proprietary, with weights and even model architectures that are treated as trade secrets. Instead of having human-written music and movies that you legally can't copy, but practically can, you now have slop-generating models that live on a cloud server you have no control over. Artists and programmers who want to actually publish something - copyright or no - now have to compete with AI spam on search engines, while ChatGPT gets to merely be "confidently wrong" because it was built on the Internet equivalent of low-background metal - pre-AI training data. Generative AI is not a road that leads to less intellectual property[0], it's just an argument for reappropriating it to whoever has the fastest GPUs.
This is contrary to the goals of the Free Software movement - and also why Free Software people were the first to complain about all the copying going on. One of the things Generative AI is really good at is plagiarism - i.e. taking someone else's work and "rewriting it" in different words. If that's fair use, then copyleft is functionally useless.
It's important to keep in mind the difference between violating the letter of the law and opposing the business interests of the people who wrote the law. Copyleft and share-alike clauses are intended to get in the way of copyright as an institution, but they also rely on copyright to work, which is why the clauses have power even though they violate the spirit of copyright. Generative AI might violate the letter of the law, but it's very much in the spirit of what the law wants.
[0] Cory Doctorow: "Intellectual property is any law that allows you to dictate the conduct of your competitors"
Is FSF's stance on AI actually clear? I thought they were just upset it was made by Microsoft.
Creative Commons has been fairly pro-AI -- they have been quite balanced, actually, but they do say that opt-in is not acceptable, it should be opt-out at most. EFF is fairly pro AI too -- at least, against using copyright to legislate against it.
You shouldn't discount progress in the open model ecosystem. You can sort of pirate ChatGPT by fine-tuning on its responses, there are GPU-sharing initiatives like Stable Horde, there's TabbyML which works very well nowadays, and Stable Diffusion is still the most advanced way of generating images. There's very much an anti-IP spirit going on there, which is a good thing -- it's what copyleft is there for in spirit, isn't it?
The Software Freedom Conservancy has been complaining about GitHub Copilot since 2022[0]. They specifically cite Copilot's use of training data in ways that violate the copyleft and attribution requirements of various FOSS licenses. Hector Martin (the guy porting Linux to MacBooks) also agrees with this. It's also important to note that the first AI training lawsuit was specifically to enforce GPL copyleft[1].
The EFF's argument has come across to me less like "AI is cool and good" and more like "copyright doesn't do a good job of protecting artists against AI taking their jobs". Cory Doctorow's also taken a similar position, arguing that unions are better at protecting against AI than copyright is. e.g. WGA being able to get contractual provisions preventing workers from being replaced with AI.
This is a different vein of opposition to AI from what we saw the following year in 2023 with artists and writers, though. Even then, those artists and writers aren't suddenly massively pro-copyright[2] and more consider it a means to fatally wound AI companies[3]. In contrast, big businesses that own shittons of copyright have been oddly quiet about AI. Sure, you have Getty Images and The New York Times suing Stability and OpenAI, but where's, say, Disney or Nintendo's litigation? These models can draw shittons of unlicensed fanart[4], and nobody cares. Wizards and Wacom made big statements against AI art, but then immediately got caught using it anyway, because stock image sites are absolutely flooded with it.
My personal opinion is that generative AI creates enough issues that we can't group them down into neat "pro-copyright" vs. "anti-copyright" arguments. People who share their work for free online are complaining about it while people who expect you to pay money for their work are oddly ambivalent. AI is orthogonal to copyright.
I will give you that the open model community is doing cool shit with their stolen loot. However, that's still something large corporations can benefit from (e.g. Facebook and LLaMA).
[3] Their actual argument against AI is based on moral grounds, not legal ones. I don't think any artist is going to accept licensing payments for training data, they just want the models deleted off the Internet, full stop.
[4] OpenAI tried to ban asking for fanart, but if you ask for something vaguely related (e.g. "red videogame plumber" or "70s sci-fi robot") you'll get fanart every time.
> Fair use was intended for things like reviews, commentary, education, remixing, non-commercial use, and many other things
"many other things" has included, for example, Google Books scanning millions of in-copyright books, storing internally them in full, and making snippets available.
The basis for copyright itself is to "promote the progress of science and useful arts". For that reason a key consideration of fair use, which you've skipped entirely, is the transformative nature of the new work. As in Campbell v. Acuff-Rose Music: "The more transformative the new work, the less will be the significance of other factors", defined as "whether the new work merely 'supersede[s] the objects' of the original creation [...] or instead adds something new".
> "How much of the work will you use?" - All of it
For the substantiality factor, courts make the distinction between intermediate copying and what is ultimately made available to the public. As in Sega v. Accolade: "Accolade, a commercial competitor of Sega, engaged in wholesale copying of Sega's copyrighted code as a preliminary step in the development of a competing product" yet "where the ultimate (as opposed to direct) use is as limited as it was here, the factor is of very little weight". Or as in Authors Guild v. Google: “verbatim intermediate copying has consistently been upheld as fair use if the copy is ‘not reveal[ed] . . . to the public.’”
The factor also takes into account whether the copying was necessary for the purpose. As in Kelly v. Arriba Soft: "If the secondary user only copies as much as is necessary for his or her intended use, then this factor will not weigh against him or her"
While there are still cases of overfitting resulting in generated outputs overly similar to training data, I think it's more favorable to AI than simply "it trained on everything, so this factor is maximally against fair use".
> Directly competes with the original, killing or devaluing large parts of it
The factor is specifically the effect of the use upon the work - not the extent to which your work would be devalued even if it had not been trained on your work.
None of those arguments make sense. The output of AI absolutely does supersede the objects of the original creation. If it didn't, artists wouldn't care that they were no longer able to make a living.
Substantiality of code does not apply to substantiality of style. What's being copied is look and feel, which is very much protected by copyright.
The copying clearly is necessary for the purpose. No copying, no model. The fact that the copying is then compressed after ingestion doesn't change the fact that it's necessary for the modelling process.
Last point - see first point.
IANAL, but if I was a lawyer I'd be referring back to look and feel cases. It's the essence of an artist's look and feel that's being duplicated and used for commercial gain without a license.
That's true whether it's one artist - which it can be, with added training - or thousands.
Essentially what MJ etc do is curate a library of looks and feels, and charge money for access.
It's a little more subtle than copying fixed objects, but the principle remains the same - original work is being copied and resold.
> None of those arguments make sense. The output of AI absolutely does supersede the objects of the original creation. If it didn't, artists wouldn't care that they were no longer able to make a living.
The question for transformative nature is whether it merely supersedes or instead adds something new. E.G: Google translate was trained on books/documents translated by human translators and may in part displace that need, but adds new value in on-demand translation of arbitrary text - which the static works it was trained on did not provide.
> Substantiality of code does not apply to substantiality of style.
I'm not certain what you're saying here.
> The copying clearly is necessary for the purpose. No copying, no model.
Which, for the substantiality factor, works in favor of the model developers.
> It's the essence of an artist's look and feel that's being duplicated and used for commercial gain without a license.
Copyright protects works fixed in a tangible medium, not ideas in someone's head. It would protect a work's look/appearance (which can be an issue for AI when overfitting causes outputs that are substantially similar to a protected work), but not style or "an artist's look and feel".
> "many other things" has included, for example, Google Books scanning millions of in-copyright books, storing internally them in full, and making snippets available.
That succeeds on a different part of the four-factor test, the degree to which it competes with / affects the market for the original.
Google Books is not automatically producing new books derived from their copies that compete with the original books.
> That succeeds on a different part of the four-factor test, the degree to which it competes with / affects the market for the original
It satisfied multiple parts of the four-factor test. It was found to satisfy the first factor due to being "highly transformative", the second factor was considered not dispositive in isolation and favoring Google when combined with its transformative purpose, and it satisfied the third factor as the usage was "necessary to achieve that purpose" - with the court making the distinction between what was copied (lots) and what is revealed to the public (limited snippets).
As you had all factors as "maximally against" fair use, do you believe that AI is significantly less transformative than Google Books? I'd say even in cases where the output is the same format as the content it was trained on, like Google Translate, it's still generally highly transformative.
> the degree to which it competes with
Specifically, to be pedantic, it's the effect of the use/copying of the original copyrighted work.
> "What is the character of the use?" - Commercial
Your first factor doesn't look at all like the one in Stanford's guidelines[1], which they call the transformative factor:
In a 1994 case, the Supreme Court emphasized this first factor as being an important indicator of fair use. At issue is whether the material has been used to help create something new or merely copied verbatim into another work.
LLMs mostly create something new, but sometimes seems to be able to regurgitate passages verbatim, so I can see arguments for and against, but to my untrained eyes doesn't seem as clear cut.
Where this argument falls down for me is that "use" w.r.t. copyright means copying, and neither AI models nor their outputs include any material copied from the training data, in any usual sense. (Of course the inputs are copied during training, but those copies seem clearly ephemeral.)
Genuinely curious: for anyone who thinks AI obviously violates copyright, how do you resolve this? E.g. do you think the violation happens during training or inference? And is it the trained model, or the model output, that you think should be considered a derived work?
You're trying to use words without the legal context here. The legal definition of words isn't 1:1 with our colloquial usage.
Translation of a book is non-transformative and retains the original author's artistic expression.
As a counter example - if you write an essay about Picasso's Guernica painting, it is derivative according to our colloquial use of the term, but legally it's an original work.
> In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of ... the underlying work
A trained model fails that on two counts, doesn't it? Both the "includes" part, and the fact that a model is itself not an expressive work of authorship.
> And no part of the original source code is in the binary output.
It's not about whether the binary includes the raw text of the source, but whether it copies the expressive content. Anything expressive (i.e. copyrightable) in a compiled binary must have come from the source code, so that's what makes it a derived work.
But the same isn't true of LLMs, which are more like "data about their inputs", than "a transformed version of their inputs".
If a trained model doesn't meet the definition of being a derivative work, it doesn't matter whether the data it's not a derivative work of was curated.
> "How much of the work will you use?" - All of it
That depends on the interpretation of "use", and it would be interesting to read what lawyers think. You learned the language largely from speech and copyrighted works. (All the stories, books, movies, etc. you ever read/heard) When you wrote this comment did you use all of them for that purpose? Is the case of AI different?
To be clear that's a rhetorical question - I don't expect anyone here to actually have a convincing enough argument either way.
Principles applied to human brains are not automatically applicable to AI training. To the best of my knowledge, there's no particular law that says a human brain is exempt from copyright, but it empirically is, because the alternative would be utterly unreasonable. No such exemption exists for AI training, nor should it.
Ideas/works/etc literally live rent-free in your head. That doesn't mean they should live rent-free in an AI's neural network.
Changing that should involve actually reducing or eliminating copyright, for everyone, not giving a special pass to AI.
> To the best of my knowledge, there's no particular law that says a human brain is exempt from copyright, but it empirically is, because the alternative would be utterly unreasonable.
The human brain most definitely is not exempt. If you read Lord of the Rings and then write down a new book, with the same characters and same story line - that's plain copying (look up the etymology of the verb 'to copy'). If you look at a painting and paint a very similar painting - that's still copying.
Human brains are the reason we have copyright. Your recital of passages from any copyrighted book would violate the copyright, if not for fair use doctrine. And it has nothing to do with whether you do it yourself, or have a TTS engine produce the sound.
The human brain is absolutely exempt, insofar as the copy stored in your brain does not make your brain subject to copyright, even if a subsequent work you produce might be. Nobody's filing copyright infringement claims over people's memories in and of themselves.
I'm saying that AI does not and should not automatically get the exception that a human brain does.
AI is a genie that you can't really stuff back into a bottle. It's out and it's global.
If the US had tighter regulations, China or someone else will take over the market. If AI is genuinely transformative for productivity, then the US would just fall behind, sooner or later.
Then let them! If another country put forward tighter regulations to help actual people over and above the state that holds them, then that is good in itself, and either way will pay for itself. Why are we worried about China or whoever taking over the market of something that we see has bad effects?
Like, we see this line everywhere now, and it simply doesn't make sense. At some point you just have to believe something, be principled. Treating the entire world as this zero-sum deadlock of "progress" does nothing but prevent one from actually being critical about anything.
This would-be Oppenheimer cosplay is growing really old in these discussions.
That makes no sense. OpenAI must lose and it must not be possible to have proprietary models based on copyrighted works. It's not fair use because OpenAI is profiting from the copyright holders work and substituting for it while not giving them recompense.
The alternative is that any models widely trained on copyrighted work are uncopyrightable and must be disclosed, along with their data sources. In essence this is forcing all such models to be open. This is the only equitable outcome. Any use of the model to create works has the same copyright issues as existing work creation, i.e. if it substantially replicates an existing work it must be licensed.
Just because something is not copyrightable doesn’t automatically mean it must be disclosed. If weights aren’t copyrightable (and I don’t think they should be, as the weights are not a human creation), commercial AIs just get locked behind API barriers, with terms of usage that forbid cloning. Copyright then never enters the picture, unless weights get leaked.
Whether or not that’s equitable is in the eye of the beholder. Copyright is an artificial construct, not a natural law. There is nothing that says we must have it, or we must have it in its current form, and I would argue the current system of copyright has been largely harmful to creativity for a long time now. One of the most damning statements I’ve read in this thread about the current copyright system is how there’s simply not enough unlicensed content to train models on. That is the bed that the copyright-holding corporations have made for themselves by lobbying to extend copyright to a century, and it all but assured the current situation.
> Just because something is not copyrightable doesn’t automatically mean it must be disclosed.
No, I'm saying that's what the law should be, because models can be built and used without anyone knowing. If it's illegal not to disclose them you can punish people.
Copyright is something that protects the little guy as much as big corps. But the former has more to lose as a group in the world of AI models, and they will lose something here no matter what happens.
For what it’s worth, I agree with your second paragraph. But it would take legislation to enforce that. For now, it’s unclear that OpenAI will lose. Quite the opposite; I’ve spoken with a few lawyers who believe OpenAI is on solid legal footing, because all that matters is whether the model’s output is infringing. And it’s not. No one reads books via ChatGPT, and DALL-E 3 has tight controls preventing it from generating Pokémon or Mario.
All outcomes suck. The trick is to find the outcome that sucks the least for the majority of people. Maybe the needs of copyright holders will outweigh the needs of open source, but it’s basically guaranteed that open source ML will die if your first paragraph comes true.
Absolutely true. That's the end game and we should be working toward influencing that. It's within our power.
> I’ve spoken with a few lawyers who believe OpenAI is on solid legal footing
No one knows anything, this is too novel, and even if OpenAI gets some fair use ruling, it will be inequitable and legislation is inevitable. OpenAI is between a rock and a hard place here. If you read the basis for fair use and give each aspect serious consideration, as a judge should do, I can't see it passing fair use muster. It's not a case of simply reproducing work, which is unclear here; it's the negative effect on copyright holders, and that effect is undeniable.
> All outcomes suck.
I don't think so. It's possible to fashion something equitable, but people other than the corporations have to get involved.
> If you require licensing fees for training data, you kill open source ML.
This is another one of those “well if you treat the people fairly it causes problems” sort of arguments. And: Sorry. If you want to do this you have to figure out how to do it ethically.
There are all sorts of situations where research would go much faster if we behaved unethically or illegally. Medicine, for example. Or shooting people in rockets to Mars. But we can’t live in a society where we harm people in the name of progress.
Everyone in AI is super smart — I’m sure they can chin-scratch and figure out a way to make progress while respecting the people whose work they need to power these tools. Those incapable of this are either lazy, predatory, or not that smart.
"Ethical" in this case is a matter of opinion. The whole point of copyright was to promote useful sciences and arts. It’s in the US constitution. You don’t get to control your work out of some sense of fairness, but rather because it’s better for the society you live in.
As an ML researcher, no, there’s basically no way to make progress without the data. Not in comparison with billion dollar corporations that can throw money at the licensing problem. Synthetic data is still a pipe dream, and arguably still a copyright violation according to you, since traditional models generate such data.
To believe that this problem will just go away or that we can find some way around it is to close one’s eyes and shout "la la la, not listening." If you want to kill open source AI, that’s fine, but do it with eyes open.
Yes, it’s true that open source projects that cannot pay to license content owned by other people are at a disadvantage versus those who can. Open source projects cannot, for example, wholly copy code owned by other people.
Also, beware of originalist interpretations of the Constitution. I believe there’s been about 250 years of law clarifying how copyright works, and, not to beat a dead horse, I don’t think it carves out a special exception for open source projects.
> If you require licensing fees for training data, you kill open source ML.
I don't think this is true. There's a huge amount of public domain works, as well as stuff licensed under permissive copyleft licenses, that can be used.
But, even if it did kill off open-source ML, it would still be necessary, because it's morally wrong to train ML models on copyrighted content without compensating the copyright owners (on their terms).
As a content creator, I explicitly do not want or consent to any of my creative works being used to train ML models without having a licensing agreement through which I am financially compensated.
Doesn't the Ars Technica article of that post state that the courts have not rejected the claim of copyright infringement?
> failed to provide evidence supporting any of their claims except for direct copyright infringement
(emphasis mine)
Where are the courts saying that models can be trained on copyrighted content? (I believe that it's possible but unless I'm missing something I don't see it in that Ars article)
> It sounds as if you imply that would be bad. But what if it wasn't?
Entirely possible. The early history of aviation was open source in the sense that many unlicensed people participated, and died. The world is strictly better with licensing requirements in place for that field.
But no one knows. And if history is any guide for software, it seems better to err on freedoms that happen to have some downside rather then clamping down on them. One could imagine a world where BitTorrent was illegal. Or cryptography, or bitcoin.
It’s much the same. Only authorized people are allowed to do X. Since X costs a lot of money, by definition it can’t be open source. There are no hobbyist pilots that carry passengers without a license, and if there are, they’re quickly told to stop. Generative AI faces a real chance of having the same fate. Which means open source will look similar to these planes trying to compete with commercial aircraft: https://pilotinstitute.com/flying-without-a-license/
If you can think of a better example, I’d like to know though. I’ll use it in future discussions. It’s hard to think of good analogies when the tech has new social effects.
If I fly a plane and crash, my passengers die. If I generate an image using a model whose training included some unlicensed imagery... Disney misses out on a fraction of a cent?
There is a real reason why some professions are licenced and others are not.
Your analogy is nonsensical. Not having a better one is irrelevant.
If training data requires licensing fees, ML practitioners will become a licensed field de facto, because no one in the open source world will have the resources to pursue it on their own.
Perhaps a better analogy is movies. At least with acting, you can make your own movies, even if you’re on a shoestring budget. With ML, you quite literally can’t make a useful model. There’s not enough uncopyrighted data to do anything remotely close to commercial models, even in spirit.
It’s deeper than that. The basis of licensing is copyright. If the upcoming court cases rule in OpenAI’s favor, you won’t be able to apply copyright to training data. Which means you can’t license it.
Or rather, you can, but everyone is free to ignore you. A license without teeth is no license at all. The GPL is only relevant because it’s enforceable in court.
I’m sure some countries will try the licensing route though, so perhaps there you’d be able to make one.
EDIT: I misread you, sorry. You’re saying that if OpenAI loses and license fees become the norm, maybe people will be willing to let their data be used for open source models, and a license could be crafted to that effect.
Probably, yes. But the question is whether there’s enough training data to compete with the big companies that can afford to license much more. I’m doubtful, but it could be worth a try.
I would say that GPT-3 and its successors have nothing to do with open source, and if OpenAI uses open source as a shield, then we are all doomed. I would distance myself and any open source projects from involvement in OpenAI court cases as far as possible. Yes, they have delivered some open source models, but not all of them. Their defense must revolve around fair use and purchased content if they use books and materials that were never freely available. It should be permissible to purchase a book or other materials once and use them for the training of an unlimited number of models without incurring licensing fees.
Sadly not. Making something illegal has social effects, not just legal effects. I’ve grown tired of being verbally spit on for books3. One lovely fellow even said that he hoped my daughter grows up resenting me for it.
It being legal is the only guard against that kind of thing. People will still be angry, but they won’t be so numerous. Right now everyone outside of AI almost universally despises the way AI is trained.
Which means you won’t be able to say that you do open source ML without risking your job. People will be angry enough to try to get you fired for it.
(If that sounds extreme, count yourself lucky that you haven’t tried to assemble any ML datasets and release them. The LAION folks are in the crosshairs for supposedly including CSAM in their dataset, and it’s not even a dataset, just an index.)
US copyright has limited reach. There are models trained in China, where the IP rules are... not really enforced. It would be an interesting world where you use / pay for those models because you can't train them locally.
Is there evidence that it's actually everyone or even close to everyone? The core innovation that the internet brought to harassment is that it is sufficient for some 0.0...01% of all people to take issue with you and be sufficiently dedicated to it for every waking minute of your life to be filled with a non-stop torrent of vitriol, as a tiny percentage of all internet users still amounts to thousands.
Perhaps. The reason I did it was because OpenAI was doing it, and it’s important for open source to be able to compete with ChatGPT. But if OpenAI’s actions are ruled illegal, then empirically open source wasn’t a persuasive enough reason to allow it.
The point should be to kill training on unlicensed material. There needs to be regulation and tools to identify what was the training data. But as always, first comes the siphoning part, the massive extraction of value, then when the damage is done there will be the slow moving reparations and conservationism.
A ton of us out here don't agree with your goals. I think these models are transformative enough that the value added by organizing and extracting patterns from the data outweighs the interests of the extremely diffuse set of copyright holders whose data was ingested. So regardless of the technical details of copyright law (which I still think are firmly in favor of OpenAI et al) I would strongly oppose any effort to tighten a legal noose here.
Agreed. And every software engineer writing code should pay 10% of their salary to the publishers of the books that they learned their programming skills from.
The reality is always a dynamic tension between law, regulation, precedent, and enforceability.
It is possible to strangle OpenAI without strangling AI: pmarca is anti-OpenAI in print, but you can bet your butt he hopes to invest in whatever replaces it, and he’s got access to information that like, 10 people do.
A useful example would be the Napster Wars: the music industry had been rent seeking (taking the fucking piss really) for decades and technology destroyed the free ride one way or another. The public (led by the technical/hacker/maker public) quickly showed that short of disconnecting the internet, we were going to listen to the 2 good songs without buying the 8 shitty ones. The technical public doesn’t flex its muscles in a unified way very often, but when it does, it dictates what is and isn’t on the menu.
The public wants AI, badly. They want it aligned by them within the constraints of the law (which is what “aligned” should mean to begin with).
The public is getting what it wants on this: you can bet the rent. Whether or not OpenAI gets on board or gets run the fuck over is up to them.
“You in the market for a Tower Records franchise Eduardo?”
> But anyway, even if it were all true, the only reason we are talking about diffusers, and the only reason we are paying attention to this author's work Fairly Trained, is because of someone training on data that was not expressly licensed.
Thanks for putting this into words. I'm of the same opinion and this is the best articulation I have so far.
Calling him "the person hired to build Stable Audio" seems a bit misleading? He was in an executive position (VP of product for Stability's audio group). An important position, but "person hired to build" to me evokes the image of a lead developer/researcher.
I think that also helps in understanding his departure, since he's a founder with a music background.
It isn't unusual for those in leadership positions to use such phrasing when talking about projects and products. It's not a "taking credit" from the engineers sort of thing, but rather about the leadership of the engineers.
Managing a group of people is not synonymous with doing the actual knowledge work of researching and developing innovations that enabled this technology. I find it hard to believe that the contribution of his management somehow uniquely enabled this group of engineers to create this using their experience and expertise.
A captain may steer the ship, but they're not the one actually creating and maintaining the means by which it moves.
> A captain may steer the ship, but they're not the one actually creating and maintaining the means by which it moves.
And yet virtually everyone will go along with a statement like "The captain sailed the ship across the ocean" or "Captain Kirk charted the Gamma Quadrant" or whatever, so I'm not sure how this serves as an objection to the original phrasing.
Depending on the work, an equal level of technical competency can be very beneficial for leadership to have, if not required in some situations. To debate his contribution as a counterfactual exercise without the context of his leadership is entirely non-productive. HN is always quick to dismiss leadership, despite evidence of good and poor leadership being tautologically debated about nearly constantly here. A talented group of engineers can self-organize, but their collective technical expertise does not translate to business acumen.
The crux of the debate appears to rest on the fact that context matters. If an engineer says to another engineer "I built the spam detection system", it is understood that they mean they either wrote the code or had some direct part in producing it. If an executive says to another executive "I made the Mac", neither is interpreting that as them literally building the thing. They know they are in leadership, the meaning is assumed to be "as a leader".
Ed here. Saw this thread and thought I'd weigh in.
Agreed, I wouldn't say I was hired to build Stable Audio. Crazy talented team of research engineers / software engineers / designers did the building.
Also wanted to clarify that I didn't quit due to concerns around the training data used for Stable Audio. I was proud of the approach we took to training data - a rev share with rights holders. I quit because of the prevailing view on training data at the wider company, as documented in its public response to the copyright office, where it argues that training on people's work without consent is fair use.
FWIW, I am a rightsholder for a number of published songs and recordings. I once spent $12k of my own money on a record and made about $1000 back.
I have spent more blood, tears and money on art than most of you would find even remotely bearable.
I not only consider my songs to be fair use for training a model, but I would also be honored if my works were included and influenced future musicians in a way that my records probably never will.
The best songwriters I know have other careers and keep on going otherwise. If you actually care about musicians you should make it a habit to go see local live music!
Thanks for setting the record straight! People tend to put a singular face to things, especially if they have strong feelings about it. Leads to a lot of misrepresentation, especially when the context doesn't accompany the message.
HN is usually allergic to recognizing the value of leadership, which always struck me as ironic considering it's leadership that made plenty of startups work.
Ed still likes Stability, especially as we fully trained Stable Audio on rights-licensed data (a bit different in audio to other media types), offer opt-out of datasets, etc.
There has to be a solution for the copyright roadblocks that companies encounter when training models. I see it as no different than an artist creating music influenced by the music the artist has been listening to throughout their whole life; fundamentally it's the exact same thing. You cannot create music, or art in general, in a vacuum.
Surprised to see an actual gif pop up after adding that to a site. I guess that's just base64; still kind of amazing that it's all inside a seemingly random string of text.
By the way, you can simply paste the base64-encoded data (everything inside the quotes) into your address bar to view it. Probably not the safest action generally, but should be OK if it's an image.
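If you'd rather not paste it into the address bar, the same idea as a minimal Python sketch (the payload below is a tiny placeholder GIF, not the actual string from the page):

    # Sketch only: decode a data URI's base64 payload back into an image file.
    import base64

    data_uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
    header, encoded = data_uri.split(",", 1)  # strip the "data:image/gif;base64" header
    with open("decoded.gif", "wb") as f:
        f.write(base64.b64decode(encoded))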
Chrome and Chromium are virtually identical except for Google services, which aren't required to do anything with the browser except for installing Chrome extensions that can alternatively be sideloaded, so this is nitpicking.
Jumping in to defend parent comment, there’s nothing Open Source about Google Chrome and it’s highly relevant in this context because they are notorious for putting technologies and tracking in there that many people find objectionable.
Tangential, but I tried to build Chromium the other day and stopped when it said it required access to Google Cloud Platform to actually build it. If something requires a proprietary build system, does it matter that it's open source?
I think I got my wires crossed with ChromiumOS, which when I last read the docs seemed to suggest that Google Cloud Platform was required. I now can't find those specific docs either, so I retract my statement.
Widevine is a Google service so I didn't forget it. You can usually still play media if you dislike DRM, though only at 720p or below, which is a lesser security standard. I'm not even sure whether it involves additional servers.
Safari is known to be troublesome when a webpage contains many HTML audio players. It can get extremely slow and unresponsive.
Every researcher I know in the audio domain uses Chrome for exactly that reason. The alternative would be not to use the standard HTML audio tag which would be ridiculous.
As with Stable Diffusion, text prompting will be the least controllable way to get useful output with this model. I can easily imagine MIDI being used as an input with ControlNet to essentially get a neural synthesizer.
Yes. Since working on my AI melodies project (https://www.melodies.ai/) two years ago, I've been saying that producing a high-quality, finalized song from text won't be feasible or even desirable for a while, and it's better to focus on using AI in various aspects of music making that support the artist's process.
Text will be an important input channel for texture, sound type, voice type and so on. You can't just use input audio; that defeats the point of generating something new. You also can't use only MIDI; the model still needs to know what sits behind those notes: what performance, what instrument. So we need multiple channels.
Emad hinted here on HN the last time this was discussed that they were experimenting with exactly that. It will come, by them or by someone else quickly.
Text-prompting is just a very coarse tool to quickly get some base to stand on, ControlNet is where the human creativity again enters.
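For images this pairing already exists. A minimal sketch using the Hugging Face diffusers library with the commonly published SD 1.5 and Canny ControlNet checkpoints (the MIDI-conditioned audio equivalent discussed above is still hypothetical):

    # Sketch only: a ControlNet pins down structure while the text prompt
    # supplies style/texture, which is the kind of control being described.
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet
    )

    edges = load_image("edges.png")  # hypothetical pre-computed Canny edge map
    image = pipe("a watercolor landscape, soft light", image=edges).images[0]
    image.save("out.png")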
I think it would be ideal if it could take an audio recording of humming or singing a melody, together with a text prompt, and spit out a track that resembles it.
It's crazy that nobody cares. It seems to me that ML hype trends focus on denying skills and disproving creativity by denoising random noise into outputs that are indistinguishable from human work, and to me this whole chain of negatives doesn't seem to have proven its worth.
LLMs allow people without certain skills to be creative in forms of art that are inaccessible to them.
With DALL-E - I can get an image of something I have in my head, without investing in watching hundreds of hours of Bob Ross (which I do anyway).
With audio generators - I can produce music that is in my head, without learning how to play an instrument or paying someone to do it. I have to arrange it correctly, but I can put out a techno track without spending years learning the intricacies.
This is incredibly good compared to SOTA music models (MusicGen, MusicLM). It looks like there's also a product page where you can subscribe to use it, similar to Midjourney: https://www.stableaudio.com/
Sadly it's not open-weight and it doesn't look like there's an API (again like Midjourney): you subscribe monthly to generate audio in their UI, rather than having something developers can integrate or wrap.
I was hoping to use it to generate some sound effects to use in a game I'm working on - but looks like I need an "enterprise license" (https://www.stableaudio.com/pricing)
Why does this have a different clause I wonder, and doesn't just fall under "In commercial products below 100,000 MAU"?
I think we still need the step where the AI learns what a high quality sound library sounds like and then applies the previously learned abilities by triggering sounds of that library via MIDI.
That way you'd get perfect audio quality with the creativity of a musical AI.
I've always wished for something like that for image generation AI. It'd be much cooler/more interesting to watch AI try to draw/paint pictures with strokes rather than just magically iterate into a fully-rendered image. I dunno what kind of dataset or architecture you could possibly apply to accomplish this, but it would be very interesting.
I get what you’re saying, but if you watch Stable Diffusion do each step it’s at least kind of similar. If you keep the same seed but change a detail, often the broad “strokes” are completely the same.
You could have AI do some postprocessing. I think a similar approach is the future for image generation: you have a model output a 3D scene, use a classical raytracer to do the rendering, and then have a final model apply corrections to achieve photorealism.
Not trying to knock the progress here, impressive.
As a drummer, the 'drum solo' is about as boring as it gets, with some weird interspersed sounds. So, it depends on the intended audience.
FWIW the sound effects also are not 'realistic' to my ear, at the moment.
As a drummer, the 'drum solo' was surprisingly interesting to listen to, if you consider it happening over a stable 4/4 pulse. The random-but-not-quite nature of the part makes for very unconventional rhythmic patterns. I'd like to be able to syncopate like this on the spot.
Don't ask me to transcribe it.
Tempo consistency is great. Extraneous noises and random cymbal tails show the deficiency of the model though.
I agree. It's an impressive effort but it's still very far from being able to generate viable music/sound.
There are already millions of library music tracks and sound effects available which sound a lot better. It's going to take a huge investment in gen AI to compete with that and I don't think it makes economic sense (unlike text or images).
Yeah the drum solo really highlights how badly the model missed the point in a drum solo. I'm not a drummer, but this is just not pleasing to hear. Sounds like somebody randomly banging drums more or less in tempo.
It does okay with muzak-type things though, which I guess tracks with my expectations.
I find it interesting that they are releasing the code and lovely instructions for training, but no model. They are almost begging anonymous folks to hook the data loader up to an Apple Music account and go nuts. Not that I am suggesting anyone do that.
Speculatively, it might have been part of an agreement: when they were given the licensed stock audio library from AudioSparx to train on, they agreed not to redistribute the resulting model.
I tried generating music on stableaudio.com and, yes, it's bad. However, given the blistering pace of development in these models, I would not be surprised if they sound incredible in a year or two.
The plateau we're heading for is getting professional human level output from these models with logarithmic progress.
I suspect this is because the underlying production factors like compute, data & model design are steadily improving whilst humans have diminishing sensitivity to output quality.
In the game of AI generated photorealistic images or history essays there's not much improvement left to make. Most humans are already convinced by the output of these things.
I think the proof is seeing how good diffusion models have gotten for making images. They're not perfect but they're leaps and bounds over what we had just a year and a half ago.
Many of these problems seem to have remained unexplored simply because nobody has thrown enough GPU clusters at them yet.
So there aren't public weights, is that right? Having trouble finding anything that says one way or the other.
edit: Oh okay, didn't realize this was somehow a controversial comment to make. It would have been great if you had answered the question before downvoting but that's fine I suppose.
Maybe sometimes you want an old cassette sound, or even older scratched 78 rpm sound, etc. Computers, as usual, do what you asked them to do, not what you meant.
"Gen AI is the only mass-adoption technology that claims it's Ok to exploit everyone's work without permission, payment, or bringing them any other benefit."
Is it? What about the printing press, photography, the copier, the scanner ...
Sure, if a commercial image is used in a commercial setting, there is a potential legal case that could argue infringement. This should NOT depend on the means of production, but on the merits of comparing the produced images.
Xerox should not be sued because you can use a copier to copy a book (trust me kids, book copying used to be very, very big).
Art by its social nature is always derivative. I can use diffusion models to create incontestably original imagery. I can also try to get them to generate something close to an image in the training set, if the model is large enough relative to the training set or the work is just really formulaic. However, it would be far easier and more efficient to just Google the image in the first place and patch it up with some Photoshop if that was my goal.
But the social nature of art also means that humans give the originator and their influences credit (of course not the entire chain, but at least the nearest neighbours of influence), while a user of a diffusion generator does not even know the influences unless they specifically ask.
> Art by its social nature is always derivative. I can use diffusion models to create incontestably original imagery
How are you defining “incontestably original” here?
The output could not exist if not for the training set used to train the model. While the process of deriving the end result is different than the one humans use when creating artwork, the end result is still derived from other works, and the degree of originality is a difference of degree, not of kind when compared to human output. (I acknowledge that the AI tool is enabled by a different process than the one humans use, but I’m not sure that a change in process changes the derivative nature of all subsequent output).
As a thought experiment, imagine that (assuming we survive) after another million years of human evolution, our brains can process imagery at the scale of generative AI models and can produce derivative output that takes into account more influences than any human could even begin to approach with our 2024 brains.
Is the output no longer derivative?
Now consider the future human’s interpretation of the work vs. the 2024 human’s interpretation of the work. “I’ve never seen anything like this”, says the 2024 human. “The influences from 5 billion artists over time are clear in this piece” says the future human.
The fundamental question is: on what basis is the output of an AI model original? What are the criteria for originality?
Why are AI developers so goddamned keen on having it make art, one of the few kinds of work that human beings actually LIKE doing? We could use AI to be a CPA, or to write citations for a paper, but noooo, AI has to be a painter and a musician.
It's almost like the software developers are jealous that someone out there is having a good time and want to take it from them.
Also miss me with that 'AI enables me (a scrub) to make art I couldn't otherwise because I don't want to learn how to do it'. You are lazy. Congrats on finding a high horse about your laziness.
I think the development of generative models for images and audio has more to do with the fact that computer vision research goes back decades, and the same systems that originally recognized and labeled images or audio were tweaked to invert the process - and it naturally became an intriguing topic of development precisely because creation is seen as an innately human thing. Beyond that, I'd speculate that the reason we keep seeing developments in "the arts" (though I disagree that an AI can make art, even if it can make beautiful images or music) is that there's no readily-agreed-upon value for that task.
An AI CPA has a specific economic value, but it is also a commodity service that no one wants unless they need it. Since there's a clearly comparable cost for needed CPA services, creating an AI system to do it naturally has a readily comparable market price. People aren't going to make that AI system unless they can do it in a way that will be an improvement over that existing service and price.
I think "just because" has always been a justifiable reason for humans creating beauty (not the same as making art), so it works for research projects better than building a better mousetrap.
I'd bet the 'scrubs' making AI art are enjoying it, so to twist your words: why would you force them to do the work they don't enjoy (learning to paint) to get the part they do? You obviously wouldn't decry a painter for not making their art by carving marble, or the Mona Lisa for not being as big as The Creation of Adam (funnily enough, the Mona Lisa took longer to paint). Though I do feel for the 'real artists' who probably aren't enjoying being forced by economic considerations to output what they view as crap quality using those tools.
Having said that, I bet you're seeing many more developers creating AI art stuff because, frankly, there are many more developers who enjoy making art and being creative than there are developers who enjoy creating AI CPA or AI citation stuff. So the getting-rid-of-unenjoyable-work AI stuff is mainly being made by those seeking a profit, and it's naturally much less open, as they'll sell it as solutions to those seeking them.
I assume you're just being tongue-in-cheekfully dramatic, but the answer, of course, is that there are AIs for those things; they're just under much less demand and much less controversial.
The overall audio quality sounds pretty good and it seems to do a good job of sustaining a consistent rhythm and musical concept. But I agree there's something "off" about some of the clips.
- The rave music sounds great. But that's because EDM can be quite out there in terms of musical construction.
- The guitar sounds weird because it plays chords no human hand could make, in a tuning nobody tunes their guitar to - with a strange mix of open and closed strings that doesn't make sense. I think the restrictions of what a guitar can do aren't well understood by the model.
- The disco chord progression is bizarre. It doesn't sound bad, but it's unlikely to be something somebody working in the genre would choose.
- meditation music - I mean, most of that genre may as well just be some randomized process
- drum solo - there's some weird issues in some of the drum sounds, things like cymbals, rides and hats changing tone in the middle of a note, some of the toms sound weird, it sounds like a mix of stick and brush and stick and stick and brush all at the same time...it's sort of the same problem the solo guitar has where it's just not produced within the constraints of what a drum player can actually do on an instrument made of actual drums
- sound effects - all are pretty good, if a little chunky and low-bit-rate or low-sample-rate sounding; there's probably something going on in the network that's reducing the rate before it gets built back up. There's a constant sort of reverb in all of the examples.
I honestly can't say I prefer their model over some of the musicgen output even if their model is doing a better job at following the prompts in some cases.
All of the models have very low-bitrate encoding problems and other weird anomalous things. Some of it reminds me of the output from older mp3 encoders, where hi-hats and such would get very "swishy" sounding. You can hear some of it in the autoencoder reconstructions, especially the trumpet and the last example.
However, in any case, I'm actually glad in some ways to see the progress being made in this area. It's really impressive. This was complete science fiction only a very few years ago.
> - drum solo - there's some weird issues in some of the drum sounds, things like cymbals, rides and hats changing tone in the middle of a note, some of the toms sound weird, it sounds like a mix of stick and brush and stick and stick and brush all at the same time...it's sort of the same problem the solo guitar has where it's just not produced within the constraints of what a drum player can actually do on an instrument made of actual drums
And I would say that there is also background noise from time to time, at some point I heard some noise akin to voices. Maybe it is some artifact caused by the training data (many drum solos are performed exclusively live).
Here is a silly song I generated using suno.ai, which I have found to be incredibly impressive (at least, a small percentage of its outputs are very good, most are bad). I think it's good enough that most humans wouldn't realise it's AI generated.
https://app.suno.ai/song/8a64868d-9dd3-46db-91af-f962d4bec8b...
Very good for my taste, but I should clarify: I'm obsessed with catchy tunes, as a listener and as a hobby musician, growing my own brainworms from time to time. And I must say that suno.ai is very impressive; in my case it produces semi-ready brainworms in 30%-50% of attempts. What's more important, it's really an inspiration tool for all kinds of tasks, like lyrics polishing or playing along after track separation. Maybe catchy melodies are not for everyone, but who can argue with the charts when The Beatles, ABBA and Queen almost always produced them?
I generated the lyrics using ChatGPT 4 and the suno model attempts to follow them.
It generally does a good job, but I have noticed it's fairly common for a second chorus to ignore the direction and instead reuse the lyrics of the first chorus.
Wow. I’m guessing it’s generating MIDI or something rather than synthesizing audio from scratch? Even so, the quality of the score is leaps and bounds better than any of the long-form audio on the Stable Audio demo page (either Stable Audio itself or the other models). The audio model outputs seem to take a sequence of 1 to 3 chords, add a barebones melody on top, and basically loop this over and over. When they deviate from the pattern, it feels unplanned and chaotic and they often just snap back to the pattern without resolving the idea added by the deviation. (Either that or they completely change course and forget what they were doing before.) Yes, EDM in particular often has repetitive chord structures and basic melodies, but it’s not that repetitive. In comparison, from listening to a few suno.ai outputs, they reliably have complex melodies and reasonable chord progressions. They do tend to be repetitive and formulaic, but the repetition comes on a longer time scale and isn’t as boring. And they do sometimes get confused and randomly set off in a new direction, but not as often. Most of the time, the outputs sound like real songs. Which is not something I knew AI could do in 2024.
My understanding is that they use a side effect of the Bark model. The comment https://news.ycombinator.com/item?id=35647569 from JonathanFly probably explains it well. If you train your model on a massive amount of audio mixes of lyrics+music, then prompting lyrics alone pulls the music with it, much as the comment suggests that prompting context-correlated text might pull in the background noises usual for that context. Already while writing this, I can imagine training on a huge set of publicly performed poetry, which would allow generating novel performances by artificial poets from novel prompts. This is different from the riffusion.com approach, which relies on the genius idea of more or less feeding spectrograms as images to Stable Diffusion.
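For anyone curious, the spectrogram-as-image trick boils down to a round trip like this (a rough sketch of my own, not riffusion's actual code; assumes librosa, soundfile and Pillow, with placeholder file names):

    import numpy as np
    import librosa
    import soundfile as sf
    from PIL import Image

    y, sr = librosa.load('clip.wav', sr=22050)  # placeholder input

    # Audio -> mel spectrogram in dB -> 8-bit grayscale image an image model could ingest
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=256)
    S_db = librosa.power_to_db(S, ref=np.max)              # roughly [-80, 0] dB
    img = np.clip((S_db + 80.0) / 80.0 * 255.0, 0, 255).astype(np.uint8)
    Image.fromarray(img).save('spectrogram.png')

    # Image -> approximate audio: undo the scaling, invert the mel filterbank,
    # and recover phase with Griffin-Lim (lossy, since phase was discarded)
    S_db_rec = img.astype(np.float32) / 255.0 * 80.0 - 80.0
    S_rec = librosa.db_to_power(S_db_rec)
    y_rec = librosa.feature.inverse.mel_to_audio(S_rec, sr=sr, n_fft=2048, hop_length=512)
    sf.write('reconstructed.wav', y_rec, sr)

The image diffusion model then only ever sees the PNGs; the lossy inversion step is presumably a big part of why riffusion-style output sounds the way it does.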
I don't have any special insight into how it works, but I suspect it is largely synthesizing audio from scratch. The more I've thought about it, the task of generating music feels very similar to the task of text-to-speech with realistic intonation, so it feels like the same techniques would be applicable.
> Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.
I've generated probably >200 songs now with Suno, of which perhaps 10 have been any good, and I can't detect any pattern in terms of the outputs.
Here's another one which is pretty good. I accidentally copied and pasted the prompt and lyrics, and it's amazing to me how 'musically' it renders the prompt:
One thing I noticed is that when it’s playing chords, it seems a lot more likely than human players to put both major and minor thirds in. This isn’t unheard of — the famous Hendrix chord in “Purple Haze” consists of root, major third, 7th, minor third. But it sounds pretty weird when you do it in every chord.
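For reference, here's that Hendrix-style voicing worked out (just pitch-class arithmetic, my own illustration):

    # E7#9 ("Hendrix chord"): root, major 3rd, minor 7th, and a sharp 9
    # that is enharmonically the minor 3rd an octave up.
    NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    root = NOTE_NAMES.index('E')
    intervals = {'root': 0, 'major 3rd': 4, 'minor 7th': 10, 'sharp 9 (~minor 3rd)': 15}

    for name, semitones in intervals.items():
        print(f"{name:>20}: {NOTE_NAMES[(root + semitones) % 12]}")
    # -> E, G#, D, G: the major/minor-third clash described above,
    #    striking in one chord, strange when it shows up in every chord.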
Music without changes is boring. I enjoyed the much less stable results of OpenAI's Jukebox (2020) more than any music AI to come since. Their sound quality is better, but they only seem to produce one monotonous texture at a time.
As a musician, I found the pieces unremarkable. Of course, a lot of contemporary music is forgettable as well, as people try to create songs that all sound like hits but, in doing so, create uninteresting songs. I wonder what music the model is based on. I suppose for game music/sounds, perhaps it's good enough?
The problem with music generation is the difficulty of editing. Photos and text can be easily edited, but music can't be. Either the piece needs to be MIDI, with relevant parameterisation of instruments, or there needs to be a UI that allows segments of the audio to be reworked, like in-painting.
What's the easiest way you've found for using AI to edit photos? I was just yesterday looking at the OpenAI DALL·E 3 API and it feels pretty limited. For instance, in the picture I have of a fisherman with too many fishing lines hanging down from his fishing rod, I'd like to just point it at the extra fishing lines and say "make these go away", but there's no way to do that.
They publish the code to train on your own music, but not the weights of their model? So you cannot just upload this thing to some EC2 instance and start creating your own music, correct?
this sounds like progress, but it is still very bad except for highly repetitive music like the EDM examples they give, and even then, it still can't get tempo right
A small point: Needs to be in something other than 44.1kHz. The two to which they make comparisons are at either 32kHz or 48kHz, both of which are friendlier for video work, something for which I think AI audio will be used a lot.
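If it ever matters in practice, resampling the 44.1 kHz output to a video-friendly rate is nearly a one-liner; a minimal sketch assuming librosa and soundfile, with placeholder file names:

    import librosa
    import soundfile as sf

    # Resample 44.1 kHz generator output to 48 kHz for video workflows
    y, sr = librosa.load('stable_audio_output.wav', sr=44100)
    y_48k = librosa.resample(y, orig_sr=sr, target_sr=48000)
    sf.write('stable_audio_output_48k.wav', y_48k, 48000)

That said, generating at the target rate in the first place would avoid one lossy step.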
The few examples I was able to play are very promising, unfortunately the host seems to be getting some sort of HN-hug, because all the audio files are buffering every other second -- they seem to throttle at 32 KiB/s.
obviously someone shadowy and non-corporate (eg. an artist) just needs to come out and make a model which includes promptable artist/producer/singer/instrumentalist/song metadata.
describing music without referring to musicians is so clunky because music is never labelled well.
of course saying "disco house with funk bass and soulful vocals, uplifting" is going to be bland.
Saying "disco house with nile rodgers rhythm guitar, michael mcdonald singing, and a bassline in the style of patrick alavi's power" is going to get you some magic
so this model can only ever understand music which is classified, described, labelled, standardized, and then recombine those categories. sounds boring, sounds like the opposite of what (I would like to believe) people listen to music for, outside of a corporate stock audio context.
Serious question, I'd genuinely like to know - why?
You didn't license the images when training Stable Diffusion, and yet you did for Stable Audio? In both cases the training should either be fair use and legal without any licensing, or be infringing and need licensing. Why is audio different than images? Am I missing something here?
Reading about it, that ironically seems to be the exact problem Safari has. I mean the page "works" in Safari it's just you get these really random delays to the start of some of the sounds with all sorts of web discussion threads saying different ways to mitigate it on different platforms. I don't really fault them for having the goal to publish a paper and go the extra bit to make a friendly but imperfect webpage instead of being website creators who happen to publish papers on the side.
Music is perfect for AI generation using trained models, because artists have been copying each other for at least the past 100 years and having a computer do it for you is only notionally different. Sure a computer can never truly know your pain, but it can copy someone else's.
Now that this is released, I feel I've got grist for my mill.
Sure, it still kind of sucks, but it's very impressive for a _demo_. Remember that this tech is very much in its infancy, and it's already impressive.
I don't find this music to be good in any way. It sounds interesting over a few notes, but then completely fails to find any kind of progression that goes anywhere interesting: never iterating on the theme, never teasing you with subtle or surprising variation on a core theme, no build-ups or clear resolution. Very annoying to actually listen to.
He’s since founded https://www.fairlytrained.org/
Reference: https://x.com/ednewtonrex