AI-generated sad girl with piano performs the text of the MIT License (twitter.com/goodside)
748 points by amichail 26 days ago | 273 comments



Ha. Voice synthesizers and TTS systems (and NLP in general - dead electronics imitating this intimately human thing, speech and language) have always fascinated me, so much so that they were a significant reason for me to study CS and computational linguistics.

This is literally some of the impossible sci-fi tech I dreamt of as an undergrad. Crazy. I'm still a bit in disbelief how fast things currently move on this front.

Interestingly, suno.ai is also able to imitate the very robotic and staccato-like intonation of Vocaloids: https://app.suno.ai/song/f43e9c46-92d3-4171-bdd9-026213d6772... - everything comes around. :)


> https://app.suno.ai/song/f43e9c46-92d3-4171-bdd9-026213d6772...

Unironically very good. A convincing replica of Miku's voice. Plus the beat itself is great too. As another commenter put it, it is indeed, a banger.


That's shockingly good. It reminds me of a mix between Kaibutsu by YOASOBI and The Fox's Wedding by MASA. (Warning: both links are very anime. Nothing too bad)

Kaibutsu: https://www.youtube.com/watch?v=-5M4lbEpn6c

The Fox's Wedding: https://www.youtube.com/watch?v=khNi_6PnvaE


It reminded me of Bad Apple - I'm not really familiar with all of this weird, nerdy Japanese culture, but I agree it feels very enjoyable to listen to what Suno created here.


> Miku voice, speck fast, Vocaloid, math rock, j-pop, mutation funk, bounce drop, hyperspeed dubstep

What a banger.


Some strange and funny vocal aberrations here:

* sublicense - "sublissence"

* fitness - "fisted"

* infringement - "infring-ment"

* liable - "liar-ful"

It's also obviously not a pure human voice recording as the pitch transitions sound heavily auto-tuned or electrified (think Cher's "Believe").

I anticipate people becoming experts in detecting AI-generated vocalists in much the same way that we can currently detect AI-generated images due to abnormalities especially in details like ears or fingers.


And I also expect that, very soon, we won't be able to tell them apart anymore (like those wine experts that fail to detect the good wines if blindfolded).


The non-singing TTS are barely discernible now. I watch a lot of narration-heavy edu-tainment on YouTube and often the only way I can detect TTS is the consistent monotone and uniform syllable cadence. There can be 15 minutes before a single mispronounced word is spoken. That could be a preview of what's to come with AI video.


That fail to detect even whether they're having white wine or red wine.*


I've heard this, and I would have been inclined to believe it. But then I watched the documentary Somm about the journey of a couple of friends reaching for the highest rankings of sommeliers. They could identify grapes, regions and year with striking accuracy. I just don't see how you could do that and then not be able to tell white and red wine apart.


I barely know what I’m doing with wine but am 100% sure I could at least tell you which are whites and which reds if you lined up a typical Chardonnay, a typical Pinot Grigio, a typical cab sauv, and a typical Pinot noir.

I am certain there exist weird wines that could fool me (I’ve had a few really weird wines) but typical shit from the grocery store, I’m gonna be able to tell at least that much. I might even ID them more precisely than red or white. It’s not exactly subtle…

Then again I don’t have a clue how someone could fail to tell which is coke and which Pepsi in the “Pepsi challenge”. They’re wildly different flavors. I can tell by smell alone.


I vaguely remember looking into this before, and it turned out that the tasters were being told (incorrectly) that it was a red wine, and asked to describe the flavour profile. They then used tasting terms more frequently associated with reds than with whites, and didn't question what they were told.

So it's less a case of "they cannot distinguish red from white" and more a case of "they went along with a suggested classification". I feel like this is a weaker result, although it's still a little surprising.


My feeling is there is the high level classification which is quite difficult to fuck up. After that it’s all adjectives and analogues, which is the fluffed up phoniness that inherently presents itself in the process of converting our subjective experiences of physical reality into abstract symbols.


That sounds a lot weaker.

Quick, label all the US states: https://imgs.xkcd.com/comics/label_the_states.png

I've given this map to half a dozen smart/well educated Canadians, who happily engaged in pointing out the states they recognized for several minutes, and not one of them noticed until it was pointed out.


I'm from the UK and would probably have fallen for this for several minutes as well. I hope that I'd eventually realise from the number of states down the West coast.


For anyone else who is geographically challenged, the map apparently has 64 states instead of 50. Here is an article with the extra states highlighted.

https://www.explainxkcd.com/wiki/index.php/2868:_Label_the_S...


I suspect it would work nearly as well on many Americans.


What’s the joke? Looks normal. I see Thirdmont. Indiantwo. Yep, ordinary map.


So? That's focusing on what they know and not having time to notice the extras. That's different from making incorrect statements.


Yeah, but that still shows people's perception of wine is barely above noise level, if it can be so easily misled.

For comparison, imagine someone showing a piece of Picasso to art critics and saying "Could you please describe the artistic significance of this painting by da Vinci?" The critics won't start using terms commonly reserved for Renaissance era; they'll say "What the fuck are you talking about, this isn't da Vinci."


Both artists are dead. It is possible to learn all of their paintings. It is not possible to learn all of the wines.


Thanks. Together with GP's point about the possibility of weird wines, it seems reasonable that one could go along quite far on a false premise.


A lot of the biggest perceived differences come from temperature, since red wines are usually served at room-temperature. If you ever decide to do a blind test, make sure to control for temperature. I did it, and I had a very hard time picking out which varietals were red and which were white.


I rarely drink wine (less than 1x every few years) and I can tell the difference between a red wine and a white wine, and subcategories of red wines (and I do specifically mean the difference, so that means only when compared to another wine).

The hard part is identifying the type of wine, but many of my wine-drinking friends can do it with ease. We've tried the "test," having me or someone else randomly purchase wines from the closest store and then serve random samples to them while they're blindfolded. They're able to identify the specific variety more than 4/5 of the time.


Yeah, I'm sure a lot of these tasters are overly pretentious. But some people go to the opposite extreme and think people can't taste anything. Can anyone tell the difference between Coke and Sprite? Between Coke and Pepsi? Coke and Diet Coke? Of course we can. The difference between a typical pinot noir, syrah, or cabernet sauvignon is not something it takes magic powers to detect. Specific years, wineries, etc.? Now that raises questions.


This only shows poor "expertise" more than anything, as it is a standard exercise while training to become a wine expert in France (they also give students white wines that have been red-colored or otherwise tampered with), so I wouldn't expect any legit expert to be fooled this way. Though it's true that with some wines it can be tough initially for enlightened amateurs.

Source: my wife's godfather did the studies for that[1] two years ago.

[1]: https://www.isvv.u-bordeaux.fr/fr/diplome-universitaire-dapt...


This myth is based on a fundamental misunderstanding of the experiment conducted. The conclusion of the experiment was that the vocabulary used to describe wine is subjective, and that the chosen descriptors are most heavily influenced by the color of wine, the perceived cost of the wine, and the taster's opinion of whether it was a good wine or not. I've participated in a blind wine tasting, and it was trivially easy for even complete amateurs to guess the right color of wine 100% of the time.


I believed this myth until I actually tried it blind with a handful of wine novices, and every one could tell them apart.


As far as I can tell, AI image generation still struggles with some things after many years of research and is often detectable. Perhaps vocals is easier though.


It's like cgi: you only recognize the bad examples of it while the good ones go right past you. I've got plenty of ai generations that fool professional photo retouchers - it just takes more time and some custom tooling.


> I've got plenty of ai generations that fool professional photo retouchers - it just takes more time and some custom tooling.

What’s a good place to find out the SOTA of the custom tooling and workflow?


Comfyui + civitai. 4chan and reddit threads if you want to go deep


> It's like cgi

Right. Full of code injection vulnerabilities.


"many years" lol, midjourney only came out like a year and a half ago and the quality has quadrupled in that time.


Generative text-to-image models based on neural networks have been developing since around 2015. Dall-E was the first to gain widespread attention in 2021. Then later models like Stable Diffusion and Midjourney.

"Quadrupled" is a very specific and quantitative word. What measure are you basing that on?


The recommended resolution went from 512x512 to 1024x1024 in that time span :)


Ah right. But that's only tangentially related to being able to distinguish AI-generated images. There are tells that are completely separate to resolution, such as getting the correct spacing of black keys on a piano.


From audio video editing experience years back, it is much easier to slip some cheap audio cuts past people than visual ones.


This already sounds like something I would have listened to in the 90s, except with too much autotune.


If by very soon you mean already, then yeah. I can't anyway, and I'm in the business. - js.

https://soundcloud.com/rs-539916550/soul-of-the-machine?utm_...


Many human vocalists have similar aberrations. Remember Jimi Hendrix "Excuse me while I kiss this guy", or the notorious autotune on a number of contemporary pop artists (you gave an example yourself)?

IMHO many of the successes of "artificial intelligence" come from "natural stupidity". Humans have many glitches in our perceptual mechanisms. The AIs that end up going viral and become commercially viable tend to exploit those perceptual glitches, simply because that's what makes them appeal to people.


The difference between this and Hendrix's "kiss this guy" is that you can listen to it and plausibly believe that Hendrix is actually saying "the sky". In the linked track you know the actual words but it still doesn't sound like them.


You can fix most misspoken words by tweaking the lyrics. For example, in my most recent song it pronounced "pasting" from "copying and pasting" as "past-ing".

I just rewrote the lyrics as "paste-ing" and it sang it perfectly afterwards.
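A tiny pre-processing sketch of that respelling trick (the word list is hypothetical apart from the "pasting" example above):

    # Map words the model tends to mispronounce to phonetic respellings.
    RESPELLINGS = {
        "pasting": "paste-ing",   # the example above
        "liable": "lie-able",     # hypothetical entry
    }

    def respell(lyrics: str) -> str:
        # Swap each tricky word for its respelling before submitting the lyrics.
        for word, spoken in RESPELLINGS.items():
            lyrics = lyrics.replace(word, spoken)
        return lyrics

    print(respell("copying and pasting"))  # -> copying and paste-ing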


The average modern pop song boasts a worse voice that sounds more autotuned than this AI song. The biggest problem to me is how shaky the voice sounds.


I'm going to go out on a limb and say you don't listen to much "modern pop" - the production quality of the biggest mainstream pop is _extremely_ high at this point, and while "worse voice" is obviously subjective, this really wouldn't be anything stand-out in that regard even if it didn't sound like a robot.


I mean, if you dislike the sound of an autotuned voice (which a lot of people do), then there are a lot of modern(ish?) pop artists that use it heavily who you're gonna hate: Travis Scott, Lizzo, Neo, Kesha, T-Pain, Doja Cat ...


I'd say that is more about hip-hop, really. (And a couple of names that aren't exactly the most relevant to this decade)


> I anticipate people becoming experts in detecting AI-generated vocalists in much the same way that we can currently detect AI-generated images due to abnormalities especially in details like ears or fingers.

People fail to identify even the most basic and obvious fakes, but somehow there’s a group of people who think that as fakes become harder to distinguish from reality, we’ll all magically become experts at it. We won’t. People’s ability to detect fakes will get worse, not better, as a consequence of more prevalent and better fakes.

https://www.youtube.com/watch?v=1U1HMqtam90


I'm always surprised by this idea too. I can easily see an outcome of it becoming increasingly hard to convince people that real things are actually real.


Funny video :)

I didn't mean to imply that everyone will become experts; I meant that some people will become very good at distinguishing AI vocals.


> I meant that some people will become very good at distinguishing AI vocals

Until it becomes impossible which seems to be the trajectory here, no?


> It's also obviously not a pure human voice recording as the pitch transitions sound heavily auto-tuned or electrified ...

    The cake is a lie, but the music is real

    It’s all fake when the truth is revealed
> (think Cher's "Believe")

Or think GLaDOS. Pretty sure that's not a coincidence.


And I'm sure a skilled editor could already edit out those tells.


At that point, is it AI generated?? That feels like an entirely different category to me

(like it's sort of no different from paying someone to voice something and share it)

I think the stuff that is completely generated with no human in the loop is a different category for me because it can be used for things at scale, like bots on social media, ads in a podcast generated just for you, etc. As long as there is still a human in the loop making the editing decisions, it feels not categorically different from the world we have today.


That's a fair point, but "ai does the work and humans clean up the mistakes" is generally a lot faster than humans doing all the work. Singing well takes skill (even when you have autotune), splicing together multiple "takes" into one good recording, less so.


I'm never going to have the voice to sing this, but I can easily imagine learning how to edit it.

AI/Human combos can still be valuable. More broadly I'd argue that that's how almost all tech works. E.g. there are still textile workers, just many fewer of them producing much more clothing.


I would say it's still categorically different, just because we're automating one piece of labor that until about 12 years ago was thought to be un-automatable.

Like, there have been computer singing voices for a while, but they always sounded pretty robotic and goofy (e.g. Microsoft Sam), and I think for a long time people just assumed that to get mostly-realistic voices, you need an actual singer. Yes, it still requires a bit of human tweaking to make it perfect, but I suspect that if put to the test it would reduce the cost of making a song substantially.


The delivery of "(the 'software')" at 0:21 had me chuckling.


I was very impressed by the mini bridge and then sudden addition of harmony between the license and the ALL CAPS statement part. Is that all AI deciding that? This made it a true song in my opinion.


Mer chan ta billlll iiii tttyyyyyy


Agreed! Unexpected and made my morning.


I hoped for a choir when I saw the all caps section coming, and I was not disappointed.


My mind hasn’t been this blown by AI since GPT4. You owe it to yourself to check out Suno.ai. As a non-pro musician I’m excited by this. Some version of this could become a _starting point_ for me, rather than an unreachable end goal. I can see how pros would be horrified by this, too. For quite a few people some future version of this could be an adequate replacement for a music subscription, but of course not for a show.


Does it take a long time for it to generate a song? I've been waiting for about 10 minutes now with a spinning circle line.


With the pro subscription it usually takes less than thirty seconds for the songs to be playable. It keeps generating while you play though, so the whole audio file isn’t available for a few minutes.

Free accounts are queued so it depends on load and I don't think the v3 model is available to them.


The v3 model is now available to free accounts. Normally takes me a minute or so at most on a free account to generate a song. Though, as you said, it does seem to depend on load.


I was thinking it would impact places like bars and streams and tv the most rather than actual consumers, or wherever licensing is concerned. I don't believe people would listen to AI generated music for the same reason AI isn't impacting fine art. People aren't going to hang AI paintings in their houses or listen to AI music.


> People aren't going to hang AI paintings in their houses or listen to AI music.

A lot of people are very confident about this and I don't understand why. The same was said for jazz and comic books. But I am listening to jazz with comic book posters on my wall. There were different reasons behind the same statement, but it almost always turns out to be wrong. Humans like what they like and seldom judge an artwork for its process (outside of a very small niche community).


People hang commercial art in their homes all the time, and that's essentially design-by-committee.

I see no reason that someone wouldn't hang an AI poster print in their AirBnB that's abstract, or even a "commission" based on the property and its locale.

Same reason for AI music. Your store needs a bit of music to set a vibe, but with AI it can be free of copyright and performance licensing, and again, be tailored to your location, products, clientele, the day of the week, etc.


This is something different entirely. We're outside of the "human sphere" so to speak.

>Humans like what they like and seldom judge an artwork for its process (outside of a very small niche community).

That's true, but how do you zoom out of process? This is beyond process. I would just say most people don't like inhuman things.


> We're outside of the "human sphere" so to speak.

Can you elaborate on this a bit? Because this is what I don't get.


It's a non-human algorithmic mishmash of a bunch of stuff; there is no human quality to it, no years of effort to reach new heights. AI will not make "new" music in the sense of making a trumpet song that escapes our current understanding of a trumpet's limits, the way a once-in-a-generation player will come along and move the ceiling up.

It's an omelette. There is no Dolly Parton behind an AI Jolene or a Michael Jackson turning a 4-track tape into a musical masterpiece. The journey and personalities are what contextualize the sound; with AI, that context is gone. That's why I think it will just be used for cafes and things like that where they want to escape licensing fees.

As for consumers - I believe people will see AI music consumption as a way of supporting the new technological powers that be, and the act of listening to human-made music will have an element of counter-culture baked into it. I'm a professional musician and I have a very physical reaction to sound. Once I know it's AI my goosebumps fade.

Another lame incarnation of a tech that will also fade like crypto and everything else. The types of personalities who will leverage this tech are not the same personalities that make the greats.

I'm not worried.


In this post, I can summarize two points you are trying to make: one, it takes less effort, and two, it doesn't fit into our current understanding of how art-creation narratives work. I don't see how that precludes a piece from being good or bad. I feel like you are arguing for your personal opinion (if not your image of what the world should be) as if it's some kind of objective truth. Your goosebumps might have faded, but when I heard this post in a half-asleep state, I got goosebumps when my sleepy mind figured out it's fully AI generated. But that doesn't add to the argument either way.


I love art made by non-human intelligences. I especially love how it can transcend and redefine loved mediums by combining them in surreal ways that are otherwise quite difficult to obtain. Algorithmic exploration of mediums outpaces mere mortal "effort" in its efficiency and in doing so raises the bar for what constitutes media worth giving our attention to.


AI Art is second-order human art. From this viewpoint it's still human by proxy.

And anyway, is it measurably different from art produced while tripping on LSD or in similar states of altered consciousness, such as schizophrenia, dementia, or even depression, which often produce things many people would not describe as regular?


I see this take often, but I don't buy it. Mixtapes and playlists are quintessential gifts of affection based on art that the giver did not make and by artists the receiver often does not know. Just the same lots of people hang costco paintings on their walls by anonymous sweatshop workers and kids love cool posters about whatever interests them with no regard to who made them. I believe consumers are likely to enjoy lots of this generated art.


> I don't believe people would listen to AI generated music

Counter-point, we've been listening to a rock song about the moon given the words from one of my kids books all morning.

People will 100% listen to (edit - I never finished the sentence) things that sound fun. It might not bring me to tears or stop me in my tracks but lots of things are just fun.

> People aren't going to hang AI paintings in their houses

People will absolutely do this. If AI systems can make nice pictures people will hang them in their houses. And they can make nice pictures.


Why not? Have you seen the top 10? It couldn’t be any worse than what it is now. People who reach the top 10 are rarely there for the “art”. A lot of them don’t even write their own songs or music.


All music at this point is largely ambient music and Muzak.

The future is obviously a form of custom AI Muzak/Ambient music with a few pop stars for people to focus on.

I am a big fan of more art-type music and guess what? No one listens to it. My favorite album of 2023 has 6.4k views on YouTube. At least 100 of those are mine. No one listens to this stuff. People watch video critic reviews of more art-type music than the actual music itself.


Hey, curious about your opinion on a track like this.

https://app.suno.ai/song/2a5a9327-5b27-4353-b62b-8eb3e314fff...

I normally do noisy acoustic stream-of-consciousness stuff, but I've been too busy for the past two years to get anything out.

This was the first track in a while I've been happy with; the lyrics are very real to me. It took a couple hours to learn the workflow, and I still haven't gone back and edited the ending, added overdubs throughout, or added a real guitar solo.

I can post my youtube for more context but I didn't feel right posting it without someone requesting.

I'm just very interested in new ways to get my art out. And this is more of a transformer of my poetry into real listenable music.

I'm very excited.


Lol, same. A lot of the stuff I listen to is completely unknown to a “normal” person. And guess what? AI is not replacing those folks for their audiences in the foreseeable future, because they don’t just regurgitate the same chord progression as everyone else


Honestly I don't even know what the "top 10" is or how its measured and have never met anyone in my life who listened to top 10 stuff. It's always HR office radio, mechanic radio, the bar, club, etc. Even the most normal people find stuff they like on youtube and listen to that.

Even if the AI music is extremely good, it's just missing the fact that it was made by a person, which changes the experience entirely. I think we're more likely to see musicians and those top10 artists leverage AI without explicitly saying so.

I expect we will have a Daft Punk moment where someone uses exclusively AI and later unmasks to reveal it was all AI, and as soon as that happens the music is disconnected.

Same with AI art. I can see something and be duped and go "oh wow!!!" and as soon as I know it's AI the caring leaves my body completely and reverence and interest is lost.


I love this sentiment about "top 10" radio. If only it was so. That's the stuff that's on everywhere, all the time. Grocery stores, cafes, etc etc. hell, I listen to it on YouTube. It's like junk food. It's bad, it's good.

It's better than AI, even this incredible mindblowing suno thing. Production value counts.


Quality isn’t the only factor though. Music made by people has copyright which means grocery stores and coffee shops have to pay a license fee.

There’s certainly a point where this synthetic music gets good enough to replace the elevator music Muzak crap that they have to pay $2000 to license.


Will it be better than AI six months from now, that’s the question. My money is on “no”.


At least the classical world is safe. They want no part in AI.


I think it's a lot better at classical, orchestral, and instrumental music than it is at anything requiring vocalization. I created this in less than 20 minutes: https://app.suno.ai/song/eb93c25b-bdbe-4c9f-8e03-66e9479c869...

I need to stem it, fix it up a bit, and remix for stereo in a DAW but it's much better than I expected for my first ever piece of music. Obviously it'd take a lot of work to create a Hans Zimmer level OST from the tool but IMO it wouldn't feel out of place on a Ludovico Einaudi album or on some Spotify or Pandora classical radio.


That's actually a very good piece. Like something I'd hear on late night Paradise Radio. If I was creating an indie movie on no budget I'd be all over this technology for the soundtrack.

I don't think musicians and composers are going to disappear as a consequence of this technology, in the same way that theatre actors were not made obsolete by film. What I do think is that a whole new category of professionals will be created - musicians and composers who get paid to train AI models. I bet it will pay better than the laughable amounts paid out as streaming royalties.


At some point in the future, wanting no part in AI-generated content is going to be like that old Onion headline. "Area Man Constantly Mentioning That He Doesn't Even Own A Television".


Someone is writing it. There are a lot more than 10 people that want to be in the top 10. It's hard to get into the top 10. You might not appreciate it as art, but the songs that are there are good at something. You could call it being catchy. AI is not even close on this metric.


It's not even close now. And these things have been out maybe a few months? Of course, even the potential of the current tools isn't fully explored.


I think it will get there, wherever "there" is. I think it's very impressive now, as a technical marvel. But it's really not competing with the best humans yet. I don't say this to dismiss it. I say this as an appreciator of music who is neutral on AI. Probably one day I'll listen to mostly AI generated music. But it won't be this month.


People are buying art in IKEA, of course they'll hang an AI painting as well.


> I don't believe people would listen to AI generated music for the same reason AI isn't impacting fine art.

Pretty soon we'll be reviving the old Palmolive "You're soaking in it" commercial (https://www.youtube.com/watch?v=_bEkq7JCbik).

We'll all be soaking in it, and no, you won't be able to tell the difference.


Here is my effort

https://app.suno.ai/song/13cffa0c-bbd5-41b6-abde-43332b21b0f...

I took the litany of fear from Dune and got Bing Chat to re-write it to be about facing down code complexity, then I put those lyrics into suno.ai to turn it into a 2 minute song to express all your emotions about code that needs to be simplified ;-)


I just had it make a rap song of its own ToS

https://app.suno.ai/song/7995f966-6265-4b34-a68e-400981f5931...


Asked Claude Opus to minimally modify the lyrics to add rhymes

https://app.suno.ai/song/40d0fb88-246b-42f9-8998-0387e75262e...


The last ~45 seconds after the end of the rap (with the chorus and the hallucinated repeat of the bridge) was a nice surprise.


Lol, nice!

Once they allow longer text inputs and faster rapping, I can really see this kind of thing taking off with 1Ls and med students.

Like the Animaniacs song about the state capitals.

Or like a Homeric epic that is meant for remembering and singing.

The method of loci may have a new competitor as a way to remember things here.


One of the things that came to mind when I was listening to the ToS song it generated was a video I had watched years ago on the very dry topic of Rule 803 - Hearsay Exceptions, which was put to a catchy tune that made it very memorable and easier to digest. (https://www.youtube.com/watch?v=UoJ6fgIKYy8)


I think it might be more memorable when the med students do the writing ... and singing.

https://www.youtube.com/watch?v=GVxJJ2DBPiQ

    Diagnosis, Wenckebach *(what?)*
    It's AV nodal block and that's a fact *(yeah)*
    Take PR interval and lengthen that *(yeah)*
    bradyarrhythmia and heart attack *(oh-no!)*
AI songs do make sense if AI will be making the diagnosis!


This thing can generate lyrics and music for a Hindi song. It's way better than I expected. Here's a song about WrestleMania.

https://app.suno.ai/song/018ca476-803b-4a45-81ed-c7263e08ef3...


I chuckled at "gone live", but otherwise that was pretty good code poetry. Thanks for that.


The girl is sad because we don't know the names of the people/artists on whom the music and her voice are modelled.


In the great web tradition of harvesting the vast body of other people's work in the large[1] and shoving it through huge amounts of computation to wring out a nickel's worth of value that will eventually manifest in some good-paying SWE jobs, a rich executive class, and a whole lot of shareholder value, and inevitably mutate into another goddamn ad-serving platform.

[1] Ha, the poor millions of dumb minions who put their work on the web thinking it might be fun for others or garner themselves a small following, they didn't check the terms of the EULA!


These kinds of discussions always leave me wondering if people consider how actual humans learn their craft, constantly studying and mimicking others. Inspiration is using existing experiences, however mixed together, while originality comes from an input or an experience that others have yet to use.

"Write a sad song about the MIT license" is certainly such new input, and if I was commissioned to write the song it would be based on inspiration (i.e., "use training on") music I have heard or studied. And yes, none of the musicians I have listened to or have studied will benefit from the endless money fountain I'd acquire from composing such song.


In the case of a human studying, a person puts in effort and gets rewarded for their efforts.

In the case of AI, a person puts in minimal effort to generate something that devalues the work of all the people who did put in effort.


> In the case of a human studying, a person puts in effort and gets rewarded for their efforts.

When someone needs something composed, they don't learn how to write music. They pay someone else the bare minimum, e.g. a few bucks on fiverr. That person will spend the least possible amount of effort, trying to make ends meet with the little money they get.

When you then use an AI model, the work done for those five bucks is replaced by work done for almost free.

Neither the person you would hire nor the AI credits those who created the material they trained on.


> When you then use an AI model, the work done for those five bucks is replaced by work done for almost free.

In other words, you pay a few cents to big tech for a generator that only exists thanks to the work of real composers, singers, etc., who now get a grand total of $0.


As opposed to paying $5 to a singular composer that only exists because of other real composers, singers, etc., as they studied the craft. On the other hand, the easy access may cause new types of artists to appear by lowering the bar of entry, or just make custom music more generally available and more widely used as the price makes it a commodity.

We also stopped hiring computers (the occupation) and instead pay big tech companies which made computers (the device) available to everyone. And we stopped hiring people to do dangerous manual labor as companies started selling machinery to automate it. Markets change.


> As opposed to paying $5 to a singular composer that only exists because of other real composers, singers, etc., as they studied the craft

Yes.

> computers

> dangerous manual labor

If people are not in danger and don't have to do mechanical work, that's one thing. But if composers stop composing the original music that is used to train current AI because they don't get paid anymore, then the field stagnates. Same for writing and everything else.


People doing mechanical work made art through physical objects. Woodwork, pottery, glassware, you name it.

There are now far more options, both on the high and low-end, with the whole area being more affordable. The quality of most products also arguably went up, as factories beat handmade goods. And yet, if you want custom artisan goods, you can still pay a woodworker for it at more or less the same cost as you would have otherwise, as their labor costs are a function of time required and local living conditions.

In some cases, those workers were the ones to automate, benefiting from the assistance - woodworkers using CNC mills and laser cutters even for handmade goods, or composers themselves can use the AI - to speed up their otherwise fully manual work. It benefits the majority creating the demand, and tends to improve the craft overall.

> ... then the field stagnates.

A market that is not changing has already stagnated.


Dangerous manual labour like mining or tunnel building. In woodworking the art is art but if you use a machine to copy my original design then it's the same old theft again. :)

> A market that is not changing has already stagnated.

Yep. The way art is changing is thanks to original work and no one will be making it since anything you make gets stolen for free


> When someone needs something composed, they don’t learn to write music…

Speak for yourself! There is only one thing that scares me more than composing music, and that's paying somebody a few bucks on fiverr to do it for me.


Despite your personal fears I believe I spoke for the vast majority of cases rather than just for myself.

Although I suppose royalty-free stock music is the norm nowadays for most commercial uses, which takes it a step further, anonymizing the composer entirely...


> Although I suppose royalty-free stock music is the norm nowadays for most commercial uses, which takes it a step further, anonymizing the composer entirely...

That's by choice though?


By the composer, yes, but the composer here is the AI. In neither case did the musicians that the composer studied/trained on get asked.

And that's the point: The difference is the replacement of 1 flesh-and-blood composer with 1 virtual composer, with the consequence being the lost business of the former. The artists studied were never part of the transaction in either case.

Now, the long-term consequences for artists - e.g., reduced supply on the low end as they're out-competed - are harder to guess, but that's just market dynamics. It may very well increase supply as composition becomes more available, diversifying by allowing people with other skills or creative traits, who previously could not, to create music - even if the musical part is done by AI.


I meant, royalty free music is released/licensed by artists who get paid or are OK not being paid

but with AI, whatever the consequences may be, their work is hijacked/stolen.


The AI "stole" it's training data the same way that those fleshy composers "stole" their training data.


Not AI, people who trained AI and who use it for profit

You can learn yourself, but if you use an automatic tool to bypass that, automatically make similar works, and compete with the original authors, then you're an IP thief.


> In the case of AI, a person puts in minimal effort to generate something that devalues the work of all the people who did put in effort.

Worded differently: people who couldn't otherwise produce skill-based works of value have had the barrier of entry lowered for that specific medium of expression, allowing for more works across a wider spectrum of skill.


It’s so bizarre when people say stuff like this. There is absolutely nothing preventing the unpracticed or untalented people from any form of creative expression. What instead people who use AI seem to want is for unpracticed or untalented people to perform at the level of the practiced and talented, but this is no net gain to anyone. Why? Because only a rare subset of people who ARE practiced and talented create anything of interest or value in the first place. What this tells you is that skill or level of performance is not the barrier, but a means through which great things CAN be achieved (i.e. necessary, but not sufficient)

Flooding the world with unpolished, unpracticed works, AI-tuned to the level of being mediocre, is a creative and intellectual dead end.


> for unpracticed or untalented people to perform at the level of the practiced and talented

This is what tools are.

Cheap digital tablets have done away with the need for expensive consumables. You can just download a different brush style instead of learning a physical technique. No waiting for paint to dry or smudged pencils. The barrier to entry for painting has dropped to a one time investment of like a hundred bucks. Almost nobody mixes their own paint, nor stretches their own canvas. Those skills aren't needed anymore.

It's possible to build very precise machine parts by hand. It's very difficult and requires great skill, so nobody does that. Some do and are admired for it, but everybody else uses precise machines to make precise parts with nearly no effort.

It's just a tool. Only difference is that we had assumed art would never be automatable.

Objectively, I don't think this is a bad thing. It doesn't change the subjective value of art any more than the average cartoonist devalues the Mona Lisa. It's just a new form of art, there will always be people mixing their own paints and stretching their own canvas, just as there always has been.

It's only a problem because in our society you either have a job or you starve. No one can afford to be an artist. Those that do tend to grind out as many pieces as fast as they can so they can pay the goddamn rent. If not for that, these AI tools would be pretty cool.


I think the bizarreness arises from the following differences in beliefs:

* That "_any_ form of creative expression" is a viable creative substitute for people wanting to create in a _specific_ medium of creative expression -- especially those that had a high barrier of technical skills required to be seen as "good enough" to share.

* That a person who has an idea for art will put in the necessary time to become proficient enough to create that "good enough" art through traditional means (IMO demonstrably incorrect), and that is preferred over that person just not expressing a lower-quality version of that idea at all.

* That those who use AI primarily want or expect to "perform at the level of the practiced and talented" (i.e. top-tier art) rather than using it to produce art they otherwise couldn't have, even at low- and mid-level qualities.

* That there is no skill or talent in using AI tools to produce art (or that the skill or talent using AI tools is meant to be a full replacement for traditional artistic skills or talents).

FWIW, I'm a long-time sketch artist and acrylics painter (~20 years). There are many mediums, subjects, and styles that I'm not good at -- and I enjoy using AI to express myself in those areas (and have also liked using AI to create songs to show to my more musically adept wife...). But even in my own wheelhouse (landscapes and still life), I also often use AI to brainstorm composition, perspective, colors, textures, lighting, etc. It's a great tool for experts to lean on, but an even better tool for non-artists who couldn't or wouldn't otherwise share their art.


Indeed. As an amateur guitarist, but a professional virtual machinist, I have a ton of respect for people who have dedicated their whole lives to mastery in any one particular area. To have a machine gulp down untold eons of human exertion and then barf out soulless mimicry, no matter how jaw-dropping of a feat of engineering behind it, and then mint no-talent ass clowns by the million because viral videos make an awesome advertising platform--it's just some kind of dystopian peak tech, except the dystopia is mildly amusing rather than a disappointing and jarring marginalization, flippant dismissal of all of us.


This feels like weird gatekeeping.

Why is this the line? Where are the complaints about people using pianos to achieve rather precise notes instead of using their own voices? They are just untalented at singing and their use of any tool to create sound is of no net gain to anyone.

This person: https://www.youtube.com/watch?v=IbUE-LxhUR8 ? They're recording and playing back on a loop! They should record full repeated playings, any use of the recording is of no net gain because it could be achieved otherwise.

Songwriters? If they write lyrics and someone else sings them the result should be cast into the sea - it's of no net gain to anyone because they did not create the sounds themselves.

Composers? Frankly pointless.

> Flooding the world with unpolished, unpracticed works

I hate to break it to you but there are a vast number of terrible works of art out there already.

> What this tells you is that skill or level of performance is not the barrier, but a means through which great things CAN be achieved (i.e. necessary, but not sufficient)

If it's a necessary thing, of course it's a barrier. That there are two barriers doesn't change that.


Stable Diffusion cost $500k to train ... I wouldn't call that "minimal effort". (And that is only the computation cost.)


Even the most derivative of singer songwriters tend to use their own voices rather than a weighted average of the voices of other singers in their genre...


Is that why so many people sound so much like Adele or some other popular artist?


Using the skills they presumably developed listening to and copying other singers and studying music, with an instrument built from roughly the same instructions as everyone else.

That a person can't sound like the weighted average is a human limitation (although with modern pop people do get quite close!), not because new singers aren't trying to. That of course adds variation that we appreciate, but doesn't change the underlying similarity in how acquired skill is mimicry of those who acquired it before us - with very rare exceptions.


No, sounding like the genre-weighted average of Spotify simply isn't what singers try to do. They haven't listened to that much music, they have actual preferences, they have natural qualities to their voice which they're complimented on or asked to mask, and they're trying to hit notes based on their aural perception of harmony and related theoretical principles not based on the waveforms of other songs involving singer songwriters. The fact that they literally couldn't do what NNs do even if they wanted to also seems quite relevant to the fact that they don't do what NNs do.

What next, are we going to argue that what programmers creating new programs are really trying to do is generate a prompt-weighted average of the bytecode of every program they've ever downloaded, and all that business analysis and functional spec and use of high level programming languages and expressed preferences for coding standards is irrelevant?


> they have actual preferences

That's just a bias.

> natural qualities to their voice

That's the physical limitations I referred to, which isn't something humans tend to be happy about but can sometimes end up being a differentiating benefit.

> What next, are we going to argue that what programmers creating new programs are really trying to do is generate a prompt-weighted average of the bytecode of every program they've ever downloaded

That's a horrible strawman. Do you as programmer often read and write bytecode directly?


> That's just a bias.

I'm beginning to assume you're an LLM, because I'm not convinced a human would honestly try to argue that their emotional reaction to their favourite songs is basically equivalent to flipping the values of some bits to ensure that they generate music more similarly to them.

> That's a horrible strawman. Do you as programmer often read and write bytecode directly?

As an improvising guitarist (even a very mediocre one) my creative process is even further removed from an LLM parsing and generating sound files directly....


I suspect the issue here is just the assumption that LLMs are "just flipping some bits", while simultaneously putting humanity on some unreachable pedestal.

We are all nothing but a horde of molecular machines. Your "you" is just individual neurons reacting to input in accordance with their current chemical state and active connections. All the experiences, unique personality traits, and creativity you add to the process are solely the result of the current state of your network of neurons in response to a particular input.

But while an LLM is trained once and then has its state fixed in place regardless of input, we "train" continously, and while an LLM might have experience of an inhuman corpus for a certain subject, we have many "irrelevant" experiences to mix things up.

Your "prompt" is also messy, including the current sound of your own heartbeat, the remaining taste in your mouth from your last meal, the feeling of a breeze through your hair as it tickles your neck, while the LLM has just one, maybe two half-assed sentences. This mix of messy experiences and noisy input fuels "creativity". You don't think "I need to copy XYZ", but neither does the AI. You both just react.

In some regards our chaos is better, in others it is worse. But while the machinery of an LLM still does not even remotely approach a brain, we should not forget that we are nothing more than a cluster of small machines, assembled from roughly 750 MB worth of blueprint.


I wonder if this won't drive a resurgence of demand for live performances - as recording becomes more and more artificial, live performance will mean more. (Or maybe, as a live performer, I'm just wishful thinking here...)


Generally speaking, people create internet content so that it is shared.

All of the creators and subjects of meme formats... Should they receive royalty every time you post some inane mashup?


This is not that. We're not talking about some inane mashup, but a wholesale digestion of every creative thing any person ever did by a monster computer cluster whose scale dwarfs imagination, which then promptly uses it to maximize "engagement" to gather eyeballs to feed them advertising. It's profoundly messed up.


The cost of that computer cluster must also dwarf imagination.

I don't begrudge crypto miners either.


I wasn't aware of a right to recoup the costs of any bad idea, which seems to be what you're implying here. Because computers, therefore profit? Huh?


The earlier comment was "vast work", so the size of effort is somehow relevant to the discussion.


It isn't. If a serial killer spent a week digging mass graves by hand, they don't get years taken off their sentence. You don't get points just for working hard or spending money, particularly when it cheapens or just appropriates other people's work.


> which then promptly uses it to maximize "engagement" to gather eyeballs to feed them advertising.

This is the real problem, right? People don't dislike generative AI, they dislike the attention economy. Yet I see more disgust towards AI than the company policies which suck. I don't understand why.


I think it is more that art, film and music have largely been replaced with complaining online about various subjects as the major form of entertainment in America.


Oh, haha, yeah. I guess I'm the opposite--I actually like AI more than the attention economy! At least one of them is not actively trying to drain my brainpower and skill set and get me to buy stuff and do stuff I wouldn't otherwise buy or do.


yet


People also differentiate heavily on the basis of scale and profit. Artists are often fine with people sharing their posts and may even tolerate someone asking for permission to make printouts or whatever else for their circle of friends, but will expect some sort of royalty if you're asking to be able to sell prints of their artwork on a store.

Hell, even with viral videos it's relatively common that normal people can share away while entertainment companies and influencers are expected to pay for a license.

With memes it isn't clear exactly who made the first template, and the creation of them doesn't revolve around specific people in the same way, nor are they meaningfully tied to profits.

When creators post their content online to be shared, they do it with the focus being on reaching individuals, not for it to be sucked up by soulless companies to extract all value without the intention of giving back.


> With memes it isn't clear exactly who made the first template.

The Office, The Matrix, Lord of the Rings, Django Unchained, Game of Thrones, etc

These works have identifiable creators.


The conversation is quickly devolving into a vacuum of ignorance where things like royalties, fair use policies, revenue-sharing agreements, parodies, sampling, etc, have apparently never been thought about.

We're not talking about any of those things. We're talking about wholesale digestion of the entirety of human knowledge by automated means, which is now not just theoretically possible, but routine.


Those aren't meme formats in terms of what is typically meant by meme.


> eventually manifest in some good-paying SWE jobs

Unless Devin has his way.


We don't know the names of all the people on which the style and content of your comment is modelled either.


That's correct, but they are (probably) human, which is pivotal to the application of copyright law.


She's sad because she knows the license will be changed to a business non-compete one in a year.


Just like we're all sad because we don't know the names of the people whose work or interactions influenced Stephen King's writing.


AI isn't influenced. It doesn't have restrictions. It doesn't have to work within confines. AI can always remember the word it wants to use. It always can hit the note it intends. And it can hit every note. Etc. It uses the corpus of training data and mashes it into a new form.

Stephen King won't be able to remember every word of every story he's ever read. And if he wants to make something "Lovecraftian", it'll be what Stephen King thinks is Lovecraftian. And there will be something to that. Some bit he believes is more or less important than other people do. And those bits are what make Stephen King, Stephen King.

Everyone has had access to the same material King read. Access to the same tools he used to create. Everyone had the chance to effectively be Stephen King. But there is just one. Because there is some unique bit of observation or recall or combination of such things that is unique to King.

And from what I've seen so far, these LLMs can't do that. There is a missing element of pure imagination.


You can tune AI output.


But you can't make it creative. You can't say "give me something cool" and have it produce something of note.


Yes, you can absolutely play god of the gaps.


How am I doing that? I am claiming that LLMs lack imagination. They are incapable of creating out of whole cloth or interpretation.

Saying they cannot create based off of a vague suggestion is very much in line with that claim. I consider it a vital difference between Stephen King being inspired and LLMs mashing training inputs together.


I wonder who was the first to claim this was plagiarism; ironically, everyone else seems to have mindlessly plagiarised their belief


95% of beliefs are shamelessly plagiarized from someone else.


The funny thing is that most creatives are quite open about their influences.


They wouldn't be if every named influence wanted a 5% cut of all future projects.


Is it Hatsune Miku? Twitter is glitching out again, so I can't hear.


No, it's a synthetic voice from suno.ai, sounds like a (very sad) American singer-songwriter.


I suppose the focus was on voice synthesis here. I won't add anything about it since other commenters have already said significant things about this wonderful feat.

Musically, however, I can't help but notice that these models are still very far from being able to generate something interesting: from harmony, to tempo, to musical structure, to dynamics, everything is muddled and without structure. I guess there is still very much to work on, and I am not sure that purely generative models can attain higher levels. Maybe a mixed rule-based and generative approach would do?

Progress is really fast in this field; I really do not know.


I think historically, every time someone says that the solution to an AI problem is more structure, it turns out to be mostly an issue of data and scale.


That's probably true. Maybe there is a point to trade computational/energetic efficiency for attainability of a result. Let's see how this unfolds.


To my knowledge, the model being used for this is "chirp" which is 'based on' bark[1], an AI text to speech model.

The github page for bark links to a page about chirp, which returns a 404 page for me [2]. My guess is that the model used for suno.ai's song generator isn't too much different than the text to speech model.

I also have a hunch that it was more of a coincidence than intentional that the bark model was capable of producing music, and that this was spun off into this product.

Unfortunately, there seems to still be issues with bark when generating long (like book length) spoken audio. Which is too bad, as someone who's worked jobs that require lots of driving, it would be awesome to be able to have any text read to me in a natural sounding voice.

[1] https://github.com/suno-ai/bark [2] https://www.suno.ai/examples/chirp-v1
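For reference, a minimal sketch of driving bark directly, assuming the preload_models / generate_audio API shown in the repo's README; the ♪ marks around the lyrics are the README's hint for nudging the model toward singing rather than plain speech, and the MIT License snippet here is just an illustrative prompt:

    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # download and cache the model weights on first run

    # Musical-note marks hint that the text should be sung, not spoken.
    prompt = "♪ Permission is hereby granted, free of charge ♪"
    audio = generate_audio(prompt)  # returns a numpy array of audio samples

    write_wav("mit_license_snippet.wav", SAMPLE_RATE, audio)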


What structure and tempo can you realistically give to the MIT license?


A lot more than this.

Elton John improvising on an oven manual is the high bar in my opinion. https://m.youtube.com/watch?v=8GuI4UUZrmw


I'll try to give a serious answer, even if I suppose yours was a nice joke :)

Music is a language, even if one with no semantics. It has conventions, dialects, a syntax, a grammar. There are multiple dimensions a musician uses to convey what he wants/feels: just as an actor has to control his voice, posture, and interplay with other actors all at once, so a good musician is aware of the structure of the piece he is composing/performing, the relations between its various subparts, and how the musical discourse progresses in time, besides agogics, dynamics, and sound color.

All of those aspects are perpetually compared against the conventions of the genre, mixed, evolved, strictly followed, or blatantly negated.

This is something that normally a professional musician takes decades to master (apart from musical geniuses).

A listener takes less time to educate himself to appreciate those nuances (but not too little: let's say ~years). Once you develop a taste, it becomes very obvious to see through the spectrum that goes from bad quality tunes to musical artistry.

I see nothing musically interesting in this (wonderful) PoC of speech synthesis.

Just to be clear: I did not see anything particularly stunning even in Google's Bach Doodle from some years ago https://doodles.google/doodle/celebrating-johann-sebastian-b...


You didn't actually answer


As always with art, the answer is: it depends on what you think, feel like and/or are trying to convey.


Reminds me a little bit of Catholic mass when the priest "sings" some of the sections. There is no consistency, no cadence, but their voice goes up and down. It's high-effort talking.

I wonder if these models would do something better if the text were poetic or punctuated differently.


All the AI-generated music just sounds like someone jamming without any hint of a real melody, original or a cover. It's very strange to listen to. It sounds exactly the way an AI-generated photo of a person looks. Looks/sounds kinda real until you look/listen closer.


Some of these are just "nice" but I'd probably buy this one: https://app.suno.ai/song/abddd209-4ad7-469d-82b9-f0117db0e51...


That... was amazing. Were the lyrics AI-written too, or is it a pre-existing text?


I don't know, it wasn't generated by me, I just found it in the "explore" tab. I tried to generate something similar and this came out: https://app.suno.ai/song/24dddb7b-5a10-45a2-8f52-94d627030c3...

I generated lyrics with chatgpt4 + some manual tweaking.


I didn't expect the twist there, nice work on the lyrics!

I feel the suno-generated song itself lacks congruency. Like, you could listen to it 10 times, have the lyrics in front of you, and not be able to sing along.


I'm impressed how it managed to extract rhyme from that license.

    The software is provided (as is)
    without warranty, of any kind
    express, or implied.


Exactly. Tough constraints, having zero flexibility in lyrics.


Yeah, there are some good rhymes in there. Actually better rhymes than those ChatGPT delivers if asked for lyrics or poems.


Plot twist: MIT license was written by a poet.


The Free Software Song:

https://app.suno.ai/song/2ce5eab5-d1c5-48b2-91a0-8e6095e29ed...

https://www.gnu.org/music/free-software-song.en.html: “Richard Stallman and the Free Software Foundation claim no copyright on this song.”


This is scratching a rare itch for me because I am a heavy subvocalizer when I read just about anything, and when I have a song stuck in my head, I end up wondering what it would sound like if someone sang the words I'm reading to the tune of that song.


Probably relevant: I took a photo of my kids' "curious questions about space" book and threw the words into a song.

https://app.suno.ai/song/f283429e-ec3e-4152-b5be-a57cd72a6d9...

They've been listening to it in the bath to huge success. Particularly the change at about 40s for "why can't we breathe on the moon?" which feels like an excellent song lyric.

Honestly I'm blown away at how well it does.


For college I would convert my physical textbooks to Word-file text, then convert that text to computer-voice MP3s and play those in the background to help me study.

Breaking up chapters or sections of the textbook into Suno songs instead - it'd be mad interesting how much better that would have helped my studies. The monotone computer voices of 10+ years ago will put you to sleep.
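
For anyone wanting to redo that old text-to-MP3 step today, a minimal sketch with the gTTS library (the library choice here is just an assumption for illustration, not what I used back then):

    # pip install gTTS
    from gtts import gTTS

    with open("chapter3.txt") as f:
        chapter_text = f.read()

    # Google's TTS voice: still fairly monotone, but clearer than old desktop voices
    tts = gTTS(chapter_text, lang="en")
    tts.save("chapter3.mp3")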


It's a satire generator. Take any text you want to make fun of, turn it into music.

I'm not sure whether I've just run out of credit, or Suno actually knows what the political sensitivities of the text might be, but I can't generate a second amendment song.


In the lower left hand corner there is a "subscribe" button, and above that a "credits" counter.


Has anyone had decent results with C++ code?


Won't be able to beat "Program in C" https://www.youtube.com/watch?v=tas0O586t80


Barely related but this reminds me of a video where Sir Elton John sings the text of an oven manual.

https://youtu.be/8GuI4UUZrmw


Reminds me of Regina Spektor's style.

And some of the generated phonemes actually just sound like stylistic auto-tuning. I kind of like it.

I'm sure many have already observed this, but I think the thing most artists fear from AI is not that AI will be able to produce works on par with or superior to human works, but that most people won't care enough to value the difference.


In a similar vein, LessWrong released an entire AI-generated album with lyrics adapted from significant posts made there over the years: https://www.lesswrong.com/posts/YMo5PuXnZDwRjhHhE/lesswrong-...

I think I'm going to enjoy how surreal widespread access to generative AI will make the world.


I'm not seeing a Roko's Basilisk track, disappointing.


I've been with LW people for years and no one has ever mentioned Roko's Basilisk.


The song really picks up when you get to the all CAPS section.


Made one reading the Declaration of Independence. I am impressed. https://app.suno.ai/song/54898804-8cd9-4b6f-a18d-3ffbe728579...


I can see this being "a thing." I tried one too - Gangsta Rap Constitution: https://app.suno.ai/song/0ed4c4e2-9a92-40c1-a1ab-20ae49b7a8d...



There should be a whole Broadway musical of this


Someone should get a radio frequency and broadcast these 24/7. Civic Gangsta Rap.



This is impressive, but part of what makes it so is that we are not used to it. As these kinds of AI-generated music/images/videos become ubiquitous, it will be the new normal and they will become less impressive.


Maybe, but I think there is something innately funny about making computers say silly things. As a small child it was peak comedy to me making a Macintosh say “fart” and it’s still funny to me when a computer sings the MIT license.


On that theme, I asked Suno to sing a rap but "remove all the vowels", and it's hilarious how well it attempts to sing the silly result:

https://app.suno.ai/song/30f8223e-0d0b-4cac-8b3f-5d8f0f743e2...

The lyrics generator is some version of GPT so you can give it natural language instructions like this.
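
If you'd rather script that step yourself, the lyrics side is just a chat-completion call that you paste into Suno's custom-lyrics box afterwards. A rough sketch with the OpenAI Python client (the prompt and model name are placeholders, and this is not how Suno does it internally):

    # pip install openai
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Write a short rap verse, then remove all the vowels from it.",
        }],
    )
    print(response.choices[0].message.content)  # paste into Suno's custom lyrics field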


Had Claude Sonnet write the bones of this song about an AI convincing you it's got your back. Edited for segmentation in Suno, and it turned out pretty well, though it didn't quite hit the style I was looking for.

https://app.suno.ai/song/4f96f485-8d84-4df0-9a9c-941984137cc...


I was wondering if she would sing really loud at the ALL CAPS sections, but fortunately she did not. Still better than most Eurovision Contest songs :)


But the accompaniment changed. Very uplifting.


Disagree, Eurovision is stacked this year!


I have my song ready; now I need to know how I can make a video clip based on it.


My approach to generating a music video was to generate scenes using DALL-E 3, and then animate those using Stable Video Diffusion (SVD).

SVD doesn't have well-controllable motion and is utterly blown out of the water by Sora, but it's what we have right now.

Here's the resulting vid, "a death metal song about a macro photographer":

https://www.youtube.com/watch?v=kNVRQ1Zg-a0
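
Roughly, the SVD step looks like this with Hugging Face diffusers (a sketch using the public img2vid-xt checkpoint and default-ish parameters, not necessarily the exact settings I used):

    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16, variant="fp16",
    ).to("cuda")

    # one DALL-E 3 still per scene, resized to SVD's expected resolution
    image = load_image("scene_01.png").resize((1024, 576))

    frames = pipe(image, decode_chunk_size=8, motion_bucket_id=127).frames[0]
    export_to_video(frames, "scene_01.mp4", fps=7)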

If you only want a video file from Suno to share with the default static lyrics screen on it, hit Download Video from the three-dots menu.


Reminds me of Richard Dreyfuss reading an Apple license.

https://www.youtube.com/watch?v=Cu0lqUlHEko


This is so much better than stable.audio released yesterday!?

I've dabbled in music production and this is just unbelievable.

Both amazing and a bit sad, because this is already so much better than what I would have anticipated.

First illustrators and copywriters, then VFX people, and now music. We're going to lose so many jobs in the creative sector, right?


Within 30 seconds of landing on the site I generated a song that is crazy relatable, funny, and sounds good. Made the whole family smile. This is going places.


Care to share the link?


suno.ai


I assumed they meant to the song you generated?


This reminds me of OpenBSD release songs! https://www.openbsd.org/lyrics.html


And early nerdcore.

(Dual Core FTW!)


Can I include this as LICENSE.mpeg in the root directory of my projects instead of a text file?


And does the requirement that "this permission notice shall be included in all copies or substantial portions of the Software" mean that the mpeg specifically must be included?


Going to make those boring textbooks sound more tolerable. Interesting implications for education. If this was a foreign language I didn't understand, I don't think I would have been able to tell it was generated.


It seems like it can generate EDM okayish: https://suno.com/song/05e80706-1081-4461-b2c5-2850a2d40eee


Suno.AI is very fun. I find that asking ChatGPT to create lyrics and then feeding it gives some great results, although half the generations tend to have a bit too much static, so you have to keep generating.


Wow! I hadn't kept up with music generation for the past few years. It's come a long way!

Long-term coherence, reasonable-ish melody, all on top of very unmusical text. Very impressive.




We've come a long way from the first synthetic singing voices.

https://simulationcorner.net/SAM/sing.wav

Edit:

https://youtu.be/Rm4ZCGgzeeU?si=upK-qCMev8ZaibIa&t=222


Even older, Daisy Bell on an IBM 7094 from 1961: https://youtu.be/41U78QP8nBk?t=63


I prefer her singing C source, especially @karpathy's: https://suno.com/song/2a210337-62fc-49f8-8850-9af12e06e6e0


There is a trove of "X singing Y" (like "Beatles singing Billie Jean") clips on YouTube right now. Most of them are boring, but some are funny and entertaining. Right now it requires a lot of manual work to generate, although some models and a lot of datasets are already available on Huggingface.

Suno, or its next clone, will dramatically close that technological gap; no army of lawyers could stop it. I would brace for a short interim period of dead-talent revival. Then newly synthesized, superhuman artists will emerge.


Song lyrics (generated by ChatGPT) based on the Declaration of Independence.

https://app.suno.ai/song/a693c847-7ce6-475c-adc5-0328786b901...

Haha this is amazing!


ChatGPT has no humor, but this certainly made me laugh.


Babe, new cover of GNU General Public License v2.0 just dropped!


Amusing, as well as a decent ad for the latest version of Suno.ai


I remember when Solaria came out there were a ton of people making emotional spiritual music with it. It felt so odd, robot voices singing to God and about the wonder of experiencing life. Sounded pretty though.

Soon we will have 'preachers in a box' that will sing to lift you up, mentor you, and guide you through life. Most will even be 'non-religious' but will basically become your religion, your guide through life.


It's a real Nier: Automata vibe. The machines all chant "Become As God" as they try to sacrifice you.


TIL: When singing a license, ALL CAPS means chorus.


Suno.ai and the underlying technologies are really quite amazing. I've done a few things like:

* Put a poem my late mother wrote to music for her memorial

* <asian-language> versions of 80's new wave songs

and they came out so lovely compared to what I'd be capable of as a musician; it puts me in the role of a "producer" of sorts, tuning the sound and vibe. Really well worth the money.


Is there any information on how such songs are made? It is probably way more complicated to get a decent result than one might expect.


It's suno.ai (has a free trial), works much the same as image generation, you give it a description and it writes a song in a couple of seconds. Lyrics can be customized:

https://app.suno.ai/song/41fde9b6-a722-4c39-92dc-8a8296c018c...


Suno is pretty cool. If I had to guess, this uses Suno's Bark and Facebook's MusicGen? The output of the latter is used as conditioning layers for the prior, similar to ControlNet?

Anyway, what will be interesting is when this can be done locally on consumer hardware with open-source AI, a nice UI and Vulkan/DirectML GPU inference.
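
The instrumental half of that already runs locally with Meta's audiocraft. A sketch of MusicGen's documented API (note it doesn't do vocals, which is presumably where Bark/Chirp would come in):

    # pip install audiocraft
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")  # fits on consumer GPUs
    model.set_generation_params(duration=15)  # seconds of audio to generate

    # text-conditioned, instrumental-only generation
    wavs = model.generate(["sad girl piano ballad, slow tempo, sparse arrangement"])
    for i, wav in enumerate(wavs):
        audio_write(f"clip_{i}", wav.cpu(), model.sample_rate, strategy="loudness")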


To get a chorus right, I'm not sure LLM-type tech can accurately repeat the chorus it has made up before. Many songs have a heavily repeated chorus. An example is U2's One: "Is it getting better..." and then another chorus, "Did I disappoint you...".

Currently, generated songs are built like run-on sentences: you hear the entire song without much structure.


The chorus (or all-caps part) is now burnt into our EULA memories.

Wonder how long it will be until someone sings Mein Kampf, though.


The track stops abruptly before the last line. Has anyone continued [0] it to a proper end?

[0]: https://suno-ai.notion.site/FAQs-b72601b96de44e5cacd2cd6baa9...


The most impressive thing I've heard on Suno was a live performance: all the live-performance clichés, including the crowd singing along a cappella. It was unfuckingbelievable - and at the same time I can see how that can get burned out real quick by others replicating the same idea over and over.


Nice results! It reminds me of the Portal song about Aperture Science: https://www.youtube.com/watch?v=Y6ljFaKRTrI


Yes:

    The cake is a lie, but the music is real

    It’s all fake when the truth is revealed

The autotune electronic voice seems likely styled on GLaDOS.


