Hacker News
Stable Audio 2.0 (stability.ai)
59 points by meetpateltech 9 months ago | 34 comments



It's very impressive. Some of the sample tracks are very nice to listen to.

However, I can't help but feel like it's missing something. Perhaps the reason we listen to human made music is to feel what the artist wants us to feel. There are intentions behind every beat and every tone in a human created track.

Maybe it's just a matter of getting used to listening to AI music.


I don't think your reason is valid at all.

I have basically 0 background knowledge of music and still enjoy it.

Do you know the story of the writer and his son? He wrote a book, and the book was used at his son's school. The son wrote an analysis of the book and got a bad grade because the teacher disagreed with the author's son's interpretation of his father's book.

If an artist doesn't explicitly tell us what they want us to feel, it's not something we can pick up through magic.

I liked the cinematic synthwave, but you know, it's a demo page. It's not embedded in a game, a movie, or anything else.


There's a better version of that story, written by Asimov, where a physicist at a party tells an English professor about a time machine they created that can bring famous people from the past to the present. He used it to bring William Shakespeare to the present, and Shakespeare took the English professor's class on Shakespeare's plays. He had to send Shakespeare back, though, because Shakespeare was angry that he failed the class on his own plays.


What is that story supposed to mean?

I liked the cinematic synthwave track too - much better than what a lot of human creators do.

But… the fact that it can generate a pleasant electronica tune is essentially meaningless. This is no Miles Davis.

An LLM generated narrative cannot convey the emotions and experiences of the author. Nor can AI generated music.


My point was that the interpretation or 'feeling' you get doesn't have to align with what Miles Davis intended to induce.

Therefore it doesn't really matter what Miles Davis was planning. If Miles Davis never tells you exactly what he felt and what he thought you would feel, but you still enjoy the music, then you don't need a Miles Davis, as long as you enjoy the music.


I don't think you're getting my point.

The point is that human-created songs have some sort of "soul" to them. I don't know why the human artist chose this lyric or this beat, but some thought was put into it.

It's why I appreciate human art over AI generated art.


I think I got your point.

And I still argue that this 'soul' thing doesn't exist and doesn't matter.

Music becomes popular not because it has some magic soul but because it resonates with enough people at the right time.

And covers are often also super popular, and I myself was not always aware that a cover was a cover. That also indicates to me that some music wasn't magic because of its soul, but because the search space for interesting/intriguing music is HUGE and it takes time for us to walk through it.

Edit: Nonetheless, I do count myself among the people who value a thing more IF it has more personality, like knowing that a thing is handmade and very well built versus the same thing machine-built.

But this is appreciation, and a luxury. If I didn't have the money for that, I wouldn't say the mass-produced thing is bad just because of how it was made.


This same argument will henceforth be made about anything created by AI, and regardless of how ubiquitous and capable AI becomes, it will never be invalidated.

AI-generated output will never have the soul of human-created art, by definition. The soul is the intangible and inimitable kernel of emotion buried in the art by virtue of its being created by a human. It's what separates art from 'output'.

Of course, that's not to say AI content can't be enjoyed, but I think as a society we will need to be increasingly mindful of cultivating and appreciating this soul wherever we can. Endless-scrolling apps have shown that we humans easily get sucked into becoming content consumers, and AI is particularly well poised to generate limitless content.

This is uncharted territory for our spongy, dopamine-seeking brains. Self-awareness of our own consumption is more important than ever.


I think that's probably right too. It's also important to remember that right now most of us are not really artists trying to tell our stories, move people, or call to people with our art. Most people playing with this stuff right now are tech-heads faffing around with cool new toys. In the future, I don't know how I'll feel, but I suspect that when I just want audible distraction, I'll be fine with an infinite loop of AI-generated, inoffensive tropical house. That won't be the same as seeking out or discovering something a human made; I don't know that I'll care about the tools they used to make it, but I'll still be interested to learn why they made it, how they made it, etc.

I went to art school in the early 2000s and lived through the second wave of the analogue-to-digital change in the motion picture industry. I was somewhat worried about AI in art, but these days I'm changing my mind. I think it will be fine, even good, and will push the meaning of art even further; that's always good. Now I'm excited. Humans will always have emotions. Artists will always make art. Humans will always seek out other humans' emotions through art.


It’s also worth pointing out that a lot of current “art” is produced under similar constraints to AI art - for commercial purposes and with algorithmic feedback. It’s easy for people to think ‘this is just as good as a human’, when they are only comparing it to soulless corporate art in the first place.


A sense of intentionality vs spontaneous cut-up forms

Eventually people will say they can see thoughtful attention in AI art. Not today


> intentions behind every tone

Or at least an interesting process. I didn't appreciate this until I started making tracks myself. Music that's simple to notate, once interpreted by a human, gains more information than could possibly fit in the score.

I don't have reason to think AI can't accomplish the same thing eventually, but I haven't seen it happen yet.


This, with audio-to-audio, style transfer, and all the elements to come, likely put into a ComfyUI-style format, is just the beginning for inspiration and augmentation tech.

The team isn't that interested in "Taylor Swift by Drake" or whatever, but is steadily building up tools for musicians to customise to their own libraries (the open version of this tunes super well), and then to have control and composition over it all.

Plus making sure the licensing is all good, versus other audio tools that plan to ask for permission later (which is a very bad idea imo).


You sound like someone who's not a musician.

There are ways to 'humanize' music: messing around with rhythms/syncopation, velocity, distortion, etc. Not all of it is intentional; quite a lot of it is automated. Yet it feels no less 'human'. You would be a fool to think that human-created music must have "intentions behind every beat and every tone". That's not how it works, as any musician, arranger or composer can attest.
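To make that concrete, here's a toy sketch of the kind of automated humanizing pass DAWs offer. The function name and jitter values are illustrative, not any DAW's actual algorithm; it just applies small random timing and velocity offsets to note events:

```python
import random

def humanize(notes, timing_jitter=0.01, velocity_jitter=8, seed=None):
    """Nudge (onset_seconds, velocity) note events off the rigid grid,
    mimicking the automated 'humanize' option found in most DAWs."""
    rng = random.Random(seed)
    humanized = []
    for onset, velocity in notes:
        onset += rng.uniform(-timing_jitter, timing_jitter)    # timing drift
        velocity += rng.randint(-velocity_jitter, velocity_jitter)
        velocity = max(1, min(127, velocity))                  # clamp to MIDI range
        humanized.append((onset, velocity))
    return humanized

# A rigid eighth-note pattern, every hit exactly on the grid at velocity 100:
grid = [(i * 0.5, 100) for i in range(8)]
loose = humanize(grid, seed=42)
```

No intention behind any individual offset here, yet the result sounds less mechanical, which is the commenter's point.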

I suspect that AI can write music far better than even humans can, and more consistently than AI visual art, given how music tends to be repetitive and more mathematically regular.


This is really neat. I didn't see mention of copyright on the output audio. They talk a good bit about how they handle the input audio's copyright (which is great they mention that), but I didn't see anything about the output. Do you own it? Do they own it? I may have just missed it, but that's an important piece here IMO.


That's very much up in the air with generative AI output. The US Copyright Office has claimed so far that there's just no copyright in model output, though that's far from settled one way or the other. If there is copyright, who owns it would be yet another question.

Sound is even a little trickier than other domains: the composition of a song and its embodiment in a specific recording have separate copyrights. Do both necessarily get treated the same?


"Subject to the remainder of these Terms, we encourage Basic Tier Users to employ the Services to make creative, non-commercial use of Content ... Unlike Basic Tier Users, paid tiers of users may use the Services to generate Content for some (but not all) commercial uses".


What a neat prototype.

I gave it a beat I made 10 years ago; the original source files are long gone. Set it to 90%, and it sounded like someone put a stereo in a washing machine.

I can't help but think this can't really work without a larger dataset. Still, I'll probably buy a subscription.

I might see if I can automate using ChatGPT to summarize the morning news, generate a beat reflecting it, and then upload a visualization of the music to YouTube (or another service, don't want to piss off the Google gods).

Edit: After playing around with it for 30 minutes, nothing sounds particularly great. I can make something better than this in an hour or so. Let's see where we're at next year.
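The news-to-beat automation idea could be glued together roughly like this. Every step function below is a hypothetical stand-in for a real API call (an LLM summarization call, a text-to-audio call, a video upload call); only the orchestration is shown:

```python
def run_daily_pipeline(fetch_headlines, summarize, generate_beat, upload):
    """Chain the steps: headlines -> summary -> generated beat -> uploaded video."""
    headlines = fetch_headlines()
    summary = summarize(headlines)           # e.g. an LLM summarization call
    audio_path = generate_beat(summary)      # e.g. a text-to-audio generation call
    return upload(audio_path, title=f"Daily news beat: {summary[:60]}")

# Wire it up with stand-ins just to show the flow; swap in real clients later.
url = run_daily_pipeline(
    fetch_headlines=lambda: ["Markets rally", "New audio model released"],
    summarize=lambda hs: "; ".join(hs),
    generate_beat=lambda text: "/tmp/beat.wav",
    upload=lambda path, title: f"uploaded {path} as {title!r}",
)
```

Scheduling it each morning would then just be a cron job around `run_daily_pipeline`.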


> "Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators."

This is how you do it. Well done, Stability AI!


Again, technically impressive, but extremely mediocre outputs. I listened to all the samples and they're just nothing. A competent modern electronic musician would do a better job, or at the very least get a vastly more interesting result trying, for all of them. The actual artistic potential in this is in how it breaks, not how it works. I'm sure future versions will come with higher fidelity, but clearly still zero artistry.


As someone who doesn't know much about music, I found the samples interesting. Sadly for the actual competent musicians, I'm not an outlier... We won't notice the difference when videos, ads, movies and elevators are padded with these instead of human-made ones.


They're interesting, as far as that goes. I'm sure the technology will improve and will start to be used by musicians and yes, if used well, people won't notice. If the result is fine, it will be acceptable by most. I'm absolutely sure that anyone listening to something lesser, though, say something generated by a non-musician and not worked at all, will at the very least notice something off or lower-quality, even if they can't explain it better.


Unfortunately not open source. I really hope StabilityAI doesn't go the OpenAI route but I don't want them to fail either.


The irony of this tweet is growing ever more rapidly, I see.

>Emad (Ex-Stability CEO, 2023/11/21)

>Not your models not your mind

>#decentralizeAI

>https://twitter.com/EMostaque/status/1727091529440022641


The Audio-to-Audio samples sound more like the model trying to construct a similar enough instrumental and vocoding the difference using the original sample rather than drums and strums being generated identically to match the intent. Still interesting though, and I certainly imagine we're not far off from improved versions.


I tend to listen to synthwave while coding, as for me it has the right mix of energy and being easy to ignore. I have felt my playlists have become a bit stale and have been looking to add new stuff. Now it feels like this could just generate an infinite "good enough" playlist.


Stability made a 24/7 stream for this: https://www.youtube.com/watch?v=k04RGJBHTv8


The sound samples are cool but I'm failing to generate anything very interesting using their site.


Anyone know if there's a ComfyUI style interface or similar for the audio models?


space elevator music


Not sure if you mean "space-elevator music" (music to listen to on a space elevator) or "space elevator-music" (an elevator in space) but I spent too long thinking about it and decided I'm fine with answer being either. Also I kind of hope I live to see a "space elevator" but I'll also settle for an elevator in space.


I think they meant music that elevates space. Like the auditory equivalent to helium.


I am picturing a ToeJam & Earl type situation, but that music was way better.


It seems to have issues for me (Safari), at least this morning, complaining about being unable to get a user. Maybe too many people are trying it. Looks like a 500 error.



