Hacker News

As with Stable Diffusion, text prompting will be the least controllable way to get useful output from this model. I can easily imagine MIDI being used as an input with ControlNet to essentially get a neural synthesizer.



Yes. Since working on my AI melodies project (https://www.melodies.ai/) two years ago, I've been saying that producing a high-quality, finalized song from text won't be feasible or even desirable for a while, and it's better to focus on using AI in various aspects of music making that support the artist's process.


Text will be an important input channel for texture, sound type, voice type, and so on. You can't use only input audio; that defeats the point of generating something new. You also can't use only MIDI; the model still needs to know what sits behind those notes: what performance, what instrument. So we need multiple channels.


Emad hinted here on HN the last time this was discussed that they were experimenting with exactly that. It will come quickly, either from them or from someone else.

Text prompting is just a very coarse tool to quickly get some base to stand on; ControlNet is where human creativity enters again.


Yeah, we built ComfyUI, so you can imagine what is coming soon around that.

Need to add more stuff to my Soundcloud https://on.soundcloud.com/XrqNb


For music, perhaps. For sound effects, I think text prompting is a rather good UI.


A ControlNet/img2img-style workflow, where you mimic a sound with your mouth and it then makes it realistic, could also be usable.


I think it would be ideal if it could take an audio recording of humming or singing a melody, together with a text prompt, and spit out a track that resembles it.


1. Do your humming and pass it to something like Stable Audio with ControlNet

2. Convert/average the tone for each beat to generate something resembling a music sheet

3. Use a vocaloid with LLM-generated lyrics based on your prompt (or just put in your own lyrics) and pass in the music file

4. Combine steps 1-3

Would love to see this


But it works great when you don't need much control. Example prompt: "Free-jazz solo by tenor saxophonist, no time signature."


What other inputs besides text prompting are there for SD? Are you referring to img2img, ControlNet, etc.?


It's crazy that nobody cares. It seems to me that ML hype trends focus on denying skill and disproving creativity by denoising random noise into outputs indistinguishable from human work, and to me this whole chain of negatives hasn't proven its worth.


Generative models allow people without certain skills to be creative in forms of art that would otherwise be inaccessible to them.

With DALL-E, I can get an image of something I have in my head without investing hundreds of hours into watching Bob Ross (which I do anyway).

With audio generators, I can produce music that is in my head without learning to play an instrument or paying someone to do it. I still have to arrange it correctly, but I can put out a techno track without spending years learning the intricacies.



