
Show HN: Playing music with your voice and machine learning - rammo92
https://blog.buildo.io/humming-with-the-bot-254808879644
======
VeejayRampay
Not totally related, but I've thought for some time that a similar technique
could be applied to automatic subtitle synchronisation for media with
external subtitles. The system would read the sentences, automatically match
them to the relevant segments of the audio track, and adjust the timing
accordingly.
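
For what it's worth, the matching step can be sketched as a brute-force cross-correlation of two per-frame activity signals (everything here, including the `best_offset` name, is a hypothetical illustration, not an existing tool):

```python
def best_offset(speech, subs, max_shift=50):
    """Find the frame shift of the subtitle activity signal that best
    overlaps the detected-speech activity signal (both 0/1 per frame).
    A negative result means the subtitles should start earlier."""
    def overlap(shift):
        return sum(speech[i] * subs[i - shift]
                   for i in range(len(speech))
                   if 0 <= i - shift < len(subs))
    return max(range(-max_shift, max_shift + 1), key=overlap)
```

Real subtitle sync would also need to handle drift (different frame rates), not just a constant offset, but the constant-offset case already covers a lot of mis-timed rips.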

~~~
calippo
This sounds interesting. I searched the web for something similar and found
this: [https://substital.com/](https://substital.com/) (though it possibly
uses more naive techniques).

------
JepZ
Facebook-only logins suck.

~~~
Bromskloss
Why do I need to log in at all?

------
ThomPete
Turn this into an iPad app with instruments and you have me as a client.
Turn it into a plugin for Logic/Ableton etc. and I would be willing to pay
monthly for that. I have been wanting this for years.

I can play instruments and I know my music theory. Sometimes I am just looking
for easier, alternative ways to express myself. This, I think, would be
welcomed by many, just like the vocoder was.

~~~
calippo
Thank you... your comment was very inspiring :). I'm considering adding a way
to download the MIDI file; that's very low effort and would be of some use in
the short term.

~~~
ThomPete
Yeah, that would be amazing. I am literally working on a song where I would
love to play around with rhythm a little bit. As I do a pretty good human
beatbox, I would love to be able to explore a couple of things there.

As I said, I would buy it, and I would probably turn it into a hosted service
if I were you, as you probably need the cloud to do some of the computation?

Anyway, if nothing else I think it would have a future as a plugin or
service. And for a lot of novice music enthusiasts it would be a great little
iPad/tablet app.

~~~
calippo
We had an app on the App Store for a while, but it was a prototype for a
different use case. The idea was to let the user create loops; you can see an
example here:
[https://www.dropbox.com/s/03q02cdllsry2az/musiqueness.mov?dl...](https://www.dropbox.com/s/03q02cdllsry2az/musiqueness.mov?dl=0).

However, we thought it would need much more work to become a real product, and
we stopped working on it.

I'm really considering making a webapp for a simpler use case as you suggest.

~~~
ThomPete
Yeah, that app needs work, like quantization for both rhythm and scales and so
on. It could be a really powerful tool, I think.

Make a webapp at least, and allow me to export a format that's useful for Logic.

Then you can work on other ways to use the voice to inform the sounds and
other things. As I said, think about it like a VocoderMidi instrument/sequencer.
Something really useful can come out of that, IMO.
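
The quantization mentioned above is easy to sketch: snap detected pitches to the nearest scale degree and note onsets to the nearest grid line (a toy illustration with made-up function names; the C-major default is just an example):

```python
def snap_to_scale(midi_note, scale=(0, 2, 4, 5, 7, 9, 11)):
    # consider scale degrees in the neighboring octaves too, so a note
    # just below an octave boundary can snap upward correctly
    base = midi_note - (midi_note % 12)
    candidates = [base + s + 12 * k for s in scale for k in (-1, 0, 1)]
    return min(candidates, key=lambda c: abs(c - midi_note))

def quantize_time(ticks, grid=120):
    # snap an onset (in MIDI ticks) to the nearest grid line;
    # grid=120 is a 16th note at 480 ticks per beat
    return round(ticks / grid) * grid
```

In practice you would also want the snap strength to be adjustable, so a deliberately loose vocal performance isn't flattened into a robotic one.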

------
amelius
Another approach to making music without playing an instrument: [1] (scroll
down for video).

[1] [https://www.microsoft.com/en-us/research/project/mysong-
auto...](https://www.microsoft.com/en-us/research/project/mysong-automatic-
accompaniment-vocal-melodies/)

------
marknadal
You have made my dreams come true. Wow, some of the most exciting things I
have seen on HackerNews in the last decade have shown up just in the last
year. Thank you for building this!

What did you guys use? Tensorflow? Any chance it will be Open Sourced so we
can learn from it?

~~~
calippo
Thanks! It's mostly DSP, sound-processing algorithms. The idea was to
reimplement most of it using deep learning if we had the budget/chance; it
was meant to be a prototype.
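
For a flavor of what the DSP side can look like, here is a toy autocorrelation pitch estimator run on a synthetic hum (my own sketch, not the algorithm Reedom actually uses):

```python
import math

def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=500.0):
    # pick the autocorrelation lag (candidate period, in samples) with
    # the highest score inside the plausible fundamental range
    lo = int(sample_rate / fmax)
    hi = int(sample_rate / fmin)
    energy = sum(s * s for s in samples) or 1.0
    def score(lag):
        return sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag)) / energy
    best_lag = max(range(lo, hi + 1), key=score)
    return sample_rate / best_lag

# a hum-like test signal: 220 Hz sine sampled at 8 kHz
sr = 8000
hum = [math.sin(2 * math.pi * 220 * t / sr) for t in range(2048)]
```

Real voice input needs more care (octave errors, noise, vibrato), which is where techniques like YIN, or the deep-learning reimplementation mentioned above, come in.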

------
sarreph
This is really exciting — as someone who spends far too much time messing
around making electronic music, I was thinking about how cool it would be to
have a voice-to-instrument converter. After all, if a computer can create the
sounds of an orchestra or band with just your vocal beatboxing / humming chops
and a sprinkle of machine learning, then that's all you need to make the next
hit song, right?

Well — not trying to detract too much from this field — but, that's where I
think the expectations of grandeur, that some people will get about this
emerging technology, are kinda wrong. There are a few issues I can think of
that will need to be overcome (if even possible?) before this technology can
be considered groundbreaking.

From the 'person' side: (1) Vocal range and ability >> This one is going to
be the hardest, as most people can only sing within a couple of octaves, and
even then need quite a bit of 'tuning' help to identify the right notes /
intentions. (2) Musical theory >> Following on from the above, if you don't
have any musical theory knowledge (about what key you're supposed to be in, or
about chords, or progression), then a machine learning process would have to
fill in all the chords and textures / build-up. But that can really only help
you so much, unless you're just interested in the 'novelty' aspect of it.

From the 'machine' side: (3) 'Bum' notes and 'quantisation' >> I noticed from
the samples that the saxophone would produce flutter around a note, and also
the timings weren't 'on the beat' — but this should be quite trivial to fix.
(4) Expressiveness >> Right now, the instruments might be adjusting to the
input velocities (loudness), but if you want a life-like sax sound then you'll
need a level of expressiveness that makes it sound realistic... This can be
achieved with a mixture of better (available) virtual instrumentation, and
ML to figure out the best interpretation of the input's intentions.
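
The flutter in (3) really is often trivial to clean up, e.g. with a median filter over a frame-by-frame pitch track (a sketch assuming such a track exists; nothing here reflects the product's actual implementation):

```python
def smooth_pitch(track, width=5):
    # median-filter a frame-by-frame pitch track (Hz per frame) to
    # suppress one-frame flutter and octave glitches around a held note
    half = width // 2
    out = []
    for i in range(len(track)):
        window = sorted(track[max(0, i - half):i + half + 1])
        out.append(window[len(window) // 2])
    return out
```

The expressiveness problem in (4) is much harder, since it's exactly the micro-variation a median filter throws away that makes a sax line sound alive.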

I think the two key factors are musical theory and expressiveness. With
the former, I think anyone who has a good enough understanding of musical
theory probably wouldn't have much of a use for this other than a novelty
until it produces compelling, effortless results. This is because they're most
likely to have some kind of instrumental knowledge (keyboard most specifically
for virtual instruments) and music composition experience. The two groups this
kind of tech would really help are a) very good vocalists who have no
electronic composition experience but a good grasp of music theory — Michael
Jackson is a good example of this as this is how he made his song ideas[0],
and b) music producers / instrumentalists that want to 'fill-in' a lot of
parts quickly by humming / beatboxing — but again, they can probably do this
with a keyboard reasonably effortlessly. If the 'expression' is worked on a
lot, then it will become more interesting as expressive effects are hard to
get right without expensive hardware. [1]

As much as I appear to be 'bashing' this, I am reservedly excited — but can't
help but wonder whether there will actually be a worthwhile market for the tech!

[0] - [http://www.nme.com/blogs/nme-blogs/the-incredible-way-
michae...](http://www.nme.com/blogs/nme-blogs/the-incredible-way-michael-
jackson-wrote-music-16799) [1] - [http://www.musiciansfriend.com/keyboards-
midi/yamaha-wx5-mid...](http://www.musiciansfriend.com/keyboards-midi/yamaha-
wx5-midi-wind-controller)

EDIT: formatting

~~~
oddlyaromatic
The fella behind imitone ([https://imitone.com/](https://imitone.com/)) has
been grinding away at the details of this stuff for a while. I haven't checked
in on it recently, but I was a Kickstarter backer, and the last version I
loaded up and played with was pretty great.

~~~
calippo
We studied imitone before working on Reedom — it's a great and inspiring
project. The approach and use cases are slightly different, though. The main
difference is real-time use: imitone focuses mainly on real time, while we
didn't see much value in that. Our real goal was to let the user create
complete songs, using loops created with Reedom. We had a prototype iOS app
for that, but we didn't get very far.

~~~
oddlyaromatic
Oh cool, thanks for the response and explanation. For me, real time is a big
deal: I work with adults with developmental disabilities, and anything that
helps with having fun jamming with me and my musician friends is really
interesting. I had fun combining imitone (or other inputs) with loops created
on iOS with Figure. It's a really interesting space, and thank you for sharing
your work.

------
catshirt
the percussion mode is really cool. it would be awesome if it could work with
finger tapping (something like TableDrum).

