
Audio processing in TensorFlow - dariocazzani
https://medium.com/towards-data-science/audio-processing-in-tensorflow-208f1a4103aa
======
rryan
Anyone interested in machine learning for audio (or any other signals) should
keep a close eye on tf.contrib.signal. I just checked in some code to do
efficient framing, windowing and overlap-add. In a day or so I'll be checking
in an STFT and inverse STFT with GPU and gradient support, so you can make an
STFT part of learning (not just an input pre-processing step). Please file
bugs / feature requests and feel free to CC me (also rryan on github).
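
For readers unfamiliar with the terms, here is a rough pure-Python sketch of what framing, windowing and overlap-add mean. It is not the tf.contrib.signal code; the function names and parameters (`frame`, `hann_window`, `overlap_add`, `frame_length`, `frame_step`) are made up for illustration, and it assumes a 1-D signal with no padding.

```python
import math


def frame(signal, frame_length, frame_step):
    """Split a 1-D sequence into overlapping frames (hypothetical helper, no padding)."""
    if len(signal) < frame_length:
        return []
    num_frames = 1 + (len(signal) - frame_length) // frame_step
    return [
        signal[i * frame_step : i * frame_step + frame_length]
        for i in range(num_frames)
    ]


def hann_window(length):
    """Periodic Hann window; copies shifted by length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def overlap_add(frames, frame_step):
    """Sum frames back into one sequence, each shifted by frame_step samples."""
    if not frames:
        return []
    frame_length = len(frames[0])
    out = [0.0] * (frame_step * (len(frames) - 1) + frame_length)
    for i, f in enumerate(frames):
        for j, value in enumerate(f):
            out[i * frame_step + j] += value
    return out


# Usage: frame a ramp, apply the window to each frame, then overlap-add.
signal = [float(i) for i in range(16)]
frames = frame(signal, frame_length=8, frame_step=4)
window = hann_window(8)
windowed = [[s * w for s, w in zip(f, window)] for f in frames]
reconstructed = overlap_add(windowed, frame_step=4)
```

With a periodic Hann window and 50% overlap, the samples covered by two frames come back unchanged, while the first and last half-frame stay tapered because only one frame covers them.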

~~~
guscost
Just to confirm, this is being developed in the main TensorFlow repo?

~~~
rryan
Yup! The code is here:
[https://github.com/tensorflow/tensorflow/tree/master/tensorf...](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/signal)

~~~
guscost
Awesome, thanks!

------
gaius
I've been looking for something like this - I want to do some serious research
into my cats' meowing. One of them will walk into a room, make eye contact, then
emit a complicated sequence of meows, chirrups and squeaks. She is obviously
trying to say _something_ and it's my job to find out what.

------
anigbrowl
I find it perplexing that people are still using a serial text-based IDE to
write code for something called 'Tensor _Flow_'. It's worth looking into
Reaktor and CoreDSP from Native Instruments to see how audio processing tasks
(and indeed neural nets) can be handled in a flow-based IDE.

The landing page is aimed at musicians but if you dig down there's extensive
documentation on how to do low-level DSP processing:
[https://www.native-instruments.com/en/products/komplete/synt...](https://www.native-instruments.com/en/products/komplete/synths/reaktor-6/) and
[https://www.native-instruments.com/fileadmin/ni_media/downlo...](https://www.native-instruments.com/fileadmin/ni_media/downloads/manuals/REAKTOR_6_Building_in_Core_English_2015_11.pdf)

Yes, there's a learning curve to flow-based programming. But this is what all
IDEs will eventually look like. You shouldn't be writing everything in code
for the same reasons you shouldn't be writing your whole project in assembler:
your reinvention of the wheel is probably not as great as you think; every
time you forsake a standard modular component in favor of your own way of
doing it, you're creating technical debt for whoever has to parse your code
later; and a lot of the actual work in coding is syntax, glue, and scope checking,
and those jobs are _better done by a computer_ - making people type out all
that stuff by hand is a distraction from the domain-specific problem to be
solved.

Music software may seem like a toy to non-musicians/sound engineers but for
them it's mission-critical real-time software with an extremely demanding user
base that frequently has advanced domain knowledge of its own. Tools like
Reaktor are built on the foundations of earlier tools like Pd and
SuperCollider, which in turn were built on text-based languages like Csound.
People have been building neural nets and suchlike on these platforms for over
a decade already.

~~~
JosephRedfern
Do you think such tools are mature enough for wide adoption yet? How do, for
instance, version control and collaborative development fit into these flow-
based tools?

> every time you forsake a standard modular component in favor of your own way
> of doing it, you're creating technical debt for whoever has to parse your code
> later

I don't disagree that you should avoid reinventing the wheel where possible, but
modularity is absolutely not unique to flow-based programming. Well-designed
libraries should provide good levels of abstraction and be highly modular.
Flow-based programming might stop people from shooting themselves in the foot,
but I'm not sure if the cost to flexibility makes it worth it in every case.

~~~
anigbrowl
Yes, look into FlowStone or NoFlo. Tools like this are very common in many
niche domains; it takes some mental effort to pull back and see the
limitations of path dependency in general purpose programming, and frankly an
awful lot of people are ego-invested in their current way of doing things with
no thought to long-term accessibility and maintainability issues.

