
Manipulate audio with a simple Python library - coppolaemilio
http://pydub.com/
======
stevetjoa
Thanks for making this.

Relevant plug: I have a Vagrant box [1] and GitHub repo [2] containing IPython
notebooks that we use in a workshop on music information retrieval. (Caution:
the IPython notebooks are under heavy development, i.e. incomplete. If you
have any feedback, please create a GitHub issue.) I just added pydub to the
latest version of the Vagrant box [3].

The nice thing about this setup is that, regardless of your host OS, everybody
has the same development environment so you don't have to go through the pains
of installing numpy, scipy, scikit-learn, essentia, and more.

[1] [https://vagrantcloud.com/stevetjoa/boxes/stanford-mir](https://vagrantcloud.com/stevetjoa/boxes/stanford-mir)

[2] [https://github.com/stevetjoa/stanford-mir](https://github.com/stevetjoa/stanford-mir)

[3] [https://vagrantcloud.com/stevetjoa/boxes/stanford-mir/versio...](https://vagrantcloud.com/stevetjoa/boxes/stanford-mir/versions/7)

~~~
jiaaro
You're welcome :)

Looks like a cool project!

------
Joeboy
A couple of things that people who like this might also be interested in:

Coursera.org is currently running a course called "Audio Signal Processing for
Music Applications" which I believe uses python. It's in its second week, so
you have time to catch up.
[https://www.coursera.org/course/audio](https://www.coursera.org/course/audio)

[http://aubio.org/](http://aubio.org/) is a library that does note onset
detection, pitch detection, beat / tempo tracking and various other things. It
has python bindings.

------
meowface
This looks really cool.

I'm a Python developer but have little experience with audio/music
processing. Is there some other software or DSL that can manipulate audio
files with a high-level syntax like this?

~~~
acbart
It reminds me of VirtualDub's scripting language, which is meant for videos
but could do audio, if I recall correctly. It's been about half a decade since
I used it, though...

~~~
meowface
What do professional music producers and mixers use? I imagine GUI-type mixing
software is still the most popular, but I feel like there must be some
experimental musicians out there producing music by fuzzing different
parameters in some kind of scripting or configuration language.

If there actually aren't, I kind of feel like there should be.

~~~
erikschoster
There are many DSLs for music! Here are a few:
[https://en.wikipedia.org/wiki/Audio_programming_language](https://en.wikipedia.org/wiki/Audio_programming_language)

Many of these are inspired by or even directly based on Max Mathews's work:
[https://en.wikipedia.org/wiki/MUSIC-N](https://en.wikipedia.org/wiki/MUSIC-N)

~~~
meowface
Thanks, these are really cool. These were the kinds of things I had in mind.
I'm tempted to experiment with one myself, but my music skills and knowledge
are non-existent. And I'm pretty sure those are much more essential than
programming skills when using these languages.

~~~
jholman
That doesn't sound like a reason to avoid experimenting. If you're tempted to
experiment, jump in! Be bold!

------
emillon
I've been similarly frustrated with DAWs too. In the last few weeks I had a go
at writing a DSL that shells out to SoX for audio manipulation. This way I
don't have to manipulate audio samples myself.

It's way less polished than pydub but here it is if anyone is interested:

[https://github.com/emillon/tmc](https://github.com/emillon/tmc)

Example:

[https://github.com/emillon/tmc/blob/master/Music/TMC/Example...](https://github.com/emillon/tmc/blob/master/Music/TMC/Example.hs)

At first it was a pretext to play with free monads, a way of building EDSLs.
But right now I'm not sure it's not just a complication. Though, having an
intermediate representation before executing the SoX commands makes it
possible to write an optimizer (for example, collapsing two audio shifts).
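
To illustrate the optimizer idea, here is a minimal sketch in Python rather than the Haskell free-monad encoding used in tmc. The `Shift` and `Gain` operations and the `optimize` function are hypothetical names invented for this example; they are not taken from tmc or from SoX. The point is only that once a program is plain data, adjacent shifts can be merged before anything is executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    """Hypothetical IR node: delay the audio by some number of seconds."""
    seconds: float


@dataclass(frozen=True)
class Gain:
    """Hypothetical IR node: change the level by some number of decibels."""
    db: float


def optimize(ops):
    """Collapse runs of adjacent Shift nodes into a single Shift."""
    out = []
    for op in ops:
        if out and isinstance(op, Shift) and isinstance(out[-1], Shift):
            # Two shifts in a row are equivalent to one shift by their sum.
            out[-1] = Shift(out[-1].seconds + op.seconds)
        else:
            out.append(op)
    return out


program = [Shift(1.5), Shift(0.5), Gain(-3.0), Shift(2.0)]
print(optimize(program))
```

The last `Shift` is left alone because a `Gain` sits between it and the merged one; this sketch only merges neighbours and does not try to reorder operations.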

~~~
conradfr
I know it's not available on Linux and not free, but have you tried Reaper? It
has lots of scripting / coding capabilities.

~~~
emillon
Interesting, I didn't know about these capabilities, thanks!

------
goldfeld
I'd be crazy for a library (any language) to distill a sound file into
discrete MIDI notes (or any notation), with configurable threshold levels.
Clustering those into different tracks by some unsupervised learning would be
a dream.

~~~
cannam
If what you're talking about is starting from an audio mix and estimating the
complete set of notes that produced it, you could try Silvet
([http://code.soundsoftware.ac.uk/projects/silvet](http://code.soundsoftware.ac.uk/projects/silvet)),
a (C++) implementation of a polyphonic note estimator from audio.

It's realised as a Vamp plugin which you can run in a host like Sonic
Visualiser to review the results, play them back, and export as MIDI. (I'm
involved with both projects.)

The general shape of this method, and of many related methods, is:

* convert audio to a time-frequency representation using some variation on the short-time Fourier transform

* match each time step of the time-frequency grid against a set of templates extracted from frequency profiles of various instruments, using some statistical approximation technique

* take the resulting pitch probability distributions and estimate what note objects they might correspond to, using simple thresholding (as in Silvet) or a Markov model for note transitions etc.
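
The three steps above can be sketched in pure Python on a synthetic signal. This is not Silvet's algorithm: as a stand-in for the statistical matching step it uses non-negative matrix factorisation with fixed templates and multiplicative updates, and every name and constant here (`SR`, `FRAME`, `NOTES`, `tone`, `transcribe`, the 0.5 threshold) is an assumption made for the example. The toy "pitches" of 400, 500 and 600 Hz are chosen to fall exactly on DFT bins so that no window function is needed.

```python
import cmath
import math

SR = 8000                      # sample rate in Hz (assumed)
FRAME = 400                    # 50 ms frames, i.e. 20 Hz frequency resolution
N_BINS = 100                   # keep DFT bins up to 2 kHz
NOTES = [400.0, 500.0, 600.0]  # toy pitches that sit exactly on DFT bins


def tone(freq, n, partials=(1.0, 0.5, 0.25)):
    """Synthesise n samples of a toy instrument with three harmonics."""
    return [sum(a * math.sin(2 * math.pi * freq * (h + 1) * t / SR)
                for h, a in enumerate(partials))
            for t in range(n)]


def spectrum(frame):
    """Magnitudes of a naive DFT over the first N_BINS bins of one frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(N_BINS)]


def stft(signal):
    """Step 1: time-frequency grid built from non-overlapping frames."""
    return [spectrum(signal[i:i + FRAME])
            for i in range(0, len(signal) - FRAME + 1, FRAME)]


def activations(v, templates, iters=30, eps=1e-9):
    """Step 2: fit v ~ sum_j h[j] * templates[j] with h >= 0,
    using multiplicative updates (NMF with the templates held fixed)."""
    h = [1.0] * len(templates)
    for _ in range(iters):
        approx = [sum(h[j] * w[k] for j, w in enumerate(templates))
                  for k in range(len(v))]
        h = [h[j] * sum(wk * vk for wk, vk in zip(w, v))
             / (sum(wk * ak for wk, ak in zip(w, approx)) + eps)
             for j, w in enumerate(templates)]
    return h


def transcribe(signal, threshold=0.5):
    """Step 3: threshold the activations and merge consecutive active
    frames into (pitch, first_frame, end_frame) note objects."""
    templates = [spectrum(tone(f, FRAME)) for f in NOTES]
    grid = [activations(v, templates) for v in stft(signal)]
    grid.append([0.0] * len(NOTES))  # silent sentinel frame closes open notes
    notes = []
    for j, freq in enumerate(NOTES):
        start = None
        for i, h in enumerate(grid):
            active = h[j] > threshold
            if active and start is None:
                start = i
            elif not active and start is not None:
                notes.append((freq, start, i))
                start = None
    return sorted(notes, key=lambda note: (note[1], note[0]))


# 400 Hz for two frames, 500 Hz for two frames, then a 400 + 600 Hz chord.
chord = [a + b for a, b in zip(tone(400.0, 2 * FRAME), tone(600.0, 2 * FRAME))]
signal = tone(400.0, 2 * FRAME) + tone(500.0, 2 * FRAME) + chord
for freq, start, end in transcribe(signal):
    print(f"{freq:.0f} Hz from {start * FRAME / SR:.2f}s to {end * FRAME / SR:.2f}s")
```

The 400 Hz and 600 Hz templates share a harmonic at 1200 Hz, so even in this clean toy case the matching step has to decide how much of that energy belongs to which note. Real recordings add noise, detuning and unknown instruments on top of that.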

Silvet is a useful and interesting implementation, but if you try it, you'll
also learn the limitations of current methods when used against complete
musical mixes. (Some of this is intrinsic to the problem -- the information
might not be there, and humans can't always transcribe it either.)

~~~
Joeboy
I've heard Melodyne solves this problem very successfully, and the demos look
impressive. Any idea what it's doing? Is it patented / secret / witchcraft? Or
just has more templates?

~~~
cannam
I don't have any worthwhile insight, I'm afraid. I expect it's partly
high-quality methods, partly a lot of refinement for common inputs and use cases.

Academic methods tend to be trying to work towards a very general problem such
as "transcribing a music recording". A tool intended for specific real users
can approach the problem from a perhaps more realistic perspective.

------
bravura
Cool project.

The echonest remix API has similar capabilities, and can quantize into
granularities of measures, beats, tatums.

[http://echonest.github.io/remix/](http://echonest.github.io/remix/)

------
r0muald
> last_5_seconds = song[5000:]

I think it should be song[-5000:]

EDIT: it was already reported
[https://github.com/jiaaro/pydub/issues/65](https://github.com/jiaaro/pydub/issues/65)
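
pydub slices audio in milliseconds using ordinary Python slice syntax, so the difference between the two forms can be shown without any audio at all. In this sketch a plain list with one element per millisecond stands in for a 20-second song; the list is a stand-in for illustration, not a pydub object.

```python
# One list element per millisecond stands in for a 20-second song.
song = list(range(20_000))

rest_after_5s = song[5000:]    # drops the first 5 s, keeps the remaining 15 s
last_5_seconds = song[-5000:]  # keeps only the final 5 s

print(len(rest_after_5s), len(last_5_seconds))
```

The two slices only coincide for a song that is exactly 10 seconds long, which is presumably how the typo went unnoticed.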

~~~
jiaaro
oh damn, updated the readme but not the dot-com.

Fixed. I really need to automate that :/

------
DarkIye
I started writing a sample-level audio suite in Java, for musical purposes:

[https://github.com/williamberg/audioDCB](https://github.com/williamberg/audioDCB)

which I'll probably discard because Java is not the right tool for the job.
Inspecting pydub/pydub/pyaudioop.py, I see there are methods for working on
the sample level. I'll make a mental note to come back to this project when I
get back on the computer music bandwagon again.

------
tdicola
Nice! I love little focused Python libraries like this. Will have to keep it
in mind if I ever do audio stuff in the future.

------
urs2102
What is the best way to break apart an MP3 file (similar to ID3Lib) so you can
see how it's encoded and edit the artist name, song name, add an image, change
the encoding format, etc?

Is there a good python library for re-encoding an MP3?

~~~
jpasden
This might do what you're looking for:

[http://eyed3.nicfit.net/](http://eyed3.nicfit.net/)

------
prezjordan
Cool idea! I like the overloaded operators.

~~~
mobiuscog
I would if they were consistent, but having some to change volume and some to
change repeats is somewhat jarring.

~~~
jiaaro
I _almost_ used a decibel type but in practice, when you write

    my_sound + 6

it feels pretty natural to think about that as "add 6dB" and

    my_sound * 3

as "my_sound 3 times"

Also, since those are the only odd overloads, it is quite easy to learn (at
least I think so, but I'm the author =P )
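
For readers curious how such overloads can be wired up, here is a toy sketch. The `Sound` class is hypothetical and works on a tuple of float samples; it is not how pydub's `AudioSegment` is implemented, and it only covers the two number overloads described above. It assumes the usual amplitude convention that a gain of `d` dB multiplies each sample by `10 ** (d / 20)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sound:
    """Toy stand-in for an audio clip: a tuple of float samples."""
    samples: tuple

    def __add__(self, db):
        # "sound + 6" means "6 dB louder": scale amplitudes by 10^(dB/20).
        factor = 10 ** (db / 20)
        return Sound(tuple(s * factor for s in self.samples))

    def __mul__(self, times):
        # "sound * 3" means "play the sound three times in a row".
        return Sound(self.samples * times)


my_sound = Sound((0.1, -0.2, 0.3))
louder = my_sound + 6
repeated = my_sound * 3
print(len(repeated.samples), louder.samples[0] / my_sound.samples[0])
```

Adding 6 dB comes out as roughly doubling the amplitude, which is part of why the `+ 6` reading feels natural to people used to mixing consoles.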

------
jestinjoy1
Suggestions on some cool academic projects that could use this library?

