Relevant plug: I have a Vagrant box and GitHub repo containing IPython notebooks that we use in a workshop on music information retrieval. (Caution: the IPython notebooks are under heavy development, i.e. incomplete. If you have any feedback, please create a GitHub issue.) I just added pydub to the latest version of the Vagrant box.
The nice thing about this setup is that, regardless of your host OS, everybody has the same development environment so you don't have to go through the pains of installing numpy, scipy, scikit-learn, essentia, and more.
Looks like a cool project!
Coursera.org is currently running a course called "Audio Signal Processing for Music Applications", which I believe uses Python. It's in its second week, so you have time to catch up. https://www.coursera.org/course/audio
http://aubio.org/ is a library that does note onset detection, pitch detection, beat / tempo tracking and various other things. It has python bindings.
I'm a Python developer, but I have only a little experience with audio/music processing. Is there some other software or DSL that can manipulate audio files with a high-level syntax like this?
If there actually aren't, I kind of feel like there should be.
Many of which are inspired by or even directly based on Max Mathews's work: https://en.wikipedia.org/wiki/MUSIC-N
Just found this example song apparently coded live in Overtone: https://soundcloud.com/meta-ex/spiked-with-recursive-dreams
Not too much documentation right now, and there's barely any example usage. Think it's about time to remedy that! I have plans for an open-source album created with it, but it's just an idea at this point.
I've been dogfooding it for my own music since I started working on it, here's a recent-ish album made with it: http://music.hecanjog.com/album/solos-for-unattended-compute...
It's way less polished than pydub but here it is if anyone is interested:
At first it was a pretext to play with free monads, a way of building EDSLs, though right now I'm not sure it isn't just a complication. Still, having an intermediate representation before executing the SoX commands makes it possible to write an optimizer (for example, collapsing two audio shifts into one).
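For illustration, here's what that idea might look like in Python (the project itself is Haskell, and none of these names are its real API): build the pipeline as plain data, run a peephole pass over it, then lower it to SoX arguments.

    # Hypothetical sketch of an IR with a peephole optimizer;
    # the op names here are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class Shift:
        ms: int  # delay the audio by this many milliseconds

    def optimize(ops):
        """Collapse adjacent shifts into a single shift."""
        out = []
        for op in ops:
            if out and isinstance(op, Shift) and isinstance(out[-1], Shift):
                out[-1] = Shift(out[-1].ms + op.ms)
            else:
                out.append(op)
        return out

    def to_sox_args(ops):
        """Lower the IR to SoX effect arguments (a shift becomes 'pad')."""
        args = []
        for op in ops:
            if isinstance(op, Shift):
                args += ["pad", str(op.ms / 1000.0)]  # pad takes seconds
        return args

    # Two shifts collapse into a single 'pad 0.5' instead of two pads.
    print(to_sox_args(optimize([Shift(200), Shift(300)])))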
relevant comment in the audioop source: https://github.com/python-git/python/blob/master/Modules/aud...
It's realised as a Vamp plugin which you can run in a host like Sonic Visualiser to review the results, play them back, and export as MIDI. (I'm involved with both projects.)
The general shape of this method, and of many related methods, is as follows (a rough sketch in code follows the list):
* convert audio to a time-frequency representation using some variation on the short-time Fourier transform
* match each time step of the time-frequency grid against a set of templates extracted from frequency profiles of various instruments, using some statistical approximation technique
* take the resulting pitch probability distributions and estimate what note objects they might correspond to, using simple thresholding (as in Silvet) or a Markov model for note transitions, etc.
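A rough sketch of those three steps in Python (not Silvet's actual code; the template matrix and the threshold are placeholders you'd have to supply):

    # Illustration only: STFT, template matching via non-negative least
    # squares, then thresholding. `templates` is (freq_bins, n_pitches),
    # one column per instrument pitch profile.
    import numpy as np
    from scipy.optimize import nnls
    from scipy.signal import stft

    def transcribe(samples, sr, templates, threshold=0.1):
        # 1. time-frequency representation: magnitude STFT
        _, _, Z = stft(samples, fs=sr, nperseg=2048)
        S = np.abs(Z)  # shape (freq_bins, time_steps)

        # 2. match each time step against the instrument templates
        activations = np.column_stack([nnls(templates, frame)[0]
                                       for frame in S.T])

        # 3. simple thresholding: which pitches are "on" at each step
        return activations > threshold  # shape (n_pitches, time_steps)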
Silvet is a useful and interesting implementation, but if you try it, you'll also learn the limitations of current methods when used against complete musical mixes. (Some of this is intrinsic to the problem -- the information might not be there, and humans can't always transcribe it either.)
Academic methods tend to work towards a very general problem, such as "transcribing a music recording". A tool intended for specific real users can approach the problem from a perhaps more realistic perspective.
I simply don't know where to start and have not had the incentive to find out. It's probably as laughable to others as, say, someone turning up and saying "yeah, SQL - I sort of understand it's to do with tables, right?" But audio and video seem like closed worlds of programming. There seems to be no gateway from here to there.
So something like this is very exciting - it might be the gateway drug.
And being good at it means writing your own ticket, if you're a careerist.
1. Write some basic synthesis code (a naive oscillator and a volume envelope) and some way of sequencing it. Write little toy trackers and some procedural audio with this technique. (See the sketch after this list.)
2. Try and fail a few times to write a dataflow engine for audio like Pure Data. (This is kind of a big project, and it turns out you really don't need it to experiment.)
3. Write a Standard MIDI File playback system for a one-voice PC beeper emulation. This turned out to be a gateway drug for learning more in depth, because all you have to do is add "just one more" feature, and every MIDI file you play sounds a little better.
4. Expand the MIDI playback and the synthesizer in tandem. End up with a polyphonic .WAV sampler, then SoundFont playback. Learn everything about DSP badly, and gradually correct misconceptions. (DSP material can be tricky, since the math concepts map to illegible code and a lot of the resources are aimed at general engineering applications instead of audio.)
5. Rewrite things and work on a custom subtractive synthesizer, after getting everything wrong a few times over in the first synthesis engine. I still don't know how to write nice-sounding IIR filters; I steal a good free implementation instead.
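For step 1, a toy version might look like this (a naive sine oscillator with a linear decay envelope, sequenced note by note into a 16-bit WAV; every number here is an arbitrary choice):

    import math
    import struct
    import wave

    SR = 44100  # sample rate in Hz

    def tone(freq, dur):
        n = int(SR * dur)
        for i in range(n):
            env = 1.0 - i / n  # linear decay volume envelope
            yield env * math.sin(2 * math.pi * freq * i / SR)

    # the world's smallest sequencer: notes played back to back
    samples = []
    for freq in (440.0, 493.88, 523.25):  # A4, B4, C5
        samples.extend(tone(freq, 0.5))

    with wave.open("toy.wav", "w") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(SR)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767))
                               for s in samples))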
And that is where I stand today. I know enough to engineer complex signal chains, how some of the common formats are structured, and some tricks for improving sound quality and optimizing CPU time; what I lack is the background for writing original signal processing algorithms, which gets really specialized (there are some people who devote themselves entirely to reverbs, for example). These algorithms and their sound characteristics are effectively trade secrets, so the opaqueness of the field is not just a matter of the problems being hard - DSP just hasn't become as commodified as other software.
I think I will give it a go shortly - close your ears :-)
1. write an MPEG-4 parser -- this is much simpler than you probably think (see the sketch after this list);
2. decode the H.264 metadata;
3. decode the H.264 picture data and write it to files, one per frame -- do not be ashamed to use an existing decoder!
4. put these frames back into an MJPEG, for instance;
5. try your hand at developing a DEAD SIMPLE I-frame only video codec, using e.g. zip for the frames.
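As a taste of step 1, walking the top-level boxes of an MP4 takes only a few lines of Python (this ignores 64-bit box sizes and doesn't recurse into containers like 'moov', so treat it as a starting point):

    # Each box is a 4-byte big-endian size followed by a 4-byte type tag.
    import struct
    import sys

    def walk_boxes(path):
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break
                size, kind = struct.unpack(">I4s", header)
                print(kind.decode("ascii", "replace"), size)
                if size < 8:  # 0 = "rest of file", 1 = 64-bit size
                    break
                f.seek(size - 8, 1)  # skip the box payload

    walk_boxes(sys.argv[1])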
This could teach you whether you are interested in video, without too much conceptual overhead. I had a friend and coworker who did #5 in Ruby, so don't think you need to get into GPU vectorization and signals theory right away.
I'm not aware of any open libs for this task, though. I'm not really sure how you would go about it either. Something with wavelets would be my first guess? There is a wavelet lib for Python. You'd have to determine the correspondence between wavelet scale and MIDI note frequency.
This assumes audio tracks are separated. Separating mixed tracks seems like an even bigger can of worms.
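At least the note side of that correspondence is standard: in equal temperament, MIDI note n is 440 * 2^((n - 69) / 12) Hz.

    # Standard equal-temperament MIDI tuning: note 69 = A4 = 440 Hz.
    import math

    def midi_to_hz(note):
        return 440.0 * 2.0 ** ((note - 69) / 12)

    def hz_to_midi(freq):
        return round(69 + 12 * math.log2(freq / 440.0))

    assert midi_to_hz(69) == 440.0   # A4
    assert hz_to_midi(261.63) == 60  # middle C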
The Echo Nest remix API has similar capabilities and can quantize at the granularity of measures, beats, or tatums.
I think it should be song[-5000:]
EDIT: it was already reported: https://github.com/jiaaro/pydub/issues/65
Fixed. I really need to automate that :/
which I'll probably discard, because Java is not the right tool for the job. Inspecting pydub/pydub/pyaudioop.py, I see there are methods for working at the sample level. I'll make a mental note to come back to this project when I get back on the computer music bandwagon again.
Is there a good Python library for re-encoding an MP3?
my_sound + 6
my_sound * 3
Also, since those are the only odd overloads, it's quite easy to learn (at least I think so, but I'm the author =P)
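To spell those out alongside the other documented operators (the file name is a placeholder):

    from pydub import AudioSegment

    song = AudioSegment.from_mp3("song.mp3")

    louder = song + 6          # adding a number applies gain: 6 dB louder
    quieter = song - 3         # 3 dB quieter
    looped = song * 3          # multiplying repeats the clip three times
    medley = quieter + louder  # segment + segment concatenates instead
    last_five = song[-5000:]   # slicing is in milliseconds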