Also, I understand JSON is very easy to use, etc. But those big dumps cry out for a binary format. Or at least add zlib/lzma compression so people don't waste bandwidth on uuencoded binary data in JSON.
The code data is compressed using zlib (and then base64'd). It's all on S3. In our experience, big data dumps like this get relatively little traffic after the original hype dies down, so a torrent doesn't make sense. We're pretty sure Amazon can handle the load :)
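For anyone who wants to see what "zlib, then base64" looks like in practice, here is a minimal round-trip sketch. It assumes the standard base64 alphabet and a JSON record with hypothetical `track_id` and `code` fields; the actual dump may use a different base64 variant or field names, so check the real files before relying on this.

```python
import base64
import json
import zlib


def encode_code(raw: bytes) -> str:
    """Compress with zlib, then base64-encode so the bytes fit in a JSON string."""
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_code(encoded: str) -> bytes:
    """Reverse of encode_code: base64-decode, then zlib-decompress."""
    return zlib.decompress(base64.b64decode(encoded))


# Hypothetical record layout; the field names are assumptions for illustration.
payload = b"0123456789" * 100
record = {"track_id": "TRHYPOTHETICAL", "code": encode_code(payload)}

# Round trip through JSON text, as a dump line would be stored.
line = json.dumps(record)
restored = decode_code(json.loads(line)["code"])
```

Base64 adds roughly a third on top of the compressed size, so the overhead relative to a raw binary format is real but bounded.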
The distinctions: we handle small pieces of audio (20s) from anywhere in the song, ours works "over the air" via microphone, and we have a huge database via our content partners.
The codegen is very different (instead of relying on Echo Nest chroma, it does its own onset detection), but the back end is almost the same.
We are working on something that will benefit both parties (us and you guys) immensely. Is there a way to contact you? There's no info on your HN profile.