From my brief time at last.fm, industry standards are a huge mess. I think we actually had our own program for doing at least some of the conversions, but it was just a couple of hundred lines calling standard libraries; command-line ffmpeg would not be unreasonable.
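As a rough illustration of the "just call command-line ffmpeg" approach, here is a minimal Python sketch. The function names `build_ffmpeg_cmd` and `transcode` are hypothetical, and the 128k default bitrate is an arbitrary assumption; it only uses standard ffmpeg flags (`-y`, `-i`, `-b:a`) and lets ffmpeg pick the output format from the destination file's extension.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(src: str, dst: str, bitrate: str = "128k") -> list[str]:
    # Hypothetical helper. -y: overwrite output without prompting;
    # -i: input file; -b:a: target audio bitrate (128k is an assumed default).
    # ffmpeg infers the output format from dst's file extension.
    return ["ffmpeg", "-y", "-i", src, "-b:a", bitrate, dst]


def transcode(src: str, dst: str, bitrate: str = "128k") -> None:
    # Fail early with a clear message if ffmpeg isn't installed.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_ffmpeg_cmd(src, dst, bitrate), check=True, capture_output=True)
```

For example, `transcode("track.flac", "track.mp3")` would shell out to `ffmpeg -y -i track.flac -b:a 128k track.mp3`.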

And yeah, it's an actor-like model, except you don't really have a concurrency problem, since each encode job is independent. You just need some kind of task queue that you add encode jobs to, and a bunch of worker machines that take tasks off this queue, run them, and report back the results. Almost every big system seems to start looking like this.
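A minimal single-process sketch of that queue-plus-workers shape, assuming Python's `queue.Queue` and threads stand in for a real job broker and separate worker machines. `EncodeJob`, `fake_encode`, and `run_jobs` are hypothetical names, and `fake_encode` only computes an output filename instead of doing real encoding.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class EncodeJob:
    job_id: int
    src: str
    dst_format: str


def fake_encode(job: EncodeJob) -> str:
    # Hypothetical stand-in for the real work (e.g. shelling out to ffmpeg):
    # just derive the output filename from the input name and target format.
    stem = job.src.rsplit(".", 1)[0]
    return f"{stem}.{job.dst_format}"


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    # Each worker repeatedly takes a job off the shared queue, runs it,
    # and reports the result, until it sees the None sentinel.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        try:
            results.put((job.job_id, fake_encode(job)))
        finally:
            jobs.task_done()


def run_jobs(job_list: list[EncodeJob], n_workers: int = 4) -> dict[int, str]:
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    # One sentinel per worker so every worker shuts down.
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    # All workers have exited, so draining the results queue is safe.
    out: dict[int, str] = {}
    while not results.empty():
        job_id, path = results.get()
        out[job_id] = path
    return out
```

For instance, `run_jobs([EncodeJob(1, "a.flac", "mp3"), EncodeJob(2, "b.wav", "ogg")])` returns `{1: "a.mp3", 2: "b.ogg"}`. Because jobs share no state, adding workers is just a matter of starting more consumers on the same queue.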

