I ask because I'm intrigued by this kind of design, but not by the server cost that seems to come with it for a newly launched (and potentially unproven) product.
While I think these techniques can scale down, the current crop of Big Data technologies (esp. Hadoop) don't scale down very well. That is, they have a lot of overhead for small amounts of data. So while these techniques can work for "small data", it's going to be relatively more costly. For big data, the overhead is amortized. In the future, I do see scaling down as an important evolution for these technologies.
Not being huge into Java isn't helping either. Would I be better served by biting the bullet and doing things in Java initially, or can I skip right to Jython, JRuby, Clojure, or something similar?
Sam wrote the pallet-hadoop tool, which can spin up Hadoop clusters at the click of a button ( https://github.com/pallet/pallet-hadoop ). That said, if you're on AWS, you're better off just using EMR.
You don't need to use Java. I do everything in Clojure (using Cascalog and Storm's Clojure DSL).
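For a flavor of what that looks like, here's the canonical word-count query in Cascalog (a sketch; it assumes Cascalog is on the classpath and that text files live under a hypothetical "input/" directory):

    (use 'cascalog.api)
    (require '[cascalog.ops :as c])

    ;; Split a line of text into individual words.
    (defmapcatop split [line]
      (seq (.split line "\\s+")))

    ;; Count occurrences of each word across the input
    ;; text files and print the results to stdout.
    (?<- (stdout)
         [?word ?count]
         ((hfs-textline "input/") ?line)
         (split ?line :> ?word)
         (c/count ?count))

The nice part is that the same query compiles down to a MapReduce job, so it runs unchanged on a real cluster.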
Have you tried starting out on a single node and then migrating to multi-node later?
Of course, this won't net you any benefit (in fact, performance will be slightly worse), except that it will be relatively easy to scale out and add servers later on.
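To make the migration concrete, here's a minimal sketch using Hadoop's Configuration API from Clojure. On a real install these keys normally live in core-site.xml and mapred-site.xml; the hostnames below are made up for illustration:

    (import 'org.apache.hadoop.conf.Configuration)

    ;; Single-node ("pseudo-distributed") mode: everything on localhost.
    (doto (Configuration.)
      (.set "fs.default.name" "hdfs://localhost:9000")
      (.set "mapred.job.tracker" "localhost:9001"))

    ;; Moving to multi-node is largely a matter of repointing the same
    ;; keys at dedicated machines (hypothetical hostnames):
    (doto (Configuration.)
      (.set "fs.default.name" "hdfs://namenode.example.com:9000")
      (.set "mapred.job.tracker" "jobtracker.example.com:9001"))

Your jobs themselves don't change; the configuration just points at real machines instead of localhost.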