There is another issue with running Hadoop on EC2 (without S3). Instance storage is relatively small: about 3.6 TB on the largest instance and 1.5 TB on the other "large" instances, whereas on a typical Hadoop machine I would expect about 8 TB. So local storage is prohibitively expensive for big data tasks.
At the same time, if we use local storage we lose elasticity: we have to run the cluster all the time, even when there are no jobs to run. That defeats the main point of using Hadoop in the cloud, which is to pay for computational resources on demand.
I would expect about an order of magnitude speed difference between an interpreted language and optimized machine code. I can recall a case where reducing an analytical request from 10 hours to 10 minutes changed the quality of the research a company was doing, since analysts were able to run more queries and select a better dataset for the report.
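A rough way to see the interpreted-vs-compiled gap without leaving Python is to time the same computation once as an explicit interpreted loop and once through a built-in that runs the loop in compiled C. This is only an illustrative sketch (the exact ratio varies by machine and workload), not a rigorous benchmark:

```python
import timeit

N = 1_000_000

def interpreted_sum():
    # Explicit Python loop: every iteration goes through the bytecode interpreter.
    total = 0
    for i in range(N):
        total += i
    return total

def compiled_sum():
    # The built-in sum() performs the same loop inside compiled C code.
    return sum(range(N))

t_interp = timeit.timeit(interpreted_sum, number=5)
t_compiled = timeit.timeit(compiled_sum, number=5)
print(f"interpreted: {t_interp:.3f}s  compiled builtin: {t_compiled:.3f}s  "
      f"speedup: {t_interp / t_compiled:.1f}x")
```

On typical hardware the built-in version is several times faster for the identical result; a fully optimized native implementation (or a JIT) widens the gap further, which is where the order-of-magnitude estimate comes from.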
An order of magnitude in response time can also be the go/no-go factor for interactive analytics.
In the case of clouds, where we can assume effectively infinite resources, it can mean 1/10 of the cost.
For private clusters, it can be the difference between buying 10 machines (something common in Hadoop's world) and buying 100 machines, something very few groups can afford.