
There is another issue with running Hadoop on EC2 (without S3). Instance storage is relatively small - about 3.6 TB on the largest instance and 1.5 TB on the other "large" instances - whereas a typical Hadoop machine has around 8 TB. So local storage is prohibitively expensive for big data tasks. At the same time, if we rely on local storage we lose elasticity: we have to keep the cluster running all the time, even when there are no jobs to run. That defeats the main point of using Hadoop in the cloud - paying for computational resources on demand.
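
To make the cost/elasticity point concrete, here is a minimal back-of-envelope sketch in Python. The 3.6 TB per-instance figure and the 3x HDFS replication are from the comment and standard Hadoop defaults; the dataset size, hourly price, and 4-hours-per-day duty cycle are made-up placeholders, not real EC2 numbers.

    import math

    HDFS_REPLICATION = 3        # default HDFS replication factor
    DATASET_TB = 100            # raw data to store (illustrative placeholder)
    TB_PER_INSTANCE = 3.6       # local storage on the largest instance (from the comment)
    HOURLY_PRICE = 2.0          # assumed $/hour per instance (placeholder, not a real EC2 rate)

    raw_tb = DATASET_TB * HDFS_REPLICATION
    instances = math.ceil(raw_tb / TB_PER_INSTANCE)

    # Data on instance storage forces the cluster to stay up 24/7,
    # so the bill accrues even when no jobs are running.
    always_on = instances * HOURLY_PRICE * 24 * 30

    # With the data in S3, the same instances could run only while
    # jobs are active -- say 4 hours a day.
    on_demand = instances * HOURLY_PRICE * 4 * 30

    print(f"instances needed:      {instances}")
    print(f"always-on cost/month:  ${always_on:,.0f}")
    print(f"on-demand cost/month:  ${on_demand:,.0f}")

Under these assumed numbers the always-on cluster costs roughly six times the on-demand one, which is the elasticity that instance-local storage gives up.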

-----


I would expect roughly an order-of-magnitude speed difference between an interpreted language and optimized machine code. I can recall a case where reducing an analytical query from 10 hours to 10 minutes changed the quality of the research a company was doing, since analysts were able to run more queries and select a better dataset for the report. An order-of-magnitude difference in response time can also be the go/no-go factor for interactive analytics. In the cloud, where we can assume effectively infinite resources, it can mean 1/10 of the cost. For private clusters, it can be the difference between buying 10 machines (something common in the Hadoop world) and buying 100 machines - something very few groups can get.
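
A quick sketch of that arithmetic in Python; the 10x factor and the 10-vs-100-machine comparison are from the comment, while the job size and the per-CPU-hour rate are illustrative placeholders.

    SPEEDUP = 10                 # interpreted -> optimized machine code (~10x, assumed)

    # Cloud: with effectively unlimited instances, cost scales with CPU-hours,
    # so a 10x speedup is roughly a 10x cost reduction for the same job.
    cpu_hours = 1000             # illustrative job size
    rate = 0.10                  # placeholder $/CPU-hour
    print(f"cloud cost: ${cpu_hours * rate:.0f} -> ${cpu_hours * rate / SPEEDUP:.0f}")

    # Private cluster: to match the optimized wall-clock time without the
    # speedup, you need roughly SPEEDUP times the machines.
    nodes = 10
    print(f"cluster size: {nodes} machines (optimized) vs. {nodes * SPEEDUP} (interpreted)")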

-----

