
It depends a lot on what you're doing, but 2 TB in general seems like a lot. If you have to perform an out-of-disk sort, you probably need a distributed setup. The funny thing is that most setups that use EMR defeat the data locality principle, and I think that explains the speedup people see when they run the same job on a single laptop, for example. Reading the original Google paper helped me a lot in understanding this.



If you have to perform an out-of-disk sort, buy another hard disk and now it's not out-of-disk anymore. This will suffice for nearly any data set you would ever need to sort.

A 4 TB drive costs about $120, and you'll spend way more than that on software development and extra computers if you do distributed computing when you don't need to.
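
For anyone wondering what a single-machine sort of data bigger than RAM looks like, here is a minimal sketch of an external merge sort, not a tuned implementation. The function name `external_sort` and the parameters `chunk_size` and `tmp_dir` are made up for illustration, and it assumes every input line ends with a newline. Pointing `tmp_dir` at the extra disk is where the spilled runs would go.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(lines, chunk_size=100_000, tmp_dir=None):
    """Yield the items of `lines` in sorted order using bounded memory.

    Assumption: `lines` is an iterable of newline-terminated strings.
    `tmp_dir` (hypothetical parameter) is where sorted runs are spilled.
    """
    run_paths = []
    it = iter(lines)
    try:
        # Phase 1: read chunk_size lines at a time, sort them in memory,
        # and write each sorted run to its own temporary file.
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "w") as f:
                f.writelines(chunk)
            run_paths.append(path)

        # Phase 2: k-way merge of the sorted runs. heapq.merge only holds
        # one pending line per run in memory at a time.
        files = [open(p) for p in run_paths]
        try:
            yield from heapq.merge(*files)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary run files once the merge is done.
        for p in run_paths:
            os.remove(p)
```

You would call it as `for line in external_sort(open("big.txt"), tmp_dir="/mnt/scratch"): ...` (both paths are placeholders). With a very large number of runs you would hit the open-file limit and need to merge in several passes, which this sketch does not do.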


Just to clarify, 2 TB every day, right?



