I am quite the novice when it comes to building computers, and in this case I am not even sure whether I need to build one or could substitute a bunch of PS3s for it.
Here are my two main requirements:
1) Crunch heaps of data. I am talking, say, 100 million high-dimensional data points, and I may need quite a large number of those handy (which probably means in memory) to run some machine learning (ML) algorithms on them.
2) A storage device to store up to a terabyte of data.
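To get a rough sense of scale for requirement 1, here is a back-of-the-envelope sketch of the raw memory needed to hold the points as a dense array. The dimensionalities and value widths below are hypothetical (the question does not specify them), and the estimate ignores any overhead from indexes, sparse formats, or the ML library itself.

```python
# Rough memory estimate for holding a dense data set in RAM.
# Assumptions (not stated in the question): dense storage, a fixed
# dimensionality, and a fixed width per value (8 bytes for float64,
# 4 bytes for float32).

def dataset_bytes(n_points: int, n_dims: int, bytes_per_value: int = 8) -> int:
    """Return the raw size in bytes of an n_points x n_dims dense array."""
    return n_points * n_dims * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    n_points = 100_000_000  # "say 100 million" data points, from the question
    for n_dims in (10, 100, 1000):  # hypothetical dimensionalities
        for width, label in ((8, "float64"), (4, "float32")):
            size = dataset_bytes(n_points, n_dims, width)
            print(f"{n_dims:>5} dims, {label}: {to_gib(size):8.1f} GiB")
```

At 100 dimensions in float64 this is already 8 × 10^10 bytes (80 GB), and at 1,000 dimensions it reaches 8 × 10^11 bytes (800 GB), which is on the order of the terabyte of storage in requirement 2.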
And some secondary requirements:
While I'll probably be running hand-crafted algorithms on my data, it would be great to hear about existing database technologies that manage data well and also come bundled with ML code.
On the coding side, if you suggest Python, please also suggest a good resource for Python-based ML scripts or packages (pyML?).
Thanks for any help, pointers, and/or suggestions!
I know a lot of people will line up to beat the drum for Amazon Web Services, but it really is one of the most fantastic resources for startups since Linux.