I really want to start testing and developing methods for scaling and speeding up a project of mine, and also just for general knowledge. I can't think of a way to do this without creating a bunch of Amazon EC2 instances, or buying more computers and doing this from my house.
Can anyone think of a way to do this for free or for very little money? Is there something I am not thinking of?
Background
I am creating a website, http://www.tastestalkr.com. Right now I run it on http://www.dreamhost.com. They are great when it comes to serving webpages, but not if you want to run a crawler, so I run the crawler from a computer in my house. This works great for now, and I understand that I won't need to scale for a long time, but I am doing this all by myself. I am using Django and Python. I am thinking that maybe something like Hadoop is going to be my best bet.
2) Cache. Google "Memcached". Django has support for this. This will alleviate huge database bottlenecks (if your app is read-heavy). Also consider some sort of file-system (FS) structure specific to your app. It looks like you deal with a bunch of media files, which could get hard to manage without a proper FS plan.
3) Write clean code. There is no sense in profiling, optimizing, caching, etc. if your code is horrible. Layering those things on top of bad code will only lead to even worse, unreadable, unmanageable, and possibly slower code.
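To make the caching suggestion in point 2 a bit more concrete, here is a minimal sketch of the usual "check the cache first, fall back to the database" pattern. It uses only the standard library so it runs anywhere: TinyCache is a made-up in-memory stand-in for a Memcached client, and load_profile_from_db, get_profile, and the "profile:<id>" key format are hypothetical names for illustration, not anything from your project.

```python
import time


class TinyCache:
    """Hypothetical in-memory stand-in for a Memcached client."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry time)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout=300):
        self._store[key] = (value, time.monotonic() + timeout)


cache = TinyCache()
stats = {"db_hits": 0}  # counts how often we fall through to the "database"


def load_profile_from_db(user_id):
    """Pretend to run an expensive database query (assumed, for illustration)."""
    stats["db_hits"] += 1
    return {"id": user_id, "name": "user%d" % user_id}


def get_profile(user_id):
    """Return a profile, hitting the database only on a cache miss."""
    key = "profile:%d" % user_id
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, timeout=60)
    return profile


if __name__ == "__main__":
    get_profile(7)
    get_profile(7)  # served from the cache, no second database hit
    print(stats["db_hits"])
```

In a Django project you would probably not write TinyCache at all: django.core.cache.cache exposes get(key) and set(key, value, timeout) calls of the same shape, backed by whatever you configure in settings (Memcached being one option). The point is only that repeated reads stop reaching the database.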
From your description, you asked somewhat more specific questions, but you really need to understand the basics above before specializing for your app. At least that's what I believe.