Hacker News
Scaling: what to worry about first? and then? and then?
3 points by niels_olson on Oct 19, 2008 | 3 comments
Our goal is 'making med school easier, one less click at a time'. We have no business model; we're just trying to make our own lives easier (tmedweb.tulane.edu). We went from Bluehost to our own server on the university's network, and we stayed with CentOS along the way. The box is a spare dual-core Dell desktop (the highest load average I've seen was about 0.30 0.14 0.10). We have a 10 Mbps card and a 100 Mbps card (our current bottleneck), a little 500 GB RAID from G-Tech (total used ~8.4 GB, but if we take on lecture audio or video, this will get eaten fast), a UPS (enough runtime to drive in if needed), etc. We end up working a lot with the university's very helpful netsec guy for ports, bandwidth throttles, that sort of thing. We get about 500-600 visits a day.

I'm also getting ready for a different project, to take on the military's medical records system, which could get huge fast. So the tmedweb project is also a bit of a sandbox for what to do for that bigger project.

Gigabit Ethernet is probably next on the todo list; what else should we be getting ready for? In what order do people generally arrive at which scaling issues, going from where we are now to "big"? Not Google big, but, let's say, Navy big.

Here is my advice, having struggled with scalability for the last two years; please take it with a grain of salt... Regarding the previous discussion of EC2 etc.: EC2 just gives you headless Linux boxes that you can do with as you please. You have to configure the disks specially to get persistent data, or use S3 or SimpleDB. There is a lot of background Linux knowledge involved. For scalability, I would first concentrate on the engineering quality of your codebase. Get lots of tests and get them running nightly and automatically. Automation of tasks is your friend, so if you don't already know ant, spend the 2 days it takes to learn it. To really scale well, you are going to need a cluster of machines. This means your app is going to be running in multiple address spaces, so you will need to handle shared state across them; if you haven't already done so, look up memcached. I think you should focus on stability first: make sure your app can run "forever". The two big blockers there are memory usage and concurrency issues. Memory usage means cleaning up after yourself and avoiding memory leaks; instrumentation and a profiler are your friends here. For concurrency issues, you should do a thorough study of thread management and make sure your locks are consistent. Once you have all these stability issues under control, I think it is about having good interfaces for key components of your code, identifying bottlenecks in performance (again, instrumentation) and then rewriting these bottlenecks. There is also the matter of transmitting only the minimum amount of data required, and of having strategies for scaling up. If you are doing work for the Navy, be prepared for fairly serious scrutiny of your security practices as well; security, like scalability, is much harder to tack on later. Last but not least, remember that premature optimization will kill your time.
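
To make the memcached suggestion above a bit more concrete, here is a minimal sketch of the usual "check the cache first, fall back to the slow source" pattern, in Python. Everything named here is an assumption for illustration: `FakeCache` is a hypothetical in-process stand-in for a shared cache such as memcached (it is not a real memcached client), and `cached_lookup` and the `load` callback are made-up names. In a real cluster the cache would live in a separate process that every web server talks to.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FakeCache:
    """Hypothetical in-process stand-in for a shared cache such as memcached."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Maps key -> (expiry time, cached value).
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def cached_lookup(
    cache: FakeCache,
    key: str,
    load: Callable[[str], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Return the cached value for key, calling load(key) only on a miss."""
    value = cache.get(key)
    if value is not None:
        return value
    value = load(key)  # the slow path, e.g. a database query
    cache.set(key, value, ttl_seconds)
    return value
```

The point of the pattern is that the slow `load` runs once per key per expiry window, no matter which server handles the request, as long as all servers share the same cache.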

Maybe you should focus on building your product and use a service that will scale better than you could manage on your own. Check out Amazon Web Services http://aws.amazon.com/ . There are a lot of big sites using their EC2 and S3 services...you should check these out. That said, I am not a scaling expert, so if you're hell-bent on doing it yourself, good luck! I'm sure somebody on here will be able to offer you "do it yourself" advice.

Oh, we have no desire to do it ourselves. We only started using our own server because of the politics of hosting university content offsite. It makes relations with the professors easier.

I've been looking for ways to move storage and bandwidth to AWS, but just set up my own personal account with JungleDisk a couple of weeks ago. What utilities do you recommend for shifting webserver stuff to AWS? What can be pushed? As I understand it, it's strictly blobs, and all data manipulation still has to come back to the site's CPU for processing. Am I mistaken?
