Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: How do I prevent database bottlenecks on my web app?
8 points by suckerfish on Aug 24, 2010 | hide | past | favorite | 7 comments
I have done web apps before but only in the classroom so I've not dealt with real customers and heavy database loads. Now I am working on my first serious web app for my startup which provides each customer with their own unique subdomain.

I understand that I will have to learn on to go but I would like to avoid the dreaded database bottlenecks I hear about as much as possible. To this end, what guiding principles can I use to design the schema? Will the relational database model I learnt in school be able to scale well or will it make bottlenecks imminent?




Relational databases handle systems much, much bigger than your app will ever be. You are highly unlikely to have scaling problems. If you do, yay. Concentrate on other sources of risk to your business, such as building someone that no one wants, or targetting a market of people who hate paying money for things.


My advice about scalability for someone at your stage is to focus on making your data model clean and well-organized. Try to avoid obvious mistakes (such as querying for the same data over and over again). Make your code as clean as you can.

If you followed this advice, you shouldn't break until dozens of dynamic pages per second. And then adding a caching layer like memcached for the most common queries should take you up into the hundreds.

At that point you need more scalability work, but should have the finances for it as well. This would be a Good Problem to have. Don't worry about it until you get there.

Incidentally the main reason to keep your data model clean is not to help scalability (though it does surprisingly well for scalability issues), but to make maintenance of your code easier. Because as your vision of the product changes, you will find your code changing. And a solid data layout will make that process easier. (The end result is never ideal. Be pragmatic in your trade-offs, but not cavalier.)


Normalize until it hurts, denormalize until it works.

Build your database tier using everything they taught you in the classroom. Normalization is your friend here. Why? When you've effectively normalized your database, you can monitor the results of queries and eventually introduce the optimizations that you need to make things run faster.

The problems that sites like highscalability.com deal with are far beyond the scale of most endeavors. If you run into massive scale problems, be very happy because it means you're building something that people want and you can afford to throw some money at the problem ;)

I know people who have production databases with 30+ terabytes that is sitting in a relational database. If you design well, you shouldn't have any problems reaching those goals.


Defining the right indexes is the key until you outgrow the RAM in the server. You can be crawling with one schema, define a few indexes, and suddenly be able to handle a hundred times more activity. What indexes are right depends on what queries you need to execute, not just what data's in the tables, so read up on how indexes are used by your RDBMS, not just how they're defined.


SHOW PROCESSLIST in mysql should show running queries. A useful way to see which queries are taking up time and where you need to add indexes.


I agree with patio11.

Just build the app with whatever technology stack you are comfortable with (or want to do with). Keep the scalability problems for later. Get your customers first.

I usually (done it just twice) launch with sqlite. Makes it easier to move around the db or even the site. But beware of sqlite. Caution: Migrating from sqlite to postgresql might be a bit of a 'solvable' problem.


Google for nginx, Cassandra, memcache, node.js, RabbitMQ, and visit highscalability.com. I'm sure folks will chime in with some other useful nouns and urls.

Highscalability.com has links to videos and articles of several major sites and their architecture.

The youtube video was quite informative.

On to the schema...without knowing your specifics, I would say to map your entities in a normalized form, but, you will most likely bust most of them out and denormalize as influenced by your display page (view) or particular portions of it.

Also play around with Amazon AWS. You can play around for several dollars an hour without having to invest in your own hardware.

Also, lurk around stackoverflow.com




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: