

Ask HN: How do I prevent database bottlenecks on my web app?  - suckerfish

I have done web apps before but only in the classroom so I've not dealt with real customers and heavy database loads. Now I am working on my first serious web app for my startup which provides each customer with their own unique subdomain.<p>I understand that I will have to learn on to go but I would like to avoid the dreaded database bottlenecks I hear about as much as possible. To this end, what guiding principles can I use to design the schema? Will the relational database model I learnt in school be able to scale well or will it make bottlenecks  imminent?
======
patio11
Relational databases handle systems much, much bigger than your app will ever
be. You are highly unlikely to have scaling problems. If you do, yay.
Concentrate on other sources of risk to your business, such as building
someone that no one wants, or targetting a market of people who hate paying
money for things.

------
btilly
My advice about scalability for someone at your stage is to focus on making
your data model clean and well-organized. Try to avoid obvious mistakes (such
as querying for the same data over and over again). Make your code as clean as
you can.

If you followed this advice, you shouldn't break until dozens of dynamic pages
per second. And then adding a caching layer like memcached for the most common
queries should take you up into the hundreds.

At that point you need more scalability work, but should have the finances for
it as well. This would be a Good Problem to have. Don't worry about it until
you get there.

Incidentally the main reason to keep your data model clean is not to help
scalability (though it does surprisingly well for scalability issues), but to
make maintenance of your code easier. Because as your vision of the product
changes, you will find your code changing. And a solid data layout will make
that process easier. (The end result is never ideal. Be pragmatic in your
trade-offs, but not cavalier.)

------
peschkaj
Normalize until it hurts, denormalize until it works.

Build your database tier using everything they taught you in the classroom.
Normalization is your friend here. Why? When you've effectively normalized
your database, you can monitor the results of queries and eventually introduce
the optimizations that you need to make things run faster.

The problems that sites like highscalability.com deal with are far beyond the
scale of most endeavors. If you run into massive scale problems, be very happy
because it means you're building something that people want and you can afford
to throw some money at the problem ;)

I know people who have production databases with 30+ terabytes that is sitting
in a relational database. If you design well, you shouldn't have any problems
reaching those goals.

------
SingAlong
I agree with patio11.

Just build the app with whatever technology stack you are comfortable with (or
want to do with). Keep the scalability problems for later. Get your customers
first.

I usually (done it just twice) launch with sqlite. Makes it easier to move
around the db or even the site. But beware of sqlite. Caution: Migrating from
sqlite to postgresql might be a bit of a 'solvable' problem.

------
dangrossman
Defining the right indexes is the key until you outgrow the RAM in the server.
You can be crawling with one schema, define a few indexes, and suddenly be
able to handle a hundred times more activity. What indexes are right depends
on what queries you need to execute, not just what data's in the tables, so
read up on how indexes are used by your RDBMS, not just how they're defined.

~~~
trevelyan
SHOW PROCESSLIST in mysql should show running queries. A useful way to see
which queries are taking up time and where you need to add indexes.

------
chopsueyar
Google for nginx, Cassandra, memcache, node.js, RabbitMQ, and visit
highscalability.com. I'm sure folks will chime in with some other useful nouns
and urls.

Highscalability.com has links to videos and articles of several major sites
and their architecture.

The youtube video was quite informative.

On to the schema...without knowing your specifics, I would say to map your
entities in a normalized form, but, you will most likely bust most of them out
and denormalize as influenced by your display page (view) or particular
portions of it.

Also play around with Amazon AWS. You can play around for several dollars an
hour without having to invest in your own hardware.

Also, lurk around stackoverflow.com

