
Ask HN: Server under heavy load. Any downside to this hack? - webvet
The server hosting one of our live apps is under high load and gets into a state where it stops responding to HTTP requests several times a day. These outages typically last anywhere from a few minutes up to half an hour or so.

The major culprit here is MySQL load, and we've been working on optimizing this (those interested can refer to my previous thread on this: https://news.ycombinator.com/item?id=6348903).

For now we've found that restarting the httpd and mysqld services brings things back to normal almost immediately.

While we continue to work on a - more elegant - solution, we're thinking of writing a bash/shell script that runs on an hourly cron, checks the load average (via uptime or /proc/loadavg) and, if it is found higher than a threshold, restarts the services.

Can anyone think of any downside to this (used at least as a temporary measure)?
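A minimal sketch of the check being proposed (the threshold, the RHEL-style `service` commands, and the service names are assumptions; adjust for your distro):

```shell
#!/bin/sh
# Sketch of the proposed watchdog: compare the 1-minute load average
# against a threshold and restart the services if it is exceeded.
THRESHOLD=8

# Decide whether a load average string (e.g. "9.42") meets the threshold.
# The shell can't compare floats, so compare the integer part only.
over_threshold() {
    [ "${1%%.*}" -ge "$THRESHOLD" ]
}

# The first field of /proc/loadavg is the 1-minute load average.
load=$(cut -d ' ' -f 1 /proc/loadavg)

if over_threshold "$load"; then
    logger "loadavg $load >= $THRESHOLD; restarting httpd/mysqld"
    service mysqld restart
    service httpd restart
fi
```

Comparing only the integer part is crude but good enough for a kill switch; a threshold of roughly 2x the core count is a common starting point.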
======
AznHisoka
I've had load issues in Postgres (not MySQL) before, and they were due to
autovacuum running on tables that were being updated/inserted into
frequently. I'm not sure what the equivalent in MySQL is, but if you have a
ton of insert/update queries, consider archiving your tables after a certain
period of time, so that your main table doesn't have a ton of rows. You can
consider sharding across servers of course, but also consider splitting the
actual tables within the same database. An insert query on a table with
1,000 rows will take much, much less time than an insert query on a table
with 100 million rows, all else being equal.

Also, consider creating a buffer in the application layer that accumulates
inserts/updates and executes them once as a single transaction, if they
don't need to be executed immediately. That puts less stress on the
database. Of course, this could require a lot of rewriting in your app, so
I'm not sure you'll want to go this route.

Indices are another area. I'm sure plenty of people have told you to
optimize your indices, but also consider REMOVING unnecessary indices. Do
you have an index on a text column, or on multiple varchar columns? Those
can be killers after a while, because they slow inserts down. Consider
replacing indices on varchar columns with an index on an int column that
stores a hash of the string.
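The hashed-index idea looks roughly like this (table and column names are hypothetical; note that CRC32 can collide, so the query keeps the varchar comparison as a guard):

```sql
-- Hypothetical table/columns; CRC32() returns an unsigned 32-bit int.
ALTER TABLE pages ADD COLUMN url_crc INT UNSIGNED NOT NULL DEFAULT 0;
UPDATE pages SET url_crc = CRC32(url);
CREATE INDEX idx_url_crc ON pages (url_crc);

-- The small int index narrows the search; the varchar test guards
-- against hash collisions.
SELECT * FROM pages
 WHERE url_crc = CRC32('/about/team') AND url = '/about/team';
```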

A quick suggestion: Install NewRelic (it's free for a certain period), and
check out the database transactions that are taking up the most CPU load.
Sometimes there's that 1 query you overlooked that is table scanning and could
be the main culprit.

Also, are you using Rails by any chance? If so, there are other areas I can
suggest.

And please post your server specs. Maybe your VPS just sucks (no offense),
and the easiest route is simply to upgrade your server.

~~~
webvet
Thanks for your in-depth reply and suggestions.

On the MySQL performance-tuning and table-structure/query-optimization
front, we've left few stones unturned (though I'm sure there is still scope
for further improvement - isn't there always?).

Not using Rails.

Server specs: 4 cores, 3.2GB RAM

>and the easiest route is just to upgrade your server.

Please refer to the other thread for more on this.

Thanks again.

~~~
AznHisoka
I hope you're paying less than $10/month because you can get a dedicated
server with 10 times more RAM here:
[http://www.ovh.com/us/dedicated-servers/](http://www.ovh.com/us/dedicated-servers/)

And how many reads are there usually per minute? How many writes?

~~~
webvet
Paying way more than $10/mo for the VPS, close to $200 - and hey!! thanks
for the ovh.com link - looks very interesting, will certainly try them out
soon. Thanks.

------
cpncrunch
Restarting the processes isn't really solving the problem. What is the actual
bottleneck? It should be possible to figure this out.

I think your first problem is that you are using a VPS. You should never use a
VPS in a high load situation like this - buy a dedicated server! They only
cost about $70/month, which you should be able to afford if you have a
successful site. Ideally you should get as much RAM as you can afford and/or
an SSD drive.

I know you said you didn't want to throw hardware at the problem, but there
are limits - you can't run a massive database on crappy hardware and expect it
to work smoothly.

~~~
webvet
>What is the actual bottleneck?

As mentioned in OP, it is MySQL.

>you can't run a massive database on crappy hardware and expect it to work
smoothly.

Absolutely... but we do wish to milk the VPS for every cent it's worth, and
we're not sure we've got there yet.

Thanks for your input.

~~~
stevekemp
> As mentioned in OP, it is MySQL.

That's not specific enough.

Yes, the problem is manifesting itself via MySQL, but the bottleneck will be
somewhere else; whether RAM, CPU, or I/O-wait.

~~~
cpncrunch
Exactly. If the OP can post the output of top, vmstat and iostat both when the
problem occurs and after restarting the processes, that would be a start.
There might be some simple thing you can tweak to improve things.

~~~
webvet
Will try and post within a few hours. Thanks.

------
lutusp
> For now we've found that restarting the httpd and mysqld services brings
> things back to normal almost immediately.

You need to examine the restart process and analyze why it resolves the issue.
If the reason is the abandonment of dead parasitic processes and memory leaks,
you need to find out why and correct them. If the reason is that the restart
unceremoniously drops all the current transactions, you need to increase
capacity.

> Can anyone think of any downside to this (used at least as a temporary
> measure)?

I certainly can -- a bunch of really irritated visitors, whose transactions
are abandoned. But that's only true if that is actually what's going on. Make
sure you don't have software issues that are preventing efficient operation.
If that's not the issue, you need to grow with your customer base -- increase
server capacity.

~~~
webvet
>You need to examine the restart process and analyze

Like I said in OP, we've identified MySQL to be the primary bottleneck and are
already working on resolving this.

>whose transactions are abandoned.

At least some saving grace here... we're not selling anything... it is purely
a content site.

Thanks for your input.

~~~
lutusp
>Like I said in OP, we've identified MySQL to be the primary bottleneck and
are already working on resolving this.

Ah, yes. I remember from your prior post that you have very large databases
and table sizes and are considering (or have begun) partitioning the largest
tables. It turns out there is an innate partitioning scheme built into the
most recent MySQL versions, but it has to be compiled into the running binary
by way of a compiler flag:

[http://dev.mysql.com/doc/refman/5.1/en/partitioning.html](http://dev.mysql.com/doc/refman/5.1/en/partitioning.html)

I apologize if you've already heard this, or if it's already in place as a
partial solution.

~~~
webvet
Yes, we did investigate MySQL's internal partitioning option briefly before
deciding to roll our own scheme (which, after working quite well initially, is
now beginning to create problems of its own).

Perhaps it is time for us to revisit this. Thanks again Paul.

------
joshbaptiste
A crude check, but since it's temporary: an hour seems far too long an
interval. Say you check at 10:00 GMT; by 10:05 GMT your server is in
trouble and has 55 minutes to crap out. I would check at least every 5
minutes (*/5).
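For reference, the corresponding crontab entry would be something like this (the script path is hypothetical):

```
# m h dom mon dow  command
*/5 * * * *        /usr/local/bin/load_check.sh
```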

~~~
webvet
Good call - thanks :)

------
cbhl
I would be really worried about losing requests in-flight or that take a long
time to run.

Is it prohibitively expensive/time consuming to get (or borrow) a bigger
machine (on EC2, or in your colo, or what have you) to run MySQL on until
you've figured out how to shard / scale out your application?

~~~
webvet
We are actually more than worried, because we _know_ we're losing plenty of
requests daily - so we're trying to cut down the losses temporarily while
we figure out a longer-term solution.

For the rest, this thread might give you some more info on the issue:
[https://news.ycombinator.com/item?id=6348903](https://news.ycombinator.com/item?id=6348903)

------
trevelyan
If you haven't already done this, when the mysql server is having trouble make
sure you connect through the terminal and try:

> SHOW PROCESSLIST

This will show all active queries and how long each has been executing. The
fact that the server grinds to a halt and then works its way through the
problem suggests the issues are related to specific queries you can catch
this way. Then use EXPLAIN on the slow queries to figure out why they are
hanging your server, and add indexes or tweak that part of your code (avoid
joins on large tables, etc.) as necessary.
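When SHOW PROCESSLIST output is long, a quick filter helps. This sketch picks out queries running longer than a given number of seconds from the tab-separated output of `mysql -B -e 'SHOW FULL PROCESSLIST'` (the column order assumed is the standard Id, User, Host, db, Command, Time, State, Info, with Time as field 6):

```shell
#!/bin/sh
# Sketch: list processlist rows whose Time column (field 6) is at least
# $1 seconds. Feed it:  mysql -B -e 'SHOW FULL PROCESSLIST'
slow_queries() {
    awk -F '\t' -v min="$1" 'NR > 1 && $6 + 0 >= min'
}

# Example with canned output resembling what the mysql CLI would emit:
printf 'Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo\n7\tapp\tlocalhost\tmydb\tQuery\t42\tSending data\tSELECT ...\n' \
    | slow_queries 30
```

Run during an outage, this narrows the suspects to the handful of queries worth EXPLAINing.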

~~~
webvet
Yes, as mentioned in the other thread, we're way ahead of this - but thanks
for your input all the same.

~~~
trevelyan
k, good luck!

------
staunch
It seems pretty clear that you don't have enough experience on your own to
resolve this properly, so call in some help. It may be possible to make an
architectural change that significantly reduces the resources you need, or
perhaps you'll find that you unavoidably need more resources to do what you
want. Someone who knows how to diagnose and analyze this properly can tell you
that.

~~~
webvet
>so call in some help.

As is being discussed in the other thread.

------
motilevy
While the root cause should be fixed (sounds like you're working on it),
consider using monit for the restarts instead of cron:

[http://mmonit.com/monit/](http://mmonit.com/monit/)
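A monit stanza for this sort of thing looks roughly like the following (the pidfile path, service commands, and thresholds are assumptions; check the monit docs for your setup):

```
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
  start program = "/sbin/service mysqld start"
  stop program  = "/sbin/service mysqld stop"
  if cpu > 90% for 5 cycles then restart
  if failed port 3306 protocol mysql then restart
```

Unlike an hourly cron check, monit polls continuously and also restarts the daemon if it dies outright.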

~~~
webvet
Wow!!! Didn't know about this - took a quick look and found it very
interesting. Thanks.

