Ask HN: Dealing with a huge MySQL database - help
32 points by webvet on Sept 8, 2013 | 62 comments
We have been dealing with a huge MySQL database on one of our projects.

Total byte size: > 150GB, Record count: > 100M (and growing daily)

The primary issue is performance (this is a live site), currently hosted on a (fairly powerful) VPS.

Not having had in-house prior experience with such data volumes on MySQL, we hacked our own table partitioning architecture of sorts and also optimized the queries, table structures etc. to the max of our abilities. All this has been working reasonably well for us so far.

But now, with rising traffic levels, things are beginning to slow down again.

At this point, we're considering the following 3 options:

1) Throw more hardware resources at it. Pro: Instant gratification :) Cons: Cost + vicious cycle(?)

2) Try to optimize the database/config, table structures and queries further. Pro: Always useful. Cons: Time + effort

and

3) Migrate to PostgreSQL (or something else?). Pros/Cons: Still studying

What do HNers with experience in such matters suggest? We're open to all suggestions. Thanks in advance.




There is no silver bullet.

1) Analyze your indexes on all tables. Start with the high-volume insert/read tables (a minimal sketch follows this list).

2) Look at your slow query log. Pick the slowest three queries. Optimize them. Set up a recurring meeting with your team to do this every week.

3) Start looking at what data is less important, can be flattened, and could be stuck in a key-value datastore.

4) Consider adding a replica (slave) server. Offload some of the high-volume read queries to the replica; it also serves as redundancy.

5) Consider where it may make sense to move tables to InnoDB. Table-level locking can be extremely time consuming when traffic is high. Row-level locking can help here.

6) Figure out what your hardware upgrade path looks like. You will have to do this eventually. Identify when you might hit a hardware ceiling with your hosting provider and get a plan in place. Coordinate with your marketing/sales team and upgrade before you hit a traffic peak, not during.
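To make #1 concrete, a minimal sketch, not the commenter's exact workflow; the table name here is invented:

  -- What indexes exist on a hot table, and their cardinality
  SHOW INDEX FROM page_views;

  -- Engine, keys, and column types in one view
  SHOW CREATE TABLE page_views\G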


Thanks. 1) and 2) - we've done to the max. 4) already in place, with some differences.


Unless your development is completely frozen, #2 is never done


Development froze almost 2 years ago :) Since then, just the data volume keeps growing.


Wouldn't it be interesting to unfreeze the development and see if you can't refactor to avoid unnecessary queries? Code topology can heavily affect the number of queries.


What I meant was feature development... that's what got frozen long ago... code revision exclusively for performance optimization has been ongoing continually.


"Not having had in-house prior experience with such data volumes on MySQL ... At this point, we're considering the following 3 options:"

4. Hire or contract a consultant/expert. Part of their assignment could be to teach you.


Hey, thanks :) We are actually considering this, but it skipped my mind while making the post.


This is probably the right answer. Look at it from a cost-benefit POV - how much money will getting the right answers now save you in the long run?


Good call. There are a lot of free resources / open source projects with tools for optimization and for locating pain points/bottlenecks. I'd check those out first. See the types of queries that are slowing you down, then optimize.


What exactly is getting slow? Inserts? Updates? Queries? Which ones? Have you looked at the slow query log? Have you checked the plans for the slow queries?

Are these MyISAM tables or InnoDB?

150GB and 100M records is not huge, not even close. You are looking for a quick fix and there isn't one. You need to start with (2).


Yes, we enable slow query logs from time to time, look at those and try to optimize for those. MyISAM tables.

We've been cycling frequently (and painfully) through (2) (and less frequently through (1)) for the best part of three years now, so please don't accuse us at least of looking for a quick fix :)


With MyISAM tables you could have integrity problems. And if you are mixing InnoDB and MyISAM, you are splitting buffer memory that could well go to just one engine (i.e. most memory for InnoDB) if you switch. If InnoDB is slow on insertions, you could try using TokuDB as the storage engine; it is pretty fast at that and has a bunch of other advantages. And if you are using MyISAM for some particular feature (e.g. full-text indexing), MySQL 5.6 already has it in InnoDB, or you can move that search to a Sphinx server.


Thanks. We'll check out TokuDB.


> 150GB and 100M records is not huge, not even close.

Close to what, if I may ask?


Agree. 1B rows is big. 100B rows is huge. 100M rows is large, but totally doable.


not even close to 'huge,' although my english syntax is a little rusty.


Regardless of your English syntax (rusty or not), what would you define as 'huge' for a MySQL app?


Caching is definitely a must-have, both an external cache (Memcache, Redis) and tuning MySQL's own caches in the configuration. If you use InnoDB, you can configure a large buffer pool in the MySQL config to speed things up considerably.
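For instance, a my.cnf fragment along these lines; the numbers are illustrative only, size them to your own RAM and working set:

  # my.cnf fragment (illustrative values, not recommendations)
  [mysqld]
  innodb_buffer_pool_size = 12G    # keep the InnoDB working set in memory
  innodb_log_file_size    = 512M   # larger redo logs smooth out heavy writes
  key_buffer_size         = 2G     # only used for MyISAM indexes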

Disk speed could also be a huge issue, especially on a VPS where resources are shared between many virtual machines. Consider migrating your database to a dedicated server (preferably one with an SSD); depending on where you are in the world, OVH might be a good choice.


Here are some guiding questions:

1) Do you actively need all 100M records, or is there a period after which you can archive them?

2) Have you partitioned data across multiple servers, or just multiple tables on one server?

3) What is the nature of your expensive queries? Are you generating reports? User dashboards?

I think a general idea of what your application is doing would help us provide some guidance.


1) Yes, and no. 2) Both (3 servers). 3) We've analyzed/optimized this to the max (of our abilities).


In my experience, neither the size of the database nor the number of records really matters. It's the number of queries you are handling at one time that can really affect performance.

Sure, if you have a single table with a ton of records this can slow things down - but not nearly as much as dealing with a high number of queries.

I highly recommend this book - http://www.amazon.com/dp/0596101716/?tag=stackoverfl08-20

Yes, caching can really help. Logically placed indexes can too. Switching to postgres will not help. The bottleneck will still be there. The key is to find that before you start tuning/optimizing: http://www.singlehop.com/blog/analyzing-mysql-performance-an...

Good luck!


Will get the book and check out the blog soon. Thanks. :)


It sounds like your database is roughly the same size as the database that we work with, and MySQL performance has been a recurring issue for us. Assuming that you've done a reasonable job of option 2, I gravitate towards option 1 and maybe option 4 (depending on how confident you are of your team's technical capabilities and how expensive consultants are relative to additional hardware).

These are intermediate steps that have helped, giving us some time to figure out whether we want to migrate to a different database:

1) We started using InnoDB a few years ago.

2) One of our common queries joined parts of multiple large tables; we essentially cached the useful parts of the join and put the results into a memory table (a rough sketch follows this list).

3) Our non-production server is under a lot of load from our back-end processes, and we decided to move the database to a RAM disk, which greatly improved performance. It's important to note that, while inconvenient, a power outage is not catastrophic in our case (e.g. we don't deal with financial transactions); we do daily backups to non-volatile memory. For around $5,000, you can build your own server that has over 250 GB of RAM and then put it into a co-location facility.
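Not their actual schema, but a minimal sketch of the memory-table caching idea from #2; table and column names are invented, and the table would be rebuilt periodically (e.g. from cron):

  -- Rebuild a small, hot slice of an expensive join into a MEMORY table
  DROP TABLE IF EXISTS hot_join_cache;
  CREATE TABLE hot_join_cache ENGINE=MEMORY
    SELECT o.id, o.customer_id, c.name, o.total
    FROM   orders o
    JOIN   customers c ON c.id = o.customer_id
    WHERE  o.created_at > NOW() - INTERVAL 7 DAY;
  CREATE INDEX idx_cache_customer ON hot_join_cache (customer_id);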

We seriously considered migrating to Postgres, but #3 has bought us some time on that front. It might be that we eventually still migrate, but we prefer to do it after the company is on better financial footing. It might also be that while we wait, other less painful options open up over time (e.g. maybe MariaDB will suffice).

I hope the situation resolves well.


Consider that 37Signals probably has a much larger dataset and is still happily scaling vertically due to Moore's law.

http://highscalability.com/blog/2012/1/30/37signals-still-ha...

It's probably the cheapest, quickest, and least risky option if your engineers' time is expensive (which it most likely is).

Of course, you'll ultimately also want to find bottlenecks and tune the DB and cache the hell out of everything, but scaling vertically will at least buy you some good time so you can do that in a more relaxed time-table.

Related links:

http://37signals.com/svn/posts/3202-behind-the-scenes-the-ha...

http://37signals.com/svn/posts/3090-basecamp-nexts-caching-h...

http://37signals.com/svn/posts/3112-how-basecamp-next-got-to...


Hmm.

Bringing in a consultant is probably your best bet. Database optimization needs to generally take into account YOUR data and usage patterns.

If you want to learn a bit yourself, pick up a copy of "High Performance MySQL." I've found that to be the most useful MySQL scaling book I own.

Figure out your slow queries. Run "SHOW PROCESSLIST" on the server and see what's typically running.

Look at the actual queries - if you're frequently running a query that looks through most rows of your biggest table, that's going to be a tough query to optimize. Instead, look to either caching or regular pre-computation of the results.

I've found that an unfortunate number of people aren't aware of "EXPLAIN" and its use to help figure out query issues. Learn and use it.
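A hedged illustration of what that looks like; the query and table are invented, and the columns to watch in the output are type, key and rows:

  -- A hypothetical slow query pulled from SHOW PROCESSLIST or the slow log
  EXPLAIN SELECT * FROM events
  WHERE   user_id = 12345
  ORDER BY created_at DESC
  LIMIT   50;
  -- type=ALL, key=NULL, rows ~ 100M means a full table scan; an index on
  -- (user_id, created_at) would normally turn this into a short range read.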

You're on a VPS. That may be fine, but it may have horrible I/O throughput. If you're writing a lot of data, or have queries that aren't hitting some sort of MySQL or OS cache, I/O will be your bottleneck. Make sure it's fast enough. Look at average I/O wait times. Test max throughput.

MyISAM tables may be an issue. MyISAM tables use table level locking - only one session at a time will be able to update the table. This is quite possibly a problem if you're doing any updates or inserts to a table that is also frequently read from. Look to move to InnoDB soon. See: http://dev.mysql.com/doc/refman/5.1/en/internal-locking.html
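For reference, the straightforward (but blocking) way to do that conversion; the table name is made up, and on tables this size a plain ALTER locks the table for the duration, which is why pt-online-schema-change gets suggested elsewhere in this thread:

  -- Rebuilds the table as InnoDB, but takes a table lock while it runs
  ALTER TABLE page_views ENGINE=InnoDB;

  -- Worth checking afterwards that the InnoDB buffer pool (not just
  -- key_buffer_size) is sized for the data you just moved over
  SHOW VARIABLES LIKE 'innodb_buffer_pool_size';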


>Database optimization needs to generally take into account YOUR data and usage patterns.

I'd say this is quite true.

>I've found that an unfortunate number of people aren't aware of "EXPLAIN" and its use to help figure out query issues. Learn and use it.

Yes, we discovered this some years ago.

Thanks for your input.


One of the less documented tuning parameters I found is the thread cache size. Normally it is a very small value relative to the number of simultaneous connections you likely support. Increasing it can eliminate random pauses/slowdowns.

In general, it is impossible to "tune MySQL" though. You can allow it to use more system resources, and in some cases you can make it less durable to gain write performance, but to get orders-of-magnitude increases you need to restructure your data. Try sharding or partitioning the data. Increase or reduce the number of indexes. Denormalize data. Archive logs to flat files. Put the OS, /var/lib, and /var/log on separate physical drives. Run MySQL on bare metal, since disk I/O is often a bottleneck and virtualization can add overhead.

Restructure the data. At one point a write-heavy app I was optimizing was logging things to the database (that needed to be queried). Each log entry contained extra data we did not query. One speed-up that worked was throwing the majority of the extra data into a single gzipped JSON blob. This made it easier for MySQL to fetch fewer pages. Less I/O, faster operation.
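A quick way to sanity-check that particular setting; the variable is thread_cache_size, and the 64 below is just an example value:

  -- If Threads_created keeps climbing while connection volume is steady,
  -- the thread cache is too small
  SHOW GLOBAL STATUS LIKE 'Threads_created';
  SHOW GLOBAL STATUS LIKE 'Connections';
  SHOW VARIABLES     LIKE 'thread_cache_size';

  -- thread_cache_size is dynamic, so it can be raised without a restart
  SET GLOBAL thread_cache_size = 64;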


I believe we did tune and tweak the heck out of my.cnf, but will investigate thread_cache_size all the same.

I also believe we're archiving at least some logs (and such) to flat files already, but will re-visit this soon too.

Thanks for all your input.


I would not migrate to Postgres without knowing the issue first. Here are some suggestions: Enable master/slave(s) replication, assuming you have a read-heavy architecture. Enable the slow query log [1], which will give you something like /var/log/mysql-slowquery.log; you can then look in there to find queries that take a long time. Find out what the bottleneck is. I'm willing to bet you have limited I/O and that is the source of your issues; can you use memcached [2] to limit the database hit rate? You need to remove/limit direct access to the database when people are pulling duplicate data out.
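For reference, on MySQL 5.1+ the slow query log can be toggled at runtime; the values here are illustrative:

  SET GLOBAL slow_query_log      = 'ON';
  SET GLOBAL slow_query_log_file = '/var/log/mysql-slowquery.log';
  SET GLOBAL long_query_time     = 1;  -- seconds; drop it (even to 0) for a short capture window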

Before you start to optimize I would profile things to get a baseline for how many select/inserts you are doing and how long they are taking, and system load, etc. Based off your 100M statement, you are doing roughly 1157.4 inserts/s. Maybe that's 250 inserts/s during a slow time and 3,000 inserts/s at peak, but it would be nice to know.

  100,000,000 (inserts) / 86,400 (seconds/day)
  = 1157.4 (inserts/s)
I guess it all depends on the insert size and index but you should be able to scale this. If you cannot then you need to partition/shard your data. After reading all this, you are probably thinking, I should just throw more hardware at the problem ;)

ps. Have you stopped to ask if you really need all this data?

[1] http://dev.mysql.com/doc/refman/5.6/en/slow-query-log.html

[2] http://memcached.org/


[1] Been doing already and [2] Will re-visit soon

Thanks :)


In line with questions 2 & 3: what's the bottleneck, reading or writing or both? If you haven't already, it's worth looking into a master (writes) / slave (reads) setup. It provides a bit of redundancy and lets you scale reads very easily by adding more slaves. It also helps with writes because it can take a lot of pressure off the master.

...assuming you're webvet.com, it looks like you're scaling Drupal, so these would be my first questions: are you using the Views module? If so, kill that, as well as anything else by merlin (Panels, etc.), and look at your indexes. Set up Pressflow + Percona; a master & slave DB and reverse-proxy caching (Varnish or nginx) are all good places to start. Host-wise, if you're not already on Amazon, you can get a lot of mileage out of a service like Voxel that lets you mix VPSs (cheap webheads) with physical hardware. It's not great, and it may well be worth biting the more expensive bullet and going to AWS for long-term needs, but it's a lot better than Linode for an underwater DB.

My email is in my profile if you'd like more detail. Drupal is a bear at first, but it can be scaled for a while.


...looks like you've (again, assuming webvet.com) also got an OpenX server. That'll benefit a lot from the master/slave replica. I'd also look at making spc.js non-blocking, as that may be why the pages are blank for a few seconds even after content has been delivered...


Thanks for your input - and no, we're not webvet.com


ah ok, cool - good luck!


Hi!

There are some good suggestions in this thread already. Step #1 is to determine why it is slow.

If it is slow under load, then I am going to suspect that you are hitting table locks, which means you need to migrate to InnoDB (row locking + MVCC). Important to point out: a bigger server may not actually help enough here, because while you are waiting on locks, nothing gets done even though there is free capacity sitting idle.

If it is just generally slow related to growth, it can probably be improved by indexing - which will help you reduce what data needs to be in RAM.

In either case, two third-party tools to suggest:

- pt-query-digest: aggregate your slow query log. You want to use 0 seconds as the threshold, and record 20 minutes of slow queries during regular activity. Methodology described here: http://gtowey.blogspot.ca/2012/07/slow-query-log-is-not.html

- pt-online-schema-change: you can migrate from MyISAM to InnoDB online via triggers (a plain ALTER is not online).

Both tools are part of http://www.percona.com/software/percona-toolkit
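Roughly what that looks like in practice; the database/table names and file path below are made up, so check the pt-* docs for your version:

  # Summarize slow-query data captured with long_query_time = 0
  pt-query-digest /var/log/mysql-slowquery.log > digest-report.txt

  # Convert a table to InnoDB online; triggers keep the copy in sync while it runs
  pt-online-schema-change --alter "ENGINE=InnoDB" D=mydb,t=page_views --execute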


We did use a tool (perl script) called mysqltuner.pl on this sometime last year and it did give us some useful/actionable insight. Will check out what you've suggested too.

Thanks. :)


Have you considered rewriting MySQL in Go?

jk. What is the real nature of the queries, what kind of caching is being used? MySQL should not really get hit that often if you have 150GB of data and a site, unless it's not getting content but running some sort of deep processing. In such a case MySQL seems like the wrong tool for the job.


Without doing an analysis, which is really warranted, here are some coarse grain recommendations:

1) Get off the VPS if you can. If you can't, check out linode.com; they offer better price/performance than most cloud services.

2) Take a look at the Percona build of MySQL - it is faster, especially with more cores.

3) Use in memory tables if possible

4) Switch to InnoDB or XtraDB (in the case of Percona) tables... MyISAM implements table level locking, whereas InnoDB and XtraDB implement only row level locking.


My first recommendation would be to get a dedicated server with a decent amount of RAM (32GB or more) and put MySQL on that.

1. Greater memory bandwidth. 2. Dedicated disk resources. 3. You can ensure that the extra RAM is used as read and write cache.

I am not the cheapest provider by far, and I rent these kind of systems for $200 a month. If you go with someone else the price could be under $100 a month.


Cleanup - probably one of the most effective but often ignored tips for DB management. Do you really need all that data in real time? Are you querying all of it in your app? If not, it makes sense to archive parts of it to another disk. The size reduction would boost performance tremendously.
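If some of the data did turn out to be cold, the usual pattern is something like this; table and column names are invented, and you'd run it in batches during quiet hours:

  -- Assumes events_archive has the same structure (CREATE TABLE events_archive LIKE events)
  INSERT INTO events_archive
    SELECT * FROM events WHERE created_at < NOW() - INTERVAL 1 YEAR;
  DELETE FROM events WHERE created_at < NOW() - INTERVAL 1 YEAR;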


>Do you really need all that data in real time? Are you querying all of it in your app?

Yes and yes. ALL the data records could potentially be queried via site navigation.


Do you have any sort of caching in your app? Memcache, for instance, can go a long way in helping you scale.


We don't have Memcache (or anything similar) presently, though I do believe we looked at it sometime last year. Will get a re-check done on this - thanks. :)


You should look into caching. If you are talking about webvet.com it looks like a perfect site for using something like Varnish. Most of the content probably doesn't change often so you should be able to have a long TTL.


Option 4) would be to hire an expert in these matters.

100M rows is not a lot of data nowadays.


Thanks - 4) has been on the cards, I just forgot to mention it.


Not particularly large for MySQL. Consider using master / slave to separate reads and writes, cache heavily and consider sharding if appropriate.

Alternatively, as suggested elsewhere, give Percona a ring.


You give us two values (150GB and 100M+ rows). That is not enough to give you ANY suggestions other than give us more info.


Please let me know what more info would be helpful and I'll be happy to provide the same.


Are queries uniform over the data? For instance, do most queries concern only a small fragment of one table, such as the most recent rows? Can you shard them?


How slow is "slow" for you right now? And writes vs. read? Which MySql storage engine are you using?


Let me try to answer: Slow - as in average Top loads (on CentOS) staying between 1 and 2 pretty much throughout the day; closer to 2 rather than to 1, peaking ALL the way up to 10 and even beyond several times a day, for periods as long as 15 to 30 minutes.

Storage engine: MyISAM


Do you have some other measurements? slow_log, queries without indexes, low key cache usage? Load is just one metric and can be very deceiving. Do you have any backup system which hits the disks at the same time, batch jobs, or something similar? Depending on your workload, "load" can vary - it just means the writes are being queued up.

It's definitely not a good sign, but try to get more specific. I've seen servers doing heavy network I/O with "normal" load over 5 times the number of cores.

Also, unless you're doing loads of selects and very few modifications, you could gain a lot by switching to InnoDB, or XtraDB, rather than MyISAM.


Have to second this. You should monitor load for sure... but depending on how many cores you are running a load of 1 or 2 could be absolutely nothing in the grand scheme of things.

Personally, the most important metric for us is average query response time.

Additionally, as others have stated, where you can use memcache or some other key/value store, you should be working on implementing that.


The spike to 10+ sounds like it might be I/O blocking rather than CPU (which would be stuff like queries). If you look at your I/O wait (%wa) in top, and the output of iostat, you should be able to get an idea. If you're on VPS systems with older SATA drives, that could be the bottleneck.

If the kernel is waiting for the disk(s) to be able to write more data to the storage bus/buffers, it will block the storage process, which will result in blocking for any application trying to access storage. As more and more processes block, they'll go into poll/sleep loops, and all those instructions will seem to spike the CPU load and make the CPU look busy even if the CPU is just sitting there saying "storage is still busy" over and over again. You probably have 1.00 to 2.00 true CPU load (queries, etc.), which isn't that bad. But you generally want to keep your CPU load below 1.00 per CPU; otherwise, there is CPU-level blocking. If you were swapping a lot (which you can also tell from top), that would aggravate any storage subsystem overloading.

I would just post the output from top and iostat during one of the spikes here and see what people say, rather than hiring an expensive consultant. You could probably find an experienced sysadmin to look at it as a favor as well. They should be able to tell you quickly whether you really need someone to look at your MySQL and app code to address this, or whether you just need to add more nodes to your MySQL cluster (or look at something else like a NoSQL type setup).
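If it helps, this is the kind of snapshot worth grabbing during one of those spikes (iostat and vmstat come from the sysstat/procps packages on CentOS):

  # %wa (I/O wait) on the CPU line is the number to watch during a spike
  top -b -n 1 | head -20

  # Extended device stats, 3 samples 5s apart; high await / %util points at saturated disks
  iostat -x 5 3

  # si/so columns show swap activity, which would add to the I/O load
  vmstat 5 3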


We've suspected that our problem is mainly of I/O, very close to what you've described above.

>I would just post the output from top and iostat during one of the spikes here and see what people say rather than hiring an expensive consultant.

Thanks for the suggestion. Will try and do just that soon.


Call Percona first. $10K well spent.

Then (probably, and assuming you have the basics right) just throw hardware at it.


I have a little bit of experience dealing with large datasets and MySQL. This is going to be a little counter-intuitive to standard startup-culture advice, but generally, the engineering effort to optimize databases past the initial live flow is not worth it compared to throwing money at the problem. You have to weigh the fixed cost of the optimization and the variable cost of the upkeep of the system against simply patching the system for now.

If you are willing to throw money at the problem, which you seem like you're wavering on, here are some options on how to throw money at the problem:

1. If you are willing to be hands-on and maintain your MySQL cluster yourselves, you should move away from EC2 (I am presuming) and to a local colo where you can install super fatty SSD computers with massive RAID setups that will greatly improve your performance per server (on the order of up to 10x).

2. You can do a hybrid solution with your existing infrastructure. It sounds like a lot of the data is not necessarily going to be used that much (tracking all events, probably, from what I am assuming, or something big-data intensive like that); you can throw that into a separate database infrastructure.

If my assumption is correct and you are using a large portion of the database for stat tracking, not user information, then the "proper" way to throw it somewhere is to put it into S3 as fat logs that you can later process with something like MapReduce. Other options for more efficient storage are Redis and MongoDB.

3. We have taken this to the extreme and actually moved all of our live data to DynamoDB. It is a NoSQL database storage system operated by Amazon. This has let us concentrate on features and the user experience. The cost is that our database is probably 5-10 times more expensive than a self-rolled, SSD-backed Redis equivalent hosted at our local colo.

------

Just some other general database scaling advice for MySQL. The steps to scaling your MySQL database for live use are as follows:

1. Do not use any relational calls on your live data. Relational calls are better suited to databases that aren't used to manage live web transactions, as they can seriously back up your system.

2. Put indexes on all of your heavily used columns for finding data.

3. Look at the slow queries and optimize those queries out.

4. Call up Percona and ask for a full consultation to make sure your configs and systems are set up properly for MySQL.

http://www.percona.com/products/mysql-consulting/overview

5. Email me if you have more questions charlesju gmail


Hey, I'll have to check with the others here, but I think we have not been aware of 1. !!! Will certainly investigate this ASAP - thanks.

2. and 3. - already implemented to the max (of our abilities :)).


I would put Percona higher in that list if you have the money.


stackoverflow -> this way



