
How far can you go with MySQL or MariaDB? - CharlesW
http://jfg-mysql.blogspot.com/2017/07/how-far-with-mysql-mariadb.html
======
willvarfar
The elephant in the room is that there is no mention of TokuDB here.

My personal experience is that InnoDB performance drops off a cliff as tables
grow and parts of tables stop fitting in RAM. TokuDB just keeps on going.

TokuDB also compresses your data really well.

My own stats:
[https://williamedwardscoder.tumblr.com/post/160628887798/com...](https://williamedwardscoder.tumblr.com/post/160628887798/compressing-mysql-databases)

~~~
fweespeech
> My personal experience is that InnoDB performance drops off a cliff as
> tables grow and parts of tables stop fitting in RAM. TokuDB just keeps on
> going.

Yes, but at least in my experience, as long as things fit in RAM, InnoDB is
still the correct choice.

TokuDB is great once you run out of RAM-related solutions, but given the price
point RAM is at these days, we are talking 100GB+ tables, which probably
shouldn't be on a single machine anyway.

~~~
willvarfar
Ha! A few GB of RAM and a few TB of disk are real cheap in the cloud.

If your dataset has to fit in RAM, and when it doesn't you have to buy more
boxes with more RAM rather than just buying more disk, then I think you are
making a very uneconomical trade-off.

~~~
fweespeech
> Ha! A few GB of RAM and a few TB of disk are real cheap in the cloud.

> If your dataset has to fit in RAM, and when it doesn't you have to buy more
> boxes with more RAM rather than just buying more disk, then I think you are
> making a very uneconomical trade-off.

It is all in what lower latency is worth to the bottom line and how well you
scale horizontally.

~~~
willvarfar
My link described performance too. A cluster of InnoDB boxes is going to be
slower than a single TokuDB box.

(Src: have run each, and stuff like shard-query to parallelise queries and
scale them out)

------
olavgg
I manage a multi-TB system on PostgreSQL. There are no performance issues
other than making sure you have good indexes and that the queries actually
use them.

Backup isn't that bad either, as this runs on ZFS, which supports blazing-fast
atomic snapshots of all the related filesystems, like pg_xlog and the
tablespaces.

~~~
tsuraan
I'd love to hear more about your tuning around both PG and ZFS. I assume
pg_xlog is on its own fs with 8K records, but beyond that, anything
interesting, or any guides that you found especially useful?

~~~
olavgg
pg_xlog has 8k records with lz4 compression. You can also disable full-page
writes, though I've never had any issues with it either way.

For tablespaces the recordsize can be anything from 8k up to dynamic / 128k.
They all have pros/cons. A greater recordsize improves compression and
sequential read performance, but small random updates become a lot more
expensive. I feel a recordsize of 16k or 32k with lz4 compression is a safe
setting for optimal performance with good compression.
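The dataset setup is a couple of one-liners; pool and dataset names below are
made up, not necessarily the parent's actual layout:

    # WAL: fixed 8k records to match PostgreSQL's 8k page/WAL block size
    zfs create -o recordsize=8k -o compression=lz4 tank/pg_xlog
    # tablespaces: 16k as the compression vs. random-write compromise
    zfs create -o recordsize=16k -o compression=lz4 tank/pg_data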

Other useful topics related to performance: use
multi-column/functional/partial indexes, as this will help you design a
database where most of your 'hot' indexes can fit in RAM. If you have too many
indexes, you will end up doing index scans on disk, which is painfully slow;
in many cases a full table scan (sequential read) would be faster instead.
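To make that concrete, here is a quick sketch of all three kinds (table and
column names are made up):

    -- multi-column: serves lookups on (customer_id) and (customer_id, created_at)
    CREATE INDEX orders_cust_created_idx ON orders (customer_id, created_at);

    -- functional: index the expression you actually query on
    CREATE INDEX users_email_lower_idx ON users (lower(email));

    -- partial: keep only the 'hot' rows in the index so it stays in RAM
    CREATE INDEX orders_open_idx ON orders (created_at) WHERE status = 'open';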

BRIN indexes are great for large tables that can be sorted on, for example, a
timestamp.
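For instance (hypothetical table; BRIN is available from PostgreSQL 9.5):

    -- tiny index: stores only min/max of created_at per block range
    CREATE INDEX events_created_brin ON events USING brin (created_at);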

Common Table Expressions, for recursive queries.
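For example, a recursive CTE walking a hypothetical employees table:

    WITH RECURSIVE subordinates AS (
        SELECT id, manager_id FROM employees WHERE id = 1   -- the root
        UNION ALL
        SELECT e.id, e.manager_id
        FROM employees e
        JOIN subordinates s ON e.manager_id = s.id          -- follow the tree
    )
    SELECT * FROM subordinates;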

The query planner sometimes needs help, as it will start doing full table
scans on a table when it could just have used an index. This is related to the
table statistics, which you may need to alter. You can read more about this
here:
[https://www.postgresql.org/docs/9.5/static/planner-stats.htm...](https://www.postgresql.org/docs/9.5/static/planner-stats.html)

Adjusting seq_page_cost and random_page_cost can also help the query planner.
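Both knobs look roughly like this (names and values are illustrative, not a
recommendation):

    -- bigger histogram for a skewed column, then refresh the stats
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE orders;

    -- on SSDs the default random_page_cost of 4.0 is usually too pessimistic
    SET random_page_cost = 1.1;   -- session only; use ALTER SYSTEM to persist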

There is probably a lot more, but you learn as you go :)

~~~
aeorgnoieang
Unrelated, but the link in your profile to your site Boost AI is HTTPS, while
the site itself only seems to support HTTP.

~~~
olavgg
Thanks!

------
idlewords
The limits of properly-tuned MySQL on appropriate hardware are impressive.
Note that the size constraint in the examples given here is almost always
administrative (ease of doing backups, replication, archiving) rather than
hitting insurmountable performance limits.

I have a theory that a lot of people who are dismissive of modern relational
databases for 'big data' simply never took the time to learn to tune them
properly as the data sets grew in size. It's a painful process, and there is
an intimidating number of switches and knobs, some of them in the filesystem
and kernel.

In particular, running large databases on virtualized hardware like AWS really
brings the pain.

~~~
falcolas
Agreed on all points. They (MySQL and PostgreSQL) were written and optimized
to run on constrained hardware. When you remove those constraints (which is
where the configuration pain comes in), they really are capable of absurd
volume.

> running large databases on virtualized hardware like AWS really brings the
> pain.

The pain, I've found, mostly stems from the fact that it is a VM, and the
performance costs associated with that. When you assume that your disk is
local, and it's actually a network volume with shared network bandwidth,
you'll get kicked right in the assumption. Or you assume that you have a
static amount of memory, but the balloon drivers punt you into swap, or that
you will get reliable interrupts from the OS...

It's gotten better over the years, but it's still a mess.

Oh, and I've said it before, so I'll say it again: if you're running MySQL in
RDS, go in and turn up the 'innodb_log_file_size' setting to a bare minimum of
2GB, or 15GB if you have the disk space. Go, do it now!
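To check what you're currently running with (the variable isn't dynamic, so on
RDS you change it through a DB parameter group and it takes effect after a
restart):

    -- reported in bytes
    SHOW VARIABLES LIKE 'innodb_log_file_size';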

~~~
bulldoa
Can you explain why turning up innodb_log_file_size is good?

~~~
falcolas
In short, the default is 128MB, which results in block-the-world dirty-page
flushes when the log fills due to write activity. By going to 2GB (I
frequently recommend 15GB when you have a lot of memory), you can buffer a lot
more writes during times of peak load, and the idle flushing of dirty pages
can generally keep up.
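You can watch how close you are to a forced flush in the LOG section of the
InnoDB status output:

    SHOW ENGINE INNODB STATUS\G
    -- "Log sequence number" minus "Last checkpoint at" is the redo that
    -- hasn't been checkpointed yet; when it nears the total log size,
    -- the block-the-world flushing described above kicks in.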

The practical effect of changing this value: a 2xlarge RDS host goes from 85%
CPU utilization to 10%, with faster response times, for a moderate write load.

The downside, and why it's not the default for RDS, is that it eats into your
RDS storage by 2x the value set (there are two log files by default).

~~~
bulldoa
Ah, that makes sense. Is there an innodb_log_file_size equivalent in Postgres,
btw?

~~~
anarazel
Yes. Before 9.5 it was named checkpoint_segments; now it's called min_wal_size
(reserved space, won't go below it) and max_wal_size (a soft maximum on space;
as it's approached, checkpoints happen more frequently).
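Both are plain settings, so (illustrative values, assuming 9.5+):

    ALTER SYSTEM SET min_wal_size = '1GB';
    ALTER SYSTEM SET max_wal_size = '8GB';
    SELECT pg_reload_conf();   -- no restart needed for these two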

------
ahachete
In 2013 I co-founded an experiment with a similar goal: testing the maximum
number of tables (empty tables, btw) that you could create in a PostgreSQL
database.

The answer lies in the "Billion Tables Project" :)
[https://www.pgcon.org/2013/schedule/events/595.en.html](https://www.pgcon.org/2013/schedule/events/595.en.html)

------
jjirsa
Missing from the article:

- HA

- How to recover some subset of 37TB of data without restoring all of it

- What happens when your write throughput needs to exceed the capacity of a
single disk

Nobody gives up features like joins in favor of a distributed shared nothing
DB because they feel like they need more space, they do it for HA, more cores,
more spindles, fault tolerance and isolation.

------
cdevs
I think some of the advice here is good for high-traffic sites, but my company
only sees about 1400 people a day use our data, although we do process
information for each client across millions of jobs a day. In our scenario we
have one MySQL instance on AWS Aurora, about 50 tables, and 3 terabytes of
data total. Most of the size comes from about 10 tables that are mostly used
for joins after the data is quickly searched in Elasticsearch. I will put down
the MyISAM full-text search we used to use before Elasticsearch, which was
drowning as we reached 100M to 200M rows, back when things weren't pushed over
to their own machine and could hog RAM. When full-text search is planned from
the start, along with backups and other things, it can save a ton of
headaches.

------
overcast
Facebook is how far you can go.

~~~
jimbokun
Right, but that means writing your own storage engine [1] to get to their
scale, right?

[1] [https://code.facebook.com/posts/190251048047090/myrocks-a-sp...](https://code.facebook.com/posts/190251048047090/myrocks-a-space-and-write-optimized-mysql-database/)

~~~
tyingq
_" While there are some databases at Facebook that will still use InnoDB,
we're in the process of migrating to MyRocks on our user database (UDB) tier"_

So, still impressive to me. That custom storage engine is just for the "user
database" tier. Given their user count, that doesn't seem unreasonable.

~~~
felixhandte
MyRocks is intended to be a general purpose storage engine for MySQL. It was
not built specifically for the UDB tier.

~~~
tyingq
Right, but "some databases at Facebook that will still use InnoDB" was a
direct quote.

The context being that InnoDB got FB pretty far, and that UDB appeared to be
the first area where any pain was notable enough to need something else.

~~~
evanelias
It's not really a question of InnoDB "pain" though; Facebook's primary
motivation for MyRocks is a substantially better compression ratio than
InnoDB. It adds up to very compelling cost savings at FB scale.

This comparison is a bit apples to oranges anyway, with respect to the
original article :) JFG's post was talking about extreme cases within a single
MySQL instance, whereas Facebook is extremely heavily sharded.

~~~
tyingq
"pain" is just shorthand for whatever drove the switch.

Was just trying to reinforce the point that MySQL gets you pretty far.

------
derekperkins
Vitess deserves more love here on HN. It's incredible the way it gives you the
best of both worlds (RDBMS and NoSQL). Most operations end up in a single
shard, so you see no performance impact and still have the benefit of built-in
MySQL joins and relational integrity. It is built to be container-native, so
the recommended usage is to split data into 100-300 GB shards, which also
makes schema changes take 1/(n shards) of the time of a normal ALTER TABLE.

------
jimbokun
This is very impressive, but at some point it becomes more cost effective to
rack up more reasonably sized commodity hardware with a "big data" database,
no?

(Using "big data" to mean sacrificing low latency joins and full transaction
support, with the caveats that many "big data" systems are making progress at
getting those things to work.)

Of course you can shard, but there is a significant maintenance cost to that
as well, and it requires that your data model can be easily sharded.

~~~
falcolas
Under 1 TB is still commodity hardware with many of these DBMS. 5-10 TB is
commodity hardware plus a few extra drives.

Sharding, or splitting out into "big data" databases has its own cost, a cost
which is frequently greater than the $20k worth of hardware.

~~~
jimbokun
An example in the article was over 37TB, and another was 205TB containing
"hundreds of thousands of uncompressed InnoDB tables".

Maybe that's a system that could have been better served by something like
Cassandra? (Depending, of course, on access patterns, complexity of queries,
etc. etc.)

