At $work we manage a fairly heavy MySQL instance with scaling issues (~5TB, lots of blobs, frequent user-managed schema changes for interactive/batch data analysis, and many concurrent complex joins involving multiple 1M+ row tables). After reviewing the various MySQL derivatives, it seems that the Percona people are really focusing on 'real deal' operational reliability/scalability issues (e.g. minimizing transaction locking issues for coherent backups, real-time DB cloning, replication data integrity testing, etc.). I strongly recommend that anyone looking at MySQL flavors not overlook their offerings. Looking forward to doing some production tests of their XtraDB MySQL flavor in the coming months (for now, usage has been limited to unit tests and the Percona Toolkit/XtraBackup).
Also: no I am not a paid sponsor.
If everything is behind an ORM, have you considered trying out PostgreSQL w/ the MySQL FDW? pgsql 10's improvements to FDWs make it a truly viable option.
Also to clarify: I'm just saying this as pgsql seems to handle queries that involve disk access and large datasets much better. I have found that as long as your hot dataset fits into the innodb buffer pool and/or you're doing only key lookups (e.g.: select * from tbl where pk=1;) MySQL is most certainly faster for real-world usage. If you're pulling and sorting/filtering millions (or billions) of rows per query, I find that pgsql stands up extremely well to that. Even in a single huge server setup.
Also a semi-unrelated ditty, but as someone who has to run migrations on pretty big (100M rows) tables, ALTER TABLE with LOCK=NONE, ALGORITHM=INPLACE is pretty great (here's a table of what you can use per situation: https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-...)
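For anyone who hasn't used it, here is a rough sketch of what such a migration statement looks like, wrapped in a small Python helper. The helper name `online_add_index` and the table, index, and column names are made up for illustration; only the `ALGORITHM=INPLACE, LOCK=NONE` clauses come from the MySQL 5.6+ online DDL syntax.

```python
import re

# Hypothetical helper (not part of any library): builds an online
# "add index" statement using MySQL 5.6+ online DDL clauses.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def online_add_index(table, index, columns):
    """Return an ALTER TABLE statement that requests an in-place,
    non-locking index build."""
    for name in [table, index, *columns]:
        # Keep the sketch safe: only allow plain identifiers.
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"ALTER TABLE `{table}` ADD INDEX `{index}` ({cols}), "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )


if __name__ == "__main__":
    # Example names are assumptions, not taken from the thread.
    print(online_add_index("orders", "idx_created", ["created_at"]))
```

The nice part of spelling out both clauses is that MySQL refuses the statement with an error if the operation can't be done that way, instead of quietly falling back to a table copy or a lock.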
If you made it this far in my comment you'll probably realize I didn't grasp the article well. I just re-read it and tl;dr, MyRocks works well (as expected) with datasets that don't fit into the innodb buffer pool. :)
Would be good to dig a bit deeper into this - thanks for the pointers.
edit: I'm getting nostalgic about doing things with the toolkit, like running a 24-hour lagged slave because one developer kept dropping tables from the live primary database that he'd decided were "obsolete" (when they weren't). Sadly we could never remove his root access.
Also of particular value: pt-online-schema-change, pt-table-checksum and pt-query-digest. With the latter you can do some neat stuff, like tcpdump live traffic and pump it in to find out interesting things about live performance.
Back in the day I used to be a Cloud Engineer at a small ISP/VPS provider + managed support (largely LAMP). Percona Server was our MySQL install and we used their other tools pretty heavily for operations (backups, migrations, etc.).
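A rough sketch of the tcpdump trick, written as Python that only builds the two command lines rather than running them. The function names, the capture file name `mysql.tcp.txt`, and the defaults (port 3306, 1000 packets, interface `any`) are assumptions for illustration; the tcpdump flags follow the usage shown in the pt-query-digest documentation for `--type tcpdump`.

```python
import shlex


def tcpdump_capture_cmd(port=3306, packets=1000, interface="any"):
    """Command that captures MySQL traffic in the hex-dump text format
    pt-query-digest can parse (redirect its stdout to a file)."""
    return [
        "tcpdump", "-s", "65535", "-x", "-nn", "-q", "-tttt",
        "-i", interface, "-c", str(packets), "port", str(port),
    ]


def digest_cmd(capture_file):
    """Command that feeds a saved capture into pt-query-digest."""
    return ["pt-query-digest", "--type", "tcpdump", capture_file]


if __name__ == "__main__":
    # Print the commands instead of executing them; capturing traffic
    # normally needs root and a host that actually serves MySQL.
    print(shlex.join(tcpdump_capture_cmd()), "> mysql.tcp.txt")
    print(shlex.join(digest_cmd("mysql.tcp.txt")))
```

The report groups queries by fingerprint, so even a short capture tends to show which statements dominate response time on the live box.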
If I ever really needed a MySQL consultant, they would be top of the list for sure. I would say they are Tier N support, and I'm somewhere around Tier N-Y lol.
Certainly not by MySQL users. Percona has run the yearly MySQL conference (now Percona Live) for years and has done excellent work on tools, DBMS distributions, and performance. Vadim's work in the cited article is pretty typical of the high standard of performance analysis.
(Nice work Vadim.)
For those of us less 'in the know', I think there is a tendency to look at MariaDB and Oracle, and kind of overlook Percona - this is what I was getting at.
From these results you would think it makes sense to have RocksDB the default for MySQL and then have InnoDB be there for the larger users.
Like most things, the devil is in the details... and the use case.
With much higher compression than is possible with InnoDB, MyRocks can be in memory (file cache) when InnoDB workloads are already IO bound. Plus you can save a lot on disk storage (and IO, if you're paying for it).
LSM, however, does not require as much disk IO for inserts, even if the data is much larger than memory.
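To make the LSM point a bit more concrete, here is a toy sketch in Python. It is not how RocksDB is implemented (there is no write-ahead log, compaction, or bloom filter here), and the `ToyLSM` class and its `memtable_limit` parameter are made up for illustration. It only shows the basic idea: writes land in an in-memory table and reach "disk" as whole sorted runs, so inserts never update an on-disk page in place.

```python
import bisect


class ToyLSM:
    """Toy log-structured merge store: memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # in-memory buffer for recent writes
        self.runs = []       # "on-disk" sorted runs, newest last
        self.flushes = 0     # each flush stands in for one sequential write

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the whole memtable out as one sorted, immutable run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.flushes += 1

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer runs shadow older ones, so search newest first.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


if __name__ == "__main__":
    db = ToyLSM(memtable_limit=2)
    for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
        db.put(k, v)
    print(db.get("a"), db.flushes)
```

The trade-off shows up in `get`: a read may have to check several runs, which is the cost real LSM engines reduce with compaction and bloom filters.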
MyRocks uses a clustered key for the primary key, from what I have read. Isn't this the same as InnoDB? Is it not a B-Tree?
Could you elaborate on what's special about indexes, primary or secondary, in MyRocks?
I think I may have misinterpreted your original comment.
Fractal Tree Indexes are interesting as well, as they are optimized for hitting the disk:
Benchmarks are nice for showing that some things work better than others in theory or in "perfect" examples, but real databases tend to be much more complicated and messy, and sometimes just writing smarter queries, creating an index, or changing application logic may have a much bigger impact.
Under the SQL engine there tends to be some lower-level API