
Dynimize: Speed Up MySQL with CPU Performance Virtualization - nwrk
https://dynimize.com/
======
Rafuino
Does the presence of an HDD or SSD across the three architectures make a
difference, or is this purely a CPU problem to be solved? I see the testing
made sure to fit all tables in memory with a large enough innodb buffer pool,
so was storage even a factor?

[https://dynimize.com/blog/discussions/dynimizer-mysql-cross-...](https://dynimize.com/blog/discussions/dynimizer-mysql-cross-microarchitecture-analysis/)

It seems to me the only system with SSDs (Ivy Bridge) sees a larger
transactions-per-second improvement than the other systems.

[https://dynimize.com/performanceSpeedup](https://dynimize.com/performanceSpeedup)

~~~
davidyeager
The specific tests we ran were read-only, with a working set that fits into
memory, so SSD vs HDD doesn’t matter: they were CPU-bound tests meant to
highlight the performance improvements. Storage isn’t a factor here. If the
working set didn’t fit in memory, faster storage would make the workload more
CPU bound, and Dynimizer would make a bigger impact with SSD than with HDD.

------
willvarfar
This doesn't seem to be JITting actual queries (a la
[https://www.pgcon.org/2017/schedule/events/1092.en.html](https://www.pgcon.org/2017/schedule/events/1092.en.html))

It seems to be JITting the generic binary code.

Would the same gains be seen from compiling MySQL for the actual target
architecture, Gentoo-style, e.g. with -march=native?

~~~
viraptor
It looks like it's taking measurements from a live process, so probably not
-march=native. But most likely runtime profile-guided optimisation would do
a similar job.

------
leesalminen
Nice website, and very cool looking product!

Has this been run in production anywhere? Were there any instances of
corrupted writes? Inaccurate reads?

What’s your plan for monetization down the road?

~~~
davidyeager
Yes it’s definitely being used in production. We’re starting to collect
production use cases and will provide some on our website soon. Here’s an
example website with growing traffic using MariaDB + Dynimizer with Wordpress,
and they found Dynimizer very helpful:
[https://www.cgmagonline.com](https://www.cgmagonline.com)

In terms of inaccurate reads or corrupted writes, that would be a bug if it
ever happens. That would not be part of normal operation and would not be
expected. That said, all software, including MySQL, gcc, and Linux, is full of
bugs, and Dynimizer is not immune to that of course. However, it has been
stress tested thoroughly with MySQL, MariaDB, and Percona Server up to MySQL
5.7 and MariaDB 10.2.

~~~
leesalminen
Understood, and thank you for the reply. I look forward to seeing some use
cases published. Definitely bookmarking the site now.

Best of luck with this. Exciting stuff!

------
desdiv
Very, very cool. Section 7 of the manual[0] gives some hints on how this black
magic works:

> 7\. Workload Requirements

> To obtain benefit from the current version of Dynimizer, all of the
> following workload conditions must be met:

> A small number of CPU intensive processes - On a given OS host where the
> workload is running, the workload must be comprised of one or a few CPU
> intensive processes. Optimizing a large number of processes at once is not
> recommended.

> Long running programs - The processes being optimized have long lifetimes,
> and their workloads are long running in order to amortize the warmup time
> associated with optimization.

> x86-64 - Optimized processes must be 64-bit, derived from x86-64 executables
> and shared libraries, which must comply with the x86-64 ABI and ELF-64
> formats. Most statically compiled applications on Linux meet this
> requirement.

> Dynamically Linked - Target processes must be dynamically linked to its
> shared libraries. Statically linked processes are not yet supported. Most
> Linux programs are dynamically linked.

> No self modifying code - The target application must not be running its own
> Just-In-Time compiler such as those found in Java virtual machines. This
> therefore excludes Java Applications.

> Front-end CPU stalls - The workload wastes a lot of time in CPU instruction
> cache misses, instruction TLB misses, and to a lesser extent branch
> mispredictions.

> User mode execution - Much of that wasted time is spent in user mode
> execution (as opposed to kernel mode), as Dynimizer only optimizes user mode
> machine code.

> Because of these requirements, Dynimizer takes a whitelist approach when
> determining if programs are allowed to be optimized, with MySQL and its
> variants being the currently supported optimization targets on that list for
> this early beta release. Other programs are not currently supported, and
> while they can be used with Dynimizer, they should be very thoroughly tested
> by the user or system administrator before being deployed in a production
> environment.

> Future versions of Dynimizer may eliminate many of these workload
> requirements, broadening the variety of applicable scenarios as well as
> further increasing the performance delivered in previously beneficial cases.

[0] [https://dynimize.com/manual](https://dynimize.com/manual)

~~~
jnwatson
The really important bit: "Front-end CPU stalls - The workload wastes a lot of
time in CPU instruction cache misses, instruction TLB misses, and to a lesser
extent branch mispredictions".

My educated guess is that it relocates the hot path of the text segment to
better pack into the instruction cache. Cool.

~~~
davidyeager
Sure does.
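
To make the hot-path packing idea concrete, here is a toy sketch. It is not Dynimizer's algorithm; the block names, sizes, sample counts, and the 64-byte line size are all made-up assumptions. It only shows why moving frequently executed code together reduces the number of instruction cache lines the hot path touches.

```python
CACHE_LINE = 64  # bytes; a common x86-64 cache line size (assumption)


def lines_touched(layout, hot_names):
    """Count the distinct cache lines holding at least one byte of hot code.

    `layout` is a list of (name, size_in_bytes, samples) tuples laid out
    back to back starting at offset 0.
    """
    touched = set()
    offset = 0
    for name, size, _samples in layout:
        if name in hot_names:
            first = offset // CACHE_LINE
            last = (offset + size - 1) // CACHE_LINE
            touched.update(range(first, last + 1))
        offset += size
    return len(touched)


def pack_hot_first(layout):
    """Reorder blocks so the most-sampled ones sit next to each other."""
    return sorted(layout, key=lambda block: block[2], reverse=True)


# Hypothetical profile: small hot blocks interleaved with larger cold ones.
original = [
    ("hot_a", 32, 900), ("cold_a", 96, 1),
    ("hot_b", 32, 800), ("cold_b", 96, 0),
    ("hot_c", 32, 700), ("cold_c", 96, 2),
    ("hot_d", 32, 600), ("cold_d", 96, 0),
]
hot = {name for name, _size, samples in original if samples >= 100}

before = lines_touched(original, hot)
after = lines_touched(pack_hot_first(original), hot)
print(before, after)
```

With these made-up numbers the hot code is spread over 4 cache lines in the original layout and fits in 2 once packed. The same reasoning applies at page granularity for instruction TLB misses.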

------
jrk
Cool, but a bit grandiose and historically arrogant:

"[T]he industry's first CPU performance virtualization software"

"A New Frontier For JIT Compilers… JIT compilers use as input a virtual
machine code format… Dynimizer [uses] real machine code as input instead"

Just 20 years after Mojo, Dynamo, DynamoRIO, etc:

[http://program-transformation.org/Transform/BinaryOptimisers](http://program-transformation.org/Transform/BinaryOptimisers)
[http://www.dynamorio.org](http://www.dynamorio.org)
[https://www.complang.tuwien.ac.at/andi/bala.pdf](https://www.complang.tuwien.ac.at/andi/bala.pdf)
[http://cseweb.ucsd.edu/~lerner/mojo.ps](http://cseweb.ucsd.edu/~lerner/mojo.ps)
…

~~~
davidyeager
Of course these projects were a major source of inspiration for Dynimizer.
However they are not JIT compilers. They are more like virtual machines or
binary translators. Today DynamoRIO and Mojo (which ended up as Intel PIN) are
used for program introspection and analysis, not for application acceleration.

~~~
jnwatson
"Dynamic binary translation" is the term of art. Which of course VMWare and
VirtualPC were doing 20 years ago in dynamically translating x86 ring-0 code
to ring-3 code.

Dynimizer is translating x86-64 to faster x86-64, but the concept is the same.

DynamoRIO was actually talked about for application acceleration. There was at
least a PoC that did dynamic function inlining.

------
ShroudedNight
This is _very_ cool stuff.

I would be very interested to hear a sampling of the war stories that came out
of building this. I had a friend working on the IBM zPDT JIT at one point, and
while I unfortunately can't remember many of the details at the moment, I
remember boggling (in that sort of emergently satisfying way) at some of the
'oh shit' moments that came up.

~~~
davidyeager
Lots of ‘oh shit’ moments. Lots.

------
davidyeager
Here are the slides from Percona Live 2018:

[https://www.percona.com/live/18/sites/default/files/slides/A...](https://www.percona.com/live/18/sites/default/files/slides/Accelerating%20MySQL%20with%20JIT%20Compilers%20-%20FileId%20-%20129518.pdf)

~~~
SafPlusPlus
Mentioned as the installation method in those slides:

    
    
      sudo bash -c 'bash <(wget -O - https://dynimize.com/install) -default'
    

Come on, please don't teach people horrendous security practices... :(
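
For contrast, a minimal sketch of the "download, verify, then run" pattern. The thread does not say whether a checksum or signature is published for this installer, so the expected digest here is purely hypothetical and would have to come from a separate trusted channel.

```python
import hashlib
import hmac


def matches_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if payload hashes to the expected SHA-256 digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Usage outline (not executed here):
#   1. Save the installer to a file instead of piping it into a shell.
#   2. Read it back as bytes and call matches_sha256(data, expected_digest),
#      where expected_digest is obtained out of band (hypothetical).
#   3. Read the script, and only run it if the check passed.
```

Verifying a digest only proves the file is the one the publisher intended; it still leaves the decision of whether to trust a root-level install script.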

~~~
stephenr
It’s depressing how commonplace `curl|(ba)sh` has become.

This will sound clichéd but I blame the rise of “poor man’s devops”, whereby
management fires all the ops and lets developers manage infrastructure.

~~~
da_chicken
I agree, it may be cliche, but I think the exact same thing whenever I see
this kind of practice, too. Or that the developer has never had to manage
a live system with users who know his phone number and his boss's phone
number.

"Oh, this is just for a test mock up. Nobody is supposed to actually use this
to install it for real."

Well, to experienced people it makes you look moderately stupid, and to
inexperienced people it looks like an elegant solution. It's actively hostile
to secure system planning.

It reminds me of the NPM left-pad debacle[0] and some of the criticism[1] that
came up from that.

0:
[https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/](https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/)

1: [http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how...](http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/)

~~~
stephenr
I’ve given up waiting for nodejs to become a reliable environment. Just
recently the `is-even` package came to light and highlighted that things
aren’t getting any better than when leftpad was a thing.

I can’t wait to see tc39’s response to the `is-even` shit show after they
decided to just add leftpad to the stdlib.

~~~
da_chicken
Wow, I hadn't heard about the is-odd/is-even/is-number thing. That's hilarious
and awful.

Reminds me of: [https://github.com/jezen/is-thirteen](https://github.com/jezen/is-thirteen)

~~~
stephenr
The “best” part of it all is that apparently js engines have an internal
optimisation for `foo % 2 === 0`, because it’s such a common thing.

This clown was using a bitwise operation in `is-even` “because everyone
already knows about `% 2 === 0`”, and thus was hurting performance (on top of
whatever extra memory is used for the module, function call overhead, etc.)

------
martin_
This looks like seriously impressive technology, and yields impressive
results, but: I don't think I'd be comfortable with the idea of something
rewriting my database in production. What I -would- probably be OK with is
having Dynimizer analyze my workload in a staging/load test environment, and
then producing a new binary for me which I could then run through its paces.

Beyond actual errors being produced, I'm wondering what'd happen in weird
scenarios such as one where my primary gets heavily optimized for its write
load and creeps up to, say, 80% CPU or so at peak. What happens then if
my replica, which has been heavily optimized for its read load, gets promoted
in a failure scenario and gets pegged at high CPU?

Final thought here is if this tech really is solid, when is AWS going to start
shipping it with my VM?

~~~
davidyeager
This does not rewrite your database. It optimizes the live in-memory machine
code of the mysqld (MySQL Server) process. It must run on the same OS host as
the mysqld (MySQL Server) process being optimized. So if you are using this on
the master and not on the replica, the replica won’t be touched. Hope that
makes sense.

~~~
grogers
I think the point was that profile guided optimization relies on the workload
staying relatively fixed. If the workload suddenly changes (like promoting a
readonly slave to be the writable master) the assumptions made by PGO may not
be valid, and performance could be worse than if no modifications were made in
the first place.

I think you'd just have to measure scenarios before using it in production.

~~~
davidyeager
Dynimizer can react to drastic changes in workload and reoptimize depending on
how you configure it.
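
As an illustration of what "reacting to a drastic workload change" could look like, here is a hedged sketch that compares the hot-function sets of two sampling profiles and flags a re-optimization when they diverge. This is not Dynimizer's actual mechanism or configuration; the function names, thresholds, and profile numbers are all hypothetical.

```python
def hot_set(samples, coverage=0.9):
    """Smallest set of functions that accounts for `coverage` of all samples."""
    total = sum(samples.values())
    chosen, accumulated = set(), 0
    for name, count in sorted(samples.items(), key=lambda kv: -kv[1]):
        if accumulated >= coverage * total:
            break
        chosen.add(name)
        accumulated += count
    return chosen


def overlap(old, new):
    """Jaccard similarity of two hot sets (1.0 means identical)."""
    if not old and not new:
        return 1.0
    return len(old & new) / len(old | new)


def should_reoptimize(old_profile, new_profile, threshold=0.5):
    """Flag a re-optimization when the hot sets have drifted apart."""
    return overlap(hot_set(old_profile), hot_set(new_profile)) < threshold


# Hypothetical sample counts for a read-heavy replica before and after
# being promoted to a write-heavy primary.
read_profile = {"select_path": 900, "index_lookup": 80, "log_write": 20}
write_profile = {"log_write": 700, "btree_insert": 250, "select_path": 50}

print(should_reoptimize(read_profile, read_profile))
print(should_reoptimize(read_profile, write_profile))
```

In this toy data the promoted replica's hot set shares nothing with the old one, so the check fires; an unchanged profile does not trigger it. Measuring real failover scenarios before production, as suggested above, remains the safer test.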

------
indexerror
Very cool product.

From the /product page:

> ...It profiles applications using the Linux perf_events subsystem and
> interfaces with a target application's machine code through the Linux ptrace
> system call. When optimizing a program, it loads a code cache into the
> target program's address space...

------
ggambetta
@davidyeager: In the legend of the graph in Dynimizer System Overhead in
[https://dynimize.com/product](https://dynimize.com/product), both series are
labelled as "Without Dynimizer".

Couple of questions. Since this seems to be a very general technology, why the
emphasis on MySQL (and DBs in general)? Marketing? Also, I found Dynimize vs
Dynimizer confusing - is that company name vs product name?

~~~
davidyeager
Yes we will correct the legend in that graph, thanks for reporting it.

Dynimize is the company, Dynimizer the product. We may ditch the name
Dynimizer and just go with Dynimize to avoid confusion. Thoughts?

It is a general-purpose approach to optimization, and MySQL is just a starting
point. It was chosen first because it has a broad user base and is relatively
easy to support compared to many other Linux programs: it has a single-process
architecture and long process lifetimes, OLTP workloads are known to spend much
of their CPU time in front-end stalls, which are effectively targeted by
profile-guided compiler optimizations, and it’s statically compiled. We’ve
tried it on MongoDB and seen similar benefit, but that isn’t supported yet.
Coming soon. Windows will probably require some driver development for
effective sample-based profiling and will happen later on. We will improve the
effectiveness of our other optimizations that don’t target front-end CPU
stalls and better support multiprocess workloads with short process lifetimes,
which will allow us to target many other types of programs in the future.

Hope that clarifies things a bit.

------
psandersen
This looks really interesting!

Wonder if it could be integrated into the whole OS/kernel, and if it can help
with typical ML workloads like running a randomforest in scikit learn.

------
lucio
Very simple-to-use product. There's almost no friction and you get 10% extra
TPS. They'll make a lot of money.

~~~
blantonl
How are they making money?

------
z3t4
I'm very skeptical of running a script from a random web page that promises to
make my program faster.

~~~
etaioinshrdlu
Indeed, it is literal insanity to run this on anything production that you
care about.

~~~
jnwatson
It isn't any different than running your production in a container, a VM, or
the cloud, all of which can significantly affect what's actually going on.

------
JeanMarcS
Does it make a difference on VMs / containers / VPSes? Or does it work only on
server CPUs?

~~~
davidyeager
Works with VMs or on a VPS. We have done a lot of testing on KVM, Xen, and a
bit on VMware. Still need to do a bit of work to properly support containers.

------
chatmasta
I wonder, do the authors have a reverse engineering background? It seems like
a lot of concepts from reverse engineering were applied to the JIT compiler,
which I find incredibly cool.

------
jacob019
The main infographic shows an impressive improvement in TPS, but what about
single query execution time? I often see the CPU maxed out by complex queries
against large tables.

------
lincolnq
So neat. I'd love to read a paper about how this works in more detail. Does it
exist?

------
jlgaddis
What I saw of this looked pretty neat... until the web page crashed Safari (on
iPad).

------
spacemanmatt
Heyo, workloads that fit in RAM are not that interesting. I hope they have
identified their market ahead of time, for their sake, because I don't think
it's valuable.

~~~
davidyeager
That was used to highlight the maximum improvement expected. When the working
set doesn’t fit into RAM then you will get some combination of a smaller
amount of performance improvement plus a reduction in CPU usage. The faster
the storage, the greater the increase in tps that you’ll see. Note that
replication is often CPU bound. We will be applying this to non-database
workloads as well in the near future.
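
The relationship between storage speed and the achievable gain can be approximated with Amdahl's law. The sketch below is a back-of-the-envelope model, not a Dynimizer benchmark: the CPU-bound fractions and the 1.5x CPU speedup are invented inputs for illustration.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's law: only the CPU-bound share of the time gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


# Hypothetical inputs: the same 1.5x CPU-side speedup applied to workloads
# that spend different shares of their time on the CPU rather than on I/O.
for cpu_fraction in (1.0, 0.7, 0.3):
    print(cpu_fraction, round(overall_speedup(cpu_fraction, 1.5), 3))
```

The more time the workload spends waiting on storage (a lower `cpu_fraction`), the smaller the end-to-end throughput gain, which is why faster storage lets more of the CPU-side improvement show up as tps.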

