
Show HN: I built a MySQL storage engine which is InnoDB compatible - cruppstahl
https://upscaledb.com/blog/0014-beta-release-of-mysql-0.0.1.html
======
endymi0n
No transactions? Single-threaded? Global lock?

Don't want to put anyone's work down, but at first sight this looks much more
like MyISAM than InnoDB - and just like this, MyISAM's a helluva lot faster
than InnoDB for single writer workloads throwing ACID out of the window.

Just that MyISAM is battle tested over the course of several years. Again,
interested to see what comes out of it, but if history is a lesson, it's
usually easier to go from correct to fast (PostgreSQL) than from fast to
correct (MySQL et al.).

~~~
nine_k
Version 0.0.1 looks much like an experiment in writing a working storage
engine. It lacks a lot of critically important things. It's hard to say
anything at this point about any real-world scenarios where this storage
engine would be useful.

What concerned me a bit was absence of an interesting and promising general
idea behind the implementation, the reason to implement a new engine. Maybe it
is there, but the blog post does not say anything about it.

~~~
endymi0n
Exactly this, I was missing the big picture. Having worked with at least a
dozen databases, my claim today is that of Tony Million: Don't be a hipster
and just use Postgres. Don't get me wrong, I also work with Aerospike,
Redshift, Redis and a few other specialized data stores, but I always know
_why_ Postgres wouldn't do there. For any specialized data store, at least
provide a rationale _where_ the niche is where that thing is better than
anything else that's on the market. Simply claiming it's "fast" without
further specifications worked out for MongoDB, but I hope it's for the very
last time. Optimization means tradeoffs.

------
evanelias
The upscaledb FAQ indicates "it is not yet concurrent; it uses a big lock to
make sure that only one thread can access the upscaledb environment at a
time".

InnoDB is designed for concurrency (using MVCC, granular locking, etc) so I'd
expect it to be slower at single-threaded workloads than another engine that
skips all that.

Only using single-threaded benchmarking is a bit misleading, imo. This is
mentioned in the article but only in a small bullet point towards the bottom.

~~~
cruppstahl
I ran sysbench benchmarks with 30 concurrent connections. The performance gap
between InnoDB and upscaledb shrank a bit, but not much.

The reason is that most of the performance is spent in MySQL and not in the
key/value store, and then it does not make a big difference if the key/value
store is concurrent or not.

In my experience the assumption of "concurrent = fast" is a misconception.
Right now upscaledb moves certain operations (i.e. flushing dirty buffers) to
the background. It is better to have fast single-threaded code instead of
multi-threaded code with a huge locking overhead. A compromise would be to
move the lock to the database level (instead of the Environment, which is
basically the container for multiple databases), and make sure that there's no
shared state between the databases. But that actually does not have that much
priority for me because I do not expect to win that much performance.

~~~
evanelias
30 concurrent connections is a very low number, compared to uses of
MySQL/InnoDB at real scale. What happens at higher numbers?

I am not arguing that "concurrent = fast". My point is real systems have a
higher level of concurrency as a baseline.

InnoDB supports granular concurrent access because real workloads need this.
Systems that have poor stories around concurrency -- MyISAM, Redis, pre-
WiredTiger MongoDB -- definitely hit real scalability issues under high-volume
workloads.

------
spotman
Performance is great, but what is more interesting is maintaining that
performance in the midst of failures and things going wrong.

I don't know anyone who chooses InnoDB and says "gee I wish it were faster".
While it isn't the fastest show in town, it is a known quantity, and how it
breaks is well understood.

So if I was going to use a different storage engine, would want something more
than its a bit faster, in some cases.

Having said that, what a wonderful amount of work, and don't stop hacking on
it!

~~~
mey
The most recent break I encountered was ibtmp1 exploding to 100's of GB and
killing the box with InnoDB, so I'm not sure "how it breaks is well
understood".

~~~
philjohn
Funnily enough I encountered something similar years ago on Microsoft SQL
Server 2005.

Never got to the bottom of it either.

~~~
dspillett
A lot of cases I've seen where tempdb suddenly grows massively is an runaway
Cartesian product in a query. Sometimes this is intentional but was never
expected to be used on large data, but usually it is accidental and worse an
inexperienced DB person has "fixed" the problem with multiplied output rows by
adding "distinct" instead of fixing the query logic properly.

------
marmaduke
Stopped reading after no transactions. Theory question: musn't transactions
reduce necessarily performance?

~~~
cruppstahl
Not necessarily. I ran sysbench with InnoDB and manually switched off
transactions, and it was a lot slower. I don't know why - didn't look into it.

upscaledb _does_ use transactions. I.e. if you insert a row with a primary and
a secondary index then upscaledb inserts two key/value pairs in two databases.
All these operations are wrapped in Transactions. They are just not yet
supported on SQL level.

~~~
morgo
In InnoDB everything is a transaction. If you do not begin/commit then each
statement will be its own transaction.

Batching a group of statements together in a small transaction is usually
better because it reduces log flushing. ACID only needs to be guaranteed on
commit. Similarly, applying transactions in parallel is faster because of
group commit.

------
morgo
MySQL Product Manager here. Any feedback on the Storage Engine API?

~~~
cruppstahl
It's ok! The example storage engine could implement more functionality, just
to demonstrate i.e. how rows are serialized/deserialized etc. The test suite
was very helpful.

------
_Codemonkeyism
My physics teacher would have slapped those diagrams around my head asking "Is
the y-axis 6000 apples?"

------
daenney
I think this could be pretty neat, it'll be interesting to see how it evolves.
However, looking at the comments people have left here I would suggest putting
up a table/comparison with InnoDB side by side, detailing which features are
implemented, which are not and which are but aren't available through the SQL
layer yet.

This would also put the "compatible with InnoDB" thing in perspective for
people and potentially clear up some other confusion.

------
Domenic_S
Label your axes!

------
tombert
This looks pretty sexy. Are there atomicity guarantees?

~~~
cruppstahl
BEGIN/COMMIT is not yet supported. But that's on my TODO list.

~~~
Animats
Ulp. The whole point of the InnoDB engine is that it supports atomic
transactions, even over crashes. If you want an unsafe table, there's the ISAM
engine, which is faster.

What does "100% compatible to InnoDB" mean here? It's not functionally
compatible and it's not disk-file-format compatible.

~~~
cruppstahl
If that's your definition of atomicity - yes, upscaledb offers atomic
operations. If you define atomicity as the "A" in "ACID" then no -
BEGIN/COMMIT wrappers are not yet available. Work in progress!

~~~
j_jochem
So what you're saying is that in general, transactions are implemented, but
the begin/commit syntax for transactions is not yet supported?

~~~
cruppstahl
Correct.

~~~
tombert
Well I think this is very cool. I actually have a project in mind that I think
might benefit; I'll look to see if I can use Upscale.

