
ScyllaDB: Drop-in replacement for Cassandra that claims to be 10x faster - haint
http://www.scylladb.com/
======
jandrewrogers
Very nice.

Broadly speaking, this is the correct style of architecture for a database
engine on modern hardware. It is vastly more efficient in terms of throughput
than the more traditional architectures common in open source. It also lends
itself to elegant, compact implementations. I've been using similar
architectures for several years now.

While I have not benchmarked their particular implementation, my first-hand
experience is that these types of implementations are always _at least_ 10x
faster on the same hardware than a nominally equivalent open source database
engine, so the performance claim is completely believable. One of my
longstanding criticisms of open source data infrastructure has always been the
very poor operational efficiency at a basic architectural level; many closed
source companies have made a good business arbitraging the gross efficiency
differences.

~~~
acconsta
Agreed, but which architectural features are you referring to?

~~~
jandrewrogers
Over the last decade, the distributed system nature of modern server hardware
internals has become painfully evident in how software architectures scale on
a single machine. The traditional approaches -- multithreading, locking, lock-
free structures, etc -- are all forms of coordination and agreement in a
distributed system, with the attendant scalability problems if not used very
carefully.

At some point several years ago, a few people noticed that if you attack the
problem of scalable distribution within a single server the same way you would
in large distributed systems (e.g. shared nothing architectures), you could
realize _huge_ performance increases on a single machine. The caveat is that
the software architectures look unorthodox.

The general model looks like this:

- one process per core, each locked to a single core

- use locked local RAM only (effectively limiting NUMA)

- direct dedicated network queue (bypass kernel)

- direct storage I/O (bypass kernel)

If you do it right, you minimize the amount of _silicon_ that is shared
between processes, which has surprisingly large performance benefits. Linux
has facilities that make this relatively straightforward too.
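
As a rough illustration of the first item in the list above, here is a minimal Python sketch of pinning a process to one core and routing every key to a single owning shard. It is not taken from any of the engines discussed: `pin_to_core`, `owner_shard`, `Shard` and `route` are hypothetical names, `zlib.crc32` is only a stand-in for whatever partitioning hash a real engine uses, and `os.sched_setaffinity` is Linux-only. In a real deployment each `Shard` would live in its own pinned process; they share one process here only to keep the sketch short.

```python
import os
import zlib


def pin_to_core(core: int) -> set:
    """Pin the calling process to one core and return the resulting CPU set."""
    if hasattr(os, "sched_setaffinity"):  # Linux-only call
        os.sched_setaffinity(0, {core})   # pid 0 means "this process"
        return os.sched_getaffinity(0)
    return {core}  # other platforms: nothing is pinned, just report the intent


def owner_shard(key: bytes, n_shards: int) -> int:
    """Map a key to the one shard that owns it (crc32 is a stand-in hash)."""
    return zlib.crc32(key) % n_shards


class Shard:
    """State owned by exactly one core; nothing in here is shared."""

    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.data = {}  # private to this shard, so no locks are needed

    def put(self, key: bytes, value: bytes) -> None:
        self.data[key] = value

    def get(self, key: bytes):
        return self.data.get(key)


def route(shards, key: bytes) -> Shard:
    """Send the request to the owning shard instead of touching shared state."""
    return shards[owner_shard(key, len(shards))]
```

With `shards = [Shard(i) for i in range(4)]`, a call such as `route(shards, b"user:42").put(b"user:42", b"v1")` only ever touches the dictionary of the one shard that owns that key.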

As a consequence, adjacent cores on the same CPU have only marginally more
interaction with each other than cores on different machines entirely.
Treating a single server as a distributed cluster of 1-core machines, and
writing the software in such a way that the operating system behavior reflects
that model to the extent possible, is a great architecture for extreme
performance but you rarely see it outside of closed source software.

As a corollary, garbage-collected languages do not work for this at all.

~~~
lastofus
How does one bypass the kernel for network and disk IO? I've never heard of
doing this before (e.g. IO is always a system call)

~~~
technion
Oracle has long argued the value of bypassing the file system and associated
kernel drivers with its raw devices and ASM. It'd be interesting to see such a
thing land in other platforms.
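
For the disk side of the question, one commonly used Linux facility is `O_DIRECT`. It is still a system call, so it does not bypass the kernel entirely, but it asks the kernel to skip its page cache and transfer data straight between the device and an aligned user buffer. The sketch below is a minimal illustration under stated assumptions: `read_block_direct` and `BLOCK` are hypothetical names, 4096 bytes is an assumed alignment, and the code falls back to a buffered read where `O_DIRECT` is rejected. Full kernel bypass for networking typically relies on user-space drivers such as DPDK, which is not shown here.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; real devices report their own block size


def read_block_direct(path: str, block_index: int = 0) -> bytes:
    """Read one aligned block, asking the kernel to skip its page cache."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)  # O_DIRECT is Linux-only
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping, page-aligned as O_DIRECT needs
    try:
        fd = os.open(path, flags)
        with os.fdopen(fd, "rb", buffering=0) as f:  # unbuffered raw file object
            f.seek(block_index * BLOCK)              # offset must be aligned too
            n = f.readinto(buf)                      # read lands in the aligned buffer
        return buf[:n]
    except OSError:
        # Some filesystems (tmpfs, for example) reject O_DIRECT: use a buffered read.
        with open(path, "rb") as f:
            f.seek(block_index * BLOCK)
            return f.read(BLOCK)
    finally:
        buf.close()
```

An engine that reads this way has to supply its own caching and read-ahead, since the kernel is no longer doing either on its behalf.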

------
acconsta
It's exciting to finally see this. Cassandra's strengths were in its
distributed architecture (no master, tunable consistency, etc.). The database
engine itself has always been a bit of a mess
([https://issues.apache.org/jira/browse/CASSANDRA-8099](https://issues.apache.org/jira/browse/CASSANDRA-8099)).

------
nattaylor
>The Scylla design, right, is based on a modern shared-nothing approach.
Scylla runs multiple engines, one per core, each with its own memory, CPU and
multi-queue NIC. We can easily reach 1 million CQL operations on a single
commodity server. In addition, Scylla targets consistent low latency, under 1
ms, for inserts, deletes, and reads.

Interesting. From:
[http://www.scylladb.com/technology/architecture/](http://www.scylladb.com/technology/architecture/)

~~~
whalesalad
So on virtualized hardware, namely AWS, I'm sure the benchmarks won't be so
magnificent. Needing a dedicated nic per core is a big deal unless you're at a
pretty large scale.

~~~
jandrewrogers
A modern Ethernet chipset has a large number of independent hardware queues.
These can be assigned to VMs for direct access to the NIC, bypassing the
hypervisor. AWS, since you used that example, offers instances with this type
of direct bypass.

Just to pull an example from memory, the ubiquitous Intel 82599 10GbE NIC
silicon has up to 128 TX and RX queues in hardware. IIRC, these are bundled in
pairs for direct access in virtualized environments, so in principle you could
have 64 virtual cores each with their own dedicated physical hardware queue.
This is almost certainly what they were talking about. That is the whole point
of this feature in Ethernet silicon; it gives cores (virtual or physical)
dedicated network hardware off a single NIC.

------
jhugg
"A Cassandra compatible NoSQL column store, at 1MM transactions/sec per
server."

Personal pet-peeve of mine. Using "TPS" or "Transactions/sec" to measure
something that is in no way transactional. Maybe ops/sec, reads/sec,
updates/sec, or something...

~~~
JoeAltmaier
Add my pet peeve: not listing latency stats. Big Tables does millions of
ops/sec but it can take 5(!) seconds to complete one. That's the stat that
matters to customers.
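
A quick way to see why a throughput number alone says little about latency is Little's law: the average number of operations in flight equals throughput times latency. The sketch below is only an illustration; the function names and the sample latencies are hypothetical, and the 1M ops/sec and 5 second figures are the ones quoted in this thread.

```python
import math


def in_flight_ops(throughput_ops_per_s: float, latency_s: float) -> float:
    """Little's law: average operations in flight = throughput * latency."""
    return throughput_ops_per_s * latency_s


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# The same 1M ops/sec can come from very different systems:
slow = in_flight_ops(1_000_000, 5.0)    # 5 s per op needs ~5 million ops in flight
fast = in_flight_ops(1_000_000, 0.001)  # 1 ms per op needs ~1 thousand in flight

# Hypothetical latency samples in milliseconds: the median hides the outlier.
samples_ms = [1, 1, 1, 1, 1, 1, 1, 1, 1, 5000]
median_ms = percentile(samples_ms, 50)
p99_ms = percentile(samples_ms, 99)
```

Here the median is 1 ms while the 99th percentile is the 5000 ms outlier, which is why latency percentiles belong next to any ops/sec figure.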

~~~
lucindo
[http://www.scylladb.com/technology/cassandra-vs-scylla-laten...](http://www.scylladb.com/technology/cassandra-vs-scylla-latency-benchmark/)

~~~
rakoo
> The test hardware configuration includes:

> 1 DB server (Cassandra / Scylla)

The whole point of Cassandra is to run a cluster of servers to handle load at
scale with minimal friction instead of having to buy a big single machine or
spend all your time/money trying to run a clustered RDBMS. This test doesn't
measure the correct thing.

~~~
acconsta
Cassandra's performance scales linearly with the number of nodes though, so
per-node performance definitely matters. Probably not 10x, but probably not 1x
either.
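
To make the per-node argument concrete, here is a back-of-the-envelope sketch. It assumes perfectly linear scaling and ignores replication, and every number in it is hypothetical rather than taken from the benchmark.

```python
import math


def nodes_needed(target_ops_per_s: float, per_node_ops_per_s: float) -> int:
    """Cluster size for a target load, assuming perfectly linear scaling."""
    return math.ceil(target_ops_per_s / per_node_ops_per_s)


# Hypothetical target of 2M ops/sec:
baseline = nodes_needed(2_000_000, 100_000)   # 20 nodes at 100k ops/sec each
tenfold = nodes_needed(2_000_000, 1_000_000)  # 2 nodes at 1M ops/sec each
```

Under those assumptions a 10x gain per node means roughly a tenth of the machines for the same load, and anything between 1x and 10x shrinks the cluster proportionally.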

------
mappu
Numbers look great, but so do /dev/null's. What guarantees does it make?

Has it been through Jepsen yet?

~~~
dorlaor
It's planned. However, I don't believe we'll pass it today. We're targeting GA
for Jan and we'll give it our best shot.

~~~
acconsta
Yeah, I was wondering about that. It looks like you guys have done some
brilliant work with the storage engine, but reimplementing all the distributed
logic is another (possibly bigger) project.

------
mbfg
Wait, did I read that right? The test was with one (1) server? What's the
point of that? Smells like a cooked-up test.

~~~
glommer
The point of that is to show how efficient a node can be, because that is what
is replaced.

All the external-facing things in Scylla are the same as in Cassandra. That
includes all the ring stuff and all network protocols.

So you should expect similar cluster behavior.

~~~
kcw39217
Cassandra is an open source distributed database management system designed to
handle large amounts of data across many commodity servers, providing high
availability with no single point of failure.......

There is nothing commodity about a server with 128GB RAM.

When you introduce other nodes, you get chatter and network traffic....

~~~
sciurus
Nothing commodity about a server with 128GB RAM? At list price, you can
configure one of Dell's entry-level servers with 128GB of RAM for less than
$3,500.

[http://www.dell.com/us/business/p/poweredge-rack-servers](http://www.dell.com/us/business/p/poweredge-rack-servers)

~~~
mbfg
Dell's servers you point to do not have 48 logical cores, either. That cpu
runs $2.2K by itself.

------
domlebo70
Literally zero mention that in the event of a network partition they will just
drop messages on the floor. This is fine as a cache, perhaps replacing
Redis... but as a Cassandra replacement this is pretty scary

~~~
vitalyd
Where did you get this from? I hope that's not a conclusion from the
_benchmark_ doing single server load testing.

------
prohor
The license is Affero GPL, which means you need to open-source your code even
if you use it for a service. "Traditional" GPL was effective only while
redistributing. That means you would need to go for commercial license
whenever you build a service on it. Which in fact is a fair approach for a
business model when there is a company behind an open source project,
especially since this time there is no lock-in. You could always come back to
Cassandra.

~~~
mappu
The virality doesn't cross the database interface layer.

Modifications to the database software must be shared, yes, but your client
application is outside the reach of the AGPL and can remain proprietary.

------
philipov
Wow, that autoadvancing website is a deal breaker.

~~~
dmarti
ScyllaDB web person here. If I made it so that you could block one script and
get a home page without the horizontally scrolling thingy (but have all the
other JS stuff work including syntax highlighting and graphs), would you come
back? ( dmarti@scylladb.com )

------
kcw39217
Which JVM did they use? What were the flags passed to the JVM?

------
dschiptsov
Finally, back to sanity of great old-school products, like Informix, by
dropping Java (the whole scam) for C++14 and by paying attention to details of
an underlying OS (again).

Same trend, by the way, is in Android development.

~~~
_Codemonkeyism
10x speedup (same algorithms, same architecture) replacing Java with C++ is
not possible (~2x at max).

One of the latest benchmarks I've seen, "Comparison of Programming Languages
in Economics" [1], for code without any IO (just number crunching), shows a
1.91x to 2.69x speedup from using C++ compared to Java. So any code involving
IO is going to be slower.

By replacing bad Java code with excellent, machine-aligned C++, a 10x speedup
is possible.

[1] [https://github.com/jesusfv/Comparison-Programming-Languages-...](https://github.com/jesusfv/Comparison-Programming-Languages-Economics)
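
One way to read the point about code involving IO is through Amdahl's law: if only the CPU-bound share of a request gets faster, the overall gain is capped by the share that does not. The sketch below is an interpretation, not something from the cited benchmark; `overall_speedup` is a hypothetical helper and the 20% CPU share is an assumed figure.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: only the CPU-bound fraction of the work gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


# Pure number crunching: the whole 2x language-level gain shows up.
pure_cpu = overall_speedup(1.0, 2.0)

# Assumed workload where only 20% of the time is CPU and the rest is IO wait:
io_heavy = overall_speedup(0.2, 2.0)
```

With those assumed numbers the IO-heavy case gains only about 11%, so a 10x claim has to come from changing how IO and scheduling are done, not from the language switch alone.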

~~~
cbsmith
It's particularly flawed given:

a) IO is such a large portion of the problem

b) Hypertable isn't just way, way faster.

~~~
glommer
IO is not only a large part. It is the main part. That is why it is important
to get it right: scylla for instance does not leave the cache to the OS. It
has its own caches for everything. Never blocks on IO or page faults because
all IO bypasses the kernel. And those are just two tiny examples.
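
As a generic illustration of what "its own caches" can mean, here is a minimal application-managed cache with a byte budget and least-recently-used eviction. It is not Scylla's implementation; `RowCache` and its methods are hypothetical names, and a real engine would track memory far more precisely.

```python
from collections import OrderedDict


class RowCache:
    """Application-managed LRU cache with an explicit byte budget."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._rows = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        value = self._rows.get(key)
        if value is not None:
            self._rows.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value: bytes) -> None:
        old = self._rows.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        self._rows[key] = value
        self.used_bytes += len(value)
        # The application, not the OS, decides what to evict and when.
        while self.used_bytes > self.capacity_bytes and self._rows:
            _, evicted = self._rows.popitem(last=False)
            self.used_bytes -= len(evicted)
```

The design choice being illustrated is control: with an explicit budget and eviction policy, memory use is decided by the database rather than by whatever the kernel page cache happens to keep.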

~~~
cbsmith
> scylla for instance does not leave the cache to the OS. It has its own
> caches for everything

Uh-huh... that's all pretty common for databases. Cassandra would fit that
description.

> Never blocks on IO or page faults because all IO bypasses the kernel.

That just seems nonsensical. Sometimes, you are waiting for IO. That's just
reality. It is conceivable you bypass the kernel for I/O, but that creates a
lot of complexity and limitations. Near as I can tell though, they do talk to
the kernel for IO.

~~~
vitalyd
[http://www.scylladb.com/technology/memory/](http://www.scylladb.com/technology/memory/)

By the way, I think you're replying to one of the devs of Scylla.

~~~
cbsmith
So, in general, I understand there is lots of stuff going on in Scylla that
does distinguish it, at least from Cassandra. There is the user space
networking logic for IO. However, a lot of the IO overhead with disk, for
example.

~~~
vitalyd
>However, a lot of the IO overhead with disk, for example.

That's why they benchmarked this workload on a 4x SSD RAID configuration :).
Given that I/O bandwidth and throughput continue to increase while processor
frequency doesn't, and core counts are going up, it's prudent to design a
system that can take advantage of this.

~~~
cbsmith
Yeah, and a 4x SSD RAID configuration is kind of overkill in the extreme for
most Cassandra setups.

I'm sure there is a way to set up IO subsystems so that Cassandra becomes a
huge bottleneck, but that's a pretty specialized context.

------
finalight
will this be the next docker in the nosql database?

~~~
ketralnis
What does this mean?

~~~
mappu
It's nonsensical buzzwords.

I guess the poster's underlying question is "will this database become hyped
as the Next Big Thing"

