
ScyllaDB Closes $16M in Series B Funding - bsg75
http://www.scylladb.com/press-release/scylladb-closes-16-million-series-b-funding-expand-next-generation-cassandra-database
======
hendzen
If I were DataStax, I would be scared of Scylla. They have momentum, and they
are one of the best engineering teams around outside of the top teams at
Google/FB etc.

~~~
threeseed
I wouldn't.

Enterprise companies care about performance, sure. But far, far less than they
care about being on a supported platform. DataStax has done the hard yards
over the years to prove themselves capable of supporting Cassandra. ScyllaDB
has no street cred at all.

So ScyllaDB might make some inroads with performance-critical startups and web
companies. But that is likely to be about it.

~~~
hendzen
Enterprise companies also care about OPEX, i.e. if they have an internal SLA
that can be met with either a 100-node C* cluster or 10 Scylla nodes, that
will matter quite a bit.

~~~
threeseed
Enterprise companies spend millions on database projects, and the majority of
the cost comes from Professional Services and consultancies engaged in data
migration etc. So the cost of the product itself is never the driving factor,
especially when it will already be so substantially less than Teradata or
Oracle.

And the most important part of a database is confidence. You need to be able
to trust that when you have a Production outage, IT is able to talk to someone
who can fix it. DataStax has proven capable of meeting that task. It's very
difficult for a second-tier player to do the same, especially with Cassandra
being such a niche product.

~~~
hendzen
I used to work for a database company and have been involved in the purchasing
side of database support contracts. Professional Services/Consulting is a
small revenue line (and TCO component) relative to support licensing. It also
has considerably lower margins.

Scylla is an upstart... but I would not call them a second tier player.
DataStax is already trying and failing [0, 1] to incorporate ideas from Scylla
into their product.

Yes, support and SLAs matter. And this round of funding will go a long way
toward helping Scylla build out those parts of their business.

[0] -
[https://issues.apache.org/jira/browse/CASSANDRA-10989](https://issues.apache.org/jira/browse/CASSANDRA-10989)

[1] -
[https://issues.apache.org/jira/browse/CASSANDRA-8520](https://issues.apache.org/jira/browse/CASSANDRA-8520)

------
mwcampbell
Looks like this is the same team that brought us OSv
([http://osv.io/](http://osv.io/)). So is OSv abandoned now?

~~~
thekozmo
It's not abandoned. We continue to maintain and contribute to it, but sadly
it's not our business focus at present. I really want to see it take center
stage one day in the future.

A few orgs do use it in production. It's really cool, but for it to succeed
the cloud needs to change too. For instance, we wanted cloud vendors to offer
a 1-minute-granularity pricing scheme so it would resemble a container (just
with hardware-based security). This and many other things need to happen for
it.

------
TheGuyWhoCodes
Congratulations to the team!

I would love to move over from Apache Cassandra to Scylla but honestly I'm a
bit afraid to do that. I have no doubt that it's much faster but I haven't
seen hard numbers about consistency and availability. Apache Cassandra is a
much older project with many installations and is battle tested (to a
degree). How can I be sure that Scylla will be as stable as Cassandra in that
regard?

~~~
tirus
We use 1.6 in production on AWS i3.16xlarges and it's great. We've seen a
reduction to less than half the nodes and over 100k requests/sec per node (up
from 4k req/sec for Cassandra). Very few issues, and those that come up get
immediate attention from core developers that actually wrote the code you're
having issues with.

Absolutely 100% recommend.
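A rough way to see why a big jump in per-node throughput need not cut node count by the same factor: a cluster has to be sized for whichever resource runs out first. The sketch below is a hypothetical capacity model (the function name and every number in the usage are illustrative assumptions, not figures from this deployment), and it ignores replication factor and headroom.

```python
import math


def nodes_needed(peak_ops_per_sec: float, ops_per_node: float,
                 data_tb: float, tb_per_node: float) -> int:
    """Nodes required so that neither per-node throughput nor per-node storage is exceeded."""
    by_throughput = math.ceil(peak_ops_per_sec / ops_per_node)
    by_storage = math.ceil(data_tb / tb_per_node)
    # The binding constraint decides the cluster size.
    return max(by_throughput, by_storage)
```

With a hypothetical 400k ops/s peak and 50 TB kept at 1 TB per node, 4k ops/s per node needs 100 nodes, while 100k ops/s per node still needs 50, because storage becomes the limit.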

------
jdoliner
Does anyone else find it a bit weird that a post on ScyllaDB's blog announcing
their fundraising starts with: "ScyllaDB announced today that it..."? It seems
weirdly self-referential to me. They're definitely not the only ones to do
this, though.

Anyways, congrats on the funding guys, certainly not trying to cast shade.

~~~
tekacs
This is the standard format for a press release (note the URL of the article -
scylladb.com/press-release/...)

[https://en.wikipedia.org/wiki/Press_release](https://en.wikipedia.org/wiki/Press_release)

~~~
jdoliner
I guess that makes some sense. It's still pretty weird to me, though. The big
difference with a press release is that the press is writing about an
announcement that the company made, so it makes sense to describe the company
in the third person. If the company is writing for itself, it should just
announce what it wants to announce rather than announce its announcement.

~~~
manigandham
A press release _is_ written by the company, then released for the press to
further distribute and write about if necessary.

------
agentgt
You know if Postgres can just get some better distributed/elastic support it
could do some damage.

I switched from Cassandra to citus + pipelinedb (and I'm a JVM guy). Postgres
is such an awesome platform. I'm planning on looking into some logical
decoding.

~~~
manigandham
Postgres is a completely different type of database, there's not much
comparison with cassandra/scylla at all.

It also has a way to go with just scale-up performance before it even gets to
scale-out.

~~~
agentgt
I wanted a db for realtime analytics.

I agree on scale-out, but on scale-up performance I will say Cassandra is not
even close to the cost-to-performance ratio of Postgres with extensions for
real-time analytics.

I have a legitimate 6 months of experience that I wasted on Druid/Cassandra,
and they could not match Postgres in terms of performance.

I don't want to hear about CAP this and that when I can do all sorts of stream
processing a priori. Besides, real SQL is easier to understand and hack with
than many proprietary query languages.

I only tell people this so they don't waste time like I did.

~~~
ericfrenkiel
I'm one of the founders at MemSQL, which is designed for real-time analytics.
Take it for a spin and see if it works for you.

~~~
agentgt
Thanks for the recommendation! I will check it out for sure!

~~~
manigandham
We use MemSQL. If you're using citus + pipelinedb, then MemSQL will likely
solve your problem with a single, better solution. It's not open-source and
uses the MySQL dialect instead of PostgreSQL, but it's definitely highly
recommended.

------
rosslazer
I don't get it. Is this just a faster Cassandra? What's their competitive
niche?

~~~
nemothekid
Their marketing points are easily understood and relatable for people who know
what a pain operating Cassandra can be. The two biggest pain points IMO are
read-repair and compaction, both of which must be run periodically and
consume a huge amount of resources. Read Repair is especially a pain because
(1) it must be run periodically or you risk losing data, (2) it takes forever
to complete in some deploys (e.g. you must run read-repair within a certain
(user-tunable) timeframe, the default of which is every 10 days; I have tables
that take _7 days_ to complete a read-repair, meaning I have repairs running
pretty much 24/7), and (3) there are few or no operational tools to manage
read repair. The low-tech way is to write a cron job on every node, and even
then there is no way to measure progress or detect whether a job failed or
completed without grepping logs. It's so bad that Spotify wrote an open-source
tool to manage it.
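The arithmetic behind repairs "running pretty much 24/7" can be checked with a small sketch. It assumes the 10-day window is Cassandra's default `gc_grace_seconds` (864000 seconds) and a hypothetical policy of running one repair at a time; `plan_sequential_repairs` and `RepairPlan` are illustrative names, not part of any Cassandra tool.

```python
from dataclasses import dataclass
from datetime import timedelta

# Cassandra's default gc_grace_seconds: 864000 s, i.e. 10 days.
GC_GRACE = timedelta(seconds=864_000)


@dataclass
class RepairPlan:
    total: timedelta  # time to finish every repair back to back
    slack: timedelta  # time left in the window (negative means it overruns)
    fits: bool


def plan_sequential_repairs(durations: list, window: timedelta = GC_GRACE) -> RepairPlan:
    """Assume a hypothetical one-repair-at-a-time policy and check it fits the window."""
    total = sum(durations, timedelta())
    return RepairPlan(total=total, slack=window - total, fits=total <= window)
```

A single 7-day repair leaves only 3 days of slack; adding a second 4-day repair overruns the window by a day.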

The solution has been to just buy more nodes (if you don't want long repairs,
store less than 1TB of data per node) and faster disks. Read Repair
maintenance is probably the _only_ thing I hate about Cassandra, and
benchmarks showing that Scylla does these operations on the order of minutes
rather than hours are attractive enough for most people (I don't think most
deploys even come close to the benchmarked txn/s in real-world workloads, for
either database). Both compaction and repair tend to be CPU intensive (both
work by essentially reading a ton of data), so I'd imagine the move to C++ and
the thread-per-core design is more efficient.

In short, the operational efficiency is far more attractive even if you aren't
pushing a trillion writes/sec.

I've been thinking about testing Scylla for a while, but unfortunately they
don't support all the features we use, and while our Cassandra deployment is a
comparatively large cost, there are enough things on my plate right now that
trading my current set of evils for unknown ones isn't very attractive.

See this post by Discord App
([https://blog.discordapp.com/how-discord-stores-billions-of-m...](https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7#.8ybap8c6o)),
where they mention moving to Scylla from Cassandra for similar reasons.
Performance is fine, but repair efficiency is more of the driving factor.

I'd also add that Cassandra advertises itself as a relatively high performance
database for distributed workloads. If something like a faster Cassandra
doesn't entice you, chances are you'd be better served by something like
Postgres anyway.

~~~
Nate75Sanders
What you keep calling "read repair" is actually just "repair" or the longer
phrase "anti-entropy repair". "Read repair" is something different.

------
redwood
Is there any change to the data model? I'm guessing there are still no
secondary indexes, for example.

~~~
manigandham
They're working on getting to feature parity with Cassandra by version 2.0.

[http://www.scylladb.com/technology/status/](http://www.scylladb.com/technology/status/)

~~~
jazoom
Unfortunately, that doesn't seem to give any indication of how long that might
take. I just saw a post from about a year ago in which one of their engineers
said secondary indices would only take about 2 weeks to implement.

~~~
penberg
Where did you see that kind of estimate?

We are currently working on first finishing materialized view support, which
will hopefully be completed in the upcoming months. Secondary indices will be
implemented after that and we're hoping to reuse MV infrastructure for that.
So I personally expect both features to land into a release later this year.

~~~
jazoom
Here:

[https://news.ycombinator.com/item?id=11523531](https://news.ycombinator.com/item?id=11523531)

~~~
penberg
Thanks for the pointer!

Please note that @glommer is talking about the classic secondary index
implementation in Cassandra, which is very simple but also broken. I don't
know the details but we probably did have "half-ready" code for that. We
decided against going forward with it because Cassandra had already moved to
SASI (which is also much more complex). As I said, we're currently focusing on
materialized views, and tackling secondary indices after that.

Btw, I highly recommend subscribing to our user mailing list or engaging on
Github for questions and comments about features. You'll get better and up-to-
date answers there.

Update: reading @glommer's reply carefully, he explicitly says that he's
unsure if we'll move forward with that specific implementation: "_if_ we do
implement it, it should land in our main version in a couple of weeks"
(emphasis mine).

~~~
glommer
update: we didn't move forward with that implementation.

Secondary indexes will be implemented on top of Materialized Views. Patches
for Materialized Views already exist, and are soon to appear in preview
releases.
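A conceptual sketch of what "secondary indexes on top of Materialized Views" can mean: the index is a second table keyed by the indexed column, kept in sync on every write, and a query by that column reads the view first and then the base table. This is an in-memory toy with a hypothetical class name, not Scylla's implementation; it ignores distribution, consistency, and concurrent writers.

```python
from collections import defaultdict


class TableWithIndexView:
    """Hypothetical base table plus a view keyed by one regular column."""

    def __init__(self, indexed_column: str) -> None:
        self.column = indexed_column
        self.base: dict = {}          # primary key -> row
        self.view = defaultdict(set)  # indexed value -> primary keys

    def upsert(self, pk, row: dict) -> None:
        old = self.base.get(pk)
        if old is not None and self.column in old:
            # Drop the stale view entry so the old value no longer finds this row.
            stale = self.view[old[self.column]]
            stale.discard(pk)
            if not stale:
                del self.view[old[self.column]]
        self.base[pk] = row
        if self.column in row:
            self.view[row[self.column]].add(pk)

    def query_by_index(self, value) -> list:
        # Step 1: find primary keys in the view; step 2: fetch rows from the base table.
        return [self.base[pk] for pk in sorted(self.view.get(value, ()))]
```

The stale-entry cleanup in `upsert` is the part that makes view maintenance harder than a plain write: every update to the indexed column touches two view entries.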

------
ram_rar
I like ScyllaDB, but why does it need to run only on XFS?

~~~
manigandham
The entire database is designed for performance with message-passing thread-
per-core async architecture. XFS is the only filesystem that has good async
support.

[http://www.scylladb.com/2016/02/09/qualifying-filesystems/](http://www.scylladb.com/2016/02/09/qualifying-filesystems/)

[http://www.scylladb.com/technology/architecture/](http://www.scylladb.com/technology/architecture/)
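A toy illustration of the shared-nothing, message-passing idea: each worker exclusively owns a slice of the keyspace, and other code never touches that data directly but sends the owner a message and waits on a future. This is only a sketch with hypothetical names (`Shard`, `ShardedStore`). Python threads share one interpreter lock, so it shows the ownership and routing model rather than the performance; Scylla's real engine is C++ built on Seastar, which pins one thread to each core.

```python
import queue
import threading
from concurrent.futures import Future


class Shard(threading.Thread):
    """A worker that exclusively owns one partition of the data (`self.data`)."""

    def __init__(self) -> None:
        super().__init__(daemon=True)
        self.inbox: queue.Queue = queue.Queue()
        self.data: dict = {}  # only this worker's thread reads or writes it

    def run(self) -> None:
        while True:
            op, key, value, fut = self.inbox.get()
            if op == "put":
                self.data[key] = value
                fut.set_result(None)
            elif op == "get":
                fut.set_result(self.data.get(key))
            elif op == "stop":
                fut.set_result(None)
                return


class ShardedStore:
    """Routes each key to its owning shard by hash and talks to it only via messages."""

    def __init__(self, n_shards: int) -> None:
        self.shards = [Shard() for _ in range(n_shards)]
        for shard in self.shards:
            shard.start()

    def _send(self, shard: Shard, op: str, key=None, value=None) -> Future:
        fut: Future = Future()
        shard.inbox.put((op, key, value, fut))
        return fut

    def _owner(self, key) -> Shard:
        return self.shards[hash(key) % len(self.shards)]

    def put(self, key, value) -> Future:
        return self._send(self._owner(key), "put", key, value)

    def get(self, key) -> Future:
        return self._send(self._owner(key), "get", key)

    def close(self) -> None:
        for shard in self.shards:
            self._send(shard, "stop").result()
            shard.join()
```

Because only the owning worker mutates its dict, no locks guard the data itself; the queues are the only synchronization points.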

~~~
penberg
Yes, exactly. Scylla needs proper AIO/DIO support at the filesystem level and
XFS has so far been the one that implements it best. There are fixes to Scylla
to make it also run well on ext4 (by working around some of its limitations),
but that's not part of any release yet.
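Direct I/O is one reason the filesystem matters: with `O_DIRECT` the kernel bypasses the page cache, and the buffer address, length, and file offset generally have to be aligned to the device's block size. The sketch below is Linux-oriented; the 4096-byte block size and the helper name are assumptions. It writes one aligned block and falls back to buffered I/O where the filesystem rejects `O_DIRECT` (older tmpfs, for example). It covers only DIO; the asynchronous submission side (Linux AIO) is not in Python's standard library and is not shown.

```python
import mmap
import os

BLOCK = 4096  # assumed block size; real code should query the device/filesystem


def write_block_direct(path: str, payload: bytes) -> bool:
    """Write `payload`, zero-padded to one block, trying O_DIRECT first.

    Returns True if the write went through O_DIRECT, False if it fell back
    to ordinary buffered I/O.
    """
    if len(payload) > BLOCK:
        raise ValueError("payload larger than one block")
    # Anonymous mmap memory is page-aligned, which satisfies O_DIRECT's buffer alignment.
    buf = mmap.mmap(-1, BLOCK)
    try:
        buf.write(payload.ljust(BLOCK, b"\0"))
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        direct = getattr(os, "O_DIRECT", 0)  # Linux-specific flag
        if direct:
            try:
                fd = os.open(path, flags | direct, 0o644)
                try:
                    os.write(fd, buf)
                    return True
                finally:
                    os.close(fd)
            except OSError:
                pass  # e.g. EINVAL where the filesystem rejects O_DIRECT
        fd = os.open(path, flags, 0o644)
        try:
            os.write(fd, buf)
        finally:
            os.close(fd)
        return False
    finally:
        buf.close()
```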

------
crudbug
Congratulations to the team.

Any word on native JSON types? That would give people another reason to drop
Cassandra for Scylla.

~~~
penberg
JSON types are actually not a very frequently requested feature, so we have
not prioritized them very highly. I suspect that's because most people just
store their JSON data as text and do the processing in their application.

There's an open issue about it on Github:

[https://github.com/scylladb/scylla/issues/2058](https://github.com/scylladb/scylla/issues/2058)

Please feel free to upvote and comment on the issue to voice your interest in
the feature.

~~~
manigandham
Some clarity - that issue is about interfacing with a row within a CQL table
as a piece of json, not actually storing a schemaless json document (like in a
special postgres-style json datatype).
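The distinction can be sketched in a few lines: mapping a JSON object onto a table's fixed, typed columns (the row-as-JSON interface) versus the common workaround of keeping the whole document in one text column and parsing it in the application. The schema and function names are hypothetical, and no database driver is involved.

```python
import json

# Hypothetical schema for a table: column name -> Python type used to coerce values.
SCHEMA = {"id": int, "name": str, "score": float}


def row_from_json(doc: str, schema: dict = SCHEMA) -> dict:
    """Map a JSON object onto fixed, typed columns; missing columns become None."""
    obj = json.loads(doc)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    unknown = set(obj) - set(schema)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    return {col: None if obj.get(col) is None else typ(obj[col])
            for col, typ in schema.items()}


def row_with_json_text(key: int, doc: str) -> dict:
    """Workaround: store the whole document as text; the application parses it."""
    json.loads(doc)  # validate only; the table never looks inside the text
    return {"id": key, "body": doc}
```

The first function rejects keys outside the schema, which is exactly why it is not a schemaless document store; the second accepts anything but leaves all querying to the application.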

Cassandra is actually schemaless but since the shift to CQL from Thrift, it's
unlikely that it'll go back to a schemaless model again.

In the meantime, the Keen.IO crew has a nice model for storing lots of
arbitrary JSON if that's something that's needed. It takes some work, but it's
a very clever strategy and they've made it work well.

------
notacoward
Business plan.

1. Let someone else solve the hard distributed-system problems.

2. Re-implement the local pieces for higher performance.

3. Profit!

~~~
geodel
No one stopped that someone from working on higher performance and profiting
themselves.

~~~
notacoward
Nobody stopped them; they were just busy doing other things that made the
effort worthwhile. It's worth keeping that in mind, to make sure that
competitive claims about performance don't drown out proper credit for the
true innovators.

Note that I'm not saying anything is _wrong_ here. Reimplementations of
existing ideas are a time-honored tradition, and often lead to their own
innovations. Linux was a reimplementation of UNIX, and seems to have been good
for a lot of people. Most web servers and browsers are reimplementations of
things that had existed previously. From compilers and databases to
filesystems and hypervisors, a lot of software we all rely on today -
especially in open source - is a reimplementation of something or other. I'm
pointing out an _opportunity_, not a flaw.

------
mrits
From their front page: "allow for perfect scale-up linear performance of up to
1,000,000 read/write operations per node."

What happens after 1M operations? The nodes catch on fire?

~~~
jazoom
I know nothing about database architecture but I'm gonna assume that the
individual operations take longer until you distribute the load across more
nodes.

~~~
glommer
Every system has a maximum capacity, and after that is reached, the requests
just see increased latency without increased throughput. At this point you
scale up (by replacing it with better hardware) or out (by adding more nodes).
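A textbook M/M/1 queue (a simplification swapped in here; a real node with many cores behaves more like a multi-server queue) shows the shape glommer describes: below capacity, throughput follows the offered load and mean time in system is 1/(mu - lambda); at capacity, throughput flattens and the queue, and therefore latency, grows without bound. The 1M ops/s service rate in the usage only mirrors the front-page figure.

```python
import math


def mm1(arrival_rate: float, service_rate: float) -> tuple:
    """Return (throughput, mean time in system in seconds) for an M/M/1 queue."""
    if arrival_rate < service_rate:
        # Stable: every request is served; latency is 1 / (mu - lambda).
        return arrival_rate, 1.0 / (service_rate - arrival_rate)
    # Saturated: throughput caps at the service rate and the queue grows forever.
    return service_rate, math.inf
```

At 500k ops/s against a 1M ops/s capacity, the mean time in system is 2 µs; at 900k it is 10 µs; past 1M, throughput stays at 1M and latency is unbounded.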

------
dominotw
Garbage collection has been a pain point for Java-based tech like Hadoop.
Interesting to see databases being written in Golang, which I imagine would
have the same issues.

~~~
Chyzwar
ScyllaDB is written in C++.

