
Amazon Launches Managed Cassandra Service - jedberg
https://aws.amazon.com/blogs/aws/new-amazon-managed-apache-cassandra-service-mcs/
======
thekozmo
Kudos to AWS for the ability to launch so many services, many of them
competing with each other or complementary.

At first I (disclosure, ScyllaDB co-founder) called their serverless a bluff
but I gave it a try and it's nice to create a table without waiting for any
server. That said, I know personally that Dynamo scales really slow so they
wouldn't catch up on speed. It also says single-digit ms latency but states a
plan to improve JVM overhead.

Another funny thing is (regardless of tech) that AWS folks have wonderful
people talking about open source but this very solution isn't open at all.
It's impossible to figure out what's Cassandra and what's Dynamo. How about a
diagram?

Lastly, it's pricey (for 1M iops you'll pay $3.5M/year!!) and doesn't have
counters, UDT, materialized views and plenty of other features.

If you got down here, hey, give ScyllaDB a try, it's OSS (AGPL), available as
a service on AWS, and has features that neither C* nor MCS has - workload
prioritization for instance. Dor

~~~
simonebrunozzi
> Dynamo scales really slow

Quite arrogant attitude, dude. DynamoDB is not slow in scaling, quite the
opposite. (disclosure: 6+ years at AWS, 2008-2014, and I know the main guy
that designed DynamoDB).

If you want your startup to succeed, I suggest your first step should be to
avoid criticizing the competition without substantiating it with hard
evidence.

Be humble. There's plenty of space for great solutions, you don't need to bash
the huge elephant in the room to get started.

~~~
LegitShady
> Quite arrogant attitude, dude

I find your lack of self awareness hilarious. You know the guy who designed
dynamodb! Wow. Could I get you to send me his autograph?

~~~
tapatio
LOLOLOL

------
rbranson
My guess is that this is on-demand DynamoDB wrapped with a Cassandra-
compatible frontend. Just based on the pricing and stated characteristics. It
would be really hard to provide on-demand capacity for compute and storage
using anything remotely similar to off-the-shelf Cassandra. The pricing looks
like DynamoDB on-demand + a bit extra to cover the cost of operating the
frontend.

EDIT: confirmed by an AWS employee here
[https://twitter.com/_msw_/status/1201924979647905792](https://twitter.com/_msw_/status/1201924979647905792)

~~~
_msw_
Disclosure: I work for AWS

Off-the-shelf Cassandra has a fair amount of flexibility in its architecture.
Open Source Cassandra is the thing that is in the front end. And the team is
excited about the idea of collaborating with the development community around
some of the generally useful abstractions needed to build this managed
database experience. More at
[https://aws.amazon.com/blogs/opensource/contributing-
cassand...](https://aws.amazon.com/blogs/opensource/contributing-cassandra-
community/)

~~~
sg47
Has Amazon ever contributed anything meaningful back to open source?

~~~
manigandham
Quite a lot:
[https://aws.amazon.com/opensource/](https://aws.amazon.com/opensource/)

~~~
sg47
Let me give you an example.

"Amazon EMR has been adding Spark runtime improvements since EMR 5.24, and
discussed them in Optimizing Spark Performance. EMR 5.28 features several new
improvements."

Have these improvements been contributed back to Spark? When I take a look at
the improvements themselves, it looks like all Amazon did was upgrade Spark
from 2.3 to 2.4.

~~~
manigandham
This seems completely random. EMR isn't an open-source project, it's a
proprietary offering.

The page I linked lists plenty of projects if you're looking for actual OSS
work.

~~~
sg47
EMR isn't open source but Spark is. What does the EMR Spark Runtime do if not
offer Spark as a service? And the changes to optimize the Spark runtime, why
were they not contributed back to upstream Spark?

~~~
manigandham
I'm sure some changes are upstreamed (assuming they're even accepted) but
there's no requirement for them to do so.

Again this seems like a random example. What is so important about this
particular change over all the other open-source contributions?

~~~
sg47
This is just an example. I'm sure there are many others. The developers that
take the time to contribute to Spark are making Spark a better product. Amazon
is not making it better. Amazon should not claim they made improvements to
Spark in a newer version. What they did was upgrade Spark to 2.4 and claim
that the improvements were done by them whereas in reality they were done by
the community.

~~~
manigandham
Again this is all wild assumption. They're not required to contribute to free
software, but they do upstream changes.

The improvements they're claiming are for their own EMR product, not for Spark,
and they do make updates to software to run better on their own
infrastructure. That's what their customers care about.

You seem dedicated to believing that Amazon has done nothing for open-source
even though I pointed you to an exhaustive list of their contributions. At this
point, there's nothing more to say.

------
jedberg
At reddit, we used Cassandra, and it was a huge pain to manage. At Netflix we
used it, and had a whole team of engineers that built tools just to manage
Cassandra.

If this service had existed then, it would have made life so much easier!

~~~
takeda
I find it a bit weird that the company that created DynamoDB is now supporting
a data store that's supposed to be an open source DynamoDB. Does that mean
Cassandra is better than DynamoDB since people would still prefer to use
Cassandra in AWS?

~~~
notatoad
Hasn't this been amazon's MO the last couple years? Competition with their in-
house versions hasn't prevented AWS from building anything before.

It's not really an admission that one is "better" than the other, it's just an
admission that people like managed drop-in replacements for tools that they're
already using.

~~~
onlyrealcuzzo
As a side note, I'm interested why Facebook doesn't have a cloud offering yet.
I'd love to see more players in this space.

~~~
Nextgrid
Would you really want “sponsored” rows automatically appearing in your
Facebook-managed database?

~~~
PeterCorless
_golfclap.gif_

------
tschellenbach
For the first 3 years of Stream we used Cassandra. Afterwards we switched to a
custom RocksDB + Raft solution. (somewhat outdated stackshare interview:
[https://stackshare.io/stream/stream-and-go-news-feeds-for-
ov...](https://stackshare.io/stream/stream-and-go-news-feeds-for-
over-300-million-end-users))

The difference is massive. Cassandra was hard to manage and after many years
of our team using it still had random spikes. RocksDB+Raft has been extremely
solid, doesn't require any maintenance, load times are flat, zero spikes.

Cassandra was awesome, but it definitely has some issues. That's also why
companies like ScyllaDB see space in that market. I wonder if AWS's cassandra
implementation is better than regular cassandra.

~~~
techie128
Something awesome is coming up for Cassandra 4.0

~~~
bsg75
Can you add anything more specific? Something related to the storage engine?

------
pritambarhate
Master stroke! Especially the pricing and autoscaling. Till they brought
autoscaling and pay as you go pricing for DynamoDB, Google cloud data store
was a superior product (at least on paper) as you didn't need to think about
preprovisioning the capacity. It is supposed to just scale. They have brought
the same model to Cassandra. So no vendor lock in!

Also one of the reasons small projects stayed away from Cassandra is the
requirement of a hefty cluster to get decent performance out of it. That too
is taken care of by AWS, which makes Cassandra much more accessible.

Now the only thing that remains to be seen is how good this AWS product is.
Especially in their first year, many AWS services get very average reviews.

------
giaour
Excited to see a managed open source offering with the same pricing paradigm
as pay-per-request DynamoDB. I frequently encounter projects for which a
Dynamo-style storage layer would be a natural fit, but often use RDS Postgres
because I don't want to lock projects into the AWS ecosystem or sign teams up
for operating their own Cassandra or Riak clusters.

~~~
scarface74
So instead of using the best solution, you chose to use a non optimal solution
for fear of “lock in”? Were they not using any other AWS services? If not, why
pay more for a cloud provider than a colo and not get any of the benefits of
it?

~~~
giaour
I would disagree with that characterization and would instead say that
DynamoDB's lack of portability between vendors made it a non-optimal solution.
The services I work on need to survive specific vendors going out of business.

~~~
scarface74
What’s more likely to happen: your company going out of business, or AWS? Is
your company better capitalized than Amazon?

Would you also refuse to use Windows just in case Microsoft went out of
business?

~~~
stephenr
Amazon going out of business isn’t the only reason you shouldn’t be happy to
be tied to them.

If you want decent “high availability” and failover when shit hits the fan,
that means multi vendor. Who else offers dynamodb?

If amazon were to increase the price of dynamo db 10x would you still just
keep using it, because you’re already using it, and eat the 10x cost increase?

The amount of comments that essentially boil down to “amazon is not going
anywhere, you’re stupid for worrying about lock in” is both staggering and
depressing.

~~~
scarface74
You mean if multiple availability zones go down? In AWS’s entire existence,
have they been known to raise prices?

How much time, energy, and development effort are you willing to spend on
“avoiding vendor lock in” in the off chance that you will move your entire
infrastructure as opposed to spending those same resources creating either
revenue generating features or cost saving features?

If you’re using a cloud provider as a glorified overpriced colo, you have the
worst of all worlds. You’re spending more on resources and just as much
babysitting infrastructure.

It’s just like the bushy tailed “architects” who create layers of “factories”
and “repositories” just in case their CTO wakes up one day and decides to move
their company's six-figure Oracle installation to Postgres. All the while
creating suboptimal queries to avoid using Oracle specific functionality.

~~~
stephenr
So far most major AWS instances I’ve paid attention to have been ultimately
caused by their own Rube Goldberg inspired infra. Nothing at AWS just is
something, it practically all relies on something else at AWS, and when there
are outages at the apparently lowest level, the issues are wide spread.

With such convoluted systems, fat finger syndrome seems to be a not
insignificant factor in their downtime, and the interdependence just makes it
blow up.

But sure. If you want to trust everything to aws you go right ahead and do
that.

As for rising prices - I have no idea - no company has done anything until
they do it the first time. AWS doesn’t really need to increase prices to be a
more expensive solution for the vast majority of companies using it.

If you don’t rely on proprietary aws “solutions” in the first place, there’s
no extra “time and cost” involved. It’s just running your setup process -
whatever that may be - with another location, another vendor, whatever.

Like I said if you want real HA you’re going to be running in multiple vendors
all the time - it’s not something you’re going to say “well shit aws is down
again let’s go sign up for azure”.

You’re going to sit back and eat crumpets because your site is running fine in
spite of aws or azure or whoever’s latest brown pants event.

And to be clear I’m not suggesting using aws is a smart move over bare metal
or even just regular rented virtual machines at a normal facility - the
concept of HA across vendors applies the same.

~~~
scarface74
Yes because colos never have a problem with reliability and most companies
have better managed infrastructure than AWS/Azure/GCP.

How many companies need higher reliability than you get from any of the cloud
vendors if you architect your site to run across multiple AZs or multiple
regions?

And “running your setup” means duplicating functionality on VMs where you
could use managed offerings - the absolutely most expensive and least reliable
way of using cloud providers and it costs more in time and resources to
manage.

And you are ignoring how much money you can save by not needing as many
infrastructure people.

Heck, half the time you can get away with having a much cheaper shared
services/managed service provider.

Are all of the companies big and small who are using cloud vendors proprietary
solutions delusional?

~~~
stephenr
I see you're a graduate of the school of strawman tactics.

Of course traditional colo and rental VM hosting have outages. That's
literally why I said, multiple times, if you want actual HA, you need to be
using multiple vendors, regardless of what that vendor provides you - whether
it's bare racks or a web GUI to "push a button to make it go now". I didn't
explicitly state it, but I kind of assumed you'd realise that means different
vendors in different physical DCs/locations.

Complaining to me that using basic VMs in a "cloud" service is more expensive,
is like complaining to a duck that water is wet. No shit, EC2 is more
expensive than even a regular rented VPS/VM service from a more 'traditional'
hosting service, and much more expensive than either renting or owning
physical gear in a rack.

 _I_ didn't suggest you use EC2 or AWS at all - but just because you use self-
managed services doesn't mean you can't take advantage of the _one_ thing a
"cloud" service offers which traditional VM hosting doesn't: essentially
instant spin up and time usage billing.

You can split your workload across two or three cloud providers, with
resources allocated such that you have just slightly more than 100% of what
you need for regular operations. Then when (not if) one of those providers
has an outage, you increase the capacity at the other provider(s) to handle
the increased load.
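
As a rough sketch of the arithmetic behind that setup (the function names and the 5% headroom figure are illustrative assumptions, not anything a vendor prescribes), an even split looks like this:

```python
def per_provider_capacity(required: float, providers: int, headroom: float = 0.05) -> float:
    """Capacity each provider runs in normal operation, split evenly.

    `required` is the capacity needed for regular load (any unit, e.g. vCPUs).
    `headroom` is the "slightly more than 100%" margin; 5% is an assumed value.
    """
    if providers < 2:
        raise ValueError("multi-vendor setup needs at least two providers")
    return required * (1.0 + headroom) / providers


def capacity_after_outage(required: float, providers: int, failed: int = 1,
                          headroom: float = 0.05) -> float:
    """Capacity each surviving provider must scale to when `failed` providers are down."""
    survivors = providers - failed
    if survivors < 1:
        raise ValueError("no surviving providers")
    return required * (1.0 + headroom) / survivors


if __name__ == "__main__":
    need = 300.0  # hypothetical: 300 units of compute for normal load
    print(per_provider_capacity(need, 3))   # steady-state share per provider
    print(capacity_after_outage(need, 3))   # share per survivor after one outage
```

With three providers, each survivor has to grow by half during an outage, which is where the instant spin-up and time-based billing matter.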

I'm not even going to dignify the "we don't need infra people" comment because
it's not even a bad joke any more, it's more like a warning of management who
have no fucking idea what is involved.

I don't know what motivations each company has when deciding what technologies
they should use. But if you're suggesting that companies don't ever make bad
choices because of (a) uninformed/misinformed management decisions, or (b)
short sightedness, I'll kindly suggest you're either being very sarcastic, or
you're very naive.

~~~
scarface74
_I'm not even going to dignify the "we don't need infra people" comment
because it's not even a bad joke any more, it's more like a warning of
management who have no fucking idea what is involved._

I didn’t say that you didn’t need _any_ I said that you didn’t need “as many”.
But yes, at smaller companies you can get away with no dedicated
infrastructure people and just use a managed service provider. At a slightly
larger company you can get away with a few people on site that manage your
MSP.

So you want HA by running in multiple DCs - Exactly what happens when you run
in multiple AZs and/or regions.

_But if you're suggesting that companies don't ever make bad choices because
of (a) uninformed/misinformed management decisions, or (b) short sightedness,
I'll kindly suggest you're either being very sarcastic, or you're very naive._

So you think, Netflix for instance, who started off running all of their own
servers and are now one of AWS's biggest customers, were being “naive”? Instead of
thinking that all of these companies are being irrational - including major
enterprises - by using cloud providers and their proprietary servers, maybe
they know something that you don’t know?

~~~
stephenr
Smaller companies don't necessarily need dedicated infra people regardless. My
point is that using "a cloud" doesn't change your level of infrastructure
experience/knowledge needs, it just changes what they need to know.

.... You're either not reading what I wrote or being deliberately obtuse. I
said multiple vendors, in different DCs. The same vendor in two DCs is not as
good as two different vendors in two different DCs.

I didn't say the companies are naive. I said _you_ are being naive, if you
think companies haven't made bad decisions.

~~~
scarface74
_My point is that using "a cloud" doesn't change your level of infrastructure
experience/knowledge needs, it just changes what they need to know._

It very much does change what they need to know. You don’t need to know how to
set up a database with multi region failover, load balancers, server
maintenance, switches, routers, storage, firewalls, etc. Have you ever used
managed services at any scale?

Yes I’ve done both - hosted our own servers in house.

~~~
stephenr
A good chunk of my work is getting clients out of shit situations with "The
cloud" because someone drank too much of the "Cloud means no more ops" Kool
aid.

For most small to medium companies, the alternative to a managed AWS service
is not "lets go buy some switches".

It's "let's use open source software on rented virtual machines". The "cloud"
model is only useful if your staff have no idea how a database server works.
If they _do_, it's going to make a heap of basic tasks harder (and more
expensive) because you don't have access to the software itself.

I'm done discussing this with you. You can make all the same arguments
everyone else does when trying to justify "the cloud", and you won't convince
me, because your arguments are, as usual for this type of "discussion"
comparing against the most extreme alternatives.

Right from the start you've declared literally no cost of being at the
complete mercy of a single vendor for your entire infrastructure (and one with
a history for dirty tactics to "win" a market)

If that approach works for you, good for you. I, and my clients once it's
bitten them, aren't willing to do that.

~~~
scarface74
And your method of getting them out of “shit situations” is not by showing
them how to do it correctly - it’s by moving them to something you know.

So now, the same people who don’t have the expertise to manage a colo are
going to all of a sudden have the expertise to manage VMs and open source
alternatives and know how to manage a fault tolerant multi region database and
other HA setups at multiple colos?

Again, how much experience do you personally have with actually using cloud
services from the big three? I’ve done both, I had to. The cloud vendors
didn’t exist when I started. Heck we had a “server room” with raised floors
for our “massive” 2TB SAN.

------
jedberg
Poor Datastax. Another company killed by AWS.

~~~
_msw_
Disclosure: I work for AWS, but this is my personal opinion.

I don't know of any company "killed" by AWS, and I don't think that Datastax
will be either.

~~~
jedberg
Heh you don’t know about them because they don’t exist _anymore_.

I know at least three founders who had to shut down after AWS launched a
feature at re:invent.

One went on to start another company that was acquired by Facebook in an 8
figure deal.

~~~
_msw_
Were they in angel / seed rounds? Which AWS features?

------
enitihas
This seems to be very good for people who don't want to be locked in to DDB.
The prices are very similar to DynamoDB(slightly higher), and the model is pay
as you go.

Prices:

Managed Cassandra:

Write request units: $1.45 per million write request units

Read request units: $0.29 per million read request units

Storage: $0.30 per GB-month

DynamoDB On-Demand:

Write request units: $1.25 per million write request units

Read request units: $0.25 per million read request units

Storage: $0.25 per GB-month
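
To make "slightly higher" concrete, here is a small sketch that computes the per-item premium from the prices quoted above and the monthly bill for a made-up workload. The workload numbers and function names are assumptions for illustration only, and real bills depend on item sizes and other factors not modeled here.

```python
# Prices as quoted in this comment (USD).
MCS = {"write_per_million": 1.45, "read_per_million": 0.29, "storage_gb_month": 0.30}
DDB = {"write_per_million": 1.25, "read_per_million": 0.25, "storage_gb_month": 0.25}


def premium_percent(mcs_price: float, ddb_price: float) -> float:
    """How much more MCS costs than DynamoDB on-demand for one line item, in percent."""
    return (mcs_price / ddb_price - 1.0) * 100.0


def monthly_cost(prices: dict, writes_millions: float, reads_millions: float,
                 storage_gb: float) -> float:
    """Monthly cost for a workload given in millions of request units and GB stored."""
    return (prices["write_per_million"] * writes_millions
            + prices["read_per_million"] * reads_millions
            + prices["storage_gb_month"] * storage_gb)


if __name__ == "__main__":
    for item in MCS:
        print(item, round(premium_percent(MCS[item], DDB[item]), 1), "% more on MCS")
    # Hypothetical workload: 100M writes, 500M reads, 50 GB stored per month.
    print("MCS:", monthly_cost(MCS, 100, 500, 50))
    print("DDB:", monthly_cost(DDB, 100, 500, 50))
```

On these list prices the premium works out to about 16% on requests and 20% on storage.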

~~~
paragraft
Tangentially: I don't understand why they have storage costs so high for DDB
still (hasn't changed since 2013) when Aurora's charging $0.10 GB-month (with
6 way redundancy at that, vs DDB's 3x)...

------
AtlasBarfed
So many questions

\- per-cell timestamps (this is IMPORTANT for online data migration with no
downtime)?

\- can you choose compaction strategy?

\- access to sstables for rapid data loads/custom backups?

\- triggers?

\- UDT?

\- how will upgrades work?

\- are they using rocksandra or other techniques?

\- how about a scylladb option?

------
heipei
They could have offered a managed ScyllaDB service and saved themselves and
their customers a lot of money while staying CQL / Cassandra compatible. Or
what am I missing here?

~~~
pritambarhate
Scylla is licensed under AGPL. Most of the cloud services companies try to
stay away from AGPL.

However, I think the real reason is not even that. I suspect underneath it
uses a modified version of DynamoDB and it's just wire compatible with
Cassandra. That's how they are offering pay as you go pricing and auto-
scaling.

Considering the kind of resources Apache Cassandra requires to run, I don't
think they can offer this kind of pricing. This is the same company that
charges $0.2 per hour to run a K8s cluster.

~~~
_msw_
Disclosure: I work for AWS

The new Amazon Managed Apache Cassandra Service does use the open source
Apache Cassandra code. The team is excited about working with the development
community on Cassandra. See more at
[https://aws.amazon.com/blogs/opensource/contributing-
cassand...](https://aws.amazon.com/blogs/opensource/contributing-cassandra-
community/)

~~~
AtlasBarfed
Did you guys swap out the storage guts like Rocksandra did?

Can we access the sstables of our tables?

Can we do triggers and UDF?

Can we access the timestamps at a CELL LEVEL like cassandra proper (VERY
important for online/downtimeless data migration)
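
For anyone wondering why cell-level timestamps matter here: in Apache Cassandra each column value carries its own write timestamp (readable with `WRITETIME(column)` and settable with `USING TIMESTAMP` in CQL), and the newest timestamp wins per cell. A simplified sketch of that merge, with hypothetical type and function names and with tie-breaking reduced to "keep the existing cell", might look like:

```python
from typing import Dict, Tuple

# A cell is (value, write timestamp in microseconds); a row maps column -> cell.
Cell = Tuple[object, int]
Row = Dict[str, Cell]


def merge_rows(existing: Row, incoming: Row) -> Row:
    """Last-write-wins merge, decided independently for every column.

    Simplification: on equal timestamps the existing cell is kept; real
    Cassandra applies additional tie-breaking rules not modeled here.
    """
    merged = dict(existing)
    for column, (value, ts) in incoming.items():
        if column not in merged or ts > merged[column][1]:
            merged[column] = (value, ts)
    return merged


if __name__ == "__main__":
    # Backfilled copy from the old cluster vs. live dual-writes on the new one.
    backfill = {"name": ("alice", 100), "email": ("a@old.example", 300)}
    live = {"name": ("alicia", 200), "email": ("a@new.example", 250)}
    print(merge_rows(backfill, live))
```

Here the newer `name` comes from one side and the newer `email` from the other, which a single row-level timestamp could not express. That is why a backfill replayed with original timestamps can run alongside live writes without clobbering them.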

------
rjurney
If AWS runs everyone's open source for them as a metered service and people
don't run their own software... then they can't edit that software and make
contributions back into open source. The open source model breaks, no new
projects except from the likes of Amazon.

AWS would have to step up open source contributions to match users' and they
in fact do the opposite: they are extreme leeches compared to Google and
Microsoft.

------
antoncohen
One of the killer features of Cassandra is cross-region replication with
location aware consistency for queries. It looks like in this preview of
Managed Cassandra Service they are only supporting single-region clusters. I
do hope they support multi-region clusters in the future.

It would also be nice if they documented any differences from Apache
Cassandra, like if Amazon MCS improves secondary indexes so they can be used
in more cases.

~~~
AtlasBarfed
I personally don't trust dynamodb's mat views/secondary indexes since they
never explain how they maintain index coherency behind their black box.

If they do have better tech for that, I also hope they can solve the mat view
problems in cassandra 3.x

------
didip
It's about freakin time. Dynamo's lack of query language has always been a
pain point for me. Congrats AWS team!

------
PeterCorless
ScyllaDB's take on the new announcement:
[https://www.scylladb.com/2019/12/04/managed-cassandra-on-
aws...](https://www.scylladb.com/2019/12/04/managed-cassandra-on-aws-our-
take/)

------
vroyer
I'm the author of Elassandra (Elasticsearch plugin for Apache Cassandra)
currently under Apache 2. I think i'm going to release the next major version
of Elassandra under AGPL like ScyllaDB...

------
adzm
Is there any reason to actually use Cassandra instead of Scylla?

~~~
jjirsa
Biased as a cassandra committer, but:

\- Feature sets aren't the same yet

\- History / maturity

\- License

\- Actual savings don't tend to match proposed savings

\- Development isn't driven by a startup that AWS can kill with an
announcement like this.

------
xref
I’m curious if this plays out like their MongoDB debacle where Amazon burned
all their goodwill, only extracting value from the community and returning
nothing

~~~
cnlwsu
The contributors for Cassandra are from a group of companies that use it, not
a single company that is trying to be profitable from it. I think this is a
positive for the community, even if they don't contribute back since it would
provide an easy getting started platform.

------
rjurney
Amazon taking from open source without giving back.

~~~
alexbilbie
[https://aws.amazon.com/blogs/opensource/contributing-
cassand...](https://aws.amazon.com/blogs/opensource/contributing-cassandra-
community/)

~~~
rjurney
Right. Total bullshit PR you're supposed to be smarter than.

------
tus88
Cassandra hasn't changed their license like MongoDB yet?

~~~
jjirsa
Cassandra is an Apache project and will always have an Apache license.

It’s not a startup-driven product that is worried about AWS as a competitor.

------
alexnewman
I'd bet money this is actually dynamodb and will be insanely expensive.

~~~
giaour
Pricing is available at
[https://aws.amazon.com/mcs/pricing/](https://aws.amazon.com/mcs/pricing/)

MCS has a very similar pricing model to on-demand mode DynamoDB
([https://aws.amazon.com/dynamodb/pricing/on-
demand/](https://aws.amazon.com/dynamodb/pricing/on-demand/)) but is ~15% more
expensive on all line items.

~~~
alexnewman
Wow super psyched I got triple downvoted when I dared to say something mean
against one of the great deathstar companies. I've noticed if I ever dare to
say anything mean against amazon or google I get instantly downvoted. It is
more reliable than a bot.

Anyway, dynamodb is insanely expensive compared to postgresql and I'd
recommend only using it when you need to expose a datastore directly to the
user as its attribute based access control is easy to use. Postgresql
serverless is now a thing as well. Anyway, if you truly want a WAN datastore,
exposed to web clients, with attribute based access control, run dynamo.

