Hacker News
Amazon Launches Managed Cassandra Service (amazon.com)
135 points by jedberg 2 days ago | 152 comments

Kudos to AWS for the ability to launch so many services, many of them competing with or complementary to each other.

At first I (disclosure: ScyllaDB co-founder) called their serverless a bluff, but I gave it a try and it's nice to create a table without waiting for any server. That said, I know personally that Dynamo scales really slow, so they wouldn't be able to keep up. It also claims single-digit ms latency but states a plan to improve JVM overhead.

Another funny thing is (regardless of tech) that AWS folks have wonderful people talking about open source but this very solution isn't open at all. It's impossible to figure out what's Cassandra and what's Dynamo. How about a diagram?

Lastly, it's pricey (for 1M iops you'll pay $3.5M/year!!) and doesn't have counters, UDT, materialized views and plenty of other features.

If you got down here, hey, give ScyllaDB a try, it's OSS (AGPL) and as a service on AWS and has features that neither C* nor MCS has - workload prioritization for instance. Dor

Just curious about AGPL. Do you feel that AGPL gives you a hedge against a cloud service deploying your project at scale for profit that the project would otherwise miss out on?

I ask this as someone who has been sitting on some code for 10 years for precisely this reason!

To some extent yes. It's not 100% protection. AGPL is just the right model, closes the loophole that GPL left. If you get to change the code, you must contribute the changes back. That's all

AGPL is viral as well. Many companies believe client libraries will also fall into AGPL scope, and therefore any apps using these clients will also have to be AGPL licensed.

So many companies ban AGPL completely.

A: The "one additional feature" - Section 2(d) reads as follows: "If the Program as you received it is intended to interact with users through a computer network and if, in the version you received, any user interacting with the Program was given the opportunity to request transmission to that user of the Program's complete source code, you must not remove that facility from your modified version of the Program or work based on the Program, and must offer an equivalent opportunity for all users interacting with your Program through a computer network to request immediate transmission by HTTP of the complete source code of your modified version or other derivative work."

"This enables the original author... to download the source and receive the benefit of any modifications to its original work."

So if a cloud vendor (or anyone) decided to take your software and use it, if they made modifications to it you are entitled to get access to those modifications. For many people this is a barrier to entry -- they want the ability to modify it however they want (which they can still do privately under AGPL). But if they offer it in public, they have to share their mods back to the community.

At least, that is my layperson's understanding of it. I am not an intellectual property lawyer and do not pretend to be.

Source: http://www.affero.org/oagf.html

The quoted language is from the long-extinct AGPLv1, which was published in the early 2000s and adopted by only a handful of projects.

Ah! Good catch. Like I said, I'm not a lawyer. :)

> Dynamo scales really slow

Quite arrogant attitude, dude. DynamoDB is not slow in scaling, quite the opposite. (disclosure: 6+ years at AWS, 2008-2014, and I know the main guy that designed DynamoDB).

If you want your startup to succeed, I suggest your first step should be to avoid criticizing the competition without substantiating it with hard evidence.

Be humble. There's plenty of space for great solutions, you don't need to bash the huge elephant in the room to get started.

> If you want your startup to succeed, I suggest your first step should be to avoid criticizing the competition without substantiating it with hard evidence.

From scylladb's site: Scylla Cloud vs DynamoDB benchmarks


> Quite arrogant attitude, dude

I find your lack of self awareness hilarious. You know the guy who designed dynamodb! Wow. Could I get you to send me his autograph?


ScyllaDB has already proven itself to be one of the fastest databases in the world, and far better than DynamoDB on scaling and efficiency.

AWS can launch so many services since they are not developing any of them and are just stealing the open source contributions of other companies/developers without contributing anything back.

AWS isn't stealing anything. They're hosting and running software for paying customers who want solutions to their problems, not licenses they have to run themselves. That's the whole point of managed services and the rise of cloud computing. Do you consider independent MSPs like Aiven and Instaclustr to be "stealing" too?

Vendors should compete by creating their own cloud offerings, and many are finally getting around to it.

They are disincentivizing (made up word) true open source development and using their monopoly to offer services that vendors cannot compete with easily.

I don't see how any of that is true. Customers want managed services, why aren't vendors meeting that demand?

For example, why hasn't Datastax rolled out a cloud offering on AWS yet? All this time and they have no solution. Even ScyllaDB just launched a minimal service this year. I would expect a db vendor to be able to wire up some APIs and at least get a basic deployment running instead of complaining about AWS offering it first.

Even when they do, there is a problem. They specialize in DB code. AWS specializes in infra... and can just take the DB code as well. That is a pretty huge competitive advantage (while the db company has to develop that second specialty). Even worse, there's the further monopoly advantage that companies can just roll it into their AWS bill instead of doing another vendor evaluation. The fact that Amazon has almost all the competitive advantages for offering someone else's code is what seems unfair.

(the one they don't have is the ability to offer super specialized support and modifications for their customers, which is where a lot of the database companies have made their money - on consulting)

The DB vendor who created the database is worse at running it than AWS? No, that isn't true.

If you don't want others using your source code then don't be open source or use a different license. There are thousands of closed proprietary products out there doing just fine, for example all of AWS's own managed services.

Right. But we want open source to be strong and Amazon is undermining it. You don't care, but we do.

That's a repeat of the same statement. Exactly how is it being undermined? Nothing is stopping you from working on open-source software.

> For example, why hasn't Datastax rolled out a cloud offering on AWS yet?

Umm... they do (https://www.datastax.com/platform/amazon-web-services). Instaclustr also has an offering. Which kind of highlights the problem.

Didn't see that before. Still a poor implementation compared to what's available. And I mentioned Instaclustr, so if you're fine with them offering Cassandra then AWS is no different.

My point is that there is a reality that AWS can offer even a crummier, more expensive version than someone else, and it will still garner the $'s.

Sure, because that's what customers are fine with. Again the argument has nothing to do with open-source. Everyone has the same challenges when competing with a bigger company selling the same thing.

DataStax made a major deal with Google instead:


It's not a great implementation, the biggest advantage is consolidated billing. Unfortunately it's also ignoring the biggest cloud so they have no excuse against AWS launching this service.

Because they have limited resources and have to support existing customers? And they were doing the right thing by contributing everything back to the open source community and weren't expecting Amazon to offer a service?

That's called business competition. Should startups shut down because a bigger company might be selling the same thing?

There's no expectation of contribution with open-source software and the vast majority of users don't add anything. That's how it goes. If you don't want to give it away, then simply don't give it away.

DB vendors earn revenue by charging for features, support and services, just like AWS. There's nothing stopping them from selling proprietary closed-source products and I can name dozens that are doing well.

"There's no expectation of contribution with open-source software and the vast majority of users don't add anything. That's how it goes. If you don't want to give it away, then simply don't give it away." - Vendors are wising up to it now and changing their licenses. Amazon definitely took advantage of this before the days of AGPL. I agree that there's no expectation of contribution. As I said before, it just discourages open source in general because of what Amazon might do. My original comment was in response to someone asking how Amazon is able to launch these services quickly. My response was that the bulk of the work is being done by someone else. I guess that's not new. On the retail side, Amazon takes products that sell well and then offers an Amazon Basics version of the same product.

As did millions of other users and companies. If you give away source code then people will probably use it. Don't do it if you don't want that to happen.

If they're changing licenses now then good for them, but you could say they also took advantage of the free marketing from open-source and now put most of the development into enterprise versions. There's no right vs wrong side here.

There's a difference between people using it and making money off it. What Amazon is doing is putting a wrapper around other people's hard work and claiming it as their own.

Looks like the Amazon Mechanical Turk workers are out in full force to downvote.

Posts complaining about downvoting are boring.

I kept mine short so that people don't waste too much time.

My guess is that this is on-demand DynamoDB wrapped with a Cassandra-compatible frontend. Just based on the pricing and stated characteristics. It would be really hard to provide on-demand capacity for compute and storage using anything remotely similar to off-the-shelf Cassandra. The pricing looks like DynamoDB on-demand + a bit extra to cover the cost of operating the frontend.

EDIT: confirmed by an AWS employee here https://twitter.com/_msw_/status/1201924979647905792

Disclosure: I work for AWS

Off-the-shelf Cassandra has a fair amount of flexibility in its architecture. Open Source Cassandra is the thing that is in the front end. And the team is excited about the idea of collaborating with the development community around some of the generally useful abstractions needed to build this managed database experience. More at https://aws.amazon.com/blogs/opensource/contributing-cassand...

If you don't mind, can you please elaborate about the backend too.

Has Amazon ever contributed anything meaningful back to open source?

Disclosure: I work at AWS.

Here are some examples from my team’s 2019 work: We contributed numerous changes to containerd. We open sourced firecracker-containerd, and we also created a Go SDK that others are using to work with Firecracker. We contributed to Debian and the Debian kernel team. We contributed to Envoy. We collaborated with a number of communities, including Kata Containers, Red Hat’s Clair, and the Open Container Initiative. All of these examples are sustained investments, not one offs.

Sure, it's contributed where it's convenient. I still don't see a list of committers so that I can look at what those contributions were.

Let me give you an example.

"Amazon EMR has been adding Spark runtime improvements since EMR 5.24, and discussed them in Optimizing Spark Performance. EMR 5.28 features several new improvements."

Have these improvements been contributed back to Spark? When I take a look at the improvements themselves, it looks like all Amazon did was upgrade Spark from 2.3 to 2.4.

This seems completely random. EMR isn't an open-source project, it's a proprietary offering.

The page I linked lists plenty of projects if you're looking for actual OSS work.

EMR isn't open source but Spark is. What does the EMR Spark Runtime do if not offer Spark as a service? And the changes to optimize the Spark runtime, why were they not contributed back to upstream Spark?

I'm sure some changes are upstreamed (assuming they're even accepted) but there's no requirement for them to do so.

Again this seems like a random example. What is so important about this particular change over all the other open-source contributions?

This is just an example. I'm sure there are many others. The developers that take the time to contribute to Spark are making Spark a better product. Amazon is not making it better. Amazon should not claim they made improvements to Spark in a newer version. What they did was upgrade Spark to 2.4 and claim that the improvements were done by them whereas in reality they were done by the community.

Do you have a list of open source committees accessible somewhere?

What do you mean by committees?

Sorry, I meant committers. I'm asking since the link you posted talks about open source events and has some job postings. I was looking for actual contributions by Amazon employees.

There's far more listed there including entire projects, repos, videos and several blog posts detailing their contributions, including one describing what they're doing for Cassandra: https://aws.amazon.com/blogs/opensource/contributing-cassand...

Amazon, Microsoft, Google, Oracle, IBM and others are all major contributors to open-source software.

That's a remarkable level of openness for Amazon. When Kinesis launched we were explicitly told we couldn't say anything publicly about the implementation (I don't know if the NDA has expired since).

They are way more open now. Before, it was a strong advantage of Google that you could hear from their engineers. AWS seems to have definitely pivoted on that since then.

This explanation does make a lot of sense. But why not just add a Cassandra interface to DynamoDB the way they added a Postgres interface to Aurora RDS?

They didn’t add a Postgres front end to RDS. They took the Postgres open source code and (grossly simplified) changed the storage layer and added more AWS-specific features. Aurora/MySQL and Aurora/Postgres are more or less AWS-specific forks of the respective projects.

Aurora RDS is a meta product name though, you can't use it without specifying the engine, MySQL or PostgreSQL.

What you are saying would be more like adding a postgres frontend to mysql.

I think what the tweet is saying is the engine is actually Cassandra but the node management is shared with DynamoDB?

At reddit, we used Cassandra, and it was a huge pain to manage. At Netflix we used it, and had a whole team of engineers that built tools just to manage Cassandra.

If this service had existed then, it would have made life so much easier!

I find it a bit weird that the company that created DynamoDB is now supporting a data store that is supposed to be an open source DynamoDB. Does that mean Cassandra is better than DynamoDB since people would still prefer to use Cassandra in AWS?

Hasn't this been amazon's MO the last couple years? Competition with their in-house versions hasn't prevented AWS from building anything before.

It's not really an admission that one is "better" than the other, it's just an admission that people like managed drop-in replacements for tools that they're already using.

As a side note, I'm interested why Facebook doesn't have a cloud offering yet. I'd love to see more players in this space.

Would you really want “sponsored” rows automatically appearing in your Facebook-managed database?


I briefly read through those leaked FB files, I think they're on NBC News. There was an email and other documents that showed they were thinking about opening a cloud offering multiple times, but it seems they never got round to launching it in the end.

Naah, I just think it means that people will pay for a hosted Cassandra instance that wouldn’t migrate to DynamoDB.

Honestly having a SaaS and the hosting instances of your competing offerings is probably a good strategy.

Dynamodb the product != the Dynamo paper which Cassy is based off. I was using Cassy in 2011 before Amazon asked me to use big bird which was the code name for DynamoDB.

I struggle to think of any similarities, actually. I think DynamoDB's design makes it clear that the era of decentralized architecture/eventual consistency at Amazon has come and gone. DHTs just don't make sense in the data center.

Can you elaborate on this?

Do you mean that amazon is tending towards strong or externalizable consistency now?

I’ve always noticed AWS products tend towards eventual consistency, and Google Cloud offerings are almost all strongly consistent.

For example S3 is eventually consistent for read after write (if it’s not the first write). Google cloud storage is strongly consistent for read after all writes.

I think the goal is for S3 to "eventually" be strongly consistent. I heard one of S3's designers say that they shouldn't have opted for eventual consistency in retrospect.

Not sure what you mean, DDB provides both strong and eventual consistency, now in 2 fashions with transactions. 3 if you include cross region replication.

There's definitely still a DHT sitting under there.

No, it's something like a hash-partitioned primary-backup-replicated distributed database. S3 is much closer to a DHT.

There's no real difference between a DHT (distributed hash table) and a hash-partitioned distributed database. It's the same concept. Apache Cassandra is considered an implementation of a DHT.
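To make the terminology concrete, here is a minimal Python sketch of hash partitioning with replication, in the style of a consistent-hash ring. It is a toy under stated assumptions, not how DynamoDB, S3, or Cassandra is actually implemented; `HashRing`, `replicas`, the node names, and the choice of MD5 and 8 virtual nodes are all hypothetical.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring; every name here is hypothetical."""

    def __init__(self, nodes, vnodes=8):
        # Each node owns several points ("virtual nodes") on the ring.
        self._ring = sorted(
            (_token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key: str, n: int = 3):
        """Walk clockwise from the key's token, collecting up to n distinct nodes."""
        start = bisect.bisect_right(self._tokens, _token(key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners
```

Calling `HashRing(["a", "b", "c", "d"]).replicas("user:42")` returns three distinct node names, and the same key always maps to the same nodes. The key-to-node mapping is the part both descriptions share; the disagreement above is about how routing and replication are layered on top of it.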

DynamoDB is decentralized, same as S3; it's just abstracted from the user.

There are a lot of libraries and tools that interact with Cassandra. Having an API compatible service solves a lot of problems for companies that are already invested in Cassandra.

Only if it were completely compatible.

As they aren't, even new projects will often not use DynamoDB to avoid vendor lock-in

IMO no. Amazon just offers what people use.

I think Amazon realises that it is better for them to disrupt themselves, rather than someone else.

People use Cassandra. So Amazon is going to provide it.

It doesn't have to be one or the other. They are giving people a choice, and are getting paid either way.

Obligatory I work for Amazon and this is my personal opinion.

As a customer, it feels way better to have both options like this in AWS, and I'm sure DDB and managed Cassandra will always have pros and cons in standalone functionality and ecosystem integration.

I just left a company that used an in-house cloud object storage product for their cloud DVR service. It was an amazing piece of tech that was hamstrung by some of the most irritating "features" of Cassandra. We spent more time cleaning up than anything else. I am looking forward to seeing how people use this at scale.

Could you provide a bit more specifics on what was being cleaned up please? Thanks

For the first 3 years of Stream we used Cassandra. Afterwards we switched to a custom RocksDB + Raft solution. (somewhat outdated stackshare interview: https://stackshare.io/stream/stream-and-go-news-feeds-for-ov...)

The difference is massive. Cassandra was hard to manage and after many years of our team using it still had random spikes. RocksDB+Raft has been extremely solid, doesn't require any maintenance, load times are flat, zero spikes.

Cassandra was awesome, but it definitely has some issues. That's also why companies like ScyllaDB see space in that market. I wonder if AWS's cassandra implementation is better than regular cassandra.

> Cassandra was hard to manage and after many years of our team using it still had random spikes

A lot of that is getting better over time. A non-trivial percentage of contributions are coming from companies that are only looking to improve their operational story, not adding new features.

Most of the talks at NGCC (Next generation cassandra conference) focused on operational improvements - it's something we care a lot about (myself especially).

Those that don't want to roll their own RocksDB + Raft can consider using TiKV, which uses the RocksDB + Raft architecture. https://tikv.org/

Were your spikes all p999 impacts from GC?

I'm hoping rocksandra will solve a lot of that.

Something awesome is coming up for Cassandra 4.0

Can you add anything more specific? Something related to the storage engine?

Master stroke! Especially the pricing and autoscaling. Until they brought autoscaling and pay-as-you-go pricing to DynamoDB, Google Cloud Datastore was the superior product (at least on paper), as you didn't need to think about preprovisioning the capacity. It is supposed to just scale. They have brought the same model to Cassandra. So no vendor lock-in!

Also, one of the reasons small projects stayed away from Cassandra is the requirement of a hefty cluster to get decent performance out of it. That too is taken care of by AWS, which makes Cassandra much more accessible.

Now the only thing that remains to be seen is how good this AWS product is. Especially in the first year, many AWS services get very average reviews.

Excited to see a managed open source offering with the same pricing paradigm as pay-per-request DynamoDB. I frequently encounter projects for which a Dynamo-style storage layer would be a natural fit, but often use RDS Postgres because I don't want to lock projects into the AWS ecosystem or sign teams up for operating their own Cassandra or Riak clusters.

So instead of using the best solution, you chose to use a non optimal solution for fear of “lock in”? Were they not using any other AWS services? If not, why pay more for a cloud provider than a colo and not get any of the benefits of it?

I would disagree with that characterization and would instead say that DynamoDB's lack of portability between vendors made it a non-optimal solution. The services I work on need to survive specific vendors going out of business.

What’s more likely to happen: your company going out of business, or AWS? Is your company better capitalized than Amazon?

Would you also refuse to use Windows just in case Microsoft went out of business?

I'm a civil servant, so I look at these questions with a different mindset. I consider adopting a proprietary service to be a decision that entails a fair amount of risk, as I would have no easy recourse should the service become completely unavailable for reasons beyond my organization's control. This could happen for a number of reasons, including the vendor going out of business or raising prices beyond my authority to pay.

I might find that risk acceptable in some cases (e.g., most people who work at my agency use Windows on their workstations, some critical services are still running on z/OS, we use tools like Splunk and New Relic for monitoring), but it might not be worth it in other cases. This would get weighed against other forms of risk (such as the risk that a hand-rolled Cassandra cluster would have less availability than DynamoDB or the risk that you might spend more money than necessary using RDS instead of on-demand DynamoDB).

Let’s see how much the US government spends on some “proprietary services”....

Oracle: $3 billion.


Microsoft $7.6 Billion


Both AWS and Azure have separate gov regions. Civil servants have never been averse to spending money on proprietary solutions.

Yes, as mentioned above, we do use proprietary solutions where appropriate. However, all else being equal, I would prefer a managed but replicable service (e.g., MCS or Aurora MySQL) over a purely proprietary one, because deploying a replacement Cassandra or Vitess cluster is usually less costly than rewriting an application to use a different data store.

Amazon going out of business isn’t the only reason you shouldn’t be happy to be tied to them.

If you want decent “high availability” and failover when shit hits the fan, that means multi vendor. Who else offers dynamodb?

If amazon were to increase the price of dynamo db 10x would you still just keep using it, because you’re already using it, and eat the 10x cost increase?

The amount of comments that essentially boil down to “amazon is not going anywhere, you’re stupid for worrying about lock in” is both staggering and depressing.

You mean if multiple availability zones go down? In AWS’s entire existence, have they been known to raise prices?

How much time, energy, and development effort are you willing to spend on “avoiding vendor lock in” in the off chance that you will move your entire infrastructure as opposed to spending those same resources creating either revenue generating features or cost saving features?

If you’re using a cloud provider as a glorified overpriced colo, you have the worst of all worlds. You’re spending more on resources and just as much babysitting infrastructure.

It’s just like the bushy tailed “architects” who create layers of “factories” and “repositories” just in case their CTO wakes up one day and decides to move their company’s six figure Oracle installation to Postgres. All the while creating suboptimal queries to avoid using Oracle specific functionality.

So far most major AWS incidents I’ve paid attention to have been ultimately caused by their own Rube Goldberg inspired infra. Nothing at AWS just is something, it practically all relies on something else at AWS, and when there are outages at the apparently lowest level, the issues are widespread.

With such convoluted systems, fat finger syndrome seems to be a not insignificant factor in their downtime, and the interdependence just makes it blow up.

But sure. If you want to trust everything to aws you go right ahead and do that.

As for rising prices - I have no idea - no company has done anything until they do it the first time. AWS doesn’t really need to increase prices to be a more expensive solution for the vast majority of companies using it.

If you don’t rely on proprietary aws “solutions” in the first place, there’s no extra “time and cost” involved. It’s just running your setup process - whatever that may be - with another location, another vendor, whatever.

Like I said if you want real HA you’re going to be running in multiple vendors all the time - it’s not something you’re going to say “well shit aws is down again let’s go sign up for azure”.

You’re going to sit back and eat crumpets because your site is running fine in spite of aws or azure or whoever’s latest brown pants event.

And to be clear I’m not suggesting using aws is a smart move over bare metal or even just regular rented virtual machines at a normal facility - the concept of HA across vendors applies the same.

Yes because colos never have a problem with reliability and most companies have better managed infrastructure than AWS/Azure/GCP.

How many companies need higher reliability than you get from any of the cloud vendors if you architect your site to run across multiple AZs or multiple regions?

And “running your setup” means duplicating functionality on VMs where you could use managed offerings - the absolutely most expensive and least reliable way of using cloud providers and it costs more in time and resources to manage.

And you are ignoring how much money you can save by not needing as many infrastructure people.

Heck, half the time you can get away with having a much cheaper shared services/managed service provider.

Are all of the companies big and small who are using cloud vendors’ proprietary solutions delusional?

I see you're a graduate of the school of strawman tactics.

Of course traditional colo and rental VM hosting have outages. That's literally why I said, multiple times, if you want actual HA, you need to be using multiple vendors, regardless of what that vendor provides you - whether it's bare racks or a web GUI to "push a button to make it go now". I didn't explicitly state it, but I kind of assumed you'd realise that means different vendors in different physical DCs/locations.

Complaining to me that using basic VMs in a "cloud" service is more expensive, is like complaining to a duck that water is wet. No shit, EC2 is more expensive than even a regular rented VPS/VM service from a more 'traditional' hosting service, and much more expensive than either renting or owning physical gear in a rack.

I didn't suggest you use EC2 or AWS at all - but just because you use self-managed services doesn't mean you can't take advantage of the one thing a "cloud" service offers which traditional VM hosting doesn't: essentially instant spin up and time usage billing.

You can split your workload across two or three cloud providers, with resources sized so that you have just slightly more than 100% of what you need for regular operations. Then when (not if) one of those providers has an outage, you increase the capacity at the other provider(s) to handle the increased load.
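The capacity arithmetic in that plan can be sketched as follows. This is a back-of-the-envelope Python sketch, assuming load divides evenly across providers and that capacity is measured in some abstract unit; the function names, the provider names, and the 10% headroom default are hypothetical.

```python
def baseline_plan(required_units, providers, headroom=0.10):
    """Split slightly more than 100% of required capacity evenly across providers."""
    if not providers:
        raise ValueError("need at least one provider")
    per_provider = required_units * (1 + headroom) / len(providers)
    return {name: per_provider for name in providers}


def outage_plan(required_units, providers, failed, headroom=0.10):
    """Recompute targets for the surviving providers after one provider fails."""
    survivors = [name for name in providers if name != failed]
    return baseline_plan(required_units, survivors, headroom)
```

With three providers and 100 units of required capacity, each runs about 36.7 units day to day; if one fails, the other two scale to about 55 units each. The sketch says nothing about how quickly the survivors can actually add that capacity, which is the part that matters during a real outage.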

I'm not even going to dignify the "we don't need infra people" comment because it's not even a bad joke any more, it's more like a warning of management who have no fucking idea what is involved.

I don't know what motivations each company has when deciding what technologies they should use. But if you're suggesting that companies don't ever make bad choices because of (a) uninformed/misinformed management decisions, or (b) short sightedness, I'll kindly suggest you're either being very sarcastic, or you're very naive.

> I'm not even going to dignify the "we don't need infra people" comment because it's not even a bad joke any more, it's more like a warning of management who have no fucking idea what is involved.

I didn’t say that you didn’t need any; I said that you didn’t need “as many”. But yes, at smaller companies you can get away with no dedicated infrastructure people and just use a managed service provider. At a slightly larger company you can get away with a few people on site that manage your MSP.

So you want HA by running in multiple DCs - Exactly what happens when you run in multiple AZs and/or regions.

> But if you're suggesting that companies don't ever make bad choices because of (a) uninformed/misinformed management decisions, or (b) short sightedness, I'll kindly suggest you're either being very sarcastic, or you're very naive.

So you think Netflix, for instance, who started off running all of their own servers and are now AWS’s biggest customer, were being “naive”? Instead of thinking that all of these companies are being irrational - including major enterprises - by using cloud providers and their proprietary servers, maybe they know something that you don’t know?

Smaller companies don't necessarily need dedicated infra people regardless. My point is that using "a cloud" doesn't change your level of infrastructure experience/knowledge needs, it just changes what they need to know.

.... You're either not reading what I wrote or being deliberately obtuse. I said multiple vendors, in different DCs. The same vendor in two DCs is not as good as two different vendors in two different DCs.

I didn't say the companies are naive. I said you are being naive, if you think companies haven't made bad decisions.

> My point is that using "a cloud" doesn't change your level of infrastructure experience/knowledge needs, it just changes what they need to know.

It very much does change what they need to know. You don’t need to know how to set up a database with multi region failover, load balancers, server maintenance, switches, routers, storage, firewalls, etc. Have you ever used managed services at any scale?

Yes I’ve done both - hosted our own servers in house.

A good chunk of my work is getting clients out of shit situations with "The cloud" because someone drank too much of the "Cloud means no more ops" Kool aid.

For most small to medium companies, the alternative to a managed AWS service is not "let's go buy some switches".

It's "let's use open source software on rented virtual machines". The "cloud" model is only useful if your staff have no idea how a database server works. If they do, it's going to make a heap of basic tasks harder (and more expensive) because you don't have access to the software itself.

I'm done discussing this with you. You can make all the same arguments everyone else does when trying to justify "the cloud", and you won't convince me, because your arguments are, as usual for this type of "discussion", comparing against the most extreme alternatives.

Right from the start you've declared literally no cost of being at the complete mercy of a single vendor for your entire infrastructure (and one with a history of dirty tactics to "win" a market).

If that approach works for you, good for you. I, and my clients once it's bitten them, aren't willing to do that.

And your method of getting them out of “shit situations” is not by showing them how to do it correctly - it’s by moving them to something you know.

So now, the same people who don’t have the expertise to manage a colo are all of a sudden going to have the expertise to manage VMs and open source alternatives, and know how to manage a fault tolerant multi region database and other HA setups at multiple colos?

Again, how much experience do you personally have with actually using cloud services from the big three? I’ve done both, I had to. The cloud vendors didn’t exist when I started. Heck we had a “server room” with raised floors for our “massive” 2TB SAN.

How long have you been an employed programmer? On the scale of decades things can change dramatically.

In theory (based on the cost of an instance vs the cost to get your own) AWS is grossly overpriced, and SOMEONE will eventually beat them by a very large margin for commodity services.

AWS knows this, and that is probably what is behind all its custom features.

AWS is the new IBM.

Well, if the 74 in my username doesn’t give it away, quite a while. But just to give you a hint, my first professional contract in college was writing a Gopher site. My first hobby projects involved writing 65C02 assembly in the mid 80s.

As far as IBM, if someone had chosen to get “locked-in” to IBM in the 70s, they could still buy new hardware that could run their old software unmodified. Isn’t that an argument for using AWS?

In the scale of decades, most of the time you will be performing heavy rewrites anyway, are you really trying to optimize now just in case in 20 years you might want to move to something else?

Are you using the same language and frameworks you were using 20 years ago? When I first started developing I was writing C programs on DEC VAX and Stratus VOS mainframes.

If you are just using AWS to host a bunch of EC2 instances and as an overpriced colo, sure you could find much cheaper options now.

No matter what you do, more than likely you will have to migrate. Just changing infrastructure is going to be a heavy lift no matter how much you try to avoid lock-in, even if all you’re doing is moving to VMs on another provider or reconfiguring your network in a hybrid setup. Meanwhile you’re spending money now for an amorphous future where you may want to change vendors, instead of using your vendor of choice’s features that can save you money and/or time.

I would make the same argument if we were talking about Azure or God forbid Oracle.

AWS is like crack to developers, especially ones that used to be tied down to enterprise data centers and their six month procurement schedules.

So you get a lot of "addict reasoning".

If AWS runs everyone's open source for them as a metered service and people don't run their own software... then they can't edit that software and make contributions back into open source. The open source model breaks, no new projects except from the likes of Amazon.

AWS would have to step up open source contributions to match users' and they in fact do the opposite: they are extreme leeches compared to Google and Microsoft.

Poor Datastax. Another company killed by AWS.

We begged Datastax to provide a managed service in AWS. We used Datastax for consulting and for their repackaging of Cassandra with admin tools. After perpetual pain with Cassandra management, and little belief that Datastax would enter the market for managed services, we eventually decided to stop everything and rewrite to Dynamo.

Datastax was undeniably a beneficial force behind earlier cassandra, and I initially thought the Apache board pseudo-ejecting them from the project governance was foolish, but I do think they were starting to mess with cassandra a bit too much.

They can probably support their product line atop this just fine. They really make their money on integrating cassandra and a bunch of other big data frameworks in one neat little package.

I usually stay away from politics, but Apache Cassandra is doing just fine after DataStax was ejected. If you pay any attention to the current Cassandra development in the community, you’ll notice an increased focus on stability, testing and operational excellence.

Disclosure: I work for AWS, but this is my personal opinion.

I don't know of any company "killed" by AWS, and I don't think that Datastax will be either.

Heh you don’t know about them because they don’t exist anymore.

I know at least three founders who had to shut down after AWS launched a feature at re:invent.

One went on to start another company that was acquired by Facebook in an 8 figure deal.

Were they in angel / seed rounds? Which AWS features?

AWS contributes very little to open source and takes all the SaaS profits from the project founders.

This seems to be very good for people who don't want to be locked in to DDB. The prices are very similar to DynamoDB (slightly higher), and the model is pay as you go.

Prices:

Managed Cassandra:

Write request units: $1.45 per million write request units

Read request units: $0.29 per million read request units

Storage: $0.30 per GB-month

DynamoDB On-Demand:

Write request units: $1.25 per million write request units

Read request units: $0.25 per million read request units

Storage: $0.25 per GB-month
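To put numbers on "slightly higher", here is a rough Python sketch that computes the premium per line item from the prices quoted above and the bill for one made-up workload. The dictionary keys, the function names and the example workload (100M writes, 500M reads, 200 GB) are purely illustrative assumptions, and the sketch ignores free tiers, item-size rounding of request units and data transfer.

```python
# Prices as quoted in this thread (USD). Key names are illustrative only.
MCS = {"write_per_million": 1.45, "read_per_million": 0.29, "storage_gb_month": 0.30}
DDB = {"write_per_million": 1.25, "read_per_million": 0.25, "storage_gb_month": 0.25}


def premium_pct(mcs_price: float, ddb_price: float) -> float:
    """Percentage by which the MCS price exceeds the DynamoDB price."""
    return (mcs_price / ddb_price - 1) * 100


def monthly_cost(prices: dict, writes_millions: float,
                 reads_millions: float, storage_gb: float) -> float:
    """Naive monthly bill: request units plus storage, nothing else."""
    return (prices["write_per_million"] * writes_millions
            + prices["read_per_million"] * reads_millions
            + prices["storage_gb_month"] * storage_gb)


# Premium for each line item, rounded to one decimal place.
premiums = {item: round(premium_pct(MCS[item], DDB[item]), 1) for item in MCS}

if __name__ == "__main__":
    print(premiums)
    # Hypothetical workload: 100M writes, 500M reads, 200 GB stored per month.
    print(round(monthly_cost(MCS, 100, 500, 200), 2))
    print(round(monthly_cost(DDB, 100, 500, 200), 2))
```

On these quoted prices the request-unit premium works out to 16% and the storage premium to 20%.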

Tangentially: I don't understand why they have storage costs so high for DDB still (hasn't changed since 2013) when Aurora's charging $0.10 GB-month (with 6 way redundancy at that, vs DDB's 3x)...

So many questions

- per-cell timestamps (this is IMPORTANT for online data migration with no downtime)?

- can you choose compaction strategy?

- access to sstables for rapid data loads/custom backups?

- triggers?

- UDT?

- how will upgrades work?

- are they using rocksandra or other techniques?

- how about a scylladb option?

They could have offered a managed ScyllaDB service and saved themselves and their customers a lot of money while staying CQL / Cassandra compatible. Or what am I missing here?

Scylla is licensed under AGPL. Most of the cloud services companies try to stay away from AGPL.

However, I think the real reason is not even that. I suspect underneath it uses a modified version of DynamoDB and it's just wire compatible with Cassandra. That's how they are offering pay as you go pricing and auto-scaling.

Considering the kind of resources Apache Cassandra requires to run, I don't think they can offer this kind of pricing. This is the same company that charges $0.2 per hour to run a K8s cluster.

I don't work for AWS, but I do run Cassandra at scale for a living, and this line:

> Considering the kind of resources Apache Cassandra requires to run

Makes no sense. There is plenty of talent at AWS to run Cassandra proper on AWS for less money than they're currently charging for Dynamo. This was true before this announcement - it's how people like Instaclustr stay in business (containerized cassandra as a service on EC2).

Disclosure: I work for AWS

The new Amazon Managed Apache Cassandra Service does use the open source Apache Cassandra code. The team is excited about working with the development community on Cassandra. See more at https://aws.amazon.com/blogs/opensource/contributing-cassand...

Did you guys swap out the storage guts like Rocksandra did?

Can we access the sstables of our tables?

Can we do triggers and UDF?

Can we access the timestamps at a CELL LEVEL like cassandra proper (VERY important for online/downtimeless data migration)
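For anyone wondering why cell-level timestamps matter for a live migration: Cassandra reconciles conflicting writes per cell, last write wins, and CQL exposes the timestamp through WRITETIME() on reads and USING TIMESTAMP on writes. If a backfill job preserves the original timestamps, it cannot clobber newer dual-written data, whichever order the writes land in. Below is a toy Python model of that merge rule. It is not Cassandra's code; the row layout, names and tie handling are simplified assumptions.

```python
from typing import Dict, Tuple

Cell = Tuple[object, int]   # (value, write timestamp in microseconds)
Row = Dict[str, Cell]       # column name -> cell


def merge_rows(a: Row, b: Row) -> Row:
    """Per-column last-write-wins: keep the cell with the newer timestamp.

    Ties go to `b` here for simplicity; real Cassandra has its own tie rules.
    """
    merged: Row = dict(a)
    for column, cell in b.items():
        if column not in merged or cell[1] >= merged[column][1]:
            merged[column] = cell
    return merged


# Hypothetical scenario: the backfill copies an old snapshot with its original
# timestamps, while a live dual-write has already updated only `email`.
backfilled: Row = {"name": ("Ada", 1000), "email": ("ada@old.example", 1000)}
live_write: Row = {"email": ("ada@new.example", 2000)}

if __name__ == "__main__":
    # The result is the same whichever write reaches the new cluster first.
    print(merge_rows(backfilled, live_write))
    print(merge_rows(live_write, backfilled))
```

Without access to per-cell timestamps, the backfill would be stamped with "now" and could overwrite the newer email.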

License: ScyllaDB is AGPL 3.0.

Also maintainability, I am assuming. ScyllaDB is written by a team of experts in writing high performance low level code, and I am not sure it is easy for others to fork. Cassandra, being written in Java and being a much more normal application (no DPDK), might be easier to fork.

Disclosure: I work for AWS, but this is my personal opinion.

We have lots of developers that write code that's similar to how ScyllaDB is built. Lots of DPDK. There are many other practical reasons why a fork should never be the first choice when building a service that you have to sustain for customers basically forever. Especially if there's active, high velocity development going on.

> what am I missing here?

Mostly that you're believing Scylla marketing literature and paid benchmarks over what people actually use and experience. There are thousands of large Cassandra deployments and a handful of Scylla ones, and it's not because people didn't do their research. If they did this to ScyllaDB, though, it would probably kill the product; with Cassandra this is more of a positive for the community.

That's a strong accusation. Do you have a reference to any 'unpaid benchmarks', so we can judge ourselves?

I would like a scylladb option as well, but it isn't exactly feature parity with cass 3.11.X, to say nothing of 4.X

It started like gangbusters, but I get the feeling the dev money ran out because the feature rollout has completely stalled.

Aside: We have Scylla Cloud. It runs on AWS. Our next major release includes LWT and CDC, so any major gaps will be closed soon. And yes, we're looking at Cassandra 4.0 closely.

In some ways, we will implement features similarly to Cassandra but subtly and importantly different. Example: We do both local and global secondary indexes (Cassandra only has local). Also, our CQL LIKE function is closer in implementation to SQL LIKE than to Cassandra's. And so on.

btw: ScyllaDB just got $25M in new funding in September 2019. We're doing fine financially.

I don't think many enterprises are eager to have AGPL floating around in their repos, or has that changed?

I don't think Google allowed checkins of AGPL to their repo but maybe this has all changed.

AGPL and immature.

One of the killer features of Cassandra is cross-region replication with location-aware consistency for queries. It looks like in this preview of Managed Cassandra Service they are only supporting single-region clusters. I do hope they support multi-region clusters in the future.
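For context on "location aware consistency": in Apache Cassandra, QUORUM needs a majority of all replicas across every datacenter, while LOCAL_QUORUM only needs a majority of the replicas in the coordinator's own datacenter, so requests do not wait on cross-region round trips. A small Python sketch of that arithmetic follows; the region names, replication factors and function names are made-up examples, and nothing here describes how MCS itself behaves.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of a replica set: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(replication: Dict[str, int], consistency: str, local_dc: str) -> int:
    """Replica acknowledgements needed for a request at a given consistency level."""
    if consistency == "LOCAL_QUORUM":
        # Only replicas in the coordinator's datacenter count.
        return quorum(replication[local_dc])
    if consistency == "QUORUM":
        # Majority across the replicas of all datacenters combined.
        return quorum(sum(replication.values()))
    raise ValueError(f"unsupported consistency level in this sketch: {consistency}")


# Hypothetical keyspace replicated three times in each of two regions.
replication = {"us-east-1": 3, "eu-west-1": 3}

if __name__ == "__main__":
    print(required_acks(replication, "LOCAL_QUORUM", "us-east-1"))
    print(required_acks(replication, "QUORUM", "us-east-1"))
```

With that layout, LOCAL_QUORUM needs 2 acknowledgements from the local region, while QUORUM needs 4 and therefore always has to reach the remote region.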

It would also be nice if they documented any differences from Apache Cassandra, like if Amazon MCS improves secondary indexes so they can be used in more cases.

I personally don't trust dynamodb's mat views/secondary indexes since they never explain how they maintain index coherency behind their black box.

If they do have better tech for that, I also hope they can solve the mat view problems in cassandra 3.x

It's about freakin time. Dynamo's lack of query language has always been a pain point for me. Congrats AWS team!

Is there any reason to actually use Cassandra instead of Scylla?

Biased as a cassandra committer, but:

- Feature sets aren't the same yet

- History / maturity

- License

- Actual savings don't tend to match proposed savings

- Development isn't driven by a startup that AWS can kill with an announcement like this.

The biggest feature still missing in Scylla is Lightweight Transactions.

Otherwise no, use ScyllaDB instead for much better performance, automatic tuning, minimal maintenance, and better feature implementations like truly scalable global secondary indexes and materialized views.

I’m curious if this plays out like their MongoDB debacle where Amazon burned all their goodwill, only extracting value from the community and returning nothing

The contributors for Cassandra are from a group of companies that use it, not a single company that is trying to be profitable from it. I think this is a positive for the community, even if they don't contribute back since it would provide an easy getting started platform.

ASF != Mongo

Amazon taking from open source without giving back.

Right. Total bullshit PR you're supposed to be smarter than.

Cassandra hasn't changed their license like MongoDB yet?

Cassandra is an Apache project and will always have an Apache license.

It’s not a startup-driven product that is worried about AWS as a competitor.

No, nor do I know of any desire to do so

I'd bet money this is actually dynamodb and will be insanely expensive.

Pricing is available at https://aws.amazon.com/mcs/pricing/

MCS has a very similar pricing model to on-demand mode DynamoDB (https://aws.amazon.com/dynamodb/pricing/on-demand/) but is ~15% more expensive on all line items.

Wow super psyched I got triple downvoted when I dared to say something mean against one of the great deathstar companies. I've noticed if I ever dare to say anything mean against amazon or google I get instantly downvoted. It is more reliable than a bot.

Anyway, dynamodb is insanely expensive compared to postgresql and I'd recommend only using it when you need to expose a datastore directly to the user, as its attribute-based access control is easy to use. Postgresql serverless is now a thing as well. Anyway, if you truly want a WAN datastore, exposed to web clients, with attribute-based access control, run dynamo.
