Kudos to AWS for the ability to launch so many services, many of them competing with each other or complementary.
At first I (disclosure: ScyllaDB co-founder) called their serverless a bluff, but I gave it a try and it's nice to create a table without waiting for any server. That said, I know personally that DynamoDB scales really slowly, so they won't catch up on speed. They also claim single-digit-ms latency but state plans to reduce JVM overhead.
Another funny thing (regardless of tech) is that AWS has wonderful people talking about open source, but this very solution isn't open at all. It's impossible to figure out what's Cassandra and what's Dynamo. How about a diagram?
Lastly, it's pricey (for 1M IOPS you'll pay $3.5M/year!!) and doesn't have counters, UDTs, materialized views, and plenty of other features.
If you got down here, hey, give ScyllaDB a try. It's OSS (AGPL), available as a service on AWS, and has features that neither C* nor MCS has - workload prioritization, for instance.
Dor
Just curious about AGPL. Do you feel that AGPL gives you a hedge against a cloud service deploying your project at scale for profit that the project would otherwise miss out on?
I ask this as someone who has been sitting on some code for 10 years for precisely this reason!
To some extent yes. It's not 100% protection.
AGPL is just the right model; it closes the loophole that the GPL left. If you get to change the code, you must contribute the changes back. That's all.
AGPL is viral as well. Many companies believe client libraries will also fall into AGPL scope, and therefore any apps using those clients will also have to be AGPL licensed.
A: The "one additional feature" - Section 2(d) reads as follows: " If the Program as you received it is intended to interact with users through a computer network and if, in the version you received, any user interacting with the Program was given the opportunity to request transmission to that user of the Program's complete source code, you must not remove that facility from your modified version of the Program or work based on the Program, and must offer an equivalent opportunity for all users interacting with your Program through a computer network to request immediate transmission by HTTP of the complete source code of your modified version or other derivative work."
"This enables the original author... to download the source and receive the benefit of any modifications to its original work."
So if a cloud vendor (or anyone) decided to take your software and use it, and they made modifications to it, you are entitled to get access to those modifications. For many people this is a barrier to entry -- they want the ability to modify it however they want (which they can still do privately under AGPL). But if they offer it in public, they have to share their mods back to the community.
At least, that is my layperson's understanding of it. I am not an intellectual property lawyer and do not pretend to be.
Quite an arrogant attitude, dude. DynamoDB is not slow at scaling, quite the opposite. (Disclosure: 6+ years at AWS, 2008-2014, and I know the main guy that designed DynamoDB.)
If you want your startup to succeed, I suggest your first step should be to avoid criticizing the competition without substantiating it with hard evidence.
Be humble. There's plenty of space for great solutions, you don't need to bash the huge elephant in the room to get started.
> If you want your startup to succeed, I suggest your first step should be to avoid criticizing the competition without substantiating it with hard evidence.
From scylladb's site: Scylla Cloud vs DynamoDB benchmarks
AWS can launch so many services since they are not developing any of them and are just stealing the open source contributions of other companies/developers without contributing anything back.
AWS isn't stealing anything. They're hosting and running software for paying customers who want solutions to their problems, not licenses they have to run themselves. That's the whole point of managed services and the rise of cloud computing. Do you consider independent MSPs like Aiven and Instaclustr to be "stealing" too?
Vendors should compete by creating their own cloud offerings, and many are finally getting around to it.
They are disincentivizing (made up word) true open source development and using their monopoly to offer services that vendors cannot compete with easily.
I don't see how any of that is true. Customers want managed services, why aren't vendors meeting that demand?
For example, why hasn't Datastax rolled out a cloud offering on AWS yet? All this time and they have no solution. Even ScyllaDB just launched a minimal service this year. I would expect a db vendor to be able to wire up some APIs and at least get a basic deployment running instead of complaining about AWS offering it first.
Even when they do, there is a problem. They specialize in DB code. AWS specializes in infra... and can just take the DB code as well. That is a pretty huge competitive advantage (while the DB company has to develop that second specialty). Even worse, there's the further monopoly advantage that companies can just roll it into their AWS bill instead of doing another vendor evaluation. The fact that Amazon has almost all the competitive advantages for offering someone else's code is what seems unfair.
(the one they don't have is the ability to offer super specialized support and modifications for their customers, which is where a lot of the database companies have made their money - on consulting)
The DB vendor who created the database is worse at running it than AWS? No, that isn't true.
If you don't want others using your source code then don't be open source or use a different license. There are thousands of closed proprietary products out there doing just fine, for example all of AWS own managed services.
Didn't see that before. Still a poor implementation compared to what's available. And I mentioned Instaclustr, so if you're fine with them offering Cassandra then AWS is no different.
Sure, because that's what customers are fine with. Again the argument has nothing to do with open-source. Everyone has the same challenges when competing with a bigger company selling the same thing.
It's not a great implementation, the biggest advantage is consolidated billing. Unfortunately it's also ignoring the biggest cloud so they have no excuse against AWS launching this service.
Because they have limited resources and have to support existing customers? And they were doing the right thing by contributing everything back to the open source community and weren't expecting Amazon to offer a service?
That's called business competition. Should startups shut down because a bigger company might be selling the same thing?
There's no expectation of contribution with open-source software and the vast majority of users don't add anything. That's how it goes. If you don't want to give it away, then simply don't give it away.
DB vendors earn revenue by charging for features, support and services, just like AWS. There's nothing stopping them from selling proprietary closed-source products and I can name dozens that are doing well.
"There's no expectation of contribution with open-source software and the vast majority of users don't add anything. That's how it goes. If you don't want to give it away, then simply don't give it away." - Vendors are wising up to it now and changing their licenses. Amazon definitely took advantage of this before the days of AGPL. I agree that there's no expectation of contribution. As I said before, it just discourages open source in general because of what Amazon might do. My original comment was in response to someone asking how Amazon is able to launch these services quickly. My response was that the bulk of the work is being done by someone else. I guess that's not new. On the retail side, Amazon takes products that sell well and then offers an Amazon Basics version of the same product.
As did millions of other users and companies. If you give away source code then people will probably use it. Don't do it if you don't want that to happen.
If they're changing licenses now then good for them, but you could say they also took advantage of the free marketing from open-source and now put most of the development into enterprise versions. There's no right vs wrong side here.
There's a difference between people using it and making money off it. What Amazon is doing is putting a wrapper around other people's hard work and claiming it as their own.
My guess is that this is on-demand DynamoDB wrapped with a Cassandra-compatible frontend. Just based on the pricing and stated characteristics. It would be really hard to provide on-demand capacity for compute and storage using anything remotely similar to off-the-shelf Cassandra. The pricing looks like DynamoDB on-demand + a bit extra to cover the cost of operating the frontend.
Off-the-shelf Cassandra has a fair amount of flexibility in its architecture. Open Source Cassandra is the thing that is in the front end. And the team is excited about the idea of collaborating with the development community around some of the generally useful abstractions needed to build this managed database experience. More at https://aws.amazon.com/blogs/opensource/contributing-cassand...
Here are some examples from my team’s 2019 work: We contributed numerous changes to containerd. We open sourced firecracker-containerd, and we also created a Go SDK that others are using to work with Firecracker. We contributed to Debian and the Debian kernel team. We contributed to Envoy. We collaborated with a number of communities, including Kata Containers, Red Hat’s Clair, and the Open Container Initiative. All of these examples are sustained investments, not one offs.
"Amazon EMR has been adding Spark runtime improvements since EMR 5.24, and discussed them in Optimizing Spark Performance. EMR 5.28 features several new improvements."
Have these improvements been contributed back to Spark? When I take a look at the improvements themselves, it looks like all Amazon did was upgrade Spark from 2.3 to 2.4.
EMR isn't open source but Spark is. What is the EMR Spark Runtime if not Spark offered as a service? And the changes to optimize the Spark runtime - why were they not contributed back to upstream Spark?
This is just an example. I'm sure there are many others. The developers that take the time to contribute to Spark are making Spark a better product. Amazon is not making it better. Amazon should not claim they made improvements to Spark in a newer version. What they did was upgrade Spark to 2.4 and claim that the improvements were done by them whereas in reality they were done by the community.
Sorry, I meant committers. I'm asking since the link you posted talks about open source events and has some job postings. I was looking for actual contributions by Amazon employees.
That's a remarkable level of openness for Amazon. When Kinesis launched we were explicitly told we couldn't say anything publicly about the implementation (I don't know if the NDA has expired since).
They are way more open now. Before, it was a strong advantage of Google that you could hear from their engineers. AWS seems to have definitely pivoted on that since then.
This explanation does make a lot of sense. But why not just add a Cassandra interface to DynamoDB the way they added a Postgres interface to Aurora RDS?
They didn’t add a Postgres front end to RDS. They took the Postgres open source code and (grossly simplified) changed the storage layer and added more AWS-specific features. Aurora/MySQL and Aurora/Postgres are more or less AWS-specific forks of the respective projects.
At reddit, we used Cassandra, and it was a huge pain to manage. At Netflix we used it, and had a whole team of engineers that built tools just to manage Cassandra.
If this service had existed then, it would have made life so much easier!
I find it a bit weird that the company that created DynamoDB is now supporting a data store that's supposed to be an open-source DynamoDB. Does that mean Cassandra is better than DynamoDB, since people would still prefer to use Cassandra on AWS?
Hasn't this been Amazon's MO for the last couple of years? Competition with their in-house versions hasn't prevented AWS from building anything before.
It's not really an admission that one is "better" than the other, it's just an admission that people like managed drop-in replacements for tools that they're already using.
I briefly read through those leaked FB files, I think they're on NBC News. There was an email and other documents that showed they were thinking about opening a cloud offering multiple times, but it seems they never got round to launching it in the end.
DynamoDB the product != the Dynamo paper, which Cassandra is based on. I was using Cassandra in 2011 before Amazon asked me to use Big Bird, which was the code name for DynamoDB.
I struggle to think of any similarities, actually. I think DynamoDB's design makes it clear that the era of decentralized architecture/eventual consistency at Amazon has come and gone. DHTs just don't make sense in the data center.
Do you mean that amazon is tending towards strong or externalizable consistency now?
I’ve always noticed AWS products tend towards eventual consistency, and Google Cloud offerings are almost all strongly consistent.
For example S3 is eventually consistent for read after write (if it’s not the first write). Google cloud storage is strongly consistent for read after all writes.
I think the goal is for S3 to "eventually" be strongly consistent. I heard one of S3's designers say that they shouldn't have opted for eventual consistency in retrospect.
Not sure what you mean, DDB provides both strong and eventual consistency, now in 2 fashions with transactions. 3 if you include cross region replication.
There's definitely still a DHT sitting under there.
There's no real difference between a DHT (distributed hash table) and a hash-partitioned distributed database. It's the same concept. Apache Cassandra is considered an implementation of a DHT.
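That shared idea is easy to see in miniature. Below is a toy sketch (my own illustration, not actual Cassandra code) of a consistent-hash token ring: each node owns the arc of the hash space up to its token, which is the partitioning scheme both classic DHTs and Cassandra-style hash-partitioned databases build on (real systems add vnodes, replication, and gossip on top).

```python
import bisect
import hashlib

class TokenRing:
    """Toy consistent-hash ring - the core idea shared by DHTs and
    hash-partitioned databases like Cassandra. Real systems layer
    vnodes, replication factors, and gossip membership on top."""

    def __init__(self, nodes):
        # Place each node at a deterministic token on the hash ring.
        self.ring = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(key):
        # 128-bit token derived from the key (MD5, like Cassandra's
        # old RandomPartitioner; the choice of hash is incidental).
        return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

    def owner(self, partition_key):
        # The owner is the first node clockwise from the key's token,
        # wrapping around at the end of the ring.
        t = self._token(partition_key)
        i = bisect.bisect(self.ring, (t, chr(0x10FFFF)))
        return self.ring[i % len(self.ring)][1]

ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))  # same key always maps to the same node
```

The useful property is that adding or removing a node only moves the keys on the adjacent arc, rather than reshuffling everything the way a naive `hash(key) % n` scheme would.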
There are a lot of libraries and tools that interact with Cassandra. Having an API compatible service solves a lot of problems for companies that are already invested in Cassandra.
Obligatory: I work for Amazon, and this is my personal opinion.
As a customer, it feels way better to have both options like this in AWS, and I'm sure DDB and managed Cassandra will always have pros and cons in standalone functionality and ecosystem integration.
I just left a company that used an in-house cloud object storage product for their cloud DVR service. It was an amazing piece of tech that was hamstrung by some of the most irritating "features" of Cassandra. We spent more time cleaning up than anything else. I am looking forward to seeing how people use this at scale.
The difference is massive. Cassandra was hard to manage and after many years of our team using it still had random spikes. RocksDB+Raft has been extremely solid, doesn't require any maintenance, load times are flat, zero spikes.
Cassandra was awesome, but it definitely has some issues. That's also why companies like ScyllaDB see space in that market. I wonder if AWS's cassandra implementation is better than regular cassandra.
> Cassandra was hard to manage and after many years of our team using it still had random spikes
A lot of that is getting better over time. A non-trivial percentage of contributions are coming from companies that are only looking to improve their operational story, not adding new features.
Most of the talks at NGCC (Next generation cassandra conference) focused on operational improvements - it's something we care a lot about (myself especially).
Master stroke! Especially the pricing and autoscaling. Until they brought autoscaling and pay-as-you-go pricing to DynamoDB, Google Cloud Datastore was a superior product (at least on paper), as you didn't need to think about pre-provisioning capacity. It is supposed to just scale. They have brought the same model to Cassandra. So no vendor lock-in!
Also, one of the reasons small projects stayed away from Cassandra is the requirement of a hefty cluster to get decent performance out of it. That too is taken care of by AWS, making Cassandra much more accessible.
Now the only thing that remains to be seen is how good this AWS product is. Many AWS services get very average reviews in their first year especially.
Excited to see a managed open source offering with the same pricing paradigm as pay-per-request DynamoDB. I frequently encounter projects for which a Dynamo-style storage layer would be a natural fit, but often use RDS Postgres because I don't want to lock projects into the AWS ecosystem or sign teams up for operating their own Cassandra or Riak clusters.
So instead of using the best solution, you chose to use a non optimal solution for fear of “lock in”? Were they not using any other AWS services? If not, why pay more for a cloud provider than a colo and not get any of the benefits of it?
I would disagree with that characterization and would instead say that DynamoDB's lack of portability between vendors made it a non-optimal solution. The services I work on need to survive specific vendors going out of business.
I'm a civil servant, so I look at these questions with a different mindset. I consider adopting a proprietary service to be a decision that entails a fair amount of risk, as I would have no easy recourse should the service become completely unavailable for reasons beyond my organization's control. This could happen for a number of reasons, including the vendor going out of business or raising prices beyond my authority to pay.
I might find that risk acceptable in some cases (e.g., most people who work at my agency use Windows on their workstations, some critical services are still running on z/OS, we use tools like Splunk and New Relic for monitoring), but it might not be worth it in other cases. This would get weighed against other forms of risk (such as the risk that a hand-rolled Cassandra cluster would have less availability than DynamoDB or the risk that you might spend more money than necessary using RDS instead of on-demand DynamoDB).
Yes, as mentioned above, we do use proprietary solutions where appropriate. However, all else being equal, I would prefer a managed but replicable service (e.g., MCS or Aurora MySQL) over a purely proprietary one, because deploying a replacement Cassandra or Vitess cluster is usually less costly than rewriting an application to use a different data store.
Amazon going out of business isn’t the only reason you shouldn’t be happy to be tied to them.
If you want decent “high availability” and failover when shit hits the fan, that means multi vendor. Who else offers dynamodb?
If amazon were to increase the price of dynamo db 10x would you still just keep using it, because you’re already using it, and eat the 10x cost increase?
The amount of comments that essentially boil down to “amazon is not going anywhere, you’re stupid for worrying about lock in” is both staggering and depressing.
You mean if multiple availability zones go down? In AWS’s entire existence, have they been known to raise prices?
How much time, energy, and development effort are you willing to spend on “avoiding vendor lock in” in the off chance that you will move your entire infrastructure as opposed to spending those same resources creating either revenue generating features or cost saving features?
If you’re using a cloud provider as a glorified overpriced colo, you have the worst of all worlds. You’re spending more on resources and just as much babysitting infrastructure.
It’s just like the bushy-tailed “architects” who create layers of “factories” and “repositories” just in case their CTO wakes up one day and decides to move their company’s six-figure Oracle installation to Postgres, all the while creating suboptimal queries to avoid using Oracle-specific functionality.
So far most major AWS incidents I’ve paid attention to have been ultimately caused by their own Rube Goldberg inspired infra. Nothing at AWS just is something; it practically all relies on something else at AWS, and when there are outages at the apparently lowest level, the issues are widespread.
With such convoluted systems, fat-finger syndrome seems to be a not insignificant factor in their downtime, and the interdependence just makes it blow up.
But sure. If you want to trust everything to aws you go right ahead and do that.
As for raising prices - I have no idea; no company has done anything until they do it the first time. AWS doesn’t really need to increase prices to be a more expensive solution for the vast majority of companies using it.
If you don’t rely on proprietary aws “solutions” in the first place, there’s no extra “time and cost” involved. It’s just running your setup process - whatever that may be - with another location, another vendor, whatever.
Like I said if you want real HA you’re going to be running in multiple vendors all the time - it’s not something you’re going to say “well shit aws is down again let’s go sign up for azure”.
You’re going to sit back and eat crumpets because your site is running fine in spite of aws or azure or whoever’s latest brown pants event.
And to be clear I’m not suggesting using aws is a smart move over bare metal or even just regular rented virtual machines at a normal facility - the concept of HA across vendors applies the same.
Yes because colos never have a problem with reliability and most companies have better managed infrastructure than AWS/Azure/GCP.
How many companies need higher reliability than you get from any of the cloud vendors if you architect your site across multiple AZs or multiple regions?
And “running your setup” means duplicating functionality on VMs where you could use managed offerings - the absolute most expensive and least reliable way of using cloud providers, and it costs more in time and resources to manage.
And you are ignoring how much money you can save by not needing as many infrastructure people.
Heck, half the time you can get away with having a much cheaper shared services/managed service provider.
Are all of the companies big and small who are using cloud vendors proprietary solutions delusional?
I see you're a graduate of the school of strawman tactics.
Of course traditional colo and rental VM hosting have outages. That's literally why I said, multiple times, if you want actual HA, you need to be using multiple vendors, regardless of what that vendor provides you - whether it's bare racks or a web GUI to "push a button to make it go now". I didn't explicitly state it, but I kind of assumed you'd realise that means different vendors in different physical DCs/locations.
Complaining to me that using basic VMs in a "cloud" service is more expensive, is like complaining to a duck that water is wet. No shit, EC2 is more expensive than even a regular rented VPS/VM service from a more 'traditional' hosting service, and much more expensive than either renting or owning physical gear in a rack.
I didn't suggest you use EC2 or AWS at all - but just because you use self-managed services doesn't mean you can't take advantage of the one thing a "cloud" service offers which traditional VM hosting doesn't: essentially instant spin up and time usage billing.
You can split your workload across two or three cloud providers, running resources such that you have just slightly more than 100% of what you need for regular operations; then when (not if) one of those providers has an outage, you increase the capacity at the other provider(s) to handle the increased load.
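For what it's worth, the standing headroom that split implies is easy to work out. If you size N providers so the survivors can absorb any single-provider outage without reactive scale-up, each must be able to carry 1/(N-1) of total demand, so total provisioned capacity is N/(N-1) of demand - the "slightly more than 100%" figure really depends on being able to scale the survivors up quickly. A quick illustrative sketch:

```python
# Standing capacity needed so that if any ONE of n providers goes down,
# the remaining n-1 can still serve 100% of demand with no emergency
# scale-up. (The comment above assumes reactive scale-up instead, which
# is what lets the standing headroom stay near 100%.)
def provisioned_fraction(n_providers):
    """Total provisioned capacity as a fraction of steady-state demand."""
    assert n_providers >= 2, "need at least two providers to survive one outage"
    return n_providers / (n_providers - 1)

for n in (2, 3, 4):
    print(n, f"{provisioned_fraction(n):.0%}")
# 2 providers -> 200%, 3 -> 150%, 4 -> ~133%
```

So two providers with no reactive scale-up means paying for double capacity; the economics only approach "slightly more than 100%" with several providers or fast on-demand scaling.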
I'm not even going to dignify the "we don't need infra people" comment because it's not even a bad joke any more, it's more like a warning of management who have no fucking idea what is involved.
I don't know what motivations each company has when deciding what technologies they should use. But if you're suggesting that companies don't ever make bad choices because of (a) uninformed/misinformed management decisions, or (b) short sightedness, I'll kindly suggest you're either being very sarcastic, or you're very naive.
> I'm not even going to dignify the "we don't need infra people" comment because it's not even a bad joke any more, it's more like a warning of management who have no fucking idea what is involved.
I didn’t say that you didn’t need any, I said that you didn’t need “as many”. But yes, at smaller companies you can get away with no dedicated infrastructure people and just use a managed service provider. At a slightly larger company you can get away with a few people on site who manage your MSP.
So you want HA by running in multiple DCs - Exactly what happens when you run in multiple AZs and/or regions.
> But if you're suggesting that companies don't ever make bad choices because of (a) uninformed/misinformed management decisions, or (b) short sightedness, I'll kindly suggest you're either being very sarcastic, or you're very naive.
So you think Netflix, for instance, who started off running all of their own servers and is now AWS's biggest customer, was being “naive”? Instead of thinking that all of these companies - including major enterprises - are being irrational by using cloud providers and their proprietary services, maybe they know something that you don’t know?
Smaller companies don't necessarily need dedicated infra people regardless. My point is that using "a cloud" doesn't change your level of infrastructure experience/knowledge needs, it just changes what they need to know.
.... You're either not reading what I wrote or being deliberately obtuse. I said multiple vendors, in different DCs. The same vendor in two DCs is not as good as two different vendors in two different DCs.
I didn't say the companies are naive. I said you are being naive, if you think companies haven't made bad decisions.
> My point is that using "a cloud" doesn't change your level of infrastructure experience/knowledge needs, it just changes what they need to know.
It very much does change what they need to know. You don’t need to know how to set up a database with multi region failover, load balancers, server maintenance, switches, routers, storage, firewalls, etc. Have you ever used managed services at any scale?
Yes, I’ve done both - including hosting our own servers in house.
A good chunk of my work is getting clients out of shit situations with "The cloud" because someone drank too much of the "Cloud means no more ops" Kool aid.
For most small to medium companies, the alternative to a managed AWS service is not "lets go buy some switches".
It's "let's use open source software on rented virtual machines". The "cloud" model is only useful if your staff have no idea how a database server works. If they do, it's going to make a heap of basic tasks harder (and more expensive) because you don't have access to the software itself.
I'm done discussing this with you. You can make all the same arguments everyone else does when trying to justify "the cloud", and you won't convince me, because your arguments are, as usual for this type of "discussion", comparing against the most extreme alternatives.
Right from the start you've assigned literally no cost to being at the complete mercy of a single vendor for your entire infrastructure (and one with a history of dirty tactics to "win" a market).
If that approach works for you, good for you. I, and my clients once it's bitten them, aren't willing to do that.
And your method of getting them out of “shit situations” is not by showing them how to do it correctly - it’s by moving them to something you know.
So now, the same people who don’t have the expertise to manage a colo are all of a sudden going to have the expertise to manage VMs and open source alternatives, and know how to manage a fault-tolerant multi-region database and other HA setups at multiple colos?
Again, how much experience do you personally have with actually using cloud services from the big three? I’ve done both, I had to. The cloud vendors didn’t exist when I started. Heck we had a “server room” with raised floors for our “massive” 2TB SAN.
How long have you been an employed programmer? On the scale of decades things can change dramatically.
In theory (based on the cost of an instance vs the cost to get your own) AWS is grossly overpriced, and SOMEONE will eventually beat them by a very large margin for commodity services.
AWS knows this, which is probably the motivation behind all its custom features.
Well, if the 74 in my username doesn’t give it away, quite awhile. But just to give you a hint, my first professional contract in college was writing a Gopher site. My first hobby projects involved writing 65C02 assembly in the mid 80s.
As far as IBM, if someone had chosen to get “locked-in” to IBM in the 70s, they could still buy new hardware that could run their old software unmodified. Isn’t that an argument for using AWS?
In the scale of decades, most of the time you will be performing heavy rewrites anyway, are you really trying to optimize now just in case in 20 years you might want to move to something else?
Are you using the same language and frameworks you were using 20 years ago? When I first started developing I was writing C programs on DEC VAX, Stratus VOS mainframes.
If you are just using AWS to host a bunch of EC2 instances and as an overpriced colo, sure you could find much cheaper options now.
No matter what you do, more than likely you will have to migrate. Just changing infrastructure - even if all you’re doing is moving VMs to another provider, or reconfiguring your network if you are using a hybrid - is going to be a heavy lift no matter how much you try to avoid lock-in. You’re spending money now for an amorphous future where you may want to change vendors, instead of using your vendor of choice’s features that can save you money and/or time.
I would make the same argument if we were talking about Azure or God forbid Oracle.
If AWS runs everyone's open source for them as a metered service and people don't run their own software... then they can't edit that software and make contributions back into open source. The open source model breaks, no new projects except from the likes of Amazon.
AWS would have to step up open source contributions to match users' and they in fact do the opposite: they are extreme leeches compared to Google and Microsoft.
We begged Datastax to provide a managed service in AWS. We used Datastax for consulting and for their repackaging of Cassandra with admin tools. After perpetual pain with Cassandra management, and little belief that Datastax would enter the market for managed services, we eventually decided to stop everything and rewrite to Dynamo.
Datastax was undeniably a beneficial force behind earlier cassandra, and I initially thought the Apache board pseudo-ejecting them from the project governance was foolish, but I do think they were starting to mess with cassandra a bit too much.
They can probably support their product line atop this just fine. They really make their money on integrating cassandra and a bunch of other big data frameworks in one neat little package.
I usually stay away from politics, but Apache Cassandra is doing just fine after DataStax was ejected. If you pay any attention to current Cassandra development in the community, you’ll notice an increased focus on stability, testing and operational excellence.
This seems to be very good for people who don't want to be locked into DDB. The prices are very similar to DynamoDB (slightly higher), and the model is pay as you go.
Prices:

Managed Cassandra:

Write requests: $1.45 per million write request units
Read requests: $0.29 per million read request units
Storage: $0.30 per GB-month

DynamoDB On-Demand:

Write requests: $1.25 per million write request units
Read requests: $0.25 per million read request units
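To make the "slightly higher" concrete, here's a quick back-of-envelope script using the per-request prices above (storage excluded; the workload numbers are made up for illustration):

```python
# Request prices from the comparison above, in cents per million
# request units (integer cents avoids float rounding surprises).
MCS = {"write": 145, "read": 29}   # Managed Cassandra
DDB = {"write": 125, "read": 25}   # DynamoDB on-demand

def monthly_cost_dollars(prices, millions_of_writes, millions_of_reads):
    """Monthly request cost in dollars for the given workload."""
    cents = (prices["write"] * millions_of_writes
             + prices["read"] * millions_of_reads)
    return cents / 100

# Illustrative workload: 100M writes and 500M reads per month.
mcs = monthly_cost_dollars(MCS, 100, 500)  # $145 + $145 = $290
ddb = monthly_cost_dollars(DDB, 100, 500)  # $125 + $125 = $250
print(f"MCS ${mcs:.2f} vs DynamoDB on-demand ${ddb:.2f}")
```

At this example mix, MCS works out to 16% more than DynamoDB on-demand for requests, consistent with "similar but slightly higher."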
Tangentially: I don't understand why storage costs for DDB are still so high (unchanged since 2013) when Aurora charges $0.10 per GB-month (with 6-way redundancy at that, vs. DDB's 3x)...
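To make the "slightly higher" concrete, here's a back-of-envelope comparison of monthly request costs under the per-million rates quoted above. The workload shape (10k requests/s sustained, split 50/50 read/write) is a made-up example, not anything from the announcement:

```python
# Request-unit prices quoted above, in dollars per million request units.
MCS = {"write": 1.45, "read": 0.29}  # Managed Cassandra Service
DDB = {"write": 1.25, "read": 0.25}  # DynamoDB on-demand

def monthly_cost(prices, writes_per_s, reads_per_s, days=30):
    """Request cost for a sustained workload over one month (storage excluded)."""
    secs = days * 24 * 3600
    return (writes_per_s * secs / 1e6 * prices["write"]
            + reads_per_s * secs / 1e6 * prices["read"])

# Hypothetical workload: 5k writes/s + 5k reads/s, sustained for 30 days.
mcs = monthly_cost(MCS, 5_000, 5_000)
ddb = monthly_cost(DDB, 5_000, 5_000)
print(f"MCS: ${mcs:,.0f}/mo  DDB: ${ddb:,.0f}/mo  (+{(mcs / ddb - 1) * 100:.0f}%)")
# → MCS: $22,550/mo  DDB: $19,440/mo  (+16%)
```

So at these rates, MCS request pricing runs roughly 16% above DynamoDB on-demand for this mix; a write-heavier mix skews it further toward MCS being pricier.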
They could have offered a managed ScyllaDB service and saved themselves and their customers a lot of money while staying CQL / Cassandra compatible. Or what am I missing here?
Scylla is licensed under the AGPL. Most cloud services companies try to stay away from the AGPL.
However, I think the real reason is not even that. I suspect that underneath it uses a modified version of DynamoDB and is just wire-compatible with Cassandra. That's how they can offer pay-as-you-go pricing and auto-scaling.
Considering the kind of resources Apache Cassandra requires to run, I don't think they could offer this kind of pricing with it. This is the same company that charges $0.20 per hour to run a K8s cluster.
I don't work for AWS, but I do run Cassandra at scale for a living, and this line:
> Considering the kind of resources Apache Cassandra requires to run
Makes no sense. There is plenty of talent at AWS to run Cassandra proper on AWS for less money than they're currently charging for Dynamo. This was true before this announcement; it's how companies like Instaclustr stay in business (containerized Cassandra as a service on EC2).
Also maintainability, I am assuming. ScyllaDB is written by a team of experts in high-performance, low-level code, and I am not sure it would be easy for others to fork. Cassandra, being written in Java and being a much more conventional application (no DPDK), might be easier to fork.
Disclosure: I work for AWS, but this is my personal opinion.
We have lots of developers that write code that's similar to how ScyllaDB is built. Lots of DPDK. There are many other practical reasons why a fork should never be the first choice when building a service that you have to sustain for customers basically forever. Especially if there's active, high velocity development going on.
Mostly that you're believing Scylla's marketing literature and paid benchmarks over what people actually use and experience. There are thousands of large Cassandra deployments and a handful of Scylla ones, and it's not because people didn't do their research. If AWS did this to ScyllaDB, though, it would probably kill the product; with Cassandra it's more of a positive for the community.
Aside: We have Scylla Cloud. It runs on AWS. Our next major release includes LWT and CDC, so any major gaps will be closed soon. And yes, we're looking at Cassandra 4.0 closely.
In some ways, we will implement features similarly to Cassandra, but with subtle and important differences. Example: we do both local and global secondary indexes (Cassandra only has local). Also, our CQL LIKE function is closer in implementation to SQL LIKE than to Cassandra's. And so on.
btw: ScyllaDB just got $25M in new funding in September 2019. We're doing fine financially.
One of the killer features of Cassandra is cross-region replication with location-aware consistency for queries. It looks like this preview of the Managed Cassandra Service only supports single-region clusters. I do hope they support multi-region clusters in the future.
It would also be nice if they documented any differences from Apache Cassandra, like whether Amazon MCS improves secondary indexes so they can be used in more cases.
The biggest feature still missing in Scylla is Lightweight Transactions.
Otherwise no, use ScyllaDB instead for much better performance, automatic tuning, minimal maintenance, and better feature implementations like truly scalable global secondary indexes and materialized views.
I’m curious whether this plays out like their MongoDB debacle, where Amazon burned all their goodwill by only extracting value from the community and returning nothing.
The contributors to Cassandra come from a group of companies that use it, not a single company trying to profit from it. I think this is a positive for the community, even if AWS doesn't contribute back, since it provides an easy getting-started platform.
Wow, super psyched I got triple-downvoted when I dared to say something mean about one of the great deathstar companies. I've noticed that if I ever say anything mean about Amazon or Google I get instantly downvoted. It is more reliable than a bot.
Anyway, DynamoDB is insanely expensive compared to PostgreSQL, and I'd recommend using it only when you need to expose a datastore directly to the user, as its attribute-based access control is easy to use. PostgreSQL serverless is now a thing as well. Anyway, if you truly want a WAN datastore, exposed to web clients, with attribute-based access control, run Dynamo.