At first I (disclosure: ScyllaDB co-founder) thought their serverless claim was a bluff, but I gave it a try, and it's nice to create a table without waiting for any server. That said, I know firsthand that Dynamo scales really slowly, so they won't catch up on speed. They also claim single-digit-millisecond latency but state plans to reduce JVM overhead.
Another funny thing (regardless of the tech) is that AWS has wonderful people talking about open source, but this very solution isn't open at all. It's impossible to figure out what's Cassandra and what's Dynamo. How about a diagram?
Lastly, it's pricey (for 1M IOPS you'll pay $3.5M/year!) and doesn't have counters, UDTs, materialized views, and plenty of other features.
If you got down here, hey, give ScyllaDB a try: it's OSS (AGPL), available as a service on AWS, and has features that neither C* nor MCS has, workload prioritization for instance.
I ask this as someone who has been sitting on some code for 10 years for precisely this reason!
So many companies ban AGPL completely.
"This enables the original author... to download the source and receive the benefit of any modifications to its original work."
So if a cloud vendor (or anyone) decided to take your software and use it, and they made modifications to it, you are entitled to get access to those modifications. For many people this is a barrier to entry: they want the ability to modify it however they want (which they can still do privately under the AGPL). But if they offer it in public, they have to share their mods back with the community.
At least, that is my layperson's understanding of it. I am not an intellectual property lawyer and do not pretend to be.
Quite an arrogant attitude, dude. DynamoDB is not slow to scale; quite the opposite. (Disclosure: 6+ years at AWS, 2008-2014, and I know the main guy who designed DynamoDB.)
If you want your startup to succeed, I suggest your first step should be to avoid criticizing the competition without substantiating it with hard evidence.
Be humble. There's plenty of space for great solutions, you don't need to bash the huge elephant in the room to get started.
From scylladb's site: Scylla Cloud vs DynamoDB benchmarks
I find your lack of self awareness hilarious. You know the guy who designed dynamodb! Wow. Could I get you to send me his autograph?
Vendors should compete by creating their own cloud offerings, and many are finally getting around to it.
For example, why hasn't Datastax rolled out a cloud offering on AWS yet? All this time and they have no solution. Even ScyllaDB just launched a minimal service this year. I would expect a db vendor to be able to wire up some APIs and at least get a basic deployment running instead of complaining about AWS offering it first.
(the one they don't have is the ability to offer super specialized support and modifications for their customers, which is where a lot of the database companies have made their money - on consulting)
If you don't want others using your source code then don't be open source or use a different license. There are thousands of closed proprietary products out there doing just fine, for example all of AWS own managed services.
Umm... they do (https://www.datastax.com/platform/amazon-web-services). Instaclustr also has an offering. Which kind of highlights the problem.
There's no expectation of contribution with open-source software and the vast majority of users don't add anything. That's how it goes. If you don't want to give it away, then simply don't give it away.
DB vendors earn revenue by charging for features, support and services, just like AWS. There's nothing stopping them from selling proprietary closed-source products and I can name dozens that are doing well.
If they're changing licenses now then good for them, but you could say they also took advantage of the free marketing from open-source and now put most of the development into enterprise versions. There's no right vs wrong side here.
EDIT: confirmed by an AWS employee here https://twitter.com/_msw_/status/1201924979647905792
Off-the-shelf Cassandra has a fair amount of flexibility in its architecture. Open Source Cassandra is the thing that is in the front end. And the team is excited about the idea of collaborating with the development community around some of the generally useful abstractions needed to build this managed database experience. More at https://aws.amazon.com/blogs/opensource/contributing-cassand...
Here are some examples from my team’s 2019 work: We contributed numerous changes to containerd. We open sourced firecracker-containerd, and we also created a Go SDK that others are using to work with Firecracker. We contributed to Debian and the Debian kernel team. We contributed to Envoy. We collaborated with a number of communities, including Kata Containers, Red Hat’s Clair, and the Open Container Initiative. All of these examples are sustained investments, not one offs.
"Amazon EMR has been adding Spark runtime improvements since EMR 5.24, and discussed them in Optimizing Spark Performance. EMR 5.28 features several new improvements."
Have these improvements been contributed back to Spark? When I take a look at the improvements themselves, it looks like all Amazon did was upgrade Spark from 2.3 to 2.4.
The page I linked lists plenty of projects if you're looking for actual OSS work.
Again this seems like a random example. What is so important about this particular change over all the other open-source contributions?
Amazon, Microsoft, Google, Oracle, IBM and others are all major contributors to open-source software.
What you are saying would be more like adding a postgres frontend to mysql.
If this service had existed then, it would have made life so much easier!
It's not really an admission that one is "better" than the other, it's just an admission that people like managed drop-in replacements for tools that they're already using.
Honestly having a SaaS and the hosting instances of your competing offerings is probably a good strategy.
Do you mean that amazon is tending towards strong or externalizable consistency now?
I’ve always noticed AWS products tend towards eventual consistency, and Google Cloud offerings are almost all strongly consistent.
For example S3 is eventually consistent for read after write (if it’s not the first write). Google cloud storage is strongly consistent for read after all writes.
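To make the distinction concrete, here's a toy Python sketch (my own illustration, nothing to do with S3's or GCS's actual internals) of how an asynchronously replicated store can return stale data on a read after an overwrite:

```python
# Toy model of eventual consistency: writes land on a primary and only
# propagate to a replica when sync() runs. Reads may hit the stale replica.
class EventuallyConsistentStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}   # lags behind the primary until sync() runs

    def write(self, key, value):
        self.primary[key] = value   # replication happens asynchronously

    def read(self, key):
        # Reads are served by the replica, which may not have the latest write.
        return self.replica.get(key, self.primary.get(key))

    def sync(self):
        self.replica.update(self.primary)

store = EventuallyConsistentStore()
store.write("obj", "v1")
store.sync()                 # "v1" is now fully replicated
store.write("obj", "v2")     # overwrite an existing key
print(store.read("obj"))     # prints "v1": the stale value, until replication catches up
store.sync()
print(store.read("obj"))     # prints "v2"
```

A strongly consistent store would never expose the intermediate state: every read after the overwrite would return "v2".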
There's definitely still a DHT sitting under there.
As they aren't, even new projects will often not use DynamoDB, to avoid vendor lock-in.
As a customer, it feels way better to have both options like this in AWS, and I'm sure DDB and managed Cassandra will always have pros and cons in standalone functionality and ecosystem integration.
The difference is massive. Cassandra was hard to manage and, after many years of our team using it, still had random spikes. RocksDB+Raft has been extremely solid: it doesn't require any maintenance, load times are flat, zero spikes.
Cassandra was awesome, but it definitely has some issues. That's also why companies like ScyllaDB see space in that market. I wonder if AWS's cassandra implementation is better than regular cassandra.
A lot of that is getting better over time. A non-trivial percentage of contributions are coming from companies that are only looking to improve their operational story, not adding new features.
Most of the talks at NGCC (Next generation cassandra conference) focused on operational improvements - it's something we care a lot about (myself especially).
I'm hoping rocksandra will solve a lot of that.
Also, one of the reasons small projects stayed away from Cassandra is the requirement of a hefty cluster to get decent performance out of it. That too is taken care of by AWS, making Cassandra much more accessible.
Now the only thing that remains to be seen is how good this AWS product is. Especially in their first year, many AWS services get very average reviews.
Would you also refuse to use Windows just in case Microsoft went out of business?
I might find that risk acceptable in some cases (e.g., most people who work at my agency use Windows on their workstations, some critical services are still running on z/OS, we use tools like Splunk and New Relic for monitoring), but it might not be worth it in other cases. This would get weighed against other forms of risk (such as the risk that a hand-rolled Cassandra cluster would have less availability than DynamoDB or the risk that you might spend more money than necessary using RDS instead of on-demand DynamoDB).
Oracle: $3 billion.
Microsoft: $7.6 billion.
Both AWS and Azure have separate gov regions. Civil servants have never been averse to spending money on proprietary solutions.
If you want decent “high availability” and failover when shit hits the fan, that means multi vendor. Who else offers dynamodb?
If amazon were to increase the price of dynamo db 10x would you still just keep using it, because you’re already using it, and eat the 10x cost increase?
The amount of comments that essentially boil down to “amazon is not going anywhere, you’re stupid for worrying about lock in” is both staggering and depressing.
How much time, energy, and development effort are you willing to spend on “avoiding vendor lock in” in the off chance that you will move your entire infrastructure as opposed to spending those same resources creating either revenue generating features or cost saving features?
If you're using a cloud provider as a glorified overpriced colo, you have the worst of all worlds. You're spending more on resources and just as much babysitting infrastructure.
It's just like the bushy-tailed "architects" who create layers of "factories" and "repositories" just in case their CTO wakes up one day and decides to move their company's six-figure Oracle installation to Postgres. All the while creating suboptimal queries to avoid using Oracle-specific functionality.
With such convoluted systems, fat finger syndrome seems to be a not insignificant factor in their downtime, and the interdependence just makes it blow up.
But sure. If you want to trust everything to aws you go right ahead and do that.
As for rising prices - I have no idea - no company has done anything until they do it the first time. AWS doesn’t really need to increase prices to be a more expensive solution for the vast majority of companies using it.
If you don’t rely on proprietary aws “solutions” in the first place, there’s no extra “time and cost” involved. It’s just running your setup process - whatever that may be - with another location, another vendor, whatever.
Like I said if you want real HA you’re going to be running in multiple vendors all the time - it’s not something you’re going to say “well shit aws is down again let’s go sign up for azure”.
You’re going to sit back and eat crumpets because your site is running fine in spite of aws or azure or whoever’s latest brown pants event.
And to be clear I’m not suggesting using aws is a smart move over bare metal or even just regular rented virtual machines at a normal facility - the concept of HA across vendors applies the same.
How many companies need higher reliability than you get from any of the cloud vendors if you architect your site across multiple AZs or multiple regions?
And "running your setup" means duplicating functionality on VMs where you could use managed offerings: the absolute most expensive and least reliable way of using cloud providers, and it costs more in time and resources to manage.
And you are ignoring how much money you can save by not needing as many infrastructure people.
Heck, half the time you can get away with having a much cheaper shared services/managed service provider.
Are all of the companies, big and small, who are using cloud vendors' proprietary solutions delusional?
Of course traditional colo and rental VM hosting have outages. That's literally why I said, multiple times, if you want actual HA, you need to be using multiple vendors, regardless of what that vendor provides you - whether it's bare racks or a web GUI to "push a button to make it go now". I didn't explicitly state it, but I kind of assumed you'd realise that means different vendors in different physical DCs/locations.
Complaining to me that using basic VMs in a "cloud" service is more expensive, is like complaining to a duck that water is wet. No shit, EC2 is more expensive than even a regular rented VPS/VM service from a more 'traditional' hosting service, and much more expensive than either renting or owning physical gear in a rack.
I didn't suggest you use EC2 or AWS at all - but just because you use self-managed services doesn't mean you can't take advantage of the one thing a "cloud" service offers which traditional VM hosting doesn't: essentially instant spin up and time usage billing.
If you want to split your workload across two or three cloud providers, run resources split such that you have just slightly more than 100% of what you need for regular operations; then when (not if) one of those providers has an outage, you increase the capacity at the other provider(s) to handle the extra load.
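The capacity math behind that split can be sketched with made-up numbers (the function and figures below are purely illustrative):

```python
# Steady-state capacity per provider when total load is split evenly with a
# small headroom, versus what each survivor must absorb after one outage.
def per_provider_capacity(total_load, n_providers, headroom=0.10):
    # Each provider carries an equal share of the load plus headroom.
    return total_load * (1 + headroom) / n_providers

total = 900   # req/s needed in normal operation (made-up figure)
n = 3         # number of providers

steady = per_provider_capacity(total, n)
print(f"steady state: {steady:.0f} req/s per provider")

# When one provider goes down, the remaining ones scale up to cover its share.
after_failure = total / (n - 1)
print(f"after one outage: {after_failure:.0f} req/s on each survivor")
```

The point of the sketch: in steady state you pay for only a little more than 100% of your needs, and the scale-up to full redundancy happens on demand, when a provider actually fails.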
I'm not even going to dignify the "we don't need infra people" comment because it's not even a bad joke any more, it's more like a warning of management who have no fucking idea what is involved.
I don't know what motivations each company has when deciding what technologies they should use. But if you're suggesting that companies don't ever make bad choices because of (a) uninformed/misinformed management decisions, or (b) short sightedness, I'll kindly suggest you're either being very sarcastic, or you're very naive.
I didn't say that you didn't need any; I said that you didn't need "as many". But yes, at smaller companies you can get away with no dedicated infrastructure people and just use a managed service provider. At a slightly larger company you can get away with a few people on site who manage your MSP.
So you want HA by running in multiple DCs - Exactly what happens when you run in multiple AZs and/or regions.
> But if you're suggesting that companies don't ever make bad choices because of (a) uninformed/misinformed management decisions, or (b) short sightedness, I'll kindly suggest you're either being very sarcastic, or you're very naive.
So you think Netflix, for instance, who started off running all of their own servers and are now AWS's biggest customer, were being "naive"? Instead of thinking that all of these companies, including major enterprises, are being irrational by using cloud providers and their proprietary services, maybe they know something that you don't?
.... You're either not reading what I wrote or being deliberately obtuse. I said multiple vendors, in different DCs. The same vendor in two DCs is not as good as two different vendors in two different DCs.
I didn't say the companies are naive. I said you are being naive, if you think companies haven't made bad decisions.
It very much does change what they need to know. You don’t need to know how to set up a database with multi region failover, load balancers, server maintenance, switches, routers, storage, firewalls, etc. Have you ever used managed services at any scale?
Yes I’ve done both - hosted our own servers in house.
For most small to medium companies, the alternative to a managed AWS service is not "lets go buy some switches".
It's "let's use open source software on rented virtual machines". The "cloud" model is only useful if your staff have no idea how a database server works. If they do, it's going to make a heap of basic tasks harder (and more expensive) because you don't have access to the software itself.
I'm done discussing this with you. You can make all the same arguments everyone else does when trying to justify "the cloud", and you won't convince me, because your arguments are, as usual for this type of "discussion" comparing against the most extreme alternatives.
Right from the start you've assumed there's literally no cost to being at the complete mercy of a single vendor for your entire infrastructure (and one with a history of dirty tactics to "win" a market).
If that approach works for you, good for you. I, and my clients once it's bitten them, aren't willing to do that.
So now, the same people who don't have the expertise to manage a colo are all of a sudden going to have the expertise to manage VMs and open source alternatives, and know how to manage a fault-tolerant multi-region database and other HA setups at multiple colos?
Again, how much experience do you personally have with actually using cloud services from the big three? I’ve done both, I had to. The cloud vendors didn’t exist when I started. Heck we had a “server room” with raised floors for our “massive” 2TB SAN.
In theory (based on the cost of an instance vs the cost to get your own) AWS is grossly overpriced, and SOMEONE will eventually beat them by a very large margin for commodity services.
AWS knows this, and that is probably what's behind all its custom features.
AWS is the new IBM.
As far as IBM, if someone had chosen to get “locked-in” to IBM in the 70s, they could still buy new hardware that could run their old software unmodified. Isn’t that an argument for using AWS?
In the scale of decades, most of the time you will be performing heavy rewrites anyway, are you really trying to optimize now just in case in 20 years you might want to move to something else?
Are you using the same language and frameworks you were using 20 years ago? When I first started developing I was writing C programs on DEC VAX, Stratus VOS mainframes.
If you are just using AWS to host a bunch of EC2 instances and as an overpriced colo, sure you could find much cheaper options now.
No matter what you do, more than likely you will have to migrate. Just changing infrastructure (if all you're doing is moving to VMs on another provider, reconfiguring your network if you are using a hybrid, etc.) is going to be a heavy lift no matter how much you try to avoid lock-in. You're spending money now for an amorphous future where you may want to change vendors, instead of using your vendor of choice's features that can save you money and/or time.
I would make the same argument if we were talking about Azure or God forbid Oracle.
So you get a lot of "addict reasoning".
AWS would have to step up its open source contributions to match its users', and in fact they do the opposite: they are extreme leeches compared to Google and Microsoft.
They can probably support their product line atop this just fine. They really make their money on integrating cassandra and a bunch of other big data frameworks in one neat little package.
I don't know of any company "killed" by AWS, and I don't think that Datastax will be either.
I know at least three founders who had to shut down after AWS launched a feature at re:invent.
One went on to start another company that was acquired by Facebook in an 8 figure deal.
MCS On Demand:
Write request units: $1.45 per million write request units
Read request units: $0.29 per million read request units
Storage: $0.30 per GB-month
DynamoDB On Demand:
Write request units: $1.25 per million write request units
Read request units: $0.25 per million read request units
Storage: $0.25 per GB-month
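A quick arithmetic check of the premium implied by the per-unit figures quoted above:

```python
# Per-unit prices quoted above: MCS vs DynamoDB on-demand.
mcs = {"write_per_m": 1.45, "read_per_m": 0.29, "storage_gb_mo": 0.30}
ddb = {"write_per_m": 1.25, "read_per_m": 0.25, "storage_gb_mo": 0.25}

for item in mcs:
    premium = (mcs[item] / ddb[item] - 1) * 100
    print(f"{item}: MCS is {premium:.0f}% more expensive")
# write and read units come out 16% higher, storage 20% higher
```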
- per-cell timestamps (this is IMPORTANT for online data migration with no downtime)?
- can you choose compaction strategy?
- access to sstables for rapid data loads/custom backups?
- how will upgrades work?
- are they using rocksandra or other techniques?
- how about a scylladb option?
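On the first bullet: per-cell timestamps matter for live migration because a backfill tool can replay old writes without clobbering newer ones. A minimal last-write-wins sketch in Python (my own illustration, not how MCS actually stores cells):

```python
# Last-write-wins cell merge keyed on a per-cell timestamp (Cassandra-style).
# Replaying an old write after a newer one is a no-op, which is what lets a
# migration tool backfill historical data underneath live traffic.
def apply_write(table, key, column, value, timestamp):
    cell = table.setdefault(key, {}).get(column)
    if cell is None or timestamp > cell[1]:
        table[key][column] = (value, timestamp)

table = {}
apply_write(table, "user:1", "email", "new@example.com", timestamp=200)  # live write
apply_write(table, "user:1", "email", "old@example.com", timestamp=100)  # backfill replay
print(table["user:1"]["email"])  # ('new@example.com', 200): backfill didn't clobber it
```

Without the per-cell timestamp, the backfill and the live write stream would have to be carefully ordered, which generally means downtime.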
However, I think the real reason is not even that. I suspect that underneath it uses a modified version of DynamoDB and is just wire-compatible with Cassandra. That's how they can offer pay-as-you-go pricing and auto-scaling.
Considering the kind of resources Apache Cassandra requires to run, I don't think they could offer this kind of pricing. This is the same company that charges $0.20 per hour to run a K8s cluster.
> Considering the kind of resources Apache Cassandra requires to run
Makes no sense. There is plenty of talent at AWS to run Cassandra proper on AWS for less money than they're currently charging for Dynamo. This was true before this announcement - it's how people like Instaclustr stay in business (containerized cassandra as a service on EC2).
The new Amazon Managed Apache Cassandra Service does use the open source Apache Cassandra code. The team is excited about working with the development community on Cassandra. See more at https://aws.amazon.com/blogs/opensource/contributing-cassand...
Can we access the sstables of our tables?
Can we do triggers and UDFs?
Can we access the timestamps at a CELL LEVEL like Cassandra proper (VERY important for online/downtime-free data migration)?
Also maintainability, I am assuming. ScyllaDB is written by a team of experts in writing high-performance, low-level code, and I am not sure it is easy for others to fork it. Cassandra, being written in Java and a much more conventional application (no DPDK), might be easier to fork.
We have lots of developers that write code that's similar to how ScyllaDB is built. Lots of DPDK. There are many other practical reasons why a fork should never be the first choice when building a service that you have to sustain for customers basically forever. Especially if there's active, high velocity development going on.
Mostly that you're believing Scylla's marketing literature and paid benchmarks over what people actually use and experience. There are thousands of large Cassandra deployments and a handful of Scylla ones, and it's not because people didn't do their research. If AWS did this to ScyllaDB, though, it would probably kill the product; with Cassandra it's more of a positive for the community.
It started like gangbusters, but I get the feeling the dev money ran out because the feature rollout has completely stalled.
In some ways we implement features similarly to Cassandra, but with subtle and important differences. Example: we do both local and global secondary indexes (Cassandra only has local). Also, our CQL LIKE implementation is closer to SQL LIKE than to Cassandra's. And so on.
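For reference, SQL-style LIKE allows '%' (any run of characters) and '_' (exactly one character) anywhere in the pattern, whereas Cassandra's SASI-backed LIKE is, as I understand it, mostly limited to prefix/suffix/contains forms. A tiny Python sketch of the SQL semantics (illustrative only, not Scylla's implementation):

```python
import re

# SQL-style LIKE matcher: '%' matches any run of characters, '_' matches
# exactly one, and wildcards can appear anywhere in the pattern.
def sql_like(pattern, text):
    regex = "^" + "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    ) + "$"
    return re.match(regex, text) is not None

print(sql_like("a_c%", "abcdef"))  # True: '_' and '%' mixed mid-pattern
print(sql_like("%bc", "abc"))      # True: plain suffix match
```

A prefix/suffix/contains-only LIKE can handle the second pattern but not the first.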
btw: ScyllaDB just got $25M in new funding in September 2019. We're doing fine financially.
I don't think Google allowed check-ins of AGPL code to their repo, but maybe this has all changed.
It would also be nice if they documented any differences from Apache Cassandra, like if Amazon MCS improves secondary indexes so they can be used in more cases.
If they do have better tech for that, I also hope they can solve the materialized view problems in Cassandra 3.x.
- Feature sets aren't the same yet
- History / maturity
- Actual savings don't tend to match proposed savings
- Development isn't driven by a startup that AWS can kill with an announcement like this.
Otherwise no, use ScyllaDB instead for much better performance, automatic tuning, minimal maintenance, and better feature implementations like truly scalable global secondary indexes and materialized views.
It’s not a startup-driven product that is worried about AWS as a competitor.
MCS has a very similar pricing model to on-demand mode DynamoDB (https://aws.amazon.com/dynamodb/pricing/on-demand/) but is ~15% more expensive on all line items.
Anyway, DynamoDB is insanely expensive compared to PostgreSQL, and I'd recommend only using it when you need to expose a datastore directly to the user, as its attribute-based access control is easy to use. PostgreSQL serverless is now a thing as well. So if you truly want a WAN datastore, exposed to web clients, with attribute-based access control, run Dynamo.