Fundamentally, the problem seems to be that choosing a partitioning key that's appropriate for DynamoDB's operational properties is ... unlikely. In their own docs on choosing a partition key, they use "user ID" as an example of one with good uniformity, but in reality if you choose something like that, you're probably in for a world of pain: in many systems big users can be 7+ orders of magnitude bigger than small users, so what initially looked like a respectable partitioning key turns out to be very lopsided.
As mentioned in the article, you can then try to increase throughput, but you won't have enough control over the newly provisioned capacity to really address the problem. You can massively overprovision, but then you're paying for a lot of capacity that's sitting idle, and even then sometimes it's not enough.
Your best bet is probably to choose a partition key that's perfectly uniformly distributed (like a random ID), but at that point you're designing your product around DynamoDB rather than vice versa, and you should probably wonder why you're not looking at alternatives.
I've been happy using DynamoDB as a large-scale caching layer, but even that only fits very specific use case criteria.
Every database system, at large scale, suffers from this problem.
But what's a better option for a distributed, managed database-as-a-service? Rolling your own does mean significant operational burden.
I wonder if you could do something like add a random number to each of your keys before hashing. It would increase your storage size by Nx, but it seems like that would spread your load as well.
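If I'm reading that right, it's essentially read sharding by replication. A minimal sketch of the idea, assuming a boto3 Table whose partition key attribute is named "pk" (hypothetical schema; N is something you'd tune):

    import random

    N = 10  # replication factor: Nx storage, Nx write amplification

    def shard_keys(key: str) -> list:
        # All N physical partition keys for one logical key.
        return ["%s#%d" % (key, i) for i in range(N)]

    def write(table, key: str, item: dict):
        # Write every copy so any single copy can serve a read.
        for pk in shard_keys(key):
            table.put_item(Item=dict(item, pk=pk))

    def read(table, key: str) -> dict:
        # Each read picks a random copy, spreading read load
        # across N partitions instead of hammering one.
        pk = "%s#%d" % (key, random.randrange(N))
        return table.get_item(Key={"pk": pk})["Item"]

The catch is that every write gets N times more expensive, so this only really helps with read-heavy hot keys.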
Ultimately, we decided that DynamoDB was not really suited to our use case anyway, but I'd say that's mostly because our total data set was fairly tiny and our writes were not evenly distributed over time, which is just at odds with the capacity model.
> DynamoDB uses the partition key value as input to an internal hash function; the output from the hash function determines the partition where the item is stored.
This is somewhat at odds with their top-level messaging, which still pushes DynamoDB as the most scalable solution. And perhaps it is... there are some scalability limits to Aurora: writes are bottlenecked by one instance, there's a 64TB max, and I think performance drops when you exceed the in-memory cache. But those limits are still quite large.
Basically I sense some tension between the DynamoDB and Aurora teams and I wonder where this is all going to shake out in the long run.
Here's the full quote (I transcribed it so may contain errors):
"The one thing that surprised me is that there are some customers who are moving their nosql workload to aurora. There are two reasons for that. One, it’s a lot easier to use Aurora because it’s mysql compatible compared to nosql because the interfaces and transaction characteristics are so different. What is also interesting is people also saved money because the IO cost is much lower. In no SQL if you have a large table it gets partitioned then the IO gets partitioned across all the table partitions that you have. And if you have one partition that is hot then you have to provision based on the IO requirement of the hot partition. In the case of Aurora we do automatic heat management so we don’t have this hot parition issue. Second we don’t charge based on provisioned IO. It’s only the IO that you use. And that actually saves a lot of money. In this particular case this is a big social company, interaction company, I cannot tell the name, and they reduced their operational costs by 40% by moving from nosql to Aurora" 
Amazon has deep dive talks on DynamoDB on YouTube that go into lots of these details and how to avoid problems from them. It's not that different from understanding how to structure data for Cassandra, Mongo, etc. All the NoSQL systems require an understanding of how they work both to structure your data and ensure optimal performance.
For example, maybe one's consistency constraints are better met by DynamoDB instead of BigTable's varying consistency on different query types (the author of this article didn't address consistency at all). With DynamoDB, you can get strongly consistent reads and queries all the time.
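For reference, strong consistency in DynamoDB is a per-request flag; a minimal boto3 sketch (the table and key names are made up):

    import boto3

    table = boto3.resource("dynamodb").Table("my-table")  # hypothetical

    # ConsistentRead=True forces a strongly consistent read; the default
    # is eventually consistent, which costs half as many RCUs.
    resp = table.get_item(Key={"pk": "user-123"}, ConsistentRead=True)
    item = resp.get("Item")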
Overall, it seems like a kind of weak reason to make the strong statement about "probably shouldn't be using DynamoDB". Maybe a better title would be "Understanding drawbacks of large datasets in DynamoDB". I do hope the author understands the consistency changes they may experience in BigTable, as it could easily require large changes to the code-base if strong consistency was assumed on all queries.
Edit: Fixed inconsistency in what the 10GB limit was referring to, not per table, but per node (shard) of the table.
It appears you're confusing Bigtable with Datastore (to be fair, Datastore is built on Megastore, which is built on Bigtable, so it's an understandable confusion), but let's be clear: Google Cloud Bigtable != Google Cloud Datastore.
The URL you cited about consistency models is entirely about Datastore, but you referred to it as Bigtable.
I can understand having to optimize your key space, but in this case it amounts to extreme premature optimization.
Edit: Sorry, I did mention 10GB originally in relation to a table. That was incorrect of course.
It seems like poor key space design.
What? Why? Suppose that's 5 million customers: you will only have a 10GB table, which fits in a single DynamoDB shard, with no sharding at all. With the restriction of 1-5 operations per customer per second, this sounds like the ideal use case for DynamoDB.
What am I missing?
It's a problem with how Amazon calculates your capacity units as it scales up Dynamo for you, if you have a particularly large data set with a few very active users.
I don't buy it. I've used BigTable in the past and found it to be infuriating. Now, because it works for him, I'm supposed to believe that BigTable is right for me?
If we chat and decide that Bigtable is appropriate for your use case, maybe you'd be willing to give it another shot? :-)
Any DynamoDB tuning advice will say how important it is to have well distributed hash keys. As for the second part, why not use Cloud Spanner? I wish AWS had something like it.
Maybe I'm missing something, but as I understand it he did have well distributed hash keys (I'm assuming his customer IDs were random UUIDs). The problem he had, however, was that he had so much data that a single customer could cause throughput exceptions at a very small number of ops/s.
If some customers are more active than others in a given unit time, some customers have more data than others, and customers' data is keyed by customer ID then no, it's not a well distributed key.
- article that outlines gotchas of Bigtable
- article: you probably shouldn't use Bigtable
- article: the amount of money we could've saved using PG and not rewriting things 3 times
For anything up to 10-15TB there are very few reasons not to use something like PG.
Whenever I read about NoSQL systems, I'm always left a little unsure about their use-cases. I've only worked on systems where a traditional RDBMS made the most sense. How do you identify when it's appropriate to reach for one of the many NoSQL tools?
I've had the suspicion that many applications that leverage NoSQL tools are usually used in conjunction with a relational database, and not in isolation. Based on my limited understanding, I can at least wrap my head around a few ways in which this could probably help. Am I off the mark here? One of the points I struggle with is that once you're storing data across multiple data stores, maintaining data integrity becomes much harder.
Doing that with an RDBMS would be ... unpleasant. Doing it with Cassandra isn't trivial, but it's straightforward, cross-cloud capable, cross-DC HA, linear scaling, and tunable consistency.
Data integrity is a non-issue - when you're dealing with the scale NoSQL was meant to solve, you're probably smart enough to learn how to write to multiple places safely.
Despite the flag on the post, this is a pretty good summary of why you might choose one of several: https://news.ycombinator.com/item?id=14697230
DynamoDB for us has given us very reliable latency for our datasets that are too large to hold in memory on an RDBMS.
Additionally, self-managed PG is a huge operational undertaking compared to DynamoDB. RDS is a bit closer, but the largest instance RDS offers is db.r3.8xlarge, which has 244GB of RAM.
The db.r3.8xlarge on RDS Postgres all in with 3000 provisioned IOPS (just provisioning for writes, all reads should be served from memory) and 1.5TB storage ends up around $6k. A 3 year reservation will get that down to $3k.
Comparable DynamoDB (3000 write units, 3000 read units, reserved IOPS) comes out to $1500.
The key here is consistent performance and low operational overhead. As I said, we're pretty happy with it.
Note those aren't really comparable numbers - postgres will often collapse writes from concurrent sessions / statements.
> You can get a box with 48TB RAM
Not with AWS. If you don't have AWS as a requirement, go for it.
An i3.16xlarge still won't hold a 20TB dataset in memory. You are going to see more variance in latency if you have a working dataset too big to fit into memory.
As far as getting superior performance out of an i3.16xlarge, that's fine if you have the expertise and resources to run PG yourself. However you're going to need replication, which will increase that cost. You're going to need failover mechanisms, backup, etc.
We have preferred RDS for Postgres because it gives us something operationally simple. We've found DynamoDB to be even simpler operationally and have more predictable performance. We have frequently considered running a self managed PG instance and have decided against it for our use case.
When presenting an alternative to DynamoDB, AWS is an implied part of presenting that alternative. While the original article does suggest an alternative outside of AWS, it properly qualifies it as a competitor offering. Regardless, I don't care what was and wasn't claimed. It's not important.
> You are seriously claiming DynamoDB will be holding full 20TB in RAM
No. I'm claiming that DynamoDB will have predictable and consistent latency characteristics at 20TB. The related claim is that an RDBMS will not have consistent latency characteristics if the working dataset does not fit in memory.
It seems that a lot of the qualms with various databases stem from a misunderstanding of their use cases. A lot of the features of major SQL databases, namely ACID, are misattributed to databases in general. This key misunderstanding seems to drive a lot of SWEs insane as they later realize that NoSQL DBs are not always Available and Strongly Consistent.
Responding to Author:
I wasn't totally convinced by the author's argument against DynamoDB. This article offers a good solution to pretty much all of OP's problems, most significantly working the date into the hash key for user data.
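If I understand the suggested fix, it's the usual trick of folding the date into the partition key, so one user's traffic spreads over many partitions instead of one. A sketch under assumed key names:

    from datetime import date, timedelta

    def day_key(user_id: str, d: date) -> str:
        # One partition per user per day: today's hot traffic lands
        # on a different partition than yesterday's.
        return "%s#%s" % (user_id, d.isoformat())

    def keys_for_range(user_id: str, start: date, days: int) -> list:
        # Reads over a time window fan out across the per-day partitions.
        return [day_key(user_id, start + timedelta(days=i))
                for i in range(days)]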
While DynamoDB is certainly different from most other databases, that doesn't mean there aren't sensible ways to use it.
It's not a good choice for various read-heavy enterprise apps (and frankly not for write-heavy ones either). It also doesn't scale - it doesn't even have cross-region active/active support (highly surprising for a key-value store, when Cassandra has supported it for ages).
Don't use it for anything serious.
On the topic of DynamoDB use cases, here are some DynamoDB users describing them:
- DataXu: "How DataXu scaled its System to handle billions of events with DynamoDB" -- https://youtu.be/lDiI0JMf_yQ?t=765 (from AWS re:Invent 2016)
- Lyft: “Lyft Easily Scales Up its Ride Location Tracking System with Amazon DynamoDB” -- https://www.youtube.com/watch?v=WlTbaPXj-jc. The story "AWS Helps Fuel Lyft's Expansion" in InformationWeek mentions Lyft's use of DynamoDB as well.
- Under Armour implemented cross-region replication using DynamoDB: https://youtu.be/NtaTC2Fq7Wo?t=699
- Amazon.com: Amazon Marketplace shares their story of migration to DynamoDB and why they did it: https://www.youtube.com/watch?v=gllNauRR8GM
Finally, DynamoDB was created because Amazon needed a highly reliable and scalable key/value database. As Werner Vogels put it in his blog post announcing DynamoDB back in 2012 (http://www.allthingsdistributed.com/2012/01/amazon-dynamodb....):
"This non-relational, or NoSQL, database was targeted at use cases that were core to the Amazon ecommerce operation, such as the shopping cart and session service."
That Data Pipeline + EMR solution mentioned in the blog post (here is a better link for it: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGui...) has several drawbacks:
- too many moving parts, especially given the track record of EMR
- might not even be available when your requirement is to keep the data in the same AWS region as the DynamoDB table, as only five regions support Data Pipeline
The best approach I've seen so far is to use DynamoDB Streams and an AWS Lambda function to create incremental backups in a versioned S3 bucket. dynamodb-replicator (https://github.com/mapbox/dynamodb-replicator) implements that, together with some scripts for management tasks like backfilling an S3 bucket with data that's already in DynamoDB or joining incremental backups into a single file.
It's still pretty unpolished and definitely needs some love, but I think it's the right approach.
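The core of the Streams-to-S3 idea fits in a small Lambda handler. This is a minimal sketch, not what dynamodb-replicator actually does line-for-line; the bucket name and key layout are invented, and it assumes the stream is configured with NEW_IMAGE:

    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-dynamodb-backups"  # hypothetical versioned bucket

    def handler(event, context):
        # Each record is an INSERT/MODIFY/REMOVE from the table's stream.
        for record in event["Records"]:
            keys = record["dynamodb"]["Keys"]
            # One S3 object per item; bucket versioning keeps the
            # history, which is what makes the backup incremental.
            object_key = "backup/" + json.dumps(keys, sort_keys=True)
            if record["eventName"] == "REMOVE":
                s3.delete_object(Bucket=BUCKET, Key=object_key)
            else:
                image = record["dynamodb"]["NewImage"]
                s3.put_object(Bucket=BUCKET, Key=object_key,
                              Body=json.dumps(image).encode())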
I just don't understand people who complain about a technology and say it is useless for all of these use cases when they couldn't even spend a few hours to do a Spike/POC or some basic data domain design. MongoDB has very clear documentation about what you should or shouldn't use it for.
MongoDB is one of the few document stores available today. So if you have use cases such as a 360° customer view, or where you need to fetch a lot of nested information with a single ID, it is blisteringly fast.
If you have a relational data model, then use a relational database.
Because MongoDB's marketing sold it as the hot new datastore that made SQL legacy - "newer! faster! web scale!" not "only use this if your data isn't relational".
The reference documentation might be more accurate but the way it was publicised certainly wasn't.
Although I dislike MySQL for its many gotchas (data corruption level stuff too!), I was looking for a _long_ _long_ time for a high-consistency NoSQL database... we basically need document storage of large binary data.
Ironically, literally nothing in NoSQL land does write-through to disk; they just write to the VFS and hope it works. Additionally, those that support clustering opt for eventual consistency. That just baffles my mind: so much potential for lost or corrupted data.
We ended up doing deterministic sharding on postgresql, it worked incredibly well, even in failure modes you hope never to see.. and no corruptions! :D
That has always been the case. Why would it be so shocking to you? Atomic writes have already been built. Sounds like you were standing there with a hammer looking for a nail.
That's how you do things like we did regarding deterministic sharding - it's very easy to scale this way if you don't ever "JOIN" tables, etc.
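For anyone wondering what "deterministic sharding" on PostgreSQL looks like, the core is just a stable hash from key to connection. A generic sketch (DSNs invented), not the actual setup described above:

    import hashlib
    import psycopg2

    SHARD_DSNS = [  # hypothetical: one DSN per Postgres shard
        "dbname=app_shard0 host=pg0",
        "dbname=app_shard1 host=pg1",
        "dbname=app_shard2 host=pg2",
    ]

    def shard_for(key: str) -> str:
        # Stable hash: the same key always lands on the same shard.
        # (Don't use hash(); it's randomized per process in Python 3.)
        h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
        return SHARD_DSNS[h % len(SHARD_DSNS)]

    def fetch_user(user_id: str):
        with psycopg2.connect(shard_for(user_id)) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
                return cur.fetchone()

Resharding is the hard part: changing len(SHARD_DSNS) remaps keys, which is why people reach for consistent hashing or a fixed number of virtual buckets.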
Technically PostgreSQL is a NoSQL database, and it, along with Cassandra for example, has modes in which it is fully and atomically consistent.
HBase, Cassandra, MongoDB, Riak, Couchbase etc all write through to disk with proper fsync flushes. And I've never heard of any database that has a model where it writes to a virtual file system - whatever that even means.
Please provide some specific examples.
Cassandra: fsyncs only the commit log (WAL), and by default only periodically rather than on every write. https://wiki.apache.org/cassandra/Durability
MongoDB: ... too much wrong here to list, although I hear it's improving in being cluster aware etc.
Redis does support fsync as far as I remember, but the write/delete pattern is incredibly sub-optimal; it basically runs out of a WAL by itself and performs very poorly if your dataset does not fit in memory.
0. Here are more details on that: http://hadoop-hbase.blogspot.de/2013/07/protected-hbase-agai...
1. By default HBase "flushes" the WAL. Flush here means making sure that at least 3 machines have the change in memory (NOT on disk). A datacenter power outage can lose data.
2. As HDFS closes a block it is not by default forced to disk. So as HBase rewrites old data during compactions by default, old data can be lost during a power outage. Again, by default.
3. HDFS should be configured with sync-on-close, so that old data is forced to disk upon compactions (and sync-behind-writes for performance)
4. HBase now has an option to force a WAL edit (and all previous edits) to disk (that's what I added in said jira).
5. This post is 4 years old, for chrissake :)... Don't base decisions on 4-year-old information.
HBase _is_ a database and it will keep your data safe. Unfortunately it requires some configuration and some knowledge.
Also, their consistency model is hidden and they've lost data before... so those are not selling points.
Dynamo does not scale for this specific use case but I have used it successfully in production (at scale) with ZERO issues.
Dynamo is a sharded, zero-operations* key-value database that most applications and companies will benefit from, IMHO.
Are the hot-key issue and evenly spreading queries across nodes the only grounds on which you concluded we should not use DynamoDB?
Don't worry. You don't. And there will be many good reasons to refactor the architecture before you do.
We store user generated content in DynamoDB. Our largest table is over 1 TB. The mapping of users to their generated content is in Postgres. So doing work on behalf of a particular user will generally be distributed across multiple nodes.
We've enjoyed the very predictable performance of DynamoDB as well as the operational simplicity. We've started moving smaller datasets over and are using it more and more.
S3 does not do well (cost wise) with billions of files. Provisioning 1000 Write IOPS on Dynamo costs around $100/month. That's 2.5B writes. On S3 that's going to be on the order of $10k. Similarly 1000 Read IOPS on Dynamo is ~$20/month vs 2.5B reads on s3 costing $1000. Storage is cheaper on S3 by about 10x, but per TB that comes out to $250 vs $25, which is hardly a dominating cost factor.
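The back-of-envelope arithmetic behind those numbers, using 2017-era us-east-1 S3 request prices ($0.005 per 1,000 PUTs, $0.0004 per 1,000 GETs):

    SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.6M

    ops_per_month = 1000 * SECONDS_PER_MONTH      # 1000 IOPS sustained
    print("%.1fB ops/month" % (ops_per_month / 1e9))   # ~2.6B

    s3_put_cost = ops_per_month / 1000 * 0.005    # ~$13,000: "order of $10k"
    s3_get_cost = ops_per_month / 1000 * 0.0004   # ~$1,000
    print("S3 writes: $%.0f, reads: $%.0f" % (s3_put_cost, s3_get_cost))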
I used SimpleDB once as part of a project. All I needed it for was to store a small amount of state to be shared between some otherwise stateless EC2 instances. So, kind of the perfect use case... Except I don't think it is made to scale down that small.
I don't know if I provisioned the capacity wrong or what, but it was so flaky. All I needed it to do was a few reads every few minutes.
I think my issue was that I had the provisioned throughput set to the smallest value I could, because I only needed to pull down ~10 rows every 60 seconds or so... But when the app would start and try to sync up, SimpleDB would throttle it?
I ended up having to just write a cron job to sync the db to a local json file, because doing the reads from inside my main loop would randomly fail or timeout.
Here are a few things that ended up being show stoppers.
1. Both the partition key and the sort key are capped at 1 field. In an attempt to "think Cassandra data model", the ugly workaround was to stringify and concatenate things at the application layer, then parse / split on the other side (see the sketch after this list). This made the code unreadable.
2. DynamoDB-Spark integration is a second-class citizen. (Cassandra-Spark integration is first-class and well-maintained.)
3. The other thing that made code unreadable was the accidental complexity introduced by exception handling / exponential backoff we needed to implement to protect against accidental read capacity underprovisioning.
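To illustrate point 1, the concatenation workaround looks something like this (a generic sketch with invented field names, not our actual code):

    SEP = "#"  # must never appear inside a field value

    def pack(*fields):
        # Fold several logical fields into DynamoDB's single sort key.
        assert not any(SEP in f for f in fields)
        return SEP.join(fields)

    def unpack(sort_key):
        return sort_key.split(SEP)

    # e.g. a logical sort key of (event_date, event_type, event_id):
    sk = pack("2017-06-01", "click", "e42")  # -> "2017-06-01#click#e42"
    event_date, event_type, event_id = unpack(sk)

Multiply that by every read and write path and it's easy to see how the code gets unreadable.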
Although I made repeated pleas to switch to Cassandra, the (non-technical) CEO insisted that we keep using DynamoDB. I'm no longer at that company, but I hear they have since switched to Redshift.
They're always hiring. You should apply and show them the error of their ways.
There are no hidden exponential backoffs in Cassandra in any of the hot query paths, period.
Suggesting that using nosql as a DB is broken is so ludicrous I'm not even going to try to refute it - you simply don't know what you're talking about.
I don't think we're getting the whole story from the author. I'm not the biggest fan of Dynamo either for reasons I won't get into here, but this type of workload is exactly what Dynamo was built for: serving a website with millions of customers.
Disclaimer: I work at Amazon, my views are my own.
Such an outage can sometimes be mitigated when operations are internal to a company; but with DynamoDB, one has to build internal failover or implement a complex cross-region replication scheme. The complexity and reliability starts to reduce the attractiveness of the hosted service in general.
(Of course, cloud vs. bare metal is a long-running debate, but I think I take the position that multitenant clouds are generally cheap enough that it's worth the cost for the extra feature velocity, if you're not truly at scale.)
Ended up switching to RethinkDB and haven't looked back - far better query syntax/language (ReQL), and we can organise the JSON content just how we want it.
Cheaper active-active-active options exist that don't require manual DR failover drills and manual failback when regions inevitably crash.
You're not describing NoSQL. You're describing something like MongoDB specifically. You're right about the specific case, wrong in the general case.
> There is almost no case where an explicit distributed caching or queuing solution backed by a traditional DB isn't strictly better.
If you're looking at big enough data sets (think "Walmart scale", not "web scale") then you have to design your data store to cope with this volume. First, it has to be replicated to multiple servers to serve more queries and for reliability, and an eventually consistent system is a consequence (basically any time you use your read replica, or do anything on a multimaster configuration). You'll need something shardable or multimaster to achieve horizontal scalability, and you'll probably also lose your foreign key enforcement and uniqueness constraints when you do this, so you might as well go the distance and fully denormalise everything for consistent access times. (Have you ever had the Postgres query planner suddenly make something take 10x longer than it usually does, because your data patterns changed and it decided to use a different query plan than it's ever used before? Yeah, that's a world of fun. Now make that thing a time-critical daily batch job and really squirm.)
When you do the things to make this work, you're basically restricting yourself, abandoning database features. As a consequence, you are going to be in for a world of pain, period, sorry. One key advantage of good NoSQL software is that it doesn't give you those features to begin with, so it's harder to get it wrong. It also has data structures designed to make this sort of thing work well and be performant in ways that general-purpose databases, precisely because they are general purpose, won't be.
> A single partition can hold approximately 10 GB of data, and can support a maximum of 3,000 read capacity units or 1,000 write capacity units.
DynamoDB will also split your data if you provision more than 3,000 reads or 1,000 writes. And the caveat is that it will not merge the shards back if you later reduce the throughput. Instead, each shard will just get even less throughput than you might expect.
So say you have 4,000 write capacity and 0 reads (for simplicity). DynamoDB will allocate 4 shards for it, each getting 1,000 writes. Now say shard 2 gets too big and goes above 10GB. DynamoDB will split it in two. Now you have 5 shards, but they won't each get 4,000 / 5 = 800 writes. Instead, the original shards 1, 3, and 4 will each still have 1,000, and shards 2a and 2b will have 500 each. That's because when DynamoDB splits a shard, it redistributes the throughput of the parent to its children.
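A toy model of that arithmetic, just encoding the split behavior described above (a size split halves the parent's capacity between its two children):

    def split(shards, parent):
        # Size split: each child gets half the parent's write capacity.
        out = {k: v for k, v in shards.items() if k != parent}
        out[parent + "a"] = shards[parent] / 2
        out[parent + "b"] = shards[parent] / 2
        return out

    shards = {"1": 1000, "2": 1000, "3": 1000, "4": 1000}  # 4,000 WCU
    shards = split(shards, "2")  # shard 2 exceeds 10GB
    print(shards)
    # {'1': 1000, '3': 1000, '4': 1000, '2a': 500.0, '2b': 500.0}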
Looks like you're entirely correct: http://docs.aws.amazon.com/amazondynamodb/latest/developergu...
I wonder how many times I scrolled past the explanation without internalizing it.
If I understand the author's point, you _can_ get great performance, but it requires overprovisioning. So it's not so much a technical failing as it is poor monetisation of the capability.
I see that elsewhere in AWS: They require huge instance types for large EBS volumes with ES. For our workload, said instances barely break a sweat.
We don't use the auto-incremented user IDs either. We create a hash of the user ID and some of the other data contained in the alert, and it would appear we still have hot partitions.
DynamoDB is definitely not the silver bullet people hope it is but does work exceptionally well for what it was designed for.
1. Every shardable database (Cassandra, Dynamo, BigTable) has to worry about hot spots. Picking a UUID as a partition key is only step one. What happens if one user is the huge majority of your traffic? All of their reads/writes go to a single partition, and of course you are going to suffer performance issues from that hot spot. It becomes important to further break down your partition into synthetic shards or break up your data by time (only keep a day of data per shard). BigTable does not innately solve this; it may deal better with a large partition, but it will inevitably become a problem.
2. Some people are criticizing the choice of NoSQL citing the data size. Note you can have a small data size but huge write traffic. An unsharded RDBMS will not scale well to this, since you cannot distribute the writes across multiple nodes. Don't assume that just because someone has a small data set they don't need NoSQL to deal with their volume.
Yeah, but the issue with DynamoDB seems to be bursts of access triggering "throughput exceptions", caused by a very static bandwidth allocation which goes down with the number of shards, plus not-so-graceful handling of overload situations.
It is, imho, an anti-pattern to split up the bandwidth like they do. It negates the multiplexing gain for no good reason other than a rigid control model.
The /s is, I hope, self-evident.
Seriously though, for my purposes the main use I get from the typical database article linked from HN is as a reverse index to the better discussions, long after the discussion is off the front page here.
I'm merely a little confused that the widespread consternation about the quality of geek journalism for programmers amounts to not much more than an occasional mention or moan.
Is there genuinely a good dose of cynicism in force that is invisible to me?
I understand that when I see a story about science intended for a general audience, then HN is likely to become home to much greater detail and depth in discussion.
But with a subject line arguing a generalised conclusion from the subject matter of database architecture?
Sometimes I think that I'm confused about whether I'm supposed to be confused about the point of the TFA.