In a world where Opex is much higher than Capex, DynamoDB might make sense, but for me server costs are 5% of dev costs. And even if it works from a cost perspective, how many AWS services have their console experience ruined by DynamoDB? The UI tricks you into thinking it's a data table with sortable columns, but no! DynamoDB limitations strike again and you are off on a journey of endless paging. The cost savings come at the expense of the user.
DynamoDB also isn't fast. 20ms for a query isn't fast, and 30ms for an insert isn't fast. Yes, it's amazingly consistent and faster than other systems holding 500 TB, but that isn't a use case many users have.
- It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.
- You can store a maximum of 400 KB of data in one row.
- You can get a maximum of 1 MB of data back in a single query; anything larger has to be paged (see the sketch below).
So it's mostly good for high-data-throughput applications, and then only if your high data throughput consists of large numbers of small records, processed a few at a time. This surely describes an important class of workloads. You may suffer if your workload isn't in this class.
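To make that 1 MB limit concrete, here's a minimal pagination sketch with boto3 (the table and key names are made up):

```python
import boto3

client = boto3.client("dynamodb")

# Query responses are capped at 1 MB; DynamoDB links the pages together
# with LastEvaluatedKey, and the paginator follows that chain for you.
paginator = client.get_paginator("query")
pages = paginator.paginate(
    TableName="Orders",  # hypothetical table
    KeyConditionExpression="pk = :pk",
    ExpressionAttributeValues={":pk": {"S": "USER#123"}},
)
for page in pages:
    for item in page["Items"]:
        print(item)  # process one item at a time
```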
Another annoyance: in my experience, one of the most common errors you will encounter is ProvisionedThroughputExceededException, when your workload changes faster than auto-scaling can react. Until last year you couldn't test this scenario offline with DynamoDB Local, because DynamoDB Local didn't implement capacity limits.
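If you're on boto3, one mitigation is the SDK's built-in retry configuration; a small sketch (the retry count is illustrative):

```python
import boto3
from botocore.config import Config

# "adaptive" retry mode backs off and client-side rate-limits after
# throttling errors, which softens ProvisionedThroughputExceededException
# spikes while auto-scaling catches up. It won't save a sustained overload.
dynamodb = boto3.resource(
    "dynamodb",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
```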
That is _infuriating_
It's documented, but it is so surprising when you first hit it. Sometimes empty values have semantics attached to them; I don't want to scrub them out.
The rich data types in Dynamo are quite strange; since they're basically useless for querying, I'm not sure why you would use them. Maybe I'm missing something...
So I can have a key and an associated map and update a few members of the map?
I would like to learn how to do that.
(Disclosure: I work for AWS on DynamoDB and on this)
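For anyone curious, here's roughly what a nested-map update looks like with boto3's UpdateExpression support; the table and attribute names here are invented:

```python
import boto3

table = boto3.resource("dynamodb").Table("Things")  # hypothetical table

# Overwrite two members of the "settings" map in place;
# every other key in the map is left untouched.
table.update_item(
    Key={"pk": "item#1"},
    UpdateExpression="SET settings.theme = :t, settings.fontSize = :s",
    ExpressionAttributeValues={":t": "dark", ":s": 14},
)
```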
It's not just that it is put in the database category, but that its champions at AWS make statements like "if you are utilising RDBMS you are living in the past", or that "there are very few use cases to choose Postgres over DynamoDB".
Btw, loved your AWS book!
It's definitely a database. The modeling principles are different, and you won't get some of the niceties you get with an RDBMS, but it still allows for flexible querying and more.
S3 is not a database, but DynamoDB is :).
What differentiates DDB and S3 the most is cost and performance.
They're both highly-durable primitive data structures in the cloud, with a few extra features attached.
For example, consistency is not "roughly the same": DynamoDB supports strongly consistent reads and atomic update operations.
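A quick boto3 sketch of both, with made-up table and key names:

```python
import boto3

table = boto3.resource("dynamodb").Table("Counters")  # hypothetical table

# Strongly consistent read: reflects all writes acknowledged before it,
# rather than the possibly stale view of a replica.
resp = table.get_item(Key={"pk": "page#home"}, ConsistentRead=True)

# Atomic update: the increment happens server-side, so there is no
# read-modify-write race between concurrent writers.
table.update_item(
    Key={"pk": "page#home"},
    UpdateExpression="ADD viewCount :one",
    ExpressionAttributeValues={":one": 1},
)
```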
I was one of the top users by volume of both products when I worked at AWS.
Did you mean to say "as an RDBMS"? Because I don't see how it's not a DBMS.
IMO, there are two times you should absolutely default to DynamoDB:
- Very high scale workloads, due to its scaling characteristics
- Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.
You can use DynamoDB for almost all OLTP workloads, but outside of those two categories, I won't fault you for choosing an RDBMS.
Agree that DynamoDB isn't _blazing_ fast. It's more that it's extremely consistent. You're going to get ~10 millisecond response times whether you have 1 GB of data or 10 TB, and that's pretty attractive.
If you can use Aurora Serverless, the Data API makes sense for Lambda.
The RDS Proxy is _hopefully_ a better option in this regard but still early.
By the time you pay a fixed cost for the proxy on top of what you already pay for the RDS server, it'd be a far simpler architecture with fewer moving parts to just run a Fargate container (or, better yet, for AWS to offer a Google Cloud Run competitor).
The Data API, while still rough around the edges, at least keeps the solution more "serverless-y". Over time it should get easier to work with as tooling improves. At the very least, it won't be more difficult to work with than DynamoDB initially was with its different paradigm.
For services that truly require consistently low latency, lambda shouldn't be used anyway, so the added latency of the data api shouldn't be a big deal IMO.
For those reasons, I view the RDS Proxy as an ugly stopgap that enables poor architecture, whereas the Data API actually enables something new, and potentially better. So I'd much rather AWS double down on it and quickly add some improvements.
We deploy our APIs to Fargate for low, predictable latency for our customers, and to Lambda, which handles scaling up like crazy and scaling down to zero, for internal use where latency isn't a concern.
Our pipeline deploys to both.
As far as being "locked into lambda", that's not a concern. With API Gateway "proxy integration" you just add three or four lines to your Node/Express, C#/WebAPI, or Python/Flask app and you can deploy your code as-is to Lambda. It's just a separate entry point.
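For the Python/Flask case, the shim can look something like this, assuming the third-party aws-wsgi package (Express and WebAPI have equivalent adapters):

```python
import awsgi
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/hello")
def hello():
    return jsonify(message="hello")

def lambda_handler(event, context):
    # Translate the API Gateway proxy event into a WSGI request and back.
    return awsgi.response(app, event, context)
```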
There is also a threshold between “A group of developers will be developing this API and type safety would be nice so let’s use C#” and “I can write and debug this entire 20-50 line thing in the web console in Python, configure it using the GUI, and export a SAM CloudFormation template for our deployment pipeline.”
A lot of people want to use lambda (or serverless) even so. So AWS is just accommodating their wishes.
This is only true for AWS. Azure Functions share resources and don't have this issue.
The speed is actually quite sad. It's 5-10x slower than my other databases at p95, and I can't throw money at the problem on the write side. For reads I can use DAX, but then there goes consistency.
I've never found the speed an issue, but YMMV. To me, the best thing is that you won't see speed degradation as you scale. With a relational database, your joins will get slower and slower as the size of your database grows. With DynamoDB, it's basically the same at 1GB as it is at 10TB.
If you're approaching that point, you're already going to need an analytics pipeline, a search DB, etc., because maintaining ever-growing indices will kill your latency. You can probably get away with aggregations for a bit longer, but if the number of rows you aggregate is growing too, eventually you will need to come up with something, and the way you do that with Dynamo, off a stream, isn't a bad way to go about it with MySQL either.
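As a sketch of that stream-driven aggregation pattern (the table layout and attribute names here are assumptions, not anyone's production schema):

```python
from decimal import Decimal
import boto3

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

def handler(event, context):
    """Hypothetical Lambda on the table's stream: fold each new order
    into a per-day aggregate item instead of re-scanning ever-growing rows."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]  # DynamoDB-typed JSON
        day = image["orderDate"]["S"][:10]      # e.g. "2020-05-01"
        amount = Decimal(image["amount"]["N"])
        # ADD is atomic, so concurrent stream shards can't clobber each other.
        table.update_item(
            Key={"pk": f"AGG#{day}", "sk": "TOTAL"},
            UpdateExpression="ADD totalAmount :amt, orderCount :one",
            ExpressionAttributeValues={":amt": amount, ":one": 1},
        )
```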
Looking at the tables I have access to, they all come in under 5ms for both reads and writes. This is the same ballpark as our MySQL apps for similar-style queries (i.e. not aggregations).
Sadly my favorite reason to use Dynamo is political, not technical. Since it somehow is not classified as a database at my company, the DBAs don't 'own' it. So I don't have to wait 2-3 months for them to manually configure something.
Conway's law strikes again.
RDS goes up to 4 TB on the X1e instance type. But the point is that RDBMS systems handle a large amount of data and many workload types before you need to reach for specialist systems.
I don't know how you are doing write transactions in 5ms on DynamoDB. Single puts at p50, maybe, but I've never seen p90 put operations below 10ms.
What's the price of that on the cloud? I know I can run crazy-big tables on DynamoDB for a couple of dollars. I don't know what one month of a relational database with 2 TB of RAM costs on the cloud, but I'm pretty sure I can't afford it.
NoSQL modeling is waaay different than relational modeling. I think a lot of NoSQL advice out there is pretty bad, which results in people dismissing the technology altogether. I've been working with DynamoDB for a few years now, and there's no way I'll go back.
The book has been available for about a month now, and I've been pretty happy with the reception. Strong support from Rick Houlihan (AWS DynamoDB wizard) and a lot of other folks at AWS.
You can get a free preview by signing up at the landing page. If you buy and don't like it, there's a full money-back guarantee with no questions asked. Also, if you're having income problems due to COVID, hit me up and we'll make something work :)
Anyhow, hit me up with questions!
EDIT: Added a coupon code for folks hearing about the book here. Use the code "HACKERNEWS" to save $20 on Basic, $30 on Plus, or $50 on Premium. :)
You also finally have a way of identifying hot keys with the terribly named CloudWatch Contributor Insights for DynamoDB. 
For exceptional use cases, you also have the option of On-Demand Capacity to pay for what you use and not worry about capacity at all. 
Basically, most of these issues are gone. As long as you don't have extreme skew in your partition keys, you don't need to worry about throughput limits.
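For reference, opting into On-Demand is a one-line choice at table creation; a boto3 sketch with a made-up table name:

```python
import boto3

client = boto3.client("dynamodb")

# On-Demand (PAY_PER_REQUEST): no read/write capacity to provision or
# auto-scale; you're billed per request instead.
client.create_table(
    TableName="Events",  # hypothetical table
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```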
What was your approach to self-publishing here? What tools did you use? If I wanted to publish a book but knew nothing about it, what resources should I read and what approach would you recommend?
The biggest advice I can give you is not about any specific tool, it's about an approach. You need to think about how you will market the book if you're self-publishing.
Engage with the community that will be interested in the book. Write articles, help out on Twitter, write code libraries, etc.
For me, I wrote DynamoDBGuide.com two and a half years ago over Christmas break. I wanted to just make an easier introduction to DynamoDB after I watched Rick Houlihan's talk at re:Invent (which is awesome).
That led to other opportunities and to me being seen as an 'expert' (even when I wasn't!). I got more questions and spent more time on DynamoDB to the point where I started to know more. I gave a few talks, etc.
I finally decided to do a book and set up a landing page and mailing list. I basically followed the playbook that Adam Wathan described for his first book launch. Write in public, release sample chapters, engage with people, etc.
In terms of tooling, I used AsciiDoc to generate the book and Gumroad to sell. On a 1-10 scale, I'd give AsciiDoc a 5 and Gumroad an 8. But the tooling barely matters -- think about how to find the people that are interested :)
Happy to answer any other questions, either in public or via email.
 - https://adamwathan.me/the-book-launch-that-let-me-quit-my-jo...
err.. back to what?
The JS landscape for Dynamo is a bit bare; notable options all largely ignore the indexing principles that are the real draw of Dynamo. This heartburn caused me to sit down and write a library myself (https://github.com/tywalch/electrodb) that allows you to focus on the models and relationships while taking care of all the little pitfalls and "hacky" tricks inherent in single-table design.
Alex’s book covers all these things and I honestly wish I had had it sooner before having to learn via foot shooting. It’s pricey but if you have a need for Dynamo on your project it really pays off knowing you’re swimming with the current, and Alex definitely gets you there.
It is just stunning how much better it is learning Dynamo/NoSQL in general from this than effectively any other source. Anyone who's had to rely on AWS docs knows how face-meltingly dense they can be.
I went back and refactored all my previous Dynamo work last night, and the difference was night and day. I'm planning to migrate some relational structures later this week, as well.
It's a good book.
Edit: nevermind, I see another review elsewhere and the author replying. Though, your opinion would still be appreciated! :)
You can get many of the benefits of Dynamo (sans auto-sharding) by applying its elegant indexing strategy to an SQL database. It will be as fast or faster, your transactions can be as big as you need them to be, and you retain the ability to occasionally fire off un-indexed ad hoc queries for development or convenience. Running and scaling an SQL DB is also fairly painless these days with options like Aurora.
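A toy illustration of that pk/sk pattern on SQL, using sqlite3 just to keep it self-contained (the table and key values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        pk   TEXT NOT NULL,  -- partition key, e.g. 'USER#123'
        sk   TEXT NOT NULL,  -- sort key, e.g. 'ORDER#2020-05-01'
        data TEXT,           -- everything else, as a JSON blob
        PRIMARY KEY (pk, sk)
    )
""")

# The Dynamo-style Query: one partition, a sort-key prefix, index-backed.
rows = conn.execute(
    "SELECT data FROM items WHERE pk = ? AND sk LIKE ? ORDER BY sk DESC",
    ("USER#123", "ORDER#%"),
).fetchall()
```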
But I agree in general about the limitations. Having used RDBMSes like Postgres a lot, as well as Cassandra and DynamoDB in production, I would almost certainly not create a new app with DynamoDB as the primary DB. Even if you have an app where you expect to need to scale writes heavily, it's not going to be on all tables equally. For instance, your users table, and related resources that are relatively small and grow linearly with your users, will probably fit fine in a Postgres DB for a very long time. And being able to have normalized models and powerful indexing and querying patterns available is a big benefit.
DynamoDB can work well for a specific sub-system that needs very high scalability. For instance, if you needed to store pairwise info between every user and product combination for some reason. Or if every user can upload a huge number of resources of some type (though the access patterns need to fit DynamoDB's constraints; if these are documents or files of some type then another system like S3 or Elasticsearch would probably make more sense). Or if you're tracking advertising views by an advertising identifier or something. Or scraping and importing a bunch of data from other places. In some specific use cases like this, the downsides vs an RDBMS can be very minimal, and the built-in scalability can save you a ton of time vs having to constantly tune and potentially shard your RDBMS system.
But even in these cases, you might have better options depending on your access patterns. For instance if you don't ever need to refer to this data by reading it in an OLTP context, you might want to just write it to a log like Kafka to be ingested into Redshift or HDFS for offline processing or querying.
That said, I think you can definitely handle complex, relational patterns in DynamoDB pretty easily. It will take some work to learn new modeling patterns, but it's absolutely doable.
That said, I've seen people use DynamoDB as a time-series database (modelling multi-dimensional data with z-indices on top of the two-dimensional partition-key and range-key pair), so it is definitely possible to be clever and practical at the same time.
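For the curious, a z-index (Morton code) is just bit interleaving, so two dimensions collapse into one sortable key; a minimal sketch:

```python
def z_index(x: int, y: int, bits: int = 32) -> int:
    """Interleave the bits of x and y into one Morton (z-order) code,
    so points close in 2-D stay (mostly) close in the 1-D sort key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions get x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions get y
    return z

# The code can then serve as (part of) a range key.
print(z_index(3, 5))  # -> 39
```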
Sounds like a perfect use case for a traditional RDBMS. Why Dynamo?
I ... can't think of a single time I've ever needed this.
And eventually consistent isn't the worst-case scenario. Being unable to roll back correctly from an error could mean you'll never end up in a consistent state... that's a lot worse than "eventually".
I haven't done any serious tests, but I'd say on average my reads to Fauna from Cloudflare Workers are 30ms. That seems like a lot compared to querying a local instance of Postgres, but since Fauna is distributed you end up getting much better latency on average for your worldwide users compared to a single DB in us-east-1.
Writes take longer (probably around 200-300ms on average) but considering these are replicated to all Fauna servers with ACID I'm ok with that.
I wrote a little intro to Fauna's query language which is very powerful if anyone is interested:
RDBMS capacity planning basically goes:
1. How much traffic will I get?
2. How much RAM & CPU will I need to handle the traffic from (1)?
With DynamoDB, you can skip the second question.
Can you tell me why On-Demand mode doesn't work for you?
If you're at the point where you're spending thousands of dollars a month on DynamoDB, then it does make sense to review your usage, fine-tune your capacity, set up auto-scaling, buy reserved capacity, etc. But don't waste your time doing that to save $14 a month. There are better things to do.
But it's really nice to have a database where you can set up pay-per-use, don't have to think about exhausting your resources, and have an option to back out into a cheaper billing mode if it does get expensive.
 - Hat tip to Jared Short for this advice & phrase
I mean, hand a person a gun and they might shoot themselves in the foot. While you can make bad queries/workloads for a relational database, you can just as easily make bad workloads for DynamoDB.
This is underrated, but it's really helpful. So many times w/ a relational database, I've had to tweak queries or access patterns over time as response times degrade. DynamoDB basically doesn't have that unless you really screw something up.
For me, I like that 98% of DynamoDB work is frontloaded. I spend the time building the model but once it's done -- set it and forget it.
With RDBMS, it's like there's a hidden 5% tax that's lurking at all times. You have to spend time tuning querying, reshaping data, changing patterns, etc. It can add up to significant drag over time.
Different teams might think the costs are different for their application, or they may be fine with one pattern over the other. Fine with me! I just know which one I choose now :)
If I have a person entity with its attributes listed out in a table, how would you go about sorting by first name, last name, created at, etc.? I was thinking of streaming everything over to Elasticsearch, but that would add extra complexity to maintain.
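One common approach (not the only one) is a GSI per sort attribute; a boto3 sketch with invented table and index names:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("People")  # hypothetical table

# Assume a GSI "byLastName" whose partition key is a constant attribute
# (entityType = "PERSON") and whose sort key is lastName; a Query on it
# returns items already ordered. Caveat: a single-valued partition key
# concentrates the index on one partition, so this suits modest item counts.
resp = table.query(
    IndexName="byLastName",  # hypothetical GSI
    KeyConditionExpression=Key("entityType").eq("PERSON"),
    ScanIndexForward=True,   # ascending by lastName
)
people = resp["Items"]
```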
But how widely used is DynamoDB? And for what use cases?
And what are the problems with it?
- It was designed for super high scale use cases (think Amazon.com retail on Cyber Monday). It has decent adoption there. Competes mostly with Cassandra or other similar tools.
- With the introduction of AWS Lambda, it got more adoption in the 'serverless' ecosystem because of how well its connection model, provisioning model, and billing model works with Lambda. RDBMS doesn't work as well here.
A lot of people find 'problems' with it because they try to use it like a relational database, which it most certainly isn't. You have to model differently and think about it differently. The book helps here :).
That said, the principles apply pretty well to other popular NoSQL databases, especially MongoDB and Cassandra. There will be some slight differences -- MongoDB allows better nesting and querying on nested objects -- but it's broadly the same. If you want to model NoSQL for scale, you need to use these general patterns.
If you want to check it out but find out it doesn't work for you, just let me know. I've got a 100% money-back guarantee with no questions asked if you don't like it.
In some, you might have relations but lose consistency; in others, you might have relations but only keep consistency under specific conditions (sharding keys, etc.).
NoSQL modeling typically depends on the specific characteristics of the database. Essentially it's about looking at those characteristics, seeing what the database doesn't offer, comparing that with what you need, and finding workarounds.
That said, a few notes:
1. I added a coupon code ('HACKERNEWS') to knock $20 off Basic, $30 off Plus, and $50 off Premium.
2. If you're from a country where PPP makes this pretty expensive, hit me up. I'm happy to help.
3. If you're facing income challenges due to COVID-19, hit me up, I'm happy to help.
4. If this is unaffordable for any reason, hit me up, I'm happy to help. :)
I bought it and have found it to be completely worth the money. I don't look at prices for these things in relation to how much other books cost but how much time it will save me.
We tend to criticize people for asking a decent amount of money in our industry, whereas people in other industries shamelessly ask for ludicrous amounts of money for pretty much anything (think medical or legal).
Alex answered my questions in such a way that I myself saw where the bug was in my code.
He saved me easily several hours of time.
At my hourly rate, this means that the book had a negative cost in my case.
I was able to repay the favor, I suggested an improvement to one code example in the book which Alex eagerly accepted.