The DynamoDB Book: Data Modeling with NoSQL and DynamoDB (dynamodbbook.com)
245 points by abd12 14 days ago | 110 comments

I bought the book, I read the book, and I've used DynamoDB for a while. It didn't change my mind. DynamoDB makes tradeoffs in order to run at massive scale, but scale isn't a problem many people need to solve when 2TB of RAM fits in a single box. Meanwhile I need to handle eventual consistency, an analytics pipeline, another database for fuzzy search, another geo lookup database, Lambda functions to do aggregations, and a pile of custom code. All while giving up tooling so readily available in the RDBMS world.

In a world where opex is much higher than capex, DynamoDB might make sense, but for me server costs are 5% of dev costs. And even if it works from a cost perspective, how many AWS services have their console experience ruined by DynamoDB? The UI tricks you into thinking it's a data table with sortable columns, but no! DynamoDB limitations strike again and you are off on a journey of endless paging. The cost savings come at the expense of the user.

DynamoDB also isn't fast. 20ms for a query isn't fast; 30ms for an insert isn't fast. Yes, it's amazingly consistent and faster than other systems holding 500TB, but that isn't a use case for many users.

Others are comparing DynamoDB to Redis and Cassandra. It has additional limitations. These are fairly clearly spelled out but maybe weren't highlighted as prominently a few years back. (I say that because I inherited an application that made heavy use of DynamoDB but turned out not to be a great fit for DDB.)

- It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.

- You can store a maximum of 400 KB of data in one item (row).

- You can get a maximum of 1 MB of data returned in a single query.

So it's mostly good for high-data-throughput applications, and then only if your high data throughput consists of large numbers of small records, processed a few at a time. This surely describes an important class of workloads. You may suffer if your workload isn't in this class.

Another annoyance is that (in my experience) one of the most common errors you will encounter is ProvisionedThroughputExceededException, when your workload changes faster than the auto-scaling. Until last year you couldn't test this scenario offline with the DynamoDB Local service because DynamoDB Local didn't implement capacity limits.

> - It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.

That is _infuriating_

It's documented, but it is so surprising when you first hit it. Sometimes, empty values have semantics attached to them, I don't want to scrub them out.

I think that basically describes DDB in a nutshell. You think it’s like Mongo, only on AWS, but then you slowly find out it’s much, much worse (for most use cases that you’d go with Mongo).

Note that Cassandra has similar limitations with data/throughput, but they aren't enforced or documented (because they depend on your particular setup) and your queries just fail or worse make all queries to the same node in the cluster fail (fun times with large wide rows).

The rich data types in Dynamo are quite strange; since they're basically useless for querying, I'm not sure why you would use them. Maybe I'm missing something...

The rich data types are also useful for partial updates, like adding items to a set, updating some fields in a map etc.

That does sound useful.

So I can have a key and an associated map and update a few members of the map?

I would like to learn how to do that.

Yep! Here's the AWS documentation about this feature: https://docs.aws.amazon.com/amazondynamodb/latest/developerg....
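To make the feature concrete, here's a minimal sketch of a partial map update, assuming a table with partition key "pk", a map attribute "profile", and a string-set attribute "tags" (all names here are illustrative, not from the docs linked above). The parameters are built separately so the shape is easy to see; the actual boto3 call is shown commented out.

```python
def build_partial_update(user_id, city, tag):
    # SET writes a single member of the map; ADD inserts into the string set.
    # The rest of the item is left untouched.
    return {
        "Key": {"pk": f"USER#{user_id}"},
        "UpdateExpression": "SET profile.city = :city ADD tags :tag",
        "ExpressionAttributeValues": {":city": city, ":tag": {tag}},
    }

params = build_partial_update("123", "Lisbon", "books")
# import boto3
# boto3.resource("dynamodb").Table("users").update_item(**params)  # hypothetical table name
```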

The rich data types can be useful for filtering, I guess? When you are running a query against a certain hash key and want the first record (sorted by range key) that meets a condition placed on a nested property of a map or whose "tags" property is a string set containing a certain member, for example.
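That filtering pattern might look something like this sketch (key and attribute names are made up). One caveat worth a comment: filter expressions are applied after items are read, so filtered-out items still consume read capacity and count against the 1 MB query limit.

```python
def build_query(pk, city, tag):
    # Key condition selects the item collection; the filter narrows on a
    # nested map property and on membership in a string set.
    return {
        "KeyConditionExpression": "pk = :pk",
        "FilterExpression": "address.city = :city AND contains(tags, :tag)",
        "ExpressionAttributeValues": {":pk": pk, ":city": city, ":tag": tag},
        "ScanIndexForward": True,  # ascending by range key
    }

q = build_query("USER#123", "Lisbon", "vip")
```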

Regarding the empty string and binary values: https://aws.amazon.com/about-aws/whats-new/2020/05/amazon-dy...

(Disclosure: I work for AWS on DynamoDB and on this)

If you treat DynamoDB as a DBMS, you’re going to be disappointed (for the reasons you mention). But if you think of it as a highly-durable immediately-consistent btree in the cloud, it’s amazing. DynamoDB is closer to Redis than MySQL. Amazon does it a disservice by putting it in the databases category.

The indexes are not immediately consistent.

It's not just that it is put in the database category, but that its champions at AWS make statements like "if you are utilising RDBMS you are living in the past", or that "there are very few use cases to choose Postgres over DynamoDB".

Btw, loved your AWS book!

DynamoDB is like Redis without the fun data structures, the fantastic CLI and discoverability, the useful configurable tradeoff between fast and consistent, and much-needed features such as listing your keys.

So my quarantine project is building a Redis API on top of DynamoDB - https://github.com/sudhirj/redimo.go

Daniel, I'm a big fan of yours but disagree with this take :).

It's definitely a database. The modeling principles are different, and you won't get some of the niceties you get with a RDBMS, but it still allows for flexible querying and more.

S3 is not a database, but DynamoDB is :).

S3 and DDB are incredibly similar. Their fundamental operators are the same: key-value get/put and ordered list, and their consistency is roughly the same.

What differentiates DDB and S3 the most is cost and performance.

They're both highly-durable primitive data structures in the cloud, with a few extra features attached.

If you think of them as incredibly similar then you are likely not making very good use of them.

For example, consistency is not "roughly the same", with DynamoDB supporting strongly consistent reads and atomic update operations.

S3 is also immediately consistent unless you’re updating an existing object or listing objects.

I was one of the top users by volume of both products when I worked at AWS.

I agree. DynamoDB is like a serverless child of Redis and MongoDB.

> If you treat DynamoDB as a DBMS, you’re going to be disappointed

Did you mean to say "as a RDBMS"? Because I don't see how it's not a DBMS.

No, I meant a DBMS. Almost all the things you'd expect in a DBMS are not there. Take a look at most of the comments in this thread. Everyone expecting DBMS things, and complaining that they're not there.

Fair enough! I think that's a reasonable position.

IMO, there are two times you should absolutely default to DynamoDB:

- Very high scale workloads, due to its scaling characteristics

- Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.

You can use DynamoDB for almost all OLTP workloads, but outside of those two categories, I won't fault you for choosing an RDBMS.

Agree that DynamoDB isn't _blazing_ fast. It's more that it's extremely consistent. You're going to get ~10 millisecond response times when you have 1GB of data or when you have 10 TB of data, and that's pretty attractive.

> Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.

If you can use Aurora Serverless, the Data API makes sense for lambda.


True! I'm not a huge fan of Aurora Serverless and the Data API. The scaling for Aurora Serverless is slow enough that it's not really serverless, IMO. And the Data API adds a good bit of latency and has a non-standard request & response format, so it's hard to use with existing libraries. But it's definitely an option for those that want Lambda + RDBMS.

The RDS Proxy is _hopefully_ a better option in this regard but still early.

Differing opinion - I think RDS Proxy is the wrong approach. Adding an additional fixed cost service to enable lambda seems like an indicator of a bad architecture. In this case the better approach would likely be to just use a Fargate container which would have a similar cost and fewer moving parts.

By the time you pay a fixed cost for the proxy on top of what you already pay for the RDS server, it'd be a far simpler architecture with fewer moving parts to just run a Fargate container (or, better yet, AWS could offer a Google Cloud Run competitor).

The Data API, while still rough around the edges, at least keeps the solution more "serverless-y". Over time it should get easier to work with as tooling improves. At the very least, it won't be more difficult to work with than DynamoDB was initially, with its different paradigm.

For services that truly require consistently low latency, lambda shouldn't be used anyway, so the added latency of the data api shouldn't be a big deal IMO.

For those reasons, I view the RDS Proxy as an ugly stopgap that enables poor architecture, whereas the Data API actually enables something new, and potentially better. So I'd much rather AWS double down on it and quickly add some improvements.

I agree completely. We have APIs that are both used by our website and our external customers (we sell our API for our customers to integrate with their websites and mobile apps) and for batch loads for internal use.

We deploy our APIs to Fargate for low, predictable latency for our customers and to Lambda [1] which handles scaling up like crazy and scaling down to 0 for internal use but where latency isn’t a concern.

Our pipeline deploys to both.

[1] As far as being “locked into lambda”, that’s not a concern. With API Gateway “proxy integration” you just add three or four lines of code to your Node/Express, C#/WebAPI, Python/Flask code and you can deploy your code as is to lambda. It’s just a separate entry point.



Yup, that's exactly how I recommend clients to write lambdas for API purposes... Such a great balance of getting per request pricing while retaining all existing tooling for building APIs

For an internal API with one or two endpoints, I‘ll do things the native Lambda way. Your standard frameworks are heavy when all you need to do is respond to one or two events and you can do your own routing, use APIGW and API Key for authorization, etc.

There is also a threshold between “A group of developers will be developing this API and type safety would be nice so let’s use C#” and “I can write and debug this entire 20-50 line thing in the web console in Python, configure it using the GUI, and export a SAM CloudFormation template for our deployment pipeline.”

> By the time you pay a fixed cost for the proxy on top of what you already pay for the RDS server, it'd be a far simpler architecture with less moving parts to just run a Fargate container

A lot of people want to use lambda (or serverless) even so. So AWS is just accommodating their wishes.

We can’t use Aurora Serverless even in our non-Prod environments because we have workflows that involve importing and exporting data to and from S3. But really, our Aurora servers in those environments are so small that most of our costs are storage.

Not to mention the same also applies for load. You get about 10ms at 10, 1000 or 1000000 requests per second, again irrespective of how much data you have.

There's a third use: if you want a free ride, AWS free tier for DynamoDB is quite nice, enough to run a decent dynamic website.

Especially combined with the always free tier of lambda....

> Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.

This is only true for AWS. Azure functions share resources and don't have this issue.

The speed is actually quite sad. It's 5-10x slower than my other databases at p95, and I can't throw money at the problem on the write side. For reads I can use DAX, but then there goes consistency.

Good point! I would usually not recommend using a database from a different cloud provider just because of different hassles around permissions, connections, etc.

I've never found the speed an issue, but YMMV. To me, the best thing is that you won't see speed degradation as you scale. With a relational database, your joins will get slower and slower as the size of your database grows. With DynamoDB, it's basically the same at 1GB as it is at 10TB.

RDS maxes out RAM at 768GiB, if we're comparing managed to managed.

If you're approaching that point, you already are going to need an analytics pipeline, a search DB, etc., because maintaining ever-growing indices will kill your latency. You probably can get away with aggregations for a bit longer, but if the number of rows you aggregate is growing too, eventually you will need to come up with something, and the way you do that with Dynamo off a stream isn't a bad way to go about it with MySQL either.

Looking at the tables I have access to, they all come in under 5ms for both reads and writes. This is the same ballpark as our MySQL apps for similar-style queries (i.e. not aggregations).

Sadly my favorite reason to use Dynamo is political, not technical. Since it somehow is not classified as a database at my company, the DBAs don't 'own' it. So I don't have to wait 2-3 months for them to manually configure something.

Conway's law strikes again.

> RDS maxes out RAM at 768GiB

RDS goes up to 4TB on the X1e instance type. But the point is that RDBMS systems handle a large range of data sizes and workload types before you need to reach for specialist systems.

I don't know how you are doing write transactions in 5ms on DynamoDB. Single puts at p50, maybe, but I've never seen p90 put operations below 10ms.

Yeah, p50, all use cases we have are single row updates.

Haha, I love that story at the end. I promise not to tell your company that it is a database.

>but scale isn't a problem many people need solving when 2TB of RAM fits in a single box.

What's the price of that on the cloud? I know I can run crazy big tables on DynamoDB for a couple of dollars. I don't know what 1 month of a relational database with 2TB of RAM costs on the cloud, but I am pretty sure I can't afford it.

We used to run Riak for Dynamo like workloads very efficiently, 30ms p50 insertion time.

Very disappointed to find the top comment is about DynamoDB and not Alex and his wonderful book. I suppose this is par for the course with HN. I hope nothing I create ever ends up posted here.

You are disappointed that the comment is about the subject of the book, from someone who read it and didn't find it a compelling read? I didn't like the book because it continues a trend of mis-selling DynamoDB, and my comment reflects that frustration. Sorry, but not every review is going to be glowing, and I'm certainly not going to make personal comments about the author.

6TB fits in a single box

*Waves* Author here. Happy to answer any questions folks have about the book, about DynamoDB, or about self-publishing.

NoSQL modeling is waaay different than relational modeling. I think a lot of NoSQL advice out there is pretty bad, which results in people dismissing the technology altogether. I've been working with DynamoDB for a few years now, and there's no way I'll go back.

The book has been available for about a month now, and I've been pretty happy with the reception. Strong support from Rick Houlihan (AWS DynamoDB wizard) and a lot of other folks at AWS.

You can get a free preview by signing up at the landing page. If you buy and don't like it, there's a full money-back guarantee with no questions asked. Also, if you're having income problems due to COVID, hit me up and we'll make something work :)

Anyhow, hit me up with questions!

EDIT: Added a coupon code for folks hearing about the book here. Use the code "HACKERNEWS" to save $20 on Basic, $30 on Plus, or $50 on Premium. :)

The biggest problem I'm aware of with DynamoDB is the hot key / partition issue[1]. Throughput is distributed evenly across nodes, and you can't control how many nodes you have, so you always have a node that's hot either temporarily or permanently; you end up having to over-provision all your nodes to be able to handle that hot case, which ends up costing far more than the alternatives. What's your take on this? This is the chief reason I avoid DynamoDB, which in theory would be a good fit for some of my problems.

[1] https://syslog.ravelin.com/you-probably-shouldnt-use-dynamod...
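For readers unfamiliar with the usual workaround: a widely used mitigation (not something the linked article necessarily endorses) is write sharding, where a hot partition key is spread across N suffixed keys on write and reads fan out across all N shards. A rough sketch, with key format and shard count as illustrative choices:

```python
import random

SHARDS = 10  # illustrative; pick based on how hot the key is

def sharded_pk(base_key):
    # Writers pick a random shard, spreading load across partitions.
    return f"{base_key}#{random.randrange(SHARDS)}"

def all_shard_pks(base_key):
    # Readers must query every shard and merge the results.
    return [f"{base_key}#{i}" for i in range(SHARDS)]

pk = sharded_pk("HOT_ITEM")       # e.g. "HOT_ITEM#7"
keys = all_shard_pks("HOT_ITEM")  # the 10 keys a reader must query
```

The tradeoff is explicit: writes get cheaper to absorb, reads get N times more work.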

As of a couple years ago, DynamoDB will redistribute throughput between shards based on usage [1], so in theory this should eliminate the hot shard problem. I haven't had a chance to test this in practice, if anybody has hands-on experience I'd love to hear it.

You also finally have a way of identifying hot keys with the terribly named CloudWatch Contributor Insights for DynamoDB. [2]

For exceptional use cases, you also have the option of On-Demand Capacity to pay for what you use and not worry about capacity at all. [3]

[1] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

[2] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

[3] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

That sounds like the problem has been solved and my information is just out of date. Maybe I should give DynamoDB another look.

With instant adaptive capacity, I think quite a few hot key issues are mitigated.


luhn responded to this one pretty well :)

Basically, most of these issues are gone. As long as you don't have extreme skew in your partition keys, you don't need to worry about throughput limits.

Bought the book, thank you!

What was your approach to self-publishing here? What tools did you use? If I wanted to publish a book but knew nothing about it, what resources should I read and what approach would you recommend?

Thank you for your support!

The biggest advice I can give you is not about any specific tool, it's about an approach. You need to think about how you will market the book if you're self-publishing.

Engage with the community that will be interested in the book. Write articles, help out on Twitter, write code libraries, etc.

For me, I wrote DynamoDBGuide.com two and a half years ago over Christmas break. I wanted to just make an easier introduction to DynamoDB after I watched Rick Houlihan's talk at re:Invent (which is awesome).

That led to other opportunities and to me being seen as an 'expert' (even when I wasn't!). I got more questions and spent more time on DynamoDB to the point where I started to know more. I gave a few talks, etc.

I finally decided to do a book and set up a landing page and mailing list. I basically followed the playbook that Adam Wathan described for his first book launch.[0] Write in public, release sample chapters, engage with people, etc.

In terms of tooling, I used AsciiDoc to generate the book and Gumroad to sell. On a 1-10 scale, I'd give AsciiDoc a 5 and Gumroad an 8. But the tooling barely matters -- think about how to find the people that are interested :)

Happy to answer any other questions, either in public or via email.

[0] - https://adamwathan.me/the-book-launch-that-let-me-quit-my-jo...

Just bought the book. I've been working at AWS and using DynamoDB for years now, but I'm sure there are things I could be doing better. I love that you've dedicated attention to analytics and operations too.

Thank you! I really appreciate it :) Hit me up if you have any questions!

Honest question: would you say "NoSQL modeling is way more restrictive, labor intensive and painful, but in turn gives you consistent performance as you scale" is a fair characterization?

I'd been sort-of considering buying this for a while and the coupon made me pull the trigger. Thanks!

Any chance for a Kindle friendly mobi?

Yep! It comes with PDF, MOBI, and EPUB formats :)

> and there's no way I'll go back.

err.. back to what?

After using Dynamo for 2 years now, the biggest problem I’ve seen thus far is the pretty extreme expectations it puts on your application code to manage things that have traditionally been considered the responsibility of the data store. We found it was a bit onerous to ensure all facets of modeling/validation/indexing were taken into consideration when writing that layer of the application. To address the constant bootstrapping, you either end up with a crap ton of utilities that form indexes or create UpdateExpression strings, etc., or you end up constantly reinventing the wheel.

The JS landscape for Dynamo is a bit bare; notable options all largely ignore the indexing principles that are the real draw of Dynamo. This heartburn caused me to sit down and write a library myself (https://github.com/tywalch/electrodb) that allows you to focus on the models and relationships while taking care of all the little pitfalls and “hacky” tricks inherent in single table design.

Alex’s book covers all these things and I honestly wish I had had it sooner before having to learn via foot shooting. It’s pricey but if you have a need for Dynamo on your project it really pays off knowing you’re swimming with the current, and Alex definitely gets you there.

I bought this a few weeks ago and am about 130 pages in.

It is just stunning how much better it is learning Dynamo/NoSQL in general from this than effectively any other source. Anyone who's had to rely on AWS docs knows how face-meltingly dense they can be.

I went back and refactored all my previous Dynamo work last night, and the difference was night and day. I'm planning to migrate some relational structures later this week, as well.

Is good book.

What has this book taught you that could be applied outside DynamoDB? I'm close to buying but the price is kinda steep... if however I can take away some general NoSQL insight then I'm sold.

Edit: nevermind, I see another review elsewhere and the author replying. Though, your opinion would still be appreciated! :)

Thank you for the kind words! :) Glad you're liking it.

DynamoDB is very compelling for performance, scalability, and low ops overhead, but I recommend thinking very carefully about the limited transaction support before going with it, as it’s likely to be a dealbreaker for many use cases, whether or not you realize that up front. I think most apps will need a transaction involving more than 25 rows at some point, and with dynamo your only option is to fire them off in groups of 25 and hope none fail (plenty will at scale).
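The "groups of 25" workaround looks roughly like the sketch below (illustrative, not a real library API): split a large logical write into TransactWriteItems-sized batches. Atomicity only holds within a batch, so a failure in a later batch leaves earlier batches committed, which is exactly the hope-none-fail problem described above.

```python
def chunk(items, size=25):
    """Yield successive batches of at most `size` items (25 is the
    TransactWriteItems limit)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

batches = list(chunk(list(range(60))))
# 60 writes -> batches of 25, 25, and 10; only each batch is atomic.
```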

You can get many of the benefits of Dynamo (sans auto-sharding) by applying its elegant indexing strategy to a SQL database. It will be as fast or faster, your transactions can be as big as you need them to be, and you retain the ability to occasionally fire off un-indexed ad hoc queries for development or convenience. Running and scaling a SQL DB is also fairly painless these days with options like Aurora.

Interesting, idk that I've ever needed a transaction with more than 25 rows.

But I agree in general about the limitations. Having used RDBMSes like Postgres a lot, as well used Cassandra and DynamoDB in production, I would almost certainly not create a new app with DynamoDB as the primary DB. Even if you have an app where you expect to need to scale writes heavily, it's not going to be on all tables equally. For instance, your users table, and related resources that are relatively small and grow linearly with your users, will probably fit fine in a Postgres DB for a very long time. And being able to have normalized models and powerful indexing and querying patterns available is a big benefit.

DynamoDB can work well for a specific sub-system that needs very high scalability. For instance, if you needed to store pairwise info between every user and product combination for some reason. Or if every user can upload a huge number of resources of some type (though the access patterns need to fit DynamoDB's constraints; if these are documents or files of some type, then another system like S3 or Elasticsearch would probably make more sense). Or if you're tracking advertising views by an advertising identifier or something. Or scraping and importing a bunch of data from other places. In some specific use cases like this, the downsides vs an RDBMS can be very minimal, and the built-in scalability can save you a ton of time vs having to constantly tune and potentially shard your RDBMS.

But even in these cases, you might have better options depending on your access patterns. For instance if you don't ever need to refer to this data by reading it in an OLTP context, you might want to just write it to a log like Kafka to be ingested into Redshift or HDFS for offline processing or querying.

I can understand the sentiment and don't fault you for it.

That said, I think you can definitely handle complex, relational patterns in DynamoDB pretty easily. It will take some work to learn new modeling patterns, but it's absolutely doable.

At what cost, though? RDBMSs provide a better querying interface, with fewer of the pitfalls one invariably encounters trying to fit a square peg in a round hole.

That said, I've seen people use DynamoDB as a timeseries database (modelling multi-dimensional data with Z-indices on top of the two-dimensional partition-key and range-key pair), so it is definitely possible to be clever and practical at the same time.

Disclaimer: ex-AWS.

Agreed. This is a limitation we ran into trying to implement a critical accounting ledger on top of DynamoDB. The transaction model we came up with is formally verified w/ TLA+. We're turning our work into a product: txlayer.com

> This is a limitation we ran into trying to implement a critical accounting ledger on top of DynamoDB.

Sounds like a perfect use case for a traditional RDBMS. Why Dynamo?

A fair question, and we’ve done it that way before at simple.com. When looking at the options for a pay-per-use database with global replication, streams, and management handled for you, we felt that if you could build a ledger on Dynamo for these use cases, it’d be pretty compelling and fun.

> I think most apps will need a transaction involving more than 25 rows at some point

I ... can't think of a single time I've ever needed this.

A common one is cascading deletes when you delete a user or a ‘project’ or something else that has a lot of stuff associated with it. Those will exceed 25 rows very quickly. Also any kind of bulk update or data import... hell even just initializing a new account can easily require writing more than 25 rows in a moderately complex app.

Still, why the need to be transactional? An eventually consistent delete seems fine here.

Maybe it is, or maybe some of the rows are involved in a security check. Or could cause race conditions if out of sync. Or otherwise need immediate read-after-write consistency.

And eventually consistent isn't the worst case scenario. Being unable to rollback correctly from an error could mean you'll never end up in a consistent state... that's a lot worse than "eventually".

For my current serverless project I'm using Fauna which I think is a better option than Dynamo. You get relations, complex queries, etc. You also get authentication and authorization baked-in.

I haven't done any serious tests but I'd say on average my reads to Fauna from Cloudflare workers are 30ms. Seems a lot compared to querying a local instance of Postgres but since Fauna is distributed you end up getting much better latency on average for your worldwide users compared to a single DB in us-east-1.

Writes take longer (probably around 200-300ms on average) but considering these are replicated to all Fauna servers with ACID I'm ok with that.

I wrote a little intro to Fauna's query language which is very powerful if anyone is interested:


DynamoDB is monster scale but tricky to use, with a difficult pricing model. Paying for write and read capacity is strange to me and makes it difficult to scale up for bursts. I recommend not using this tech for most things. You need to know exactly why you want to use it and have a good reason.

I'd much rather pay for reads & writes directly rather than guessing at how my CPU and RAM will translate to the reads and writes that I need.

RDBMS capacity planning basically goes:

1. How much traffic will I get?

2. How much RAM & CPU will I need to handle the traffic from (1)?

With DynamoDB, you can skip the second question.

> makes it difficult to scale up for bursts

Can you tell me why the On-Demand mode doesn't work for you?

7x the cost. I find it interesting that the DynamoDB cheer squad points out most databases only run at 10-15% utilisation and are burning money every hour. In the next breath they suggest running on demand "till it hurts" and paying AWS as if they were running at 15% utilisation.

I recommend On-Demand pricing 'until it hurts'[0], but that's because a ton of people I talk to are spending <$50/month on DynamoDB. At that point, it really doesn't make sense to spend hours of time optimizing your DynamoDB bill.

If you are at the point where you are spending thousands of dollars a month on DynamoDB, then it does make sense to review your usage, fine-tune your capacity, set up auto-scaling, buy reserved capacity, etc. But don't waste your time doing that to save $14 a month. There are better things to do.

But it's really nice to have a database where you can set up pay-per-use, don't have to think about exhausting your resources, and have an option to back out into a cheaper billing mode if it does get expensive.

[0] - Hat tip to Jared Short for this advice & phrase

You need to build exponential-backoff logic into your system to handle waiting for Dynamo to warm up. It doesn't happen instantly.
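A common shape for that retry logic is full-jitter exponential backoff (base and cap values below are illustrative, not AWS-mandated):

```python
import random

def backoff_delay(attempt, base=0.05, cap=20.0):
    """Random sleep in [0, min(cap, base * 2**attempt)] seconds.

    The jitter spreads retries out so throttled clients don't all
    hammer the table again at the same instant.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))

d = backoff_delay(3)  # somewhere in [0, 0.4] seconds
```

In practice the AWS SDKs already retry throttled requests with backoff; this only matters when you exhaust the SDK's retries or manage retries yourself.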

You need that in provisioned in case of overload, too, right?

> While your relational database queries slow down as your data grows, DynamoDB keeps on going. It is designed to handle large, complex workloads without melting down.

I mean- hand a person a gun, and they might shoot themselves in the foot. While you can make bad queries/workloads for a relational database, you can just as easily make bad workloads for DynamoDB.

My contention is that it's much easier to have an access pattern that won't scale in a relational database than in DynamoDB. DynamoDB basically removes all the things that can prevent you from scaling (JOINs, large aggregations, unbounded queries, fuzzy-search).

This is underrated, but it's really helpful. So many times w/ a relational database, I've had to tweak queries or access patterns over time as response times degrade. DynamoDB basically doesn't have that unless you really screw something up.

So what is the cost of doing a bit of query tuning and de-norming every now and then compared to the development costs imposed by DynamoDB?

It depends!

For me, I like that 98% of DynamoDB work is frontloaded. I spend the time building the model but once it's done -- set it and forget it.

With RDBMS, it's like there's a hidden 5% tax that's lurking at all times. You have to spend time tuning querying, reshaping data, changing patterns, etc. It can add up to significant drag over time.

Different teams might think the costs are different for their application, or they may be fine with one pattern over the other. Fine with me! I just know which one I choose now :)

What I like in DDB is TTL. It is a fantastic feature. I read someone comparing it with Redis: Redis is faster because it speaks a persistent TCP protocol, whereas DDB is accessed over HTTP.
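For context, DynamoDB TTL is just a Number attribute holding an epoch-seconds expiry; you choose the attribute name when enabling TTL on the table, and expired items are deleted lazily in the background. A minimal sketch (the "expires_at" attribute and key are illustrative):

```python
import time

def item_with_ttl(item, ttl_seconds, now=None):
    # Attach an epoch-seconds expiry; DynamoDB's TTL sweeper deletes
    # the item some time after this timestamp passes.
    now = int(time.time()) if now is None else now
    return {**item, "expires_at": now + ttl_seconds}

row = item_with_ttl({"pk": "SESSION#abc"}, 3600, now=1_600_000_000)
# row["expires_at"] == 1_600_003_600
```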

This looks like a great resource. One thing I'm struggling with is the ability to sort and filter and was wondering if the book goes into detail about this topic.

If I have a person entity with its attributes listed out in a table, how would you go about sorting by first name, last name, created at, etc.? I was thinking of streaming everything over to Elasticsearch, but that would add extra complexity to maintain.

Yep! There are entire chapters on sorting & filtering. Note: it's different than in a relational database, but it's doable :)
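The short version: DynamoDB has no ORDER BY, so sorting happens at design time by putting the sort field in a sort key (often on a GSI), then flipping ScanIndexForward to choose direction. Each attribute you want to sort by generally needs its own index. A sketch of the request shape (the names "People", "ORG#...", "pk" are hypothetical):

```python
def newest_first_query(org_id: str) -> dict:
    """Query one partition, sorted descending by the sort key."""
    return {
        "TableName": "People",
        "KeyConditionExpression": "pk = :org",
        "ExpressionAttributeValues": {":org": {"S": f"ORG#{org_id}"}},
        # False = descending order on the sort key (e.g. a createdAt
        # timestamp stored there); True (the default) = ascending.
        "ScanIndexForward": False,
    }
```

Sorting by first name vs. last name vs. created-at would mean three sort-key layouts (typically a table sort key plus two GSIs), which is the trade-off the parent is weighing against streaming to Elasticsearch.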

Awesome! Glad to hear there's a section on that. Quick question: I'm thinking of leveraging Elasticsearch for its full-text search capabilities. Is the work to get sorting on various attributes heavy from a dev perspective, and are there any advantages to doing it through Dynamo rather than querying Elasticsearch?

I work a little outside the standard startup hyper-scale, fast growing business, so forgive my question.

But how widely used is DynamoDB? And for what use cases?

And what are the problems with it?

In a nutshell:

- It was designed for super high scale use cases (think Amazon.com retail on Cyber Monday). It has decent adoption there. Competes mostly with Cassandra or other similar tools.

- With the introduction of AWS Lambda, it got more adoption in the 'serverless' ecosystem because of how well its connection model, provisioning model, and billing model works with Lambda. RDBMS doesn't work as well here.

A lot of people find 'problems' with it because they try to use it like a relational database, which it most certainly isn't. You have to model differently and think about it differently. The book helps here :).

Well I'm a sucker for this kind of stuff- how do the videos work in the premium package? Do I get to download them for offline viewing?

Can some knowledge be transferred to other NoSQL flavours like mongo or is the book heavily specific about DynamoDB?

All the examples are specific to DynamoDB and use DynamoDB features.

That said, the principles apply pretty well to other popular NoSQL databases, especially MongoDB and Cassandra. There will be some slight differences -- MongoDB allows better nesting and querying on nested objects -- but it's broadly the same. If you want to model NoSQL for scale, you need to use these general patterns.

If you want to check it out but find out it doesn't work for you, just let me know. I've got a 100% money-back guarantee with no questions asked if you don't like it.

Wonder if anyone agrees that Uber's order processing can be handled by DynamoDB very well.

The book looks great, but being a startup, the price is hard to swallow for 20+ engineers.

Email me, and I'm happy to discuss :). alex@alexdebrie.com

Does anyone have book recommendations on NoSQL modeling in general?

Tbh I don't think that makes sense, since it depends on what your definition of NoSQL is. Some people say "no relations", others say "no SQL", others say "eventual consistency". Some people call FaunaDB NoSQL because it's distributed and scales, yet it offers strong consistency and relations, so normalized data and joins are an option.

In other databases, you might have relations but lose consistency; in others still, you might keep relations but only preserve consistency under specific conditions (sharding keys, etc.).

NoSQL modeling typically depends on the specific characteristics of the database. Essentially it's about looking at those characteristics, seeing what the database doesn't offer, comparing that with what you need, and finding workarounds.

$79 for the basic package? A bit pricey if you ask me.

Fair enough! IMO, it's worth it :). You could spend a bunch of time cobbling together free resources, and you'd still only get about 30% of what's in the book. How much is your time worth as a software engineer?

That said, a few notes:

1. I added a coupon code ('HACKERNEWS') to knock $20 off Basic, $30 off Plus, and $50 off Premium.

2. If you're from a country where PPP makes this pretty expensive, hit me up. I'm happy to help.

3. If you're facing income challenges due to COVID-19, hit me up, I'm happy to help.

4. If this is unaffordable for any reason, hit me up, I'm happy to help. :)

Your book does an excellent job explaining the single-table design pattern of DynamoDB. This pattern literally saves you money, so at a certain point you will earn back the $79 through a lower AWS bill (plus your applications will be much faster!).

Thanks, Matthew! Appreciate it, and I agree with you :)

If your rate is low enough that you can learn everything that is in this book for $80 worth of your time, then sure. Price is relative, it's not like he's selling prescription drugs for $1000 per pill.

I bought it and have found it to be completely worth the money. I don't look at prices for these things in relation to how much other books cost but how much time it will save me.

Yeah I think that the medical comparison is a good idea.

We tend to criticize people for asking a decent amount of money in our industry, whereas people in other industries shamelessly ask ludicrous amounts of money for pretty much anything (think medical or legal).

Alex was super-helpful to me. I had an edge-case problem using batch writes; the issue was assembling the batch in R to pass through the paws api, basically a bunch of really tricky nested lists, and I had one element out of place.

Alex answered my questions in such a way that I myself saw where the bug was in my code.

He saved me easily several hours of time.

At my hourly rate, this means that the book had a negative cost in my case.

I was able to repay the favor, I suggested an improvement to one code example in the book which Alex eagerly accepted.

Thanks for your support! I'm grateful for the fix you suggested as well :)

Thanks for this. I just started creating my first DynamoDB database yesterday.

Awesome! Hit me up if you have any questions :)
