
The DynamoDB Book: Data Modeling with NoSQL and DynamoDB - abd12
https://www.dynamodbbook.com/
======
arpinum
I bought the book, I read the book, I've used DynamoDB for a while. It didn't
change my mind. DynamoDB makes tradeoffs in order to run at massive scale, but
scale isn't a problem many people need solving when 2TB of RAM fits in a
single box. Meanwhile I need to handle eventual consistency, an analytics
pipeline, another database for fuzzy search, another geo lookup database,
Lambda functions to do aggregations, and a pile of custom code. All while
giving up tooling so readily available for the RDBMS world.

In a world where Opex is much higher than Capex DynamoDB might make sense, but
for me server costs are 5% of dev costs. And even if it works from a cost
perspective, how many AWS services have the console experience ruined by
DynamoDB? The UI tricks you into thinking it's a data table with sortable
columns, but no! DynamoDB limitations strike again and you are off on a
journey of endless paging. The cost savings come at the expense of the user.

DynamoDB also isn't fast. 20ms for a query isn't fast, 30ms for an insert
isn't fast. Yes, it's amazingly consistent and faster than other systems holding
500TB, but that isn't a use case for many users.

~~~
DVassallo
If you treat DynamoDB as a DBMS, you’re going to be disappointed (for the
reasons you mention). But if you think of it as a highly-durable immediately-
consistent btree in the cloud, it’s amazing. DynamoDB is closer to Redis than
MySQL. Amazon does it a disservice by putting it in the databases category.

~~~
avip
DynamoDB is like redis without the fun data structures, the fantastic cli and
discoverability, the useful configurable tradeoff between fast and
consistent, and really much-needed features such as listing your keys.

~~~
sudhirj
So my quarantine project is building a Redis API on top of DynamoDB -
[https://github.com/sudhirj/redimo.go](https://github.com/sudhirj/redimo.go)

------
abd12
_Waves_ Author here. Happy to answer any questions folks have about the book,
about DynamoDB, or about self-publishing.

NoSQL modeling is waaay different than relational modeling. I think a lot of
NoSQL advice out there is pretty bad, which results in people dismissing the
technology altogether. I've been working with DynamoDB for a few years now,
and there's no way I'll go back.

The book has been available for about a month now, and I've been pretty happy
with the reception. Strong support from Rick Houlihan (AWS DynamoDB wizard)
and a lot of other folks at AWS.

You can get a free preview by signing up at the landing page. If you buy and
don't like it, there's a full money-back guarantee with no questions asked.
Also, if you're having income problems due to COVID, hit me up and we'll make
something work :)

Anyhow, hit me up with questions!

EDIT: Added a coupon code for folks hearing about the book here. Use the code
"HACKERNEWS" to save $20 on Basic, $30 on Plus, or $50 on Premium. :)

~~~
eloff
The biggest problem I'm aware of with DynamoDB is the hot key / partition
issue[1]. Throughput is distributed evenly across nodes, you can't control how
many nodes you have, so you always have a node that's hot either temporarily
or permanently and so you end up having to over provision all your nodes to be
able to handle that hot case, which ends up costing far more than
alternatives. What's your take on this? This is the chief reason I avoid
DynamoDB, which in theory would be a good fit for some of my problems.

[1] [https://syslog.ravelin.com/you-probably-shouldnt-use-dynamod...](https://syslog.ravelin.com/you-probably-shouldnt-use-dynamodb-89143c1287ca)
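
To make the over-provisioning arithmetic concrete, here is a rough sketch. It assumes, as described above, that provisioned throughput is split evenly across partitions. The function name and the sample loads are made up for illustration; nothing here is a DynamoDB API.

```python
def required_table_throughput(partition_loads):
    """Throughput to provision for the whole table when it is split evenly
    across partitions and the hottest partition must not be throttled."""
    if not partition_loads:
        return 0
    # Each partition receives total / n, so the total has to be
    # n * (load on the hottest partition).
    return len(partition_loads) * max(partition_loads)


# Hypothetical per-partition demand, in capacity units per second.
loads = [100, 120, 90, 1000]
needed = required_table_throughput(loads)  # provisioned for the hot case
used = sum(loads)                          # what the workload actually consumes
overprovision_factor = needed / used
```

With one partition ten times hotter than the rest, the table has to be provisioned at roughly three times what it actually consumes, which is the cost complaint in a nutshell.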

~~~
luhn
As of a couple years ago, DynamoDB will redistribute throughput between shards
based on usage [1], so in theory this should eliminate the hot shard problem.
I haven't had a chance to test this in practice; if anybody has hands-on
experience I'd love to hear it.

You also finally have a way of identifying hot keys with the terribly named
CloudWatch Contributor Insights for DynamoDB. [2]

For exceptional use cases, you also have the option of On-Demand Capacity to
pay for what you use and not worry about capacity at all. [3]

[1]
[https://docs.aws.amazon.com/amazondynamodb/latest/developerg...](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html#bp-partition-key-partitions-adaptive)

[2]
[https://docs.aws.amazon.com/amazondynamodb/latest/developerg...](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/contributorinsights.html)

[3]
[https://docs.aws.amazon.com/amazondynamodb/latest/developerg...](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.OnDemand)

~~~
eloff
That sounds like the problem has been solved and my information is just out of
date now. Maybe I should give DynamoDB another look now.

------
tinkertamper
After using Dynamo for 2 years now the biggest problem I’ve seen thus far is
the pretty extreme expectations it puts on your application code to manage
things that have traditionally been considered the responsibility of the data
store. We found it was a bit onerous to ensure all facets of
modeling/validation/indexing were taken into consideration when writing that layer
of the application. To address the constant bootstrapping you either end up
with a crap ton of utilities that form indexes or create updateExpression
strings, etc, or you end up constantly reinventing the wheel.

The JS landscape for Dynamo is a bit bare; notable options all largely ignore
the indexing principles that are the real draw of Dynamo. This heartburn
caused me to sit down and write a library myself
([https://github.com/tywalch/electrodb](https://github.com/tywalch/electrodb))
that allows you to focus on the models and relationships while taking care of all
the little pitfalls and “hacky” tricks inherent in single table design.

Alex’s book covers all these things and I honestly wish I had had it sooner
before having to learn via foot shooting. It’s pricey but if you have a need
for Dynamo on your project it really pays off knowing you’re swimming with the
current, and Alex definitely gets you there.

------
abarrettwilsdon
I bought this a few weeks ago and am about 130 pages in.

It is just stunning how much better it is learning Dynamo/NoSQL in general
from this than effectively any other source. Anyone who's had to rely on AWS
docs knows how face-meltingly dense they can be.

I went back and refactored all my previous Dynamo work last night, and the
difference was night and day. I'm planning to migrate some relational
structures later this week, as well.

Is good book.

~~~
TheSpiciestDev
What has this book taught you that could be applied outside DynamoDB? I'm
close to buying but the price is kinda steep... if however I can take away
some general NoSQL insight then I'm sold.

Edit: never mind, I see another review elsewhere and the author replying.
Though, your opinion would still be appreciated! :)

------
danenania
DynamoDB is very compelling for performance, scalability, and low ops
overhead, but I recommend thinking _very_ carefully about the limited
transaction support before going with it, as it’s likely to be a dealbreaker
for many use cases, whether or not you realize that up front. I think most
apps will need a transaction involving more than 25 rows at some point, and
with dynamo your only option is to fire them off in groups of 25 and hope none
fail (plenty will at scale).
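
A minimal sketch of what "groups of 25" looks like in application code. The limit of 25 is the figure quoted above and is passed as a parameter; `write_group` is a hypothetical callback standing in for whatever transactional write call you use, not a real SDK function.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 25) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def apply_in_groups(
    items: Sequence[T],
    write_group: Callable[[List[T]], None],
    size: int = 25,
) -> Tuple[List[List[T]], List[List[T]]]:
    """Write each group separately and report which groups failed.

    Each group may be atomic on its own, but the operation as a whole is not:
    groups that already succeeded stay written when a later one fails.
    """
    succeeded, failed = [], []
    for group in chunked(items, size):
        try:
            write_group(group)
            succeeded.append(group)
        except Exception:
            failed.append(group)
    return succeeded, failed
```

The `failed` list is exactly the problem: the caller has to retry or compensate for those groups, because nothing rolls back the ones that went through.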

You can get many of the benefits of dynamo (sans auto-sharding), by applying
its elegant indexing strategy to an sql database. It will be as fast or
faster, your transactions can be as big as you need them to be, and you retain
the ability to occasionally fire off un-indexed ad hoc queries for development
or convenience. Running and scaling an sql db is also fairly painless these
days with options like aurora.
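
As a rough illustration of that idea, here is a sketch using SQLite from the Python standard library. The table layout, the `pk`/`sk` column names, and the `USER#`/`ORDER#` key prefixes are assumptions chosen for the example, not something prescribed by DynamoDB or by any particular SQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table keyed by (partition key, sort key), mimicking the
# single-table layout; the composite primary key doubles as the index.
conn.execute(
    """
    CREATE TABLE items (
        pk   TEXT NOT NULL,
        sk   TEXT NOT NULL,
        data TEXT,
        PRIMARY KEY (pk, sk)
    )
    """
)

rows = [
    ("USER#alice", "ORDER#2020-04-01", "keyboard"),
    ("USER#alice", "ORDER#2020-05-12", "monitor"),
    ("USER#alice", "PROFILE", "alice@example.com"),
    ("USER#bob", "ORDER#2020-03-09", "mouse"),
]
with conn:  # one transaction, however many rows
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)


def query(pk, sk_prefix=""):
    """Return (sk, data) for one partition, optionally limited to a sort-key
    prefix, ordered by sort key."""
    cur = conn.execute(
        "SELECT sk, data FROM items "
        "WHERE pk = ? AND sk >= ? AND sk < ? ORDER BY sk",
        (pk, sk_prefix, sk_prefix + "\uffff"),
    )
    return cur.fetchall()


def delete_partition(pk):
    """Delete everything under one partition key in a single transaction."""
    with conn:
        return conn.execute("DELETE FROM items WHERE pk = ?", (pk,)).rowcount
```

`query("USER#alice", "ORDER#")` reads like a key-condition lookup, while `delete_partition` shows the part the key-value store does not give you: one transaction of whatever size the cascade needs.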

~~~
zemo
> I think most apps will need a transaction involving more than 25 rows at
> some point

I ... can't think of a single time I've ever needed this.

~~~
danenania
A common one is cascading deletes when you delete a user or a ‘project’ or
something else that has a lot of stuff associated with it. Those will exceed
25 rows very quickly. Also any kind of bulk update or data import... hell even
just initializing a new account can easily require writing more than 25 rows
in a moderately complex app.

~~~
the_reformation
Still, why the need to be transactional? An eventually consistent delete seems
fine here.

~~~
danenania
Maybe it is, or maybe some of the rows are involved in a security check. Or
could cause race conditions if out of sync. Or otherwise need immediate read-
after-write consistency.

And eventually consistent isn't the worst case scenario. Being unable to
rollback correctly from an error could mean you'll _never_ end up in a
consistent state... that's a lot worse than "eventually".

------
pier25
For my current serverless project I'm using Fauna which I think is a better
option than Dynamo. You get relations, complex queries, etc. You also get
authentication and authorization baked-in.

I haven't done any serious tests but I'd say on average my reads to Fauna from
Cloudflare workers are 30ms. Seems a lot compared to querying a local instance
of Postgres but since Fauna is distributed you end up getting much better
latency on average for your worldwide users compared to a single DB in us-
east-1.

Writes take longer (probably around 200-300ms on average) but considering
these are replicated to all Fauna servers with ACID I'm ok with that.

I wrote a little intro to Fauna's query language which is very powerful if
anyone is interested:

[https://github.com/PierBover/getting-started-fauna-db-fql](https://github.com/PierBover/getting-started-fauna-db-fql)

------
seibelj
DynamoDB is monster scale but... tricky to use and difficult pricing model.
The paying for writers / readers thing is strange to me and makes it difficult
to scale up for bursts. I recommend not using this tech for most things. You
need to know exactly why you want to use it and have a good reason.

~~~
maerF0x0
> makes it difficult to scale up for bursts

Can you tell me why the On Demand mode doesn't work for you?

~~~
arpinum
7x the cost. I find it interesting that the DynamoDB cheer squad points out
most databases only run at 10-15% utilisation and are burning money every
hour. In the next breath they suggest running on demand "till it hurts" and
paying AWS as if they were running at 15% utilisation.
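
The two numbers line up, as a quick sketch shows. It takes the 7x figure above at face value (it is this comment's number, not a verified price) and assumes provisioned capacity is billed in full whether or not it is used. The function names are made up for illustration.

```python
def breakeven_utilisation(on_demand_price_ratio):
    """Utilisation below which on-demand is the cheaper mode.

    on_demand_price_ratio: price of an on-demand unit divided by the price of
    the same unit of provisioned capacity.

    Provisioned capacity costs p per unit but only a fraction u is used, so
    the effective price per used unit is p / u. On-demand costs ratio * p.
    On-demand wins when ratio * p < p / u, i.e. when u < 1 / ratio.
    """
    if on_demand_price_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / on_demand_price_ratio


def cheaper_mode(utilisation, on_demand_price_ratio=7.0):
    """Pick the cheaper billing mode for a given average utilisation."""
    if utilisation < breakeven_utilisation(on_demand_price_ratio):
        return "on-demand"
    return "provisioned"
```

At a 7x premium the break-even point is about 14% utilisation, so on-demand costs roughly what a provisioned table running at 15% utilisation would.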

~~~
abd12
I recommend On-Demand pricing 'until it hurts'[0], but that's because a ton of
people I talk to are spending <$50/month on DynamoDB. At that point, it really
doesn't make sense to spend hours of time optimizing your DynamoDB bill.

If you are at the point where you are spending thousands of dollars a
month on DynamoDB, then it does make sense to review your usage, fine-tune
your capacity, set up auto-scaling, buy reserved capacity, etc. But don't
waste your time doing that to save $14 a month. There are better things to do.

But it's really nice to have a database where you can set up pay-per-use,
don't have to think about exhausting your resources, _and_ have an option to
back out into a cheaper billing mode if it does get expensive.

[0] - Hat tip to Jared Short for this advice & phrase

------
Nican
> While your relational database queries slow down as your data grows,
> DynamoDB keeps on going. It is designed to handle large, complex workloads
> without melting down.

I mean- hand a person a gun, and they might shoot themselves in the foot.
While you can make bad queries/workloads for a relational database, you can
just as easily make bad workloads for DynamoDB.

~~~
abd12
My contention is that it's _much_ easier to have an access pattern that won't
scale in a relational database than in DynamoDB. DynamoDB basically removes
all the things that can prevent you from scaling (JOINs, large aggregations,
unbounded queries, fuzzy-search).

This is underrated, but it's really helpful. So many times w/ a relational
database, I've had to tweak queries or access patterns over time as response
times degrade. DynamoDB basically doesn't have that unless you really screw
something up.

~~~
jeremyjh
So what is the cost of doing a bit of query tuning and de-norming every now
and then compared to the development costs imposed by DynamoDB?

~~~
abd12
It depends!

For me, I like that 98% of DynamoDB work is frontloaded. I spend the time
building the model but once it's done -- set it and forget it.

With RDBMS, it's like there's a hidden 5% tax that's lurking at all times. You
have to spend time tuning queries, reshaping data, changing patterns, etc. It
can add up to significant drag over time.

Different teams might think the costs are different for their application, or
they may be fine with one pattern over the other. Fine with me! I just know
which one I choose now :)

------
the_arun
What I like in DDB is TTL. It is a fantastic feature. I read someone comparing
it with Redis. Redis is faster in part because clients talk to it directly over
a persistent TCP connection, whereas DDB requests go over HTTP.
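
For anyone who hasn't used it: TTL works off a numeric attribute holding a Unix epoch timestamp in seconds, and expired items are removed in the background rather than at the exact second. A minimal sketch of stamping an item follows; the attribute name `expires_at` and both helper names are assumptions for the example (the attribute is whatever you configure on the table).

```python
import time


def with_ttl(item, ttl_seconds, now=None, attribute="expires_at"):
    """Return a copy of `item` carrying an expiry time in epoch seconds."""
    now = time.time() if now is None else now
    stamped = dict(item)
    stamped[attribute] = int(now) + int(ttl_seconds)
    return stamped


def is_expired(item, now=None, attribute="expires_at"):
    """Client-side check, useful because deletion is not instantaneous."""
    now = time.time() if now is None else now
    return attribute in item and item[attribute] <= int(now)


# Hypothetical session record that should disappear after an hour.
session = with_ttl({"pk": "SESSION#abc123"}, ttl_seconds=3600, now=1_600_000_000)
```

Since removal lags behind the timestamp, reads that care about expiry usually still filter with something like `is_expired`.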

------
raynguyen
This looks like a great resource. One thing I'm struggling with is the ability
to sort and filter and was wondering if the book goes into detail about this
topic.

If I have a person entity and its attributes listed out in a table. How would
you go about sorting by first name, last name, created at, etc... I was
thinking of streaming everything over to elastic search, but that would add
extra complexity to maintain.

~~~
abd12
Yep! There are entire chapters on sorting & filtering. Note: it's different
than in a relational database, but it's doable :)
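
To give a flavour of the difference, here is an in-memory emulation (not DynamoDB code, and not taken from the book): results come back ordered by a sort key within a partition, so each attribute you want to sort by needs its own index. The `people` records, the single `"PERSON"` partition, and `build_index` are all made up for illustration.

```python
from collections import defaultdict

people = [
    {"id": "p1", "first_name": "Grace", "last_name": "Hopper", "created_at": "2020-03-01"},
    {"id": "p2", "first_name": "Alan", "last_name": "Turing", "created_at": "2020-01-15"},
    {"id": "p3", "first_name": "Edsger", "last_name": "Dijkstra", "created_at": "2020-02-10"},
]


def build_index(items, partition_key, sort_key):
    """Group items by partition key and keep each group ordered by sort key,
    which is roughly what a table or secondary index maintains for you."""
    index = defaultdict(list)
    for item in items:
        index[partition_key(item)].append(item)
    for group in index.values():
        group.sort(key=sort_key)
    return index


# One index per sort order; every person lives in the same partition here,
# which is a simplification that would not suit a large dataset.
by_last_name = build_index(people, lambda p: "PERSON", lambda p: p["last_name"])
by_created_at = build_index(people, lambda p: "PERSON", lambda p: p["created_at"])

last_names = [p["last_name"] for p in by_last_name["PERSON"]]
newest_first = [p["id"] for p in reversed(by_created_at["PERSON"])]
```

The ordering is decided when the data is written into each index, not at query time, which is why the access patterns have to be known up front.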

~~~
raynguyen
Awesome! Glad to hear that there's a section on that. Quick question. I'm
thinking of leveraging elasticsearch for the fulltext search capabilities. Is
the work to get sorting on various different attributes heavy from a dev
perspective and are there any advantages of doing it through dynamo rather than
querying with elasticsearch?

------
siscia
I work a little outside the standard startup hyper-scale, fast growing
business, so forgive my question.

But how widely used is DynamoDB? And for what use cases?

And what are the problems with it?

~~~
abd12
In a nutshell:

\- It was designed for super high scale use cases (think Amazon.com retail on
Cyber Monday). It has decent adoption there. Competes mostly with Cassandra or
other similar tools.

\- With the introduction of AWS Lambda, it got more adoption in the
'serverless' ecosystem because of how well its connection model, provisioning
model, and billing model works with Lambda. RDBMS doesn't work as well here.

A lot of people find 'problems' with it because they try to use it like a
relational database, which it most certainly isn't. You have to model
differently and think about it differently. The book helps here :).

------
albatross13
Well I'm a sucker for this kind of stuff- how do the videos work in the
premium package? Do I get to download them for offline viewing?

------
agustif
Can some knowledge be transferred to other NoSQL flavours like mongo or is the
book heavily specific about DynamoDB?

~~~
abd12
All the examples are specific to DynamoDB and use DynamoDB features.

That said, the principles apply pretty well to other popular NoSQL databases,
especially MongoDB and Cassandra. There will be some slight differences --
MongoDB allows better nesting and querying on nested objects -- but it's
broadly the same. If you want to model NoSQL for scale, you need to use these
general patterns.

If you want to check it out but find out it doesn't work for you, just let me
know. I've got a 100% money-back guarantee with no questions asked if you
don't like it.

------
bangbig
Wonder if anyone agrees that Uber's order processing can be handled by
DynamoDB very well.

------
timebomb0
The book looks great, but being a startup, the price is hard to swallow for
20+ engineers.

~~~
abd12
Email me, and I'm happy to discuss :). alex@alexdebrie.com

------
haolez
Does anyone have book recommendations on NoSQL modeling in general?

~~~
databrecht
Tbh I don't think that makes sense since it depends on what your definition of
NoSQL is. Some people say 'no relations' others say 'no sql' others say
'eventual consistency'. Some people call FaunaDB NoSQL because it's
distributed and scales yet it offers strong consistency and relations and
hence normalized data and joins are an option.

In others, you might have relations but lose consistency, in others you might
have relations but only keep consistency under specific conditions (sharding
keys etc)

NoSQL modeling typically depends on the specific characteristics of the
database. Essentially it's about looking at these, seeing what it doesn't offer,
comparing that with what you need, and finding workarounds.

------
Niccizero
$79 for the basic package? A bit pricey if you ask me.

~~~
cityzen
If your rate is low enough that you can learn everything that is in this book
for $80 worth of your time, then sure. Price is relative, it's not like he's
selling prescription drugs for $1000 per pill.

I bought it and have found it to be completely worth the money. I don't look
at prices for these things in relation to how much other books cost but how
much time it will save me.

~~~
znpy
Yeah I think that the medical comparison is a good idea.

We tend to criticize people for asking a decent amount of money in our industry
whereas people in other industries shamelessly ask for ludicrous amounts of
money for pretty much anything (think medical or legal).

------
djstein
thanks for this. I just started creating my first DynamoDB database yesterday

~~~
abd12
Awesome! Hit me up if you have any questions :)

