A Decade of Dynamo (allthingsdistributed.com)
107 points by werner on Oct 2, 2017 | 54 comments



DynamoDB is amazing for the right applications if you very carefully understand its limitations.

Last year I built out a recommendation engine for my company; it worked well, but we wanted to make it real-time (a user would get recommendations from actions they made seconds ago, instead of hours or days ago). I planned a 4-6 week project to implement this and put it into production. Long story short: I learned about DynamoDB and built it out in a day of dev time (start to finish, including learning the ins and outs of DynamoDB). The whole project was in stable production within a week. There has been zero down time, the app has seamlessly scaled up ~10x with consistently low latency, and it all costs virtually (relatively) nothing.


This is the good side of Dynamo, and it's awesome that you've had that experience.

The flip side: Dynamo gets expensive, and it gets expensive quickly, and its custom API (and, indeed, a very different way of thinking about datastores) makes migration difficult.

It's great to use, if you understand the tradeoffs. Just make sure you understand them before you make the leap.


>Dynamo gets expensive, and it gets expensive quickly,

DynamoDB's pricing scales sublinearly with volume; if it starts getting expensive, that's an initial misuse of DynamoDB becoming obvious at scale. There are a lot of factors that go into whether you should use DynamoDB and how you implement it. I recommend anyone who is considering it very carefully understand this page first: http://docs.aws.amazon.com/amazondynamodb/latest/developergu...


This is how enterprise developers use the database, sometimes:

https://thedailywtf.com/articles/The-Query-of-Despair

Do this on your own server and the result is slowness and bad performance (though the query may be called rarely, or never). Do it on Dynamo and the result may be a $10k bill.


That's not quite true. With DynamoDB, you provision capacity, so by default, it will simply get slow (throttled) if you use more than you expect.

The only way you could be "surprised" with a $10k bill is if you set up autoscaling for it with your upper limits (which it requires you to choose) high enough to reach $10k. And then you'd have to forget that you did that.
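
To make that concrete, here's a minimal boto3 sketch (the table name and capacity numbers are hypothetical): capacity is fixed up front, and requests beyond it are rejected with a throttling error rather than billed.

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")

    # Capacity (and therefore cost) is capped up front.
    dynamodb.update_table(
        TableName="recommendations",
        ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
    )

    try:
        dynamodb.get_item(
            TableName="recommendations",
            Key={"user_id": {"S": "user-123"}},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            pass  # throttled, not billed extra: back off and retry
        else:
            raise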


Formatting is obviously an issue, but the query length isn't a sign of bad programming. I've created queries of probably nearly that length. They filter, join and prepare data in ways that would need as many LOC in other languages while running much faster (as the database server can better optimise for the structures). In this case there obviously was a problem, but it could've been that someone changed an index on one table, which then affected the query.

But I agree, I also don't like running databases billed by load. The risk of costly bugs is just too high.


On the flip side, though, Dynamo can be cheap for small apps. The cost to provision an RDS instance with backup can easily be 10x to 20x what the same load would cost on Dynamo.

Of course, as you scale this can be less true, so it's all in the application.


I was at AWS from 2008 to 2016. Werner Vogels, Amazon's CTO (yep, not just AWS', but Amazon's, as he had to point out numerous times) has been one of the most talented, humble and generous senior execs I've ever met in my life.

I have lots of good memories of time spent with him; losing that was one of the sad aspects of leaving Amazon for me.

His blog writings are really interesting. If you haven't already, I suggest you search the archives; there are several hidden gems there.


I have always tremendously admired that guy; what a job he has! He is literally responsible for about half of the shit on the internet not going down (and also running the biggest internet shopping mall and making sure a bunch of crappy speakers can talk, but both of those are straightforward by comparison IMO).

Can you even imagine? I want to know what his direct report structure looks like.


He is an individual contributor


Interesting contrast to YC's path post from the other day ("Founder, Executive, Individual Contributor").


true luxury


I'm more interested in solutions like Spanner and Cockroach. Different tradeoffs for different applications, but they seem to be the most general purpose of the highly scalable databases. DynamoDB is cool and I've tried to adopt it for things, but it's surprisingly hard to imagine an application where the model isn't somewhat limiting. The capacity provisioning is also quite painful, which doesn't help matters any.


The databases you mentioned both have strong consistency, but neither has a serverless pricing model. My employer's product, FaunaDB, has a similar consistency model but a pay-as-you-go model that requires no provisioning or capacity planning.

You can read more about our ACID transactions here: https://fauna.com/blog/consistent-transactions-in-a-globally...


Pretty cool. Of course, the serverless version doesn't have too many regions yet, so some of the advantages of strong global consistency may be less useful. But I'll keep my eye on this regardless.


I think it is worth noting that there is a difference between Dynamo and DynamoDB. One is a powerhouse academic publication that shook up modern computer science; the other is an enterprise tech product based upon Dynamo.

It is the 10th anniversary of Dynamo as a CS milestone.


I understand that, I'm mostly just replying to the "As we say at Amazon, it's just day one for DynamoDB" line. I do respect that Dynamo is certainly an achievement in thinking outside the box. But as for DynamoDB's place in the future of databases, I'm betting against it due to the lack of versatility for workloads that aren't purely non relational. Can't even do log style stuff due to the way sharding works :(


I'm also excited for Spanner and Cockroach, but even at a high level, DynamoDB is essentially requiring you to design how you're storing your data based on how you will access it, how it will be sharded, and how it will be pruned. That makes a lot of use cases tricky, but the end result is that if you design right, it will scale linearly quite well.

It feels like using a relational model (with SQL) is fundamentally different in that you're designing how you're storing your data based on how it relates to other data. It's certainly a whole lot easier to design for, but it makes me wonder if it is possible to close that gap in terms of "easy to design / SQL / vertically scalable" and "serverless / cheap / horizontally scalable".
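
As a rough sketch of what designing for access patterns looks like (all names here are hypothetical, boto3 used for illustration): the key schema is chosen so the one query you need is a single partition lookup, and sharding falls out of the partition key.

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")

    # The partition (HASH) key decides sharding; the sort (RANGE) key
    # decides the only ordering you can query efficiently.
    table = dynamodb.create_table(
        TableName="user_events",
        KeySchema=[
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "event_ts", "KeyType": "RANGE"},
        ],
        AttributeDefinitions=[
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "event_ts", "AttributeType": "N"},
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
    table.wait_until_exists()

    # The access path you designed for is cheap; anything else is a Scan.
    resp = table.query(
        KeyConditionExpression=Key("user_id").eq("u-123") & Key("event_ts").gt(0)
    )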


Capacity provisioning was the most annoying aspect of using DynamoDB when I last worked with it, but in the last couple of months they've added autoscaling[1] which would seem to address that major bugbear (with the caveat that I haven't had reason to test it out yet ;)

[1]: https://aws.amazon.com/blogs/aws/new-auto-scaling-for-amazon...
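
For reference, the setup goes through the Application Auto Scaling API rather than DynamoDB itself. A rough boto3 sketch, with a hypothetical table name and limits (IAM wiring omitted):

    import boto3

    autoscaling = boto3.client("application-autoscaling")

    # MaxCapacity is the knob that bounds your bill.
    autoscaling.register_scalable_target(
        ServiceNamespace="dynamodb",
        ResourceId="table/user_events",
        ScalableDimension="dynamodb:table:ReadCapacityUnits",
        MinCapacity=5,
        MaxCapacity=500,
    )

    autoscaling.put_scaling_policy(
        PolicyName="read-target-tracking",
        ServiceNamespace="dynamodb",
        ResourceId="table/user_events",
        ScalableDimension="dynamodb:table:ReadCapacityUnits",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,  # scale to hold ~70% utilization
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    )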


I'd be interested to hear how others are handling read/write capacity configuration for Dynamo. It seems like it would be very easy to hit the account limit of 10,000 units once you are querying any significant amount of data. I've also run into issues with autoscaling, where you have to endure up to 15 minutes of downtime before the scaling kicks in [0]. Even on a table with ~2000 items I've found it becomes quite slow and costly to fetch data. Also, the 25-item limit on batch writes makes it pretty frustrating to edit/delete lots of data.

- [0] https://hackernoon.com/the-problems-with-dynamodb-auto-scali...
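
On the batch write point: boto3's batch_writer at least hides the 25-item BatchWriteItem ceiling by chunking and retrying unprocessed items for you (names below are hypothetical), though it doesn't change the underlying limit:

    import boto3

    table = boto3.resource("dynamodb").Table("user_events")

    # batch_writer splits this into 25-item BatchWriteItem calls and
    # retries any unprocessed items behind the scenes.
    with table.batch_writer() as batch:
        for ts in range(1000):
            batch.delete_item(Key={"user_id": "u-123", "event_ts": ts})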


You can request that limit be increased through the limit increase form.

Also, if you need to scan a ton of items to assemble your desired result, you should re-think using DynamoDB as a whole.


When they mention companies using DynamoDB, at least one of those actually uses its own implementation of Dynamo, written to work around cost and performance limitations.

The main problems faced are not the ability to scale or reach performance benchmarks or keep data safe. They are operational, and primarily problems of infrastructure complexity and management. Oh, and having developers architect and manage the operations of a really freaking huge service is a bad idea. (No offense intended - those developers don't want to be woken up in the middle of the night either)


Direct link to the Dynamo Whitepaper PDF: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp...


I don't know why Amazon is so taken with DynamoDB. I find it incredibly unintuitive and lacking in real-world applicability, requiring applications to perform gymnastics to work with it.


I've found just the opposite actually. While it's far from perfect, it has been amazing for rapidly standing up new apps (especially prototypes). We've used quite a few different strategies and found it to be flexible and performant.

The only downside is we do find ourselves sometimes implementing relational DB functionality at the application level to compensate for Dynamo DB's "flexibility." Postgres is still the go-to for data that is relational in nature. But man, letting Amazon worry about hosting and scaling is also pretty awesome...
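
To give a flavour of what that application-level relational work looks like (hypothetical tables and attributes), a hand-rolled "join" is typically a query followed by a lookup:

    import boto3
    from boto3.dynamodb.conditions import Key

    ddb = boto3.resource("dynamodb")
    orders = ddb.Table("orders")
    users = ddb.Table("users")

    def orders_with_user(user_id):
        # What SQL would do in one JOIN takes two round trips here.
        rows = orders.query(
            KeyConditionExpression=Key("user_id").eq(user_id)
        )["Items"]
        user = users.get_item(Key={"user_id": user_id}).get("Item", {})
        return [dict(row, user=user) for row in rows]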


>> we do find ourselves sometimes implementing relational DB functionality at the application level to compensate for Dynamo DB's "flexibility."

Yep, this is A-grade crazy, and exactly my point. I would question whether it's "sometimes" or "actually almost all the time, now that we think about it; there's not much that we CAN do with DynamoDB without writing application-level database functionality."


Prototypes are far better suited to Postgres or MySQL on RDS. When you don't know your schema or your use case up front, traditional databases are far easier to work with, since you can change them. Once you know what you're doing, scaling up works far better on something like Dynamo or Cassandra, but you will be sacrificing dev time.


Now that autoscaling is available for DynamoDB, my main complaint is the lack of an out-of-the-box backup solution that works at scale.

A production DB without backups is unthinkable. It just takes one human mistake to erase tons of data. Consistent and regular backups are a must-have for any production system.
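
For context, the usual stopgap is a paginated Scan dumped somewhere durable, which is exactly why it doesn't work at scale: it burns your provisioned read capacity the whole time it runs. A rough sketch with hypothetical names:

    import json
    import boto3

    table = boto3.resource("dynamodb").Table("user_events")

    def dump_table(path):
        kwargs = {}
        with open(path, "w") as f:
            while True:
                page = table.scan(**kwargs)  # consumes read capacity
                for item in page["Items"]:
                    f.write(json.dumps(item, default=str) + "\n")
                if "LastEvaluatedKey" not in page:
                    break
                kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]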


> The Dynamo paper was well-received and served as a catalyst to create the category of distributed database technologies commonly known today as "NoSQL."

No, sorry, it was Memcached and the Bigtable paper that popularized the "NoSQL" term. Although there have been many NoSQL databases tracing all the way back to the '60s [1], those were the ones that "served as a catalyst" for the term "NoSQL".

[1] http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.h...


Dynamo was certainly one of the products that spiked interest in the “NoSQL” datastore category.

The phrasing “served as a catalyst” seems right — it doesn’t imply the only catalyst.


Congrats to AWS on the impact DynamoDB has had on the ecosystem and industry. The article does make it seem like DynamoDB was the first to publish a unique NoSQL architecture. Is this true?


Lotus Notes was first. Distributed, replicated key value store.

Actually I'm not right: http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.h...


No, CouchDB is older than DynamoDB, and according to Wikipedia there have been NoSQL databases since the '60s.


It would be interesting to read some comparisons between DynamoDB, CosmosDB and maybe Spanner.


We need a library/SDK for a Dynamo Sync feature to allow easy development of offline mobile apps, similar in function to Cognito Sync. I also hope that AWS will release a serverless SQL DB. And cheaper pricing.


> Also hope that AWS will release a serverless SQL db.

You mean like RDS?


RDS is serverful, not serverless.


Oh that Dynamo. Not this Dynamo: http://dynamobim.org/


What is the machine in the photo?


A dynamo... ;)


amazing product


The thing doesn't even support usable cross-region replication. On top of that, the whole read/write capacity model is a joke (a painful one at that).

Other than for a dirty JS config or a prototype store, this DB is useless.


I would be very, very careful about calling anything a company of such sharp people does a "joke."

One of my prior gigs was pushing a billion data points a day through DynamoDB without it breaking a sweat. We were paying for it, too--but it was there and it worked.


While I don't think Dynamo is a joke, pushing 1bn data points per day through a database system is not much. Given you have enough storage, a PostgreSQL instance on a $50/month dedicated server can achieve that easily (from experience). You have to pay more attention to the data structure (unless you just put everything in jsonb columns) but will probably save 90% on operational costs.


Anything that can go to Dynamo can go to S3, especially at that volume. And you get proper multi-region replication, read/write capacity based on actual usage, and instant scaling.

I stand by my comment that DynamoDB is a joke wrapped in a thick layer of marketing crap.


> Anything that can go to Dynamo can go to S3, especially at that volume.

Please think very carefully before architecting your app with S3 as a makeshift database. S3 would be a valid option only if you don't care about millisecond latency; don't require safe updates; never expect your application to scale past 100 requests per second; and don't have multiple query patterns for the same data (unless you're okay with several redundant copies of the same dataset). Consider just about any other database solution if this does not hold true.
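
One concrete example of "safe updates" that S3 won't give you: a conditional write. A minimal sketch (names are hypothetical):

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("accounts")

    try:
        # Two racing writers can't silently clobber each other here,
        # unlike a blind S3 PUT to the same key.
        table.put_item(
            Item={"account_id": "acct-42", "balance": 0},
            ConditionExpression="attribute_not_exists(account_id)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            pass  # someone else created it first
        else:
            raise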


I am going to say that I did this, and will add "and are willing to pay an insane amount of money to store and manipulate your data" (this was a mistake that cost me at least one, if not two, hundred thousand dollars).


Could you explain more please about what you mean?


If you think you can replace Dynamo with S3, your application didn't need Dynamo in the first place. There appears to be some severe Dunning-Kruger at work here.


You cannot replace DynamoDB with S3. S3 cannot perform atomic and strongly consistent operations.

Edit: as other commenters have noted, you do get read-after-write consistency on a new key.


S3 is atomic, but you are correct that it is not strongly consistent.


I am pretty sure S3 is atomic. That is, you can't get a transitory state: your object either was PUT/updated or it wasn't. And new objects get read-after-write consistency.


Only if you haven't already asked for that new key. :)


I don't know if your comment was just a troll, but we've built serious apps that use DynamoDB as our store. It's been exactly what we needed, and we've got millions and millions of records across quite a few tables. There are certainly pros and cons to each database, but Dynamo can really shine if you take the time to understand it.



