Dynamo, Citus, and Tradeoffs in Distributed Databases

gdecandia · on May 12, 2017

Hi Folks, this is Pino (formally Giuseppe) de Candia. I was the lead developer on Dynamo. I'm happy to answer any questions/comments.

brandur · on May 12, 2017

Hi Pino, you mentioned in the article that Dynamo was built in response to an internal demand. Can you talk at all about the specifics of what sort of information Amazon puts in there?

I've overseen a few big Dynamo installations at this point, but for the most part I wouldn't store any "core" information in there (say like a customer, merchant, or product at Amazon) because after adding data to Dynamo, your ability to query and restructure it is quite limited. We've generally put data in that's append-only, is considerable in volume, and which we only ever need to retrieve in a few particular ways. To go back to the Amazon example, I probably wouldn't put a customer, merchant, or product in it, but I might put in a log entry for a purchase transaction.

What I'm trying to suss out is whether the sweet spot for a product like Citus is more akin to a Dynamo, or more akin to a traditional RDBMS.

gdecandia · on May 12, 2017

Hi Brandur, I'll answer your final question first, then circle back. CitusDB is definitely more akin to a traditional RDBMS than to Dynamo because it keeps almost all the goodness of relational databases while providing the ability to scale horizontally via clever partitioning, usually on the tenant and some other primary key (for the multi-tenant, B2B use-cases).
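As a rough illustration of partitioning on the tenant, here is a minimal Python sketch of hash-based shard routing. It is not how Citus is implemented; `shard_for`, `ShardedTable`, and the in-memory dict shards are hypothetical names invented for this example. The point is only that every row for a given tenant lands on the same shard, so per-tenant queries touch one shard.

```python
import hashlib


def shard_for(tenant_id: str, num_shards: int) -> int:
    """Map a tenant to a shard index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedTable:
    """Toy table whose rows are spread across in-memory 'shards' by tenant."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]

    def insert(self, tenant_id: str, row_id: int, row: dict) -> None:
        # The tenant picks the shard; (tenant_id, row_id) is the key within it.
        shard = self.shards[shard_for(tenant_id, self.num_shards)]
        shard[(tenant_id, row_id)] = row

    def rows_for_tenant(self, tenant_id: str) -> list:
        # Only one shard has to be consulted for a single-tenant query.
        shard = self.shards[shard_for(tenant_id, self.num_shards)]
        return [row for (tenant, _), row in shard.items() if tenant == tenant_id]
```

A real system also has to handle resharding, cross-tenant queries, and replication, none of which this sketch attempts.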

In contrast, Dynamo sacrificed all sorts of guarantees that relational database clients had relied on for decades - partly justifying your comments.

I left Amazon in 2009 so I can't speak to how Dynamo is used internally today (and AFAIK AWS DynamoDB is based on a different design). However, while I was at Amazon the shopping cart data was migrated to Dynamo. That was definitely "core" to the business in the sense that it was a key part of the user's experience and we couldn't lose the contents of a single shopping cart - it's a bad user experience. And for that use-case we could change the per-cart schema on write (with additional scans to clean up carts that were written to old schema versions).

sandGorgon · on May 12, 2017

> And for that use-case we could change the per-cart schema on write (with additional scans to clean up carts that were written to old schema versions).

Why would the cart schema change on a per-cart basis? I'm kind of wondering what use case this would solve.

dantiberian · on May 13, 2017

I think the author means that different versions of the software would write different schema versions, and it could be migrated in the background?
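If that reading is right, the pattern might look roughly like the Python sketch below: each cart carries a version, writers upgrade a cart when they touch it, and a background scan upgrades the carts nobody has touched. Everything here is an assumption made for illustration, including the `version` field, the v1 and v2 layouts, and the function names; none of it comes from Dynamo or Amazon's cart service.

```python
CURRENT_VERSION = 2


def upgrade(cart: dict) -> dict:
    """Return a copy of the cart in the current (hypothetical) schema."""
    cart = dict(cart)
    if cart.get("version", 1) == 1:
        # Hypothetical v1 -> v2 change: a list of SKUs becomes SKU -> quantity.
        quantities = {}
        for sku in cart.get("items", []):
            quantities[sku] = quantities.get(sku, 0) + 1
        cart["items"] = quantities
        cart["version"] = CURRENT_VERSION
    else:
        # Copy so callers never mutate the stored record in place.
        cart["items"] = dict(cart["items"])
    return cart


def add_item(store: dict, cart_id: str, sku: str) -> None:
    """Migrate on write: any cart that gets written is saved in the new schema."""
    existing = store.get(cart_id, {"version": CURRENT_VERSION, "items": {}})
    cart = upgrade(existing)
    cart["items"][sku] = cart["items"].get(sku, 0) + 1
    store[cart_id] = cart


def background_scan(store: dict) -> int:
    """Upgrade carts still in an old schema; return how many were migrated."""
    migrated = 0
    for cart_id, cart in list(store.items()):
        if cart.get("version", 1) != CURRENT_VERSION:
            store[cart_id] = upgrade(cart)
            migrated += 1
    return migrated
```

Here `store` is a plain dict standing in for the key-value store. Once the scan reports zero migrations, readers no longer need to understand the old layout.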

brandur · on May 12, 2017

Hah, thanks for being gentle on my Dynamo versus DynamoDB faux pas — that is some seriously confusing naming.

Very interesting though. Thanks.

novembermike · on May 12, 2017

Dynamo != DynamoDB.

WaxProlix · on May 12, 2017

Why no null or zero-length values?

gdecandia · on May 12, 2017

Frankly, it's been a while since I worked on Dynamo so I can't say for sure. I don't see a fundamental reason not to support null and zero-length. On the other hand it seems more like a convenience than a necessity, since you can encode those.

You may have different use-cases in mind, but it seems that if you're writing nulls you're using the key-value store to distinguish between keys that do or don't exist - basically storing a large set rather than a large map. And in that case you can write a small value to encode null and you won't change the value (rather you will delete the key-value entry).
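To make "you can encode those" concrete, here is a minimal Python sketch that layers null and zero-length values on top of a store that rejects them. `StrictStore` is a hypothetical stand-in, not Dynamo's or DynamoDB's actual API, and the one-byte tag scheme is just one possible encoding.

```python
_NULL_TAG = b"\x00"   # stored bytes for a null value
_BYTES_TAG = b"\x01"  # prefix for real byte strings, including empty ones
MISSING = object()    # sentinel for "key does not exist"


class StrictStore:
    """Hypothetical store that refuses None and zero-length values."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, raw: bytes) -> None:
        if not raw:
            raise ValueError("null and zero-length values are not allowed")
        self._data[key] = raw

    def get(self, key: str):
        return self._data.get(key)  # None when the key is absent


def encode(value) -> bytes:
    """Tag the value so that the stored bytes are never empty."""
    return _NULL_TAG if value is None else _BYTES_TAG + value


def decode(raw: bytes):
    return None if raw[:1] == _NULL_TAG else raw[1:]


def put(store: StrictStore, key: str, value) -> None:
    store.put(key, encode(value))


def get(store: StrictStore, key: str):
    raw = store.get(key)
    return MISSING if raw is None else decode(raw)
```

With this wrapper a caller can tell apart a stored null, a stored empty byte string, and a missing key, at the cost of one extra byte per value and a convention every client has to share.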

I think this takes us to an API design discussion - I don't feel strongly about it. But I'd love to hear if that caused you significant burden or trouble in your application.

WaxProlix · on May 12, 2017

To be clear, I'm talking about nulls and zero-length strings/bytearrays as values, not as keys. I suppose it does boil down to API design though, yeah.

I guess I'm saying: Querying for something and getting null, "", or bytes("") has a different meaning and - crucially - a different type than undefined or DNE. Supporting these just seems like such a straightforward win in terms of language interop and usability that I was curious what the constraints and such were that caused the current state of affairs. If it's been a while, I certainly understand. Just something that I've wondered for a bit.

Thanks for your time and response either way.

z0r · on May 12, 2017

The author of the blog post commented here, but it appears to have been made dead. Was it possibly killed by a spam filter because it is their first post in 3 years of having a hacker news account?

dang · on May 12, 2017

Yes, but fortunately a user vouched for it. You can do that too: when you see a dead comment that shouldn't be dead, click on its timestamp to go to its page, then click 'vouch'. (There's a small karma threshold, currently 30, before such links appear.)

In the meantime we've marked Pino's account as legit so this won't happen again.

WaxProlix · on May 12, 2017

Are there repercussions or actions taken for people who vouch for trash in this way?

dang · on May 12, 2017

Yes. The idea is to vouch for good comments that shouldn't be dead. If people vouch for bad comments that should be dead (i.e. that violate the HN guidelines), eventually their vouches won't count. We err on the side of being generous about this, since obviously it's a matter of interpretation. But lines do need to be drawn.

In practice the community is surprisingly good at it. Of all the experiments we've tried, this one exceeded expectations the most, and we believe it has affected HN for the better in both obvious and subtle ways.

cookiecaper · on May 12, 2017

When this came out, they did say they would stop respecting vouches from people who abused them. I assume there's some algorithm that calculates the trustworthiness of each vouch.

The first several comments I vouched for were instantly undeadened. My vouches no longer have that effect (or, potentially, any effect). Either this is based on frequency and too many vouches dilute the value of your vouch, or a moderator disagreed with one or more of my vouches and manually decreased/disabled it.

I would usually leave a comment explaining that I vouched and why. I think the last vouch that worked had an uncivil comment in the first sentence, but no one else raised the point he made, which I thought was critical to the discussion, so I replied saying that I vouched despite the uncivil sentence because I thought it was an important part of the discussion (and it did launch a lengthy subthread). [IIRC. Google can't seem to find that thread anymore, so maybe it got manually re-killed.]

On people whose comments didn't appear to violate the rules but were shadowbanned anyway, I'd vouch for them and usually tell them they were shadowbanned. Example at https://news.ycombinator.com/item?id=13382934 . I can only imagine that HN doesn't like it when someone tells a shadowbanned person that they've been shadowbanned, but I can't help but feel sad when I see someone putting out real, heartfelt comments without the awareness that they're not going anywhere.

I also do this out of sympathy, because I imagine that one day, I will also be shadowbanned for violating the SV orthodoxy (HN has previously "slowed" my account, imposing a [long] artificial lag time on each page; this was in place for years until I emailed requesting its removal, and the mods graciously agreed).

A few weeks ago I came across an account that had existed for several years and had been shadowbanned about six months back, making mostly-relevant and applicable comments the whole time, so HN doesn't seem to consider account track record if something goes beyond the pale. See https://news.ycombinator.com/item?id=14230352. I didn't end up vouching for his comment because it does run off into unnecessarily controversial territory, but at this point why not? I should just vouch for practically everything I don't like seeing deadened because it doesn't appear to matter anymore anyway.

If you don't want to lose your vouch, it seems that the safe route is to vouch only for obvious cases of mis-applied spam filters, like this one.

I think the vouch system may work better if each user is given an absolute number of tokens, e.g., one vouch per calendar year for each year of account age. That would discourage "excessive" vouching. It would also be better to call it "vote to unspam" if the sole intent is to bypass the autospam filter.

dang · on May 12, 2017

> I will also be shadowbanned for violating the SV orthodoxy

I'm sure you didn't mean it this way, but consider what morons we'd be if that were true. We don't ban people for such a thing, or know (or care) what it is.

I've restored your vouching privileges, but you probably overdid it at some point. Also, please try to keep the meta-commentary to a minimum? The reason we don't like people saying "@throwaway993 you are shadowbanned" is because of what it does to signal/noise ratio—that's one reason we created vouching in the first place. Meta is like styrofoam packing nuggets, if they grew like the Ghostbusters marshmallow.

cookiecaper · on May 12, 2017

>I've restored your vouching privileges, but you probably overdid it at some point.

Are there guidelines on the intended use that I missed? I pretty much just saw "Don't abuse it", which I didn't feel like I was doing. I also assumed that it worked similarly to flagging and voting (weighted against other voters), and would've been more cautious if I knew there was normally a 1:1 between vouch and undeadening, which seems to be the case.

I appreciate the restoration, and I'm sure you're right that I overdid it. I will be judicious with it moving forward. Thanks for the restore.

>Also, please try to keep the meta-commentary to a minimum? The reason we don't like people saying "@throwaway993 you are shadowbanned" is because of what it does to signal/noise ratio—that's one reason we created vouching in the first place. Meta is like those styrofoam packing nuggets if they grew like the Ghostbusters marshmallow.

OK, I understand that position. And it is nice to know that "improper shadowban" is among the signals received by moderators when a vouch is made. I have other thoughts, but in the interest of keeping meta-discussion minimized, I'll leave it there. ;)