Blueprint for a distributed multi-region IAM with Go and CockroachDB

nosequel · on Aug 8, 2023

This isn't the typical 1000 word, "here's how we did it, now use our thing" company fluff blog post. What a great writeup. Sometimes reading docs, it is hard to figure out the fine details when making a decision. Your comparison of Regional Tables, Regional By Row Tables, and Global Tables is a really nice summary of the pros & cons of each. Well done.

aeneas_ory · on Aug 8, 2023

Thank you, I appreciate that feedback because this was the explicit goal of writing the article: Informing that multi region is no longer just a vision for companies that aren’t Google; Sharing how difficult it is; And some of the learnings made along the way!

Personally, I am extremely proud of the work. I believe that in a year or two, most companies will adopt multi region IAM (hopefully from Ory as we’re currently the only ones capable of this). :)

And what could be better than hearing these kind words from the critical readers on HN :)

Cheers!

leoqa · on Aug 8, 2023

I suspect most business logic can handle 25ms for authz and that’s the right trade off. I think Google’s Zanzibar is also centralized but leverages extreme caching to get lower latencies?

I work on an IAM system that is sub-ms p99 for our authz checks, with policies and keys pushed to each network edge instead of running a centralized system. The biggest perf hits are crypto verification and logging to the fs. We fail-closed to last known policy state when we have partitions, data loss would imply the application service or datastore proxy is lost. We measure policy deploy times in minutes though, and it’s eventually consistent.

aeneas_ory · on Aug 8, 2023

One of Ory’s core competencies is permissions. We built the first Google Zanzibar implementation in the world and it’s part of Ory Network‘s global multi-region platform (https://github.com/ory/keto)

A push model is also valid if you’re heavy on policies and can accept eventual consistency. We will investigate how to generally push things to the edge (like we did with Ory Edge Sessions) or to cryptographic verification wherever staleness is acceptable.

By solving the primitives correctly in the beginning (with a multi region architecture) that job does become a lot easier, which is what we decided doing at Ory :)

leoqa · on Aug 8, 2023

My intuition is that offering this as a service you’re targeting business logic that can handle 25ms authz. We’re on the core path in a latency sensitive industry, and end up running many permissions checks at various layers for a single api call.

aeneas_ory · on Aug 8, 2023

Absolutely, having P99 of sub-ms is of course way more attractive than 25ms - with a SaaS offering you always have the network latency to the provider in the path, which is why multi region capabilities are so important for this case. But you’ll never beat systems where the decision can be made locally.

Have you any documentation on your approach publicly available? I‘d love to get some education and insights from other large scale authz systems! We have a couple of ideas such as running a local replica in our customer’s stack but nothing concrete yet.

leoqa · on Aug 8, 2023

It’s quite similar mechanically to this blog post about Uber’s policy framework: https://www.uber.com/blog/attribute-based-access-control-at-....

We have an additional scaling dimension though, as our permission model is richer and mutable by end users, therefore our policies are not uniform. For special hot-path services, we use symmetric keys to reduce latency further but that makes rotations complicated.

aeneas_ory · on Aug 8, 2023

Awesome, thank you for following up! Will give this a read before bed. Would love to understand the encryption pieces, as I very much get the need for frequent updates of permissions (typically append, only sometimes remove). If you ever happen to blog about it please let me know :)

jzelinskie · on Aug 8, 2023

>I think Google’s Zanzibar is also centralized but leverages extreme caching to get lower latencies?

That's correct. In a Zanzibar-like model, you have a global storage, but individual clusters in each datacenter/edge providing consistency-aware caching. This means p99 can be something like 25ms, but p95 or p50 is often FAR lower.

Disclosure: I'm a co-creator and maintainer of SpiceDB[0]

[0]: https://github.com/authzed/spicedb

leoqa · on Aug 8, 2023

I’ve watched y’all’s Papers We Love talk about Zanzibar and have recommended authzed to organizations bootstrapping permission modeling.

It’s been awhile, is the gist that Spanner’s coordinated clocks allow tighter consensus (i.e. faster writes) and caching provides read-my-write consistency?

jzelinskie · on Aug 8, 2023

Thanks for watching our presentation and recommending our solution.

Unfortunately, nothing is ever simple; comparing Spanner and CockroachDB is comparing apples to oranges. Two years ago, we wrote an article that details exactly how the differences matter in terms of a Zanzibar implementation[0], but I can give as short of a summary as possible: Spanner is linearizable and CockroachDB only guarantees external consistency for transactions that share rows. The post outlines how we workaround this and we've also more recently talked about how we've managed to scale that to 1M requests per second[1]. Our team focuses a lot on CockroachDB because we offer a permission systems that can span not only regions within a single cloud, but across various cloud providers. However, if you're all in on GCP, SpiceDB itself supports Cloud Spanner (which we also use in production for our GCP-only customers).

[0]: https://authzed.com/blog/prevent-newenemy-cockroachdb

[1]: https://authzed.com/blog/maximizing-cockroachdb-performance

audioheavy · on Aug 8, 2023

I can attest to that statement: comparing Spanner and CockroachDB is difficult. Spanner and Fauna (where I work) are more comparable (Fauna is based on Calvin, see [0]) since they both support strict serializability (in different ways). The article referenced here is excellent, and it highlights what we've seen from some customers: CockroachDB is (to say the least) a challenge to learn and adequately deploy, I've seen a few others that have a similar lessons-learned result. I'm glad, however, that highly consistent distributed databases provide value in these implementations. Although not OSS, Fauna is comparable and more turnkey (read: much less ops) than these options.

[0]: https://fauna.com/blog/distributed-consistency-at-scale-span...

KRAKRISMOTT · on Aug 9, 2023

What does GitHub use for their Authz? After inviting a user, there is no perceptible sync delay and they can start cloning the repo immediately.

tonyhb · on Aug 9, 2023

Not sure, but I'd guess they're using Vitesse (ie. Planetscale) which is honestly really fast and durable.

prpl · on Aug 8, 2023

not sure if it applies but depending on instance type I usually see pings in the .55ms range in a single AZ in AWS, cross-AZ pings higher (implying it is hard to be sub ms for many types of durable applications, especially if disk/S3 is involved)

magden · on Aug 8, 2023

When discussing GCP, the latency between AZs within the same region is approximately 5 ms. Thus, if you have a 3-node database cluster spanning 3 AZs (all within the same region), a transaction can be committed in the 5-10ms range using Raft. CockroachDB should handle this seamlessly.

However, if you're considering a multi-region setup, the latency will depend on the distance between the regions.That's why usually you define a preferred region (that stores primary copy of the records) or deploy in a geo-partitioned mode (when data is automatically pinned to configured regions).

maxpert · on Aug 8, 2023

One of the reasons I started writing Marmot (https://maxpert.github.io/marmot/) was for replicating bunch of tables across regions that were read heavy. I even used it for cache replication (because who cares if it’s a cache miss, but a hit will save me time and money). It’s hard to make such blue prints in early days of product, and by the time you hit a true growth almost everyone builds a custom solution for multi-region IAM.

aeneas_ory · on Aug 8, 2023

That is true! And the reason why we decided to build this and offer it to everyone - building multi-region IAM is incredibly difficult and expensive and typically not the core competency of an average software company.

Also, very interesting project. I love SQLite and what the community is contributing to it - yours included!

Sytten · on Aug 8, 2023

Good post, side remark our experience with kratos have been mixed while self hosting the solution. You can feel OSS is second class for them (lots of PR never getting merged, endless debates and little progress in code), it's OK its a business and they are not doing support contracts. Just know what you are getting into. Just my experience, might be different with other products.

vinckr · on Aug 8, 2023

Hey Sytten, I work in the community team for Ory. OSS is actually very important for us and definitely not second class. (see also https://www.ory.sh/docs/open-source/commitment)

The assumption that Ory does not offer support contracts for self-hosted Ory is wrong (although we did not in the past, when the team was smaller).

We are doing contracts for companies using our software self-hosted: See here and contact us if you are interested! https://www.ory.sh/support/ This way we can assign engineers to your case and work on any issues you encounter or work on any contributions or features required.

Ory releases all features for free for everyone to use. What is not free however is our time and work. To merge a PR/add a new feature/etc. a significant amount of time is needed to make sure the code lives up to standards, passes all tests, any security implications, etc. This depends on the feature of course, but the one you are alluding to is probably one of those. See the Code of Conduct on OSS support as well: https://github.com/ory/hydra/blob/master/CODE_OF_CONDUCT.md

I hope that makes it clearer and feel free to reach out to me directly in the Ory Community on github or slack.

aeneas_ory · on Aug 8, 2023

Sorry to hear that this has been your experience! What exactly was the issue for you? It’s true that there are lots of open PRs. We’re a small team and often busy with customer requirements which doesn’t allow us to get some community PRs over the finishing line (finish tests, refactor code, fix remaining bugs, do security reviews, …).

Sometimes, PRs are not aligning with an architecture or API principle which is when they often go stale. This is why we generally require design documents for changes or additions to APIs.

Saying that the open source is second class is a false accusation in my view:

- Over 1500 PRs merged in Ory Kratos alone: https://github.com/ory/kratos/pulls

- Very active contributor and commit frequency: https://github.com/ory/kratos/graphs/contributors?from=2018-...

- A growing community and footprint

Also, we do offer support contracts for self hosted environments - this is relatively new though: https://www.ory.dev/support/

It is true though that have to balance open source work and things that people pay us for. It’s the only way to ensure that Ory open source, for which we have a deep commitment, continues for a long time.

Hope this makes sense!

Sytten · on Aug 11, 2023

I think it would be fair to say that kratos was not the priority in 2022 in terms of code you can see not much was commited (https://github.com/ory/kratos/graphs/code-frequency) so I might have had a bad first impression.

A few issues on kratos that I consider relatively important are still missing / nobody from Ory is giving their input so it's hard to make progress and I would not take my time to contribute if I dont know if the owner are going to merge it.

An example that comes to mind is the OAuth email auto-verification or the search of users that is still super basic (we only recently got the filter of identifiers).

vinckr · on Aug 12, 2023

I would say that Ory Kratos made huge improvements in 2022, and the code graph is just looking like that because there was much more foundational work going on in 2021. In 2022 it was mostly adding features, fixing bugs etc, but the API and system was generally stable already.

from the top of my head some of the features added in 2022: - verification and recovery codes - import of MD5-hashed passwords - integration with Ory Hydra - device information in session - session management APIs - session metadata - blocking webhooks - many improvements to OIDC mappers - session refresh - opentelemetry tracing - complete rewrite of docs - import identities including hashed passwords - custom email templates - passwordless with webauth - 1:1 compatibility Ory Network and Ory Open Source

Of course there was a huge amount of bugfixes and smaller improvements going on already. 2023 also already saw a ton of work being done on Ory Kratos including the 1.0 stable release. Of course there is still much to do, and feedback like yours also helps! If you are looking to contribute its always recommended to talk to the maintainer before you start coding - then we can let you know if its realistic to be merged or not. Search is a not a trivial thing to implement, on the one hand it is needed in some form, on the other hand Ory Kratos should not bloat too much.

Anyway, thanks for the feedback, will take it into consideration :-)

magden · on Aug 8, 2023

I can only concur that multi-region apps are becoming the new normal.

Deploying app instances across distant locations was never an issue. However, databases used to be the bottleneck. I'm glad to see that changing, thanks to CockroachDB and YugabyteDB.

My favorite multi-region deployment mode is geo-partitioned deployment. This is when a database automatically pins user data to specific locations, ensuring low latency for both reads and writes, regardless of user location. One-minute demo how it works: https://www.youtube.com/watch?v=9ESTXEa9QZY&list=PL8Z3vt4qJT...

plexicle · on Aug 8, 2023

Awesome post, really. One of the best I've read in a while!

Total side question, if anyone knows -- what tool (if any?) was used for the graphics in this article? The dot matrix looking map style stuff? I really dig it.

aeneas_ory · on Aug 8, 2023

Thank you! I appreciate that a lot!

Our designers will love that feedback! Unfortunately it’s not a shelf product but they used Figma to design the graphs.

endisneigh · on Aug 8, 2023

Seems that cockroachdb saved the day here with its multi region capabilities. Did the other vendors have the same capabilities? Specifically things like the regional tables and columns.

aeneas_ory · on Aug 8, 2023

We made the decision to choose Cockroach in 2018 and back then no product had these capabilities. We stuck to CRDB because they delivered on their product vision and as far as our research went they have the most advanced solution.

Even Google Cloud Spanner (NOT the same as Google Spanner - the internal DB) lacks a couple of things we needed for data homing.

proleisuretour · on Aug 8, 2023

This is a great article about building global apps that require multi-region deployments. Thanks for sharing. Curious about the transaction retry errors for UPDATE that required 2 days to resolve. Probably could of been avoided using a distributed SQL database that supports a read committed isolation level ¯\_(ツ)_/¯ For those going down this path, maybe check out open source YugabyteDB. There is a great doc about how to build Global Apps using various application design patterns: https://docs.yugabyte.com/preview/develop/build-global-apps/

Randis · on Aug 8, 2023

Flexibility in isolation levels is coming in CRDB :)

ushakov · on Aug 8, 2023

If you’d like to deploy a containers or even Ory itself to multi-region cloud, you should check out EdgeNode (https://edgenode.com), which I helped build

hiatus · on Aug 8, 2023

A friendly note: when I visited your site, I immediately clicked away when I saw that learning more about the deployment process, pricing, etc required me to sign up.

ushakov · on Aug 8, 2023

Thanks for the feedback. This helps us a lot!

We’re currently in a early stage, but more info will be available to public in the next couple of weeks. You can fill out the form, if you want to be in the known: https://tally.so/r/w2ajRb

Thanks again, appreciate your honest feedback.