Hacker News
Authorization in a Microservices World (alexanderlolis.com)
253 points by zaplin on April 1, 2022 | 69 comments



Love this article. My favourite part is:

> "So the logical thing to do is to implement an authorization service and everybody would be able to use that and keep your precious service boundaries, right? WRONG. Your hell, has just begun!"

Drawing the right boundary for authorization is near impossible. If I want to check whether the user is allowed to see who left an emoji reaction on a comment response to an issue inside a repository belonging to an organization, do I store that entire hierarchy in my authorization service? Or do I leave some of it in the application? I've yet to see a good heuristic for deciding where to draw that line.
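To make the "store the entire hierarchy in the authorization service" option concrete, here is a minimal sketch assuming a hypothetical in-memory store in which every resource records its parent, and a role granted on an ancestor (e.g. the organization) applies to everything beneath it. `AuthzStore` and the `kind:id` naming are illustrative, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class AuthzStore:
    # child resource id -> parent resource id (hypothetical layout)
    parents: dict[str, str] = field(default_factory=dict)
    # (user, resource id) -> roles granted directly on that resource
    grants: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def grant(self, user: str, resource: str, role: str) -> None:
        self.grants.setdefault((user, resource), set()).add(role)

    def ancestors(self, resource: str):
        # Yield the resource itself, then each parent up to the root.
        seen = set()
        current = resource
        while current is not None and current not in seen:
            seen.add(current)
            yield current
            current = self.parents.get(current)

    def has_role(self, user: str, resource: str, role: str) -> bool:
        # A role granted anywhere up the chain applies to this resource.
        return any(role in self.grants.get((user, r), set())
                   for r in self.ancestors(resource))


store = AuthzStore()
store.parents.update({
    "reaction:1": "comment:7",
    "comment:7": "issue:42",
    "issue:42": "repo:api",
    "repo:api": "org:acme",
})
store.grant("alice", "org:acme", "reader")
```

The check itself is easy; the hard part the comment points at is that every one of those parent links has to be copied out of the application and kept in sync.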

Also, thank you to the author for referencing us :) I'm Sam, Cofounder/CTO at Oso where we've been working on authorization through our open source library for the past couple of years. Authorization for microservices has been a recurrent theme over that time, and we've been furiously writing on the topic (e.g. [1], [2], [3]).

We'd be interested in talking to anyone who is currently facing the challenges of authorization for microservices (or really just multiple services). We're building authorization as a service, which you can learn more about here: https://www.osohq.com/oso-cloud. It's currently in private beta, but if you would rather _not_ speak with us first, there's also a free sandbox to try out the product linked from that page.

[1]: https://www.osohq.com/post/why-authorization-is-hard

[2]: https://www.osohq.com/post/microservices-authorization-patte...

[3]: https://www.osohq.com/academy/microservices-authorization


What an evocative example! Before reading your links (sorry! They're long. I added them to Pocket), I just don't see how "allowed to leave an emoji reaction to an issue inside a repository belonging to an organization" could be anything but application logic. That is, all of the logic should be in the application (except authentication). Every noun in the sentence is application-specific! Perhaps the organization has repository-specific blocked-users lists, and disallowed-emoji lists, and (in the application) repo owners can modify the disallowed-emoji list (subject to approval from the organization owners, of course!).

It seems like building a service that tries to abstract all that (the emoji is an "attribute" of the "action" to an "object" that has an "owner" that has "rules") is doomed to succumb to the inner-platform effect. The only solution I can see is to build a general rules engine, but, in my experience, those tend to be complicated, hard-to-read ways to implement logic that should just be code.


> I just don't see how "allowed to leave an emoji reaction to an issue inside a repository belonging to an organization" could be anything but application logic.

I think it's absolutely fair to say that authorization logic is a subset of application logic! The question is whether it's possible to separate any of the authorization logic from the application.

There are two reasons you might want to do that:

1. Separation of concerns (true whether monolith or microservices).
2. You have authorization logic shared across multiple services.

(1) is still a hard problem, but you can be a little less rigorous about it. (2) is where it gets really fun.

Continuing the example, those repository-specific blocked-users lists are _probably_ going to be needed across every other service.

I don't think any of that contradicts what you're saying, just clarifying that "in the application" might still mean you want to extract the shared logic into a service.

But to get to your last paragraph: yes, it's definitely hard to build a service that's sufficiently generic to handle all the kinds of authorization use cases people typically need.

You'd be surprised, though, at how many use cases fall into very similar patterns. We have some internal (soon to be external) documentation where we managed to get to about 20 distinct patterns, from common ones like roles and groups to less common ones like impersonation and approval flows. And most of these share the same 2 or 3 distinct primitives (user is in a list of blocked users for a repository, emoji is in the disallowed-emoji list, etc.).

So you can build something less abstract than a general rules engine that still saves you from building a roles system or a deny list from scratch for the nth time.
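A minimal sketch of the "few shared primitives" idea, assuming just two hypothetical primitives (a role assignment and a generic named list) composed into one rule for the emoji example. This illustrates the pattern only; it is not Oso's API or policy language.

```python
from dataclasses import dataclass, field


@dataclass
class Primitives:
    # (user, resource) -> roles held directly on that resource
    roles: dict[tuple[str, str], set[str]] = field(default_factory=dict)
    # list name -> members; one primitive backs every deny list
    lists: dict[str, set[str]] = field(default_factory=dict)

    def has_role(self, user: str, resource: str, role: str) -> bool:
        return role in self.roles.get((user, resource), set())

    def in_list(self, list_name: str, value: str) -> bool:
        return value in self.lists.get(list_name, set())


def can_react(p: Primitives, user: str, repo: str, emoji: str) -> bool:
    """Hypothetical 'react to an issue' rule built only from the two primitives."""
    if p.in_list(f"{repo}/blocked-users", user):
        return False
    if p.in_list(f"{repo}/disallowed-emoji", emoji):
        return False
    return p.has_role(user, repo, "member")


p = Primitives()
p.roles[("alice", "repo:api")] = {"member"}
p.roles[("bob", "repo:api")] = {"member"}
p.lists["repo:api/blocked-users"] = {"bob"}
p.lists["repo:api/disallowed-emoji"] = {"eggplant"}
```

The blocked-users list and the disallowed-emoji list are the same primitive with different names, which is the kind of reuse the comment describes.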


> those repository-specific blocked-users lists are _probably_ going to be needed across every other service.

Rather than a service acting as a shared upstream for multiple microservices, you might want to put this kind of generalized authorization into an API Gateway service that sits in front of multiple (internal or external) microservices.

Compare/contrast: blocking malicious origin IP addresses in a firewall appliance. But substitute "IP address" with "API key", and "firewall appliance" with "load balancer."


Then anyone inside your network has unrestricted access. That's why almost all security innovation and adoption is moving away from trusted boundaries. It can still play a role, in conjunction with other layers.


To be clear, I'm not talking about a VPC-edge WAF; I'm talking about a service that sits in front of, and encapsulates, only the specific microservices that require it. An internal ingress controller, in k8s terms.

And also, to be clear, the services would still do domain-object policy-based authorization themselves. The point of such a multi-microservice API gateway is to optimize universal, pre-authentication, static-credential-based denials (e.g. blocking specific API keys, rather than blocking specific users) out of the critical path, such that users can't DoS your backend with 403-generating requests.
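A minimal sketch of that gateway-side denial, assuming a hypothetical `X-Api-Key` header, an in-memory set of blocked keys, and a stubbed upstream. Requests with a blocked key get a 403 at the gateway and never reach the backend, which still does its own domain-object authorization.

```python
from typing import Callable

Request = dict  # hypothetical shape: {"headers": {...}, "path": "..."}
Response = tuple[int, str]


def make_gateway(blocked_keys: set[str],
                 upstream: Callable[[Request], Response]) -> Callable[[Request], Response]:
    """Wrap an upstream handler with a static-credential deny check."""
    def handle(request: Request) -> Response:
        api_key = request.get("headers", {}).get("X-Api-Key")
        if api_key is None:
            return (401, "missing API key")
        if api_key in blocked_keys:
            # Denied at the edge: the backend never spends work on this request.
            return (403, "API key blocked")
        # Everything else is forwarded; the service still authorizes per object.
        return upstream(request)
    return handle


# Usage with a stub backend that records which paths reached it.
reached: list[str] = []


def backend(request: Request) -> Response:
    reached.append(request["path"])
    return (200, "ok")


gateway = make_gateway({"key-revoked"}, backend)
denied = gateway({"headers": {"X-Api-Key": "key-revoked"}, "path": "/orders"})
allowed = gateway({"headers": {"X-Api-Key": "key-good"}, "path": "/orders/1"})
```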


> You'd be surprised though at how many use cases fall into very similar patterns.

I look forward to seeing that pattern list. I'm in an adjacent space (an authentication server offering RBAC) and am constantly amazed at the intricacies of many orgs' authentication needs.

I suspect that common patterns can take care of 95% of authorization needs, but I imagine there'll need to be an escape hatch for the 5% that are really super business case specific.


Super insightful, thank you!

One concrete example of completely separated auth is AWS. It would be quite the nightmare if each AWS service had its own authorization system. Centralizing that management in IAM makes it... manageable (and only barely at that).
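A toy model of the rule that makes a central system like IAM workable: everything is denied by default, an Allow statement grants access, and an explicit Deny overrides any Allow. This sketch only handles Action/Resource wildcards (via Python's `fnmatch`, whose matching differs slightly from IAM's) and ignores conditions, principals, SCPs, and permission boundaries, so treat it as an illustration rather than how AWS implements it.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value: str) -> bool:
    # Accept a single pattern or a list, as IAM policy JSON does.
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def evaluate(statements: list[dict], action: str, resource: str) -> bool:
    """Simplified IAM-style evaluation: default deny, explicit deny beats allow."""
    allowed = False
    for st in statements:
        if _matches(st["Action"], action) and _matches(st["Resource"], resource):
            if st["Effect"] == "Deny":
                return False  # an explicit deny always wins
            if st["Effect"] == "Allow":
                allowed = True
    return allowed  # no matching Allow means an implicit deny


policy = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::reports/*"},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"], "Resource": "arn:aws:s3:::reports/*"},
]
```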


Even then, that only covers authorization for things in AWS that fit the general shape of control-plane API calls. There are endpoints in AWS that don't fit that shape, which do their own thing for auth (see e.g. S3 uploads with signed URLs.)


But that's the difference between authentication and authorization - sure, you've logged in, and we can verify that, but now we need to know if you're permitted to do what you're trying to do. And yes, authorization will then have awareness of some amount of application/business logic.


I really need to write about this at length, but I've dealt extensively with authorization as a topic.

The concepts are simple but the implementation can be very difficult.

Authorization and Business Logic are two entirely separate domains. You have to start there. They are orthogonal.

From that it follows that requirements involving who you are get implemented in the authorization logic/service/etc., and other requirements get divided up into the appropriate domain logic/services/etc.

If there are requirements around access to emojis then that involves the authorization service.

Sometimes data needs to get duplicated across service boundaries and that's when you need application concepts like sagas to manage this. That's where the implementation starts to get difficult.
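For readers unfamiliar with sagas, a minimal in-process sketch of the orchestration idea: run each step in order, and if one fails, run the compensations of the already-completed steps in reverse. The two dicts stand in for data owned by two hypothetical services; a real saga spans services over messaging and persists its progress, which this sketch does not.

```python
from typing import Callable

Step = tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    compensations: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True


# Hypothetical stores owned by two different services.
app_db: dict[str, str] = {}
authz_db: dict[str, str] = {}


def add_to_app() -> None:
    app_db["alice"] = "team-a"


def remove_from_app() -> None:
    app_db.pop("alice", None)


def add_to_authz_failing() -> None:
    raise RuntimeError("authz service unavailable")


def remove_from_authz() -> None:
    authz_db.pop("alice", None)


ok = run_saga([(add_to_app, remove_from_app),
               (add_to_authz_failing, remove_from_authz)])
```

Here the authorization-side write fails, so the application-side write is undone and neither service keeps a half-duplicated membership.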


I’ve always wondered how authorization should be handled “properly”, as in, what is the end game that is capable of handling the problem at a scale seen at places like AWS? Are the validations and checks still integrated into a (middleware layer of the) service implementing the business logic? If so, how is all this governed, such that correct implementation of all authorization logic can easily be audited?

I would absolutely love to learn more about this, I feel like I’m unable to conceive an appropriate solution to these requirements.


Google has a paper on their system, Zanzibar: https://research.google/pubs/pub48190/

Doubt it will answer all your questions but I found it interesting.
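The core idea in the Zanzibar paper is storing permissions as relation tuples (object#relation@user, where the "user" may itself be a userset such as group:eng#member) and answering checks by following them. A heavily simplified sketch, assuming an in-memory tuple store and only direct and userset subjects, with no namespace rewrite rules and no zookie-based consistency:

```python
from collections import defaultdict

# (object, relation) -> subjects; a subject is "user:NAME" or a userset "OBJECT#RELATION"
relation_tuples: dict[tuple[str, str], set[str]] = defaultdict(set)


def write(obj: str, relation: str, subject: str) -> None:
    relation_tuples[(obj, relation)].add(subject)


def check(obj: str, relation: str, user: str, _seen=None) -> bool:
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:
        return False  # already explored; also guards against userset cycles
    seen.add((obj, relation))
    for subject in relation_tuples.get((obj, relation), ()):
        if subject == user:
            return True
        if "#" in subject:
            # Userset: recurse into e.g. ("group:eng", "member").
            sub_obj, sub_rel = subject.split("#", 1)
            if check(sub_obj, sub_rel, user, seen):
                return True
    return False


write("group:eng", "member", "user:alice")
write("doc:roadmap", "viewer", "group:eng#member")
write("doc:roadmap", "viewer", "user:bob")
```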


Please let me know if you do write something. I would love to read it!


The best I've ever had it was using an API gateway that destructured a token into headers. Back-end services used mTLS. This meant testing auth was as simple as adding headers. No server needed to be up, no JWT nonsense needed to be mocked. I can't recommend enough keeping this nonsense at the boundaries.
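A minimal sketch of the service side of that setup, assuming the gateway has already validated the token and forwards hypothetical `X-User-Id` and `X-User-Roles` headers (the names are illustrative). Because the service only reads headers, a test is just a dict. Trusting these headers is only reasonable because mTLS ensures nothing but the gateway can reach the service; real HTTP header lookup is also case-insensitive, which a plain dict is not.

```python
def parse_identity(headers: dict[str, str]) -> tuple[str, set[str]]:
    # The gateway is assumed to have copied the validated token claims here.
    user_id = headers.get("X-User-Id", "")
    roles = {r.strip() for r in headers.get("X-User-Roles", "").split(",") if r.strip()}
    return user_id, roles


def delete_invoice(headers: dict[str, str], invoice_owner: str) -> int:
    """Return an HTTP status code for a hypothetical delete-invoice endpoint."""
    user_id, roles = parse_identity(headers)
    if not user_id:
        return 401
    if "admin" in roles or user_id == invoice_owner:
        return 204
    return 403
```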


So if I understand it correctly, a service would respond with HTTP headers that describe the claim necessary for the action? That raises the question of how that would work with side effects.

Or would the acquired claim be communicated to the service in the request? That raises the question of how the service communicates which claim is required.

Not trying to be critical by the way, genuinely curious.


That sounds like authentication not authorization.


The way it's done at my current company is that services do their own minimal authorisation based on their own business logic, if any at all, and the rest is done by a dedicated "policy service" that is called by the request composer (which is the only service the FE calls) to do very advanced checks on what individual users are and aren't allowed to do, with the policy service being configurable through YAML "policy documents". It seems to work very well for the colleagues of mine that work on it.


A former employer (top 5 investment bank) did this as well. There was a central store of identity (well, two: one Windows and one non-Windows). Your application then included its own policies (written in Prolog) that could reference identity details and/or deep intrinsic request details. As soon as Prolog hit a condition where it couldn’t unify your request and the policy, you got a no.

There’s a similar OSS implementation (OPA) targeting mainly k8s but allegedly useful generically that uses Datalog.


Nice!

Is the request composer responsible for checking the authorization data? Like what roles/permissions the user has?


Not the OP, but we did something similar, and this front-end gateway/"backend for frontend" would do the role checks, yes. Back-end services could do coarse-grained checks if needed.


In the enterprise system I design/implement/maintain, I currently have a (!) limited set of roles for my main business logic service.

Each task in that system has an accompanying set of roles that are authorized to execute it, and essentially that authorization happens at the controller level and is "at the edge" of the application instead of in the business logic itself.
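A sketch of that controller-level approach in Python: a decorator declares the roles allowed to run each task, so the check sits at the edge and the handler body contains only business logic. `requires_roles`, `Forbidden`, and `approve_invoice` are hypothetical names; a real framework would pull the roles from the authenticated request rather than a parameter.

```python
import functools


class Forbidden(Exception):
    pass


def requires_roles(*allowed: str):
    """Reject the call unless the caller holds at least one allowed role."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(user_roles: set[str], *args, **kwargs):
            if not user_roles & set(allowed):
                raise Forbidden(f"{handler.__name__} requires one of {sorted(allowed)}")
            return handler(user_roles, *args, **kwargs)
        return wrapper
    return decorator


@requires_roles("approver", "admin")
def approve_invoice(user_roles: set[str], invoice_id: int) -> str:
    # Business logic only; authorization already happened in the decorator.
    return f"invoice {invoice_id} approved"
```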

Is there a benefit I could gain from Oso? I will read what you've provided.


Honestly, if your authorization needs are coarse enough that you can (a) handle it as middleware at the controller level, and (b) mostly just rely on roles, then you're probably in a good place to keep going with that! It avoids a lot of the complexity of centralization.

Where we normally see people considering Oso is when the model gets more complex, or the data requirements get larger. E.g. when you introduce new features like user groups, projects, or sharing, then the amount of logic + data you need to share between the services grows beyond what's sustainable.

If your current system is working for you, I'm not going to try to tell you otherwise! If that changes, though, let me know ;)


I've been working as an IAM engineer for my entire career. This is a really good write-up on a few ways you could handle authorization, but I think it also highlights the challenges with it.

The more I come across different systems, the more I realize authorization in large distributed systems isn't ever one approach: it has to be tailored for each use case with different tradeoffs in mind. It's often directly coupled to the problem domain you're trying to solve for. It has to be integrated with the data access pathways, _and_ also has to be tailored to the authentication system it deals with, _and_ it has to be tailored for the data-locality model of the overall system.

The more authorization problems I solve, the more I realize that my dream of coming up with a generalized authz SaaS service that helps me grease out VC money & a billion dollars probably doesn't really exist. It's different from authentication, because authentication has fewer dimensions of coupling and fewer tradeoffs (Auth0 sold for $6B, Okta is worth $18B, both authentication offerings).

Maybe I'll figure it out one day. Or maybe this is one of those problem domains that can only be solved by an army of engineers.


Very much agree that authorization is a more domain-specific problem than authentication... but there are some common patterns that are emerging, and can help reduce how much wheel reinvention has to happen.

There are (at least) three of us startups on this thread that are trying to tackle this :) (disclaimer - I'm a co-founder of one of these - Aserto).


Do you have anything for authorization of data within Databricks?


maybe soon there'll be 4 ;)


Well, I potentially have some good news for you :)

There are a bunch of companies who popped up in the last few years to solve this problem. We're one of them -- Oso (I'm the CTO).

It's definitely a fun/challenging problem to work on. So if building a generalized authz SaaS is the dream, come join us!


Just a heads up, looks like markdown formatting got into your "go get" command for the golang page.

https://www.osohq.com/learn/rbac-go


Good catch, thanks!


https://www.osohq.com/ is a very interesting company in this space. They have an open source authorization library, and a great blog.


Thanks epberry!


> "Direct DB reads, outside the domain of each service, is a bad idea and it will be obvious to you why on the first schema change. Please don't be lazy and just say no to this."

Good read, but I am not really a fan of that stance. The "separate schema per ms" pattern is an overhead that has to be justified and is even paid for by customers (on small $50k projects and million-dollar enterprise-grade projects alike). First and foremost, the DB is a shared data structure that CAN be used for communication, and in a scenario where one actor writes and 1..n actors read (especially when all connecting services are controlled by the same team), it is typically the simpler and better approach to start with. Schema evolution will also hit you with messaging and APIs, and it can be handled properly in smaller databases as well.


A single database instance can host multiple independent schemas. No need to pay for more. Treating a database as an API is more than doable, but the whole team needs to be on point with the necessary shift in expectations, as well as the shift in tooling and best practices.


I meant the overhead, in the case of two services, of managing and developing two sets of schemas plus a mechanism for data exchange (be it REST APIs or queues; both come with a producing and a consuming side). There are many cases where that makes sense (e.g. both microservices have to evolve the shared storage schema in different directions), but I would not do it out of the box every time, as the author implies.


I was surprised how well treating the database as an API worked at one series A company. We had a single database schema shared by ~5 services. The coordination cost wasn’t bad, and I don’t recall any major issues. It was a big net win for our velocity.

I’d do it again, if I had the guts.


You certainly can have a monolith database, but typically you'll find that many of the Conway benefits of services will be lost as a result.


Disclaimer: I am the co-founder of Cerbos.

Thanks for explaining the problem space so well and referring to Cerbos[1] as a contextual solution.

We had to build these systems in the past so many times and we were tired of reinventing the wheel. With Cerbos we aim to turn this problem space into configuration rather than code and enable enterprise-grade access management for any application.

[1] https://cerbos.dev


If you're looking for an example of how to integrate Cerbos with an auth provider like FusionAuth, to augment a typical role implementation with finer grained permissions, check out this blog post: https://fusionauth.io/blog/2022/01/18/extending-fusionauth-r...


Author here. First of all, thank you for making this article appear on the HN front page. I read some interesting comments and approaches, but I feel that a piece is always missing (or maybe I did not understand): how the authorization system fetches the application data it needs in order to decide what to do. If you do not need this piece, then you can get really creative in how you solve authorization and use whatever approach best fits the needs of your system. But if you do, then I would love to hear more details about this part.


Great article! Disclaimer: I'm a co-founder of Aserto [0], where we're building a platform for API / microservices authorization.

I couldn't agree more that the question of how to get the data to the policy decision point is one of the most interesting and hardest challenges for this scenario.

You don't want to replicate the entire world in both your application's data store and your authorization system. But you also want to follow the principle of separation of concerns as much as practical.

There may be scenarios where your authorization system has to make calls to an external system, but that can cause availability and latency concerns. Ideally the decision engine has all the data it needs to make an authorization decision, without having to query another system.

For filtering scenarios that would otherwise return too much data to the client (which the client would then need to filter locally), you could also imagine the authorization system doing a "partial evaluation" and returning an AST that you can walk and attach as "where clauses" to your query. Here's an interesting read on how to do this with the OPA decision engine. [1]
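A hand-rolled illustration of the partial-evaluation idea, not OPA's API: for a hypothetical hard-coded policy ("owner, or same team on a team-visible doc, or admin"), everything known about the user is evaluated up front, and only the conditions on unknown resource fields are left over, emitted as a parameterized SQL WHERE clause. Only fixed fragments are interpolated into the SQL; user values are bound as parameters.

```python
import sqlite3


def partial_eval(user: dict) -> tuple[str, list]:
    """Residual of the policy once the user is known; resource fields stay symbolic."""
    if user.get("is_admin"):
        return "1 = 1", []  # the policy is unconditionally true for this user
    clauses, params = ["owner = ?"], [user["id"]]
    teams = sorted(user.get("teams", []))
    if teams:
        placeholders = ", ".join("?" for _ in teams)
        clauses.append(f"(visibility = 'team' AND team IN ({placeholders}))")
        params.extend(teams)
    return " OR ".join(clauses), params


def list_docs(conn: sqlite3.Connection, user: dict) -> list[int]:
    where, params = partial_eval(user)
    rows = conn.execute(f"SELECT id FROM docs WHERE {where} ORDER BY id", params)
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, owner TEXT, team TEXT, visibility TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", [
    (1, "alice", "eng", "private"),
    (2, "bob", "eng", "team"),
    (3, "bob", "sales", "team"),
    (4, "carol", "eng", "private"),
])
```

The database does the filtering, so the client never receives rows it would have to throw away.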

[0] https://www.aserto.com
[1] https://blog.openpolicyagent.org/partial-evaluation-162750ea...


Which is why I won't be able to find true love and I’ll die alone. I’ll die alone, without ever knowing it’s my birthday… [0]

0: https://youtu.be/y8OnoxKotPQ


If your architecture contains Kafka, you can use Kafka ACLs to authorize access to any resource, not just Kafka ones. Based on a principal, a named resource (derived from, e.g., a URL path), and the requested operation, the Kafka admin client can tell you whether there is a matching ACL that permits the request.

I've had success doing this.


Disclaimer: I work at Elastic.

Elasticsearch has a general-purpose authorization system as well, based on the concept of "application privileges": https://www.elastic.co/guide/en/elasticsearch/reference/curr.... It's an interesting concept I never really considered at previous jobs: piggy-backing on the authorization system of some infrastructure already in your stack. On one hand, I can see some team members finding it a bit of a "smell", since it's pretty far from the original intended use case for something like Kafka or ES, but on the other hand it can free you from having to build this type of thing yourself, whether from scratch or with libraries.


An interesting and well-written article. I appreciated that they got into horizontal trimming, aka field trimming, but vertical trimming like “this person can see all things for all objects except those with property X==1” is a whole separate issue with significant performance implications, particularly around list endpoints. It’s hard to make a one-size-fits-all solution here because sometimes the perf issue is on the frontend (which items can they see) and sometimes on the back end (now, for a fragmented set of entities, fetch and return the appropriate information).
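The two kinds of trimming described above, sketched in Python with hypothetical records, using the comment's terminology: field trimming removes attributes from each record, while the "except those with property X==1" rule removes whole records.

```python
from typing import Any

Record = dict[str, Any]


def trim_fields(record: Record, visible: set[str]) -> Record:
    # Field trimming: keep only the attributes the caller may see.
    return {k: v for k, v in record.items() if k in visible}


def filter_rows(records: list[Record], prop: str, hidden_value: Any) -> list[Record]:
    # Record-level trimming: drop every object whose property equals hidden_value.
    return [r for r in records if r.get(prop) != hidden_value]


records = [
    {"id": 1, "name": "Widget", "cost": 3.5, "x": 0},
    {"id": 2, "name": "Gadget", "cost": 9.0, "x": 1},
    {"id": 3, "name": "Gizmo", "cost": 1.25, "x": 0},
]
visible = [trim_fields(r, {"id", "name"}) for r in filter_rows(records, "x", 1)]
```

Doing the record-level filter in application memory after a paged fetch, as here, is exactly where list endpoints get expensive: a page of 50 rows can shrink to far fewer, forcing extra fetches. Pushing the predicate into the query avoids that.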


Am I the only one who thinks the initial PHP example is just fine?

Seriously, you're claiming to "solve" one "problem" by building an entire new system which is inevitably massively more complex and difficult to manage than the "problem" you're trying to solve.

Then again, I don't work in enterprise-level environments but to be honest if this is what it's like, I'm glad I don't.


Works fine for 1 application, but what about for thousands of services with many teams?

Everything is difficult with scale. We can’t expect every service owner to implement authz correctly, but if we can expose and build tools that help standardize and abstract as much as possible the difficulties of authz then service owners can focus their energy on other things.


If it’s fine, you can consider yourself lucky. In my experience, authentication and authorization complexity is squarely proportional to system and organization size, due to the network effects inside them. At some point it becomes a significant subsystem in itself and, contrary to what some believe, a core business problem, not a checkbox to tick off.


I know this is a space for entrepreneurs, but it is a little annoying to constantly see folks jumping out of the woodwork with "you should try my product!!!!" whenever something semi-relevant is posted.


I've been using Open Policy Agent for this scenario with great success. You can either run it in an authz-as-a-service setup or embed it into middleware as a sort of "decision-making" library.


As in OPA determines if a user has access to a resource? Do you have some resources on how to do this?


You do need to have a strategy for how to load the resource mappings into the OPA engine. If they don't change very much, you could embed them in the data.json file of the OPA policy itself. But more often than not, that data changes frequently (e.g. when someone grants someone else access to a resource). In that case, you'll need the OPA engine to query an external data store via an HTTP request. Or you can use a resource cache, the way we do at Aserto.

Here's a blog post [0] about the challenges we faced when using OPA for application authorization.

[0] https://www.aserto.com/blog/the-challenges-of-using-opa-for-...


OPA has a whole policy language to define how people have access to resources however you please. See details here: https://www.openpolicyagent.org/docs/latest/policy-language/


I don't understand why this is considered a bigger challenge than any other relation between small services.

If two services are chatty, they should be merged. This can happen on several levels, data being one of them.

Authorization services should not be chatty with other services. And why would they be, since you ask for claims upon an authentication request? Yes, I meant authentication.

And I believe an authorization service is something that has been solved for a long time. It doesn't have any traits beyond those of any other service you must collaborate with; looking at it as special might be your mistake.

The authorization service will give you the claims for the user and, each service will dictate whether the claims are valid and sufficient within the service boundary.

Don't look at it as a special kind of service, because it's really not.


> If two services are chatty, they should be merged.

Welp, I guess I'll just merge with stripe then, thanks for the tip!


I've never understood why X.509 authorisation certificates aren't more widely used for this. They're offline, self-sufficiently asserted in the connection context, stackable, unopinionated about application semantics in terms of authorisation rules/grants/restrictions, have well-tested infrastructure for parsing and validation, and can use OCSP stapling for validity and CT logs for audit. They're used in CERN VOMS, etc., and seem like a great fit for distributed systems doing authorisation, but they don't seem widely known or used.


Because it's not so easy to set up and manage? Maybe it will become more popular with tools like https://github.com/gravitational/teleport


That's true. Honestly, it's probably almost entirely down to the fact that the OpenSSL CLI 'Swiss Army knife' tool doesn't properly support them.


To handle direct database reads by multiple microservices, you can create views for the different types of authorization checks you need to do.

If the schema needs to change, you just update the view.
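A minimal SQLite sketch of that approach, with hypothetical table and view names. The owning service exposes a view with only the rows and columns a reader may see; when the underlying schema changes (here, adding soft deletes), only the view definition is updated and readers keep querying the same columns. SQLite has no GRANTs, so in a server database you would additionally restrict the reading service's role to the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password_hash TEXT, is_active INTEGER);
-- Hypothetical contract for a reading service: no secrets, active users only.
CREATE VIEW billing_users AS
    SELECT id, email FROM users WHERE is_active = 1;
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "a@example.com", "h1", 1),
    (2, "b@example.com", "h2", 0),
    (3, "c@example.com", "h3", 1),
])
before = conn.execute("SELECT id, email FROM billing_users ORDER BY id").fetchall()

# Schema change in the owning service: soft deletes. Only the view is
# redefined; readers keep selecting the same columns from billing_users.
conn.executescript("""
ALTER TABLE users ADD COLUMN deleted_at TEXT;
DROP VIEW billing_users;
CREATE VIEW billing_users AS
    SELECT id, email FROM users WHERE is_active = 1 AND deleted_at IS NULL;
""")
conn.execute("UPDATE users SET deleted_at = '2022-04-01' WHERE id = 1")
after = conn.execute("SELECT id, email FROM billing_users ORDER BY id").fetchall()
```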


I always wonder how centralised permissions works with listing endpoints.

For accessing one resource that's fine; a cached call to any k/v store would work. But when you want to list every item for a resource in a table that contains millions of rows, it starts to get more difficult.

Without a JOIN, you either need to do a costly WHERE IN or filter after the fetch, which can result in scanning everything for nothing.

Is there any blog post on that?
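For contrast, a sketch of the JOIN option, assuming the grant data has been replicated into the same database as the items (the hypothetical `item_grants` table). That replication is exactly the data-locality trade-off that centralized authorization makes hard; in exchange, the database filters and paginates in one pass instead of fetching and discarding rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
-- Hypothetical permissions table replicated from the authorization service.
CREATE TABLE item_grants (item_id INTEGER, user_id TEXT, PRIMARY KEY (user_id, item_id));
""")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO item_grants VALUES (?, ?)", [(2, "alice"), (5, "alice"), (7, "bob")])


def list_items(user_id: str, limit: int, after_id: int = 0) -> list[int]:
    # Keyset pagination over only the rows the user is granted.
    rows = conn.execute(
        """SELECT i.id FROM items i
           JOIN item_grants g ON g.item_id = i.id
           WHERE g.user_id = ? AND i.id > ?
           ORDER BY i.id LIMIT ?""",
        (user_id, after_id, limit),
    )
    return [r[0] for r in rows]
```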


I think this glosses over a very important part, which is just named in passing: "how do you actually know that bob is bob, and how do you trust that?"

The article's answer is 'user role [..] attached to a JWT', but that only really applies if you control your whole distributed microservice system. If you need to scale to heterogeneous identities, you need to get into the magic world of federated authorities

and that is where the pain really is.


I agree that identity is important, but I would argue that challenge lies in authn and would be its own separate article. This focus was on authz. We are assuming we trust the passed-in identity at this point, e.g. the user has authenticated, a session is established, and we trust that the identity has been passed securely from downstream.


It says something about the power of branding that my first eye scan parsed this as "Authorization in Microsoft Word".


Nit: full-proof is an egg-corn. It's fool-proof.


Please, stop posting this every 2 days.


This seems like an unfair comment. I looked at the poster's submission history, and it does not show what you are claiming.


TFA has been posted three times since it was published 12 days ago. Which seems OK? I'll be more concerned if it gets published at a similar rate going forward, but since it got some upvotes and discussion this time around the submissions might stop. No other piece from this blog has been submitted.

https://news.ycombinator.com/from?site=alexanderlolis.com


I was the one who posted it two times. The second one was an honest mistake. This third time wasn't me but I am glad that someone did.


Yes, TFA is definitely what we want on HN. There's nothing wrong with reasonable reposting. In years past, I had been encouraged by the actual HN moderators to repost.



