The Difficulties of SAML Single Logout (shibboleth.net)
102 points by mooreds on March 17, 2021 | hide | past | favorite | 67 comments



SLO is the nightmare nobody touches.

In practice, logout in a federated environment seems to be:

* A user logs into apps A, B, C (...)

* While using app C, the user clicks logout. App C trashes their cookies

* With luck(!) the IdP provides a low friction logout endpoint – think "GET /logout?redir=http://foo.bar/logged-out".

The app redirects to that endpoint, the IdP trashes its cookies and redirects back to the app's "you have been logged out" page.

Apps A and B? Go with god, and good luck. Keep your sessions short.
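
A minimal sketch of what app C's side of that flow might look like, assuming a hypothetical IdP logout endpoint and a hypothetical `redir` parameter (neither is standardized, and the session store here is just a dict). The app kills its own session first, then sends the browser to the IdP:

```python
from urllib.parse import urlencode

# Hypothetical values: the IdP endpoint and the "redir" parameter name
# are assumptions for illustration, not part of any standard.
IDP_LOGOUT_URL = "https://idp.example.com/logout"
LOGGED_OUT_PAGE = "https://app-c.example.com/logged-out"

# In-memory stand-in for app C's session store.
sessions = {"sid-123": {"user": "alice"}}


def logout(session_id: str) -> tuple[int, dict[str, str]]:
    """Drop the local session, then bounce the browser to the IdP."""
    # Kill our own session first so it is gone even if the redirect fails.
    sessions.pop(session_id, None)
    query = urlencode({"redir": LOGGED_OUT_PAGE})
    headers = {
        "Location": f"{IDP_LOGOUT_URL}?{query}",
        # Expire the session cookie in the browser.
        "Set-Cookie": "session=; Max-Age=0; Path=/; Secure; HttpOnly",
    }
    return 302, headers


status, headers = logout("sid-123")
print(status, headers["Location"])
```

Nothing in this sketch reaches apps A and B, which is exactly the gap described above.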


SLO is such an incredible pain in the ass. I'm the founder of WorkOS[0] and I have been struggling with how to address this problem for 2 years now. There's no elegant solution.

We've taken the approach of building a second API that integrates directly with the directory systems of record (Workday, Gusto, BambooHR, SCIM, etc.). These are often the actual "source of truth" and a more dependable source for the group membership changes that should trigger de-provisioning. But unfortunately even SCIM isn't well enough supported yet to be considered a base standard. (It's worse than SAML.)

Enterprise authentication and authorization is totally fvcked. Trying to fix it brick-by-brick.

[0] https://workos.com


I think it's a pain because it's one of those things you just can't guarantee. I can (try to) promise you as an IdP that website X can't just randomly ask for your information and be given it without your consent.

But once I've given that information away, it's irreversible. You can't unsend data. You can ask nicely for it to be deleted, or revoke future access, but unless you have a time machine, federated SLO for all parties in a session ought to just be considered an impossibility, and planned for accordingly with short sessions.


> But once I've given that information away, it's irreversible.

In an enterprise environment all software has a basic level of trust of "this will try to do what it claims to do". That means the ID provider can directly ping the server of whatever service was issued the data or login cookies and say "this data should be deleted please".


If you administer every SP, it starts to become a possibility. But as TFA mentions right off the bat, that's atypical.

And even if you do exclusively use SPs that support SLO, is the user supposed to wait while the IdP does all of that outreach in order to know that it worked, or 97% worked because one timed out? That depends on whether the user should even care about the outcome -- if they should care, then do they get an email when it achieves 100% hours/days later when that down-at-time-of-SLO SP is back up? Should they report an incident if they never see a success report?
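
To make the "97% worked" case concrete, here is a rough sketch of an IdP fanning out logout notifications and recording a per-SP outcome. The `send` callable is a hypothetical transport (whatever actually delivers the logout message), and `fake_send` only simulates one SP being down:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# send(sp_url) is a hypothetical transport: it should deliver the logout
# message to one SP and raise on any failure (timeout, 5xx, refused).
Sender = Callable[[str], None]


def fan_out_logout(sp_urls: list[str], send: Sender) -> dict[str, str]:
    """Notify every SP in parallel and record a per-SP outcome."""

    def attempt(url: str) -> str:
        try:
            send(url)
            return "ok"
        except Exception as exc:  # a real IdP would narrow this
            return f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=8) as pool:
        outcomes = list(pool.map(attempt, sp_urls))  # map keeps input order
    return dict(zip(sp_urls, outcomes))


def fake_send(url: str) -> None:
    # Simulated transport: app B is down at logout time.
    if "app-b" in url:
        raise TimeoutError("timed out")


report = fan_out_logout(
    ["https://app-a.example/slo", "https://app-b.example/slo"], fake_send
)
done = sum(v == "ok" for v in report.values())
print(f"{done}/{len(report)} SPs confirmed logout")
```

The code is the easy part. What to do with a report that says 1/2, and whether the user ever hears about it, is the open question.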


> But as TFA mentions right off the bat, that's atypical.

So atypical that it borders on inconceivable. There's almost always an external party in there.

But if you somehow end up totally internal sure, try the back-channel binding. You get to toss SOAP messages around, and it might work. Of course now one of the invariants you built around no longer holds ("SAML works without having to let the IdP talk to the SPs" oh my sweet summer child...)

Just don't try SLO over the front channel, or else you'll have the joyous UX of a user clicking "log out" and all of a sudden they're bouncing logout messages back and forth between the IdP and two dozen internal applications. Hope the ninth app in line isn't down for maintenance or your <LogoutResponse> never makes it back to the IdP and the user wonders why they're staring at a server error page for an app they haven't used in hours and why half their environment is logged in and the other half isn't...


Dealing with auth, including single logout, is all much easier if you can put all your systems behind a single reverse proxy on a single domain.

(Certainly that's not always possible, but for many systems it is).


There's a reason huge enterprises still pay big money for stuff like CA's Siteminder: it's way easier to bolt an agent alongside your crappy internal app that thinks about this stuff for you.


Like https://www.forgerock.com/ formerly known as OpenAM / OpenSSO.

It's not fun to administer.


All the SLO solutions struggle with the fact that session IDs are disassociated from the IdP except by policy. Policy is insufficiently strong.

The session ID needs to be cryptographically associated with the IdP so that if you blow away the IdP session unilaterally, you cannot decrypt or access any SP's session.

For instance (though probably not sufficient as a solution), you could imagine the IdP running JavaScript inside the SP's client session that provides one half of a key pair, with the SP providing the other half, and the two halves combined forming the session ID. Then once SLO is initiated, the IdP script no longer provides its half.

I have not deeply thought further about how to completely design a flow like this but I strongly believe this kind of cryptographic session ID is the likely idea that will lead to a solution.
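
A toy sketch of the two-halves idea, under heavy assumptions: the derivation (HMAC-SHA256 of the SP half keyed by the IdP half) is one arbitrary choice rather than an established scheme, and all names here are hypothetical. It also does nothing about a client that cached the IdP half before logout, so treat it as an illustration of the shape only:

```python
import hashlib
import hmac
import secrets
from typing import Optional


def derive_session_id(idp_half: bytes, sp_half: bytes) -> str:
    """Combine both halves; neither half alone yields the session ID."""
    return hmac.new(idp_half, sp_half, hashlib.sha256).hexdigest()


# At login each party contributes one random half (assumed flow).
idp_half = secrets.token_bytes(32)  # held by the IdP, handed out per request
sp_half = secrets.token_bytes(32)   # held by the SP

# The SP indexes its session table by the derived ID only.
sp_sessions = {derive_session_id(idp_half, sp_half): {"user": "alice"}}


def lookup(idp_half_from_client: Optional[bytes]) -> Optional[dict]:
    """Resolve a session only if the IdP still hands out its half."""
    if idp_half_from_client is None:
        return None
    return sp_sessions.get(derive_session_id(idp_half_from_client, sp_half))


print(lookup(idp_half))  # IdP session alive: the SP session resolves
print(lookup(None))      # IdP stopped providing its half: nothing resolves
```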


Yeah, it seems like you want a refresh token that you periodically validate against the source of truth to handle these cases.


I ran into a bug like this in Australia's MyGov system.

I would log in to the main page where we can view government services. Then I would click through to the ATO (our IRS) to see some statements or whatever. I would then close the window, log out of MyGov, then log back in with my wife's account. Clicking through to the ATO would show my details instead of hers. IIRC logging out wouldn't work either since that would mess up the session, so I'd have to come back later.

Couldn't find anywhere to report this so I ignored it. I think they fixed it now.


Not the first time they've ignored issues in myGov. This guy tried endlessly to contact them about an XSS vulnerability, and it fell on deaf ears:

https://www.smh.com.au/technology/revealed-serious-flaws-in-...


You've got to understand that the orgs involved in myGov, such as Centrelink, Medicare, and the ATO, are special bad. They're dysfunctional to a degree most people simply cannot believe.

I was at Centrelink's IT department for just a couple months, and that was the only time in my decades-long consulting career that I had seen an adult man cry. Not for personal reasons. Work-related reasons. Several men, on multiple occasions, for different reasons.

That place crushed their spirits, and the tears just had to come out...


I have known a few people working in IT in government in Australia and it sounds like absolute hell. It takes a special kind of person to work in government. You either have to be extremely incompetent, to the point you can't even tell that things around you are a disaster, or you have to not give a shit, to the point where waiting a month for someone to press a button does not bother you and you are only focused on what you will be doing when you get home at the end of the day.

This one guy I knew told me that he had seen meetings dedicated to working out how they could spend money since they hadn't spent the whole budget yet. Services ran across hundreds of servers when they could have run on one, just to keep the whole team of system admins employed.

Government departments are filled with bloat to a level you wouldn't believe.


> meetings dedicated to working out how they could spend money since they hadn't spent the whole budget yet

I think this is fairly common in government outside of Australia as well. With the budget it’s use it or lose it, so you better find ways to spend it, or you’ll have less margin next year when you might actually need it.


I know about this - it’s actually worse and very serious if you underspend.

Almost all the higher ups are paid out of budgets based on how much you spend, so the supervising agency gets their 15 percent admin fee, your managers are part of the indirect cost pool, etc.

If you spend 50% of budget it is game over upstairs. And you lose the money at year end, and future allocation is dropped. And all the budgets in the long chain up get messed up. Almost nothing gets more attention and priority than this.


I should have said that I don't blame any individual dev or sysadmin for failing to respond. It's a deeply entrenched organisational issue.

Imagine fixing an issue on a Model 204 mainframe, with a perpetually dwindling number of people you can rely on for help, knowing that if you don't fix it, people don't get their welfare cheque.

High stress, low reward. Unsung heroes in my books.


> people don't get their welfare cheque.

A "fun" anecdote I heard was that every 1 day of outage of Centrelink's mainframe would result in 5-7 children with broken bones.

You see, some alcoholic high-strung fathers in rural communities were spending their welfare cheques mostly at the local bottleshop. If the cheques stopped, their booze supply stopped, and their withdrawal symptoms would send them into mad rage. All too often, they'd take it out on their kids.

Apparently someone at Centrelink was tracking this kind of stuff by gathering paediatric admissions data from hospitals.

While I was there they had a 3-day outage that everyone just laughed off. You do the maths.


Terrifying


I had to implement SP-side SAML logic at a previous job. The customer who wanted it just kept telling us to enable some Apache mod and it would work, except when you have a Java fat jar without Apache, you have to add your own endpoints, and we also had to add integration for native clients.

I asked about implementing logout because I saw it was a thing in the SAML docs. The customer didn't care, so our app trashed its cookies as usual and didn't report it to the IdP. It just wasn't worth the time.


Cookies need an alias mechanism. Combined with explicit cross-origin entitlement you could just keep a single cookie to satisfy all of those sessions.


A merkle tree for cookie validation would be cool.


Doesn't it depend on who controls apps A and B? If they are COTS or created by different parties, then all bets are off. But if all three apps are created by the same org, which can specify standards, doesn't one have some hope?


The whole point of SAML is that you can manage your users for third-party applications in your normal user store instead of needing to provision accounts in multiple places.


Somewhat, but it still requires a lot of storage of state and coordination between systems. This isn't SAML specific, it's a property of any federated login (SSO) protocol.


Apps A and B could just keep a list of expired cookies and the IdP could add to that list using a server-to-server call.
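
Roughly what I have in mind, as a sketch. The class and method names are made up, and the server-to-server endpoint that would call `revoke` is assumed rather than shown:

```python
import time


class RevocationList:
    """Session IDs the IdP has told this app to stop honoring."""

    def __init__(self) -> None:
        # session_id -> time the cookie would have expired on its own
        self._revoked: dict[str, float] = {}

    def revoke(self, session_id: str, expires_at: float) -> None:
        # Called from the (hypothetical) endpoint the IdP hits.
        self._revoked[session_id] = expires_at

    def is_revoked(self, session_id: str) -> bool:
        # Checked on every request before the cookie is trusted.
        return session_id in self._revoked

    def prune(self, now: float) -> None:
        # Entries can go once the cookie would have expired anyway.
        self._revoked = {s: t for s, t in self._revoked.items() if t > now}


revoked = RevocationList()
now = time.time()
revoked.revoke("sid-abc", expires_at=now + 3600)
print(revoked.is_revoked("sid-abc"))  # True
print(revoked.is_revoked("sid-xyz"))  # False
```

The list only has to remember an entry until the cookie's own expiry, so short sessions keep it small.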


The main idea of SAML is not to do direct server-to-server communication.


This is how it should work. But I don't think any popular system implements this.


Opening that documentation page gave me flashbacks.

I made the switch out of IT to another field. Configuring a Shibboleth IdP was probably the hardest thing I ever had to do in my IT career, it really pushed my capabilities. SLO wasn't the only hard thing, the whole thing was immensely challenging and every time I'd restart Jetty I'd be holding my breath hoping it would come up again.


I had just started a new job and my manager handed me a project to implement the SP side of SAML into our monolith. He explicitly said, "I looked at the docs and don't want to deal with that headache." Fuck, that was a pain in the ass. So many bugs just due to conflicting statements in different parts of the docs. Everything can be done 5 different ways.


This is a direct result of a spec that basically says "here's a grab bag of options, pick what suits you".

Maybe your IdP expects SOAP over HTTP but your SP won't. Perhaps the SP insists on encrypting AuthnRequests. God help you if one side wants to do URL encoding and DEFLATE.

I've made my life easier by refusing to ask/answer questions around SSO and instead insisting on talking about "ADFS login". We still do SAML, but at least there's a baseline implementation that I can plan for.
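
For anyone who hasn't hit it: the DEFLATE plus URL encoding combination is what the HTTP-Redirect binding uses (raw DEFLATE, then base64, then URL-encode). A minimal round-trip sketch, where the XML string is a trimmed placeholder rather than a valid message and the function names are my own:

```python
import base64
import zlib
from urllib.parse import quote, unquote


def encode_redirect(xml: str) -> str:
    """Raw DEFLATE, then base64, then URL-encode."""
    # wbits=-15 asks zlib for a raw DEFLATE stream (no zlib header/checksum).
    compressor = zlib.compressobj(wbits=-15)
    deflated = compressor.compress(xml.encode("utf-8")) + compressor.flush()
    return quote(base64.b64encode(deflated), safe="")


def decode_redirect(param: str) -> str:
    """Reverse the three steps in the opposite order."""
    deflated = base64.b64decode(unquote(param))
    return zlib.decompress(deflated, wbits=-15).decode("utf-8")


# Placeholder only: a real LogoutRequest carries more attributes and children.
message = '<samlp:LogoutRequest ID="_abc123" Version="2.0"/>'
param = encode_redirect(message)
print(param)
print(decode_redirect(param) == message)
```

A classic mismatch is one side emitting a zlib-wrapped stream while the other expects raw DEFLATE, which is why the `wbits=-15` detail matters.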


The SP is a walk in the park compared to the IdP.


To be fair, SAML itself isn't that difficult -- Shibboleth is just not very good.

I implemented a SAML IdP [0] in MUCH less time than it took to configure Shibboleth. The specification for SAML is pretty easy to comprehend.

The implementation is really an experiment, but the configuration and usability is significantly better. Improving the implementation doesn't affect this. In some closed-source forks I've written a production version that's been in use for several years.

[0] https://github.com/rkeene/saml-idp/blob/master/lib/saml/saml...


What are you doing now?


I'm a lawyer. Very different, I know!

I love IT and programming, but I knew if I made a life long career of it, I'd grow to hate it.

I really like doing IT related things in my own way. I'd always want to do things to best practices and not just get the job done as I was told.


I would love to pivot to law, I helped my ex do a lot of her homework and have a false sense of understanding. Only problem is that I’m old and a new start will not be easy when competing with people 20 years younger.


I know solicitors a fair bit older than me (i.e., in their 40s) who made the jump and find they more readily gain the respect and trust of clients, and senior lawyers within the firm, by virtue of their maturity.


Ah, the dates of the comments at the bottom helped explain the lack of modern day issues we now hit - 3p cookie deprecation and the Chrome SameSite change. Both of these break logout, but certainly weren't a concern in 2008.


It was last updated in 2017 per the date at the top, but yes, it's an older article.

This post indicates that 3p cookie issues only affect SSO providers without a hostname that lives within the primary website's domains: https://blogs.akamai.com/2020/01/cookies-single-sign-on-and-... (so SaaS offerings without domain masking). Is that your understanding?

Here's an interesting post on the samesite implications: https://www.troyhunt.com/promiscuous-cookies-and-their-impen...


Yup, that's precisely it.

We documented the login breaks from SameSite and 3p cookies, but for logout it was both not particularly a concern for folks (I've gotten about three inquiries in two years) and pointless to document tombstoning patterns for SameSite, since ITP would break those anyway.

https://docs.microsoft.com/en-us/azure/active-directory/deve...


Is the conclusion to just use SameSite=None?

I did some research a while back and found that Shibboleth supports local storage for sessions [0]; unfortunately, the IdP+SP I'm using do not support such a thing.

[0] https://wiki.shibboleth.net/confluence/display/DEV/IdP+SameS...


I’m not sure if CAS is even partially SAML compliant or even related to it at all, but the way CAS handled it was that for every service ticket issued, the authentication service was supposed to ping a URL destroying a service ticket. It didn’t have very good fallback (eg if it couldn’t successfully reach the service ticket destructor endpoint, it wouldn’t attempt to retry).

Perhaps the safest way is for apps to verify periodically if a SAML session is still valid, if there’s a mechanism to do so.


> for apps to verify periodically if a SAML session is still valid, if there’s a mechanism to do so.

In general, that seems equivalent to the SP having a short session and the IdP having a long session. Suppose it's 1 hour and 8 hours. At 1h0m1s the next request to the SP causes a redirect to the IdP, and if the IdP hasn't heard about a sign out request, it just redirects back to the SP with a valid assertion that gives you another hour on the SP. No prompt to re-auth until the 8 hour mark.
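
Spelled out as a toy model (times in seconds, the 1 hour and 8 hour lifetimes from above, and a made-up `handle_request` function), with an IdP-side logout added to show where it takes effect:

```python
SP_TTL = 1 * 3600   # SP session lifetime
IDP_TTL = 8 * 3600  # IdP session lifetime


def handle_request(t, sp_issued, idp_issued, idp_logged_out_at=None):
    """Return (outcome, new_sp_issued) for a request at time t."""
    if t - sp_issued < SP_TTL:
        # SP session still valid: the IdP is never consulted.
        return "served", sp_issued
    # SP session expired: the browser is redirected to the IdP.
    idp_alive = t - idp_issued < IDP_TTL and (
        idp_logged_out_at is None or t < idp_logged_out_at
    )
    if idp_alive:
        # Fresh assertion, no prompt, another hour on the SP.
        return "silently renewed", t
    return "login prompt", None


print(handle_request(1800, 0, 0))   # ('served', 0)
print(handle_request(3601, 0, 0))   # ('silently renewed', 3601)
print(handle_request(3601, 0, 0, idp_logged_out_at=1800))  # ('login prompt', None)
```

So a logout at the IdP reaches the SP no later than the SP's own session lifetime, which is the whole argument for keeping that lifetime short.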


I believe the desire to make SLO work well faded (or complacency with it being discouraged grew) as screen locking became widespread. Using screen lock was once the dance of security enthusiasts, but it has become a nearly ubiquitous habit for general users.

Sharing of OS-level accounts remains an interesting challenge, so devices in those situations should involve identical account sharing at all application layers.


This article was of interest to me because my company just implemented SAML Logout: https://fusionauth.io/docs/v1/tech/release-notes/#version-1-...

and I was looking around and ran across this post from the Shibboleth project.


This thread makes me feel better about struggling to implement this. I'm not the only one.


I added SAML support to an application once and I feel “not as easy as it seems” is a good description for the entire standard.


You can use an identity broker like https://fusionauth.io/ (disclosure, I work for them) to ease the pain. We put a dev friendly wrapper over SAML connections, among other functionality.

Here's our IdP docs: https://fusionauth.io/docs/v1/tech/samlv2/

Here's our SP docs: https://fusionauth.io/docs/v1/tech/identity-providers/samlv2...


That's exactly why we built WorkOS. It's like "Stripe for SSO/SAML."

Shameless plug for my startup. Hope that's still ok on HN :)

https://workos.com/


This sits between a company’s ActiveDirectory and something like Okta? Or is it more like Sail Point that lets you write logic to link everything up?


It’s infrastructure for SaaS apps to easily connect to all the different systems used in the enterprise, including Okta, OneLogin, ActiveDirectory, Workday, etc.


Probably a dumb question, everything in my sphere is active directory - why would you have more than one identity provider?


Workday: HR system of record, source of truth on which accounts should be provisioned and deprovisioned in AD.

Active Directory: holds those accounts.

Okta/OneLogin: facilitates use of AD accounts to log in to webapps.


Ah thanks, I thought he meant they were all being used as identity providers - this makes more sense


If you’re a developer building an app, your customers use a variety of different systems. These are fragmented and you need to support all of them, not just Active Directory.


The previous post said "the enterprise" which makes me think of a singular organisation using all of them - maybe I'm misunderstanding, but why would a single org use more than one?


"The enterprise" is jargon for "the enterprise market."

It's rare that a company will use multiple identity systems, but there exists lots of fragmentation across the large companies, which makes building a universal solution into your app very time consuming.


One of our customers brought this up with us briefly when discussing integration of our application into their infrastructure.

I had independently concluded that the complexity probably wasn't worth it, but I hadn't considered shorter sessions as a mitigating factor.

Looks like the sands of time are the best solution yet again. I can easily spin 15 minute sessions as a superior alternative to SLO when talking to a customer. Determinism is the biggest point and I can play that against security & compliance very easily in my industry.


SLO is one of those things that could be good if everyone played nicely together.

If you end up having to implement it, hedge with sessions as short as your compliance needs dictate. Expire your own session at the very beginning of the dance before you lose control.

And do not, under any circumstances, let them talk you into front-channel SLO. They probably won't unless they're totally clueless. But if they do it's the surest way to end up blamed for someone else's problems. Otherwise you'll end up with a support ticket some day that says "I clicked log out on bob1029's app and got a Datadog error page, what?!". And I'll smile.


Thanks for the heads-up on potential traps. I will make sure we keep a tight leash on this conversation...


Can this be solved on the device (which is where it matters, according to the article), by clearing cookies across all sites in the browser?

This might require a trusted browser extension or browser feature to delete a "well-known" cookie across all sites.

It's fundamentally weird to ask for a feature "I want some website to end my sessions across all other websites on this device."


I've written and implemented several SAML systems. It doesn't help that SLO is only a consideration when it becomes a problem and is usually not even part of the initial scope, more of an afterthought.


Feels like the end state will be adding IDP support directly into the browser / OS since it has cross domain / app visibility and can “do the right thing” in more cases.


And the pendulum swings back to Kerberos.


He he... enjoy. Hopefully this doesn't trigger any nightmares. https://twitter.com/bpontarelli/status/1099067076138827776?s...


Tweet is a reaction gif.



