GitHub's SAML implementation is useless. The idea is that you can bring your own account into an enterprise, and that sort of works on the site itself, but it does not prevent apps where you log in with GitHub from reading your organization membership once you have authorized an app at the organization level (and if you didn't, it hides the membership from OAuth tokens, so it has this capability!).
A SAML session is only required if said app fetches data via a token obtained from that user - and in my glance around, this was almost never the case - SAST tools almost always use app instance tokens and are happy to show your code to anyone with a GitHub account in your organization. Tailscale fixed this when I pointed it out, SonarCloud asked me to please not tell anyone, and GitHub took a few weeks to say this is totally expected behavior - when no vendor I told thought so, and their own docs contradicted them.
I swear, reporting security bugs is a thankless endeavor, even if you just randomly stumble over them. I couldn't imagine doing this as a job.
> The idea is that you can bring your own account into an enterprise
The issue goes beyond authorization. I’ve had GitHub randomly, once in a blue moon, use my personal email address as the default when merging a work PR. If anyone asks, I advise against mixing personal and professional stuff in the same GitHub account (or anywhere).
This is exactly why I’m so paranoid about account and device separation.
I don’t even trust Git profiles. I buy a new license for GitKraken at every job I go to, even if I could avoid it; to me, the possibility of accidentally committing to the work GitHub with my personal GitHub identity, or vice versa, is not worth it.
It’s the same with Microsoft accounts and their infamously bad-tech-debt-caused spaghetti.
Like, if you try to log in to Outlook on iOS, you get a threatening message to the effect of “your system administrator will be able to remotely control and wipe your entire device if you proceed”. If it’s even a possibility that an incompetent or malicious IT department wipes your personal device, then no thank you.
See also that HN thread where a father let his child use his laptop: they signed into their Microsoft school account, his personal Microsoft account somehow got merged into the school account, and from what I could tell he was never able to fix it and the school IT department didn’t care.
Depends on the org; I think the controls are more fine-grained now. For example, I have Teams and Outlook on my personal phone and the only thing they can do is delete the apps: screenshots come out blank, copy/paste doesn’t work, etc.
Yeah, someone asked why the funny account name, why not use your personal account, and I thought "what, are you crazy?". And that isn't because of SAML etc., just a simple don't-mix-work-and-pleasure ethos! I don't use my personal email to send an email to a customer.
It seems to be very common to use a personal phone for work 2fa or lots of other workplace tasks. Employers seem mystified if you request a corporate device when you obviously already have your own. I even see this a little with personal vehicles.
The idea of separating work and personal seems to be becoming old-fashioned.
Funny, that's exactly why NOWHERE should consider a phone number any form of ID.
Can you elaborate on the connection you see here?
Tying someone's identity to a thing they barely control and find it difficult to get more than one of.
Particularly something someone might reasonably need 3 or more different instances of, e.g. personal/semi-professional, personal NSFW stuff, or work where they didn't give you the X this service demands.
My company does not allow any employees to use their personal GitHub for work (or Facebook, Instagram, or anything else) after running into issues when employees leave.
Wouldn’t you just remove them from the org?
They may decide to change their github login to <company_name>LIES, and suddenly that’s all over your old PRs and Issues. Including in public repos where customers go looking for help.
That's even more true with a dedicated work github account than a mixed personal/work one; either way they can still login and edit the account name even if removed from the company org, and if it's not shared it doesn't burn their personal account too... right?
Is this speaking from experience?
With a dedicated work account the organization can always take over the account (via reset email if need be, since they own your work email account) and do whatever they want with it
A dedicated work account _where you use your work email address_... that was the missing part throughout this thread.
But then if you do that you also lose all your open source work history, which is important from a hiring/resume perspective.
One option for those so inclined is to cryptographically sign commits with a key that lists both work and personal email address (assuming your enterprise’s policy allows it). The employer retains control but you have a claim to credit for your work.
If we're discussing companies willing to go to lengths to scrub you from their GitHub history, they can still replace all commits you've signed with new commits. You likely have no legal rights to that work, and git does allow you to rewrite history arbitrarily.
It depends on the jurisdiction. In the US, copyright assignment is usually permanent. In the EU and Canada, you can claw back your rights to a degree, and even revoke the usage altogether if they did "evil" things with the work (moral rights).
In some cases (even in the US), if the employer does something that would be considered a "breach of contract", you can force them to remove all your code as well.
So, it would not be in the company's best interest to scrub their git history.
I think even in the EU and Canada, you don't have any copyright interest in work you perform as part of your employment. The copyright on the work you produce for your employer is entirely theirs, from the moment it is created.
Now, if you're a contractor performing work for a company, this may be quite different. But as an employee, I don't think you have any claim of authorship to the code you write as part of your job.
> git does allow you to rewrite history arbitrarily.
Technically yes, but the price is too great - everybody who has cloned the repos will now have to nuke their local copies too.
Sure, but the same is true for unsigned commits as well, isn't it? Or can you modify the commit metadata without changing the commit hash in those cases?
And you could still just change it right, as long as you did so before the employer revoked your access via the work email address.
If a spiteful ex-employer wants to scrub ex-employee authorship from the entire commit history in their public repos when someone leaves I don't think there's anything you could do to stop that either way, though it seems like it would be more trouble than it's worth and likely wouldn't scale. If they don't do that, assuming your old company email address still has your name in it I don't see why you'd lose credit for the work you did.
Why not just use the GitHub generated email address you get when you hide your email?
Using more than one Github account violates their ToS though.
To be fair to the vendors, Github makes it extremely difficult to do the right thing here. I built a repo/commit/pr-analysis tool (https://dev.log.xyz) and it took a lot of effort to make it so that "iff you can see it in Github you can see it in Devlog." The entire experience was beyond frustrating.
Github also makes their OAuth permissions picker extremely confusing. When I "login with Github" I am never sure exactly what I'm sharing, from which organizations I'm a member of.
> Github makes it extremely difficult to do the right thing here ... it took a lot of effort to make it so that "iff you can see it in Github you can see it in Devlog." The entire experience was beyond frustrating.
Do they? You don't have to mess with syncing teams, memberships, or assignment to repos if you don't want to. You can make one api call:
> The authenticated user has explicit permission to access repositories they own, repositories where they are a collaborator, and repositories that they can access through an organization membership.
https://docs.github.com/en/rest/repos/repos?apiVersion=2022-...
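For reference, a minimal sketch of that single call in Ruby (assuming a classic personal access token in a hypothetical `GITHUB_TOKEN` env var; not anyone's actual integration code):

```ruby
require "net/http"
require "json"
require "uri"

# GET /user/repos: repositories the authenticated user owns, collaborates on,
# or can access through an organization membership.
uri = URI("https://api.github.com/user/repos?per_page=100")
req = Net::HTTP::Get.new(uri)
req["Authorization"] = "Bearer #{ENV.fetch('GITHUB_TOKEN')}"
req["Accept"] = "application/vnd.github+json"

res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
JSON.parse(res.body).each { |repo| puts "#{repo['full_name']} (private: #{repo['private']})" }
```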
I should've tested this endpoint. GitHub's SAML implementation is done by a different team, always lags behind in quality, and does some pretty unclean patching of the data - e.g. the notification filtering is done in the templating engine, so if all your notifications are SAML-gated you get the header, no "all caught up" below it, and (this is live from my account) "1-0 of 113".
So I'd give it about a 50:50 chance of working.
Edit: I just realized it eats your non-gated notifications too, if they're further down than position 25, and the "Next" button just leads to the same page with "?query=". Yay, another ticket about how glued-on GitHub Enterprise Cloud is. The last one (GitHub eats API calls to accept invites to SAML organizations, deletes the invite, sends a 200, and writes success to the audit log... but it ends up being a no-op) was only 2 months or so ago. Thanks Microsoft.
Yeah, it's a massive UX issue. The way to actually check if someone has a SAML session is to attempt to get their membership. If you get a 403, there isn't one. But good luck explaining to the user that they need to click "authorize" next to the organization in the OAuth flow. No way to send a hint that it may be required, and no way to do a step-up flow.
I did a full writeup here: https://notes.acuteaura.net/posts/github-enterprise-security...
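A rough sketch of that membership probe (hypothetical org name and token; the endpoint is GET /user/memberships/orgs/{org}, called with the user's own OAuth token):

```ruby
require "net/http"
require "uri"

# Per the comment above: a 403 on the user's own org membership is the only
# signal that a SAML session/authorization is missing for this organization.
def saml_blocked?(org, user_token)
  uri = URI("https://api.github.com/user/memberships/orgs/#{org}")
  req = Net::HTTP::Get.new(uri)
  req["Authorization"] = "Bearer #{user_token}"
  req["Accept"] = "application/vnd.github+json"
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  res.code == "403"
end

puts saml_blocked?("example-org", ENV.fetch("USER_OAUTH_TOKEN")) # hypothetical values
```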
For anyone trying to connect the above to this vuln research, this seems unrelated ("GitHub doesn’t currently use ruby-saml for authentication, but began evaluating the use of the library with the intention of using an open source library for SAML authentication once more")
This is the operating procedure at every conceivable level. You would not believe how difficult it is to convince young developers raised on Javascript that client side validation is not enough, much less the business owners setting out functional requirements and budgets.
”You would not believe how difficult it is to convince young developers raised on Javascript that client side validation is not enough”
At first read, I think you’re JSplaining, but I’m willing to give you the benefit of the doubt.
How difficult is it exactly? Can you provide examples, perhaps even of the particular difficulties? Are the difficulties on the side of the convincer or the convincee, or both?
I think it is something they have to experience. Tell them: if they're happy with it, give me a $10 bug bounty. Then go hack a deploy of their branch. Then tell 'em to keep the $10 but remember the lesson.
Wow. I would never guess it was so hard to convince someone of this.
“The code I write doesn’t have XSS or SQL injection vulnerabilities,” sure. At least those are plausible things to believe.
Client side validation?? How could anybody believe in that?
I convinced fellow engineers who were adamant that the code they had written was OK by writing actual exploits against their code. Twice. Worked both times, without betting on money.
The premise of comfort from shared credentials, and perhaps of increased security from SSO, breaks down the moment you have a vulnerability like this.
Any type of password store, even a physical one, or just reusing passwords, ends up being safer.
Minimalism wins again
I recently had to implement SAML and this headline does not surprise me in the slightest.
The SAML spec itself is fairly reasonable, but is built upon XML signatures (and in turn, XML canonicalization) which are truly insane standards, if they can even be called such.
Only a committee could produce such a twisted and depraved specification, no single mind would be capable of holding and combining such contradictory ideas.
It would be so simple to just transmit signatures out-of-band and SAML would be a pleasure to implement.
It’s much worse than you’re making it sound: XML is literally an eXtensible Markup Language, so… of course the SAML standardisation committee invented their own extension mechanism language on top of it.
Coming up with your own protocol on top of a protocol for a tiny amount of data amounting to not much more than what’s in an authentication cookie is the special kind of stupid that only the largest and most bureaucratic committees can produce.
Is SSO salvageable at all? It seems like the idea of just logging into different accounts is fine.
Also, just the idea of connecting your accounts together such that you can get mega-compromised is foundationally riskier than keeping them separate.
The biggest problem with having separate accounts for everything is that a lot of the users will make their own "wish it were SSO" by setting their passwords in the systems to the same value. Then, when the weakest system is exploited, the attacker gets credentials that are valid across the organization. Yes, they should be using a password manager with unique, random passwords for each system, but realistically a good chunk of larger organizations' staff are not going to do that.
Some other headaches:
- Having decentralised authentication means that onboarding and offboarding need a bunch of tedious manual steps, or custom automation.
- Whoever does user support for the organization has to be trained to reset passwords/unlock accounts in a hodgepodge of systems.
- Any security controls the organization wants to implement need to be reimplemented or approximated in a bunch of different systems, e.g. if there are regulatory requirements for account lockouts, time between explicit reauthentication, etc.
- It becomes much more critical to collect the authentication logs/event data for all of those systems, and harmonize its formatting with everything else so that the security ops team isn't maintaining separate monitoring/alerting rules for every system.
For large-scale systems, there are also at least theoretically performance advantages to the kind of signed ticket approach that SSO mechanisms tend to use, versus having to do database lookups of session IDs or verify a password. It's possible to do that without SSO, but if you're going to the trouble of implementing that kind of mechanism, you're most of the way to having SSO anyway, and might as well just finish the job IMO.
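To illustrate the signed-ticket idea (not any particular SSO product; a minimal HMAC sketch with made-up field names), verification only needs the shared key, not a session-store lookup:

```ruby
require "openssl"
require "base64"
require "json"

SECRET = ENV.fetch("TICKET_SIGNING_KEY") # known only to services that mint/verify tickets

# Issue: the claims travel with the ticket, signed as raw bytes.
def issue_ticket(user_id, ttl: 3600)
  payload = JSON.generate(sub: user_id, exp: Time.now.to_i + ttl)
  "#{Base64.strict_encode64(payload)}.#{OpenSSL::HMAC.hexdigest('SHA256', SECRET, payload)}"
end

# Verify: recompute the MAC over the exact bytes that were signed; no database hit.
def verify_ticket(ticket)
  encoded, mac = ticket.split(".", 2)
  payload = Base64.strict_decode64(encoded)
  return nil unless OpenSSL.secure_compare(OpenSSL::HMAC.hexdigest("SHA256", SECRET, payload), mac)
  claims = JSON.parse(payload)
  claims["exp"] > Time.now.to_i ? claims : nil
end
```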
"If there are regulatory requirements for account lockouts"
Then the vendor can implement those? The need for control might itself be a source of greater risk.
Often password rules force slight variations of the passwords, so the damage is limited.
Furthermore, attackers don't try all accounts since they don't know which ones exist (unless they have email access).
SAML is not the only standard for SSO. Before SAML we had Kerberos, and nowadays you can use OpenID Connect. Other standards have their own gotchas, but SAML is uniquely horrendous.
When we get vulnerabilities in the SSO protocol (SAML or otherwise) these vulnerabilities generally only affect some of the clients (identity consumers) who have implemented the protocol incorrectly or are using a feature that the provider has implemented incorrectly. Vulnerabilities that break the entire provider are less common.
When comparing this situation to having multiple different accounts, I can't see how SSO is less secure. Sure, when you have a breach that affects the entire identity provider the damage is high, but the risk of having a breach (any breach!) is lower, since implementations are fewer, more consolidated, and usually developed by people with better expertise.
OIDC is better than SAML, but that isn't a high bar. And OIDC has its own problems.
OIDC's problems are nothing like those of SAML.
One can use OIDC instead of SAML for SSO.
SAML (more broadly XML-DSIG) is literally the worst security protocol in common use. I think you should generally be taking whatever hits you need to take to transition from it to OAuth. Certainly, I would refuse to bring a new product to market that relied on it. It's incredibly dangerous. Unless there's some breakthrough in practical formal verification, I can't imagine that this will be the last or the worst DSIG vulnerability.
One day I will write an essay on all of the incredibly stupid things XML DSig does, and that's not even touching the cryptography. It's peak enterprise software brain.
Someone should go deep on the mailing list and standards body horrors of WS-* and OASIS/XACML and all that crap
My (possibly misunderstood to the point of misphrasing) understanding is that SAML still has the point of difference that your SSO provider can cancel a session. Is that right?
Can it? In many SAML setups, there's no direct network interaction between the IdP and the Service, other than at most sharing metadata via URL.
There's an optional feature in the spec, I think (Single Logout). But in my very limited experience, it is never implemented or working correctly.
Microsoft implements this in Azure/M365.
OIDC has out-of-band backchannel logout.
Security Cryptography Whatever’s take on this week’s SAML nonsense will be fun.
Honestly, hadn't thought of it, but of course we should do that. Thanks!
I’d sub to infosec rant podcasts. $5/mo Patreon sub for a non-vtuber version.
Ugh. No one should use REXML unless they have no other choice. It will happily parse invalid xml, which causes an infinite number of problems downstream.
It’s quite literally parsing xml using regular expressions. It’s an excellent case study for why you shouldn’t do that.
Projects didn’t start using Nokogiri for performance. They used it because it’s correct.
One of the risks of AI code assistants is that they are not necessarily looking at the wider picture when it comes to the libraries used in a large code base.
I was testing o3 recently and it kept changing the library used by a block of code every time it tried to fix an issue in the block that was unrelated to the library used (haven't seen that happen with Sonnet)
It's easy to see how issues could creep in: a modification switches to an inferior library/gem that already exists in the code base or standard library, so it still passes tests etc. and doesn't need a Gemfile change.
> It’s quite literally parsing xml using regular expressions. It’s an excellent case study for why you shouldn’t do that.
It's like a textbook example no? Don't parse non-regular languages with regular expressions.
This is a great write-up.
He's mentioned in the article, but a major shout-out is warranted for ahacker1. He's doing really sophisticated and valuable work to secure SAML implementations. We at SSOReady are really appreciative of his work.
Earlier this week, WorkOS put together a nice write-up on their own collaboration with ahacker1: https://workos.com/blog/samlstorm
>We discovered an exploitable instance of this vulnerability in GitLab, and have notified their security team
GitLab has released a fix on their end for anyone else wondering
https://about.gitlab.com/releases/2025/03/12/patch-release-g...
Related: Latacora's (2019) article, How (not) to sign a JSON object[1].
In short, nesting trees and signing them is difficult and prone to pitfalls. It's easier if the envelope holds the message as a raw string, and the signing is performed on the raw string.
[1]: https://www.latacora.com/blog/2019/07/24/how-not-to/
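A minimal sketch of that approach (made-up envelope fields, with HMAC standing in for whatever signature scheme you actually use): the payload stays an opaque string, the MAC is checked over those exact bytes, and only then is the payload parsed.

```ruby
require "openssl"
require "base64"
require "json"

KEY = ENV.fetch("ENVELOPE_KEY")

# Sender: serialize once, sign the resulting bytes. No canonicalization step exists.
def seal(claims)
  raw = JSON.generate(claims)
  { "payload" => Base64.strict_encode64(raw),
    "sig"     => OpenSSL::HMAC.hexdigest("SHA256", KEY, raw) }
end

# Receiver: verify against the decoded bytes *before* parsing them.
def open_envelope(envelope)
  raw = Base64.strict_decode64(envelope.fetch("payload"))
  raise "bad signature" unless OpenSSL.secure_compare(
    OpenSSL::HMAC.hexdigest("SHA256", KEY, raw), envelope.fetch("sig"))
  JSON.parse(raw)
end
```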
Isn't the simpler conclusion here that one should look for the signature where it is supposed to be? Instead of using an excessively general XPath like "//ds:Signature" that might find any signature in any unexpected location...
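For instance, with Nokogiri and the standard SAML/dsig namespaces, an anchored XPath accepts a signature only in the one place the profile expects it, while `//ds:Signature` will happily match one smuggled in anywhere (a sketch, not a complete validator):

```ruby
require "nokogiri"

NS = {
  "samlp" => "urn:oasis:names:tc:SAML:2.0:protocol",
  "saml"  => "urn:oasis:names:tc:SAML:2.0:assertion",
  "ds"    => "http://www.w3.org/2000/09/xmldsig#"
}

doc = Nokogiri::XML(File.read("response.xml")) { |c| c.strict.nonet }

# Too permissive: matches a ds:Signature element at any depth in the document.
anywhere = doc.xpath("//ds:Signature", NS)

# Anchored: only a signature that is a direct child of the assertion counts.
expected = doc.at_xpath("/samlp:Response/saml:Assertion/ds:Signature", NS)
```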
Hot take, but for me the conclusion always was -- get a big stick and use it to prevent web developers from touching anything near your security sensitive code. Starting from design, protocols and data formats of it. The set of habits and design considerations simply doesn't match day to day practice of the usual web development. It's often the opposite of what you need to write normal code.
I don't think it's fair to blame the skill of web developers (although if they use javascript and leftpaddings they have it coming).
The nature of web software makes it 100 times riskier than anything else, because of the risk profile and 100% connectivity.
Anyone who thinks a publicly accessible web site is secure is insane.
I feel most responses to vulnerabilities are too lenient. Sometimes you have to throw out some baby with the bathwater: you can't surgically remove the dangerous component, you have to chop it out and apply chemotherapy en masse.
If you are an IT admin with any pride, SAML is out of any future plans. The idea of SSO is suspect as a whole. Xml parsing has been hit twice in a week, avoid it in the future, anything wrong with a policy that replaces xml with json?
> Xml parsing has been hit twice in a week, avoid it in the future, anything wrong with a policy that replaces xml with json?
OAuth 2.0 and its extension OpenID Connect have been around for over a decade. They have their own gotchas (like the badly defined ID token in OIDC and the ill-thought-out implicit and hybrid flows), but nothing there is nearly as dangerous as SAML.
Most applications support OpenID Connect now, but I'm still seeing organizations choose SAML out of inertia even when they are fully capable of using OpenID Connect.
For an organization of any significant size (say, anything over 10 people), not deploying SSO would be malpractice. The point of SSO is to have a single point of control and a single, mandatory 2FA stack.
Obviously, if you can avoid doing SSO with SAML, you should.
Parse this JSON correctly:

```json
{ "data": "XXX", "sig": "BAD", "sig": "GOOD" }
```
In a security-sensitive context, a parser should return an error on a duplicate key, regardless of what common parsers do and what the RFC fails to specify.
Implicitly, that means no security software dealing with JSON should be written in Go, JavaScript, Ruby, Python, etc. (where practically everyone uses JSON parsers that silently accept duplicate keys).
Plenty of languages do have common JSON libraries with duplicate-key errors, like Haskell (aeson), Rust (serde_json), Java (gson, org.json, probably others), so there are plenty of good choices.
So yeah, the correct parse result is '400 bad request'.
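To make the failure mode concrete: as far as I know, Ruby's stock parser (like most of the default parsers in the languages named above) silently keeps the last occurrence, so two components can "successfully" parse the same bytes and check different signatures:

```ruby
require "json"

doc = '{ "data": "XXX", "sig": "BAD", "sig": "GOOD" }'

# The default parser quietly keeps the last duplicate...
JSON.parse(doc)  # => {"data"=>"XXX", "sig"=>"GOOD"}

# ...so if another component in the pipeline keeps the *first* occurrence
# instead, the two sides disagree about which "sig" was verified. The safe
# behaviour argued for above is to reject the document outright (HTTP 400).
```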
For Java, I think you mean Jackson, not gson, unless something has changed recently. Goes to show that even the behemoths can get this wrong.
https://github.com/protocolbuffers/protobuf/blob/6aefdde9736...
I overwrite with the last one.
Strictly not a parser problem.
Csv is also available.
And binary protocols, with index-based implicit keys and byte lengths prepended to variable-length fields. Those are the gold standard (see IP and TCP headers).
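A toy illustration of that style (made-up framing, not any real protocol): a fixed field id plus an explicit length prefix means the receiver never re-interprets the bytes, so there is no second "parse" in which a duplicate key could shadow the first:

```ruby
# Encode: 1-byte field id, 4-byte big-endian length, then the raw bytes.
def encode_field(id, bytes)
  [id, bytes.bytesize].pack("CN") + bytes
end

# Decode: the length says exactly where the field ends; nothing is re-scanned.
def decode_field(buf)
  id, len = buf.unpack("CN")
  [id, buf.byteslice(5, len)]
end

p decode_field(encode_field(1, "hello"))  # => [1, "hello"]
```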
It's kind of annoying to explain the vulnerability in a blog post and then omit the parser differential in question.
It is like writing the introduction to a story and omitting the climax.
The sibling comment's blog post <https://news.ycombinator.com/item?id=43374972> included the relevant detail: they were just doing (...//ds:DigestValue).firstChild.nodeValue without checking that .firstChild was a Node (in the offending case, it was a Comment). Thus the non-canonicalizing parser saw the "masked" signature, the corrected one (which tossed out comments) saw a Node, and when two implementations differ about a signed document, hilarity ensues.
Are you sure that is the one for this blog post? I got the impression that was a different vuln in a different SAML implementation.
Also, using comments to bypass SAML is very old news. https://duo.com/blog/duo-finds-saml-vulnerabilities-affectin... is a post from 2018 about it.
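For anyone who hasn't seen that 2018 class of bug, the core is a comment-node differential that's easy to reproduce with the two Ruby parsers discussed in this post (illustrative of the Duo-era truncation trick, not the exact bug described here):

```ruby
require "rexml/document"
require "nokogiri"

xml = "<NameID>user@example.com<!--x-->.attacker.example</NameID>"

# REXML: Element#text returns only the first text node, i.e. the part before the comment.
REXML::Document.new(xml).root.text  # => "user@example.com"

# Nokogiri: #text concatenates all text descendants (comments excluded).
Nokogiri::XML(xml).root.text        # => "user@example.com.attacker.example"

# Canonicalization ignores the comment, so the IdP's signature over the full
# NameID still verifies, but a library that extracts only the first text node
# sees "user@example.com", letting whoever controls the attacker.example
# mailbox log in as that user.
```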
Evidently it's not the same, sorry; it seems I leapt to conclusions with the two signature-mismatch vulns by ahacker1 showing up so close to one another. Opening the very tiny, very dark code picture shows this one seems to be XPath-centric, not nodeType-centric as the WorkOS link discussed.
I'm guessing they didn't want to be directly responsible for dropping a zero-day that allows authorization bypass in countless systems across the planet before the parties responsible for those systems have a chance to fix them.
I'm sure the specifics will come out sooner or later.
Don't use SAML, mostly because it uses XMLDSig. Don't use XMLDSig because it's hard to get usefully right and easy to get dangerously wrong.
I’m aware of the reputation of XML signatures, but it’s the first time I read about technical details, and they make my head spin.
Q: Is there any non-legacy reason to use SAML instead of libsodium’s public key authenticated encryption (crypto_box)?
Another Q: Is there any non-theoretical risk of a parser differential when using libsodium’s crypto_box on one end and Golang’s x/crypto/nacl/box on the other end?
Wouldn't using crypto_box mean the developer would have to implement their own custom authorization mechanism from scratch?
i.e. it looks like a reasonably good way of exchanging encrypted messages, but I don't see anything in the docs indicating that it would provide the equivalent of group membership/roles/permissions.
Building something like that as custom code is a huge commitment, and could easily result in severe vulnerabilities specific to that system.
I was thinking you could strip out the broken signature protocol in SAML (replace it with libsodium) and leave the actual payload intact, or even switch from XML to a simpler wire format like JSON for the payload. But maybe even the payload part of the standard isn't worth saving; I can't tell after reading a single article about it.
BlueSky post with a video showing the vulnerability: https://bsky.app/profile/ulldma.bsky.social/post/3lkbi6rasl2...
SAML is insecure by design. Others have said it better before me, such as https://joonas.fi/2021/08/saml-is-insecure-by-design/, but the quote I got from an old thread here was "Sign Bytes, not meanings".
Parser differentials are expected and even necessary. What you intend to get from a signed response is very meaningful. A dilemma in modern TLS is that sometimes you want to trust one internal CA; That's the easy path. Sometimes you want to accept a certificate from a partner's CA, and you've got multiple partners - and you can no longer examine just the end certificate, but the root of that chain is equally important in your decisions.
This is also why I recommend against AWS Sig algorithms whenever possible; V4 is theoretically secure, but they screwed it up twice - SigV1 and SigV3 were insecure by design, and yet somehow made it past design review and into the public.
Interesting vulnerability! It's a classic example of how seemingly small differences in implementation (REXML vs Nokogiri) can lead to significant security holes. Kudos to Peter Stöckli and ahacker1 for finding it!
I wonder how many other libraries are vulnerable to similar parser differential attacks. It's a good reminder to be extremely careful when dealing with XML and SAML, which are complex beasts at the best of times. As asmor pointed out, Github's SAML implementation has other issues too. It seems like SAML is just inherently difficult to get right.
Also, to the person who suggested not mixing personal and professional stuff in the same Github account: wise words! I've seen that cause headaches more than once.
This is an example of a parser mismatch vulnerability.
Related submission a year ago: https://news.ycombinator.com/item?id=38743029
Maybe a stupid question, but would older versions of Puppet be affected (like 6)? Also, is there a site to check dependencies to see what may be affected?
Try Dependabot? But it's a tool by GitHub; maybe better not to depend on a self-report.
XML is to authentication bypasses what C is to buffer overflow attacks
You're selling XML short here, it had its own share of straight up RCEs too.
XML could really benefit from a standardized subset that cuts out all the unnecessary features and security footguns.
GMarkup[1] is pretty close to what I had in mind. If only it was more prevalent and had an agreed upon standard.
[1]: https://docs.gtk.org/glib/markup.html
I find that the "unnecessary features" and footguns are what makes XML, well, XML. I guess there must be some legitimate usage of those, or at least was back in the day. If you strip them out, you'd end up with a JSON-like (so you may as well use JSON).
No, you would have an extensible markup language. And json is not a good fit for markup.
Now, xml has also been used for a lot of things where a hierarchical format like json would have worked better than a markup format, of which SAML would be a good example. But there are also cases where a markup format makes more sense, like svg or docbook, or odf.
Or something like RON
https://github.com/ron-rs/ron
Sad that XML has too many features for an otherwise somewhat nice, but verbose markup language.
Some of it isn't explicitly XML's fault (although it doesn't help). SAML and especially XML Signature are terrible standards even in ways that don't involve XML.
Features are kind of a negative for security. Imagine if YAML was used!
I think there is a "safe" subset of both XML and YAML that 80% of people actually use.
Which is exactly the problem: if you have two parsers of the same format in a security context that show slightly different behavior (maybe in the remaining 20%, maybe not), that's often enough.
From a security perspective that's kind of useless, as your concern is not what the "good" people do, it's what the "bad" people do.
Well, you can define such a subset and write or configure parsers to only use that; I've seen both XML and YAML libraries do just that, by disabling remote file loading or arbitrary code execution for example.
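A sketch of what that can look like with the Ruby libraries discussed in this thread (defensive defaults, not a complete hardening guide):

```ruby
require "nokogiri"
require "yaml"
require "date"

# XML: strict parsing and no network access; NOENT is deliberately left off,
# so external entities are not fetched or substituted.
def parse_xml_strictly(str)
  Nokogiri::XML(str) { |config| config.strict.nonet }
end

# YAML: safe_load only builds plain data types (plus what you allow-list),
# so no arbitrary Ruby objects are instantiated on load.
def parse_yaml_safely(str)
  YAML.safe_load(str, permitted_classes: [Date, Time], aliases: false)
end
```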
Disabling xml remote entities and billion laughs is a given.
In the context of SAML that's hardly the least of it. Lots of the problems are things like allowing comments to sort of change the meaning of the document, allowing signatures to sign only part of the document, allowing multiple signatures to sign different parts of the document, etc.