Hacker News
I analyzed Stack Overflow for secrets (matan-h.com)
223 points by matan-h on Nov 17, 2023 | 80 comments



Reminds me of this hilarious bug bounty:

1. Person reports some vuln in HackerOne itself to HackerOne

2. A HackerOne employee tries to reproduce it, and unknowingly copies and pastes his/her cookies into the HackerOne report

3. The reporter takes those cookies, and logs in as the HackerOne employee

4. The reporter files a new vuln report "You are disclose for me you session. you are gevi me your session on last report. I am can use your session(sorry)"

5. $20,000 bounty

https://hackerone.com/reports/745324


What a great and hilarious story. Thanks!


That seems to kind of go against the spirit of doing the work to find a vulnerability. It's basically social engineering. Do you get bug bounties for that?


I wouldn't call it social engineering, because the reporter didn't intend to get the cookies while filing the first report.

It's like the GitHub scanner that reports leaked tokens.


What's crazy is that the reporter previously filed a bug report about hijacking sessions and then it comes full circle during a different report.

That's karma


The spirit of HackerOne is to encourage hackers to disclose rather than exploit, for the reward of money. It makes a lot of sense that they'd pay generously as a public statement to any hackers that find vulnerabilities on their systems.


I'd argue it's in keeping with the spirit, it's just that the vulnerability resides within your employees rather than your systems. Both are worth a call out and correcting. It's arguable how much either is worth, that being said.


Best part is the guy has a -5 rep.


The question remains, how many of these "things that look like secrets" are actual secrets, and how many are 'password="[password]"' or 'password="12345678"' (where 12345678 is not the actual password)? Going by the only category they took a closer look at, there are not that many actually "actionable" secrets...


Gitleaks regexes are fairly accurate. For example, the regex to find a GitHub PAT is "ghp_[0-9a-zA-Z]{36}", which means a token has a specific number of characters (36+4) drawn from a specific set (letters and digits). And I try to filter out the obvious non-secrets (like 'abcd', 'xxxx' and '1234'). However, as I stated in the article, most of the data is not actionable: most people just revoke the token, use an old one, change some random letters, etc.
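
A minimal sketch of that kind of check, in Python rather than Gitleaks' own configuration. The regex is the one quoted above; the function name and the way placeholders are filtered are illustrative assumptions, not the script used for the article:

```python
import re

# Pattern quoted above: the "ghp_" prefix plus 36 letters or digits.
GITHUB_PAT = re.compile(r"ghp_[0-9a-zA-Z]{36}")

# Fragments that suggest a hand-written placeholder (assumed list).
PLACEHOLDER_FRAGMENTS = ("abcd", "xxxx", "1234")


def find_candidate_tokens(text):
    """Return PAT-shaped strings that do not look like obvious placeholders."""
    candidates = []
    for match in GITHUB_PAT.finditer(text):
        token = match.group(0)
        body = token[len("ghp_"):].lower()
        if any(fragment in body for fragment in PLACEHOLDER_FRAGMENTS):
            continue  # e.g. ghp_xxxxxxxx... is almost certainly not real
        candidates.append(token)
    return candidates
```

A match only says the string has the right shape; it says nothing about whether the token was ever valid or is still valid.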


I think the real take-away is that StackOverflow does not have key detection like GitHub does.


I see real-looking keys posted to SO at least a couple of times per week (stuff like Twilio and Stripe keys are the most obvious as they’re tagged-strings; followed by GMail SMTP creds; I edit them out and flag the posts for the mods, as one does). Granted, most of the time it’s just some kid who doesn’t appreciate what secrets are worth keeping, or wasn’t paying attention when copying+pasting into their post, but every so-often I see secrets in a post from what looks like an outsourced worker assigned to a “real” business, with very real things to lose - and I get depressed from wondering how modern society even holds itself together given the scale of incompetence I witness first-hand…

(Fun-fact: the next SMS text-message you get from a major chain informing you of an upcoming appointment was likely sent to you via Twilio from a desktop client with hardcoded AccountSID and AuthSecret strings shared by all 20,000 (multitenant) users; Don’t ask how I know, but it’s depressing; I do report these things (anonymously) to the vendors but then receive a reply from a non-technical manager accusing me of “hacking”. I haven’t yet reported them to e.g. Twilio directly because I don’t want Twilio to revoke their creds and cause potentially hundreds of thousands of people to not-receive essential comms from those tenants. Le sigh…)


There was another thread a few years ago where someone suggested reporting to US-CERT or another CERT. It has some advantages like "they know what a credential leak is", "they know that people reporting security issues aren't necessarily malicious", and "they sound official when they try to get it fixed". And "your name will no longer be on the report".

I haven't had occasion to try this myself, but it sounded like good advice!


Has GitHub open sourced their key detection?


No, Microsoft is keeping all of that stuff under wraps. They have a "secret scanning partner program" where they allow companies to provide an endpoint GitHub can use for figuring out if something is a secret or not, so it's not just a library with a bunch of regexes. It seems like a service in itself, and Microsoft doesn't really open source stuff like that.



You are correct. Though, speaking of regex, they work with partners to create the most accurate regexes possible using non-public information like expected entropy or checksums.
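
To illustrate what an entropy check can add on top of a regex match, here is a rough sketch. It is not GitHub's implementation (which is not public); it just computes Shannon entropy per character, and any cutoff you apply to the result would be your own assumption:

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of the string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A repeated-character placeholder such as "aaaa" scores 0 bits per character, while "abcd" scores 2; randomly generated tokens tend to score noticeably higher than hand-typed filler of the same length.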


That's surprising (to me), because the enterprise custom scanning feature only supports hyperscan-flavoured regex.


Sorry, I should clarify that some of those things are _in addition_ to regex. You are correct that it uses Hyperscan to find initial matches, then their first-party patterns go through some additional local processing magic.

(This is my understanding based on conversations with people working on the secret scanning feature at GitHub, I don't have firsthand knowledge.)


But... But... I thought Microsoft ♥ open source?


Yep. The relevant parts from the article:

> ... I run a simple scan ... against all the 74 real looking GitHub user tokens ... and discovered that 6 of them are actually valid.

> ... only 2 of them actually have bio and email, but one of them (a c/c++ developer) has a repo with 3.4k stars ...

> I obviously couldn’t verify all the secrets. From most of them I’ll probably be banned, so I stooped here.

As an alternative to manually testing the credentials (and risking bans), I wonder if any organisations would agree to test the credentials for you if you sent them a list of suspected leaks. If the organisation doesn't tell you which ones were valid (and takes responsibility for revoking/notifying), I don't see much room for abuse. Might be hard to convince the organisation of that though!


GitGuardian has the ability to passively and non-intrusively verify credentials as well.

Public patterns for sensitive and highly used credentials have a lot of false positives, because they are overly broad. Internal knowledge about token structure that would reduce this isn’t something companies give out willingly.


> Turns out, most of it is useless: For using most data, you need more information than just the api key.

I think you mean low effort attacks.

A determined attacker would attempt to gather more information, for example research the author of the post. Many of the authors give enough clues so that you can identify a person, even if comments are written using different handles.

Also some of these secrets go in pairs with something else that is enough to get a successful auth. For example, AWS secret usually goes in pair with everything you need to connect.


Reminded me of a funny story. Maybe a decade ago, when moving to the cloud was all the rage, my then employer decided to check whether the cloud was any good. Long story short, he asked me to conduct penetration tests against the major providers. In one of the providers I pivoted through some network and hit a webpage that looked like some sort of control plane panel (but required authentication so...). I decided to google part of the HTML and... A stack overflow thread pops up with the code and parts of the backend code/logic. So much win.


> he asked me to conduct penetration tests against the major providers

That sounds madly illegal?


Most providers had a semi-automated process that granted you permission to conduct your pentest (assuming you'd share any findings regarding their infra with them). In reality though, most of the findings didn't come from poking around but from tapping the wire. I'd spin up VMs and tcpdump for hours, then look at the logs for odd packets, plaintext, etc., which makes it hard to detect such shenanigans.

Edit: We went through the process for everything, including having a provider ship us a back-up solution to pentest. My desk became everyone's favourite place in the building :P


Knocking on someone’s front door and noticing it’s unlocked is perfectly legal. It’s actually walking in that’s illegal.


And at least in England, trespassing is not even a criminal offense afaik, just a civil one - and the owner will have a hard time winning that case too, without very explicit signage.

Unless one helps himself to the house contents, or does other Bad Things, walking through unlocked dwellings will get you at most a slap on the wrist.


Outside of the cybersecurity analogy, as an American, that's . . . very disturbing.

Much like someone open carrying a gun is seen as potentially a few seconds away from committing a Very Bad Crime, so is someone walking around your house uninvited.


England has some weird (to me) property privacy laws. IIRC, you cannot be charged for simply walking through someone's property as a shortcut. There's nothing they can do about it, you just can't linger on the property. I mean, it seems fine, I just haven't seen anything like it before.


It's the system throwing a bone to the general populace in order to maintain an extremely unequal order. Aristocratic landowners mostly do what they want, and there has been no land reform for centuries, so a few concessions were thrown in to allow peasants to make a living somehow.


Well cutting across someone's yard != walking through their house. My friends and I growing up would sometimes cut through neighbors' backyards to go somewhere, and while we didn't have formal permission, no one cared because we knew each other.


I don't know the situation now, but in the UK you could break into an empty place, then change the locks, and from that point on they could not evict you without a long process involving going to court. There was (is?) a huge squatters community because of this.


From the story of the GP, and extending your analogy, this is more like if they walked into the house and found the safe and noted it was locked, so looked up the safe schematics online.

Not exactly legal.

But even stepping back, I suspect walking around and jiggling random peoples’ doorknobs to see if they’re unlocked is probably illegal.


It’s funny how often this works, there’s a ton of copypasta code in production out there.

I do some bug bounty hunting for fun, and just yesterday I Googled a weird snippet of frontend code from a major corporation, found the matching backend code in a blog post, and saw a bug in it. Alas, not a bug that could be used for anything interesting this time.


The pie chart has multiple segments with the same colors, how is one supposed to parse it?

I do not understand how one can write an entire article about a set of data and then not present it in a way that is comprehensible.

Edit: Just discovered that hovering over the segments of the chart will bring up a tooltip with the name of the segment if JS is enabled, but this is not obvious to readers and I still don't think this is a good way to present data.


A pie chart, while fun, is seldom a good way to present data at all.


Obligatory: "What do you mean I'm not supposed to use Pie Charts?!" https://www.geckoboard.com/blog/pie-charts/


Even if they didn't repeat, that's just way too many colors for me to be able to tell which labels are for which segments. Anything more than a handful of colors should have arrows or labels next to the segments or something like that.


Yeah, a bar chart would have probably been better in this case, then you would have been able to see the number of secrets at a glance by looking at the y axis.


I have often wondered how many passwords security cameras capture when people type them on their phones or laptops, as well as things like one-time codes, credit card numbers and so on.


One-time codes are very short-lived, and in many systems, as their name suggests, they are single use.

So, knowing 8490 worked for me is often entirely useless immediately, and if not it'll be useless within say 10 minutes.

You might think if you collect enough of them you'll be able to guess the next one. In principle that's true, but for all real systems you're fighting an actual cryptographic hash in there somewhere, so it's like you decided second pre-image attack on the hash (much harder than collision) wasn't difficult enough, you want hard mode.
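
For a sense of where the hash sits, here is a sketch of the standard HOTP/TOTP construction (RFC 4226 and RFC 6238). Whether a given service uses exactly this scheme is an assumption; SMS codes in particular are often just random numbers stored server-side:

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password as described in RFC 4226."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time window."""
    return hotp(secret, int(unix_time) // step, digits)
```

Each code is a truncated HMAC of a counter or time window, so predicting the next one from observed codes means recovering the shared secret from heavily truncated HMAC outputs.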


A nice example was in 2018 when Kanye unlocked his phone while being filmed on live TV.


What frame rate do security cameras run at these days? Last time I saw genuine recorded footage was 1999, and that looked like 1 fps.


That's about the speed an average typist types. /s


One finger per second?


Typing numbers or passwords, for people over 30, I bet is around half a second per key, and for people over 60 one finger per second sounds about right.


now feed that video en masse to GPT Vision


as long as you'll pay for it


Looks like the tone of my post above was misread.

I see photos being automatically OCRed by default now ... in Google Photos, in Microsoft SharePoint etc.

I am just imagining that in future a GPT Vision processing of all videos ever captured might become similarly prevalent (and cheap).

Which would open up a very easy attack vector / leakage of secrets ... like phone unlock pins and passwords.

I am in the same boat as the comment I replied to -- concerned about new risks that new tech might bring to the table.

My post was just "extrapolating" the risk to a larger scale -- not just one video clip being manually decoded, but all footage automatically spitting out secrets as simply as a Closed Caption / video transcript of today.


It could be nice if there was some clear general convention on a string format for secrets, e.g. `secret_<string>`, such that e.g. system copy paste facilities, email clients, chat clients etc. could provide an extra "Do you intend to share this secret?" step, ideally before even pasting it into a program, and especially before sending, to help prevent you from inadvertently exposing it.
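
A rough sketch of how a client could act on such a convention. The `secret_` prefix comes from the suggestion above; the function names and the `ask` callback are hypothetical, standing in for whatever confirmation dialog the clipboard or chat client would show:

```python
import re

# Hypothetical convention from the comment above: secrets start with "secret_".
SECRET_MARKER = re.compile(r"\bsecret_[A-Za-z0-9_\-]+")


def find_marked_secrets(text):
    """Return every substring that follows the secret_<string> convention."""
    return SECRET_MARKER.findall(text)


def confirm_before_sharing(text, ask):
    """Return True if the text may be sent; ask the user when secrets appear."""
    found = find_marked_secrets(text)
    if not found:
        return True
    return ask(f"Do you intend to share this secret? ({len(found)} found)")
```

The check is only as good as the convention's adoption: anything not carrying the prefix passes straight through.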


https://datatracker.ietf.org/doc/html/rfc8959

Although I don't think the actual proposed syntax is good.


Thanks, didn't know about that RFC! Will keep it in mind for such an occasion.


I've thought about even taking this further: Adding a domain to secret. e.g. secret:example.com:abc

Then example.com could host a /.well-known/secrets.json which would include information on how to automatically report and/or revoke a leaked secret.
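
As a sketch only: parsing that format and deriving the metadata URL could look like the following. The `secret:<domain>:<value>` layout and the `/.well-known/secrets.json` path are the hypothetical ones from this comment, not an existing standard, and the contents of that file are left undefined here:

```python
from typing import NamedTuple, Optional


class DomainSecret(NamedTuple):
    domain: str
    value: str


def parse_domain_secret(token: str) -> Optional[DomainSecret]:
    """Parse 'secret:<domain>:<value>'; return None if the shape is wrong."""
    parts = token.split(":", 2)
    if len(parts) != 3 or parts[0] != "secret" or not parts[1] or not parts[2]:
        return None
    return DomainSecret(domain=parts[1], value=parts[2])


def reporting_metadata_url(secret: DomainSecret) -> str:
    """Where a scanner would look up how to report or revoke the secret."""
    return f"https://{secret.domain}/.well-known/secrets.json"
```

A scanner that finds such a token would fetch the metadata URL and follow whatever reporting instructions the domain publishes, without needing a per-vendor integration.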


>"Do you intend to share this secret?"

Please let Clippy just die...


It would be a nice feature if StackOverflow would blank out (####) patterns that potentially match passwords or at least offer this if their system detects a potential password in your post.
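
Something along these lines, as a naive sketch. It only handles quoted `password = "..."` assignments; the pattern and function name are illustrative assumptions and would need much broader coverage in practice:

```python
import re

# Matches password = "..." or password = '...' (case-insensitive).
PASSWORD_ASSIGNMENT = re.compile(r"""(password\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def blank_out_passwords(post):
    """Replace the quoted value of password assignments with ####."""
    def redact(match):
        prefix, quote = match.group(1), match.group(2)
        return f"{prefix}{quote}####{quote}"
    return PASSWORD_ASSIGNMENT.sub(redact, post)
```

Redaction in the editor only helps before the post is published; once a secret has been public it still has to be revoked.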


Hmm. Fix the easy stuff, the low hanging fruit and you filter for the worse problems and get some false positives for free.

When it comes to leaking secrets, don't trust tools. It's hard, it's human, it just happens.

As for what StackOverflow should do: make it easy to fix leaks, which they do a good job at. I.e., users can edit or delete answers and comments after posting them. Even better if there's a means to create confidential back channels with the poster or an admin if you spot a potential leak.


You can’t “fix” a leak, you have to revoke the involved secrets. Editing / deleting the answer is irrelevant.


I agree. I asked for a feature that warns users before they post secrets: https://meta.stackexchange.com/questions/394710/feature-requ...


J, K, APL and other coders would hate this.


Perl would become just as hard to ask questions about.


Why? :)


Because this is a valid line of J:

    s1a=: 0j2 %~ ^@j. - ^@-@j.


hunter2


I hope you handle this antique meme with care, as it's become quite fragile over the decades.


What did you write? All I see is ****


He said *******


I put on my robe and wizard hat


> gitleaks : fatal error: runtime: out of memory

Should be fixed now: https://github.com/gitleaks/gitleaks/pull/1292. Thanks for highlighting this simple change I've been putting off :)


Thank you for fixing it. I've updated the blog. Now my simple Rust script is not that helpful :)


> I sent both developers an email...

This was a pretty nice thing to do. I see this on SO occasionally and edit the post to remove the secret. While the secret is already out there, it signals to the poster that they should revoke/regen the key and a bit of a reminder to help them avoid doing it again.


> For example, for stripe , you need the customer ID. For grafana, an instance url. For aws, a site url.

Uh, no? For stripe, you just need the key (customer ID is the ID of a resource in the account). For AWS, you need the key id and secret, but these are nearly always colocated if you're actually dealing with keys.


> ripgrep : freeze my system after few minutes.

What OS is this person running where this is a possibility? Outside of bad regexps with e.g. large capture groups, a simple grep over a large bunch of data shouldn't cause any issues and should use a steady amount of memory.


Yeah I don't get it either. The only thing I can think of is that ripgrep saturated their I/O bandwidth and that led to other effects.

Otherwise statements like "taking too much cpu" seem quite strange to me.

If the OP could provide more precise reproduction steps then I'd be happy to look into it.


FYI presumably the name of one of the developers gets leaked through the coffee link.


You can choose to not make the donation public, but perhaps they didn't know there was going to be a blog post. I noticed as well and sent a coffee to kind of disguise them a bit :D


Nice try, Mr. I-post-my-secrets-on-stackoverflow!


Hahaha, I believe in transparency so here's my mother's maiden name, the name of my first pet, and the street name of the house I grew up in.


Good job on letting the affected people know, at least the ones you could contact!


What an easy way to get a free coffee! :D



