Agreed! 1/1000 is pretty frequent considering how many pushes to GitHub happen every single day. Folks probably get a false sense of security thinking no one is looking at their personal repos (spoiler alert... they are!)
There is still a lot of noise with basic tools like this (I've also used trufflehog at scale).
Handling secret scanning properly requires calling live APIs to test whether keys are "real". You also need a way to file tickets when you do have findings: if you rotate a credential that production depends on, that's now an outage, so you need to coordinate multiple teams.
It's a lot of work and free tools only solve one part of this. I can't speak to any of the vendors in this space but I can attest that it's a harder problem than it seems!
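To illustrate the "test if keys are real" step, here is a minimal sketch in Python. Everything in it is hypothetical (the `Finding` type, the `triage` function, the rule names); it is not how trufflehog or any vendor implements verification. A real verifier would make a harmless, read-only authenticated request to the provider, which is stubbed out here as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Finding:
    rule: str              # hypothetical rule name, e.g. "example-provider"
    secret: str            # the candidate string the scanner flagged
    verified: bool = False


def triage(
    findings: Iterable[Finding],
    verifiers: Dict[str, Callable[[str], bool]],
) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (live, unverified).

    `verifiers` maps a rule name to a callable that returns True when the
    provider accepts the credential. Findings without a verifier, or whose
    verifier rejects them, land in the unverified bucket.
    """
    live, unverified = [], []
    for finding in findings:
        check = verifiers.get(finding.rule)
        if check is not None and check(finding.secret):
            finding.verified = True
            live.append(finding)
        else:
            unverified.append(finding)
    return live, unverified
```

Only the `live` bucket would need a ticket and a coordinated rotation; the unverified bucket is where most of the noise ends up.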
Those are good points. Still, it’s fairly manageable, after certain adjustments. Also, we’re using the new (Go-based) version of TH that’s both much more performant and validates secrets against endpoints. I suspect their SaaS offering is a bit more polished and turn-key, but even the open-source one is quite decent. It doesn’t swamp us with FPs, at least.
I recently tried my hand at commercializing my open source project, gitleaks (http://gitleaks.io). I'm keeping the core gitleaks project MIT but changed the gitleaks-action on GitHub to a commercial license. Revenue from the commercial license and maintenance agreements has netted me much more than the donations I've received over the past couple of years. I encourage any open source maintainer to try to find a business model (plugin, dual license, enterprise support, etc.) for their project.
Maintainer of Material for MkDocs here. Correct! The Sponsorware strategy is an alternative to dual-licensing which works fairly well for our project. If you have a successful Open Source project that is a painkiller and has product-market fit, this strategy might also work for your project.
More and more providers have been adding unique prefixes to their tokens and access keys, which makes detection much easier. For example, GitLab prefixes its personal access tokens (PATs) with `glpat-`.
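A rough sketch of why a unique prefix helps: the detector can be a single anchored pattern instead of a guess based on context. The 20-character body and its character set below are assumptions made for illustration, not a statement of GitLab's token format, and `find_gitlab_pats` is a hypothetical helper name.

```python
import re

# Assumption: "glpat-" followed by exactly 20 characters from [0-9A-Za-z_-].
# Check the provider's documentation before relying on this length.
GITLAB_PAT = re.compile(r"glpat-[0-9A-Za-z_\-]{20}")


def find_gitlab_pats(text: str) -> list[str]:
    """Return every substring of `text` that looks like a prefixed GitLab PAT."""
    return GITLAB_PAT.findall(text)
```

Because the prefix is so specific, a rule like this needs no keyword or entropy heuristics to keep false positives down.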
A project I maintain, Gitleaks, can easily detect "unique" secrets and does a pretty good job at detecting "generic" secrets too. In this case, the generic gitleaks rule would have caught the secrets [1]. You can see the full rule definition here [2] and how the rule is constructed here [3].
Assuming this unverified version of the story is true, the danger of accidentally leaking credentials in code is enormous, and it's one of the reasons I continue to maintain and develop gitleaks. Those credentials [1] would have been caught by gitleaks' generic rule [2].
Fantastic tool. We all know that _we_ wouldn't leak keys, but we have all been the person to 'rm -rf /' or 'delete * from prod where 1=1;', so it's just a matter of time.
Is there a plugin that streamers could use to blur suspected keys on stream? Do you think that would be interesting to work on? (I'm not a streamer but it sounds fun)
When I was looking into the streaming side of things I set up an overlay image which could be toggled with a hotkey to hide my screen (it also hid my desktop scene in case the image didn't load or whatever)
My main precaution though was separating dev/prod and never looking at prod stuff online. Worst case someone could spin up some guff in my dev/test account until I can cycle the credentials
In my case the separation also included a different system user on my computer for stream work. Possibly overkill but why risk it when the costs are so low?
I can't see myself trusting a key blurring app if I'm honest. I'd rather fix the issue earlier in the process than rely on something that would probably break on edge cases (word wrap enabled? Here's the key but it's in two parts, that sort of thing)
I think it would be a good tool to have, I had to contact a conference organizer once who switched tabs while sharing her screen in a recording and revealed a note in Google Keep that read "LastPass master password" xD
It doesn't help that so many tools are like "give me your secret key in plain text in the config file" without at least offering a link to a page on their GitHub explaining how you could secure your keys and still use the software
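For what it's worth, the smallest step away from a plain-text key in a config file is reading it from the process environment and failing loudly when it is missing. This is only a sketch: `load_secret` and the variable name in the usage note are made up for illustration, and environment variables are a baseline, not a full secrets-management answer.

```python
import os
from typing import Mapping


def load_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Fetch a secret from the environment instead of a checked-in config file.

    Raises RuntimeError when the variable is unset or empty, so a missing
    secret is caught at startup rather than at the first API call.
    """
    value = env.get(name)
    if not value:
        raise RuntimeError(
            f"missing required secret: set the {name} environment variable"
        )
    return value
```

Usage would look like `api_key = load_secret("EXAMPLE_API_KEY")`, where `EXAMPLE_API_KEY` is a hypothetical name; the config file then holds only non-sensitive settings.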
Vault is not a drop-in-and-go system: setting up a Vault instance is an ordeal in and of itself, and Vault on HashiCorp Cloud is very expensive. The problem with the other options is that you still have to get the secrets into environment variables, or out of GitHub/GitLab secrets and into your application. With most services like AWS Secrets Manager, Vault, etc., managing secrets will cost you more than hosting the app on a small DO droplet, for example.
Self-hosted Vault in a minimal Kubernetes cluster on GCP costs us roughly $35 a month. Maintenance effort is negligible if you're not scaling. Vault has a learning curve, but I think it's totally worth it, given its secret management and API-first features that integrate with many other DevOps tools.
If anyone’s looking for something more secure than vanilla env vars but simpler than Vault, you could check out EnvKey[1]. Disclaimer: I’m the founder.
It’s end-to-end encrypted, cloud or self-hosted, and very quick to integrate.
How were the words selected for the regex? It's interesting that "pass" is not there and breaks detection in your first link, but I assume they were chosen based on the statistics?
`pass` by itself might introduce false positives. `passwd` and `password` are common and more likely to be in the ROI of a secret. That said, I'm not opposed to `pass` by itself. I'll have to think about this one...
> but I assume they were chosen based on the statistics?
Nope, not statistics. Identifiers and keywords are chosen based on what I see out in the wild as a software engineer.
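To make the keyword trade-off concrete, here is a deliberately simplified sketch of a keyword-plus-entropy detector. It is not gitleaks' actual generic rule: the keyword list, the 8-character minimum, and the entropy threshold are all illustrative assumptions. It shows how an identifier containing only `pass` slips through when the list has just `passwd` and `password`.

```python
import math
import re
from collections import Counter

# Illustrative keyword list; note that bare "pass" is intentionally absent.
KEYWORDS = ("passwd", "password", "secret", "token", "api_key")

# identifier containing a keyword, then ":" or "=", then a value of 8+ chars
GENERIC = re.compile(
    r"(?i)[\w.-]*(?:" + "|".join(KEYWORDS) + r")[\w.-]*\s*[:=]\s*['\"]?([^\s'\"]{8,})"
)


def shannon_entropy(s: str) -> float:
    """Bits per character; low values suggest placeholders like 'aaaaaaaa'."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def find_generic_secrets(line: str, min_entropy: float = 3.0) -> list[str]:
    """Return captured values whose entropy meets the (assumed) threshold."""
    return [
        m.group(1)
        for m in GENERIC.finditer(line)
        if shannon_entropy(m.group(1)) >= min_entropy
    ]
```

Adding `pass` to the list would catch identifiers like `db_pass`, at the cost of also matching names such as `bypass` or `passthrough`, which is the false-positive risk described above.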
Should be fixed now: https://github.com/gitleaks/gitleaks/pull/1292. Thanks for highlighting this simple change I've been putting off :)