Open Source authors [1] [2] (including myself) have complained about automatic security scans. They yield far too many false positives, increasing the burden of maintaining repositories. Especially troublesome are cases where the "vulnerability" (if it even is one) is in a devDependency that is never deployed to production.
In theory automatic vulnerability scans sound great, but having every repo ping you with not-actually-an-issue becomes a chore very quickly. So far the vast majority of "vulnerabilities" I've seen are noise or not applicable. If this code checker is actually good, unlike all of the previous ones, that's another matter and might actually be a game changer.
Prominent open source authors have often suggested ways that GitHub could help, but they seem to be ignored; e.g. letting maintainers add friction to opening random issues would benefit open source greatly. At some point many beginner devs migrated from StackOverflow to GitHub because their really bad questions were being closed there, and now they just overwhelm open source authors.
Microsoft has used products from Semmle (the now-GitHub and thus now-Microsoft division whose tech is in GitHub code scanning) for a few years, and I've personally used it on occasion.
From that limited experience, I'd say that false positives are less of a problem with Semmle's checkers than with other security-focused static analysis tools. This is partly because Semmle's checkers are much more customizable: Semmle developed a declarative query language called CodeQL, in which both the built-in and user-provided rules are written. Microsoft's security development lifecycle has a lot of mandates, and custom CodeQL rules capture them precisely enough to match their intent.
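For a flavor of what such rules look like, here's a hedged sketch of a tiny CodeQL query. It is illustrative only, not one of Microsoft's actual rules; it flags direct calls to `eval` in JavaScript code:

```
import javascript

// Illustrative only: report every direct call to eval.
from CallExpr call
where call.getCalleeName() = "eval"
select call, "Avoid eval: it runs arbitrary strings as code."
```

The same from/where/select shape scales up to far more precise rules (taint tracking, framework-aware data flow), which is where the customizability pays off.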
There's a genuine security fatigue issue (much like alert fatigue) that comes from false positives. Unfortunately that doesn't reduce the value of the scanning; the false positives are what need fixing.
At the very least, scans should be run and their results pruned on projects, so that we can at least have the conversation. It's like PCI (as an example, not an ideal): PCI isn't perfect, but at least it encourages a conversation about security. Today we're at the point where almost every single organization at least discusses security; but I remember when PCI first came out. I can't tell you how many times people used to ask why it was a problem to store passwords in plain text.
Is this the best step forward? Probably not. Is it a step forward? Absolutely.
That's only true given a certain level of care, which sadly is not always present. Say you have a codebase that wasn't built well. Let's say your code scanner finds, at the very least, the text "-----BEGIN R/DSA PRIVATE KEY-----*-----END R/DSA PRIVATE KEY-----". In my codebases that'll find nothing, but many people will have just found SSH keys committed in plaintext. The same is true for any 1k/2k (etc.) string. It highlights what could be a private key and is worth at least looking at.
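A minimal Python sketch of that kind of pattern check. The regex and function name are mine, and this is far looser than what real secret scanners use:

```python
import re

# Illustrative pattern for PEM-style private key headers.
# Real secret scanners use many more patterns plus entropy checks.
KEY_PATTERN = re.compile(
    r"-----BEGIN (?:RSA|DSA|EC|OPENSSH)? ?PRIVATE KEY-----"
)

def looks_like_committed_key(text):
    """Return True if the text appears to embed a private key."""
    return bool(KEY_PATTERN.search(text))
```

Even this crude check finds nothing in a well-kept repo, but flags genuinely dangerous commits in a careless one.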
Neither is proof of a vulnerability, but I will gladly validate both, 100% of the time.
Which is no issue if they're just there for easy mocking etc.
I agree with the parent's point: if there is value in such tests, it's so minuscule that it's dwarfed by the added effort of filtering out all the false positives, assuming the developers aren't total newbies. And if they are, code reviews by more experienced developers should be a given, making it redundant as well.
Sure, but the reported issues are usually not nearly that severe.
Most of the time what I see is
"A dependency of a dependency of a dependency of Webpack is vulnerable to a Regular expression Denial of Service attack" or prototype pollution or something like that.
Webpack generates code which might be loaded by users and additionally it's very common to run it on a CI server (which can sometimes have network access out to other machines at a firm).
In general, it's quite strange to me that vulnerabilities in `devDependencies` are considered less important than those in `dependencies`. These dependencies are generally for tools that are run within your company network, and contrary to what people insist, that seems quite risky to me.
What would be the attack vector in this case? Assuming that it's a vulnerability rather than an actually malicious package, how would an attacker exploit something running on your CI server? The only way I see is if they already have the ability to modify your source code, at which point it's of course already game over.
"It's already game over", a favorite phrase of many, but a sign you are giving up before the race has even started. Security and insecurity theater at once.
You don't install your dependencies' devDependencies, so by definition 100% of the devDependencies of the libraries you use are not a concern! I'm talking as a library author here (I thought that was clear?): when a random vulnerability scanner flags one of my devDependencies, that in principle won't affect the people using my library (except in some very, very rare extreme cases).
Also, many of those "vulnerabilities" are not really applicable, because the application can never trigger them if it never reaches the vulnerable code path (for example, when a vulnerable regex is never passed untrusted input).
I imagine those cases could be detected using static analysis but the current vulnerability reporting tools that do not even look at actual code are completely useless in that regard.
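To make the "never passed untrusted input" point concrete, here's a minimal Python sketch (names are mine) of the classic ReDoS shape: a regex with nested quantifiers whose match time explodes on adversarial near-miss input, but which is perfectly harmless if it only ever sees trusted strings:

```python
import re
import time

# (a+)+ nested quantifiers cause catastrophic backtracking on
# inputs that almost match: "aaa...ab" forces the engine to try
# exponentially many ways of splitting the run of a's.
EVIL = re.compile(r"^(a+)+$")

def match_time(n):
    """Seconds spent failing to match n a's followed by a b."""
    s = "a" * n + "b"
    t0 = time.perf_counter()
    EVIL.match(s)  # always fails; the cost is in the backtracking
    return time.perf_counter() - t0
```

Each extra `a` roughly doubles the work, so `match_time(18)` is dramatically slower than `match_time(6)`. A scanner that only looks at dependency manifests can't tell whether this regex is ever reachable with untrusted input.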
How many of your customers, who can detect these same vulnerabilities, care about whether or not it's triggered? What about the fact that these are all latent bugs waiting to be hit with an approach like this?
It needlessly accumulates risk when you prioritize explaining why things don't need to be fixed rather than just lowering the bar to fixing them.
Github's notification system is incredibly spammy if you have a lot of repos, and there's no obvious way to manage it. There's also a huge need for an "unsubscribe all" in the notification inbox.
There's also no severity indicator: a minor issue not encountered in normal use is just as noisy as an extremely important issue that affects every user.
Other tools like Jira and Gerrit are far better at this.
The GitHub notifications story is really quite poor for developers; it's extremely difficult to get an alert routed only to the person who triggered it.
That's actually a very interesting point regarding the switch from StackOverflow to GitHub. I've noticed the trend that what I could normally find on StackOverflow, I now often find in GitHub issues.
Seeing many people surprised or saying "finally someone said it", I thought this was very common knowledge? I might have been in some specific communities at the critical point 2-3 years ago, where bootcamps and other educators would advise new devs not to go to StackOverflow but instead to GitHub. It didn't occur to me back then that the consequences would be so bad.
> At some point many beginner devs migrated from StackOverflow to GitHub because their really bad questions were being closed there, and now they just overwhelm open source authors
I'm glad it's not just me seeing this - My repos aren't even that popular and some of the issues just seem to be "help me build my project..."
I don't mind help in finding these problems. What I LOATHE, though, is when people trust the tool more than me and (for example) prevent me from pushing something that disagrees with the tool. So somewhere, I imagine, some manager will force their devs to make this tool happy, and that's wrong.
It is tricky. I would say the perfect tool still needs to be developed.
And I agree with you. Tools need to get better at detecting dependencies. However, it can be helpful to know at least that you rely on vulnerable code, even if it does not get deployed to prod.
Once automated, security scans are like unit tests: just because a unit test is green does not mean your app is working, and vice versa.
So security scans also map to test levels: unit tests, integration tests, and end-to-end tests. Different scans, different results, and it still takes us humans to put them in perspective.
> Open Source authors (including myself) have complained of automatic security scans. They yield way too many false positives, increasing the burden of maintaining repositories.
Unfortunately all tools have either false positives or false negatives, and in practice often both. Tools can (and should) take steps to minimize them or their impact. Nothing makes you use any specific tools; if you don't like what a tool does, don't use it.
> Specially troublesome are when e.g. the "vulnerability" (if it's even one) is in a devDependency that is not deployed to production.
This particular GitHub tool is for analyzing source code and would not normally analyze your dependencies. So that doesn't seem relevant in this case.
Of course, someone could mindlessly use this tool (or look at its results) and complain. That's easy to handle: ask for funding to fix the problem, or at least a pull request that fixes it.
We have this problem with our non-open-source project, but found that after the initial review (in which most findings were false positives) we could permanently suppress them with a comment and fail the build only for net-new issues, without a huge impact on developers.
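A rough Python sketch of that "suppress old findings, fail on net-new" policy (all names and the finding format are hypothetical; real tools such as CodeQL and Checkmarx have their own suppression mechanisms):

```python
def net_new_findings(current, baseline):
    """Return findings not present in the reviewed baseline."""
    known = {(f["rule"], f["file"]) for f in baseline}
    return [f for f in current if (f["rule"], f["file"]) not in known]

# Baseline: findings reviewed once and accepted/suppressed.
baseline = [{"rule": "sql-injection", "file": "legacy/db.py"}]

# Today's scan: one known finding, one genuinely new one.
current = [
    {"rule": "sql-injection", "file": "legacy/db.py"},      # suppressed
    {"rule": "hardcoded-secret", "file": "app/config.py"},  # net-new
]
```

The build then fails only if `net_new_findings(current, baseline)` is non-empty, so old noise never blocks developers while new issues still do.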
> A lede is the introductory section in journalism and thus to bury the lede refers to hiding the most important and relevant pieces of a story within other distracting information. The spelling of lede is allegedly so as to not confuse it with lead (/led/) which referred to the strip of metal that would separate lines of type. Both spellings, however, can be found in instances of the phrase.
As far as I can tell, "lede" is a relatively unpopular neologism being unnecessarily pushed, and the phrase spelled "bury the lead" predates it by almost 100 years?
Lede is just lead spelled incorrectly with the same meaning.
Why? How does vulnerability detection for PHP hurt you?
They stated in the comments here that they are actively working on it, and I would personally appreciate it in addition to the number of other static analysis tools I use.
Actually, I'm very surprised that PHP isn't in the list, seeing how WordPress accounts for 75%+ of the sites on the internet, plus the slew of plugins in its ecosystem.
Recently I learned in a conversation [0] (about SaaS in general, not GitHub in particular) that you passing on the product is actually the desired outcome; it's by design.
So, I guess, "well done"? (It hurts a little, though; I'm also in the camp of wanting to see the pricing beforehand.)
I think the real question is if their pricing design is optimal. Would they make more money with clear pricing? I think so.
One of my ancestors had a company selling commodities. He wouldn’t answer the phone until the customer had called three times and left messages. He said this was a filter to identify the customers who really needed his product.
The logic is sound and on the surface clever. But would he have made more money servicing all customers? Or perhaps marketing?
I'm sure it's tough to tell, and the side with less risk is the one that often costs fewer resources.
My dad runs a shop restoring classic cars, mostly as a hobby. The number of minutes you can spend on the phone discussing some potential client's great-grandfather's old clunker and their dreams / aspirations of getting it fixed is nearly limitless, but the number of clients willing to actually pay and wait for it is a tiny fraction.
Similarly I looked into having a hair transplant done and found most places near me actually charge for consultations! It probably makes sense, though. The market of "guys that want to be less bald" is gigantic, but the ones serious enough to plop down tens of thousands of dollars for it is much smaller. Narrowing that group down made for a consultation that was much more personal and focused than it'd have to be if they offered it for free knowing 95%+ will never be seen again.
I agree with both of you. The level of business that warrants a "write us for prices" message must be well above the small-to-medium customer... but it still itches when you're interested and realize that their product is not directed at you at all.
We could argue that they would get more money, but at the end of the day they probably decided that a single bigger fish is definitely worth losing business from a hundred smaller ones. I guess it's one of those "nice problems to have", if you grow so much that you have it.
Well considering how much money we recently spent deploying CheckMarx and integrating it into all our pipelines (hundreds of man hours of engineers on 6 figure salaries + a 6 figure licensing fee per year) that is quite expected.
If you're in an organisation that's already using GitHub and this scanning capability is as good as that of CheckMarx, Snyk, etc., then it would be a no-brainer to upgrade to an Enterprise GitHub plan (if you're not already on Enterprise).
We need crowdsourced information for this problem. Imagine a volunteer asks them the price and shares it with the world: Glassdoor, but for sales prices. Package it as a browser extension. Everyone becomes happy.
What’s frustrating is that it’s not part of the $21/month. I have that and have been trying to get pricing info for a few weeks. I’ve gotten mixed messages that it costs nothing extra and just uses Action minutes on their price schedule or that it costs some unknown price that is extra.
My impression is that they haven’t picked pricing yet.
It frustrates me when the price answer is “contact sales and let’s talk about it.”
These marginal services really depend on the price, I think.
They are still trying to determine pricing and figure out how to position themselves against competitors (who are about to have their lunch eaten, if GH is smart).
I wonder if the problem is that the marginal cost is near zero because the rules are mostly open source and there’s no special infrastructure.
So the problem is that they are trying to figure out the “value” instead of giving it out for free.
What’s interesting about GitLab’s SAST is that it’s really mostly just curating open source tools and using their existing CI stack. So it’s possible to just look at their code and setup, but it’s a hassle.
I’m hoping GH just gives it away for free. Or does some sort of sonarqube method where you layer on indemnity or something.
The problem is that per user pricing sucks because most users don’t care. The PM or maybe one or two are interested, but paying for 400 users with licenses to please 5 audit people doesn’t make sense when it’s really just being run by a single Jenkins robot (or equivalent).
Hmm. It says "included with GitHub Enterprise" on the product page, but then the pricing page says $21/user/mo. GitHub has done a good job of making something complicated simple... but this appears to be the opposite: making the simple complicated.
>In the not-so-distant future: Code snippet scanning for copyright infringement
Will never happen. Can you imagine what would occur if GitHub started harassing private repo owners for including GPL-licensed code? Or automatically making their repos public? All their customers would bolt immediately. Copyright infringement is GitHub's bread and butter.
Stack Overflow attribution is equally unlikely. The same group that says, against Oracle's claim, that Java's API is not original and unworthy of copyright protection cannot then turn around and claim that 30 lines of code from SO are original and deserve recognition.
I would bet most stack overflow answers don't qualify as copyrightable, at least in the US. Though I think automatically finding copied stuff would be very useful. Any time I copy something small I try to include a link to where I got it from, if someone has to troubleshoot my code it may help them to see where it came from.
> Can you imagine what would occur if Github started harrassing private repo owners for including GPL licensed code?
Why would Github do that? It's perfectly fine to not distribute code you received under the GPL. The license just says that if you do distribute the code in binary, you must provide recipients with the source as well.
Many companies pay good money for scanning services like that, so Github adding it to their Enterprise offerings would make sense. (No clue how you get to "harassing" and "automatically making them public" from that suggestion...)
It is. The authors own the content but must license it to SO under CC-BY-SA (the version having changed over time: https://stackoverflow.com/help/licensing ) and SO then publishes it to readers under that same license
Hopefully GH will apply this to Github Actions, which seem like an exploit vector just sitting there.
GH Actions are:
- complex enough to develop that third-party solutions are attractive, especially for simple-seeming tasks (I have just been through this)
- but yay there's a "marketplace" for actions!
- the code in the marketplace is the wild-west, but you'll find something that seems like it'll do what you want
- you'd better hope the code that was committed to be executed actually corresponds to the visible source (if, e.g., an action is based on the official TypeScript example, a [com/trans]piled JS blob is what actually gets executed)
- it has access to your code, maybe including write access
As far as I can see they also have access to your secrets. I really haven't gotten myself to use any non-GitHub-defined Actions yet, because I haven't been able to verify if that's actually safe.
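One common mitigation, if you do want a third-party action, is to pin it to a full commit SHA rather than a mutable tag, so the code that runs with access to your secrets can't be silently replaced under you. A hedged sketch of a workflow step (the org, action name, and SHA below are illustrative):

```yaml
steps:
  # First-party actions are lower risk; a tag is usually acceptable.
  - uses: actions/checkout@v2
  # Third-party: pin to an exact commit SHA you have reviewed.
  - uses: some-org/third-party-action@5a4ac9002d0be2fb38bd78e4b4dbde5606d7042f
    with:
      token: ${{ secrets.GITHUB_TOKEN }}
```

Pinning doesn't make the action's code trustworthy, but it does mean the code you reviewed is the code that runs.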
We're adding Ruby support to CodeQL (the scanning engine used in code scanning by default). It's our top requested language, and one we use extensively internally. Adding each new language to CodeQL takes about 6-9 months and needs a team to maintain it in perpetuity, which is why we don't have it yet, but we're starting that work now.
The other languages we hear the most demand for CodeQL support on are PHP, Kotlin and Swift. We'll get to all of those - it will just take a little time.
In the meantime, all of the code scanning experiences are extensible, so you can use other scanning engines with it, like Brakeman for Ruby.
CodeQL is based on an existing product from a company called Semmle which GitHub acquired in late 2019 [1]
They have been part of GitHub for barely a year, so it's not too surprising, especially given they are continuing to support the product for the enterprise customers they had previously, not just GitHub.
>Ruby is no longer a "Default supported" language in many projects.
Well, it has always been like that. Amazon and Google have always had a thing about Ruby, and it's a minority market, so I'm not surprised; it doesn't make sense from a business perspective. But GitHub is a heavy Ruby user, so I would have thought Ruby would be a first-class citizen. I wonder if it has something to do with the language's complexity.
We (GitHub) absolutely plan to expand the list of languages CodeQL supports, and Ruby is a language we'd love to add (we're heavy users of it internally). In the meantime, because code scanning is extensible you can plug in third party analysis engines to scan the languages that CodeQL doesn't support.
Code scanning itself isn't language specific. There are open source and commercial tools available for many languages. CodeQL is GitHub's first-party tool, and it supports C, C++, C#, Go, Java, JavaScript, TypeScript, and Python.
> This open-source repository contains the extractor, CodeQL libraries, and queries that power Go support in LGTM and the other CodeQL products that GitHub makes available to its customers worldwide.
That pretty well describes pretty much every static analysis / automated security scanning tool I've ever used. They generally work OK if you use a common framework in a standard configuration with a standard project layout (once tuned properly to filter out false positives). Once you get to that point you can actually pay attention to the alerts and may even catch some significant issues.
Isn't this already done by Dependabot[1]? I've been using it for some time with my JS/TS repos to keep my dependencies up-to-date. It's not the greatest as it sends alerts about vulnerabilities in devDependencies. But with automatic merge checks I only get an email that the issue has been fixed so it isn't the worst thing in the world.
- Dependabot looks for vulnerabilities in your dependencies, and creates pull requests to update you to fixed versions.
- Code scanning looks for vulnerabilities in your own code. So, for example, if you have written code that takes user input and creates a database instruction from it without escaping it, it will flag that you are introducing an SQL injection vulnerability.
(As an aside, we could definitely improve Dependabot to treat devDependencies differently. You do need to care about vulnerabilities in your devDependencies in _some_ cases (code exfiltration is the obvious one) but not in many; we should get smarter about distinguishing between those cases.)
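To illustrate the kind of in-code vulnerability described above (as opposed to a dependency alert), here's a minimal Python/sqlite3 sketch; the function names and schema are made up for the example:

```python
import sqlite3

def find_user_unsafe(conn, name):
    # FLAGGED: user input formatted directly into the SQL string,
    # the pattern a code scanner reports as SQL injection.
    return conn.execute(
        "SELECT id FROM users WHERE name = '%s'" % name
    ).fetchall()

def find_user_safe(conn, name):
    # OK: input passed as a bound parameter, never parsed as SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
```

With a payload like `' OR '1'='1`, the unsafe version returns every row, while the parameterized version correctly matches nothing.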
Hey, totally off topic, sorry. Can you get someone to turn off pull requests for the unofficial mirrors that you created for some open source projects? Users are being misled into thinking opening PRs there is productive, but they're not monitored by our project and we don't own the repo anyway. https://github.com/wine-mirror/wine/pulls
I've tried contacting your support, but they just tell me that they don't own the repo, which is obviously false[1]. I don't know who to reach out to.
You may take a page from FFmpeg: https://github.com/FFmpeg/FFmpeg/pulls (although currently open PRs probably need to be closed somehow for that to be effective).
Thanks, that makes sense and I probably should have looked into the article properly.
I'm not sure how easy it will be to filter out devDependencies. Maybe scanning for config files (webpack, babel, ...) would be a good alternative to manually tagging each npm package.
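A tiny Python sketch of the simpler route: read `package.json` directly and split runtime from dev dependencies (function names are mine; a real tool would also need the lockfile to walk transitive dependencies):

```python
import json

def runtime_deps(package_json_text):
    """Names listed under "dependencies" (shipped to users)."""
    pkg = json.loads(package_json_text)
    return sorted(pkg.get("dependencies", {}))

def dev_deps(package_json_text):
    """Names listed under "devDependencies" (build/test only)."""
    pkg = json.loads(package_json_text)
    return sorted(pkg.get("devDependencies", {}))

example = (
    '{"dependencies": {"react": "^17.0.0"},'
    ' "devDependencies": {"webpack": "^5.0.0"}}'
)
```

An alert tool could then downgrade the severity of anything that only appears in the dev list.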
Maybe this is just an issue with my phone, but I found the embedded demo video frustrating to view because it’s 1) low quality / hard to read and 2) in a loop that can’t be paused. I watched it 3 times trying to read through the code each time then gave up in frustration.
So I was researching Snyk: they raised $452m, which is an absurd amount for a small startup, and they provide "security analysis tools used to identify open-source vulnerabilities."
I hope this GitHub feature brings their valuation to the ground and their investors back to reality.
I’ll look at this, today. I have bunch of OSS repos, and, even though I might feel smug about my quality and security, I don’t think it hurts at all to shine a klieg lamp on it.
I’m fairly aghast at the state of things, and every bit helps.
I do love that cartoon (it's the one where the entire internet is a stack of Jenga blocks, all clearly depending on one block labelled "OSS project some random person in Nebraska has maintained since 2003").
This sort of project ought to level that playing field. It ought to be clearer to us all who depends on what, and that clarity ought to give true leverage to the valued authors, or possibly point out the OpenSSL-like risks we are taking.
(In fact regulators will one day wake up to just this risk, and then this will be a really valuable area to be in)
I found this GitHub feature very useful. I love it. In my 30,000-something-line repository it gave me eight code scan alerts, of which seven were useful and came with specific coded-up example fixes showing how I could address them, and I followed all the advice. One was not relevant because it was in a shell script that I don't use anymore, but there's no way the code scan could know that I don't use that script.
I don't have experience of the security fatigue that other people seem to be talking about. Maybe I just write better code, or use fewer, and less problematic, dependencies? ¯\_(ツ)_/¯
Anyway, I think this is a really cool feature and I'd love to see more of these sorts of value-added, free features on top of public repos. Is there a marketplace or some place where you can create your own?
[1] https://twitter.com/sindresorhus/status/1123986529498664961
[2] https://twitter.com/FPresencia/status/1311551520689713152