Welcoming Semmle to GitHub

eatonphil · on Sept 18, 2019

The linked blog post [0] and the new security marketing page [1] both have a little more detail on what this actually means.

Basically, Semmle offers a static analysis tool that operates on your source code as a graph (from what I understand) and points out bugs and security holes in your code. Github is now offering that for free on repos at all tiers.

[0] https://github.blog/2019-09-18-securing-software-together/

[1] https://github.com/features/security

DannyBee · on Sept 18, 2019

Semmle is basically datalog over source code.
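A minimal sketch of what "datalog over source code" means, assuming the code has already been extracted into a hypothetical `calls(caller, callee)` relation. It evaluates two recursive Datalog rules by naive fixpoint iteration in plain Python. It illustrates the idea only; it is not Semmle's engine or the QL language.

```python
# Hypothetical facts extracted from a toy codebase: calls(caller, callee).
CALLS = {
    ("main", "parse_args"),
    ("main", "handle_request"),
    ("handle_request", "read_input"),
    ("handle_request", "run_query"),
    ("run_query", "exec_sql"),
}


def transitive_calls(calls):
    """Naive bottom-up evaluation of two Datalog rules:

        reaches(X, Y) :- calls(X, Y).
        reaches(X, Z) :- reaches(X, Y), calls(Y, Z).

    Keeps applying the rules until no new facts appear (a fixpoint).
    """
    reaches = set(calls)
    while True:
        new = {(x, z) for (x, y) in reaches for (y2, z) in calls if y == y2}
        if new <= reaches:
            return reaches
        reaches |= new


def callers_of(target, calls):
    """Query: ?- reaches(X, target)."""
    return sorted(x for (x, y) in transitive_calls(calls) if y == target)


if __name__ == "__main__":
    print(callers_of("exec_sql", CALLS))
```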

For what it works for, it works nicely. But it is not a panacea.

Security vulnerability finding is almost certainly the wrong target for Semmle - I am unsure why they are trying to push that angle. There are much better stories in things like refactoring and understanding. (I say this having overseen a number of deployments for various reasons, some successful, some not)

muricula · on Sept 18, 2019

I've seen coworkers run semmle queries across the entire Windows OS codebase and find hundreds of issues that resulted, or could have resulted, in security vulnerabilities. They've also leveraged it for variant analysis. If I'm not mistaken, the security teams are the largest internal users of Semmle at Microsoft.

You're right though, it's not a panacea, and it could probably be great for other uses too.

wglb · on Sept 18, 2019

Any idea what the false positive rate is?

tru3_power · on Sept 19, 2019

If you use the out of the box rules for any of these tools the false positive rate will usually be pretty high. The trick is to write custom rules that are more tailored to your code.

wglb · on Sept 19, 2019

So how much is involved in writing the rules, and at the end of it, what is the net false positive?

tru3_power · on Sept 19, 2019

It took me a few months to get decent results with a low false positive rate. We haven’t had the tooling in place long enough to give hard stats, but our aim is a false positive rate of less than 25%. Another great thing these tools provide (if the results are valid) is that they let more junior team members, and developers less familiar with security issues, understand the vulnerabilities found, since they display a nice call-flow graph/diagram that shows source to sink.
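The source-to-sink display is, at its core, a path through a data-flow graph. A minimal sketch, assuming the graph has already been built: the node names in `FLOWS` are hypothetical, and the `sanitizers` parameter stands in for the kind of project-specific knowledge a custom rule might add. It returns one shortest path found by breadth-first search.

```python
from collections import deque

# Hypothetical data-flow edges: a value flows from each key to every listed node.
FLOWS = {
    "request.getParameter": ["userId"],
    "userId": ["buildQuery", "logMessage"],
    "buildQuery": ["Statement.execute"],
    "logMessage": ["logger.info"],
}


def source_to_sink_path(flows, source, sink, sanitizers=frozenset()):
    """Return one shortest path from source to sink as a list of nodes,
    or None if there is none. Nodes in `sanitizers` stop propagation."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in flows.get(node, []):
            if nxt not in parent and nxt not in sanitizers:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    path = source_to_sink_path(FLOWS, "request.getParameter", "Statement.execute")
    print(" -> ".join(path))
```

Passing `sanitizers={"buildQuery"}` makes the same query return `None`, which is how a tailored rule can suppress a false positive.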

sneak · on Sept 18, 2019

Nothing is a panacea. Things that help move the needle without requiring tons of time or effort are useful and valuable. I'm really glad to see more efforts in this area.

DannyBee · on Sept 18, 2019

While it's true that there is no panacea, Semmle will not move the needle on vulnerability finding. This I have extensive data on.

(I mean this in terms of capability, not sudden popularity)

It would move the needle on a bunch of other things. It is a good tool for sure (and I'm very happy for them); I just think they will disappoint people by pressing this particular narrative, and wouldn't with a different one.

intern4tional · on Sept 18, 2019

I'd be curious as to why you think that. Are you able to provide more detail on that claim? I have extensive experience with Semmle and my experience differs drastically from yours.

With certain languages and a strong and diverse ruleset, Semmle has its strengths. In particular, with native code (C, C++) and decent rules, I have seen Semmle be very successful at finding certain classes of bugs.

lvh · on Sept 18, 2019

The Datalog part is interesting! Do they have a bunch of rules to make graph queries work nicely, like Datomic pull syntax or maybe some pattern matching syntactic sugar? Is the underlying thing still an EAVT store? Is any of that information publicly available?

dantiberian · on Sept 19, 2019

It looks like they have reasonable docs on their query language, in particular https://help.semmle.com/QL/learn-ql/about-ql.html#properties... has some info on the QL language.

https://help.semmle.com/lgtm-enterprise/user/help/generate-d... says "LGTM generates a database for each commit stored in a repository. Each database is a relational database that represents the structure of the codebase for a specific revision, or snapshot, of the code.", though a triple store could qualify as relational here. I couldn't find much more than that about the implementation details though.

lvh · on Sept 19, 2019

Right. I found those docs but they didn’t look like datalog queries at all. Of course that doesn’t mean they don’t compile down to datalog :)

lawnchair_larry · on Sept 18, 2019

I would love to hear more about this data, as everyone I know who has used it for vulnerability finding has very good things to say. Semmle has also demonstrated its capabilities with some high-profile examples.

tptacek · on Sept 18, 2019

Who have you talked to about it? Outside Mozilla and Microsoft?

lawnchair_larry · on Sept 19, 2019

I’d rather not name drop individuals or companies, partly because I don’t know that these entities want their business relationships public, but neither of the companies you named. I’ve also used it myself (you can too, at lgtm.com). I’m aware that MS is a customer, but I don’t think I’ve talked to anyone there about their experiences with it. As far as static analysis goes, which is inherently limited, it’s far better than anything else I’ve tried (which is most of them).

tptacek · on Sept 19, 2019

Fair! Thanks!

mphi · on Sept 19, 2019

Could you share the details of your experience? My experience has been quite the opposite.

I have been using Semmle daily to automate much of the vulnerability discovery process and I am extremely satisfied.

We run it over millions of lines of Java code and have not yet run into scale or perf problems.

Developing custom queries and defining security invariants in a logic language is, quite honestly, a joy.

UncleMeat · on Sept 18, 2019

Semmle does not scale, both in terms of their index design and their overall system design. This makes it poorly suited for truly global program properties and much better suited for things like refactoring.

muricula · on Sept 18, 2019

It's being run frequently across the entire Windows OS repo. I have heard there is more work to be done to make it scale better, but it can scale.

intern4tional · on Sept 18, 2019

Am Microsoft.

Mountains were moved to make it scale, but that has been achieved. Semmle can scale with work - it just takes a lot of effort and code.

DannyBee · on Sept 19, 2019

Yeah, I think that is at least one of the issues. For us, for what it provides, it is not worth the time/effort vs just building our own tools or other options.

lawnchair_larry · on Sept 18, 2019

What had to be done to make it scale?

mynegation · on Sept 19, 2019

I was not part of this effort but I did scale another static analysis tool for industrial size codebases and have a patent on it.

To simplify a bit, you can think of most static analysis algorithms in terms of graph problems where nodes are statements and functions, and edges are control flow and calls. On large codebases the number of edges, nodes, and computed data is just too big to keep in memory. The trick is to break the graph intelligently into parts, compute some sort of summary information for each part, distribute that work across CPUs or machines, then move up to the supergraph of those parts and perform higher-level calculations on it.
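A minimal sketch of the summary idea, assuming a toy call graph and a hypothetical set of "dangerous" sink functions. Tarjan's algorithm collapses recursive cycles into strongly connected components (SCCs) and emits them callees-first, so each component is summarized once from the finished summaries of the components it calls. This is a generic compositional technique, not the commenter's patented method or Semmle's. The sketch runs sequentially, though independent components could in principle be summarized in parallel.

```python
# Hypothetical call graph: function -> callees. `a` and `b` are mutually recursive.
CALL_GRAPH = {
    "main": ["a", "log"],
    "a": ["b"],
    "b": ["a", "system"],
    "log": [],
    "system": [],
}
SINKS = {"system"}  # functions treated as dangerous for this example


def tarjan_sccs(graph):
    """Tarjan's algorithm (recursive, fine for a toy graph). Returns SCCs in
    reverse topological order: each SCC comes after every SCC it can reach."""
    index, low, on_stack, stack, order = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)  # RHS is evaluated before v is inserted
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            order.append(frozenset(scc))

    for v in graph:
        if v not in index:
            visit(v)
    return order


def sink_summaries(graph, sinks):
    """Summary per function: the set of sinks it may transitively reach.
    Each SCC is summarized once, using only the finished summaries of the
    SCCs it calls into, so no step needs the whole graph at once."""
    summary = {}
    for scc in tarjan_sccs(graph):
        reached = {f for f in scc if f in sinks}
        for f in scc:
            for callee in graph.get(f, []):
                if callee not in scc:
                    reached |= summary[callee]  # callee's SCC was emitted earlier
        for f in scc:
            summary[f] = frozenset(reached)
    return summary


if __name__ == "__main__":
    for name, reached in sorted(sink_summaries(CALL_GRAPH, SINKS).items()):
        print(name, sorted(reached))
```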

tru3_power · on Sept 19, 2019

I was looking at it earlier and the query syntax seems awesome. I’ve spent the last few months writing custom rules (queries) for Fortify SCA, which is another static code analysis tool, and I must say Semmle seems a lot easier to use.

Static code analysis tooling is never the be-all and end-all for vulnerability research, but it does let you express patterns for implementation-type vulnerabilities and find them at mass scale (that is, if your rules/queries are legit).

psygnisfive · on Sept 19, 2019

> Security vulnerability finding is almost certainly the wrong target for Semmle

CVE-2019-5876

CVE-2019-16230

CVE-2019-16231

CVE-2019-16232

CVE-2019-16233

CVE-2019-16234

CVE-2019-15026

CVE-2019-14192

CVE-2019-14193

CVE-2019-14194

CVE-2019-14195

CVE-2019-14196

CVE-2019-14197

CVE-2019-14198

CVE-2019-14199

CVE-2019-14200

CVE-2019-14201

CVE-2019-14202

CVE-2019-14203

CVE-2019-14204

CVE-2019-14437

CVE-2019-14438

CVE-2019-14498

CVE-2019-14535

CVE-2019-14534

CVE-2019-14533

CVE-2019-14776

CVE-2019-14778

CVE-2019-14779

CVE-2019-14777

CVE-2019-14970

CVE-2019-15119

CVE-2019-14524

CVE-2019-14523

CVE-2019-7307

CVE-2019-11476

CVE-2019-13115

CVE-2019-3570

CVE-2019-13110

CVE-2019-13112

CVE-2019-13113

CVE-2019-13108

CVE-2019-13109

CVE-2019-13111

CVE-2019-13114

CVE-2019-3560

CVE-2019-9721

CVE-2019-9718

CVE-2019-9717

CVE-2019-9720

CVE-2019-9719

CVE-2018-20222

CVE-2019-3828

CVE-2019-6986

CVE-2019-5414

CVE-2018-4460

CVE-2018-16491

CVE-2018-16489

CVE-2018-16490

CVE-2018-19476

CVE-2018-19477

CVE-2018-19475

CVE-2018-19134

CVE-2018-16472

CVE-2018-18820

CVE-2018-4407

CVE-2018-16487

CVE-2018-4259

CVE-2018-4286

CVE-2018-4287

CVE-2018-4288

CVE-2018-4291

CVE-2019-5413

CVE-2018-16461

CVE-2018-16469

CVE-2018-16486

CVE-2018-16460

CVE-2018-16492

CVE-2018-11776

CVE-2018-8018

CVE-2018-8294

CVE-2018-4249

CVE-2018-8013

CVE-2018-5388

CVE-2018-1295

CVE-2018-4136

CVE-2018-4160

CVE-2018-1000140

CVE-2017-15692

CVE-2017-15693

CVE-2017-13904

CVE-2017-15089

CVE-2018-6834

CVE-2018-6835

CVE-2017-15713

CVE-2017-12634

CVE-2017-13782

CVE-2017-7545

CVE-2017-14949

CVE-2017-14868

CVE-2017-8046

CVE-2017-8045

CVE-2017-9805

CVE-2017-1000207

CVE-2017-1000208

CVE-2017-12612

CVE-2017-0141

https://lgtm.com/security/

DannyBee · on Sept 19, 2019

Uh, I'm not sure why you believe this is an effective retort, perhaps you would like to explain?

captn3m0 · on Sept 19, 2019

>This list provides details about security vulnerabilities discovered by the Semmle Security Research Team using Semmle QL.

Clearly, it works.

tptacek · on Sept 20, 2019

You want to see how long a list I can make for you for grep?

wglb · on Sept 21, 2019

How would this list compare to other methods?

lawnchair_larry · on Sept 18, 2019

You’re wrong on that; security teams at the major tech companies love it, especially for variant analysis. Ask your coworkers at Google! One of them recently left to become Semmle’s Chief Security Officer.

DannyBee · on Sept 19, 2019

I know Fermin quite well, and I know the state of using it at Google. I technically am the one paying for the contract at this point!

I've also met repeatedly about it with all Google customers over the past few months, both those currently using it and those that stopped.

Prior to that, I had met with most that used it but stopped, at the point they started using it. There were several large scale attempts/efforts to use semmle in various ways, by various teams.

I'm trying to be as nice as possible here since, as I said, I think it is a great tool for a lot of things, and I've been a strong supporter of this kind of technology for those cases (for years, in fact, as I'm sure some folks at Semmle can tell you). So I'd rather not burn it all down, which I expect is what would happen if I did a point-by-point explanation of everything bad about it.

So I will instead reiterate my claim, and take my downvotes ;)

lawnchair_larry · on Sept 19, 2019

Interesting - I’d like your unfiltered take, but I totally understand your position. I’ve heard very positive feedback from one security person at Google, and I don’t know Fermin, but taking a CSO role there would suggest to me that he believes in it.

If you are comfortable sharing more, I’d be curious what you found that it struggles with, without burning it to the ground. It’s possible that different security teams use it in different ways, and it might be more suitable depending on your expectations. It’s also possible that the feedback I heard was during the honeymoon period, and practical issues outweigh the utility once you use it more.

But I’ve seen real 0-days found with it, first hand and second hand, so I’m having trouble reconciling that with your account that it’s not useful for security.

DannyBee · on Sept 19, 2019

"and I don’t know Fermin, but taking a CSO role there would suggest to me that he believes in it."

Sure, but Fermin was also offered a fairly ridiculous amount of money and a serious promotion :)

I mean, he doesn't not believe in it, of course, but I also think most folks would have taken the role in his situation.

I.e., it's not the kind of offer that really required a lot of faith.

I'll try to write a bit more later after I think about how to frame it :)

fjserna · on Sept 19, 2019

Fermin here. Danny, I loved working with you at Google but let me correct you about my promotion and money. Not true.

I respect your point of view around our technology. You may like it or not (some folks love it), but please do not make statements about me you do not really know :)

And to be clear, I believe this technology makes security researchers scale on different aspects. At least I had first hand experience with this and our goal is to make security easy for non security folks... this technology enables us to do this.

Happy to sync in private over a coffee!

on Sept 19, 2019

[deleted]

fjserna · on Sept 19, 2019

All good Danny, we had good times at Google... let's remember those :)

Coffee offer is still there!

oegerikus · on Sept 19, 2019

Danny, I am the CEO and founder of Semmle. I will refrain from arguing about the value of our product and technology. However, I must correct your statement about Fermín which is utterly false. He took a huge pay cut to come to Semmle. Please stick to facts when talking about people.

lawnchair_larry · on Sept 19, 2019

So I did read a whitepaper about static analysis at Google, and how it was largely self-serve - let developers run the tools and fix what it tells them to as they see fit. I’m wondering if it was under this model where you found it was not useful. I would not expect it to provide much value in that scenario, and would not be surprised by your feedback.

If your data is closer to a model where security bug hunters whose sole job is to find vulnerabilities and audit code, and it was deemed not useful in that scenario, then yes, I am at odds with your claim. Admittedly, that’s a pretty niche set of customers. If you don’t learn Semmle QL, and you aren’t writing queries, it’s probably not for you.

collingreene · on Sept 19, 2019

Here is our experience building and using program analysis as part of our product security efforts at facebook: https://engineering.fb.com/security/zoncolan/.

It's run in self-service mode (output to developers) and guided mode (output to the product security on-call of security engineers), and it's also used ad hoc to power up manual security reviews. Depending on each rule's accuracy and the impact of the class of security flaw it finds, a rule may eventually be promoted to output directly to developers.

It finds about a third of the security vulns we unearth each year.

lawnchair_larry · on Sept 19, 2019

That’s been my approach as well. An astonishingly large number of companies think they can buy an off-the-shelf static analysis tool and pipe the default output to developers. That’s counterproductive. A very small percentage of developers will understand the output, be able to assess the exploitability/severity, and care about fixing it. One might think you could then just have them take the “better safe than sorry” approach and fix everything, but FP rates for all of the commercial tools make that completely untenable. At the same time, you can’t expect to convince small teams of developers to model everything out and define sources/sinks using some obscure DSL that they have to learn. But there are classes of issues that are extremely high impact where only low-accuracy static analysis rules can find the candidates. It’s that part in the middle that you don’t want to throw out, but you need security experts to vet it. Cases with high-confidence checks are appropriate to short-circuit straight to the devs, but piping everything to devs is a bad first step.
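The promotion model described in these two comments can be read as a routing policy for rules. A minimal sketch: the `precision` metric, the 0.9 threshold, and the rule names are hypothetical, not Zoncolan's or any vendor's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    precision: float  # hypothetical: share of past findings confirmed as real bugs
    impact: str       # "high" or "low"


def route(rule, dev_threshold=0.9):
    """Decide who sees a rule's findings.

    - precise rules go straight to developers (self-service);
    - imprecise but high-impact rules go to security engineers to vet (guided);
    - everything else is kept for ad hoc use in manual reviews.
    """
    if rule.precision >= dev_threshold:
        return "developers"
    if rule.impact == "high":
        return "security-oncall"
    return "manual-review"


if __name__ == "__main__":
    for r in [
        Rule("hardcoded-secret", 0.95, "high"),
        Rule("deserialization-candidate", 0.30, "high"),
        Rule("unusual-cast", 0.40, "low"),
    ]:
        print(r.name, "->", route(r))
```

A rule "graduates" simply by its measured precision crossing the threshold, which matches the idea of promoting rules only after security engineers have vetted their output for a while.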

on Sept 19, 2019

[deleted]

jlokier · on Sept 19, 2019

I think there's some great feedback here, for anyone at Semmle thinking about how to develop the tool further.

jlokier · on Sept 21, 2019

Annoyingly, now that the GP post is deleted, the context of my comment looks different, and I can't delete my comment.

Since the GP has been deleted, I'll respect that and not reference specifics, but I want to clarify for any passers-by: much of the GP comment I replied to was detailed and technical, about things like performance enhancements, features, and semantic analysis approaches that could make the tools useful in more use cases. That is very different from the rather general and personal criticisms I see elsewhere in the nearby comment tree.

It's the suggested technical and product enhancements that I felt were potentially useful feedback, rather than any of the criticism (I can understand why those were deleted).

tptacek · on Sept 18, 2019

First of all, Daniel Berlin is pretty senior at a reasonably large tech company a lot of us here have heard of.

Secondly, I know Microsoft loves it, which is presumably where your telemetry comes from, and I know a lot of security people on Twitter are fans of the technology, but I've been asking around and "love it" is not the signal I'm getting from software security blue team people. "I installed it, I guess it does some stuff, we never think about it" is the modal feedback I've seen.

I'm very interested in hearing success stories about this; the problem Semmle addresses is a huge part of the cost basis for my practice, and I'd love to hear that someone has gotten it working well.

lawnchair_larry · on Sept 19, 2019

It’s absolutely useless in the wrong hands, so I don’t think you’d necessarily get a good signal by asking your average blue teamer. It’s a godsend for someone who spends a lot of time auditing code and has some experience writing code analysis tools.

I mean think about it, if you wanted to write a query against the AST of a target, would you find that useful? Or in a given codebase, if you find one bug, would you like the ability to capture that in a query that can tell you if a similar mistake was made elsewhere?

Out of the box, it isn’t going to give you much value. It’s the power of the query language, if it’s your job to do that, where you’ll see the benefits.
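To make "capture that bug in a query" concrete without QL, here is a minimal sketch of variant analysis as a hand-written tree walker over Python's standard `ast` module. The assumed scenario: one bug was an `os.system` call built from runtime data, and the walker flags every other `os.system` call whose argument is not a plain string literal. It is deliberately naive; for example, it misses aliases such as `from os import system`.

```python
import ast

SAMPLE = '''
import os

def backup(path):
    os.system("tar czf backup.tgz " + path)   # the bug we found first

def cleanup():
    os.system("rm -rf /tmp/cache")           # constant string: not a variant

def convert(name):
    os.system(f"convert {name} out.png")     # same mistake elsewhere
'''


class ShellVariantFinder(ast.NodeVisitor):
    """Flags os.system(...) calls whose first argument is not a string literal."""

    def __init__(self):
        self.hits = []

    def visit_Call(self, node):
        func = node.func
        is_os_system = (
            isinstance(func, ast.Attribute)
            and func.attr == "system"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        )
        if is_os_system and node.args:
            arg = node.args[0]
            if not (isinstance(arg, ast.Constant) and isinstance(arg.value, str)):
                self.hits.append(node.lineno)
        self.generic_visit(node)  # keep walking nested expressions


def find_variants(source):
    """Return the line numbers of suspicious os.system calls in `source`."""
    finder = ShellVariantFinder()
    finder.visit(ast.parse(source))
    return finder.hits


if __name__ == "__main__":
    print(find_variants(SAMPLE))
```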

But don’t take my word for it, just try it out.

Their licensing model may be problematic for your use case though. I only vaguely understand what you do, but last I asked them about it, it’s not possible to get a personal license that a security person can use for multiple projects, and my read was that they had no interest in selling to individuals anytime soon.

tptacek · on Sept 19, 2019

I mean, queries against an AST is sort of standard security tooling; the difference appears to be that Semmle (1) properly assigns types in C/C++ and (2) exports that query language. (1) makes sense to me; (2) I don't know how much better I'd get than just hand-writing tree walkers.

apaprocki · on Sept 19, 2019

Anecdata, I have also successfully used it to unearth complicated bugs that I already knew must exist, but didn’t have a good way to find.

Think of a query like: Find all calls to function A that have an output pointer-pointer of type B in the last argument position and also have a boolean return type, verify the input dereferenced B is NULL, then verify that iff A assigns output to B that the return type is false and that the calling frame also includes a later call to function C to clean up B. This type of thing run on millions of lines of code.

This can get as crazy as you want it to be and did work, but there is a “but.” The query language is so verbose and powerful, there were many, many ways to represent the same query that all had drastically different performance profiles. The docs were woefully underwhelming in that regard — they simply stated what a member did, but not anything at all related to memory or performance implications. That, coupled with the fact that nearly every complex query during dev hit the wall of the JVM killing it for taking too much time/mem, it became apparent that they must have tooling for the tooling to analyze performance and memory profiles of queries (not unlike all the effort put into SQL over the years). Also, no debugger; resorted to “datalog printf debugging” (ie, including internal clauses as outputs and chopping off later parts of the query...)

I spent a few weeks only writing queries and in every non-trivial one I needed a senior person there to review/rearrange a few clauses to go from e.g. 30 minute runtime to 15 seconds. That left me with a feeling that I was constantly fighting the docs and lack of tooling and would constantly need their help to tune things. Nothing was fundamentally wrong with the queries — it was just not documented anywhere that some filtering should be done caller->down vs. other kinds of checks in the same query should be callee->up.
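Why clause order can matter, in miniature: a minimal sketch with a toy nested-loop evaluator that runs clauses in the order written. The relations and sizes are made up, and a real Datalog engine's optimizer works differently; the point is only that two logically equivalent queries can do very different amounts of work.

```python
# Toy relations: 200 functions, each calling the next; only f3 is named "malloc".
NAMES = [(f"f{i}", "malloc" if i == 3 else f"helper{i}") for i in range(200)]
CALLS = [(f"f{i}", f"f{(i + 1) % 200}") for i in range(200)]


def plan_a(calls, names):
    """calls(X, Y), name(Y, "malloc"): the unselective clause drives the loop."""
    steps, out = 0, []
    for x, y in calls:
        for f, n in names:
            steps += 1
            if f == y and n == "malloc":
                out.append((x, y))
    return out, steps


def plan_b(calls, names):
    """name(Y, "malloc"), calls(X, Y): the selective clause runs first."""
    steps, out = 0, []
    for f, n in names:
        steps += 1
        if n != "malloc":
            continue
        for x, y in calls:
            steps += 1
            if y == f:
                out.append((x, y))
    return out, steps


if __name__ == "__main__":
    print(plan_a(CALLS, NAMES))  # ([('f2', 'f3')], 40000)
    print(plan_b(CALLS, NAMES))  # ([('f2', 'f3')], 400)
```

Both plans return the same answer; only the amount of intermediate work differs, which is exactly the kind of difference that profiling tooling would surface.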

I had questions regarding scaling up to 1B+ LoC like others in the thread but didn’t really get that far.

It did successfully find the C/C++ bugs I was looking for once the queries had hours of investment from both sides, and we were also able to find bugs in a large custom JS codebase by mocking all the things it would need to understand to eval the code. Whether or not that investment makes sense is an individual/team/org question, but if they ever seriously invest in a query debugger and profiling, they’ll be pretty hard to beat.

tptacek · on Sept 19, 2019

This is exactly the kind of thing I'm interested in knowing about Semmle. Thank you!

strogonoff · on Sept 19, 2019

I’d presume the tool would be used by a whole team, not just Thomas alone, and hence be licensed to a company… Are you saying that they don’t sell the tool to infosec businesses working with multiple end customers?

lawnchair_larry · on Sept 19, 2019

That’s correct. They sell to whoever owns the code being looked at, and charge accordingly, based on how many developers the codebase has. The incremental cost for people on security teams to use it is actually $0, no matter how many of them are working with it. If you have 500 developers checking in code, that’s what they charge you for, and read/query access to the results is “free”.

nickpsecurity · on Sept 18, 2019

The funny thing is I was working on a pitch to get Atlassian to buy them so they don't end up in Microsoft's hands. I thought integration with a repo company would be good since they could cross-sell it for code understanding and maintenance. Then I see this article. (sighs) At least I got it right on the type of company that would grab them.

I'd push something else for security, though, to complement it. RV-Match is my favorite commercial one because they built on a formal semantics for C, it's set for low false positives, and they open source a lot of stuff. They have something for Java and smart contracts, too. Past that, what's good depends on what language you use.

tptacek · on Sept 18, 2019

Just a small fraction of the industry's ongoing software development is done in C. Obviously, Google, Microsoft, Apple, and Mozilla still write a lot of it, but you don't acquire a whole company to address 4-and-change customers.

nickpsecurity · on Sept 18, 2019

They'd have them retarget to what's popular post-acquisition in my concept. Especially among their paying customers. Keep adding languages or just useful things for it to look for.

tptacek · on Sept 18, 2019

Security problems aren't interchangeable between languages, and C has a very particular set of concerns that don't translate.

iainmerrick · on Sept 18, 2019

Why would Atlassian be better?

nickpsecurity · on Sept 18, 2019

They were kind of a default since there are only a few huge ones. Main win: it's not Microsoft. MS already has lots of internal tools from MS Research that they were wasting. They probably could've built Semmle themselves. They're also a known patent troll. I don't know if Semmle's methods are patented, though.

I'd rather they be acquired by a company that is constantly developing new services and makes no attempts to financially drain third parties than by a company like Microsoft. Not to say good won't come out of Github integration, given the huge number of projects on it.

tensor · on Sept 18, 2019

Am I reading that right that it's only for public repositories, though? For private repositories I guess you have to buy through Semmle directly (via "call us" pricing)?

hanniabu · on Sept 18, 2019

This would make sense since Semmle would likely need access.

throwaway744678 · on Sept 18, 2019

I'd be ready to put money on the fact that GitHub has access to all repositories, even private ones!

smudgymcscmudge · on Sept 19, 2019

Of course they do, but they also have safeguards in place that prevent access without alerting auditors and eventually the repo owner.

Dirlewanger · on Sept 18, 2019

So I'm guessing they'll be merging what they have now with Semmle's tool? Because they've had the free vulnerability check for a while now.

tptacek · on Sept 18, 2019

Different things. Github has features that scan repos for "known-vulnerable" dependencies. They do not have features that scan for new vulnerabilities.
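A minimal sketch of a "known-vulnerable dependency" check, to show why this kind of scan cannot find new bugs: it only compares pinned versions against an advisory list and never reads the project's own code. The advisory data and package names are hypothetical, and this is not how GitHub implements the feature.

```python
# Hypothetical advisory data: package name -> first fixed version.
ADVISORIES = {"examplelib": (2, 4, 1), "otherlib": (1, 0, 3)}


def parse_version(text):
    """Turn "2.3.9" into (2, 3, 9) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def known_vulnerable(requirements, advisories=ADVISORIES):
    """Return pinned packages whose version is below the first fixed version."""
    flagged = []
    for line in requirements.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # only exact pins are checked in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        fixed = advisories.get(name.lower())
        if fixed is not None and parse_version(version) < fixed:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(known_vulnerable("examplelib==2.3.9\notherlib==1.0.3\nunrelated==0.1\n"))
```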

thomasahle · on Sept 19, 2019

Yes and no. LGTM is about known vulnerabilities. It doesn't (currently) use artificial intelligence to discover new vulnerabilities, but it allows writing complicated yet efficient patterns for vulnerabilities found by human intelligence.

So it's more advanced than simple "known bad dependencies", but it's also not quite "new vulnerabilities".

braindongle · on Sept 19, 2019

Thank you for the zero-indexed reference list. Brought a smile.

_xnmw · on Sept 18, 2019

I hate that these kinds of Orwellian phrases "Welcoming X to the Y Family" have now become idiomatic of corporate English. Ugh, no. There is no "family" involved here, not by any stretch of the word.

javagram · on Sept 18, 2019

To be fair, if a “parent corporation” is a thing, then logically it has children and can be a corporate family.

_xnmw · on Sept 18, 2019

Intent matters. The phrase "parent corporation" has no PR or emotional intent. "Welcoming X to Y family" has a clear emotive intent.

andyfleming · on Sept 18, 2019

Maybe the intent of the writer was to make the new hires feel welcome aboard to their new company.

That's also not mutually exclusive of the emotive intent you are describing. What makes that Orwellian though?

devmunchies · on Sept 18, 2019

The parent comment seems to be criticizing the higher-level corporate trend to use this lingo, and isn't talking about Friedman or Github specifically.

devmunchies · on Sept 18, 2019

Github has a cartoon cat-octopus all over the site that I would expect on a baby toy. It's clearly linked to "emotional intent": "Github is so fun, guys!" I think they've outgrown that style (the black-and-white one in the header is OK).

Dirlewanger · on Sept 18, 2019

Welp, guess it's time to bust out the "I'm offended and we need to change this lingo" card because corporations aren't people and we should stop referring to them that way

striking · on Sept 18, 2019

I mean, in all seriousness, aren't they a little bit like people? https://www.npr.org/2014/07/28/335288388/when-did-companies-...

rndgermandude · on Sept 19, 2019

They are slow AIs: https://media.ccc.de/v/34c3-9270-dude_you_broke_the_future

mytailorisrich · on Sept 18, 2019

It actually took me a while to understand that the announcement likely meant that they had acquired Semmle...

spjt · on Sept 18, 2019

They have to say that anyway, because Microsoft is acquiring Semmle, not GitHub. It is joining the GitHub "product family".

_xnmw · on Sept 19, 2019

If they had said "Welcoming Semmle to the Github Family of Products" instead, that would've been much more tolerable.

smitop · on Sept 18, 2019

When I read the headline I assumed it meant that GitHub was hiring an employee named Semmle, which confused me until I realized Semmle was a business.

xvilka · on Sept 19, 2019

Free hint for GitLab: they could integrate a similar but open-source tool, Infer[1]. Essentially it provides similar features; it just lacks a good interface for them. It also has a query language, called AL[2]. It is far less polished than Semmle, but it's open source and has good potential.

[1] https://github.com/facebook/infer

[2] https://fbinfer.com/docs/linters.html

chuckgreenman · on Sept 18, 2019

Interesting to see the differences between Github and Gitlab's strategy in this arena.

Github appears to be going the acqui-hire route with Semmle, dependabot, pullpanda, etc., whereas I don't think Gitlab has made an acquisition in a year or two.

troydavis · on Sept 18, 2019

GitLab published what they're interested in: https://about.gitlab.com/handbook/acquisitions/. It's an amazing, one-of-a-kind doc. One of their constraints (https://about.gitlab.com/handbook/acquisitions/#what-we-offe...) is quite limiting, though:

> The total purchase price of the deal, paid in cash, will not exceed $1M and will be the total and only compensation for the entire deal.

andrewprock · on Sept 18, 2019

They are looking at companies that: "Raised under $10M total investment funds, last round being over 3 years ago"

This implies that in addition to self-funded ventures, they are looking for fire sales from failed start-ups.

pnako · on Sept 19, 2019

It looks like they're buying (big) features, not complete solutions or companies. That's actually an interesting approach; I'm sure others do that, but maybe not as explicitly.

It would allow a small team of hackers to have a decent exit without having to go through the whole startup road.

claytonjy · on Sept 18, 2019

Gitlab hasn't generally seemed interested in these sorts of free scanning tools. I wonder if that's because their users are much more weighted towards private/self-hosted than Github's are? Because so little open source happens on Gitlab, they can't buy good PR through this kind of strategy like Github can.

leblancfg · on Sept 18, 2019

I've been looking quite a bit into this recently, and even though they might not be screaming it from the rooftops, Gitlab offers quite a few security-related features. There are code scanning, dependency tracking, etc. features at various levels of readiness.

https://about.gitlab.com/devops-tools/ https://about.gitlab.com/stages-devops-lifecycle/secure/

prepend · on Sept 18, 2019

They’ve had SAST tools for a few releases, but high up in the paid license types. With GitHub providing for free, they may need to move them into CE.

beckler · on Sept 19, 2019

Their scanning tools are "source available", but they're definitely not open-source. The license is gonna be a non-starter, but how they built their SAST tool [0] is actually quite interesting.

It just uses existing open-source analysis tools, but orchestrates them all into a single tool by coordinating a bunch of docker images.

[0] https://gitlab.com/gitlab-org/security-products/sast

andyljones · on Sept 18, 2019

Microsoft has $130bn cash-on-hand.

The surprise is really that they're not being more aggressive in their acquisitions.

pbhjpbhj · on Sept 18, 2019

They should be like Yahoo was and buy everything they see? (for billions, only to sell it at a massive loss later)

I've not heard from Yahoo in a year at least, do they still exist ...

chongli · on Sept 18, 2019

I just got an email from Yahoo about a settlement in a class action lawsuit over a massive data breach. It said something about Yahoo paying for 2 years of credit monitoring service to anyone affected by the breach.

Maybe that's not exactly what you were looking to "hear from" Yahoo about, though...

archon810 · on Sept 18, 2019

Semmle's post: https://blog.semmle.com/secure-software-github-semmle/.

pja · on Sept 18, 2019

Github has been really working on their source code analysis toolkit recently & this acquisition makes perfect sense as part of that strategy. Congratulations to Oege & the team.

fnord123 · on Sept 19, 2019

First project I look up on lgtm.com is rust.. Second alert I find is this:

https://lgtm.com/projects/g/rust-lang/rust/snapshot/f5aa590b...

exist_ok has been available since Python 3.2, so this doesn't make a good impression.

https://docs.python.org/3.7/library/os.html#os.makedirs
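For reference, `os.makedirs` with `exist_ok=True` (added in Python 3.2) creates any missing parent directories and does not raise if the target already exists; without it, a second call raises `FileExistsError`. A minimal sketch; `ensure_dir` is a hypothetical helper name.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)  # exist_ok was added in Python 3.2


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "a", "b")
        ensure_dir(target)
        ensure_dir(target)  # second call is a no-op instead of an error
        try:
            os.makedirs(target)  # default exist_ok=False
        except FileExistsError:
            print("without exist_ok, makedirs raises FileExistsError")
```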

dazbradbury · on Sept 18, 2019

Huge congrats to Oege and the team at Semmle - couldn't be happier for a hugely passionate and smart individual (and a previous professor of mine!)

Am sure this will bring some amazing advances to Github and thus a huge % of the developer community.

rishicomplex · on Sept 18, 2019

"Human progress depends on the open source community."

What a way to begin an article.

chromeguy66 · on Sept 19, 2019

Especially considering M$ owns GitHub

tom-jh · on Sept 19, 2019

I've just tested their lgtm.com on our codebase:

1) identified str.replace('[ABC]+', '') correctly as a bug (looks like a regex but is string literal)

2) identified various unnecessary code that TypeScript overlooked

3) identified double-unescaping of html (this one would have probably gone unnoticed for years)
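Python analogs of items 1 and 3, as a minimal sketch with made-up input strings. In TypeScript, `String.prototype.replace` with a string first argument also matches literal text (and only the first occurrence); Python's `str.replace` is likewise literal, while `re.sub` treats the same characters as a pattern. The second half shows how unescaping HTML twice turns text the user typed literally into live markup.

```python
import html
import re

text = "xAABy"

# str.replace looks for the literal substring "[ABC]+", which never occurs,
# so the call silently returns the input unchanged.
literal_result = text.replace("[ABC]+", "")

# re.sub interprets "[ABC]+" as a character class, which is usually what was meant.
regex_result = re.sub("[ABC]+", "", text)

# Text the user typed literally, escaped once for safe storage/display.
user_typed = "&lt;script&gt;"
stored = html.escape(user_typed)  # "&amp;lt;script&amp;gt;"

once = html.unescape(stored)   # back to exactly what the user typed
twice = html.unescape(once)    # over-decoded into real markup: "<script>"

if __name__ == "__main__":
    print(repr(literal_result), repr(regex_result))
    print(repr(once), repr(twice))
```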

And a bunch of other stuff. No actual vulnerability in our case, but still very useful. I'm enabling their checks on every future PR.

This was TypeScript but they support the rest of our stack too (Python, Java). I wonder if this includes Kotlin - will try.

tom-jh · on Sept 19, 2019

Tested, Kotlin is not supported, nor is Swift.

throwaway744678 · on Sept 18, 2019

> Human progress depends on the open source community.

(Non native speaker here). Am I misunderstanding something, or is the author explaining that humanity can not progress without the open source community?

mytailorisrich · on Sept 18, 2019

That's right. That's called hyperbole (with a 'e').

sounds · on Sept 18, 2019

As the other comments point out, it's hyperbole. It's also an aspirational statement.

Aspirational, as in, they wish that github would be the key factor in human progress. Maybe I can say that even plainer: to the leadership at github, the progress of the human race depends on github.

The open source community depends on github.

That's what it's implying.

(The reader is left to identify that as wishful thinking.)

networkimprov · on Sept 18, 2019

That's the meaning I took from it (USA native).

rishicomplex · on Sept 18, 2019

I've used semmle's tools at Google, they seemed pretty powerful.

notus · on Sept 18, 2019

I spent way too long thinking that Semmie was just a badass programmer

z3t4 · on Sept 19, 2019

It would be cool if the tools were made open source, so everyone could get more security.

robbystk · on Sept 19, 2019

So this is the excuse they're using to build infrastructure to scan through everyone's code to find whatever they want.