Is it the lack of liability and regulation that clears the way for this kind of corporate citizenship? Is it cultural?
When you openly celebrate really nerdy technical achievements, and when you incentivize and promote side projects and open source, a camaraderie grows with the projects and tools that make the web what it is, and with the volunteerism that open-source projects require.
Google also thrives on a healthy internet.
This reminded me that I'd heard Google's 20% time generally had around 10% participation, and that it has come with a number of conditions, including manager approval, for the last four years. Is this article breathing new life into the myth of 20% time, or does it reflect a revival of the practice?
Either way, an incredible accomplishment by the patch army!
I took copious amounts of 20% time in my time at Google (2009-2014). Usually I'd start a 20% project with neither my manager's knowledge nor his approval; if it looked like it had legs, I'd let him know about it and ask what he thought. I never had a manager outright forbid me from working on a 20% project; responses ranged from "You should consider this your main project now; it's critically important that we understand this area" (along with a spot bonus for delivering on it) to "Well, you can work on it, but you are unlikely to get credit for it come promo time." In general, as long as I got my work done, my managers didn't care what else I was working on.
This suited me fine - usually the way I took 20% time was to fit it into time periods when I didn't really have much else to do or I was bored with my main project. And splitting it up like this let me focus more intently on both of them, which helped in delivering.
It certainly depends on the particular manager or even org, but in general 20% projects are still a thing for those who want them (not many).
Why is that? Any more insight into that?
And even when a person is enjoying the 20% opportunity, most of the time there is no good project to take on. During my stints at Google, I've done many 20% projects (more than 10), but I only had an active 20% project maybe a third of the time.
IMHO, this is a good example of 20%.
Google benefits greatly from a healthy code ecosystem. Any exploit could easily come back and bite them in the ass. As big as they are, and as much code as they pump out, they still rely on third-party code, and fewer exploits in the wild is a big win for them.
On a tangent, why shouldn't we celebrate projects that benefit both private and public interests? IMO, society would be quite a bit better if the two were more often aligned (like you suggest they may be in this case).
I wasn't implying that it's bad when private and public interests are aligned. What I'm saying is that this is not the same as altruism, and if private and public interests are at odds, the private ones will likely win as they have before at Google. This is just the nature of a capitalist company. Let's not have illusions about that.
How do you think we got here:
I have to point out that Microsoft are still the biggest supplier of desktop Web browsers.
So I'm on Google's open source team.
Rosehub was driven entirely by rank-and-file Googlers who saw the MUNI hack and thought "This sucks and it's so simple to fix and it shouldn't happen." I remember Justine sending out some messages to various mailing lists with the identified vulnerable repos in a spreadsheet, and then people volunteered to send pull requests to repos they claimed off the spreadsheet.
It's the sort of thing that happens because Googlers are genuinely passionate about software in the large; they just happen to work at Google. The only things I think Google as a corporate entity does to foster that are to hire the same sort of passionate people and to let them decide how best to use their work hours. This is the sort of effort that naturally arises from that.
It's the same sort of thing that causes Googlers to want to open source their code (and get the director approval to do so). Googlers do it because they want to give back to software in the large, not for any perceived benefit of getting a community to work on the code (we have thousands of repos on https://github.com/google that never get any external contributions).
However, the MUNI hack certainly did motivate being more public about the project and writing this blog post, since it really helped underscore the severity of this vulnerability in very real, concrete terms.
That must be a sad life; is this common? I work as a game programmer, and almost every other game programmer I know (myself included) codes for fun, and we are not paid very well.
I think Google is doing some of the most impactful security work of any company I have ever seen. Between Google and Microsoft (within the past few years), they have been instrumental in moving the bar for the whole industry. They both catch flak for privacy issues (which is a leg of the InfoSec triad), but I think most people should be grateful for the work they do.
Products like Coke are internationally recognized, but recognition != likeability, which is why they continue to pour big $$$ into advertising.
This is advertising BigQuery which is a Google Product people may be less familiar with. This nicely shows off what can be done using a large dataset.
I'm grateful for the work they've done here, and I have no doubt individual employees donated either their free time or 20% time (or w/e it is now). But that doesn't negate the fact that this is an advert for BigQuery.
If you want to see other use cases - I've collected plenty of other stories from multiple parties at:
Disclosure: I'm Felipe Hoffa and I work for Google Cloud (https://twitter.com/felipehoffa)
I submitted this post as "Googlers used BigQuery and GitHub to patch thousands of vulnerable projects". After it got to #1 on the front page, mods silently changed the title to "Operation Rosehub – patching thousands of open-source projects"
I wish HN had a more transparent way to show that the mods changed a title and why. Since HN does not, the least I can do is add this info for transparency.
(related https://news.ycombinator.com/item?id=6572466 https://news.ycombinator.com/item?id=4102013)
You add your repo, and a bot constantly checks for insecure and/or outdated packages and sends you a pull request when you need to update.
It's free for open source projects at https://pyup.io
I was initially concerned that constant PRs for dependencies would be too noisy for our team, but it turned out that it's configurable enough, and it gracefully handles us ignoring or closing PRs that we evaluate and decide to wait on.
It's a great service that all python developers should be using. (I say that because I want as many pyup users as possible so it never goes away)
A quick suggestion, consider adding a full sample configuration along with your config docs: https://pyup.io/docs/configuration/
It's a lot easier to see how it all sits together with a sample.
CircleCI has a great example: https://circleci.com/docs/1.0/config-sample/
This is how it looks: https://github.com/pydanny/cookiecutter-django/pull/1065
> it would be like hiring a bank teller who was trained to hand over all the money in the vault if asked to do so politely, and then entrusting that teller with the key. The only thing that would keep a bank safe in such a circumstance is that most people wouldn’t consider asking such a question.
This doesn't apply only to deserialization issues.
It is a great analogy for a huge class of IT security issues!
Maybe we should use that one when communicating with the media. It works much better than the usual burglary analogy. I like how it points out that this is about stupid and/or malicious behaviour (code), where the attacker (hacker) just needs curiosity, and may find it out even by accident. The attacker did not have to break anything, and did not damage anything, to get in. In particular, this makes clear that this is caused by irresponsible behaviour of the organization and/or other entities to whom they delegate trust.
Even for more complicated scenarios, I like the bank teller analogy more than the classic burglary analogy. In that case, the attacker observes multiple bank tellers, and notices e.g. that if you ask the first teller for form A and put in certain words, another bank teller will accept it and give you a stamped form B, which you can show to a third teller in another branch office who will look a bit confused, but finally accept it and hand over all the money to you.
We need to get over blaming the messengers, buying zerodays and declaring cyberwar. What we really need to do is to finally make our computer systems secure and trustworthy, at least up to a certain minimum-level of sanity: no exec, no injection (i.e. typing/tagging), no overflows (i.e. static analysis), input validation, testing, fuzzing, you name it.
And this cannot work by just adding more and more complex security measures outside, but more importantly simplifying and cleaning up inside. Although rewriting software from scratch is very risky, radical refactoring is not! And every good software engineering course tells you how to do it correctly.
 security researchers, but also "amateur" hackers, or just someone running into it by accident because the security issue became so large it finally had to be noticed by someone.
 in the sense of: everyone's!
libraries.io did make it to the front page a few months ago, but I think its underlying vision might not have been driven home from just glancing at its home page. It supports 33 package managers (not just Java, though I'm sure Rosehub doesn't just do that either) and Github/Gitlab/Bitbucket, not just Github. And it provides both email notifications and auto PRs.
But that's just the overlap with Rosehub. On top of that it offers the means to discover libraries based on a Dependency Rank (think Page Rank but using dependencies instead of hyperlinks). Which in turn allows it to surface projects with a high "Bus Factor" -- projects maintained by few committers, but depended on by many (so they'd be more affected by said committers getting run over by a bus). AND it mines the licenses for a project, notifying if any of the dependent licenses are incompatible with the parent license. What's more it's a non-profit organisation receiving enough funding to employ 2 full time devs.
I think libraries.io is Rosehub and more. To quote the about page:
> Our goal is to raise the quality of all software, by raising the quality and frequency of contributions to free and open source software; the services, frameworks, plugins and tools we collectively refer to as libraries.
Here are some links:
SELECT id, content
FROM (SELECT id, content
      FROM (SELECT id, content
            FROM [bigquery-public-data:github_repos.contents]
            WHERE NOT binary)
      WHERE content CONTAINS 'commons-collections<')
(BigQuery has since released ANSI 2011 "standard SQL", which does have an optimizer and would push predicates down.)
(I work on GCP and worked on BQ until recently.)
(like this https://www.youtube.com/watch?v=cO1a1Ek-HD0)
Note that the published query scans 2.25 TB of data. While impressive, for a better workflow and cost management I would split it into a 2 step process:
- First extract all the files I'm interested in to a separate table (all pom.xmls?).
- Then run whatever analysis you want over those files.
SELECT pop, repo_name, path
FROM (
  SELECT id, repo_name, path
  FROM `bigquery-public-data.github_repos.files` AS files
  WHERE path LIKE '%pom.xml' AND EXISTS (
    SELECT 1
    FROM `bigquery-public-data.github_repos.contents`
    WHERE NOT binary AND
          content LIKE '%commons-collections<%' AND
          content LIKE '%>3.2.1<%' AND
          id = files.id))
JOIN (
  SELECT difference.new_sha1 AS id,
         ARRAY_LENGTH(repo_name) AS pop
  FROM `bigquery-public-data.github_repos.commits`
  CROSS JOIN UNNEST(difference) AS difference)
USING (id)
ORDER BY pop DESC;
As a disclosure, I work on the project to support standard SQL in BigQuery.
Interesting fact: Justine was the founder of occupywallst.org, which was the highest-trafficked publisher/web hub for the Occupy Wall Street movement before she worked for Google.
Are these pull requests that the project would still need to approve/merge or were they just pushed in?
It would have been great to see a count of how many PRs have been accepted.
Interesting that 2100 of the PRs are "Upgrade Apache Commons Collections to v3.2.2" and just 7 were "Upgrade Apache Commons Collections to v4.1".
So even if they don't get merged, they still serve a purpose, even if it's just for a few people who behave like I do.
Still pretty awesome.
Thank you for adding in the part about the bank teller.
"it would be like hiring a bank teller who was trained to hand over all the money in the vault if asked to do so politely, and then entrusting that teller with the key."
To read that from Google is frankly disappointing. While this is true of many open-source projects, it doesn't have to be that way. Red Hat (and Google!) are brilliant proofs of this.
a) make a contract with a company that takes responsibility for X, or
b) hire somebody who takes responsibility for X, or
c) take responsibility for X on your own
It doesn't help to "buy" closed-source software X from another company if you can't count on them in case of emergency, i.e. if they vanish, go bankrupt or put their lawyers onto you.
Then it's better to take open-source software, where you can take responsibility on your own; it may help to hire one or more of the lead developers.
Eh, why not just get rid of the bad version? Alternately, release a bug-fixed copy with the same version number.
Any breakage is a case of "oh well, you're safe now". Leaving the security hole is probably worse breakage.
If you just re-publish the old version, it's difficult to know whether you've taken the change.
If you are going to reissue the same version number - why bother having version numbers at all?
And this work is very useful insofar as I'm sure the benefits it provides will massively outweigh the cost. However, if you have a naked ObjectInputStream#readObject in your code then you probably still have an exploitable security issue. Have a look at how well Jenkins's strategy for fixing this issue worked out; it was basically the same strategy as Operation Rosehub, i.e. removing the ability to access classes that were known to be used in gadget chains. Surprise, surprise: it didn't last very long, and people just found new gadgets.
And if you read this blog post you might be misled into thinking that removing commons-collections from your classpath, or upgrading commons-collections to the 'safe' version, would make object deserialization safe. This is not the case: if you have a naked ObjectInputStream#readObject in your code then you are vulnerable to remote code execution.
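To make that concrete, the standard mitigation is "look-ahead" deserialization: reject any class that is not on an explicit allowlist before ObjectInputStream instantiates it. Here's a minimal sketch (class and method names are mine, not from the post; on Java 9+ the built-in ObjectInputFilter mechanism does the same job):

```java
import java.io.*;
import java.util.*;

public class SafeDeserialization {

    // ObjectInputStream subclass that refuses to resolve any class not on
    // an explicit allowlist. The check runs before the class is
    // instantiated, so gadget chains never get started.
    static class AllowlistObjectInputStream extends ObjectInputStream {
        private final Set<String> allowed;

        AllowlistObjectInputStream(InputStream in, Set<String> allowed)
                throws IOException {
            super(in);
            this.allowed = allowed;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (!allowed.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "not on the allowlist");
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Set<String> allowed = new HashSet<>(Arrays.asList("java.util.ArrayList"));

        // An allowlisted payload deserializes normally...
        byte[] ok = serialize(new ArrayList<>(Arrays.asList("a", "b")));
        try (ObjectInputStream in = new AllowlistObjectInputStream(
                new ByteArrayInputStream(ok), allowed)) {
            if (!Arrays.asList("a", "b").equals(in.readObject()))
                throw new AssertionError();
        }

        // ...while any other class is rejected before it can run code.
        byte[] bad = serialize(new Date());
        try (ObjectInputStream in = new AllowlistObjectInputStream(
                new ByteArrayInputStream(bad), allowed)) {
            in.readObject();
            throw new AssertionError("java.util.Date should have been rejected");
        } catch (InvalidClassException expected) {
            System.out.println("rejected: " + expected.classname);
        }
    }
}
```

Allowlisting beats blocklisting known gadget classes for exactly the reason given above: attackers keep finding new gadgets, but an allowlist doesn't care which classes they find.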
While gadgets may not be the root weakness, the gadgets certainly help. We may never be able to have perfect security. Hopefully the systemic paradigm shift infosec professionals are advocating will come some day. But until that day arrives, we can make people so much safer, with minimal effort, by simply disabling these gadgets.
Almost no one uses them. Out of all the projects I found, I was only able to identify one or two that were legitimately using the gadgets in question.
Guideline 8-5 / SERIAL-5: Understand the security permissions given to serialization and deserialization
Permissions appropriate for deserialization should be carefully checked. Additionally, deserialization of untrusted data should generally be avoided whenever possible.
And do you want to guess how many times serialization was used to bypass the Java sandbox between when Sami Koivu wrote his blog post and when someone gave a con talk about Apache? Hint: it is greater than 1. [https://tyranidslair.blogspot.co.uk/2013/02/fun-with-java-se...]
We have also demonstrated numerous times to the programming community that deserialization of user data is dangerous. For example, Stefan Esser has shown repeatedly that PHP deserialization is dangerous, both because PHP deserialization is a source of bugs in itself and because it interacts with application code in unexpected ways. We have seen the same thing in Python with pickle and in Ruby with YAML.
I'm going to let you in to a secret within the infosec community. You can find bugs by just applying existing research in new and novel ways because developers do not follow security research.
I feel like I'm falling into some rationalism fallacy by ranting at you, because you are doing something useful to improve security. But you could be doing much, much more. You have a voice, and people will actually read your blog, as compared to Sami's :( You could have mentioned that people should stop doing ObjectInputStream#readObject() on untrusted data, or you could have pushed for updating the JavaDoc to say: THIS IS A BAD THING, DO NOT DO IT.
EDIT: apologies to anyone that realized that java serialization was bad before the Sami post. I wouldn't be surprised if this was part of the Java secure code guidelines before then or if someone had exploited the issue before then. It just so happens that Sami's post was my introduction to Java Serialization vulnerabilities.
The core problem really stems from the idea that OO models encapsulate data and behaviors. Behaviors mean code execution, so anything that deserializes objects is giving the person who serialized them the ability to control the execution flow. If this is a listener on the network, then things are really bad :-)
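That point is easy to demonstrate in a few lines: any Serializable class may define a private readObject method, and ObjectInputStream invokes it automatically while parsing the bytes, before the caller has any chance to validate the result. A toy sketch (names are hypothetical; a harmless flag stands in for attacker-chosen behaviour):

```java
import java.io.*;

public class DeserializationRunsCode {
    static boolean sideEffectRan = false;

    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;

        // Invoked automatically by ObjectInputStream while reading the stream.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true; // stand-in for attacker-controlled behaviour
        }
    }

    public static void main(String[] args) throws Exception {
        // "Attacker" side: produce the bytes.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new Gadget());
        }

        // "Victim" side: merely reading the stream runs Gadget's code.
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            ois.readObject(); // result never even used; the side effect still fires
        }

        if (!sideEffectRan) throw new AssertionError();
        System.out.println("code ran during readObject()");
    }
}
```

Real gadget chains (e.g. the commons-collections ones) are just more elaborate versions of this: they string together readObject side effects from classes already on the victim's classpath.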
So, it's great that a set of gadgets has been removed, and it's neat to see the application of resources to make that happen. I have to agree with Ben that any system relying on object serialization from untrusted sources (in any language) is still vulnerable; it just might require a more specific gadget chain. Too many vendors have fixed their products by just updating the library without removing the dependency on dangerous object deserialization.
"So replacing your installations with a hardened version of Apache Commons Collections will not make your application resist this vulnerability."
Google has already shown leadership in this regard, by making Protocol Buffers open source. Protobuf is a library that has served our company well. We use it at all layers of the stack. BigTable stores them. gRPC transmits them. Business logic operates on them. Closure Templates render them. Many developers outside Google have chosen to embrace this technology. We hope that it has helped them keep their users secure, just as it helps keep Google users secure.
We considered mentioning this in the blog post. We decided against it. The goal of Operation Rosehub was to take simple steps that will keep people safer in an imperfect world. Suggesting that people change, or that they should adopt our way of doing things, seemed orthogonal to our mission.
However there are developer evangelists working in the company who try their best to communicate the benefits of using Google development technologies. We support their efforts too.
It raises a lot of questions about what sort of transformative spectrum (excuse the pun) would be applied here, though. It is incredibly abstract as presented.
But even at the abstract level, the one thing I know would absolutely happen for sure is that the fixes that made the biggest difference would be hand-waved out of existence by infecting them with viruses, creating scare-campaigns, etc.
Source: I've learned a lot about Big Pharma over the past 10 years as I've quietly found real solutions to my own mental health issues. I'm sadly too scared to share what I've found and I keep seeing products disappear off the market or suddenly attract customs/overseas shipping issues. Suffice it to say that the medical industry is opposed to anything they can't patent - and that, as an industry, it must ensure its own survival. Interpret that any way you see fit.
I still think it’s a good idea. It would be even better to search for a few C pitfalls more, but strncpy is probably the easiest to search for.
But I don't see where it discusses sending PRs to affected repos, only detecting them.
Thousands of volunteers work in the saltmines and get nothing.
Business as usual. Myths like "Google sponsored Python!!!" propagate when they do nothing at all.
I think you're asking the wrong question. This wasn't some top-down directive from some VP trying to come up with ways to make Google look good in the open source community. This was a bottom-up effort, that happened simply because Justine wanted to do it, and the easiest way to get it done was to recruit other like-minded engineers to help her rather than having to do it all by herself. She would have done it at any other company that allowed her (though, knowing her, she would've done it regardless).
As engineers we have agency. The decisions that I make in my day-to-day work are mine, not my employer's. I can directly impact and affect lots of things, and the only motive you need inquire about to explain it is mine.
If you've already got the data conveniently preloaded into a SQL database for you, and all you need is a very simple SELECT statement with two WHERE clauses ... why would you use anything else? Spinning up an entire graph database unnecessarily seems like over-engineering.