A bit far-fetched but: Have you (or anyone else at Google) looked at Amazon Builder's Library  and/or various re:Invent / re:Inforce talks from 2018/19  that focus on similar topics as in this book and other SRE books? If so, what are some ideas (infrastructure, blast radius, incident management, resilience, recovery, deployment strategies, crisis management, disaster planning, aftermath etc) you folks think that contrast / complement Google's approach to building hyperscale systems?
$ tar -zxvf srs-epub.epub
gzip: stdin has more than one entry--rest ignored
tar: Child returned status 2
tar: Error is not recoverable: exiting now
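For what it's worth, an .epub is a ZIP container, not a gzipped tarball, which is why tar bails out. Any ZIP tool can unpack it; here is a sketch with Python's stdlib (the filename is the one from the command above):

```python
import os
import zipfile

EPUB = "srs-epub.epub"  # the file from the tar attempt above

# Extract only if the file is present and really is a ZIP archive.
if os.path.exists(EPUB) and zipfile.is_zipfile(EPUB):
    with zipfile.ZipFile(EPUB) as z:
        z.extractall("srs-epub")  # typically yields mimetype, META-INF/, OEBPS/
```

Plain `unzip srs-epub.epub` does the same thing from a shell.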
I haven't looked at those other resources, but I'll ask if others have.
We use it here to help expand people's minds, shifting their thinking from just writing applications to designing for large-scale HA systems, with all the fun pitfalls that lurk in a cloud.
We wanted to write a book that focuses on integrating security and reliability directly into the software and system lifecycle, both to highlight technologies and practices that protect systems and keep them reliable, and to illustrate how those practices
interact with each other.
We’d like to explicitly acknowledge that some of the strategies this book recommends require infrastructure support that simply may not exist where you’re currently working.
Because security and reliability are everyone’s responsibility, we’re targeting a broad
audience: people who design, implement, and maintain systems. We’re challenging the dividing lines between the traditional professional roles of developers, architects, SREs, systems administrators, and security engineers.
Building and adopting the widespread best practices we recommend in this book requires a culture that is supportive of such change. We feel it is essential that you address the culture of your organization in parallel with the technology choices you
make to focus on both security and reliability, so that any adjustments you make are persistent and resilient.
We recommend you start with Chapters 1 and 2, and then read the chapters that most interest you. Most chapters begin with a boxed preface or executive summary
that outlines the following:
• The problem statement
• When in the software development lifecycle you should apply these principles and practices
• The intersections of and/or tradeoffs between reliability and security to consider
Within each chapter, topics are generally ordered from the most fundamental to the most sophisticated. We also call out deep dives and specialized subjects with an alligator icon.
(disclaimer: I work at Google)
(disclaimer: I worked on the book)
(Book author here)
In the meantime, you can open it in a browser and email it to yourself. Not ideal, but a workaround.
No, the hard thing for everyone to recognize is that most companies are not Google and don't have Google's problems, resources, or time to follow these practices.
Definitely read the material (I will, thoroughly), but don't apply it blindly. Solve YOUR problems, not theirs.
While it's technically true that the advice would apply to startups in the sense that it would improve their reliability, the elephant in the room is that it doesn't matter. The engineering skill at a startup is understanding what's actually critical, and this book doesn't speak to that.
In reality, this question is almost always instantly answerable. You're either still building out your MVP and desperately need customers to validate your idea, in which case the answer is "No", or you're an established startup with runway and a growing customer base, in which case the answer is "Yes".
I think the right path is to provide startups with the most tools/options -- including the underlying lessons this book attempts to deliver -- and then it's up to their values and priorities.
Ignoring my startup experience (as my startups were all security-related and therefore took it seriously), I believe startups that are handling any amount of customer data should be looking at security very seriously.
Now whether or not they do take it seriously is another problem, but that doesn't mean the opportunities and advice shouldn't exist.
>I believe startups that are handling any amount of customer data should be looking at security very seriously.
What you believe has no bearing at all on the cost/benefits of running a business. In the current regulatory environment, leaking customer data in the US costs less money than losing one big customer for a b2b startup. Guess what that means when it’s time to decide to work on a feature for a specific customer or to do a full source code audit of all dependencies for vulnerabilities?
99% uptime is about 14 minutes of downtime per day. There are an awful lot of processes and even whole businesses that can eat 14 minutes of downtime a day. Especially if it's not a full outage.
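That figure checks out; a quick sketch of the arithmetic (the SLO values below are illustrative):

```python
# Allowed downtime per day for a given availability target.
def downtime_per_day_minutes(slo: float) -> float:
    """Minutes of downtime per day permitted by an availability SLO."""
    return (1 - slo) * 24 * 60

print(round(downtime_per_day_minutes(0.99), 1))    # 14.4  ("two nines")
print(round(downtime_per_day_minutes(0.999), 1))   # 1.4   ("three nines")
print(round(downtime_per_day_minutes(0.9999), 2))  # 0.14  ("four nines")
```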
In fact, I'd argue the risk factor is significantly different across startups, so exploring the tradeoffs is the only way to approach the problem generically.
(Disclaimer: I work at Google and was involved with some aspects of the new book.)
I love the part in that other SRE book where they say to “keep it simple” right after describing probably the most involved, meticulous and vast set of software engineering practices of the last 10 years.
“Simple” if you have 10B+ in the bank and 1000+ engineers to run the show.
I'm just looking for an easier way to communicate core principles and concepts to my team without asking them to sink into 500 pages.
The second book (the SRE workbook) is more prescriptive and walks through practical ways of implementing it.
The most basic description of SRE principles is simply this:
1) You automate aggressively and develop or use self-service tools as much as possible (over ops work).
2) You define what “availability” really means and institute an allowance for errors based on a budget. Highly reliable systems should get much more attention and budget than systems with lower requirements. Make an SLO dashboard; alert based on your “error budget” being eaten too quickly.
3) You try to keep your staff from spending more than 50% of their time on operations work; that’s your indicator for being overloaded.
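The error-budget alerting in point 2 can be sketched numerically; the SLO and error ratio below are made-up illustrations, not numbers from the books:

```python
# Error-budget burn rate: how fast you are consuming the budget relative to plan.
# A burn rate of 1.0 spends the budget exactly over the full SLO window;
# anything much higher is worth alerting on.
def burn_rate(error_ratio: float, slo: float) -> float:
    budget = 1 - slo  # e.g. a 99.9% SLO leaves a 0.1% error budget
    return error_ratio / budget

# Hypothetical: 99.9% SLO, and 0.5% of requests in the last hour failed.
print(round(burn_rate(error_ratio=0.005, slo=0.999), 2))  # 5.0
```

At a burn rate of 5, a 30-day budget is gone in about 6 days, which is the kind of signal an SLO dashboard would page on.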
SRE Workbook (https://landing.google.com/sre/workbook/toc/): Chapters 1, 2, 5, 6, 8, 16, 17, 19, 20, 21, All the Appendixes
Disclaimer: I work for Google
Can any mod here restore the comments?
The first book is a solid overview of how Google does SRE and outlining each of the various concepts (error budgets, blameless culture, etc..). The second is more of a practical guide on deploying SRE into an organisation, a lessons learned type of book.
(I work for Google but not SRE, just enjoyed reading the books)
If your leadership declines to take you up on this, escalate. If that fails, you must choose between continuing to do the repetitive operational work as instructed or leaving.
There is never some big meeting where everyone decides "operations are going to change! From now on we are going to X!" Well, sometimes there are such meetings, but they often mean merely adding another demand. Like a New Year's resolution, it is a strongly voiced command for better results, with no real changes behind it.
Real improvement is about adding practices and habits. Starting with the most needed/highest payoff items and gradually building on them.
Do Googlers get time to fix things?
In the parts of Google I've seen, engineers are largely given projects with a timeline of weeks or months and left to structure their time themselves. You're usually given more projects than can feasibly be accomplished in the time allotted, and learning which are high priority and which to let slide a quarter is a bit of an art, but as long as your projects are moving forward you can usually structure some time for paying down tech debt or a side project, either an official 20% or just experimenting with your area of responsibility outside any roadmaps.
I tend to structure my weeks with Tuesday as the designated day to fix anything that's broken or, if everything is running smoothly, to pay down tech debt more for the opposite reason, so that I don't get caught in a rabbit-hole of fixing things that aren't broken and neglect to make progress on the new work.
Given that the usual ratio is 100% : -100%, 50:50 is going to be helpful in escaping capability traps.
Ultimately, Google engineers can move between teams pretty freely, so if we allow a team to descend into an operational or a deadline-induced death march, and we don't address that quickly, chances are the engineers will move to another team. It's sometimes frustrating not to have more control as a manager, but it's a very nicely self correcting mechanism and fixes various incentives for us managers.
(I'm a manager in Google SRE. Not speaking for Google.)
My reference to "capability traps" wasn't accidental; it's a serious risk to any business where maintenance and improvement have low observability relative to production outputs. In that situation economists rightly predict that effort is skewed towards what can be observed more easily ("equal compensation principle").
Under those conditions it's easier to fix a time-spent target and observe time allocated, even if only approximately.
I feel your pain.
Copy/paste from the preface: "We recommend you start with Chapters 1 and 2, and then read the chapters that most interest you. Most chapters begin with a boxed preface or executive summary that outlines the following:
• The problem statement
• When in the software development lifecycle you should apply these principles
• The intersections of and/or tradeoffs between reliability and security to consider
Within each chapter, topics are generally ordered from the most fundamental to the most sophisticated. We also call out deep dives and specialized subjects with an alligator icon."
A bit surprised to see that in Building Secure and Reliable Systems, as far as I can tell the word reliable isn't given a precise definition, even when it's contrasted with security.
The preface opens with "Can a system ever truly be considered reliable if it isn’t fundamentally secure?" but the terms don't seem to be clarified anywhere. It appears not to mean the same thing as service availability, given the section on the CIA triad.
Am I missing something terribly obvious?
Disclaimer - I work for Google and worked on this book.
You can check out the whole O'Reilly menagerie at https://www.oreilly.com/animals.csp.
> I ask the authors to supply me with a description of the topic of the book. What I am looking for is adjectives that really give me an idea of the "personality" of the topic.... Sometimes it is based on no more than what the title sounds like. (COFF, for example, sounded like a walrus noise to me.) Sometimes it is very much linked to the name of the book or software. (For example, vi, the "Visual Editor," suggested some beast with huge eyes).
My guess is the animal for a series is kind of decided by the first book's animal. Several Kubernetes (operators)/cloud-native books have birds on the front page.
Well, that actually makes sense since Python derives its name from Monty Python and not the reptile.
That is not my experience. Yes, the simplest SQL injection a newbie attacker would try is running a query directly on your database using something like ' OR '1'='1'.
However, one can do a lot of other things, like getting the schema, the table names, and the actual data in the tables by observing the answers and timing. When I did my master's degree, in the course on database security the teacher said there isn't any way to 100% prevent SQL injection.
There are other means to protect data, like not using a single app user to access the database, and using security rules at the database level together with security rules at the app level.
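For the basic injection class above, parameterized queries remain the baseline defense; a minimal sqlite3 sketch (the table and values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

hostile = "' OR '1'='1"

# String concatenation: the payload rewrites the WHERE clause and matches every row.
unsafe = conn.execute(
    "SELECT secret FROM users WHERE name = '" + hostile + "'").fetchall()
print(unsafe)  # [('s3cret',)]

# Parameterized: the driver passes `hostile` as data, never as SQL text.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (hostile,)).fetchall()
print(safe)  # []
```

This closes the injection vector itself; the timing/inference attacks and least-privilege database users mentioned above are separate layers.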
One clever trick is to return fake data if you detect a smart ass is trying to access data he shouldn't, rather than telling him he is forbidden. Let him enjoy his fake data. :)
That sounds like a lot of effort for something that should never happen if your real security systems are working, and a huge problem if something breaks and returns fake data to real users. It would look like their accounts have been compromised which is far worse for the business than any amount of enjoyment you might get messing with an attacker. Honeypots are useful in some very specific situations, but you need to be really careful where and how you implement them. Generally, leave them to the network security team.
In my experience anything you do that tries to be 'clever' is a bad idea. Implement the simplest possible solution that solves the problem, otherwise it's going to blow up in your face one day.
If that's the strategy you want to use that's up to you, but I think it's immensely risky and provides no practical benefit.
I'm curious what sort of injection they talked about in your master's degree that gets past parameterized queries.
If someone does have a counter example for parametrized queries, I'd be curious.
great marketing google!
seriously - what is going on at google nowadays? has it always been like this?
EDIT: looks like this comment didn't resonate well with some readers.
But r̶e̶l̶e̶a̶s̶i̶n̶g̶ posting this hours after a huge outage that affected most services for over an hour and also less than 12 days after a similar multi-hour outage seems somewhat ironic.
EDIT: guess I hurt someone’s feelings.
That's the basic premise of an "error budget".
As a user of their cloud services, my perception of their reliability is pretty low compared to competitors. I still like GCP the best though.
I guess we tend to notice more the flaws of the services we use the most
It just promotes a bad mindset
Le*v*andowski != Le*w*andowski
(I'm Piotr Lewandowski.)
NOTE: As a heavy user of GCP we weren't affected by the three most recent outages (GCIC20005, GCIC20004, GCIC20003), but I definitely feel for those that were impacted.
In particular, I am interested in specific projects or initiatives they directed or led. The state of systems before and after these projects. And whether there were any long-term regressions after their involvement.
To be even more concrete if possible:
1. What was the project and what would occur in the event of unmitigated compromise?
2. What was the threat model?
2a. Why was that the appropriate threat model given the possible outcomes?
3. What level of resources would be necessary to compromise the systems they were trying to protect?
3a. Would the system prevent compromise by a red team with a $1 Billion, $1 Million, $1000, $1 budget?
3b. What resources did the red teams have?
1. Would you feel comfortable using the processes you have used in the past to develop a system where compromise would result in the loss of human life?
2. If you answered yes, what project and process, and why do you believe it is sufficient?
3. If you answered no, do you have any first hand knowledge of systems that achieve that standard?
4. What is the best system that you have first hand knowledge of that has achieved at least that standard? Is there a non-theoretical gold standard?
I think a cursory LinkedIn or social media search for any of the title authors or chapter authors will demonstrate their credentials. There were many people involved in this book, all of whom carry the necessary credentials and experience.
> Personal questions for the responder:
Guidelines help us scale, but at the end of the day, some services are unique and require additional review. I recommend reading the bits on threat modeling for more information.
Yes, guidelines are not the end-all-be-all and you can never be sure, but when a civil engineer approves a bridge, they assert that they are confident that human lives can be trusted to the bridge (in certain configurations). They can do this with reasonable confidence because they have seen systems that have stood the test of time that prove out the techniques that they are applying. That is what I am interested in, do you/they have that level of confidence? What justifies that confidence? What systems prove out the techniques that were used? Did any techniques they invent stand the test of time (this provides evidence they can invent new techniques)?
Are there any such systems deployed in the world today?
As an example of consequences of a belief that it is not possible:
The JP Morgan hack resulted in the loss of 76 million records. That means if each person's record is worth more than $14, it would be profitable for someone to hack JP Morgan if they had a system that could not prevent compromise by a red team with a $1 Billion budget. Given your question, I will assume you do not believe a system that can prevent that exists; in fact, you probably believe there is no system that is even in the general vicinity of that number (apologies if I am misinterpreting your statement). If we assume a $1 Million budget is all it takes, then each record would only need to be worth 1.4 cents for it to be profitable to hack JP Morgan. How do you think people would feel about that? Do you think it would be problematic for JP Morgan if they announced they protect every account with $14 of security, let alone 1.4 cents?
Now the JP Morgan hack is a little old, it happened in 2014, so let's use something newer. In 2019 Wells Fargo lost over 24 million records. At $1 Billion that is $40 per record. At $1 Million 4 cents. Do you think it would be problematic for Wells Fargo if they announced they protect every account with $40 of security, let alone 4 cents?
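A sketch of the arithmetic in the two examples above (breach sizes as quoted; the attacker budgets are the hypothetical ones from the comment):

```python
# Break-even value per record: the most an attack can cost per record
# before compromising the whole dataset stops being "economical".
def breakeven_per_record(attack_budget: float, records: int) -> float:
    return attack_budget / records

print(round(breakeven_per_record(1e9, 76_000_000), 2))  # 13.16  (JP Morgan, $1B)
print(round(breakeven_per_record(1e6, 76_000_000), 4))  # 0.0132 (JP Morgan, $1M)
print(round(breakeven_per_record(1e9, 24_000_000), 2))  # 41.67  (Wells Fargo, $1B)
```

The comment rounds these quotients to roughly $14, 1.4 cents, and $40 per record.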
I don't think that's a fair question. Life-or-death software systems (such as avionics) are built using very different methodologies than ordinary software, and their development costs (and times) are orders of magnitude higher. If ordinary software were held to the same standard, almost no software development projects would pay for themselves.
I go one step further in asking for third or first-party confirmation of expertise in this interactive forum because self-made blurbs are easy to manufacture, and credentials and experience can be gamed. It is much harder to distort the opinions of colleagues or of people who use their systems every day after they have left. I ask the more detailed questions because I find that answers to general questions are usually pretty wishy-washy and thus not particularly useful. The detailed questions ask for more detailed metrics and observations that make it easier to sift out useful information. For instance, I ask what projects they led or directed since I want to filter out cases where they may have been on important projects, but in an unimportant role or possibly even being carried by the rest of their team.
In other words, your question is misformed. The abilities of the list of 3 technical authors in this case aren't relevant; the question that matters is whether you believe Google's security organization and apparatus is competent.
If you do, then the specific individuals who could or could not "game" experience is irrelevant, what matters is that the book was written and reviewed by multiple people who all generally agree on the guidance.
A security organization is made of people, so the question as to authors is still relevant, it is just more numerous and hopefully better than the sum of its parts. In my mind, an adequate answer, which you have no obligation to give, would highlight individuals who have material authority over the content of the book and what they have done in specific and how that indicates an understanding of developing and deploying secure systems. Even better would be their personal confidence level on the capability of those systems and how secure they believe they are in quantitative terms. This provides a falsifiable statement about their capabilities.
For the question of the organization's competence, I use my default opinion on information security organizations on it as I have no knowledge as to the internal capability or competence of Google's security organization other than through public information, hearsay, and extrapolation from my own experiences of the information security industry. By default, given the rest of the information security industry, I see no reason to believe in the competence of any security organization. I base this on personal experience working with people in security organizations, the regular reports of large organizations (that I and likely the average person would naively assume to be competent) being compromised trivially, and the lies that most organizations tell about their security before, during, and after breaches. These experiences lead me to default to non-trust and distrust organizational reputation in favor of specific concrete examples showing capability which is why I asked for such.
I hope this adequately explains my viewpoint.
Even if you were provided a list of their projects and their involvement, how would you know their competence without reading code and understanding implementation details?
Further still, can you even trust yourself to provide a competent assessment of their designs and implementation?
A single security breach should not invalidate one's credentials any more than a single lost patient should invalidate a doctor's medical license, with the exception being cases of gross negligence or malfeasance.
In lieu of that, the questions I mentioned above were meant to be reasonably objective questions that would help me identify whether an organization seems to be qualified in my mind without relying on reputation. Do you think any of those questions seem invalid? I tried to make them procedure-agnostic to avoid discounting techniques which I am not aware of (so no "must use SHA, 2FA, antivirus, SGX, etc."). If you are willing to, are you able to identify any individuals you think have given good answers to most of those questions? Are there any questions that you think are unfair?
For instance, "Would you feel comfortable using the processes you have used in the past to develop a system where compromise would result in the loss of human life?". If the answer is no, I would not consider them an expert as there are many systems that are deployed today where compromise would result in the loss of human life (note that this does not mean those systems are appropriately secure). As an analogous case, if I found a civil engineer who made multiple bridges and then asked them, "Would you feel comfortable if any of the bridges you made in the past were used by humans?" and they said no, I would not be asking them for advice on making bridges for humans since they have never done it before.
If they answer yes to the question, I want to know why, as I believe the default answer for most people is somewhere between "no" and "are you crazy". A really good answer would reference systems where they are confident in the security, where compromise is valuable, that have stood the test of time, and that have withstood deliberate attempts at compromise by attackers with resources in the vicinity of the value of the compromise. For instance, "Person X did the security for Y bank. Y bank has a $500 Million bug bounty for system compromise (which is not onerous to collect)." would be pretty convincing. If your answer is that a $500 Million bug bounty is absurd, you can look at some of my other replies under my original comment for why I believe that is actually too small.
If your answer is then that nobody is an expert, then my answer is that nobody should develop these systems since nobody knows how to achieve the minimum standard of safety. If it can not be done safely, then it should not be done at all no matter how hard you try if human lives are involved.
For example, here are four statements I believe to be true:
1. Google has a track record of transparency when security is compromised.
2. Google has a better-than-average track record of detecting security intrusions.
3. Google is the target of state level actors.
4. Google has not recently publicly acknowledged any successful attacks by state level actors.
From these, one could reasonably conclude that Google is adept at rebuffing nation-state level attacks. Putting specific $$ values on things is a bit reductive, since at some point the weakest link is paying every member of the security team $100 million instead of any form of technical attack.
> If your answer is a $500 Million bug bounty is absurd, you can look at some of my other replies under my original comment for why I believe that is actually too small.
You're making a (fairly common on HN) mistake of assuming that the bug bounty value is going to be anything near the value of the compromise. If I can extract $X from a compromise, I'm not going to pay you $X; I'm going to pay you $Y, which is less than $X, and probably much less than $X. The market for compromises isn't enormous, so it may not even be possible for you to sell your compromise to anyone but me. So then if you turn to the bug-bounty-offering company and try to sell your compromise for $Y, you're committing blackmail, so the companies offer $Z < $Y.
So yes, I think you have deep misunderstandings of the state of the security industry and those are miscoloring your mental models.
> If it can not be done safely, then it should not be done at all no matter how hard you try if human lives are involved.
This is blatantly ridiculous. Risk can't be fully mitigated, and most systems aren't life critical. You're jumping from "a bank was hacked and people's personal information disclosed" to "this kills those people", which isn't a reasonable jump.
I actually can't believe that someone's response to "are there people capable of speaking on software security" is "we shouldn't attempt to build secure software, because it's too hard".
I was trying to avoid being over-pedantic. By safely, I mean mitigating the risk to an appropriate level. So, to reword the statement. I mean, "If the risk can not be mitigated to an acceptable level, then we should not do that thing." The appropriate level of risk depends on the action in question. If it involves human lives, then society should evaluate the acceptable amount of risk. If it is bank accounts, then it is up to the bank, customers, society, on the appropriate amount of risk to take on.
My response is an IF X, THEN Y, so it should be read: "IF you do not believe anybody can reduce the risk of software in systems that can kill people to a societally acceptable level, THEN we should not use software for those systems." The statement takes a belief as an input and outputs what I believe is a logical conclusion. I frankly find it hard to disagree with the inference, since it is nearly tautological. Like, "I do not believe anybody can reduce the risk of software to a societally acceptable level, but we should build such systems anyways." seems like something almost anybody would disagree with. So the primary concern is whether you believe the antecedent. Personally, I believe software that reduces the risk appropriately can be made, so I believe we can make software to manage these systems, which is contrary to what you think I believe.
My bug bounty point is actually that a long-standing large bounty provides reasonably strong evidence that the difficulty of compromise is on the order of the bounty. Consider: if somebody offered a $500M bug bounty that is easy to collect and nobody collects it for 10 years, I think this provides strong evidence that it is really hard to compromise, possibly on the order of $500M hard to compromise (the main problem at these scales is that $500M is actually a lot of capital, so there is significant risk involved in investing anywhere in that general vicinity, meaning you might actually be limited by capital instead of upside). This is consistent with my original point, which was providing an example of a concrete indicator of expertise that I, and likely others, should agree is convincing.
What I meant by $500M is too small is that $500M is a small number relative to the expected difficulty of compromise someone might expect from these institutions. For instance, a large bank can have 100M customers. So, if a total breach could get all of their information for a cost of $500M (which I actually believe is way, way too high), it would be "economical" for a nefarious actor if the per-customer data was only worth $5 (obviously they would want profit, so more, etc.). I don't think a big bank would advertise that in their commercials, "Your data is safe with us, unless someone has $5.", let alone the massively lower number that I actually believe it is. Obviously I do not mean that any specific person can be targeted for $5, just the bulk value rate of a large hack for it to be economical.
Putting a $ value on things is reductive, but also quantitative, which I find very useful. Using your joking example, if the weakest link is paying every member of the security team $100M and they probably have thousands on their security team, then that is ~$100B, which I will accept as being excellent security in their problem domain. However, by putting a number on it, it makes it easier to see if that number sounds like what we want. Since you said it in jest, I will assume you think that is a ludicrous number. However, if you apply that level of security to a different problem domain, such as switching all US cars to self-driving cars (not that I am suggesting Google is suggesting such a thing at this time), I think that is far too little. If the entire US used self-driving cars and you could get a total system compromise, you could drive millions of cars into oncoming traffic within a minute, killing millions. With such a risk, the number that you think is impossibly high is not enough to mitigate the risk acceptably in my mind. So logically, the actual state of security, which you believe to be less than your impossibly high number, is not enough and indicates that we are very far from achieving the necessary level in my mind. Obviously, you can disagree with this statement by determining that the problem is smaller than I am saying, the chance is low, the socially acceptable risk profile is different (we accept a small chance of catastrophic failure vs. the status quo of frequent small failures), etc.; this is just an example.
I have no concrete information with respect to statements 1 and 2 you mention. As a result, I do not conclude what you conclude. Why do you believe statements 1 and 2? In fact, both seem inherently hard to demonstrate convincingly since both are related to knowledge of an unknown variable.
For statement 1, how do you evaluate the transparency of a system when you are not privy to the underlying rate? A possible strategy is a sting-like operation where some entity, unbeknownst to them but authorized, compromises their systems and evaluates whether they are transparent about it. You might need to run such an operation multiple times to get a statistical amount of information. Has such an operation been done? Another alternative is a trusted third-party watchdog which is privy to the underlying information. Are there such entities with respect to Google internal security? Another one I would find highly credible is a publicly binding statement indicating that they disclose all compromises in a timely manner and being subject to material damages, let's say 4% of global revenue like GDPR, if they fail to comply. I am sure there are other strategies that would be convincing, but these are some I could come up with off the top of my head.
For statement 2, how can you tell they have a better-than-average record when even they are not privy to the underlying rate? It is pretty much like trying to prove a negative which is notoriously hard. My primary idea would be to subject them to various techniques that have worked on other entities in the past and see if they defeat or at least detect all of them. Given that nation-state actors such as the CIA can allocate billions of dollars to such attacks, does Google have a billion dollar red team to simulate such attacks? Do they routinely fail? Another strategy is proving that their systems do get compromised. Vulnerability brokers routinely have zero day vulnerabilities for Google Chrome and Android. This demonstrates that systems developed by Google have compromises that Google is not aware of. Those are high profile projects, so it seems reasonable that they would be representative of Google security as a whole, so if we extrapolate that to other systems then those systems likely also have vulnerabilities known to someone, but not known to Google which can be used to compromise their systems. State-level actors are some of the main clients of vulnerability brokers, so them being able to compromise systems, but not be detected is a highly likely outcome in my mind. Is there some reason that Google Chrome or Android are not representative of the security used by Google on their other systems? If so, why, since Chrome and Android seem kind of important in my mind. Do you have other ideas for how to know if they are detecting all state-level actors?
I largely agree with your first point that I will probably not get answers that are materially different than PR speak. I'm not really sure how that is related to the questions themselves being interesting or not, anybody can give a boring answer to an interesting question. My goal is questions that, if answered honestly, would elicit useful/filtering responses. I can see an interpretation where the questions are boring because nobody will give them an honest response, but that is half-orthogonal to the question itself.
I think you're just grossly overestimating the "risk" for most software.
> I glossed over the point that the bug bounty should generally be order of magnitude the cost of discovery
The bug bounty is on the order of magnitude of the cost of discovery. Otherwise freelance security vulnerability finders wouldn't do what they do. There's a market price for vulns found by non-nefarious actors, and it is approximately the bug bounty.
> Consider, if somebody offered a $100M bug bounty that is easy to collect and nobody collects it for 10 years
I really, really don't think you understand how most bug bounties work. Most of the time, the bugs that are bountied aren't entire exploit chains, but single exploits. Further, once you have an exploit chain, that alone doesn't make you money; you still need a plan to exploit it. So if a bounty of, say, $100K is offered, the actual value "protected" might be an order of magnitude larger, since your options are either "file a bug report" or "avoid detection while you figure out a nefarious thing to do that is profitable, then execute that thing and get out".
Most enterprises have (or believe they have) defense-in-depth mechanisms, so once an exploit exists, the potential risk still isn't 100%.
> I have no concrete information with respect to statements 1 and 2 you mention.
For both 1 & 2, "Operation Aurora" is perhaps the best public information. Google was one of the only companies that detected the intrusion, and was the first to give public notice of it. I'm not suggesting that Google's track record is perfect. I'm merely suggesting that it is (at least going by public data) better than pretty much everyone else's.
Because, importantly, if Google is better than everyone else at security, we should listen to them, even if they aren't "perfect".
> If so, why, since Chrome and Android seem kind of important in my mind. Do you have other ideas for how to know if they are detecting all state-level actors?
As a general rule, it is much easier to monitor a closed system (like Google's datacenters) than an open system (like "the android ecosystem").
> My goal is questions that, if answered honestly, would elicit useful/filtering responses.
Mostly, they don't seem related to anything remotely real world. They, as I've said, seem to rely on a vulnerability market that doesn't, as far as I know, exist. And the people who do know just aren't going to answer. So they're useless in practice. Further, as I mentioned previously, they make perfect the enemy of better. If your bar for proselytizing about security best practices is to be perfect, there's no way to learn from people who have better security practices.
And there's decent evidence that Google has better security practices than pretty much everyone else. (cros vs. windows, android vs. ios, gmail vs. anything else, chrome vs. any other browser, etc.) I don't think there's a single category where Google's offering is clearly less secure. And there's quite a few more where it's clearly better. Not to mention corporate/production systems like borg and BeyondCorp.
That does not mean that a "better" inadequate solution is not a path to adequate; it could very well be if the "better-ness" can scale all the way, but that is hard to judge. One strategy for doing so is trying to estimate how far you are from good; that is the point of quantitative analysis. Using the tree-to-the-moon example, if you could estimate the distance to the moon, you would quickly realize that every tree you know of is pretty far from the moon, so maybe a different strategy is warranted. In this case, I want to estimate the "security" of an adequate solution. Is $1 enough? $1K, $1M, $1B? For what problem domain? How far are "best practices" from that goal? That is how I would decide if best security practices are worth listening to. The other point of quantification is to compare systems: you claim Google has better security practices, but how much better? 10%, 50%, 100%, 1000%? That would change how compelling listening to their practices over others would be.
As you stated above, bug bounties are on the order of magnitude of the cost of discovery, which, in my opinion, is a reasonably good quantitative security proxy. The Pixel 4 kernel code execution bounty is up to $250K. The iOS kernel code execution bounty is up to $1M. That appears to indicate that Google's offering is less secure by this metric. Even ignoring that, is $1M enough protection for a phone model line (since a bug in one is a bug in all, a zero-click kernel code execution could potentially take over all phones, though in practice it probably would not, even assuming such a vulnerability were used to attempt mass infection)? There were more than 200 million iPhones sold last year, so that is only a per-phone value of 0.5 cents. Is that an adequate amount of security? Personally, I think no, and I would bet the average iPhone buyer would be less than pleased if they were told that (it still might not change their buying habits, though). What do I think is adequate? Not sure. $50 is probably fine, $5 seems a little low, and $500 is probably high since that is approaching the cost of the phone itself. If I use $50 as the metric of adequate, they are 10,000x off from the measure of adequate, which seems pretty far to me. Think about the difference in practices between $100 and $1M, and then that needs to happen again; how do you even conceptualize that? Even at $0.50 they are still off by a factor of 100x, 1% of adequate from this perspective.
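To make the back-of-the-envelope arithmetic explicit (the bounty, sales figure, and $50 adequacy threshold are this comment's assumptions, not authoritative data), a minimal sketch:

```python
# Per-unit "security value" implied by a bug bounty, using the
# illustrative figures from the comment above (not authoritative data).
bounty = 1_000_000          # assumed iOS kernel code-execution bounty, USD
units_sold = 200_000_000    # rough iPhones-sold-per-year figure from the comment

per_unit = bounty / units_sold
print(f"per-phone protection: ${per_unit:.4f}")  # $0.0050, i.e. 0.5 cents

adequate = 50.0             # per-phone value the commenter considers adequate
print(f"shortfall factor: {adequate / per_unit:,.0f}x")  # 10,000x
```

Changing any of the assumed inputs (a larger bounty, fewer realistically exposed units) moves the shortfall factor proportionally, which is the point of treating the bounty as a quantitative proxy.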
On the point of overestimating the "risk" for most software, I half agree. I believe the truth is that almost nobody cares about security, so the cost of being insecure is almost nil. Companies get hacked and just keep going on their merry way, sometimes even seeing their stock prices go up. However, I believe this is also an artifact of misrepresenting the security of their systems. If people were told that the per-unit security of an iPhone is 0.5 cents, they might think a little differently; instead they are told that the new iPhones are super, duper secure and all those pesky vulnerabilities were fixed, so it is now totally secure again, this time we promise, just ignore the last 27 times that was not true.
On the other hand, large-scale systemic risks are massively underestimated. Modern car models are internet-connected, with each model running a single software version maintained through OTA updates. This means all cars of a single model run the same software, so bugs are shared across every car. If a major vulnerability were discovered, it could potentially allow takeover of the steering, throttle, and brakes by taking over the lane-assist, cruise control, and ABS systems. If this were done to all cars of a given model at the same time, it is extremely likely that at least thousands would die. Even ignoring the moral implications, that would be a company-ending catastrophe, which puts the direct economic cost at the value of the company: a few billion to tens of billions for most car companies. Again, $1M is pretty far from this level, and there is no evidence that such techniques scale 1000x. Any solution that only reaches the $1M level, even if it is "best practices", is not only inadequate for this job, it is criminally negligent in my opinion, and I believe most people would agree if it were properly explained to them.
And again, you consistently overestimate the value of a hack. You're not going to get root on every device. So the idea that Apple is spending 0.5 cents per device isn't correct.
Again, you're overestimating the risk by imagining a magic rootkit that can simultaneously infect every device on the planet. That's not how things work. It lets you imagine these crazy values of a hack, but again: that's not how things work.
If it did, you'd probably see more hacks that infect everyone so that some organization can extract minimal value from everyone. But you don't see that.
Why? Because that's not a realistic threat model. State actors, who at this point are the only groups consistently capable of breaking into modern phones, aren't interested in financial gain. They're interested in targeted attacks against dissidents.
So anyway, what makes you believe that Google's security isn't adequate for its systems, since at the moment, anyway, they aren't manufacturing cars?
I stated in a parenthetical that I did not believe they would actually root every device in practice. I used numbers; you can change the numbers to whatever you believe. If someone wanted to mass-infect devices using a zero-click kernel code execution, how many do you think they would be able to infect? Let us call that X. $1M is the order of magnitude of the cost of discovery (since bug bounty ~= cost of discovery) for such a compromise on iOS. Divide $1M / X; that is the per-unit value. Does that number seem good? I said $50 is probably adequate. Therefore, for that to be adequate given this model, you would need to expect a zero-click kernel code execution deployed for mass infection to infect 20,000 or fewer phones. Do you believe this is the case? If so, then your logic is sound and in your mind iOS security is adequate. It is not for me, since I do not believe it would infect only 20,000. As a secondary point, that is only the cost of the compromise with no infrastructure. If they spent another $1M developing techniques for effective usage of the compromise, such as better ways to deploy, infect, control, and hide, other compromises, etc., how many do you think they would be able to infect? Let us call that Y, Y >= X. In that case I would compute $2M / Y to determine the adequacy.
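A minimal sketch of the adequacy model described above, using the comment's assumed $50/device threshold (all figures are the commenter's assumptions, not measured values):

```python
# Adequacy model from the comment: bounty ~= cost of discovery, so the
# per-infected-device "protection" is cost / X for X mass-infected devices.
def per_device_value(cost_usd, devices_infected):
    return cost_usd / devices_infected

ADEQUATE = 50.0  # commenter's assumed adequate per-device protection, USD

# Break-even: how many infections before $1M of protection drops below $50/device?
break_even = 1_000_000 / ADEQUATE
print(break_even)  # 20000.0 devices

# With another $1M spent on deployment/control infrastructure and Y >= X infections:
print(per_device_value(2_000_000, 100_000))  # $20/device, below the $50 bar
```

The whole disagreement then reduces to a single empirical question: is a realistic X above or below the 20,000-device break-even?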
As a counter-example, large-scale ransomware attacks that extract minimal value from large numbers of people do occur and have been increasing in frequency and extracted value. Why aren't there more, given how easy it is? I don't know. Why didn't somebody crash planes into buildings before 9/11, or drive trucks into people before the 2016 Nice truck attack? These attacks were not very hard, were possible for decades, and seem to be highly effective tools of terror, but for some reason they were not done. Hell, it is not like anybody can stop someone from driving a truck into people right now; why aren't terrorists doing it every day, given we saw how effective it is and how hard it is to prevent? My best guess is that so few people actually want to engage in terrorism or economic hacks that only a very tiny fraction are done at this time.
This leads into the next point, which is that state actors are not the only entities that CAN break into phones; financing a $1M cost of discovery is chump change for any moderately sized business. The government is just one of the few entities who want to as a matter of course and face minimal repercussions for doing so. If you are not the government and hack people for financial gain, you can go to jail; not a very enticing prospect for most people. This means that the impact of a compromise is no different; it is just less probable at this time. However, that is not a very comforting prospect, since it means you are easy prey; just nobody is trying to eat you yet. And this ignores the fact that since any particular target is easy, if someone is targeted in particular, they are doomed. Essentially, not being compromised is at the mercy of nobody looking at you funny, because if someone wants to compromise you, they can. To provide an example of why this is a problem: if I were a terrorist organization, I would be running an electric power generator and transformer hacking team with an emphasis on bypassing the safety governors and permanently destroying the hardware. It does not matter that there are more economic targets to hit; as long as they choose one in particular, they can cause incomprehensibly large problems.
As for Google's security, if I use my default security assumption (based on experiences with other security organizations) that a skilled red team with $1M would be able to compromise and establish a persistent presence with material privileges and remain undetected for a week, then I believe their security is inadequate since I believe such an outcome would easily be able to extract $1M, let alone damage if the goal were just destruction. If the goal were pure damage, I believe that such presence, properly used, should be able to cause at least $100M in damage and I would not find it unreasonable if it could cause $10B in damage if the goal was optimized damage to Google in both material and market ways with no thought for the consequences if caught.
To separate this out for you, there are two primary statements here:
1. The damage that a skilled red team can cause assuming it has a persistent presence with material privileges and remains undetected for a week.
2. The cost of creating a persistent presence with material privileges that remains undetected for a week.
I assert (1) is probably ~$100M. I assert (2) is my default of $1M. Any situation where (1) is materially higher than (2) is inadequate in my opinion, so a convincing counter-argument on your side would be convincing me of values for (1) and (2) where (2) is higher than (1). I find it unlikely you would convince me of a lower number for (1), so you would need to claim that (2) is ~$100M for Google. If you believe so, what is your justification? The minimal standard that would cause me to consider further (not convince, just not directly dismiss), which you are under no obligation to provide, would be: you stating that you talked to an internal security person at Google and they firmly claim that (2) is higher than $100M (I will take you at your word). If you do not know what to ask, you can try: "If you were in charge, would you feel comfortable going to DEFCON and Black Hat and putting out a prize of $100M if anybody could do (1)?" The other requirement is you stating (again, I will take you at your word) that you talked to an internal security person at Google and they inform you that this has been tested internally using a red team with resources in the general vicinity of $100M or more. There are potentially other strategies that might pass the minimal bar, but that is one I could think of that would be pretty solid. Again, I am not demanding you do so, but if you wish to engage on this point, then I don't think any other type of response is particularly productive.
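The adequacy criterion in the two-statement argument above boils down to a one-line comparison (the $100M and $1M figures are this commenter's asserted defaults, not measured data):

```python
# The commenter's adequacy criterion, with their assumed values for (1) and (2).
damage_if_undetected = 100_000_000  # (1): damage a week of undetected access could cause
cost_of_presence = 1_000_000        # (2): cost to establish that persistent presence

# By this criterion, security is adequate only if (2) >= (1):
# breaking in must cost at least as much as the damage it enables.
adequate = cost_of_presence >= damage_if_undetected
print(adequate)                                  # False under these assumptions
print(damage_if_undetected // cost_of_presence)  # shortfall factor of 100
```

Any counter-argument therefore has to move one of the two inputs, which is exactly the shape of evidence the comment asks for.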
If not, why not? Do you believe that 20,000 people would never have noticed such a thing over a sustained period?
As for 2: there are public examples (again, Aurora) of teams with more funding being caught in less time. So I think you are underestimating the security capabilities of Google (and similar companies). For example, are you familiar with beyond corp?
5 zero-click compromise chains. Thousands of infections per week for a total of 2 years before discovery, the 5 chains lasting 3 months, 6 months, 10 months, 6 months, and 3 months each. At thousands per week, that is 12k, 24k, 40k, 24k, 12k new compromises per chain at a minimum, probably closer to 5x those numbers. Incidentally, at the bottom of the initial post they mention: "I shan't get into a discussion of whether these exploits cost $1 million, $2 million, or $20 million. I will instead suggest that all of those price tags seem low for the capability to target and monitor the private activities of entire populations in real time." That is consistent with my perspective. As a secondary point, I do not claim that Google does not have good offensive security techniques.
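As a sanity check on the per-chain counts above (taking "thousands per week" as a 1,000/week lower bound and roughly 4 weeks per month, both assumptions from this comment):

```python
# Rough minimum infection counts per exploit chain, per the comment's estimate.
WEEKS_PER_MONTH = 4      # coarse approximation implied by the 12k/24k/40k figures
rate_per_week = 1_000    # lower bound for "thousands per week"; comment suggests ~5x

chain_months = [3, 6, 10, 6, 3]  # lifetime of each of the 5 chains
per_chain = [m * WEEKS_PER_MONTH * rate_per_week for m in chain_months]
print(per_chain)       # [12000, 24000, 40000, 24000, 12000]
print(sum(per_chain))  # 112000 minimum new compromises across all chains
```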
Looking at Operation Aurora: the Wikipedia page states that the attacks began in mid-2009 and Google discovered them in mid-December, so a potential 6-month window before detection. Google also declared that they lost intellectual property, though the nature of that is unclear, so it could be anything from one random email to everything. Given that they already lost information, the attack could have already succeeded in its goal by the time of detection (6 months is a really long time to exfiltrate data; you could literally dump terabytes if you had a moderate unimpeded internet connection). "We figured out that we were robbed 6 months ago" is a failure in my book. There is also little information as to the difficulty of the attack. They say it was "sophisticated", "we have never seen this before", but that is what literally everybody says. If you have access to a specific breakdown of the techniques used, that would be helpful.
> At thousands per week, that is 12k, 24k, 40k, 24k, 12k new compromises per chain at a minimum, probably closer to 5x those numbers.
That assumes every visitor uses iOS 10-12, which is... not likely. My understanding is that these sites were likely Chinese dissident forums, and I don't think that iOS 10-12 makes up even half of browsers in China. Nor does it make sense that every user is unique. This isn't to downplay the danger of these attacks, but no, you're likely looking at compromising 1-2K devices total when it comes down to it.
But again, you're looking at state actors (not even nation state actors at this point, but like the Chinese version of the NSA/CIA) with hundred million or billion dollar budgets. If those are the only people capable of exploiting your software, you're doing an objectively good job.
As for random commenters, since I cannot use the reputational apparatus for information, I would need to get direct information. There are likely people here who work with the stated individuals, so there is a non-zero chance I could get it. Hopefully random commenters would not lie for no reason, but to deal with that, I avoid incorporating information that cannot be cross-referenced.
Android, their most popular end-user product, is a security disaster.
Chrome, "the most secure browser in the world", has a huge list of serious vulnerabilities.
The security failures that allowed the NSA to come in were comical. Deliberate choices that Google made are for the most part responsible for the Android fiasco. In fact, Google is one of the behemoths that put us all at ever-increasing risk in the name of profit, and so far they have done precious little to reverse course.
They caught the Chinese trying to compromise their systems, and stopped them before they got very far. In the process of investigating it was discovered that the Chinese had totally compromised a ton of other companies in the same operation. Sounds like top-tier security by Google. Nobody's perfect, and defending is so much harder than attacking that they're not even really the same industry.
> the famous slide deck that highlighted lack of encryption inside Google's network - and security intelligence even put a smile face there!
They hadn't reckoned on their own government physically tapping the network cables inside of their datacenters. And it's hard to blame them. Snowden's leaks wouldn't have been so shocking if they weren't, y'know, shocking. Once they added this to their threat model, they went and encrypted all internal traffic.
And Aurora "became public" when Google announced it; it was the other 30+ companies affected by it that kept silent on the issue (some to this day).
It's also not obvious to me that clearpath is more secure than Android, mostly because I can't actually find any information about what it is; there's only marketing jargon :/
CVE entries are both a function of security and interest. My github projects don't have any CVEs, not because they aren't woefully insecure to anyone who bothers to investigate deeply, but because no one cares.
"Android" is installed on more devices than any other OS in the world. So it stands to reason that there would be more interest in finding exploits in Android than in OSes that are often air-gapped or locked away behind firewalls.
So if Google doesn't care what the OEMs do with Android, it definitely shows that Google doesn't care about security on Android as a whole, as long as it can write blog posts about how perfect security looks on Pixel devices, which, by the way, are on sale in just a couple of selected tier 1 countries.
That isn't caring about security; what Apple does is caring about it.
"my parents pockets" isn't an experimental lab, I don't think.
> So if Google doesn't care what the OEMs do with Android
I don't think I said this.
> That isn't caring about security, what Apple does, it is caring about it.
Open ecosystem, maximally secure ecosystem, pick 1. Android offers equal security to iOS if one chooses to pursue it. That most OEMs don't give a shit about security reflects badly on those OEMs, there's only so much any software provider can do.
"That most OEMs don't give a shit about security reflects badly on Google's security policies."
Google can go ask Microsoft how it makes OEMs play by the rules, or ask legal how to properly write contracts that enforce such security practices.
Until that happens, how secure a Pixel device might be in theory, and Google blog posts, aren't representative of the Android that 90% of the world actually gets to use.
OEMs of what? All the custom forks of Windows floating around? The mobile device market doesn't work anything like the desktop market, and you know that.
Unless you're suggesting that the drivers for the networked, LED-light-toting hyper-gaming mouse you can get from Razer are more secure than OEM Android, because that's the closest thing I can come up with, and it's laughable.
We're well off track, though; the original question was whether there is a more secure (implied consumer) OS. You mentioned two non-consumer OSes, so I think it's safe to say the answer is no.
My Windows 10 devices still get more security updates than a couple of Asus Android devices of about the same age that I have lying around here.
You are the one moving the goal posts to consumer OSes, in a failed attempt to protect Google's security story.
Well, if you want to go that way, then iOS definitely has a better security story than Android ever will.
Every iOS-powered device has the same security hardware and update story, regardless of where in the world it gets bought.
Android? Well, better have luck with the OEM device, despite what gets written in Google blog posts and demoed at I/O.
Any zero-days found in Android devices will never be fixed, other than on Pixel and a couple of selected flagship handsets, while everyone else will be left naked with their devices.
Thus brokers will have a gold mine on their hands, being able to target thousands of devices without their owners being able to protect themselves, just like Windows XP before SP2 was released.
I can't tell if you have a point you're making, or if you're just trying to disagree with me :/