they had 6 months.
You say that like 6 months is automatically a lot of time. "She had 6 months to give birth". Yeah, only it takes 9, so 6 is short.
Consider the scope and depth of the issue and the fact that they probably couldn't involve too many people on this effort.
Doesn't sound like she was trying at all.
Inversely, though, I'd argue that Meltdown is a relatively small problem! It's strictly around memory usage, cache, and calling patterns. There's not a lot of systems at play, though there's the hard "figure out which order of instructions gets the state machine in a dangerous state" problem. There's a lot less coordination involved than, say, a system call bug that would subtly return the wrong answer half the time and you know that programs sometimes rely on this and others crash because of it.
Some things are hard, other things are hard but at least they're basically math, and math has a bit more determinism involved. Imagine if UX design or debugging strategies could always be broken down into state machines!
[E I see walrus01 already got that]
Except this whole train of thought falls apart once you consider the difficulty of "hooking up 9 mothers to a single fetus". In the same way, you downplay the difficulty of coordinating multiple teams on a fix for breaking research. Show me a working solution to the former and I'll accept the corollary.
We're comparing Linux and Windows, an operating system that contains 3.5 million files (of course, not just the kernel in this case). That isn't really fair. Code is as perfect as humans can make it, and it certainly does not help that there's so much to take into account.
They revived previous work on this as part of the KAISER work in November 2017, and still had major bugs with it in February 2018 (ie, 4 months later). That's pretty similar to the 6 month timeline mentioned here.
MS (I think) uses IBRS to help with Spectre, and IBRS is not so great. Retpolines have a more fun name at the very least :)
It sucks, but what else can one do?
After a quick web search, I found https://patchwork.kernel.org/patch/9712001/ which records the initial submission of the KAISER patch set at 2017-05-04. The repository at https://github.com/IAIK/KAISER has an older version of the patch set dated 2017-02-24, indicating that work on it had started even earlier.
Finally, the timeline at https://plus.google.com/+jwildeboer/posts/jj6a9JUaovP mentions a presentation from the authors of the patch set at the 33C3 in late 2016. Note that this page puts the submission of the KAISER patch set at 2017-06-24, but I believe that to be wrong; searching the web for "[RFC] x86_64: KAISER - do not map kernel in user mode" finds several mail archives with that message, and they all agree that the date was in May, not June.
That is, even if Microsoft had been immediately warned by Intel (or by Google), the Linux kernel developers would still have had a few extra months of head start, by basing their work on the KAISER patch set. Was it luck, or a side effect of the Linux kernel being used for academic research?
From the meltdown paper:
> We show that the KAISER defense mechanism for KASLR  has the important (but inadvertent) side effect of impeding Meltdown. We stress that KAISER must be deployed immediately to prevent large-scale exploitation of this severe information leakage.
This seemed to work well for the Windows audience in the past, and for the Linux audience too, since the two have different uses and users.
People seem to have segregated into users who just want stuff to work and users who want a powerful operating system that lets them do whatever they want.
At least that held until Windows 10 came along...
What if we outsourced the QA to India?
Come on, software is hard but when you fix a vulnerability and expose a far far worse one, and had months to plan, execute, and test it, then you are most certainly justified in being criticized.
It's not like we're saying the code is shoddy and needs work, which is entirely excusable in a short timeframe. It's that they've left users far worse off in the end than where they started.
Jumping on the next worst thing does not excuse them either. Nor is taking another analogy to the other extreme helpful at all in this discussion.
A solid pool of talent with complete flexibility resource-wise and a strong critical-level mandate is nothing like a single person with a fixed biological timeframe, with relatively limited resources, no matter which way you'd like to spin it.
The person who discovers the bug may not come up with the best repro case. The person best equipped at fixing the bug may not be best person to track it. Being able to spool up new people on a problem for cheap keeps the whole experience lower stress and generally improves your consistency with regards to success.
If the cost of someone trying a crazy theory is linear in man-hours and O(1) or even O(log n) for wall clock hours you're going to look like a bunch of professionals instead of a bunch children with pointy sticks.
From what I understand, Microsoft has never gotten there. They got too big to fail a long time ago. And certainly wouldn't have for Windows 7.
In human organizations, that glue itself gets incredibly complex and expensive as the number of pieces grows.
People are the plywood: fragile, finicky, and useless if left to their own devices. Management is the middle school kid who needs to take the wood he's been given and make something that will hold up to all the weight that'll be put on top of it. In order to do this, he's been given a hot glue gun and enough glue to mummify the entire thing if he so chooses. Most of the kids will rush bullheadedly (or should I say uncaringly) into gluing the sticks together into something that "looks like it should work." They use too much glue, the structure isn't optimized for load handling, and when the day of truth comes, it crumbles when the bucket that's supposed to hold the weight destroys it!
What is glue? Whatever management wants it to be. It can be a team leader or a hastily configured IRC channel. In my experience (this includes organizing, delegating, and making sure that 40 devs-et-al get what's needed done), if you choose your sticks right, taking the time to make sure they're not hiding any structural faults, you can make the job 65% easier. If you lament that choosing sticks is difficult, I reply with "it's just practice."
The main issue I've seen, has been the all too common "there are no good managers." Especially in technology. The remedies for this? There's no bandaid. Each manager has to realize his personal shortcomings and fix them. But, to throw up his hands and say "the more people working on a project, the slower it'll get done," is a nice way to say "I can't handle all these people, but I'll excuse that away by saying it's inevitable. It's even industry 'common sense!'"
Not to mention that the whole problem would most likely leak once that many people knew about it.
It's ultimately a matter of talent, resources, and proper management, which is hardly an insurmountable problem for a major tech company with decades of experience solving world-is-ending bugs.
The nine-month analogy is widely known in software development.
Intel had 6 months.
I have had the misfortune of having to pull them out twice in my career - in both cases they offered little in the way of guidance for the particular situation that came up.
The set of unknown unknowns that are typically missed makes most of them useless in all but the narrowest of cases, because many companies write them and then forget them. Especially if they are as large as Intel.
Do test your plans (this is not aimed at you personally zer00eyz - you probably know better than most).
There are a lot of unknowns but the basic model of a real DR plan is pretty sound these days, if you can afford it or wing it in some way. An example:
Another site, a suitable distance away. On that site there is enough infra to run the basics - wifi, a few ethernet ports, telephony etc. There should also be enough hypervisor and storage for that. Some backups are delivered there as well as on site. Hypervisor replicas are created from the backups (or directly) depending on RPO requirements and bandwidth available. The only thing that should be able to routinely access the backup files is the backup system (certainly not "Domain Admins" or other such nonsense). Ensure that what is written is verified.
Now test it 8)
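The "ensure that what is written is verified" step is the part most shops skip. A minimal sketch of one way to do it, assuming nothing more than a plain directory of backup files (the function names are my own illustration, not from any particular backup product):

```python
import hashlib
import os

def sha256_file(path):
    """Hash a file in chunks so large backups don't blow up RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(backup_dir, manifest_path):
    """Record a checksum for every file in the backup set."""
    with open(manifest_path, "w") as out:
        for root, _dirs, files in os.walk(backup_dir):
            for name in sorted(files):
                p = os.path.join(root, name)
                out.write(f"{sha256_file(p)}  {os.path.relpath(p, backup_dir)}\n")

def verify_manifest(backup_dir, manifest_path):
    """Return the list of files whose checksum no longer matches."""
    bad = []
    with open(manifest_path) as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            p = os.path.join(backup_dir, rel)
            if not os.path.exists(p) or sha256_file(p) != digest:
                bad.append(rel)
    return bad
```

Run write_manifest when the backup lands and verify_manifest on a schedule, from the backup system's own account; any non-empty result is an alert, not a log line.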
The company in question had a rather large on-site server room (raised floor, fire suppression) and a massive generator to deal with any power issues, as well as redundant connectivity. This room was literally the backup in case their "real" data center went offline.
The problem is that the room was "convenient" so there were plenty of things that lived ONLY there (mistake one) -
When the substation for the office went, and the generator started, everything looked fine. The problem was that no one had ever run the generator for that long... after a few hours it simply crapped out (overheated, problem two).
A quick trip to Home Depot got them generators and extension cords that let them get the few boxes that were critical back up - however, one box decided to not only fault, but to take its data with it.
This is when I got a rather frantic call "did I still have the code from the project I did?" - they offered to cut me a check for $2000 if I would go home right then and simply LOOK for it.
Lucky for them I had it - and the continuity portion of the DR plan got revisited.
In hind sight after I said I had the code, I probably could have asked them to put another zero on the end of the check and they would have done it just to be a functioning business come 6am.
Thank you - I'm happy to listen to (nearly) everything.
"I probably could have asked them to put another zero" - ahem that's not the IT Consultant's Way exactly. We have far more polite ways of extracting loot. We are not lawyers and should have morals.
"I have a file with 900 pages of analysis and contingency plans for war with Mars, including fourteen different scenarios about what to do if they develop an unexpected new technology. My file for what to do if an advanced alien species comes calling is three pages long, and it begins with 'Step 1: Find God'."
Wouldn't shock me at all if there was very little actual dev work done for the first few months, and then it was all super rushed at the end. Quite possibly the devs with the required knowledge didn't even know this was in the pipeline for months. That's par for the course at every decently large company I've worked at (i.e. 100+ devs), and at a beast like Microsoft I imagine it'd be way worse.
Unfortunately Spectre and Meltdown aren't straightforward and go to the very heart of how the OS works. It's not at all easy to fix this when you have an enormous amount of software working on top of it, depending on every little quirk your solution provides.
Honestly, Microsoft is really big into automated testing. I'm surprised this slipped through.
volatile unsigned long *ptr = (volatile unsigned long *)0xFFFFF6FB7DBED000;
Also, I tried the command from the original article:
pcileech.exe dump -out memorydump.raw -device totalmeltdown -v -force
This creates a 5 GB file which does look like a raw memory dump. I'm not sure how to interpret this; I don't know what the behavior should be with or without the bug.
CVSS 3.0 base score of 7.8.
$ x86_64-w64-mingw32-gcc meltdown.c -o meltdown.exe
"if, it doesn't crash ..." nope
"if it, doesn't ..." nope
"if it doesn't, crash ... " nope
"if it doesn't crash, the " yep!
"if it doesn't crash the, bug ..." nope
"if it doesn't crash the bug, is ..." nope
"if it doesn't crash the bug is, present" nope.
When it is present, it does help to separate the if and then, particularly in the absence of the word "then".
Without the comma, the prefix "if it doesn't crash the bug" can be scanned as a viable clause, only to find that the suffix becomes a fragment.
One dev or 1000, who cares, whatever they chose did not work particularly well. Are "they" to blame? Yes. Are the engineers to blame? Probably not. Is management the culprit? We don't know.
What's left? Next time your customers bug you about some random downtime caused by an overworked datacenter intern, don't feel stressed. Take the time to remember that even if you would've had billions of dollars, years of experience and thousands of employees, you could've messed up, just like MS did :)
When you are the direct, contracted, IT support for a company then you do have responsibilities. You might be considered responsible for timely delivery of patches - a fair argument in court I think. Mitigations might involve helpdesk logs as well as contracts.
well_done: Your tone comes across as BOFH. I'm possibly a simple PHB who owns an electric cattle prod that is wired up to the mains (three phase) but I prefer to get sign off for a project via work committed and not threat done.
I didn't read it like that, and I'm usually the first to read things negatively. I read it as motivational: don't feel bad about your own mistakes, even the big guys with tons of money and a lot of really smart people mess up sometimes. So cut yourself some slack and just do the best you can.
I missed the :) which might sound a bit naff now but was probably intended to deflect comments like mine. Hit taken. However I did invoke BOFH which is (I hope) normally seen as an indication that a comment is not to be taken too seriously.
EDIT: BOFH => Negative - nope, not here.
"Have you seen the boss's new toy?" "Yeah. Coincidentally, I'll be working remotely moving forward. Good luck!"
I just thought that if you're working in a high pressure environment (and this applies to virtually every coding shop I've ever known) and get trouble from all sides all the time it feels reassuring to know that in fact, not you are the problem, neither is your employer, stuff like this just happens even to the best.
How? Is it mapped to a static location?
The worst I can do here in userland is crash or delete data. And that's pretty bad already
Better testing (as in more than a superficial glance) would have caught this before review, but there always exists the possibility that subtle bugs can sneak past even well-thought-out tests.
Just my own experience and opinion.
What I discovered was that the event recorders on certain locomotives updated GPS at the 20th second rather than the 0th second. This meant that the GPS entry next to each line was in fact offset by 20 seconds - i.e. the entry for 02:11:40 was in fact what was sampled at 02:11:20. I think they must've held the GPS coordinate in memory somewhere but updated it AFTER writing that second's entry, so they wrote 02:11:20 whilst holding 02:11:00's GPS, then updated it, but then written that update at 02:11:40, etc. This was a fault with the design of the event recorders, not just one loco, as it occurred on each of that type that I looked at.
This confused me so much because it looked right - it was in the right general location, it was updating, etc. - but for a solid few days I did a bunch of analysis thinking the train was in a different location to where it really was. I eventually picked up on it when subtle things kept not adding up, and verified it by watching another loco come to a stop but then seeing the GPS keep moving for a bit afterwards until it settled.
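The write-before-update ordering suspected here is easy to model. A toy sketch (my own reconstruction of what the recorder firmware might plausibly do, not actual event-recorder code):

```python
def record_log(fixes, seconds):
    """Simulate a recorder that writes each 20-second log entry *before*
    refreshing its held GPS fix. `fixes` maps timestamp -> coordinate,
    with a fresh fix available every 20 s."""
    log = []
    held_fix = fixes[0]
    for t in range(0, seconds, 20):
        log.append((t, held_fix))  # write the entry with the stale fix...
        held_fix = fixes[t]        # ...then refresh the held coordinate
    return log
```

With fixes A, B, C taken at t = 0, 20, 40, the entry stamped 40 carries the fix from t = 20 - exactly the constant 20-second offset described, which is why everything "looked right" while being uniformly wrong.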
I agree with you, subtly wrong results are the worst.
Laying off all those QA and testing people will have some downside - MS seems to be letting older versions take the hit.
Bad coding on one product isn't exactly the most convincing strategy to get people to try a different product.
If they can be replaced with a script, your employer has a management failure and is wasting a lot of money on short-term savings.
I was hoping to see someone who said:
(1) I tested a Windows 7 X64 machine without the Meltdown patch (pre-December 2017) and couldn't read arbitrary memory.
(2) Next I tested with Microsoft's Meltdown patch KBnnnnnnn (Jan or Feb 2018) and could read arbitrary memory. The system is insecure.
(3) I then tested with Microsoft patch KBnnnnnnn (March 2018) and can no longer read arbitrary memory. They fixed it.
Is there a way to just get faster without the complexity? What would a new CPU architecture and OS look like if we started again? Is there room for open hardware to save us all?
The answer in https://stackoverflow.com/questions/8389648/how-do-i-achieve... is interesting; I'm particularly fascinated by the temperature warning (the answer author's CPU got to 76C in testing).
My CPU's sitting at 30C right now. It maybe climbs to 48C if Chrome's being stupid, and 50+C if I'm doing something mildly taxing. I've never made it go beyond 60C IIRC.
So, modern CPUs are so efficient that they're simply never hitting their maximum throughput in everyday use. I think that's pretty incredible.
The sad thing about CPUs that don't use modern (superscalar, multi-stage, microarched, etc) design is that they just can't keep up.
And people's OCD about speed and (more frequently) parallelization nowadays drives what they'll buy. Something had better have a killer feature if it isn't fast or highly parallel.
So it's possible, but a huge headache. Whatever you built would likely be highly purpose-specific.
Oh well, it gives me something to do of an evening 8)
2008R2 I can see doing it by hand, but Windows 7 clients is odd. Particularly as it seems to be taking you two months to apply urgent patches.
(Nearly) All of them are on the end of an IPSEC VPN that I can get at from home via the office web proxy and another VPN or via magic. Some of them have 192.168.0.0/24 or 192.168.1.0/24 - those are on the end of OpenVPN. I wrote this: https://doc.pfsense.org/index.php/OpenVPN_NAT_subnets_with_s... . You have no idea what networking is about until you've had to do that sort of nonsense a few times 8)
I don't have the luxury of one WSUS to manage, we have loads of the bloody things. Some customers have pretty skilled local IT depts, some have somewhat vocal users that accuse you of resetting their passwords after spending hours doing way more than they have paid for and would not understand what you are on about in the first place. I love them all equally as any parent would ...
When I get bored of watching Windows Update I run apt update && apt upgrade && reboot on a few machines and keep a weather eye on the monitoring system. When I get really bored, I run up yaourt on my laptop or my office PC. When I've got a newly installed Win system or two to patch, I fire up a few emerge -Uvh --deep --newuse --keep-going @world sessions (I'm not really joking here) or run up genkernel.
Yes, there is the default state designed by .... bbzzzzrrt .... soz, lost it, and then there is reality. Could I also remind you that there is rather more to patching than WSUS:
* Firmware - Dell, HPE and Co have had to do rather a lot of work here and had to start again in Jan when Intel dropped the ball
* Hypervisors - I generally see VMware - that's a lot of patching and don't forget that some of them were buggered, so need excluding.
* VM vHardware versions - yep, all those little lovelies have their own hardware types to worry about
* "My fooking factory runs 24x7 - what are you going to do about it?" .... "Yes, but you didn't go for the full cluster version *sigh* I'll see what I can do" ...
You think I'm odd! No mate, my little company are well aware of automation and use it where we can but we are pragmatic and have to deal with a lot of reality.
We could of course bind our customers to our iron will and enforce our policy and stuff. They would not work on weekends or other odd hours. They would not insist on doing things their way and they absolutely would pay us on time - they generally do 8)
This fairly significant change wasn't backported to Windows 7.
Then when they went to backport the Meltdown fix to Windows 7, they set a 'this page is user accessible' bit in the page tables by accident.
Also, does Windows 7 map the entire address space into kernel memory? That is, would this have enabled direct memory access to other processes?
"Only Windows 7 x64 systems patched with the 2018-01 or 2018-02 patches are vulnerable. If your system isn't patched since December 2017 or if it's patched with the 2018-03 patches or later it will be secure."
"I discovered this vulnerability just after it had been patched in the 2018-03 Patch Tuesday. I have not been able to correlate the vulnerability to known CVEs or other known issues."
(TBH, this is already unfair, comparing the kernel with an entire OS)
Ironically, wouldn't that make it even more unfair for Windows? Shouldn't all the 'millions of eyeballs' looking at the linux code be making it more secure?
>If you look at the big picture, it's not like Windows is known for it's security.
True, but security bugs are easier to reason about than feelings.
The number of bugs found should be trending towards zero, since millions of people have the opportunity to improve the source code and prevent the bugs from being introduced in the first place. There are of course other advantages to having the source be open, but if there is no security advantage to open source, that's going to put a dent in some of its marketing.
>Fewer eyeballs on Windows would imply fewer discoveries, and fewer bugfixes as a result.
Why would fewer people be looking at Windows compared to Linux? Security Researchers don't really discriminate. Or did you mean just the MS developers? Hmmm, I don't know how many windows bugs were found through external sources vs internal. Perhaps someone has already done that analysis..
As I wrote in the original post, this is because Linux is open source. There are few people looking at Windows, simply because there is no source to look at, and as a result there are ten times fewer people in the world who can even potentially look at it and check for bugs. That's why.
With Linux you need basic systems programming skill and the ability to code simple exploits. With Windows you either need to be working there (and be assigned to this task) - or reverse-engineer, which is a much rarer and more complicated skill.
Only if no new features are ever introduced.
Yes, this is exactly what happens, from my experience.
-> more people look at code
-> they find (and fix) more bugs
-> the system is more secure, because all bugs are found and fixed, instead of being kept inside the code and being sold on hacker forums and agency surveillance projects.
You also know that Linux is not just one codebase from 20 years ago, it constantly changes and adds new features? Of course there will be new bugs (like any other recent OS).
Where is the evidence that this happens? Do you have data (Open vs closed) showing more security bugs were found through developers, versus external sources?
>-> the system is more secure, because all bugs are found and fixed, instead of being kept inside the code and being sold on hacker forums and agency surveillance projects.
Why would a hacker fix a Linux bug for free, but choose to sell a Windows bug? That doesn't make sense to me.
It is really pointless using the count of CVEs as a measure of how vulnerable a product is.
>It is really pointless using the count of CVEs as a measure of how vulnerable a product is.
I read the article, and that is certainly the opinion of the author here.
Security is a large field. You can reduce it to the number of bugs. You can reduce it to the development process used to create the product. You can reduce it to methods of defending against future vulnerabilities. You can reduce it to methods of tackling bugs. You can reduce it along any axis. I don't think using CVEs as a measure is pointless. I find them to be useful.
Yes, I think you are being a bit of a noddy comparing a kernel with an entire OS. That said, all software has bugs. Blimey, how on earth can you compare the paltry 3,000,000,000,000-odd source files of Windows tucked up in git with the gazillions of source files that comprise a modern Linux based system (let alone the BSDs etc).
I will simply mention here that when I update an LTS Ubuntu or Debian box I run "apt update && apt upgrade && reboot" (or use a GUI if I'm bored) and it takes a few seconds to minutes to update the entire system. Everything. That includes Java, Flash, Office suites, graphics drivers, USB drivers, printer drivers, CAD suites, database servers, web servers, PHP, Python, Perl, Rust, Go, ... need I go on. Everything. The same happens when I use pacman or yourt, or emerge, or yum, or rpm or whatever.
I'm personally CREST accredited, so have a fair idea about security and prefer to spend my time doing stuff and not waiting for updates to install (if I can even find them) - you?
I've been meaning to SLOCcount Linux sometime, actually!
Having said that, I don't think it'll be 45M LOC. The kernel is 20M LOC (https://www.linuxcounter.net/statistics/kernel). Chrome is 18M (excluding blank lines/comments) (https://www.openhub.net/p/chrome/analyses/latest/languages_s...). LibreOffice is 9M LOC (https://www.openhub.net/p/libreoffice).
And then I found out that KDE is 60M LOC!! (https://www.openhub.net/p/kde)
GNOME is 9M (https://www.openhub.net/p/gnome).
But I'm guessing those two stats are comparing just the base desktop environment in GNOME's case with all the productivity apps (including KWrite et al) and system libraries (including QtWebKit et al). This must be kept in mind.
TL;DR, an incredibly basic system with just a word processor and web browser, and maybe a minimal windowmanager on top, would be 47M. Adding KDE in makes it 107M - but you're almost never going to use all of it (whereas with Chrome and LibreOffice some large proportion of that 18M and 9M is loaded into RAM and potentially targetable).
If you want to SLOC Linux then download it https://www.kernel.org/ and help yourself. Why not look here as well https://www.freebsd.org and others - those are my mates, and good ones.
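If you do want a rough number without waiting for sloccount, a naive physical-SLOC counter is only a few lines. This is my own toy sketch - real tools like sloccount and cloc also strip block comments and recognise dozens of languages, so expect its numbers to run a bit high:

```python
import os

def sloc(path, exts=(".c", ".h", ".S")):
    """Count non-blank lines that aren't pure '//' comment lines
    across every matching file under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            if not name.endswith(exts):
                continue
            with open(os.path.join(root, name), errors="replace") as f:
                for line in f:
                    stripped = line.strip()
                    if stripped and not stripped.startswith("//"):
                        total += 1
    return total
```

Pointed at an unpacked kernel tree, it should land in the same ballpark as the ~20M LOC the linuxcounter stats report.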
Take one down, patch it around...
127 little bugs in the code.
FYI, security testers call this type of testing negative testing, which is different from the functional testing that is done to verify an app works "properly". However, for an OS this test is not negative testing but functional testing, if the OS is designed to enforce user and process isolation.
The point of testing is to make the unknown bugs, known.
No, git works fine - that looks like an Engineer's bodge to account for an inadequacy elsewhere.
That doesn't make any sense. If this bug is present, this is instant total control of a PC. That's as bad as it gets.
So, on our register you get a series of items with weights from 0-9 that form a self-ordering list of things to fix. It is pretty simple, and you could add more dimensions if you like.
I'm just a simple Board member of my company - MD in my case. We try to create to do lists that have a reasonable chance of being fixed with a reasonably simple ordering of importance.
0.01 * 100 is 1.
This is a really awful snag, but it has already been fixed, provided you apply the patches.
I'll be changing our Risk Reg soon to become a Risks and Opportunities Register after a discussion in our last ISO 9001 audit. Not sure how the scoring scheme will work for that yet.
This may look like a bit of a silly pseudo formal exercise but it really does help with decision making. There's nothing wrong with bending the scores either, if you are open about it. It is simply a way of prioritising a list of things to do in the end.
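For the curious, the kind of self-ordering register described above can be sketched in a few lines. The field names and the 0-9 likelihood/impact scales here are my own illustration, not our actual register:

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: int  # 0-9
    impact: int      # 0-9

    @property
    def weight(self) -> int:
        # The product gives the ordering; add more dimensions
        # (cost to fix, detectability, ...) by extending this.
        return self.likelihood * self.impact

def register(risks):
    """Highest-weight items first: the to-do ordering."""
    return sorted(risks, key=lambda r: r.weight, reverse=True)
```

A low-likelihood/high-impact item (1 x 9) still outranks a nuisance (3 x 2), and "bending the scores" is just editing two integers in the open.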
I have to be a PHB sometimes as well as a sysadmin 8)