Hacker News
32TB of Windows 10 internal builds, core source code leak online (theregister.co.uk)
369 points by manirelli on June 23, 2017 | 91 comments



Looks like there's some debate as to whether or not this has been exaggerated: https://www.betaarchive.com/forum/viewtopic.php?t=37282

So far I haven't seen any links to source code.

Quote from one of the admins:

> Yes I have no idea where they got the 32TB stuff. We had a big leak of Win10 builds yes, but these were all Windows Insider stuff that were collected over time available to all Windows Insider members at one time or another.

Edit: BA's official statement: https://www.betaarchive.com/forum/viewtopic.php?f=1&t=37283


Hiya - I wrote the article. What's happened is that the Beta Archive folks have now deleted (or are in the process of deleting) the private material that was uploaded to the BA FTP. There most definitely were non-officially-released internal Microsoft files in the archive, regardless of BA's intentions, such as the Shared Source Kit, the ARM64 Windows Server build, the Mobile Adaption Kit, and various prerelease versions of Windows.

We've updated the story to explain why things aren't what they seem. Essentially, the files at the heart of the matter were there (we screenshotted them and saved copies of the forum posts) at time of writing, and they were removed later on Friday.

In terms of the 32TB: that's the full decompressed dump of Windows files uploaded to BA. From what I understand, Microsoft hasn't released 32TB of public Insider material, so obviously there's extra sauce in the mix.

That includes, yes, copies of officially released Insider builds plus confidential private stuff that should never have left Microsoft, let alone turned up in BA. We make this clear in the story - I'm starting to feel the headline could have been better to make this clearer rather than grabbing the biggest figure. I am beginning to regret this.

BA can twist and complain all it likes - but stuff that was confidential within Microsoft ended up in their FTP archive (and some is still in there, such as the ARM64 stuff). The next stage of this story will be to uncover how exactly this material escaped Redmond.

C.


All the old builds of Windows 10 listed were presumably grabbed via public Unified Update Platform (UUP) infrastructure or the Ecosystem Engagement Access Program (EEAP), but I haven't confirmed yet. It's common knowledge in the Windows enthusiast community that builds (yes, even arm64 desktop Insider builds) can be pulled from Microsoft via these channels. It's not confidential, and not useful to share with anyone other than a build vault like Beta Archive.

Debugging symbols for most of those builds are available on symsrv.
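For readers who haven't used symsrv: a debugger finds a binary's PDB on a symbol server using the PDB name, GUID and age recorded in the binary's debug directory. Below is a minimal sketch that builds such a lookup URL, assuming the usual SymStore path layout (name/GUID+age/name, GUID as 32 uppercase hex digits, age in hex); the GUID and age values are made up for illustration. This server only hands out public symbols, not the private ones discussed further down.

```python
import uuid

# Microsoft's public symbol server. The path layout used below follows the
# common SymStore convention; treat it as an assumption, not a spec.
SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def symbol_url(pdb_name: str, guid: str, age: int) -> str:
    """Build the symbol-server URL for a PDB identified by GUID and age."""
    # Key = GUID as 32 uppercase hex digits (no dashes or braces),
    # followed by the age written in hexadecimal without padding.
    key = uuid.UUID(guid).hex.upper() + format(age, "X")
    return f"{SYMBOL_SERVER}/{pdb_name}/{key}/{pdb_name}"


# Hypothetical GUID/age, for illustration only.
print(symbol_url("ntkrnlmp.pdb", "3844dbb9-2017-4967-be7a-a4a2c20430fa", 1))
```

Debuggers do this lookup automatically once the symbol path points at the server; the function just makes the addressing scheme visible.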

The Windows Mobile Adaption kit (like the OEM Preinstallation Kits, OEM Adaptation Kits) is shared with a similarly sized audience, which used to include self-attested Microsoft Partners. Again, not confidential. Just gated stuff.

The Shared Source stuff is a slight unknown here because it's not clear what was in the ZIP. I presume this was a sampling of materials shared via the Shared Source Initiative (https://www.microsoft.com/en-us/sharedsource/), none of which includes high-value intellectual property, cryptographic code, third-party code, etc. It could still be damaging but Microsoft has clearly calculated the risk here; this stuff is shared with mere community MVPs.

So with all this knowledge, it's hard to digest the "omg more exploits coming" and "Microsoft lost 32TB of private IP" angles in The Register write up. I don't think there's a story here, frankly.


Clarify the 32TB and 8TB figures please. People with access to the archive who successfully downloaded the confidential stuff did not get nearly that much.

Do you consider windows installation images to be "compressed files" in this context?


Looks like the 32TB size reported is the total size of all the various Windows installation images, prior to de-dupe. 8TB after de-dupe. Not a very useful figure, however.

https://www.betaarchive.com/forum/viewtopic.php?p=420025#p42...
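To see why a pile of installation images can shrink so much after de-dupe, here is a minimal sketch of fixed-size chunk deduplication. Nothing in the thread says which scheme BA's figure came from; the 4 KiB chunk size, the SHA-256 hashing and the sample "builds" are all assumptions for illustration (real tools often use content-defined chunking instead).

```python
import hashlib


def dedup_sizes(files: dict[str, bytes], chunk_size: int = 4096) -> tuple[int, int]:
    """Return (total bytes, bytes stored after fixed-size chunk deduplication)."""
    total = 0
    unique: dict[str, int] = {}  # chunk digest -> chunk length
    for data in files.values():
        total += len(data)
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            # Identical chunks produce identical digests, so each is stored once.
            unique.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    return total, sum(unique.values())


# Hypothetical "builds" that share most of their content.
base = b"A" * 8192 + b"B" * 4096
newer = base + b"C" * 4096
builds = {"build1.iso": base, "build2.iso": newer, "build3.iso": base}
total, stored = dedup_sizes(builds)
print(total, stored)  # 40960 12288
```

Successive builds of the same OS repeat most of their bytes, which is how a raw total can be several times larger than what actually needs storing.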


Compressed, it is ~8TB. Fully expanded it is ~32TB. I think the bigger issue is not the final size, but that internal Microsoft material - particularly source code - has escaped into public FTP. That, to me, is the main thing, right?

C.


Windows sources have escaped before. I doubt that Windows is buildable outside of Microsoft (and the bits are definitely not signable, since you need access to a key vault for that).

Useful for research, and finding security issues. Not much else.


Might be helpful to the ReactOS and Wine folks, if unofficially.


Actually the opposite. They can't work on the project if they've seen the actual MS code, even if they write their own code.


Ah well, if there's a rule...


Why is that?

You can break patents without ever knowing the patent existed. So looking at this code wouldn't trigger a new patent problem.

And simply looking at some code, closing it, then later writing code that does the same functionality is not breaking copyright. So looking at this code would not trigger copyright.


Clean room reverse engineering. The idea is that if you build something with a specified interface (the Windows API in this case) without prior knowledge of the implementation details, and you haven't broken any patents in doing so, then you haven't broken copyright either and you are free to do business. This is a gross oversimplification. See the Intel v. AMD case for more details.


Clean room is a defense against copyright and not patent, AIUI. For patents it doesn't matter if you knew someone had patented it.

Not a lawyer, though, but a quick search confirms this.


>Compressed, it is ~8TB.

But what data does this 8TB refer to specifically? Is this the source + all the windows builds from a plethora of sources? Did you download 8TB of data from BA and expand it to 32TB or was this a figure provided to you by one of the raided hackers or their associates?

>think the bigger issue is not the final size, but that internal Microsoft material - particularly source code - has escaped into public FTP

Happens regularly, although usually it's MS employees leaving stuff in public FTPs or inside released ISOs, updates, whatever. redmond\ domain is huge and the (accidental or not) leaks never stop.


It's ~8TBs of deduplicated Windows installation media. The Shared Source Initiative material only amounts to ~1.2 GB, if that.


It's 32TB of deduplicated data. You'd have to download the whole 32TB, actually.


There does seem to be some source, as (now, not when you posted) discussed in that thread. Here's a pastebin (taken from that thread) with some filenames, including, for example, USB drivers:

https://pastebin.com/raw/VGEbWVSM


If you really want to see some Windows source code, you can just ask them, and they will send it to you. It's not open source and there are limits to what you can do with it.

They call it their Shared Source Initiative. They want a reason for sharing it with you, but I have used "I am just curious." With that excuse (this was a long time ago, when I still used Windows), I got the specific code I wanted for Outlook Express.


Do they share the code for the Windows Update mechanisms and the code for Cortana and other spyware that is installed by default in Windows?


I have no idea. I haven't used Windows in years. Give it a shot. The most they can do is tell you no.


Put some effort in and reverse engineer it, if you really want to know how it works.

Complaining about the lack of source code is just sheer laziness.


How do you "ask them"? Do they have an email address for this? Or do you have to find the right guy on the team somehow?


Windows Shared Source initiative has existed for what, over 10 years now?

https://www.microsoft.com/en-us/sharedsource/

I think this "leak" is greatly exaggerated.


I haven't worked directly with Microsoft for well over 15 years, but this sounds similar to what I remember. Back then I worked for a partner who was doing direct integration work against low-level SQL Server and Windows libraries. Often when we encountered obscure bugs, they'd just give us the SQL Server or Windows source code and basically say, "Fix it, and we will release a hot fix." All of the comments would be replaced with white space which made things more difficult.


None of those categories seem to include ordinary developers though?


If your company, or you, have a relationship with Microsoft, try the Enterprise program. They are pretty lenient, though not always quick to respond.


But the point is you need to already have a relationship, right? It seems you can't just say "I'm curious" (or even better, "I want to track down X bug") and expect to get source access, contrary to what was claimed earlier... Enterprise specifically says you need to "Maintain 10,000 Windows seats" which is not something a lone developer would do...


My only tie was having an MSDN subscription.


And you got it through the Enterprise license? Through your company or personally? Nothing related to 10,000 Windows seats? That's pretty weird if so since they say you must meet that requirement...


I didn't misrepresent myself, was logged in, and had no issues. They may have changed it, but that is what I selected. You'll probably have to sign an NDA. Give it a shot.


Even community MVPs get access to this stuff.


This is a gross exaggeration. As far as I can see, what "leaked" was the "shared driver source kit" that nearly any hardware vendor (like chipset manufacturer) can get; basically anyone who puts up a few thousand bucks and signs an NDA.


If nothing else, it would be interesting to compare code quality of this with leaks of much earlier Windows source files.


Does this mean an individual could actually get their hands on the fabled Enterprise LTSB edition and thus actually have control over updates?


An individual can already get access to LTSB by paying for MSDN.


All I did was change a registry setting (or maybe it was a gpedit) to prevent automatic reboots. That was enough for me though, as I didn't appreciate my running apps being shutdown during the night.


Can you recall exactly what you did to stop your computer from automatically shutting down (and starting up)?

I "resolved" the issue by dual booting. The second OS (previously Ubuntu, moving to Debian) changes something that takes away Windows' ability to automagically turn on my machine for updates.


The unstripped binaries are a huge benefit for non-black-hat developers too.


Microsoft actually provides symbols for most windows components through its symbol server.


The private symbols in these builds could actually be very useful. The article alluded to there being private symbols. So, even if only 1% of the overall Windows code was leaked, if there were, say, private symbols for the kernel's heap allocator, that would be pure gold for a practiced reverse engineer. Not as good as code, but a hell of a lot better than having to figure everything out and name functions and symbols yourself.


What kinds of private symbols aren't served by the symbol server?


All of them. The server serves public symbols. Private symbols have structure info and even local variable names, which are very useful.


There are two levels of debugging symbols. One is released with every Windows build, for end-users reporting back stack traces and the like. These are of the second, more granular level: private debugging symbols available to developers only.



Public symbols, which aren't nearly as useful as private symbols.


This will make Windows less secure in the short term, but as good and bad actors find bugs and Microsoft patches them, they will end up with a hardened product. Their OS is now effectively open-source.


It's only some core kernel and driver code, lower than 1% of the codebase.


Yup. It's the Windows Shared Source Kit, which is already mostly public. Many of the big security firms and government agencies already have licences to the full source code anyway.

The only thing this really gains anyone is the possibility that some non-public debug symbols were left in some builds. Not earth-shattering.


USB and WiFi stacks can be fun, though.


> Their OS is now effectively open-source.

Ehh, Panic software had a good post-mortem talking about this potentially happening to them https://panic.com/blog/stolen-source-code/

My favorite takeaway was, "With every day that passes, that stolen source code is more and more out-of-date."

I remember hearing about Windows source code leaks in the past (I see articles from 2000 and 2004), and about problems with "clean room" open source SMB implementations.

Yeah, the fundamentals and much of the source code will probably stick around for many, many years. But this has happened before and I don't see why this is any more of a big deal.



Given the other article I read today about US companies bowing to Russian requirements to review source code, I wonder if MS has also already given away code that can be studied for security gaps.


Microsoft does indeed make its sources available for review by major customers, including governments. From what I've heard, this is done under NDA and reviewers are forbidden from taking the code away from the MS facility.


I guess Microsoft really were serious about going open source.


Looks like the page where the source kit was listed was altered since the screenshot that's in the article was taken. I hope the files surface somewhere.


Can't imagine this was due to a hack of their systems. Seems more like an (ex-)employee took a data dump and released it. Or it could be spear phishing.


If you develop WINE or ReactOS do NOT look at any of this code.


Can someone with access take a more comprehensive screenshot of the contents?


Throwaway account for obvious reasons. Does anyone have a link to the leaked data?

At this point avoiding links is pointless as the source code will be essentially public knowledge in matter of days/weeks. Damage control is the only strategy left. The sooner security researchers outside Microsoft can start analyzing and reporting vulnerabilities, the better.


Some of the leaked data seems to currently only be accessible on https://www.betaarchive.com/, but your account won't be able to access it for a month or so.


Not understanding why this comment needs a throwaway account?


Maybe he doesn't actually want it for research purposes, and actually wants a quick and easy way to find it for other, less noble purposes.

He never actually stated he wanted it for security, just left it easy to imply.


It seems that the "leak" was what you need to develop a driver. You can sign up for MSDN and get that, right? Does that come with the $3000/year it now costs to subscribe to MSDN?


It must be possible to determine when this code was collected by matching the files to version control timestamps.

I wonder if that could be used to narrow down who pulled the code during that window.
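A minimal sketch of that dating idea, assuming someone with access to the version control history has matched each leaked file to the revision it came from (for example by content hash). Each match gives an interval: the snapshot was taken no earlier than that revision's commit and before the file's next commit. Intersecting the intervals bounds the snapshot time. The function name and the dates below are hypothetical.

```python
from datetime import datetime
from typing import Optional

# One observation per leaked file: (commit time of the matched revision,
# commit time of that file's next revision, or None if never superseded).
Observation = tuple[datetime, Optional[datetime]]


def snapshot_window(observations: list[Observation]) -> tuple[datetime, Optional[datetime]]:
    """Bound when a snapshot was taken by intersecting per-file intervals."""
    # The snapshot cannot predate the newest revision it contains...
    earliest = max(present for present, _ in observations)
    # ...and must predate the first superseding commit it failed to pick up.
    later = [nxt for _, nxt in observations if nxt is not None]
    latest = min(later) if later else None
    if latest is not None and latest <= earliest:
        raise ValueError("files appear to come from more than one snapshot")
    return earliest, latest


# Hypothetical observations for three files.
obs = [
    (datetime(2017, 3, 1), datetime(2017, 3, 20)),
    (datetime(2017, 3, 10), None),
    (datetime(2017, 2, 15), datetime(2017, 3, 15)),
]
print(snapshot_window(obs))  # window: from 2017-03-10 up to (not including) 2017-03-15
```

The more files are matched, the narrower the window; an empty intersection suggests the dump was assembled from several pulls rather than one.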


Just imagine if the source code for SMB was let loose.


I'm imagining that other OS will be able to interop better with Windows increasing the value of Windows and improving experience of Windows by other OS users?

What were you implying?


They're disabling it later this year, so I wouldn't go that far.


Just the outdated SMBv1 of course, SMB is still very much supported and in constant development.


And replacing it with what?


Links or it didn't happen.


Looks like the WINE developers are going to have the time of their lives.


Quite the opposite. WINE developers will have to go the extra mile to avoid getting anywhere even remotely close to the proprietary source code; otherwise they may get sued for copyright infringement, even if they didn't intentionally copy any of the code.


1. How would Microsoft prove that they saw the code?

2. If microsoft sued wine devs it would be horrible for Microsofts public image. They won't do it.

3. I hope the WINE devs don't listen to you.


> 1. How would Microsoft prove that they saw the code?

Get the court to order discovery on all of your computers. They could probably also get subpoenas for the source code hosting sites to reveal relevant access logs. Or someone could admit to reading the source code someplace public, like a bug tracker. Or they could argue that the choice of variable names and minor algorithm details are too close to be coincidence. A jury found against Google over rangeCheck, after all.

> 2. If microsoft sued wine devs it would be horrible for Microsofts public image. They won't do it.

No it wouldn't, particularly not if they had a strong case (e.g., someone bragging about it). If you think MS looks bad for suing people for stealing the code, then you'd have to think the FSF looks bad for suing people for violating the GPL and stealing the code of, say, Linux.

> 3. I hope the WINE devs don't listen to you.

I hope they don't listen to you. The repercussions are quite large--it's not unimaginable that shutting down the WINE project could result from a lost case. These cases do happen, and defendants do lose (Oracle v. Google is a notable recent one, and that's based on IMHO fairly weak evidence). There's a reason that projects that do major reverse engineering for interoperability have rather elaborate procedures for doing so.


> you'd have to think the FSF looks bad for suing people for violating the GPL

I agree with you, but there are people who argue this.


And I'm one of them. We don't want to alienate the already small number of people who develop Free Software. I would rather see companies who violate the GPL comply rather than seek damages. Actually bringing a suit, in my mind, is basically the nuclear option.


When I was involved with Mono we avoided reflecting .NET for much the same reason as elaborated by the poster who you (pretty jerkishly) dismiss. It's very easy to say that other people should undertake extreme personal risk.


Holy hell do you remember SCO v. everyone?


It looks like mainly kernel and driver code was leaked. WINE implements the Win32 API, which is distinct from the Windows NT kernel & drivers - kind of like how the Linux API is distinct from the Windows NT kernel, despite Windows supporting it.

This might be a boon for the ReactOS folks, who are trying to implement the NT kernel, except for seeing the Windows source code automatically disqualifies you from being a contributor.


Exactly. A few years ago the ReactOS developers had to stop all development for several months to perform a source code audit. This was meant to deflect accusations that they had derived code from disassembled Windows binaries.

If anything, this could make their legal situation more sticky.


> A few years ago the ReactOS developers had to stop all development for several months to perform a source code audit.

What kind of an audit can you do without access to the original software's source code? How would you tell if it's actually different?


This is what they did https://www.reactos.org/wiki/Audit


That's pretty cool, thanks!


That's unfortunate and doesn't fit my understanding of the situation - is disassembling a binary not fair game, in the same way that Samsung buying an iPhone and cracking it open is? (Assuming we're worried about copyright and not patents).


IANAL, but as far as I understand the situation it is not. See https://en.wikipedia.org/wiki/Chinese_wall#Reverse_engineeri...

But IIUC, clean room reverse engineering requires substantial effort to prove you did not take any shortcuts.


I vaguely remember that either the Wine or the ReactOS developers rewrote parts of their source a while after the Win2k source leak in 2004, because there were some contributors who had been exposed to Microsoft's source code, and a rewrite of the parts those devs had touched was apparently the only way to make sure they were "clean".

IIRC, they had to go through some trouble to find people for that rewrite who had not looked at Microsoft's source code AND the parts of the Wine/ReactOS source code that needed to be rewritten.

So I am convinced that they will make extra sure not to even get in the position where someone could imply they might have looked at that source code.


No, they will not.


America better get its act together about computer security.


As opposed to whom? Britain? Germany?


A new bold step in Microsoft's Open Source endeavors!


I guess windows 10 is now open source!



