A project of such scale and longevity always seems to pique my interest. I bet there are some interesting historical artifacts in that codebase. Why is old code so fascinating?
Decades ago, someone as insignificant as you or I wrote this code. I find it especially interesting to come across comments in code, which allow developers to express themselves outside the confines of the language. I feel like a historian in this context, analyzing this person's thoughts and state of mind. Perhaps the author included elements of emotion or humour. Who was this person? An insignificant developer, as I am -- did they ever make something of themselves?
I think the developer's story is rarely told, but it's a story worth telling!
In Vinge's "A Deepness in the Sky" [0] one of the characters is a Programmer-at-Arms, and one of his roles is essentially software archaeology - not for the fun of it, but to keep the deeply layered software of a starship functional.
This is a fantastic book. Software is written and rewritten over thousands of years, layer upon layer, powering interstellar vessels and their “automation”. It makes you wonder what Windows will look like 100 years from now, if it’s still around. Will it be easier to just write another layer on top or dig down into the original source and modify?
Vinge's "A Deepness in the Sky" can be used to explain current big industrial control systems. A typical distributed control system these days transfers data across many different layers, transforming the data when it crosses each layer.
Starting deep down in the plant:
* reading the 0-10V or 4-20mA signals from sensors.
* AD converters convert the values into a digital representation in field devices.
* move the data to real-time controllers that run control algorithms and send outputs back to motors/valves/pumps/etc.
* those controllers move the data often via raw Ethernet packets to other real-time controllers and Windows stations used for operator visualization.
* some Windows workstations gather data into databases for historical usage. Sometimes this is a SQL type database, sometimes a flat database used to get better storage performance.
* then data is moved to local databases that can be used by the onsite teams to analyze it in an office environment.
* then that data is moved into big-data offsite/remote storage for analysis of the medium- and long-term performance of the industrial installation, and for comparison to other installations in other parts of the world.
* from that, reports are made with key performance indicators and graphs.
And each time the data is passed up a layer, the format of the data and the timestamps may be adjusted to match the system that the data is passed to.
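As a rough illustration of the very first hop in that chain (a minimal sketch with made-up names, counts, and scaling ranges, not taken from any real DCS), scaling a raw 4-20 mA reading into an engineering value and timestamping it before it gets handed up a layer might look like this:

```c
/* Hypothetical sketch of the first layer: scale a raw ADC count that
   represents a 4-20 mA loop signal into an engineering value, then
   timestamp it before passing it up. All names and ranges are made up. */
#include <stdio.h>
#include <time.h>

#define ADC_COUNTS_AT_4MA   6400.0   /* raw count at 4 mA  (assumed) */
#define ADC_COUNTS_AT_20MA 32000.0   /* raw count at 20 mA (assumed) */

/* Linear scaling: 4 mA maps to eng_lo, 20 mA maps to eng_hi. */
static double scale_4_20(double raw, double eng_lo, double eng_hi)
{
    double frac = (raw - ADC_COUNTS_AT_4MA) /
                  (ADC_COUNTS_AT_20MA - ADC_COUNTS_AT_4MA);
    return eng_lo + frac * (eng_hi - eng_lo);
}

int main(void)
{
    double raw = 19200.0;                    /* pretend ADC reading      */
    double bar = scale_4_20(raw, 0.0, 10.0); /* 0-10 bar pressure sensor */
    time_t now = time(NULL);

    printf("pressure=%.2f bar at %ld\n", bar, (long)now);
    return 0;
}
```

Every layer above this one repeats the same kind of reshaping, just with bigger payloads and different timestamp conventions.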
History seems to suggest that it will eventually get bogged down with too much junk, but it's too hard to rewrite while keeping compatibility, so eventually some fresh project without the legacy will be able to become the technically superior product and everyone will move over to that.
In theory, yes. In practice, one of two solutions will take hold: either the old system is put into a container on the new system, or the application software itself gains an emulation layer for the old system's interface during the port. Either way, not a lot will be gained.
Yes! Thank you so much for bringing this up. I read about this book in a comment on HN many years ago and have been trying to find it, or the book title, ever since. Going to bounce this to the top of my reading pile.
See this page on "digital archaeology" describing research that studies developers and software through their "digital remains": http://digitalarchaeology.info/
When I worked on Azure we had some access to the Windows team's data and documentation, and used an integrated build environment for some services. There are some interesting things going on under the hood, mostly around building components in isolation against the spec of other libraries/services.
The switch to git was widely publicized at the time. As far as I know, the Windows team is the only large-scale team using git. Facebook is using Mercurial, Google is using some concoction around Perforce, I think, and is working on a Mercurial transition. And all of these attempts to make DVCS work at scale are straining their respective systems to their breaking points.
I must have been in a cave that month or something. It's a fascinating move imho, considering how few people actually knew what git, svn, cvs etc. were when I worked there back in 2010. So much myopia towards outside technology, but Satya and Scott Guthrie were always on top of things with a good vision of how Microsoft and open source should co-exist. I legitimately regret leaving, only to have Ballmer leave a few weeks later, followed by the promotion of Satya, who was previously VP of the division I worked in. All those unvested shares >_<
The product that I maintain at work started life in the late 1980s as a DOS application written in a mix of C and ASM. All of the original code comments are still present. Every once in a while I'll go spelunking through some of those for fun.
Mark Lucovsky, one of the original Windows NT kernel developers, gave a USENIX 2000 talk called "Windows: A Software Engineering Odyssey" with a lot of historical details about the Windows code history and development processes:
Back in the late '80s, my company had a program that included getting the Windows 3.0 source code. Looking through the code, I found the line "goto wearefucked;" and the accompanying label further down. When we got the source for Windows 3.1 the first thing I did was check that file, but to my disappointment the label had been changed.
This post is kind of dramatic. Windows is much more than a simple kernel: it is a collection of drivers, tools, a desktop environment, a network stack, bundled user applications (like Paint), etc, etc.
If you compare it to the entire Ubuntu project, you will see a project of similar dimensions.
I don't know about Ubuntu specifically, but some distros do ship sources with their installation media and it usually fits on 1 DVD.
When I say "sources" I am talking about kernel, user land and desktop applications too.
However that would be compressed tarballs and doesn't include git history - the latter being where I suspect most of the disk space is being used in that screen shot.
My benchmark was SuSE, which was 1 DVD's worth. However your post now has me wondering if that didn't contain the complete sources either - just sources for server components. That said, those 12 DVDs likely cover way more than the Windows sources would. eg multiple different image editors, LibreOffice (would Windows sources include the sources for MS Office as well?) several different databases (MariaDB, PostgreSQL, Redis, etc). So even that might not be a fair comparison either?
Unfortunately, without access to that source directory, all of this is just going to be speculation anyway.
I did also make the point about compression too by the way :)
Right. It's probably not at the level of the full Debian archive, but those 500GB probably include all kinds of weird services and Windows tools and programs most users never run or have a use for.
Oh I'd guarantee it would. And the vast majority of those libraries will exist for backwards compatibility too. Much like with the CLI user land on Linux (GNU coreutils et al).
I think the real proof of the pudding is the footprint of a default desktop in Windows vs Ubuntu+. There was a time when Windows would literally consume an order of magnitude more disk space than Linux after a fresh install; however, I think things have since converged in the middle somewhat.
Going back on topic though, that looked like a dev directory for Windows, so it would likely have contained a .git directory too. That would easily balloon the disk space used by any (mature) project's source.
+ I know you were talking about Debian, but I'm going with Ubuntu now because it's more of a desktop-orientated distro, and frankly it works better in Windows' favour anyway since it installs more software by default than Debian would.
Fair point about compressed bytes, but what about structure? Even Ubuntu never felt that intimidating, but maybe that's because I just didn't count folders.
Why would that be a better example? BSDs don't write all their own desktop software. OK, there will be some edge cases, before you jump in with "OpenBSD wrote cwm". Generally speaking, there will still be a lot of other 3rd-party software too, such as GIMP. So BSD isn't going to be any different to Linux in that regard.
However, this is all moot because the point I was making was about the centralised source repositories; those will contain not just the sources of the base Linux / BSD system (kernel + user land) but also any other sources that are offered as binary packages (i.e. all the stuff that's part of the platform's build system).
> You can spend a year (seriously) just drilling down the source tree, more than a half million folders containing the code for every component making up the OS workstation and server products and all their editions, tools, and associated development kits, and see what’s in there, read the file names and try to figure out what does what.
This makes me really want to know how they manage to document everything to make such a huge project easy to work with.
Usually, when you're working at the scale of an operating system, you have people focusing on particular areas, e.g. kernel teams, networking teams, graphics teams, etc., and you also have their corresponding test teams. So usually no one person even has an understanding of the entire source code.
The kernel is a pretty small part of the operating system. Development of Windows NT started in 1989[1], when RAM sizes of PCs were measured in MB, not GB, so there wasn't room for a lot of bloat. (NT 3.1 could run on a machine with 12 MB of RAM.[2]) Also, it wasn't the first kernel Cutler wrote. While Cutler is undoubtedly a brilliant programmer, I doubt he could keep the details of all the other parts of the system (graphics, etc.) in his head at once.
Other well known kernels were also written by very small teams. The first Unix kernel was written by two people and the first Linux kernel was written by one person.
That's not how NT was seen at the time. It was quite bloated relative to other PC OSs. Of course, modern OS features required a bit more overhead than the crazy world of Windows 3.0.
I guess I should have said "a lot of bloat by today's standards"...
My first experience with NT was NT 3.51 on a 32 MB machine. If I remember well, it didn't feel much slower than Windows 3.1. But finally having a PC operating system that didn't crash all the time really made development much more pleasant and productive.
A lot of affordable machines a few years old even had 4M. I remember my dad upgraded from 4 to 8 and from Win3.11 to Win95. The computer was maybe 1 or 2 years old, hardly more. It was around 1500€ (well, 10000 FRF at the time) when bought, with a 14" screen.
32M was insane, probably out of reach of all but rich homes.
Only a few years later (1 or 2) a classmate told us he had a computer with 128M. That sounded so ridiculous that we thought he was a mythomaniac or something. Turns out his dad was able to get a powerful workstation from work, and the first time we visited him we saw that mythical beast, and our minds were blown.
And right now the private working set of my start menu process is 27M...
Cutler certainly did not write the whole first kernel alone, although the initial size of the NT team was quite small, at least at the very beginning. The story of the development (until at least the first release, IIRC) is in the book "Show Stopper!"; quite interesting.
Show Stopper is a wonderful book if you're interested in the historical development of large systems. It's second only to The Soul of a New Machine. I highly recommend it.
I read that book a few years ago. I've read several books about various computer projects and I usually leave with a wish that I could have worked on that project. After reading Showstopper, my main thought was: thank $deity I didn't work on that project.
I think it is because in the book Dave Cutler comes off as an asshole.
500 GB strikes me as quite small. The entire Git repository (it sounds like Microsoft uses Git internally) must be a whole heck of a lot larger than this if there are over 60,000 commits every few weeks.
Still, it’s a mammoth project, which makes it all the more impressive that it works (most of the time).
If the reported size was from that fs command (assuming it is similar to Linux's du) and he ran it in his git clone, then it includes all the git files, including file versions (which, with 60k changes every 2 weeks, should be pretty huge) and branches (different editions, targeted to different markets, etc., which may not have trivial changes).
I don't think he copied the structure without the git files to make the count, or deleted/moved the .git directory elsewhere for that. Nor is it just source code in there; (versioned) bitmaps, third-party driver blobs, and more are probably in there too.
That's quite a few assumptions, at least one of which likely isn't correct; at Microsoft, the .git directory doesn't contain "all the git files, including file versions […] and branches":
"GVFS allows our developers to simply not download most of those 3.5 million files during a clone, and instead simply page in the (comparatively) small portion of the source tree that a developer is working with."
"the Windows repo, which has over 3 million files in the working directory, totalling 270GB of source files. That's 270GB in the working directory, at the tip of master. To clone this repo, you would have to download a packfile that is about 100GB in size, which would take hours. And once you've succeeded in cloning it, local git operations like checkout (3 hours), status (8 minutes), and commit (30 minutes) would still take way too long to run."
I'd say that it's very likely that a repository of such size would be "shallow cloned", i.e. with `--depth n`. This would substantially cut down on the size of a large repository with hundreds of thousands of changes.
The real question is: is it compiled with the same MSVC compiler that ships in the SDK? If so, they don't even use C99. C89 all the way. That especially means no variable declarations mixed in with statements in the function body.
While MSVC doesn't have full C99 support, MSVC supported most of the features that shipped in C99 that make it much less painful to develop in C quite a number of years before C99 support formally landed in other compilers. MS will probably never support full C99 because the demand isn't there, but it's certainly not ANSI C you are expected to write.
MSVC supports C11 because it's a prerequisite for C++17. The limitation is that you need to compile as C++, so the few changes/incompatibilities between C and C++ always bend to the C++ choices.
The preprocessor isn't what people are used to having in C99, so some macros are still not portable.
To the best of my knowledge (i.e. not updated since C++14), C++ is no longer a strict superset of C as of C99, and there are C99 features that are not available in C++, so I'm not sure that's the case. But plenty of overlap, for sure.
> The following C99 features are not supported by C++:
>
> * restricted pointers
> * variable length arrays
> * flexible array members
> * static and type qualifiers in parameter array declarators
> * compound literals
> * designated initializers
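To make the list above concrete, here is a minimal C99 sketch (a made-up example, not from any real codebase) using a few of those features; most of it is rejected or means something different when compiled as C++:

```c
/* Hypothetical C99-only sample: designated initializers, compound
   literals, a flexible array member, and `static` in a parameter
   array declarator. */
#include <stdio.h>
#include <stdlib.h>

struct packet {
    int len;
    unsigned char data[];                    /* flexible array member */
};

struct point { int x, y; };

static int sum(int n, const int a[static n]) /* static + qualifier in declarator */
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

int main(void)
{
    struct point p = { .y = 2, .x = 1 };          /* designated initializer */
    int s = sum(3, (const int[]){ 1, 2, 3 });     /* compound literal */

    struct packet *pkt = malloc(sizeof *pkt + 4); /* flexible array in use */
    if (!pkt)
        return 1;
    pkt->len = 4;

    printf("%d %d %d %d\n", p.x, p.y, s, pkt->len);
    free(pkt);
    return 0;
}
```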
It's probably the DDK compiler, although I'm not sure of the difference these days.
In C89 you can declare variables in the middle of the function, as long as they're at the beginning of a { block }. The beginning of a function is somewhat of a special-case of this.
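For illustration, a tiny (hypothetical) example of that C89 rule, with the second declaration allowed only because it sits at the top of a nested block:

```c
/* Valid C89: the second declaration is legal because it appears at the
   start of a new { block }, not in the middle of a statement list. */
#include <stdio.h>

int main(void)
{
    int a = 1;
    printf("%d\n", a);

    {                      /* a new block opens here...                */
        int b = a + 1;     /* ...so a fresh declaration is allowed     */
        printf("%d\n", b);
    }

    /* int c = 3;   <- declared here, after statements at function scope,
       this would only become legal in C99 and later. */
    return 0;
}
```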
Yes and no. The compiler is checked in with the (Windows) source code and is periodically updated/patched as the project goes along. When an SDK or DDK is released, it's the same compiler at some specific version.
Any code in active development would have been ported to C++ (although left in the original C "style"). The MSVC C compiler is just for back-compatibility and not intended for modern C development.
Microsoft created VFS for Git specifically because the Windows kernel code was too large for git to handle without being intolerably slow (git status taking hours to complete, for example).
Microsoft CEO and incontinent over-stater of facts Steve Ballmer said that "Linux is a cancer that attaches itself in an intellectual property sense to everything it touches" during a commercial spot masquerading as an interview with the Chicago Sun-Times on June 1, 2001.
I am personally pleased they have thrown in the towel with Internet Explorer/Edge to just use the same browser as everyone else. Okay, bad for diversity, good for getting your web page to render.
Next they will be emulating Windows in Linux.
Not a bad idea.
A few years ago I was curious as to why programs ran more quickly in VirtualBox running Windows on Ubuntu than when I booted the machine into Windows 7 and tried to run them natively. Plus, why bother with installing driver disks and having a subsystem when the VirtualBox-provided defaults were doing a better job?
Then there is the file system. The Windows one might do special Windows things like animating folders flying through the air, but it is always slower given the same hardware and the same task. And normally you just want to copy files or rsync them.
Plus there is the matter of anti-virus. That is never a problem on Linux but you need it on Windows: another little watchdog program to double-check every byte for you. Can't see how that helps. The unix/linux/ChromeOS idea of considering everything to be hostile, with file permissions, has always worked.
So I look forward to a future where Windows is just a set of UX guidelines and any 'Windows program' is secretly run in some proprietary blob that is running in some white-labelled Virtualbox clone. Maybe they end up doing that thing Apple did to move off the legacy code base where it recompiles itself.
It always infuriated me that MS usually wasn't using the tools they gave to developers for their own stuff. I think it has gotten a little better in recent years. Now they seem to be using UWP to some degree. But before that they didn't use WinForms or MFC, and WPF only for VS 2010 and up.
If you've ever used any other applications written in pure Win32 you would know why --- the efficiency difference is enormous. They did start using more managed code in the system starting with Vista, which resulted in all the complaints about it being slow and bloated. I suspect there's been even more of it added since then, because the Win10 explorer.exe on a recent quad-core i7 system with an SSD still manages to respond more slowly to actions like opening a folder than XP's on a 15-year-old PC with a regular HDD...
MinWin was about moving code around. By necessity, it required even more code to support the indirection and compatibility for programs expecting code in specific DLLs.
Makes me wonder if there exists an OS that is truly written from scratch, designed only for modern hardware, and devoid of all backward compatibility and bloat (like the JS frameworks ditching support for MSIE).
It would certainly be a very interesting side project, I guess.
This is an interesting consideration. If you think about IoT devices, oftentimes you can find yourself designing software for a bespoke hardware system with limited resources and no room for considering backward compatibility. Paravirtualized VMs too need not necessarily require full POSIX ABIs since syscalls are more expensive and the underlying hypervisor can take care of any compatibility with the underlying hardware system.
The problem with modern hardware is that it is itself full of backwards compatibility. And while such a system could certainly be done (as somebody said, Fuchsia is probably a fairly good example of one), it will only take a few years as hardware changes before it has backwards-compatibility "bloat" as well!
There are a few basic tutorials here: https://wiki.osdev.org/Expanded_Main_Page that can definitely give you some insight into how kernel development works. It's an incredibly disciplined and broad field, but very interesting too.
There is a book about the development of Windows NT called "Show Stopper!". In the book they talk about this. C++ was pretty new in 1989, so Dave Cutler decided to stay with C to reduce risk.
Except for the graphics team, as they were managed by someone else. They decided to go with C++, and they were pretty much always late because everyone was learning and building the tooling around C++.
Unsafe sounds like a really bad word, but unsafe usually also means performant. I know that's not always the case, but usually. Rust falls in the exception to that rule category, for the most part, while C# does not. Also, the people typically working on these codebases, like Linux, are very senior developers and have code reviews being done by other very senior people. So while they definitely have had a lot of bugs related to security in the language, we don't have any examples of other languages implementing something anywhere near this size that didn't have as many bugs.
According to Google's security team, 68% of exploits in Linux are due to memory corruption. Source: their keynote at Linux Kernel Summit 2018.
Check the DoD security assessment of Multics versus UNIX regarding ease of security exploits, and how using PL/I prevented the large majority of them. I'm on mobile now, but the document is accessible at the Multics history site.
Or any deployment of High Integrity Computing OSes for that matter.
From what I know, for a long time the OS division that writes Windows has been at war with the developer division that writes .NET, and refused to use .NET. I think it has gotten a bit better recently.
Second this. Though Microsoft has been really busy getting their C++ sorted out lately.
Funnily enough, C# has also seen some dramatic performance increases.
> You can spend a year (seriously) just drilling down the source tree, more than a half million folders
I would be more inclined to switch to Windows if its source tree were smaller.
My guess is that a big reason I prefer MacOS over Windows is that Apple has been much more willing to drop support for legacy hardware and old applications to keep the source code more manageable.
It is difficult for humans to estimate how they spend their time, but my guess is that I've spent at least 35 hours recently exploring Windows 10: Changing various settings, installing software, asking Windows-specific questions of Google Search.
And I'm writing this on Windows. (Except for the Windows-specific questions, I'm not counting web use in the estimated 35 hours because of course the web mostly works the same way across the desktop OSes.)
I think computer users vary drastically in how much they value predictability in the software-based systems (or "environments") they use, and that I value predictability much more than the average user does.
I realize that it is unreasonable to hope never to be surprised at the response the system makes to an action of mine. But my response to being surprised is to try to understand how I could correctly predict similar responses in the future. For example, I might try to understand the reasoning of the designer of the part of the system in question. Or I might search for ways that I might have misinterpreted the situation. And I don't like it when I never reach an understanding of the surprising response.
Its source code's having half a million folders is a sign that Windows will never stop surprising me, which, all other things being equal, makes me less hopeful that spending time getting to know Windows will pay off for me.
Life of course will never stop surprising me. But to have any hope of getting anywhere in life or achieving any goal whatsoever, my brain must be sufficiently reliable and predictable. I see my computer as an extension of my brain that helps my brain be more reliable in the ways that will help me to succeed.
If you see your computer or the web site you are interacting with as a potential friend with agency of its own, then I can see where you might be offended by my original comment. I see computers and web sites as tools.
Levers, if you will.
Windows 10 with WSL has enough Ubuntu features built in now that there's really no reason to use a Mac, since the "it has a Unix shell" comment doesn't work anymore. Your choices for applications are much larger on Windows.
With the new ConPTY they also fixed one of my last gripes about Windows: that the console sucks.
With all the changes that Microsoft is making lately, I really wonder how long until Windows becomes free as in beer. With their revenue from services growing, they don't need the Windows licensing revenue as much. Making Windows free would also allow them to make Windows cloud servers cheaper, allowing them to grow the ecosystem.
I switched from Apple to Windows 10 when Apple stopped offering decent "Pro" platforms. I need NVIDIA cards for the CUDA number crunching I do. (The Mac faithful say "b.b.but you can run an Nvidia externally over Thunderbolt." Ha!)
I even used to be an Apple employee.
Windows 10 is just fine. With a tiny amount of care to choose hardware that is all "happy path" (I run dual Intel Xeons, with a SuperMicro MoBo, and now dual Nvidia 2080 Ti for scientific processing), everything works fine. (Also, don't run third-party virus scanners. Just use what's built in to Windows!)
And things that don't quite work right on Macs, like 30-bit color, etc, work great.
I run Linux and FreeBSD in VMs. I don't use WSL much, I just build many things on Windows under Powershell if I don't need Centos or FreeBSD. I find that a lot of FOSS stuff works great on Linux and native Windows and not so well on MacOS because of "non-standard" choices about directory locations, etc.
It's best to give up your prejudices and give Windows 10 a try. Microsoft really did solve almost everything.
My $dayjob gave us all Macs for workstations. Which is fine and dandy, but I spend all day in an RDP session as my job is automating Windows. Honestly Windows is fine and is getting better. I'm running the Windows 10 Insiders Preview so sometimes things get a little funky but still it pretty much just works.