
There is not much discussion about Windows internals, not only because they are not shared, but also because, quite frankly, the Windows kernel evolves more slowly than the Linux kernel in terms of new algorithms implemented. For example, it is almost certain that Microsoft never tested I/O schedulers, process schedulers, filesystem optimizations, TCP/IP stack tweaks for wireless networks, etc., as much as the Linux community did. One can tell just by seeing the sheer amount of intense competition and interest amongst Linux kernel developers to research all these areas.

The net result of that is a generally acknowledged fact that Windows is slower than Linux when running complex workloads that push network/disk/cpu scheduling to its limit: https://news.ycombinator.com/item?id=3368771 A really concrete and technical example is the network throughput in Windows Vista which is degraded when playing audio! http://blogs.technet.com/b/markrussinovich/archive/2007/08/2...

Note: my post may sound like I am freely bashing Windows, but I am not. This is the cold hard truth. Countless multi-platform developers will attest to this, me included. I can't even remember the number of times I have written a multi-platform program in C or Java that always runs slower on Windows than on Linux, across dozens of different versions of Windows and Linux. The last time I troubleshot a Windows performance issue, I found out that the MFT of an NTFS filesystem was fragmented. This is to say that I am generally regarded as the one guy in the company who can troubleshoot any issue, yet I acknowledge I can almost never get Windows to perform as well as, or better than, Linux when there is a performance discrepancy in the first place.

That deleted post from the MSFTer is pretty damning, but is the difference one that really matters?

Take a look at these benchmarks that were posted here recently: https://news.ycombinator.com/item?id=5644880

There isn't a single comment in that whole thread about how outrageously bad EC2 performance is. Meanwhile I'd bet that most HN startups run on EC2, Heroku or other virtualized cloud platforms. And how many are using dog-slow interpreted languages like Python or Ruby? It looks to me like people around here are quite willing to take very large performance hits for the sake of convenience.

I find Windows to be a small performance hit for the sake of convenience.

As a developer, I fail to see in what way Windows is convenient.

It is a great OS for which there exists a large corpus of high quality developer tools. In fact, there are many domains such as game programming for which there is no other platform that comes even close.

um, it comes on the computer already installed? .... I know weak, but it's the only idea I could think of.

You can buy computers with Linux pre-installed nowadays too.

I used to use Windows exclusively until about 4 years ago. Until I switched, I occasionally tested a few Linux distros, and repeatedly came to the conclusion that drivers were still an issue, and that Linux wasn't ready for the average user's desktop.

Not so anymore, since about 2009. From that time on, the only thing without a Linux driver I encountered (there are more, just not widespread enough to matter) was an HP printer. Which is why I stopped using Windows altogether.

My experience since then? Windows is a royal PITA to use and maintain. Linux, with KDE as the desktop manager, isn't just faster, it's way friendlier for users. One example, which to me is huge: on Windows, you need to run the updater of every f..ing software provider from which you have purchased an app. On Linux, there's just one updater for everything. Another one: on Windows, even with all the discount programs for students and others, you have to spend thousands of dollars before you get equivalents of all the apps that you get for free when you select a developer workstation in a Linux installer.

Agreed, games are the only thing that Linux doesn't yet cover as well as Windows - both development and play. However, wanna bet that Linux will tip the balance in this area too, in at most five years?


Reposted here: http://blog.zorinaq.com/?e=74 This is too insightful to be lost in the cracks of HN.

And here: https://news.ycombinator.com/item?id=5689731 :)

I have the same experience in an established tech company in the bay area.

Exactly the same has happened - lots of new hires. Bad management. Really silly review process. Features are valued over fixing things.

There's no mentorship process for said new hires. This has obvious flow-on effects.

The old timers don't get promoted into management but end up fixing more and more bugs (because they're the ones that know stuff well enough to fix said bugs.) They get frustrated and leave, or they just give up and take a pay check.

The management values "time to deliver" over any kind of balance with "fix existing issues", "make things better", "fix long standing issues", "do code/architecture review to try and improve things over the long run."

They're going to get spanked if anyone shows up in their marketspace. Oh wait, but they're transitioning to an IP licensing company instead of "make things people buy." So even if they end up delivering crap, their revenue comes from licensing patents and IP rather than making things. Oops.

Thank god there's a startup mentality in the bay area.

The original author has requested that the hash and filename he used in the beginning to prove his identity be removed.

Can't edit the post :( I emailed pg@ and info@ycombinator.com an hour ago already to ask for redaction.

Can you delete it and re-post a redacted one?

I had pg@ delete my HN post, so as to leave only the properly re-edited version on my blog.

Sorry, I have never noticed Windows being slower than Linux. And when people complain, they usually have an incomplete understanding of what they're doing (nothing personal!).

The links that you posted in support of your claim are irrelevant IMO.

Compiling has nothing whatsoever to do with Windows internals. You're comparing Visual Studio, a full-fledged IDE with dozens of extra threads running source indexing, source code completion/help indexing and dozens of other things that gcc does not do. To make a fair comparison you will have to just compare cl.exe with gcc with a bunch of makefiles (yes, you can have makefiles on Windows too).

Then your "real concrete and technical" example is actually a bug in Windows Vista which was fixed around 6 years ago.

And your claim about MFT fragmentation kind of sounds bizarre, to be honest. Since Vista the OS internally runs a scheduled task to run a partial defrag that takes care of it. I'm not sure what went wrong in your case.

I'm not saying you imagined the slowness, I believe you experienced what you said. So let's test your theory. Since you can write code that runs slower only on Windows, give us some C code that runs horribly on Windows.
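For what it's worth, the cl.exe-versus-gcc comparison suggested above could be timed with something like the sketch below. It is only a rough harness: `time_command` is a name made up for illustration, and the command it runs here is a stand-in (the Python interpreter doing nothing) so that it runs anywhere. You would substitute your own build invocation on each OS, with the same sources and a clean tree each run.

```python
import statistics
import subprocess
import sys
import time

def time_command(cmd, runs=3):
    """Run cmd (a list of arguments) `runs` times; return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing build raise instead of being timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in command (assumption): replace with your actual build command,
# given as a list of arguments, on each platform you want to compare.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"median: {elapsed:.3f}s")
```

Taking the median over a few runs keeps one cold-cache run from dominating the number, which matters when the thing being compared is the OS rather than the compiler.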

There is a rather large difference between "runs slower" and "runs horribly".

What you did is try to move the goal posts.

What? What "goal posts"? Jesus.. not everything is a nerd war to prove something right/wrong. I have no emotional investment in the outcome.

I guess the only solution is to tag my comments "Hey this is a casual conversational comment" or else people read too much into the wording.

Well Mark's post was interesting, but this is also old news now, is it not? Microsoft has had two OS updates since then and hot fixes and service packs. The biggest difference I see is that the Linux kernel can be tweaked to specific performance characteristics at build time, whereas Windows can only be altered by runtime configuration. Depending on what you might want to change, you may not have that capability. Going back to the article about network latency while playing media, the breakdown seems to show that this was a bug, and bugs happen. To Linux's credit, those bugs are probably patched more quickly and even if it hasn't been promoted to release, if someone has made a patch, you can incorporate your own fix. If not, the landscape doesn't look any different for Linux vs. Windows in that regard.

I have never seen a description or mention of a bugfix for this Vista problem. The root of the issue is that DPCs are very CPU-intensive due to the inherent design of the Windows network stack and drivers. A bug like this just does not "simply" happen as you make it seem. The root cause is lack of innovation in the kernel. The bloat does not get fixed. The software architecture they used to handle 10/100 Mbps of network traffic does not scale to 1000 Mbps.

While I don't know about that Vista problem, it's not true that Microsoft doesn't innovate in the kernel. Windows 7 introduced the "MinWin" kernel refactoring which significantly reduced the OS footprint, see the 2009 talk & slides by Mark Russinovich:



Changes in Windows 8 were smaller, but it did for example improve the efficiency of memory allocation:


Windows 7 and 8 both have lower system requirements than Vista while offering more features. That fact was widely advertised and acknowledged. Sure, not everything was improved, but it's not true that MS never fixes things for better performance. They simply have different business priorities, such as supporting tablets in Windows 8 rather than supporting 1000 Mbps.

The OP didn't say there's no innovation at all in Windows. He just said it's slower than in the Linux kernel, and as a consequence atm Windows lags behind Linux in performance-critical usage scenarios.

Can you please elaborate on this inherent design flaw? My understanding of NTOS DPCs is that they are quite similar to Linux tasklets/bottom-half interrupt handlers.

A simple problem is that DPCs are not scheduled in any way (there are some very primitive queueing and delivery algorithms), or more importantly, are not scheduled in any way that correlates with the thread scheduler's view. So between four cores, if Network DPC/ISRs are abusing Core 1, but the thread scheduler sees an idle-ish low priority thread using up Core 0, and Core 1, 2, 3 are all Idle (because it doesn't understand DPCs/ISRs), it'll pick Core 1 for this thread (just because round-robin). I'm omitting NUMA/SMT and assuming all of these are 4 regular cores on the same socket.

One could argue a better decision would've been to pick Core 2 and/or 3, but there's nothing to guide the scheduler to make this decision.
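The four-core scenario above can be put into a toy model. This is not how the Windows scheduler is implemented; the `Core` fields, the idle threshold and both picker functions are assumptions made up for illustration. The only point is that a picker which consults thread load alone cannot tell Core 1 from Core 2.

```python
from dataclasses import dataclass

@dataclass
class Core:
    index: int
    thread_load: float   # share of time running threads (what the scheduler sees)
    dpc_isr_load: float  # share of time in DPCs/ISRs (invisible to the scheduler)

def idle_cores(cores, threshold=0.05):
    """Cores that look idle when only thread load is consulted."""
    idle = [c for c in cores if c.thread_load <= threshold]
    return idle or list(cores)  # fall back to all cores if none look idle

def pick_blind(cores):
    """DPC-blind choice: the first core that looks idle."""
    return idle_cores(cores)[0].index

def pick_dpc_aware(cores):
    """Hypothetical alternative: among idle-looking cores, prefer the least DPC/ISR time."""
    return min(idle_cores(cores), key=lambda c: c.dpc_isr_load).index

cores = [
    Core(0, thread_load=0.30, dpc_isr_load=0.00),  # idle-ish low priority thread
    Core(1, thread_load=0.00, dpc_isr_load=0.80),  # hammered by network DPC/ISRs
    Core(2, thread_load=0.00, dpc_isr_load=0.00),
    Core(3, thread_load=0.00, dpc_isr_load=0.00),
]

print(pick_blind(cores))      # 1
print(pick_dpc_aware(cores))  # 2
```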

But it's not that "DPCs" are a design flaw. It's the way that Windows drivers have been encouraged by Microsoft to be written. You'll see most Windows drivers have an ISR and DPC. If you look at IOKit (Mac's driver framework), almost all drivers run at the equivalent of Passive Level (IRQL 0) outside of the ISR/DPC path -- the OS makes sure of that.

Because Windows driver devs are encouraged to write ISR/DPC code, and because this code runs at high IRQL, it means that bugs and inefficiencies in drivers show a much larger performance degradation. And when you buy a shit 0.01$ OEM NIC, and you have to do MAC filtering, L2 layering, checksum validation and packet reconstruction in your DPC, plus there's no MSI and/or interrupt coalescing, you're kind of f*cked as a Windows driver.

Well you can manually request your DPC to target a particular core before you insert it in the queue. So yeah, while it's possible to avoid the situation, it requires the driver writer to be aware of this problem.

Also w.r.t. your other point, Threaded DPCs do exist (since Vista) which run at PASSIVE_LEVEL.

Targeting the DPC provides absolutely no help -- how will you as a driver know which core to target? It's typical design-by-MSDN: provide a feature that looks useful and people will quote, but actually provides no benefits. Drivers targeting their own DPCs are actually one of the leading causes of horrible DPC perf.

As for Threaded DPCs, not only does nobody use them in real life, but they MAY run at PASSIVE. The system still reserves the right to run them at DPC level.

Really the only way out of the mess is Passive Level Interrupts in Win8... Which likely nobody outside of ARM ecosystem partners will use.

Well I assume (depending on where the bottleneck is) spreading execution across cores will reduce DPC latency. Or maybe they could use UMDF.

Though, as a user of badly written drivers, I'm totally fucked. It's too bad the OS design does not allow the user to control any aspect of this (well, apart from MaximumDpcQueueDepth).

Spreading DPCs across cores will lead to two possibilities:

- Typical driver dev: Knows nothing about DPC Importance Levels, and sticks with medium (default): IPIs are not sent to idle cores, so device experiences huge latencies as the DPC targeted to core 7 never, ever, gets delivered.

- Typical driver dev 2: Hears about this problem and learns that High/MediumHigh Importance DPCs cause an IPI to be delivered even to idle cores: wakes up every core in your system round-robin as part of his attempt to spread/reduce latencies, killing your battery life and causing IPI pollution.
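To make the two outcomes concrete, here is a toy model of them. It encodes only the behaviour as described in the two bullets above (medium importance: no IPI to an idle core; high importance: IPI even to an idle core), not the real kernel's queueing rules, and `simulate`, its parameters and the 8-core layout are all made up for illustration.

```python
from enum import Enum

class Importance(Enum):
    MEDIUM = "medium"  # the default, per the comment above
    HIGH = "high"      # stands in for High/MediumHigh

def simulate(importance, n_cores=8, n_dpcs=8, busy_cores=(0,)):
    """Queue n_dpcs DPCs round-robin across cores.

    Returns (delivered, stalled, cores_woken).
    """
    busy = set(busy_cores)
    woken = set()
    delivered = stalled = 0
    for i in range(n_dpcs):
        target = i % n_cores
        if target in busy:
            delivered += 1     # a running core drains its own DPC queue
        elif importance is Importance.HIGH:
            woken.add(target)  # IPI wakes the idle core to run the DPC
            delivered += 1
        else:
            stalled += 1       # no IPI: the DPC waits for the core to wake up
    return delivered, stalled, len(woken)

print(simulate(Importance.MEDIUM))  # (1, 7, 0)
print(simulate(Importance.HIGH))    # (8, 0, 7)
```

With one busy core out of eight, medium importance leaves seven DPCs waiting, and high importance delivers all of them at the cost of waking seven idle cores.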

Now I hear you saying: "But Alex, why not always target the DPC only to non-idle cores?". Yeah, if only the scheduler gave you that kind of information in any sort of reliable way.

Really this is clearly the job of the OS. As it stands now, targeting DPCs on your own is a "f*cked if you do, f*cked if you don't" proposition.

You do get a few more variables you can play with as a user, but changing them will usually lead to worse problems than it would solve. Many drivers take dependencies on the default settings :/

Okay, but I'm not suggesting that spreading DPCs over multiple cores is the only solution. It is a solution in some cases. Originally I was merely responding to your point about not being able to schedule DPCs on other cores. You were speaking more generally from the OS scheduler's POV, but I took it more literally.

Honestly.. I've spent countless hours hunting down bad drivers to fix audio stutter and other crap on my gaming PC. I've finally got DPC latency under 10 microseconds and I'm not touching a thing :)

Yes, my point was that the OS isn't currently doing it on its own, putting the onus on the driver developers -- who have limited information available to them and are pretty much unable to make the right choice -- unless we're talking about highly customized embedded-type machines, or unless the user is willing to heavily participate in the process.

It was just a simple example of something that could change for the better in terms of performance, but that probably won't because it's a significant amount of code churn with little $$$ benefit.

I was honestly surprised that a core-balancing algorithm was added in Windows 7, but that was done at the last second (RC2) and by a very senior (while young) dev that had a lot of balls and respect. Sadly he was thanked by being promoted into management, as is usual with Microsoft.

