Making Windows Slower Part 2: Process Creation (randomascii.wordpress.com)
179 points by nikbackm on Oct 16, 2018 | hide | past | favorite | 32 comments

Application Verifier is a feature of Windows for developers to use to check program correctness. Bruce says he turned on Full Page Heap, which makes every allocation go on a separate page, and tries to line it up so that the next page is not readable or writable. The idea is, if you go past the end of your memory allocation, the CPU raises a protection fault, and you catch the memory corruption bug at its source. Application Verifier can also do other things, like raise an error if your program ever passes an invalid HANDLE to a Windows system call.
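The guard-page mechanism described above can be sketched on Linux with mmap and mprotect via ctypes: place a small "allocation" flush against the end of a writable page and make the following page inaccessible, so a one-byte overflow faults immediately. This is only an illustration of the technique (Linux-specific, run in a child process so the crash can be observed), not how Application Verifier itself is implemented.

```python
import subprocess, sys, textwrap

# The child sets up a guard page and then writes one byte past the end
# of its buffer; the out-of-bounds write should kill it with SIGSEGV.
child = textwrap.dedent("""
    import ctypes, ctypes.util, mmap
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, 2 * page)          # two anonymous, writable pages
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    # Make the second page a guard page (PROT_NONE == 0): no read/write.
    assert libc.mprotect(ctypes.c_void_p(addr + page), page, 0) == 0
    size = 24
    # Place the "allocation" so its end lines up with the guard page.
    alloc = (ctypes.c_char * (size + 1)).from_address(addr + page - size)
    alloc[size - 1] = b"x"                 # last in-bounds byte: fine
    alloc[size] = b"x"                     # one past the end: faults here
""")
rc = subprocess.run([sys.executable, "-c", child]).returncode
print("child exit code:", rc)  # negative means killed by a signal
```

A negative return code from the child indicates it was terminated by a signal, which is exactly the "catch the corruption at its source" behavior full page heap provides.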

Application Verifier is turned on by process name (i.e. "explorer.exe"). I think most of the time, developers turn on an Application Verifier feature, start just one process by that name, and then verify. But Bruce turned it on for a short-lived process that runs in a large build. So it runs thousands of times, and he hit this problem. It sounds like a great thing to fix, but I don't think it affects most developers using Application Verifier, and it definitely doesn't affect 'normal people' who are just using Windows on their laptop or desktop.

I do remember a coworker at another company complaining that their hard drive was filling up and it turned out the problem was the App Verifier log files. In that case the individual log files were enormous so a few dozen (hundred?) runs of the program under test was enough to be a problem. Different, but related, and perhaps more common than the problem which I hit.

If the log files aren’t cleaned up, you’ll hit this issue even if you don’t run it 30k times in a minute. I ran into a very similar problem with pytest (python) creating sequentially numbered temp dir/files under /tmp on linux. Result: full / partition.

My version of this - OS (Linux) crashes due to overclock stability testing filled EFI NVRAM, and I had to manually delete the crash dumps after running into puzzling issues later on. Apparently it was possible (on some hardware, at least) for the automatic dumps to brick your computer: https://bugzilla.redhat.com/show_bug.cgi?id=919485

This reminds me of the Accidentally Quadratic series: https://accidentallyquadratic.tumblr.com/

These issues are quite common and seem to grow out of that initial expectation that N stays small which later turns out to not be so true for cases discovered later on. Scrolling through some of the latest entries, even Chrome itself had some accidentally quadratic complexity.
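The log-file bug from the article is exactly this shape. A minimal sketch (the prefix and helper name are made up) of sequential-number probing: each creation scans past every existing file, so creating n files costs O(n²) existence checks overall.

```python
import os

def next_sequential_name(directory, prefix="log"):
    """Probe prefix-0, prefix-1, ... until a free slot is found.
    One probe is O(n) with n files present, so n creations are O(n^2)."""
    i = 0
    while os.path.exists(os.path.join(directory, f"{prefix}-{i}")):
        i += 1
    return os.path.join(directory, f"{prefix}-{i}")
```

Each individual call looks cheap when the directory is nearly empty, which is why this pattern survives review until N grows.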

Accidentally Quadratic looks interesting. I've certainly found lots of accidentally quadratic algorithms throughout my career - and written a few as well. It's amazing how badly they can blow up.

The biggest source of experienced slowness for me was more or less always something created by OEMs.

Bundled 3rd-party bloatware is one thing, but often the problem would be the actual drivers or utilities that came with the machine.

"Utilities" could usually be removed, but if the drivers are bad there wasn't much to to except try an older or newer release.

The second biggest problem was mandatory antivirus software installed by IT departments. (That is, on my machine. For a lot of people a bigger problem was adware and spyware.)

For the benefit of those who are younger than me: Reinstalling stock Windows used to be standard procedure.

Today the situation feels a lot better but I still prefer Linux.

Microsoft really needs to solve or at least mitigate these issues.

I've been using Windows as my main development environment for mainly JavaScript and Python for like 5 years now, and I've felt every single one of these pain points (if you haven't, go read parts 0 and 1 of the same series! They are great!).

Some of them I've tried to mitigate (I'll be looking over my suite of tools to see if any "file watchers" are causing issues now, as well as using the fantastic looking tool from the article to dive into this slowness I feel), but for the most part it's just a cost of working on Windows for me, and it's pushing me away.

With a fraction of the power, I can get orders of magnitude more "perceived performance" out of Linux or MacOS, and I'm now starting to use Linux VMs more to get that performance back (which is such a weird sentence!).

There are things about Linux that I hate as a desktop OS (for me it always seems to "deteriorate" over about 6 months and need a reinstall to stay stable, and on more than 5 occasions I've installed something and restarted only to find the machine unbootable...), and there are things I hate about MacOS (the keyboards, the hardware requirements, being linux-ey enough that it feels familiar, but not enough that stuff just works for me), but I can't deny that when I work in those OSs, I'm more productive and spend less time waiting for my machine to do little tasks, and therefore have much less aversion to them (at this point I've almost developed a phobia of having to move, copy, or rename large numbers of files on Windows, and it shows as a lack of organization in my projects because of how flaky and time consuming it can be).

I know it won't be easy, but MS is going to start losing devs and eventually users if this keeps up and the other OSs keep pulling ahead in perceived performance.

(As an aside, and just to jump the gun a bit with the expected replies, I'm fairly certain my Linux-as-a-desktop-os issues are self inflicted or come from a lack of understanding of something on my part, but I haven't had the time to dive into what they are, and a slow OS is better than a non-functional one. Hopefully my increased usage of VMs now will iron out those issues without as much risk)

To be fair, the way that JavaScript development works these days, particularly anything involving npm, is batshit insane. I should not have 400 MB of hundreds of thousands of tiny files sucked down into my node_modules folder for a simple web front-end, with 17 different versions of lodash required by different packages, and dependencies nested so deeply that I can't even delete the directory without resorting to specialized tools. It's a complete disaster as far as being able to verify your dependencies, and we're sort of just blanket operating on the assumption that everything is always licensed with an MIT-alike license.

> and dependencies nested so deeply that I can't even delete the directory without resorting to specialized tools

I think this hasn't been an issue on Windows for a while.

It is still an issue, but it's not nearly as common any more with javascript development.

npm a while back moved to a "smarter" way of deduping and laying out files in a more "flat" way to avoid the massive bloating and tons of copies of the same libraries. And other tools (like yarn) do the same (or better!).

And the long paths issue is solvable, but you need to use the funky `\\?\` path prefix, or the tool you are using needs to support long paths on Windows (which annoyingly explorer.exe doesn't, so you are stuck using hacky workarounds or other tools to delete or modify those files).
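Adding the prefix is mechanical. A small sketch (the helper name is made up) that turns an absolute Windows path into an extended-length one:

```python
def extended_length(path):
    r"""Prefix an absolute Windows path with \\?\ so Win32 file APIs
    skip path normalization and the 260-character MAX_PATH limit."""
    if path.startswith("\\\\?\\"):
        return path          # already extended-length
    return "\\\\?\\" + path
```

Note the prefix only works with absolute, backslash-separated paths, which is one reason tools have to opt in explicitly rather than getting long-path support for free.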

Disk "thrashiness" has been a problem with Windows for a long time. Back in the 90s and early 2000s, when slower mechanical hard drives with access lights made HDD seeking and thrashing very noticeable, I always noticed that Windows boxes just thrashed the disk far more for trivial operations than Linux or even early MacOSX. (Never used Mac classic.) Windows has always been really clunky in terms of disk I/O. I've never been sure if the problem is in the kernel, filesystem, or higher up the stack.

Linux on the other hand is spectacular here. In my experience it's marginally better than Mac on the same hardware (and way better than Windows!). I threw Linux on a spinning disk Mac Mini a while back with a really slow disk and found that it didn't matter much. Macs with SSDs are so much faster in part because MacOS seems less efficient on the disk front and SSDs make that less of a bottleneck.

Linux is open source. Why hasn't everyone copied what Linux is doing? What is Linux doing?

I'm not sure what Linux does, but I'm glad Windows does not do that.

Building a kernel on Linux would bring down my machine's interactivity to zero; even the mouse pointer would stop responding. And this isn't even from two or three decades ago - this was in 2014, and I've never since touched the pile of crap that Ubuntu is.

Not sure what was wrong, but that doesn't happen on any Linux system I have ever used. Then again I've never been an Ubuntu fan. Maybe they had some incredibly bizarre settings. Also seems like it could have been a driver problem.

I've had similar issues on multiple distros and boards over the years, easily replicable with any sort of intensive disk load. It's not one specific issue (as the underlying bugs get fixed and change over time), but simply that there are lots of little deadlocks in the IO subsystems. There are workarounds and kernel flags to help mitigate things, but no one should have to do that for a daily driver.

>being linux-ey enough that it feels familiar, but not enough that stuff just works for me

Maybe because it’s not Linux but a BSD-based Unix?

It's less that it's BSD based, and more that they are on super old versions of all the GNU tools due to license issues.

Needing to update bash and coreutils on MacOS sucks, and even then I can't rely on all other developers I work with having those same tools installed and up to date and installed the same way. So I deal with the provided versions and their differences and quirks.

Have you ever used a commercial UNIX, e.g. Aix, HP-UX,...?

Even OS X would feel like state of the art.

That’s a bit like saying you shouldn’t ask for a pay rise because there are starving kids in Africa.

Commercial unixes are basically dead, and for very good reasons. Good riddance.

Many Fortune 500 have a different point of view.

I made the distinction of "desktop os" on purpose. I don't know of any Fortune 500 companies that are using UNIX systems as their desktop OS for workstations...

Same with Python, same with Ruby, same with Node, etc. Shipping up-to-date coreutils wouldn’t save you from any of these.

> Microsoft could fix this problem by using something other than a monotonically increasing log-file number. If they used the current date and time (to millisecond or higher resolution) as part of the file name then they would get log file names that were more semantically meaningful, and could be created extremely quickly with virtually no unique-file-search logic.

Does Windows have no equivalent of `mktemp`? Getting unique identifiers is a common requirement.
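The quoted suggestion amounts to something like the following sketch (the name format is illustrative, not Microsoft's): names built from a high-resolution timestamp plus pid sort chronologically and require no directory scan at all.

```python
import os, time

def timestamped_log_name(prefix="appverif"):
    # time_ns() gives nanosecond resolution, so collisions within one
    # process are effectively impossible; the pid disambiguates
    # concurrent processes writing to the same directory.
    return f"{prefix}-{time.time_ns()}-{os.getpid()}.log"
```

Unlike the sequential-counter scheme, generating a name this way is O(1) no matter how many old log files have accumulated.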

It does, but it puts them in the %appdata%/Temp folder. Which can be deleted by disk space cleanup.

I don't think there's a built-in to get unique names in arbitrary folders, other than the idiomatic timestamp method described.

You have it backwards. GetTempFileName() only gives you unique names in arbitrary folders! If you want to use %APPDATA%\Temp, you have to call GetTempPath() yourself and pass that path into GetTempFileName().
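For comparison, Python's tempfile.mkstemp behaves the same way as described here: it creates a uniquely named file atomically (using O_EXCL, so there is no race between choosing a name and creating the file) in whatever directory you pass, and only falls back to the system temp directory when you pass none.

```python
import os, tempfile

# Unique file in an arbitrary directory of the caller's choosing.
fd, path = tempfile.mkstemp(prefix="appverif-", suffix=".log", dir=".")
os.close(fd)
print(path)     # e.g. ./appverif-<random>.log
os.remove(path)
```

The atomic-create-and-open behavior is the important part: a scheme that merely generates a name and then opens it separately is racy.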

Well, arguably putting the log files in the temp folder so that they get cleaned up would be an improvement - currently the log files are never cleared without user action, and most developers don't even know the log files are created.

I suspect that the other reason not to use some mktemp equivalent is that they wanted the names to have some logical order to them; it just turns out that ascending file numbers are a bad way to do this.

I've definitely seen this practice. The solution I saw to "fix" it: after 20 attempts, fail. Silently. It was a static file, but it didn't track the last file created (of course someone had copied this static file... but that's a separate problem).

  import os, time
  while True:
      name = f"log-{time.time_ns()}-{os.getpid()}-{os.urandom(4).hex()}"
      try:
          fd = os.open(name, os.O_CREAT | os.O_EXCL)  # atomic: fails if name exists
          break
      except FileExistsError:
          pass  # vanishingly rare collision; retry
Hypothetically unbounded runtime, O(1) in practice.

Don't even bother with the while True. If you get a bucket collision with a random number, just don't log anything, or just fatally crash because you can't trust the platform's PRNG or don't have enough available entropy or are just plain unlucky. Keep it proper O(1).

I'm starting to get the same kind of vibes as with Mark Russinovich's Sysinternals "The Case of the ..." blog posts. Same investigative style. The blog is unabashedly technical, hands-on, and yet crystal clear.

The other two blog posts linked at the top are also entertaining and educational reading.

    Making Windows Slower Part 0: Making VirtualAlloc arbitrarily slower
    Making Windows Slower Part 1: Making file access slower
