A Not-Called Function Can Cause a 5X Slowdown (randomascii.wordpress.com)
308 points by deafcalculus 8 days ago | 125 comments





Windows has a real problem with huge monolithic DLLs that inexplicably pull in other huge monolithic DLLs, especially when each DllMain() has its own weird side effects. It leads to bizarre behavior like this all the time.

The weirdest thing that ever happened to me was when Visual Studio 2012 hung in the installer. After debugging with an older VS, it turned out the installer rendered some progress bar with Silverlight, which was hung on audio initialization, which was hung on a file system driver, which was hung on a shitty Apple-provided HFS driver. Uninstalling HFS fixed the installer.

Why does an installer even need audio when it never played sound? Because Microsoft dependencies are fucked.


Getting even more to the point: why does an installer even need to use Silverlight to render a progress bar!?!? As far as I know, these things have been there since 16-bit Windows and continue to work fine in Win10:

https://docs.microsoft.com/en-us/windows/desktop/controls/pr...


> continue to work fine in Win10:

burying your head in the sand like this won't make the problem go away. UX designers demand stuff that is smoother than Win32 progress bars: spline-interpolated motion with some nice little tween at the end when going to the next page, and varying smooth color gradients all over the place.

go make this UI with pure Win32 primitives: https://youtu.be/v0GG8uh80V4?t=158


I hate animations like that.

All they do is distract me, waste my time while I'm waiting to see and interact with the next thing, and move my target out from under me while I'm trying to go click it. I always turn off all the animation settings right out of the box.

I wouldn't be surprised if one day soon this fad gives way back to a renaissance of clean, simple, instant UI's and everyone will feel like their devices got a lot snappier.


> I wouldn't be surprised if one day soon this fad gives way back to a renaissance of clean, simple, instant UI's and everyone will feel like their devices got a lot snappier.

Facebook introduces artificial delays when loading their stuff because people wouldn't believe it actually worked if it were too fast.


Similarly travel booking sites add an artificial delay otherwise travelers don’t believe the site tried hard enough to find the best deals.

TurboTax Online is just an hour long series of delays to give the impression that server farms are struggling to do work that could be handled instantly by a Commodore 64.

Wow. Do airlines do this as well? I literally will not fly American Airlines anymore because their website is so unbearably slow when looking for flights (not that United or others are much better).

> I wouldn't be surprised if one day soon this fad gives way back to a renaissance of clean, simple, instant UI's and everyone will feel like their devices got a lot snappier.

No, don't think so. Instant UI's are disorienting because there's no visual cue for what is about to happen, and they don't give your brain an intuition as to how the interface behaves (e.g. a sidebar slides in from the side rather than the top). We're used to instant UI's because we've learned the patterns over time, but they're very unnatural to new/occasional users. Some interfaces overdo the animation timing though; they can be very fast and still give users enough of a cue to understand what happened.


Animations that change colors within a narrow range are fine; moving UI elements around is a huge anti-pattern that annoys users without any benefits.

It’s the same issue as flat buttons: users need to know what’s clickable, and what has been clicked. They don’t need shading etc., but feedback has massive utility. Sadly, our field generally lacks the kind of real feedback that corrects such incompetence.


> moving UI elements around is a huge anti pattern that annoys users without any benefits.

Still disagree, and I think you're thinking of really obnoxious examples. I'm imagining a ~150ms animation for the following:

1. A sidebar sliding out from the side

2. Dialogs swooping in and out from the top of the screen

3. Form panes sliding out when replaced

These quick, almost-no-time visual cues are incredibly important at giving your interface a sense of presence and physicality. Without them, your UI elements are flickering in and out of existence without any perceivable reason, and can often leave the user wondering what changed.


That still just adds perceived lag to the system slowing people down as you can’t read something that’s not yet rendered or sliding around that quickly. 150ms adds up.

“Why did this get so slow?” is a real and common complaint with such changes, and why OS’s allow people to disable so many animations.


The "why did it get slow" complaints happen when the animations aren't fast enough. The brain doesn't need a lot of time to register motion, but many times designers will use a slow and laborious animation to really emphasize something appearing. See Google's material design: it is based on the philosophy that interfaces should behave physically, and you'll see that it feels incredibly snappy.

Check out these demos: https://material.io/design/motion/speed.html

Example: you can't effectively convey a menu becoming a button without an animation. Without an animation, the menu disappears and a button appears, but there's no reason to believe that they're related. A 100ms tween from one to the other is all you need. Our brains are very good at relating motion, not objects blinking in and out of existence.
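The tween the comment describes can be sketched in a few lines. This is a minimal, hypothetical illustration of a fixed-duration linear interpolation sampled at 60 fps; real UI toolkits would use easing curves and wall-clock timing.

```python
# Minimal sketch: linear interpolation ("tween") of a UI element's
# position over a 100 ms animation, sampled at 60 fps.

def lerp(start, end, t):
    """Interpolate between start and end; t runs from 0.0 to 1.0."""
    return start + (end - start) * t

def tween_frames(start, end, duration_ms=100, fps=60):
    """Return the element's position for each frame of the animation."""
    n_frames = max(1, round(duration_ms / 1000 * fps))
    return [lerp(start, end, i / n_frames) for i in range(n_frames + 1)]

# e.g. a menu at x=0 morphing toward a button at x=240
frames = tween_frames(start=0, end=240, duration_ms=100)
```

At 100 ms and 60 fps that is only about six frames of motion, which is why the cue registers without feeling like a delay.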


100ms * even a minimal 20 actions per minute = 16 minutes wasted in an 8 hour work day.

Remember, those 100ms delay gaining information which also delays every action that happens after them. So, the cost is real.

Worse, people can break 200 actions per minute for well known systems. Which makes this far worse as it’s slowing people down more as they try and get more done.
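The 16-minute figure above checks out; here is the back-of-envelope arithmetic spelled out:

```python
# Back-of-envelope check of the time cost claimed above.
delay_ms = 100          # per-action animation delay
actions_per_minute = 20 # the "minimal" rate from the comment
work_minutes = 8 * 60   # an 8-hour work day

wasted_ms = delay_ms * actions_per_minute * work_minutes
wasted_minutes = wasted_ms / 1000 / 60  # 16.0 minutes
```

At 200 actions per minute the same arithmetic gives 160 minutes, which is why the comment calls the cost real.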


The human brain dedicates quite a bit of hardware to some of the best visual diffing algorithms in the known universe. Animations just confuse those algorithms.

Um, this is absolutely false. There's actually a very famous optical illusion that proves that humans are extremely bad at visually diffing without animations. If you switch between two pictures but separate them with a black screen, you will have a hard time figuring out what changed. The brain is extremely good at detecting movement, not "find the ten differences".

Here's an article that shows gifs that demonstrate the illusion:

https://www.thisisinsider.com/spot-the-difference-braintease...


Separating things with a blank screen isn't representative of a visual element suddenly appearing/disappearing though. If they have to resort to that to confuse you, then it shows that the brain is good at spotting the difference; it's almost like adding an animation if anything, a fade out and back in. This is also with a far more complex scene than most programs, so again not representative.

Get rid of the blank screen in-between and tell me people have trouble spotting the difference.


> Get rid of the blank screen in-between and tell me people have trouble spotting the difference.

If you blink, you'll miss it. That's the problem: if you're not directly looking at the part that's changing (or near it), you'll miss it. Either way, our brains are not wired up to expect objects to suddenly blink into existence.


> UX designers demand stuff

Really? Demand? This is an installer that you will see maybe once a year, probably not even that. You will forget everything about it the moment it terminates, so why not use the default progress bar the OS provides and call it a day?


The installer is the first impression your user will have of your application.

And that hanging leaves a great impression :)

Then most companies have impressed on me that they hire chimpanzees to design/code Windows installers and the like.

> go make this UI with pure Win32 primitives

I happen to be both a demoscener and Win32 programmer, so I know exactly how to do those sorts of effects without bloat. That doesn't mean I necessarily advocate doing it, however.

It's ironic that you mention fancy progress bars, when the ones in the VS installer are even less fancy than the built-in OS ones. 2 colours, no 3D border, not even a gradient.

One of my annoyances is applications which reinvent standard OS controls and UI. The OS has (or unfortunately, more like had these days... but that's a different rant) theming and customisation capabilities for a reason. Don't try to be different just for the sake of being different, you'll stand out and not in a good way. It's an accessibility issue too: what happens when your custom-rendered UI is run on a system where the user has configured a custom colour palette (because he/she is colourblind, or just finds some colour combinations easier to use)? The OS UI widgets all have a consistent style and change all together, more than can be said of custom UIs.

(In applications like games, it often makes perfect sense to build a custom UI because it's part of the gameplay experience; but for a productivity application, a custom UI just gets in the way.)


Granted Microsoft sets a particularly egregious example here, but the way we (don't) track dependencies is f'ed up across the whole software industry. It should be (nay, needs to be) possible - in an automated way, with a visual UI - to see a piece of software as a cloud of features and click and drag away or cut+paste and get JUST the parts of the code the feature ACTUALLY requires. Yes, like dependency inversion, but I'm talking about all the way down to the assembly instructions level.

Before you object that that's just not possible: yes, but only because we've already scrambled the eggs and can't unscramble them. The problem is the way we develop(ed) the whole stack (down to the assembly language and kernel calls level) is bass ackwards. Consider that the foundations of how everything is organized (vis a vis dependency injection or lack thereof in low level libraries) dates back to a time when computers were many orders of magnitude smaller and you were expected to have a good idea what was in the libraries you included. Now, the situation is reversed - you have no idea what's in the libraries you include and no idea what they include, etc. But we still program as if it should be the responsibility of the user to understand the library / program and what it uses transitively.


> a visual UI to see a piece of software as a cloud of features and ... get just the parts of the code the feature actually requires

oooohhh I love that idea. I wonder if this could be utilized for npm.


I've been thinking a lot about Software Archeology, trying to envision what that might be. I think we don't even have the tools yet to do a Site Survey. Somehow I can't fully envision what those tools would look like, but this is my best attempt to describe the vague picture in my head.

I'm seeing it being at least a bit feasible with package managers that allow the ability to config/build packages and install them locally according to needs by the developer .. something like Gobolinux with its bundle garden, combined with a configuration-management tool that can build every package with every possible configuration ..

Bit of a goose, though. As in, not sure it'll fly in its current (scrambled eggs) state. Perhaps a rewrite is really what's needed .. or maybe HaikuOS can deliver this without much ruffling of feathers?


Imagine if the provenance of every code line or machine instruction was tracked all the way from what the human wrote to what the machine executes. Combined with data flow analysis, itself combined with runtime analysis via a feedback process like Profile Guided Optimization.

The outcome would be somewhat like a package management tool, but one which can subdivide & prune a package even if the original author wrote it monolithically.

It could also provide feedback in the IDE showing which lines are "dead code" that don't contribute to the features which you've declared (via tests) are important.


Yeah I agree. What OP is describing would require an architectural shift in the way npm packages are built, IMO.

Sidenote, this is related, and pretty interesting: https://npm.anvaka.com


Visual Studio's Code Architecture Diagrams are a good idea for how this could look and work, I think. It's basically just missing a "Crop all code outside this dependency chain" for any given function.

Isn't that pretty much what tree shaking does? Webpack has had that for quite some time now (except fully automated, without the UI).

I think tree-shaking is made for finding dead code? Seems like the goals are intertwined though.

Personally I don't like tooling that magically does transitive dependency management - it creates the problem of bloat by making it too easy to add stuff - stuff that typically never gets removed.

Doing dependency management at the shared library level rather than source level also results in massive bloat: you might only need one call from a library, but the whole library and all its dependencies, and their dependencies, etc., get pulled in — a ridiculous mess — and then you have the whole different-versions-of-libraries problem.

Things like npm, maven etc are the problem, not a solution in my view.


Dependency management is a tremendous quality of life improvement, but you're not wrong: deep webs of dependency suck.

The Java ecosystem is a better about this than npm, but good public libraries need to make the aesthetic choice to keep minimal dependencies.

Helper/toolkit libraries like Guava or Lodash should pretty much be end user only. Your gzip library doesn't need em.

I get they're convenient, but if you're going to use 3 helpers in 5 places just fork those little bits and add them to your own codebase. DRY is for applications not libraries.

I like to see dependencies no more than 5 deep which is usually at the edge of manageability. For Java the critical path looks like:

Internal framework -> internal client lib -> RPC lib -> serialization -> bytecode hackery lib


There's also the problem that Apple refuses to write decent software or follow the rules on Windows (which is why your Apple HFS driver failed.) Microsoft, on the other hand, writes some of the best MacOS and iOS software.

Wait, so an IDE installer pulling in a web framework for .NET that initializes an audio driver that requires HFS, all so that a progress bar can be better than their own native one (pick an API: WPF, Win32, ...), is somehow not Microsoft’s fault to some degree? I find it really hard to believe they write some of the best iOS/macOS software if they release this shit on Windows.

I've never gotten the hate for static linking, which would avoid issues like this. You'd always have the version of your dependencies that you're expecting.

Also, dynamic linking completely neuters LTO. There's not much point to (theoretically) saving a few megabytes of RAM when you're pulling in twice as much unused code.


> I've never gotten the hate for static linking

I believe most of the dislike for static linking can be traced to a single incident: a really bad (as in "remote code execution" bad) vulnerability in zlib, CVE-2002-0059. Back then, it was common to statically link to zlib (and zlib is a very popular library, the DEFLATE algorithm it implements being the "standard" compression algorithm), so instead of just replacing a single dynamic library and rebooting, everything had to be audited for the presence of embedded copies of zlib.

Quoting from a message from a few years later (http://mail.openjdk.java.net/pipermail/build-dev/2007-May/00...):

"[...] Updating the main system zlib package was easy, but finding all the embedded copies was a nightmare. I think we ended up grepping every binary and library in every package for symbols and other bits of code that looked like zlib. All the different versions involved compounded this. And when everything is found and patched and built and tested there's the cost of bandwidth to distribute all the extra stuff. So when people talk of removing static libraries they're talking about real costs -- time and money. After zlib there was a definite feeling of "never again"."


That's not a complaint about static linking though, that's a complaint about library embedding. Lots of Professional Enterprise Software as well as poorly designed games use a buttload of dynamic linking, but a few patches here and there and some API/ABI ignorance later, the libraries are impossible to replace.

Unfortunately, this is conflating static linking with bad dependency management. ("Just copy it into your own repo", is a different step, that is unnecessary to do the second step, "and link it statically")

There is no reason I see why you couldn't just have a build that produces the library .a file for zlib, which can then be pulled in as a dependency of your build / linked in statically.

I totally agree, they had a nightmare situation on their hands, but I don't think static linking was solely to blame :-)


I have 100s of binaries on my system. Which ones do I need to relink?

You can write a script to look at package files' makedepends, and rebuild everything that uses the offending library.
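The "script over makedepends" idea can be sketched as a reverse-dependency walk. The package data below is invented for illustration; a real script would read it from the package manager's metadata.

```python
# Sketch (hypothetical package data): given each package's build-time
# dependencies, find every package that transitively depends on a
# vulnerable statically linked library and therefore needs a rebuild.

def packages_to_rebuild(makedepends, vulnerable):
    """makedepends maps package -> set of direct build dependencies."""
    affected = {vulnerable}
    changed = True
    while changed:                      # fixed point: keep adding reverse deps
        changed = False
        for pkg, deps in makedepends.items():
            if pkg not in affected and deps & affected:
                affected.add(pkg)
                changed = True
    affected.discard(vulnerable)        # the library itself is patched, not rebuilt
    return affected

deps = {
    "openssh":     {"zlib", "openssl"},
    "libpng":      {"zlib"},
    "imageviewer": {"libpng"},          # affected only transitively
    "vim":         {"ncurses"},
}
packages_to_rebuild(deps, "zlib")  # openssh, libpng, imageviewer
```

Note that `imageviewer` is flagged even though it never names `zlib` directly — that transitive step is exactly what made the zlib incident a nightmare by hand.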

> There's not much point to (theoretically) saving a few megabytes of RAM when you're pulling in twice as much unused code.

But there is. The unused code is still shared.


...and the libraries are typically demand-paged as well, so the truly unused stuff isn't even resident.

That would assume that used and unused code aren't mixed within pages.

That might make a marginal difference if you have large libraries that are used in a large number of applications that run concurrently. Otherwise, the sharing is completely pointless.

>I've never gotten the hate for static linking

vulnerabilities


Sure - but most computers are used by people who are the sole user, so local user to user security as a concept isn't useful.

Most code isn't part of the remote attack surface.

Also, there would be no software bugs if bugs only existed in the past... i.e., bugs come with updates, not just get fixed by updates.

There are trade offs, and Linux developers decided to make it what was easiest for them and remove the ability for the user to choose.

If there wasn't a need for stability and isolation then docker et al wouldn't exist.


Based on what cesarb said, that's a problem with the build system.

What was the HFS driver hung on?

It's a mystery how MS manages to exist. Or maybe it's just consumer market emergent chaos mastery.

Great post, but it made me sad because of how people limit themselves when it comes to tests.

Tests are the things you do after the code works, maybe if you have time. Managers generally only care about hitting a %. Colleagues dodge and avoid them. I don't remember any awards for tests or testers.

But tests are so powerful and so cheap...


Google generally has an excellent attitude towards tests, with most code reviewers refusing to approve changes that lack them. So that's great.

In this particular case the person who reported and then fixed the issue was quite pleased when the LLVM tests became 5x faster (and didn't cause hangs) because they could run them far more frequently, thus catching bugs even earlier in the cycle.


> But tests are so powerful and so cheap...

Tests are cheap on the front end but they incur a cost over time. Let's say your code works and you write 100 tests to cover everything. Then, somewhere down the line, you have to make a change to your code, you refactor, or whatever. Now you have to change all of your tests.

It's great the tests are there, and they serve a useful purpose, but they have an actual cost that can accrue to be substantial.


> But tests are so powerful and so cheap...

Most evidence suggests the opposite, though tests have become better in time.


Yeah, I firmly support testing, but if the code you are working on isn't explicitly written to make writing tests easier, then odds are (in my experience) that writing the tests will take longer than writing the code.

I have been known to write shitty code, but am always looking to make the next set of code less shitty, or even redo something I'm working on if the time allows. I'm getting better, but have a long way to go. I have a feeling that my code would be bad to write tests against. What would make code easier to test against? Small bits of code meant to be included that do one specific thing? Wrapping that included code into functions and/or objects and then creating a test suite to hit all methods etc.?

I'm not a real proponent of TDD, but... try writing some tests first, when you develop a new feature. Or at least, write them sooner - before your implementation is "done"; write them when you think you figured out the design. See, tests are a new use of your API/interfaces - if you find it hard to test stuff that's important, maybe you didn't model the problem right?

Rules of thumb:

- Don't test implementation, test business workflows ("functional/component tests" are most important; unit tests are sometimes/often nice, but don't overdo it! If a simple refactoring breaks lots of tests, you're doing it wrong - testing implementation detail, not business logic); e2e tests are good and required, but are often slow and when they fail they don't necessarily always isolate the problem very well

- Seek a functional coding style, once you get used you'll find out it is easier to test and easier to reason about (no state means you just test the logic, and it's easy to unit-test too)

- Largely ignore code coverage (use it as a guideline to see whether there are important parts of your app that you ignored/ whether you forgot to add tests for some business workflows/ corner cases).

- Avoid test hacks like calling private methods via reflection or whatnot. Remember, tests exercise your APIs - you either have the APIs wrong, or you're trying to test irrelevant implementation details.

- Look for invariants, and test those. Things that should always be true. Often times, there are multiple acceptable results - avoid exact-match tests when that happens (e.g. if you make a speech-to-text system, don't test that audio clip X produces exact output Y; often times, a genuine improvement in the algorithm might break some of your tests).
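The invariant-testing rule of thumb above can be illustrated with a toy example. The slug generator below is hypothetical; the point is that the assertions check properties that should hold for any acceptable output, rather than pinning one exact string.

```python
# Sketch: testing invariants rather than exact output, for a
# hypothetical URL-slug generator where several outputs are acceptable.
import re

def slugify(title):
    # Implementation detail that may legitimately change over time.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def check_slug_invariants(title):
    slug = slugify(title)
    assert slug == slug.lower()                  # always lowercase
    assert re.fullmatch(r"[a-z0-9-]*", slug)     # URL-safe characters only
    assert not slug.startswith("-")
    assert not slug.endswith("-")
    return slug

check_slug_invariants("Hello, World!")
```

If a later refactor switches the separator or the whitespace handling, these invariants keep passing as long as the contract (lowercase, URL-safe, trimmed) still holds.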

TBH I think correct testing is much harder than the industry gives it credit for. Maybe that's why it's so rarely encountered in practice :)


I'll take issue with your last point:

"Look for invariants, and test those. Things that should always be true. Often times, there are multiple acceptable results - avoid exact-match tests when that happens (e.g. if you make a speech-to-text system, don't test that audio clip X produces exact output Y; often times, a genuine improvement in the algorithm might break some of your tests)."

No, you should test for exactly the output you have coded to generate. Otherwise, you do not know when you have a behavior regression. You would expect to have to update the speech-to-text tests when you modify the speech-to-text algorithm. But if you're modifying another algorithm, and you start seeing tests break in the speech-to-text algorithm, you're probably introducing a bug!

A failed test means nothing other than the fact that you have changed behaviour -- and should therefore trigger on any behavioural change. It's your opportunity to vet the expectations of your changes against the actual behaviour of the changed system.


I respectfully disagree. A test should fail, ideally, only when behavior change is undesirable (i.e. contracts are broken). Optimizations, new features etc. should not break existing tests, unless old functionality was affected. And then there's the whole thing about separating functional from performance concerns - even degraded performance shouldn't fail the functional tests.

In fact, the example I gave was real-life - a friend from Google changed their speech recognition tests to avoid exact matches and it was a significant improvement in the life & productivity of the development team.

[edit] There's also another damaging aspect of exact-match tests: they often test much more than what's intended. Take for instance a file conversion software (say from PDF to HTML). You add a feature to support SVG, and test it with various SVG shapes - it's easy & tempting to just run the software on an input PDF, check that the output HTML looks right (especially in the relevant SVG parts), and then add an exact-match test. Job done, yay! Except that, you do this a lot, and it will slow you down like hell. Because when it fails in the future, it's very hard to tell why (was the SVG conversion broken? or is it some unrelated thing, like a different but valid way to produce the output HTML?). Do this a lot and you won't be able to trust your tests anymore - any change and 400 of them fail, ain't nobody got time to check in depth what happened, "it's probably just a harmless change, let me take a cursory look and then I'll just update the reference to be the new output".


You're building a bit of a straw man. If you have 400 tests that fail with a single behavioural change, why are you testing the same thing 400 times? And you don't need an in depth investigation unless you didn't expect the test to break. And if you did expect the test to break, then you ensure that the test broke in the correct place. If a cursory glance is all you need in order to confirm that, then that's all you need. Tests are there to tell you exactly what actually changed in behaviour. The only time this should be a surprise is if you don't have a functional mental model of your code, in which case it's doubly important that you be made aware of what your changes are actually doing.

In your Google example, would their tests fail if their algorithm regressed in behaviour? If it doesn't fail on minor improvements, I don't see how they would fail on minor regressions either.


400 is an arbitrary number, but it's what sometimes (often?) happens with exact-match tests; take the second example with the PDF-to-HTML converter, an exact match tests would test too much, and thus your SVG tests will fail when nothing SVG-specific changed (maybe the way you rendered the HTML header changed). Or maybe you changed the order your HTML renderer uses to render child nodes, and it's still a valid order in 99% of your cases, but it breaks 100% of your tests. How do you identify the 1% that are broken? It's very hard if your tests just do exact textual comparison, instead of verifying isolated, relevant properties of interest.

In my Google example, the problem is that functional tests were testing something that should've been a performance aspect. The way you identify minor regressions is by having a suite of performance/ accuracy tests, where you track that accuracy is trending upwards across various classes of input. Those are not functional tests - any individual sample may fail and it's not a big deal if it does. Sometimes a minor regression is actually acceptable (e.g. if the runtime performance/ resource consumption improved a lot).


> It's very hard if your tests just do exact textual comparison, instead of verifying isolated, relevant properties of interest.

I think you're assuming something I never actually specified: that exact-match testing means testing for an exact match on the entire payload. That's a strawman, and yes, you will have issues exactly like you describe.

If your test is only meant to cover the SVG translation, then you should be isolating the SVG-specific portion of the payload. But then execute an exact match on that isolated translation. Now that test only breaks in two ways: It fails to isolate the SVG, or the SVG translation behaviour changes.
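That isolate-then-exact-match approach can be sketched briefly. The converter output and extraction regex here are invented stand-ins for whatever the real PDF-to-HTML tool produces:

```python
# Sketch: exact-match only the isolated part of the output under test.
# The rendered HTML below is a hypothetical converter output.
import re

def extract_svg(html):
    """Isolate the <svg> fragment so unrelated page changes can't break the test."""
    match = re.search(r"<svg>.*?</svg>", html, re.DOTALL)
    return match.group(0) if match else None

rendered = "<html><head>v2 header</head><body><svg><rect/></svg></body></html>"

# Header or node-ordering changes elsewhere don't touch this assertion;
# only changes to the SVG translation itself do.
assert extract_svg(rendered) == "<svg><rect/></svg>"
```

The test now breaks in only the two ways described above: the isolation fails, or the SVG translation behavior changes.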

> In my Google example, the problem is that functional tests were testing something that should've been a performance aspect. The way you identify minor regressions is by having a suite of performance/ accuracy tests, where you track that accuracy is trending upwards across various classes of input. Those are not functional tests - any individual sample may fail and it's not a big deal if it does. Sometimes a minor regression is actually acceptable (e.g. if the runtime performance/ resource consumption improved a lot).

... "Accuracy", aka the output of your functionality is a non-functional test? What?

And I never said regressions aren't acceptable. I said that you should know via your test suite that the regression happened! You are phrasing it as a trade-off, but also apparently advocating an approach where you don't even know about the regression! It's not a trade-off if you are just straight-up unaware that there's downsides.


> That's a strawman

It wasn't intended to be; yes that's what I meant; don't check full output, check the relevant sub-section. Plus, don't check for order in the output when order doesn't matter, accept slight variation when it is acceptable (e.g. values resulting from floating-point computations) etc. Don't just blindly compare against a textual reference, unless you actually expect that exact textual reference, and nothing else will do.

> "Accuracy", aka the output of your functionality is a non-functional test? What?

Don't act so surprised. Plenty of products have non-100% accuracy, speech recognition is one of them. If the output of your product is not expected to have perfect accuracy, I claim it's not reasonable to test that full output and expect perfect accuracy (as functional tests do). Either test something else, that does have perfect accuracy; or make the test a "performance test", where you monitor the accuracy, but don't enforce perfection.

> And I never said regressions aren't acceptable.

Maybe, but I do. I'm not advocating that you don't know about the regression at all. Take my example with speech - you made the algorithm run 10x faster, and now 3 results out of 500 are failing. You deem this to be acceptable, and want to release to production. What do you do?

A. Go on with a red build?

B. "Fix" the tests so that the build becomes green, even though the sound clip that said "Testing is good" is now producing the textual output "Texting is good"?

I claim both A & B are wrong approaches. "Accuracy" is a performance aspect of your product, and as such, shouldn't be tested as part of the functional tests. Doesn't mean you don't test for accuracy - just like it shouldn't mean that you don't test for other performance regressions. Especially so if they are critical aspects of your product/ part of your marketing strategy!
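The "accuracy as a performance test" idea can be sketched like this. The clips and transcripts are invented examples; the point is a tracked metric with a threshold, rather than a functional test that goes red on any single misrecognized sample:

```python
# Sketch: track recognition accuracy as a metric with a threshold,
# instead of failing the build on any individual misrecognized clip.
# The transcripts here are invented examples.

def accuracy(results):
    """results: list of (expected_transcript, actual_transcript) pairs."""
    correct = sum(1 for expected, actual in results if expected == actual)
    return correct / len(results)

results = [
    ("testing is good", "testing is good"),
    ("hello world",     "hello world"),
    ("ship it",         "sip it"),        # one regression out of three
]

acc = accuracy(results)
assert acc >= 0.6, f"accuracy regressed below threshold: {acc:.0%}"
```

A dip from 100% to 66% here is visible in the tracked metric without turning the functional build red, so the 10x-faster release can still ship with the regression known and recorded.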


OK, I'm caught up with you now. Yes, I agree with this approach in such a scenario. I would just caution against throwing out stuff like that as a casual note regarding testing without any context, like you did. Examples like this should be limited to non-functional testing, aka metrics, which was not called out at all originally. And it's a cool idea to run a bunch of canned data through a system to collect metrics as part of an automated test suite!

There is a single cause of all unit-testing difficulty that I have ever seen: isolation. Functionality must be isolated to be testable.

Examples:

* You don't isolate side-effects, such as file creation. Now you must execute file creation every time you execute that functionality. Even though (I assume) you are not interested in testing the file system, you are now implicitly testing it.

* You don't isolate external dependencies, such as requiring a database connection. Now you can't test without standing up a temporary database.

* You don't isolate logic, such as doing several complicated validations in sequence within a single functionality. Now in order to test any of the validations, you must also arrange to pass all the previous validations.
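The first bullet can be made concrete with a small example (the `build_report`/`save_report` names and the injection style are illustrative, not from the comment): inject the file-writing step, and the logic becomes testable without touching the file system.

```python
def build_report(records):
    """Pure logic: trivially testable in isolation."""
    return "\n".join(f"{name}: {value}" for name, value in records)

def save_report(records, write=None):
    # The side effect is injected. Production passes nothing and gets a real
    # file writer; tests pass a fake that just records what would be written.
    if write is None:
        def write(text):
            with open("report.txt", "w") as f:
                f.write(text)
    text = build_report(records)
    write(text)
    return text
```

A test can now pass `write=captured.append` and assert on the content, without implicitly testing the file system on every run.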


The issue I run into most often is that a system or set of functions I want to test relies on or resides in a complicated object that requires proper initialization. This means that to run tests I need to manually perform all of this initialization, including spoofing complicated internal data structures and whatnot.

The fix? Better compartmentalization in a lot of cases. Or writing classes with a viable form of default initialization in mind.
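One hypothetical shape of "default initialization in mind": heavy dependencies default to cheap in-memory stand-ins, so a bare constructor call is valid and tests don't have to spoof internal data structures (the `Translator` class here is invented for illustration).

```python
class Translator:
    # A bare Translator() is fully usable: the dictionary defaults to an
    # empty dict and logging to a no-op, so tests need no elaborate setup.
    def __init__(self, dictionary=None, logger=None):
        self.dictionary = dictionary if dictionary is not None else {}
        self.log = logger if logger is not None else (lambda msg: None)

    def translate(self, word):
        self.log(f"translate({word!r})")
        return self.dictionary.get(word, word)
```

Production code supplies the real dictionary and logger; a test exercises `translate` with two lines of setup instead of replicating the whole initialization sequence.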


Yes I think this is the problem. The attitudes I mentioned come from people who tried testing and got fed up because of the many legitimate issues they had.

Unit testing in particular is neither powerful nor cheap.

Unit tests typically test only a handful of inputs, which cover a completely insignificant fraction of possible inputs. This is the opposite of powerful.


The parent comment says “tests”, not “unit tests”. I would expect a set of unit tests where appropriate, integration tests, system tests, regression tests, fuzzing, etc.

“Powerful” might be a bit subjective here, but in order to write tests, it helps to have a set of theories about what kinds of bugs might be in your program. You can then write tests that identify with high probability the presence of those specific bugs or categories of bugs. In that narrow sense, tests are statistically powerful. They reject the null hypothesis (this type of bug does not exist) with high probability when it is false. I don’t want to conflate this definition of statistical power with the colloquial use of “power”, though.


Bruce's write-ups are just great, a sheer joy to read.

True - and they also remind me why I will stay away from Windows

I regret to inform you that, while I think the current Windows development methodology could use better testing (to put it mildly), things like [1][2] still crop up in other platforms.

[1] - https://bugzilla.kernel.org/show_bug.cgi?id=201685

[2] - http://lkml.iu.edu/hypermail/linux/kernel/1811.2/01328.html


[2] was resolved quickly. A much revised version has landed recently.

[1] is a strange bug, because the devs have consistently been unable to reproduce it, despite constantly looking into the issue. Users of ZFS have also hit problems, suggesting that it is not an EXT4 bug but a very subtle problem elsewhere in the block subsystem.


[2] was resolved after it landed in stable release branches, which is a bit late for how much impact it had.

[1] was, in fact, root-caused to a blk-mq bug.

https://patchwork.kernel.org/patch/10712695/


Don't see anything wrong with [2]. Security by default, if you want to disable protection to get some performance on defective hardware, that should be an active choice

The problem(s) with it were that it was underspecified _how_ negative the workload impacts were for some cases, and that it ended up in stable branches default-on before getting reverted for the impact.

As Linus remarked in the thread, the people who were paranoid about the security implications already went the BSD route and disabled SMT, while the people who don't worry about it that much suddenly get a nasty perf impact by default.

To quote an Intel rep in the thread: "Using these tools much more surgically is fine, if a paranoid task wants it for example, or when you know you are doing a hard core security transition. But always on? Yikes."

Yes, your system should default to being secure. But there's a sliding scale when deciding on security flaw mitigations of user disruption versus level of security given.

In this case, the user disruption was medium-high and the benefits did not outweigh it, so the default was dialed back.

If this were something every skiddy toolkit were leveraging to exfiltrate astonishing amounts of data, this might fall the other way. But right now, that's not where we are.


You're only hearing about this issue because they fixed it. And you know very well that every OS has had problems that are just as dumb from time to time.

That may well be, but DLLs on Windows have been dumb for a long, long time. And they're still dumb.

Honest question, how does this particular problem differ from classic dependency hell? It sounds like exactly the same sort of issue I see on Linux blogs when e.g. a broken PNG display library causes crashes in a program that doesn't display PNGs (but uses a library that uses a library that does).

It's correct that there are similarities, but the difference in this case is that it's a system-wide policy enforcing the use of a system-provided DLL that is causing the issue (everywhere), whereas your scenario is more of a userspace issue that an admittedly skilled user can get themselves out of. In your case, I'd fix it with some hot LD_PRELOAD action - I'm not sure how I'd endeavour to address this on a production Windows system, however.

Lol, because of bugs? I sense more bias and it's not bug related.

I was lucky enough to have a class with him at DigiPen. He's an optimization genius.

This reminds me of a desktop heap exhaustion problem IE would regularly trigger for me back in the XP days:

https://weblogs.asp.net/kdente/148145

It all came down to a registry setting that MS neglected to bump up much from the original Win 98 defaults. IIRC the conservative default even persisted into Win 7.

That 3MB limit would bring down my 48GB system...


This bug is fixed in the latest insider builds at least.

Using the author's own testing tool:

With the Spring 2018 release:

    F:\tmp>.\ProcessCreatetests.exe
    Main process pid is 46940.
    Testing with 1000 descendant processes.
    Process creation took 2.309 s (2.309 ms per process).
    Lock blocked for 0.003 s total, maximum was 0.000 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 0.656 s (0.656 ms per process).
    Lock blocked for 0.001 s total, maximum was 0.000 s.
    Average block time was 0.000 s.

    Elapsed uptime is 7.08 days.
    Awake uptime is 7.08 days.

    F:\tmp>.\ProcessCreatetests.exe -user32
    Main process pid is 44584.
    Testing with 1000 descendant processes with user32.dll loaded.
    Process creation took 2.624 s (2.624 ms per process).
    Lock blocked for 0.014 s total, maximum was 0.001 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.617 s (1.617 ms per process).
    Lock blocked for 1.122 s total, maximum was 0.648 s.
    Average block time was 0.026 s.

    Elapsed uptime is 7.08 days.
    Awake uptime is 7.08 days.
With an insider build:

    C:\tmp>.\ProcessCreatetests.exe
    Main process pid is 9928.
    Testing with 1000 descendant processes.
    Process creation took 2.440 s (2.440 ms per process).
    Lock blocked for 0.003 s total, maximum was 0.002 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.306 s (1.306 ms per process).
    Lock blocked for 0.003 s total, maximum was 0.001 s.
    Average block time was 0.000 s.

    Elapsed uptime is 4.78 days.
    Awake uptime is 3.93 days.

    C:\tmp>.\ProcessCreatetests.exe -user32
    Main process pid is 14144.
    Testing with 1000 descendant processes with user32.dll loaded.
    Process creation took 4.756 s (4.756 ms per process).
    Lock blocked for 0.022 s total, maximum was 0.004 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.823 s (1.823 ms per process).
    Lock blocked for 0.003 s total, maximum was 0.001 s.
    Average block time was 0.000 s.

    Elapsed uptime is 4.78 days.
    Awake uptime is 3.93 days.

There's no longer a difference in lock blocked time whether or not you load user32 during process destruction. Nor does the very obvious mouse stuttering still happen.

Woah! That is fascinating. I had heard nothing about this. Your uptime is a bit shorter on the insider build but the change in lock blocking is too dramatic to be explained by that.

I notice that all of the elapsed times are worse on the insider build - is that perhaps a slower machine? And are there enough CPUs on that machine to trigger the bug? That is, I'd like to believe that the bug is fixed but I'm skeptical.


True, the above comparison might not have been the most scientific :P The Spring 2018 results were run on a much more powerful desktop than the Surface Pro 3 used for the insider results.

Here are the results for the April 2018 Update rerun on the same Surface, for a more apples-to-apples comparison:

    C:\tmp>.\ProcessCreatetests.exe
    Main process pid is 6448.
    Testing with 1000 descendant processes.
    Process creation took 4.382 s (4.382 ms per process).
    Lock blocked for 0.007 s total, maximum was 0.000 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 0.592 s (0.592 ms per process).
    Lock blocked for 0.002 s total, maximum was 0.002 s.
    Average block time was 0.000 s.

    Elapsed uptime is 0.01 days.
    Awake uptime is 0.01 days.

    C:\tmp>.\ProcessCreatetests.exe -user32
    Main process pid is 11364.
    Testing with 1000 descendant processes with user32.dll loaded.
    Process creation took 4.707 s (4.707 ms per process).
    Lock blocked for 0.009 s total, maximum was 0.000 s.
    Average block time was 0.000 s.

    Process termination starts now.
    Process destruction took 1.248 s (1.248 ms per process).
    Lock blocked for 0.904 s total, maximum was 0.902 s.
    Average block time was 0.181 s.

    Elapsed uptime is 0.01 days.
    Awake uptime is 0.01 days.

The mouse movement hanging behaviour is easily evident with the April 2018 release. I didn't notice the same on the insider build.

Great read but what’s up with the 6 banner ads interspersed in the content page ? Maybe a few too many ?

With NoScript blocking execution of javascript, there are zero banner ads interspersed in the content page.

Maybe it's mobile view or something? No banners in text block on my laptop. Some in the right pane, some down below.

I get the "congrats, Android user" redirect ads on mobile, so it seems fairly over the top.

AdBlock might help.

Also good to know: I seem to recall similar, if not worse, slow-downs happening if you try to pull in Windows Sockets (WS2_32.dll).

Unless the compiler can prove you're never going to run into that case, it can't remove the call, and because the call is an imported function it still has to create the import and have an entry in the IAT for it, so it needs to be resolved at load time. Not all that surprising IMHO.
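The same mechanism has a close analog in Python, sketched below: a top-level import is resolved at load time even if the only function using it is unreachable, while moving the import inside the function is roughly the equivalent of /DELAYLOAD (shlex standing in for CommandLineToArgvW, purely as an analogy).

```python
def parse_command_line(cmdline):
    # Deferred ("delay-loaded") dependency: shlex is only imported if this
    # function is actually called. A top-level `import shlex` would pull the
    # module in at load time even if parse_command_line were never reached,
    # just as an IAT entry forces shell32.dll to be resolved at load time.
    import shlex
    return shlex.split(cmdline)
```

After the first call, `'shlex' in sys.modules` becomes True; a process that never calls the function never pays for the load - which is exactly the effect the delay-load fix in the article relies on.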

The author even said "we immediately knew what to do", which is kinda the contrary of surprising.

The interesting bit is not that a slow loading dependency got imported anyway, but why that dependency is slow and that it can get imported very easily indirectly.


Is there a way to tell whether a .NET program is also affected by this behavior?

I just checked a C# app I had lying around with depends.exe, and it depends on user32.dll, which depends on gdi32.dll.

It's hard to imagine a Windows program that wouldn't depend on one of those critical DLLs. The only thing that saves us is that we don't often create and destroy hundreds of processes at a time.


wow, how come the mere presence of a function causes a DLL to get loaded? Is it because in order to compile, the DLL (or its export definition) needs to be present, and the compiler does some magic because of that?

Lazy binding isn't without downsides: it requires internal synchronization of its own, which means it's possible to write multithreaded programs that suffer latencies due to lock contention during symbol resolution. Depending on the OS (not sure this applies to Windows), it can also mean that what used to be fatal startup errors are delayed long into the process's life.

Because it was linked in, extra GDI DLLs were also linked in, leading to GDI cleanup at process exit.

The linker has no idea that the function is unreachable, it has to link it in to resolve the external symbols.


The title is really misleading. "Dependencies cause overhead even when unneeded" would be more accurate.

That strips out the interesting part, though. A 5X slowdown for an actual, useful, real-world project is interesting. Vague “overhead” could mean nothing more than some bigger binaries.
gcb0 8 days ago [flagged]

tl;dr: the method's presence pulls in a dependency that runs slow, buggy code.

So, clickbait.


This is a stupid dismissal of a well-written piece of investigative debugging. This article is the kind of thing I'd like to see more of on HN.

I also liked the article but to be fair, the title is actually a little clickbaity. The slowdown has nothing to do with not-called functions, it's just about DLL-dependencies.

Not clickbaity at all, it's the essence of what makes the problem interesting. The fact that a DLL dependency can slow down your program at shutdown is not at all intuitive, particularly when it's a system DLL that should be bullet-proof.

I don't think that's quite right: the method presence pulls in a dependency, but the code is not buggy and the slow bit is the loading of the dependency itself.

Technically it's not even the loading of the dependency but its unloading: cleaning up GDI objects[0] on process destruction has become much slower in W10AE and a system-global lock is held during that event.

[0] some of which are created automatically when a specific dll is loaded, possibly transitively


That sounds like there might be other programs with severe regressions out there

That doesn't begin to describe the problem: The presence of an unused dependency causes the entire operating system to briefly stop processing input events. A background process could have this problem and you would see performance apparently drop through the floor for completely unrelated processes.

I honestly would have been tempted to give this article the clickbait title: "Windows has a DoS vulnerability in the GDI subsystem".


Imagine how slow it could be if you use layers and layers of abstraction (e.g. the Java lasagna programming style).

> "The first fix was to avoid calling CommandLineToArgvW by manually parsing the command-line."

> "The second fix was to delay load shell32.dll."

If your build pipeline is continuously spawning processes all over, to the point where "delay loading" makes a significant difference - it's time to start re-evaluating the entire pipeline and the practices employed.


Do you know of a build system that can handle a source tree as large as an entire web browser without spawning a lot of processes?

It's hard to tell what, if anything, you are recommending here. Pass thousands of files to a single compiler invocation? Ignore the problems and stop trying to make process creation and clean-up faster?


[flagged]


> Are they employing caching of binaries/object files? Are they running a continuous build? Which parts of the build actually take the most time to go through? Can they benefit from building concurrently on multiple machines?

So you're suggesting things they already do.

And if you're using multiple machines, you probably want to max out each machine, so it matters a lot if there's code that falls over and dies when applied to a large number of cores.


> Pass thousands of files to a single compiler invocation?

Sure. Or pass it a file with all the filenames. Or have the compiler work as a server that takes compilation requests over a socket. It's not like passing thousands of filenames between two processes is a deep unsolved problem.
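The "file with all the filenames" approach is the response-file convention compilers already support (`cl @args.rsp`, `gcc @args.rsp`). A minimal sketch of the expansion side, assuming whitespace-separated names and no quoting:

```python
def expand_response_files(argv):
    # Replace any "@file" argument with the whitespace-separated names
    # listed in that file, so thousands of inputs fit in one invocation
    # without hitting OS command-line length limits.
    expanded = []
    for arg in argv:
        if arg.startswith("@"):
            with open(arg[1:]) as f:
                expanded.extend(f.read().split())
        else:
            expanded.append(arg)
    return expanded
```

Real compilers additionally handle quoting and nested response files, but the core idea is this simple: one process invocation, arbitrarily many inputs.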


Or just spawn thousands of processes which has been done for the last 40 years without particular issues.

So, the solution to concurrency problems is to serialize everything?

"Concurrency is everything serialised, properly."

The posts we are replying to here seem to have a very narrow concept of what 'properly' entails in this case.

The second paragraph mentions that this is about a test suite. It has to “spawn processes all-over” to do its job.

Technically, it doesn't have to. You can put your whole test suite into a single executable, making it run extremely fast for C/C++ projects, where process startup is often much slower than running a single test unit. This approach is used in some OSS projects I've worked on, but it also has its downsides.
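A rough illustration of why the single-executable approach wins (hypothetical micro-benchmark, trivial `assert` standing in for a test body): spawning one process per test pays startup/teardown cost every time, while running the same checks in one process does not.

```python
import subprocess
import sys
import time

def run_as_processes(n):
    # One fresh interpreter per "test": pays process startup cost n times.
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "assert 1 + 1 == 2"], check=True)

def run_in_process(n):
    # The same n "tests" as plain function-level checks in one process.
    for _ in range(n):
        assert 1 + 1 == 2

start = time.perf_counter()
run_as_processes(10)
per_process_seconds = time.perf_counter() - start

start = time.perf_counter()
run_in_process(10)
in_process_seconds = time.perf_counter() - start
```

The gap is typically several orders of magnitude, which is exactly the overhead the article's process-destruction lock magnifies.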

Some tests (e.g. unit tests) can run like this, yes. Other tests, including some benchmarks, are not meaningful when wrapped in a single process. Invocation speed and specifics matter.
kornish 8 days ago [flagged]

Did you read the entire article? The whole point was that "delay loading" a particular DLL prevents a static analysis in the compiler from inserting hooks to perform expensive operations.

> "Did you read the entire article? The whole point was that "delay loading" a particular DLL prevents a static analysis in the compiler from inserting hooks to perform expensive operations."

I actually have read the entire article. Have you?

Your explanation has absolutely nothing to do with the performance gains observed. Moreover, in the context of delay-loaded DLLs, your explanation actually makes no sense whatsoever.

Delay-loaded DLLs, a linker/loader optimization Microsoft has offered since the days of Visual C++ 6.0 (1998), simply mean most process invocations in OP's case won't actually end up loading said DLLs, reducing the amount of time spent in DLL_PROCESS_ATTACH/DLL_PROCESS_DETACH (and specifically during destruction, in the kernel).

masklinn 8 days ago [flagged]

> I actually have read the entire article.

You provide no evidence of it, and ample evidence to the contrary.

> Delay loaded DLLs, a linker/loader optimization Microsoft has offered since the days of C++ 6.0 (1998), simply means most process invocations in OP's case won't actually end up loading said DLLs, reducing the amount of time spent in DLL_PROCESS_ATTACH/DLL_PROCESS_DETACH.

It also avoids loading gdi32.dll, which avoids creating a bunch of GDI objects, which avoids taking the "destroy GDI objects" codepath on process termination… which is the bit that is both slow and globally serialised.

TFA's final section even demonstrates the difference it makes: a 30% increase in start time, including a 300% increase in lock-contention time, but a 200% increase in shutdown time, including a 400% increase in lock contention. Process shutdown is almost entirely serialised because (as TFA and its predecessor explain) a system-wide lock is held during GDI cleanup.



