

Finding and Fixing a Five Second Stall - jamesmiller5
http://the-witness.net/news/2012/12/finding-and-fixing-a-five-second-stall/

======
eric-hu
While I'm a web dev and not a Windows coder, this post is a great example of
why I come back to HN.

> From experience, I’ve learned that it’s always best to fully understand a
> problem before you fix it. If you just patch over its symptoms but never
> figure out what the problem really was, it will often come back to haunt
> you.

Philosophical-level statements like this really help me as a dev, since one of
my biggest hurdles is getting traction for even simple 1-2 line refactors
that, in my opinion, can simplify code and improve maintainability. Far too
often, I hear "just get it done", and I like to be reminded with statements
like this of why "just get it done" doesn't always result in a better bottom
line for the company.

> For some reason, I took this concept to heart in a programming sense and have
> found it is a good rule to code by. My version of the static discipline,
> adapted for software, is that whenever you are making a modification to a
> piece of code, you should always leave it in a state of stability equal to or
> better than how you found it. And preferably the latter.

I try to do this where I can too, and again, appreciate the philosophical
explanation. It prepares me to more thoroughly and calmly explain my own work
style.

The author laments Windows development, and I can only contrast that with my
experience on an open source web stack (Ruby on Rails, Backbone.js). The
upside of working with open source is that I can crack open the gem (library)
I'm using and investigate the logic myself. I could fix the bugs or extend the
code to do what I want. That is pretty cool.

The downside of working with open source software is that documentation is
horrid. As the author states, man-hours are finite, and people who write OSS
typically don't want to spend their finite hours documenting their code. I
just take these as tradeoffs. It seems like Microsoft invests heavily in
documentation, but I guess the author's point is that they can still do
better.

~~~
shabble
I invariably end up reading the library/framework/whatever sources as well, if
only to clarify a particular point of documentation or something.

But, when bughunting, there's often the dilemma of whether to fix the bug at
the source (and hopefully, submit a patch to upstream), or to work around it
(which can lead to horrific hacks) in your app code.

With the former, you're stuck deploying custom packages that carry your fix
until (or unless) it gets accepted upstream and a new release is rolled, so all
too often I've found myself doing both. I don't know what the solution is;
even a perfect patch to a superhumanly responsive maintainer is going to take
some time to merge and deploy.

On an entirely unrelated note, a superb way to drive yourself insane is to
edit the currently installed system version of the package in question, and
then forget that you did some time later.

~~~
eric-hu
I'm only about 1-2 years into web dev and OSS, and a couple of months into
feeling like I can contribute meaningfully to a gem. With that said, this is
my current ideal Github workflow:

- fork the gem to make changes/fixes
- complete fix and point my app dependency to my branch
- add testing and docs around my fix
- submit pull request to project owner
- point my app dependency back to original gem when merged

That all said, I consider the activity of a gem before I even bother
submitting a pull request. If it hasn't been touched in a month, or if there
are open pull requests a month old or older, I lean heavily toward just writing
a 'my team only' solution.

This is admittedly selfish, but I'm hedging my time and emotional energy. I
don't want to put care into crafting something that I believe to be useful
only to watch it sit around unused because someone else doesn't want to review
my work.

------
JoeAltmaier
Windows is saddled with baroque interfaces that are there only because they've
always been there. Microsoft OS developers try to simulate or stub out old
behaviors, in an effort to keep old app code running without breaking anything
too badly.

Such a crust of backward compatibility code weighs heavily on a decades-old
OS.

~~~
to3m
In fact, if the Windows guys had actually taken this backwards compatibility
thing seriously, and failed to tidy things up by deprecating DirectInput, this
problem probably wouldn't have arisen. DirectInput has a single simple flag to
disable the Windows key :)

"DISCL_NOWINKEY - Disable the Windows key. Setting this flag ensures that the
user cannot inadvertently break out of the application".

(I found this flag to work entirely reliably on Win2K and WinXP, and I'm
pretty sure it worked on Win98 as well. I think it also kindly disabled a
bunch of other things for you, leaving Windows with only Alt+Tab and
Ctrl+Alt+Del. Which is basically exactly what you want for anything full-
screen.)

------
vl
Oh, the joys of Win32 programming.

Back in the day I wrote a networking service that occasionally failed under
stress load with memory corruption. I stared at the rare dumps and re-read
code and documentation to no avail. Finally, after a few days of debugging, I
went ahead and located debug symbols and sources (oh, I happened to work at
Microsoft at the time) for the build of Windows we used at the test lab. It
was a relatively arcane procedure back then (it got better later, but at the
time either the symbol server didn't work or the sources didn't match).

So I set up gflags with memory guards and windbg and started waiting. After a
few days of stress runs it finally crashed again and there it was: a comment in
the code of the crashing library saying "OVERLAPPED can be deallocated at this
time if completion ports are used, but we save this value to it here anyways
for backward compatibility reasons with bla-bla." Glad that you told me, guys;
I guess now I have to rewrite it and refcount the OVERLAPPED! I still don't
know how I could have debugged it without the source access. (Ironically, it
also enlightened me on why a service I had implemented at a startup before was
occasionally crashing as well.)

And don't even get me started on implementing SSL support in the service.

------
breadbox
Reading this really brought back memories of when I was writing Windows code.
It really felt like half of any project was spent on working around the OS,
rather than with it.

~~~
georgemcbay
Now for a lot of developers half of any project is spent working around
various browser quirks -- how far we've come!

------
moconnor
I had the opposite experience yesterday. A colleague and I were tracking down
a rare race condition that became more likely with more threads. A signal
handler was being re-entered, which the man page suggested shouldn't happen
because by default the signal is blocked during handler execution.

Because this was Linux, we were able to read the kernel source to see what
actually happens. It turns out that the signal is only blocked for the current
thread, and Linux will deliver it to the first unblocked thread instead.
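
A rough way to see the per-thread part for yourself, as a minimal sketch and
not the code we were debugging: the function names below are made up for
illustration, SIGUSR1 is an arbitrary choice, and it assumes a Unix system
where SIGUSR1 is not already blocked at startup. Python runs its own signal
handlers only in the main thread, so this does not reproduce the race; it only
inspects the masks to show that blocking a signal in one thread leaves it
unblocked in another.

```python
import signal
import threading


def current_mask():
    # SIG_BLOCK with an empty set changes nothing and returns the calling
    # thread's current signal mask.
    return signal.pthread_sigmask(signal.SIG_BLOCK, [])


def masks_are_per_thread(signum=signal.SIGUSR1):
    """Block signum in the calling thread only, then report per-thread state."""
    result = {}
    ready = threading.Event()
    release = threading.Event()

    def worker():
        # The worker starts before the caller blocks anything, so it inherits
        # the caller's original mask and never modifies it afterwards.
        ready.set()
        release.wait()
        result["worker"] = signum in current_mask()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    ready.wait()

    # Block the signal for the calling thread only.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [signum])
    try:
        result["caller"] = signum in current_mask()
        release.set()
        thread.join()
    finally:
        # Restore the caller's original mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return result


if __name__ == "__main__":
    print(masks_are_per_thread())
```

The caller reports the signal as blocked and the worker does not, which is the
situation where a process-directed signal can still land on the other thread.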

This misunderstanding also came from an API doc that was not explicit enough,
although it was not outright incorrect. But because we had the full-stack
source we didn't need guesswork, a good night's sleep or inspiration. Just
curiosity and persistence.

There's a zen-like calm to knowing you can always, always follow a bug through
the entire system, that everything happens for a reason and you can open an
editor and see what those reasons are.

I will miss this deeply if I ever return to closed-platform development.

~~~
unwind
Did you submit a patch for the documentation to make this particular point
clearer?

------
owenfi
Is there some 'platform complexity' metric along the lines of a
'code complexity' indicator? It would be useful for finding
platforms/languages/environments with less pain and as a way to track
improvements as well.

<http://owenfi.com/post/38255106579/windows-keyboard-apis>

~~~
themckman
You could probably take your normal metrics and compute them for the lines of
code that specifically contain framework/platform code? Can't say how useful
that will be, but I'd be interested to see how those numbers align with my
feelings.
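
Something like this is what I have in mind, as a very rough sketch: the
function name and the sample snippet are made up, the list of platform
identifiers is something you would have to supply yourself, and it only counts
lines instead of running a real complexity metric over them.

```python
import re


def platform_line_share(source, platform_names):
    """Return (platform_lines, code_lines, share) for one source string.

    A line counts as platform code if it mentions any of the given
    identifiers as a whole word. Blank lines are ignored.
    """
    code_lines = [line for line in source.splitlines() if line.strip()]
    if not platform_names or not code_lines:
        return 0, len(code_lines), 0.0
    names = "|".join(re.escape(name) for name in platform_names)
    pattern = re.compile(r"\b(?:" + names + r")\b")
    platform_lines = [line for line in code_lines if pattern.search(line)]
    share = len(platform_lines) / len(code_lines)
    return len(platform_lines), len(code_lines), share


# Hypothetical snippet of application code, used only as input text.
SAMPLE = """\
HHOOK hook = SetWindowsHookEx(WH_KEYBOARD_LL, proc, instance, 0);

int frames = 0;
frames += 1;
UnhookWindowsHookEx(hook);
"""

if __name__ == "__main__":
    print(platform_line_share(SAMPLE, ["SetWindowsHookEx", "UnhookWindowsHookEx"]))
```

From there you could feed only the matching lines (or the functions containing
them) to whatever complexity tool you already use.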

------
wallflower
Reminds me of this one job where we had to call LockWindowUpdate to get an
Excel/VBA-based application to perform.

<http://msdn.microsoft.com/en-us/library/windows/desktop/dd145034(v=vs.85).aspx>

~~~
kevingadd
LockWindowUpdate really isn't that unusual if you understand how painting
works in Windows. The terminology behind it is weird and so is the fact that
it's one window at a time, but it's not all that strange - it's an old enough
primitive that you probably couldn't justify making it a per-window flag bit.

------
chrisdevereux
My favourite win32 api is this little gem for flushing a file output buffer to
the physical disk: <http://support.microsoft.com/kb/148505>

~~~
mikeash
I pity the poor tech writer who had to constantly write "Commode.obj" while
maintaining a serious tone.

------
revelation
They are using a low-level hook to disable the expected behavior of a "get me
out of here" key.

They deserve all the stalls they can get. The reason it was so complicated to
get this behavior in the first place is because it's heavily _discouraged_.

Not to mention that calling SetWindowsHookEx will flag you as a keylogger in
every antivirus snakeoil in existence.

~~~
speednoise
So heavily discouraged that it was in this Microsoft sample code linked in the
OP:
<http://msdn.microsoft.com/en-us/library/windows/desktop/ee416808%28v=vs.85%29.aspx>?

~~~
revelation
That code is offered as part of the extensive documentation that Microsoft
provides on its various frameworks.

It by no means sanctions what still remains a terrible hack. A badly written
hook could render a user's system essentially unusable, and as I explained, it's
a very common heuristic for antivirus software.

The point remains: the fact that you have to resort to globally hooking all
input to break the expected behavior of a key cannot possibly serve as an
example of bad documentation or bad API design. It in fact proves the very
opposite: you have to work very hard to break what users expect.

~~~
jamesmiller5
Considering that neither party (those who don't want the Windows key enabled
during full-screen, and those who always depend on the Windows key to function
normally) is served without an unstable and error-prone method, it would seem
the API is failing both parties in some cases.

I think the author would agree with your last statement. By forcing
applications to handle this logic and break what some users expect, the API
isn't serving the users or developers well.

------
RBerenguel
> The great thing about programming on Windows is that it is the only
> commercially viable platform where you can ship software to users without
> getting approval from a giant bureaucracy

I kind of had the feeling I could write code for a Mac and ship it.

Wait, in some sense I have done so before with code for computing fractals I
shared with other Mac users in my department. It worked perfectly on my Mac
and on theirs.

Wait! I have downloaded and paid for apps straight from some developers'
websites (off the top of my head, or the top of my menu bar: Hazel, 1Password,
Arq).

So I just stopped reading there, sadly.

~~~
agrona
I suspect the author means viable for video games, not software in general. I
don't have any proof of this, just the dearth of large-budget games making
their way to the Mac.

~~~
RBerenguel
Yup, I guess that's the point, but dissing the whole platform because a kind
of software product is not particularly buoyant...

