
The Windows Shutdown crapfest (2006) - shubhamjain
http://moishelettvin.blogspot.com/2006/11/windows-shutdown-crapfest.html
======
bradford
I started at MS during Vista and I've been involved (sometimes tangentially)
with Windows ever since. This is all my opinion, but it's been very
interesting seeing the decision-making process change over time.

If I had to summarize the change, I'd say that it's evolved from an
expertise-based system to a data-based system. The reason why eight people were present
at every planning meeting is because their expert opinion was the primary tool
used in decision making. In addition to poor decisions, this had two very
negative outcomes:

1) Reputation was fiercely fought for. Individuals feared that if they were
ever incorrect, the damage to their reputation would limit their ability to
impact future decisions and eventually lead to career death. Whether this
actually happened or not is irrelevant; the fear itself caused overt caution
and consensus seeking.

2) In the absence of data, an eloquent negotiator is often able to obtain
their desired outcome, no matter how sub-optimal that outcome might be.

Nowadays, I see data used a lot more, hence the telemetry. A while ago, in
response to criticism of Windows telemetry habits, I wrote:

"Telemetry is, by now, a fundamental part of the engineering process. Products
that don't incorporate it are going to be clobbered by products that do.
Microsoft didn't start this paradigm, but I think they had to incorporate it
in order to stay competitive."

I understand that telemetry is still a delicate issue, but the expertise-based
decision making of yesteryear truly sucked for many, and I don't see a viable
alternative.

~~~
codedokode
> Telemetry is, by now, a fundamental part of the engineering process.
> Products that don't incorporate it are going to be clobbered by products
> that do.

I am not sure this is correct. Telemetry can show what users do but it won't
show why they do it. Users don't click the button: maybe they don't need it
but maybe they don't understand what it does. Or maybe there are too many
buttons and users get lost.

Also, making a decision based on telemetry is making a decision based on the
opinion of non-experts. There is a quote by Ford: “If I had asked people what
they wanted, they would have said faster horses.” Can you make something
innovative this way?

For example if we take a code editor like Sublime Text which in my opinion has
very good UI, and it has no toolbar - can you use telemetry as an argument for
removing a toolbar? No, because if there is a toolbar, there will be users who
would use it without realising that they could do better without it. You need
an expert to make better UI, not telemetry.

~~~
shawnz
Where does the parent claim that telemetry is the only factor you should be
basing your decisions on?

------
theoutlander
I can relate to this. It was a common theme across Microsoft. In one instance,
the team I was part of was responsible for the initial integration of Bing and
Facebook. We had about 24 dedicated people on the team (a few
partner/principal group managers too!). Our team was considered agile because
we released most features every 4 months (this was around 2009 I think)!!!

Anyway, we had a hackathon at Facebook with Zuckerberg and the FB team. There
were all these big (redundant) talks by every manager up that chain about how
this is the most important integration, etc.

Guess what came out 4 months later??? A like button on the Bing home page! The
kicker was that the code for the widget was picked from the Facebook
developers portal.

~~~
ajross
This isn't purely a software engineering mess though. The PC architecture for
"off" is a mess at the hardware level too.

I mean, long ago you just had a physical switch on the power supply. But then
filesystem authors invented buffering and that wasn't safe anymore. So now
there were two ways to shut the machine off: the "safe" way and the old way
(where the "old" way was still available via e.g. holding the power button
down for 4 seconds).

Then, we wanted to start using laptops and carrying them around. But booting
the whole computer again and again every time you changed your seat became a
chore, so software people invented this cool new trick where you could dump
memory to disk and restore it later, so "hibernation" was born and we now had
three ways to shut down.

Then the hardware people jumped in and pointed out that really the problem
here is just the CPU. DRAM refresh is cheap, so there's really no need to dump
the RAM at all. Let's just shut the CPU off and come up with a
hardware/firmware/OS/driver hack (yeah, it touches basically everything) for
powering on into a known DRAM configuration. Much faster! And now we had a
fourth way to shut down.

(OK, this is a little spun. In fact suspend to disk and suspend to RAM landed
nearly simultaneously in the PC world, with different manufacturers picking
different horses. Then of course ACPI came in and standardized both, forever
locking us into not one but two kinds of suspend.)

Then of course, we had a paradigm shift where "mobile" OSes revisited this
whole scheme and threw it out the window. The hardware people making mobile
chips designed the clock and power gating logic such that the "suspend to RAM"
happens essentially every time the CPU reaches an idle state, and never has to
be "entered" explicitly the way ACPI S3 is. And now PCs are shipping with this
scheme too even on systems where ACPI still works in a traditional way. So,
yeah. FIVE.

I mean, Microsoft surely made a UI mess out of this. But it's not like they
were handed a simple problem to begin with.

~~~
pavement
If I don't like what a computer is doing, I will frequently just yank the
plug, pull the battery, flip the circuit breaker, and fuck the rest.

All this buffering and fault tolerance is bullshit at the consumer level. If
the disk's file table gets fucked up, then let it burn.

I'll format the disk and re-install your stupid operating system as I see fit,
whenever I want, and keep my actual data safe and sound far away from someone
else's stupid hung process, until it produces actual results that I can copy
into place, whenever appropriate.

These are lessons I learned throughout the late 90's, in the face of countless
blue screens, before migrating to linux.

    
    
       It is now safe to turn off your computer.
    

Indeed.

~~~
glangdale
There was a nice paper a while back on the topic of 'crash-only systems'. That
is, the idea would be that no software system was permitted to have a
dignified shutdown path at all. Every system would be turned off with the
equivalent of the power switch or "kill -9".

[https://www.usenix.org/legacy/events/hotos03/tech/full_paper...](https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf)

The point that was made was that frequently the recovery paths were on net
faster (for a shutdown/reboot cycle) than the "durpee dur, I am slowly
shutting myself down paths" and that you have to build a good crash recovery
path anyhow.
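
One way to picture the crash-only idea: if every state change is committed atomically, then startup and crash recovery can be the same code path, and a "kill -9" at any moment is harmless. The Python sketch below is only an illustration of that idea, not code from the paper; `save_state`, `load_state`, and the JSON state file are hypothetical names chosen for the example.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Commit state by writing a temp file and atomically renaming it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        # os.replace is atomic on POSIX when both paths are on the same filesystem,
        # so a reader sees either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path, default):
    """Startup and crash recovery share this path: read the last committed state."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

With this shape there is no separate "clean shutdown" routine to write or test; the process can simply be killed, and the next start calls `load_state` as usual.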

~~~
lathiat
There's a storage vendor (hilariously for thread context) that took this to
heart. They don't have a power button, just a switch. You turn it off.

I think it was Tintri?

~~~
pixl97
I'm pretty sure Dell MD3200's are like that.

------
feelin_googley
"I was on the team responsible for improving performance for Outlook 98. Right
after Outlook 97 shipped, my lead printed out every single function call made
when Outlook started up. It was a stack of paper about a foot high. He spent
about a week going through this printout with a highlighter, looking for
"stupid shit", as he called it. Turned out, there was plenty of it. It further
turned out that most of this "stupid shit" wasn't a result of any one
programmer making a dumb decision; most of it was a result of an architecture
and a mindset which tried to prevent developers from shooting themselves in
the foot by gratuitously abstracting away "dangerous" things like memory
management. Again: needless abstraction will bite you in the ass if you're not
careful. My lead and I spent months going through the Outlook code exorcising
"stupid shit" -- removing code which hid what it was actually doing, getting
the code cleaner and closer to the machine, making everything more explicit,
tighter, and less generic."

The above is excerpted from a later blog post by the same author.

~~~
drzaiusapelord
On the flip side, removing safety from MS Office devs has led to a pretty
significant quality and security decline. Pretty much every Office version we
upgrade to is tested to hell and back and we usually delay adoption for three
years. By then it's barely functional after three years of service packs and
hotfixes. We just migrated 100 people to 2013 and are still finding issues.
Worse, this mentality spread to updates, which we have to delay by 2 weeks or
more due to breaking things, ironically usually Outlook.

Would I trade a slower start time and less perky performance for security and
stability? Absolutely. Maybe those old timers knew what they were doing. Maybe
the care vs quick equation makes more sense to lean on care than quick for
many types of software, especially in our connected world. Maybe the trade-off
didn't make too much sense during the age of Pentium II, but in the age of
i5's and i7's on every desktop with 8+gb of ram? I doubt it would be as
noticeable.

I think we're dangerously experimenting with a "do it quick, ship, and maybe
fix it later" mentality that might make sense with startups hurting for
dollars and an MVP, but for dinosaurs like Microsoft, it's a liability. I wonder
how many "previous employees didn't know what they were doing so we fixed
everything" anecdotes end up with unhappy customers in the end? Perhaps most.

~~~
nxc18
Microsoft fired a huge number of their SDETs (and they no longer actively
recruit them). Since then, I've seen a huge decline in quality in Office
and Windows. Devs will tell you they're just as good at testing as the
testers, but I don't buy it. Also, if you're a dev, do you really like testing
enough to give it just as much attention as you do feature implementation?
Again, I don't buy it.

~~~
Clubber
Yes, I'm a dev and I know I don't have the patience for testing. Devs tend to
only test the happy path, or just off of it. I didn't realize how unexpectedly
software can be used until I attended one of our company's user groups.

------
Theodores
I have decided to stick with writing code rather than progress my career.

Consequently I am that minority in the team delivering features that customers
use. I take personal pride in that and keep quiet about it. Those reports,
those plans, the documentation, the mock ups, the meetings and the emails are
not in the final product, all that work counted for next to nothing.
Ultimately it was mostly two of us working away and refining the specs by
collaboration with stakeholders on an informal basis. Because we understand
the problem space we are able to ask the important questions and I think
stakeholders prefer working directly with programmers rather than have
messages related via management channels with costs plucked out of the air and
relayed back.

With stakeholders I find I am talking their language and a lot of ground gets
covered in a working meeting, where some fixes are done during that meeting
with anything requiring more time specced out. Management intermediaries
assume everyone thinks the way they do, so they can't imagine that it
would be safe to have the programmer talk in a mix of trade jargon with a
client. Usually this is part of the gig, many people make a cushty living
coasting between meetings. What do we do so as to get on with the job but not
enable this coasting that goes on with freeloading management types?

~~~
0xbear
You'll be out of a job by your mid-40's. I suggest you reconsider your
decision to not "progress your career".

~~~
Clubber
Yes, that's not anywhere close to true. It might be in SV, but most businesses
just care about how well you solve problems.

~~~
0xbear
Says someone who's not 40 yet. Tech is notoriously ageist.

~~~
Clubber
I've heard the rumors, but I haven't had a problem with it yet. I've been
building systems with, not bleeding edge, but relevant technology for 20
years. By building systems, I mean I either do it solo or lead the team with
the design. I specialize in back-end automation engines, but can do a good
front end as well. I usually don't adopt a technology until it's obvious it's
stuck. I primarily do Microsoft stacks. I talk well on my feet, am good
talking with clients, and understand the needs of business. I haven't had to
use a recruiter in 16 years.

I hope it never happens, but I haven't seen any evidence of it. I'm also never
the oldest person in the group. I hope this helps.

Good luck.

------
dredmorbius
I'd like to read this, but Blogspot is absolutely the most reading-hostile
service of which I'm aware. Its "dynamic" formats most especially.

Not only do these defy all readability within a browser, they break fallbacks:
console clients (w3m, elinks, etc.), or readability alternatives (Pocket,
Instapaper, Outline.com).

Please just _don't_ use Blogspot. And don't use the dynamic styles if you
must.

~~~
codedokode
I agree, it is an example showing why you should not try to convert every
website into a SPA. A in SPA means "application", not a "website".

------
jesseryoung
Anybody who currently works on the Windows team know if things are still like
this? I know I've read about improvements to the version control system for
Windows teams and some other blogs about improvements in management practices
but I'm wondering whether, even after 11 years, this has been fixed.

~~~
whowouldathunk
I work on the Windows team. Things are very different now. Source control
especially: [https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-
large...](https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-
repo-on-the-planet/)

It still takes a lot of political effort to change primary UI features though.
And I think that's a good thing.

~~~
dingo_bat
> It still takes a lot of political effort to change primary UI features
> though.

Hasn't prevented the travesty that the win10 UI is, though.

~~~
rhencke
Travesty is a bit hyperbolic, no?

The Win10 UI works. It's stable. It's reasonably efficient at getting things
done. It's reasonably compatible with people's expectations of how Windows
works. (no Win8-style revamp of Start, etc.)

There are parts of it I dislike greatly, personally. (Cortana, the
Settings/Control Panel split.) It's annoying and unpolished on parts like
that, but I can't say it's a _travesty_.

~~~
titanix2
It's ugly (especially the dialog boxes) and obviously targeted at tablets. As
the Surface doesn't perform well, I hope one day Microsoft will come back to its
roots and release an edition of Windows tailored for the PC.

~~~
dingo_bat
That's exactly right. It is so ugly! And in the recent builds I've been seeing
blurred fonts everywhere. I do like that they are moving fast and most changes
are good. For example the pseudo transparency effects alleviate some of the
ugliness concerns.

------
kuharich
Previous discussion:
[http://news.ycombinator.com/item?id=1706976](http://news.ycombinator.com/item?id=1706976)

------
kabdib
And they still managed to fuck it up in the latest version of Windows Server;
logging out of the GUI is perilously similar to shutting down the system. Oh,
you get some more warnings, but I can easily see muscle memory doing the wrong
thing.

------
the-dude
I have always thought about the shutdown crapfest as not being able to
shutdown for like ... 25 reasons? I believe there is a KB entry somewhere
listing a bunch of them.

Almost impossible to troubleshoot (network connections!)

~~~
olyjohn
Is it just me, or shouldn't an OS be able to kill its own processes? When I
kill -9 in Linux, the application is dead, gone... doesn't hang, doesn't sit
there frozen.

Task Manager... don't even get me started. It feels like half the time when I
try to end a process, it won't end. It'll hang up Task Manager or something
else will freak out before you actually end a non-responsive process.

This is why I feel Windows can't shut down half the time. It doesn't actually
seem to be able to kill and shut down its own processes. So when shutdown time
comes, one little app hangs and you'll just sit there forever.

~~~
agwa
> When I kill -9 in Linux, the application is dead, gone... doesn't hang,
> doesn't sit there frozen.

Wrong. If a Linux process is blocked in an uninterruptible system call
(typically for disk I/O), it cannot be killed, even with -9.

I have been unable to cleanly shut down Linux systems in the past due to
processes getting stuck in this state.

~~~
pritambaral
`kill -9` on a process blocked on a syscall does kill the process, it just
doesn't clean the process up. The result of SIGKILL-ing (i.e., killing with
-9) a blocked process is a zombie process. Zombies are dead, right?

~~~
pjc50
With a suitably wedged NFS process I believe you can get a process that is
stuck in D even if you `kill -9` it.

~~~
dozzie
You're right, signal delivery does not interrupt an atomic operation, so if a
syscall takes an unusually long time (as network filesystems sometimes cause),
the process hasn't received its SIGKILL _yet_.
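
For the ordinary case (a process that is not stuck inside the kernel), the behaviour is easy to check. Here is a minimal Python sketch, assuming a POSIX system; `kill_and_reap` and `parse_state` are hypothetical helper names, and the sample stat line in the usage note is made up for illustration. On Linux, the third field of `/proc/<pid>/stat` holds the state letter, where `D` is uninterruptible sleep and `Z` is a zombie, so that is the place to look when a process seems to ignore `kill -9`.

```python
import signal
import subprocess
import sys


def kill_and_reap(cmd):
    """Start cmd, send it SIGKILL, and return the exit status the parent sees."""
    proc = subprocess.Popen(cmd)
    proc.send_signal(signal.SIGKILL)
    # wait() reaps the child, so it does not linger as a zombie.
    # On POSIX a negative return code -N means "terminated by signal N".
    return proc.wait()


def parse_state(stat_line):
    """Extract the state letter from one line of Linux /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split on the last ")" instead of on whitespace.
    return stat_line.rsplit(")", 1)[1].split()[0]


status = kill_and_reap([sys.executable, "-c", "import time; time.sleep(60)"])
print("killed by SIGKILL:", status == -signal.SIGKILL)
```

A sleeping child like this one is in interruptible sleep, so the signal is delivered at once. `parse_state("1234 (nfs worker) D 1 1234 1234 0")` returns `"D"`, which is the state a process wedged on a network filesystem would report until its syscall returns.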

------
mvindahl
Entertaining blogpost. And reading the 2006 comment thread below is also
pretty entertaining. All of the comments seem to be about ten years old. Most
of them tend to agree that Microsoft seriously botched this feature. But there
is also a very vocal minority who argue that Joel is a moron who should shut
up and the thing that makes Windows so _great_ is its flexibility, and all
those 15 menu options make sense, and people should just shut up about their
Macs and deal with the fact that Mac has just a teeny weeny market share.

Well, fast forward to 2017. MacBooks are all over the place. And when trying
to shut down Windows 10, I'm faced with just three options: Sleep, Shut down,
and Restart.

Vista really was the nadir of Microsoft in terms of UX, no matter how
furiously some people contested this back in the day. And Joel was spot on.

------
SimonPStevens
> "Those new hybrid hard drives can make this super fast."

- Vista RTM'd - 8th November 2006

- Date on that blog post - 21st November 2006

... and the wikipedia page he links from the blog post about hybrid drives
says "In 2007, Seagate and Samsung introduced the first hybrid drives".

I don't think making UI decisions based on hardware unreleased at the time is
really the right way to do it.

(The same wikipedia page also lists those early hybrid drives as "featuring
128 MB or 256 MB NAND flash memory options" and given that Vista required
512 MB and recommended 1 GB, I don't even think that they would have helped that
much, as most of the RAM would still need to be written to the spinning-platter
part of the disk anyway)
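
The arithmetic behind that parenthetical can be spelled out. This is a rough Python sketch under loud assumptions: the whole RAM image is written uncompressed, and the flash cache is used for nothing else. Real hibernation files may be compressed or smaller than RAM, so treat the numbers as an upper bound on how little fits, not a measurement. `fraction_fitting` is a hypothetical helper name.

```python
def fraction_fitting(flash_mb, ram_mb):
    """Share of a full RAM image that fits in the flash cache (capped at 100%)."""
    return min(1.0, flash_mb / ram_mb)


# Flash sizes quoted from the Wikipedia page; RAM sizes are Vista's
# minimum and recommended figures mentioned above.
for ram_mb in (512, 1024):
    for flash_mb in (128, 256):
        share = fraction_fitting(flash_mb, ram_mb)
        print(f"{flash_mb} MB flash, {ram_mb} MB RAM: {share:.1%} fits in flash")
```

Under those assumptions, even the larger flash option holds at most half of the minimum-spec RAM image, and the rest goes to the platters.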

------
saagarjha
Aside: is this website messing up the scroll for anyone else?

~~~
codedokode
Blogspot's UI is awful. They probably tried to follow trends and make an app
out of a website. While loading the page it first shows you the preloader with
gears, then loads the top post and only later loads the post you were going to
read. It probably gets a good score in automated load-speed tests, but in fact
you can start reading the text much later than if it were a plain HTML page.

And when I clicked the search box, the post I was reading has scrolled itself
away (no kidding!). I tried to remove focus from the search box but the post
didn't come back.

And that is Google's product. It might have some advanced code inside but the
UI is awful.

~~~
eadmund
I think it's pretty terrible that Blogspot requires JavaScript in order to
display text. There's already a really great way to display text on the web:
HTML. It just makes no sense to me to change that.

It's much like how the original wiki is so badly broken now.

------
kelvin0
TL;DR: Bike shedding + Poor communication ==> LONG HORN.

