
Why Windows 95 and 98 would crash after 49.7 days of uptime - theBashShell
https://sites.google.com/site/edmarkovich2/whywindows95andwindows98wouldcrashafter49.7daysofuptime
======
ggambetta
This is a pretty uninformative article. It doesn't actually say why Windows
would crash, as the title promises. It does some trivial math, points out
there's an overflow, and then says _" Clearly some code in Windows didn’t
handle this too well, and Windows hung."_

~~~
usr1106
I don't agree. None of the readers would care about a programming error
made 25 years ago in some closed source code.

However, timers wrapping around is a very current problem every programmer
should be aware of. Not too long ago we had a problem with corrupted video
files. After weeks of investigation it turned out that some code used a 32-bit
unsigned integer to count microseconds. Well, that wraps around after approx.
1 h 10 min... Obviously most test cases were videos shorter than that.
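
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic. The function name is made up for illustration; the only assumption is an unsigned 32-bit counter that ticks at a fixed rate.

```python
# Hypothetical helper: time until an unsigned 32-bit tick counter wraps to zero.
WRAP = 2**32  # number of distinct values an unsigned 32-bit counter can hold


def seconds_until_wrap(ticks_per_second: int) -> float:
    """Seconds before a 32-bit unsigned counter ticking at this rate wraps."""
    return WRAP / ticks_per_second


micro = seconds_until_wrap(1_000_000)  # counter in microseconds (the video case)
milli = seconds_until_wrap(1_000)      # counter in milliseconds (the Windows case)

print(f"microseconds: {micro / 60:.1f} minutes")   # 71.6 minutes
print(f"milliseconds: {milli / 86_400:.1f} days")  # 49.7 days
```

The same counter width gives very different lifetimes depending on the tick unit, which is why a test suite of short videos never came close to the wrap.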

~~~
yummybear
I'm a reader and care, so at least one.

~~~
usr1106
Fair enough. If you stick to open source, then you can always know how
programmers got it wrong in the past and present.

~~~
derefr
No, that allows you to know how _programmers of FOSS_ got it wrong.

Not all programming is the same; there are many lessons to learn about
software engineering (mostly in the "what not to do" sense) which, for
practical reasons, have only ever shown up in (time- and budget-constrained)
closed-source codebases.

~~~
usr1106
Do you have any fact-based evidence that certain bugs/anti-patterns would
happen less in open source code bases or free software code bases?

Well, security by obscurity might happen a bit less. And hopefully many bugs
in contributions are spotted instead of being merged because the author thinks
it should work. But all major open source projects have had amazingly "stupid"
programming bugs that went undiscovered for many years. Or the same feature
has been patched again and again, but still causes more trouble.

Knowing that many company-internal software projects work with little review
and very low testing coverage, it should be natural that their code quality is
even worse. But that certain kinds of problems would only occur in closed
source does not sound evident to me.

~~~
derefr
> Do you have any fact-based evidence that certain bugs/anti-patterns
> would happen less in open source code bases or free software code bases?

How could I have that, _or_ the opposite of that, without access to
closed-source codebases?

That was my whole point: that you can't know what sort of selection bias
you're implicitly accepting by only looking at a certain easily-accessible
slice of the data.

In order to know that it's safe to assume that FOSS _is_ representative of all
software, you _first_ have to get access to enough non-FOSS codebases to be
representative.

------
justanotherc
Nowadays you don't even get CLOSE to 49 days uptime before Win10 forces you to
restart due to updates ;s

~~~
tsomctl
My laptop with Win10 is at 72 days.

I created a bat file with:

    net stop wuauserv
    sc config wuauserv start= disabled

and then scheduled it to run every 10 minutes. The problem with that is you
now have to remember to disable it and check for updates when you have time,
which I'm obviously not doing.

~~~
7777fps
You're oddly proud about not having updated your OS for 72 days.

~~~
icedchai
About 15 years ago, I worked at a startup that had a server with 5 years of
uptime!

~~~
clarry
I have a home server with 1301 days of uptime.. (last year it survived a move
without rebooting)

~~~
wycy
> (last year it survived a move without rebooting)

How? UPS?

~~~
clarry
Yep, moved the computer together with its UPS.

~~~
zaphod12
that's very cool, but what the heck is on this server that made it worthwhile
to move it like that?!

~~~
SAI_Peregrinus
Probably his uptime record!

If it's a short enough move I can see doing that just for the hell of it.

------
lmilcin
This is a myth that cannot be proven or disproven. Nobody was ever able to
keep Windows 95/98 up for that long.

Jokes aside, around 2001 I was asked to do some maintenance on a machine
running Windows 3.11 that ran an app controlling all access within the
building. When I asked about a potential upgrade, the guys balked, saying
the machine had a couple of years of uptime and they were not keen on having
to restart it on a schedule due to Windows 95/98 bugs.

~~~
me_me_me
That's the reason we still have COBOL and similar systems working.

For exactly this reason: they work and do what they are meant to do.

------
ajross
This brings up a good meta point: people who like to complain about the
quality of software in the modern world almost certainly didn't live through
the early[1] days of the industry.

This sounds crazy to modern ears, but at the time the idea of a desktop system
actually staying booted for _a whole month_ was weird and alien. They just
didn't do that. Commercial unixes were better. It was routine to find servers
that had been running for three digit (!!) uptimes, and people would brag
about stuff like this. Linux, when it arrived, was in this category too and
we'd all puff to our friends about how we "never turn off" the machine under
our desk, because it would always be able to run for a month or so before it
crashed.

It was just a different world, and things, despite some of the community
emotion, have gotten vastly better. We're much better at writing software than
we used to be, which tends to make some of us old folks a little confused when
people hold up "new" ideas as being paradigm shifts in software quality.

We already had the shift! We're just picking the higher hanging fruit now.

[1] Do the mid-90's even count as early?

~~~
jaclaz
Well, there are huge differences (in the MS world) between 9x and NT.

DOS (before) was very, very stable (but of course was not multitasking) and
widely used in industrial setups.

Windows NT (before Win9x) and Windows 2000 (right after) were both very, very
stable; I had machines running 24/7/365 that were rebooted once a year or so
and for other reasons (maintenance/update, hardware replacement, black outs
longer than UPS capability, etc.).

~~~
barrkel
DOS wasn't functional enough to be described as stable; it did almost nothing
but look after the files on your file system and provide a few APIs, via
interrupts, for programs to read and write files and perform console I/O.
Stability was almost entirely the responsibility of the program you ran on
top.

------
ryanschneider
I worked on a service that had this exact issue; actually it was worse because
it was a signed integer so rolled over in half the time.
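
A quick sketch of why the signed case bites at the halfway point, assuming a millisecond counter stored as a signed 32-bit value. `to_int32` is a hypothetical helper that mimics two's-complement wraparound, since Python integers do not overflow on their own.

```python
def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


MS_PER_DAY = 86_400_000
last_positive = 2**31 - 1  # largest value a signed 32-bit integer can hold

print(f"{last_positive / MS_PER_DAY:.2f} days")  # 24.86 days
print(to_int32(last_positive + 1))               # -2147483648
```

One millisecond after about 24.86 days the counter jumps to a large negative number, so any comparison that assumes time only moves forward starts misbehaving.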

Even more insidious, we never hit the issue until months into production
because we had a pretty consistent two week release process. So the issue only
came to light when development slowed down as the project became more stable.

Luckily deployments were staggered across datacenters so when the bug hit it
didn’t happen to every server at once, but of course as always happens, when
it did, the majority of the servers started hanging over the weekend while I
was on call.

~~~
yjftsjthsd-h
I worked at a company with a similar issue. The servers used to get patched
and rebooted every month. Then the company decided to lay off huge chunks of
operations staff. The software held up remarkably well, but we eventually
discovered that software that was never designed to stay up for more than a
month will leak memory like crazy if you leave it up for long enough.

~~~
sn_master
IIS used to (still does?) kill its own processes at random because chances
are, if it's been running "long enough", it's leaking memory anyway.

------
Wowfunhappy
Is there a good way to test for bugs like this during development? Since it’s
not particularly realistic to pause development for two months to see what a
given build does after that long.

~~~
treesknees
Our company does soak/longevity testing. Typically we'll take a build and let
it run for a few weeks during development to find memory leaks. Then the final
release enters our longevity lab where we put it under some amount of load and
let it run until the device or VM is retired. It won't necessarily find the
bug before final release, but we can then either recall that version or submit
an immediate maintenance version that fixes the issue once discovered.

Software will always have bugs; we've chosen to approach it as "find the bugs
before the customer" rather than "stomp out every possible bug". Because you
are right, we don't get 2 months to pause and fix bugs, and also customers
will always do some crazy configuration or workload you didn't have a test
case for.
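
For the wraparound class of bug specifically, soak time is not the only option: if the code under test reads time through an injectable tick source, a test can start that source just before the wrap. The sketch below is only an illustration under that assumption; `FakeTicks` and `elapsed_naive` are hypothetical names, not anything from our codebase. The Linux kernel does something in the same spirit by initialising its jiffies counter a few minutes before the 32-bit wrap.

```python
MASK32 = 0xFFFFFFFF  # keeps values inside the unsigned 32-bit range


class FakeTicks:
    """Hypothetical stand-in for a 32-bit millisecond tick counter."""

    def __init__(self, start: int) -> None:
        self.now = start & MASK32

    def advance(self, ms: int) -> None:
        # Wrap the same way a 32-bit OS or hardware counter would.
        self.now = (self.now + ms) & MASK32


def elapsed_naive(now: int, start: int) -> int:
    """Code under test: plain subtraction, wrong once the counter has wrapped."""
    return now - start


# Start 100 ms before the wrap instead of waiting 49.7 days to get there.
clock = FakeTicks(start=2**32 - 100)
start = clock.now
clock.advance(250)

print(clock.now)                        # 150, the counter has wrapped
print(elapsed_naive(clock.now, start))  # -4294967046, although 250 ms passed
```

A unit test asserting that the elapsed time equals 250 would fail here within milliseconds, which is a lot cheaper than a two-month soak.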

~~~
qntmfred
That's pretty cool. What industry are you in that y'all are willing to invest
resources for such an exercise?

------
chrisbennet
Back in 2005 I encountered this bug or something similar. It was in
Windows CE, the embedded version of Windows.

For me, it failed at around 25 days (49.7/2?). I believe the documentation at
the time [since fixed] might have had GetTickCount() returning an integer
instead of an unsigned integer. I had a devil of a time tracking it down!

------
sitzkrieg
unfortunately recreated this issue recently while keeping track of the FreeRTOS
current tick on a 32-bit embedded system :)
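
For reference, the usual way around it is to compare elapsed ticks rather than absolute tick values. A minimal sketch, assuming a 32-bit tick counter and intervals shorter than one full wrap; the function names are hypothetical. In C with a `uint32_t` the subtraction wraps by itself, and the mask below just emulates that in Python.

```python
MASK32 = 0xFFFFFFFF  # emulates unsigned 32-bit arithmetic


def ticks_elapsed(now: int, start: int) -> int:
    """Ticks between two 32-bit counter readings, correct across one wrap."""
    return (now - start) & MASK32


def timed_out(now: int, start: int, timeout: int) -> bool:
    """True once at least `timeout` ticks have passed since `start`."""
    return ticks_elapsed(now, start) >= timeout


start = 0xFFFFFFF0  # reading taken 16 ticks before the wrap
now = 0x00000010    # reading taken 16 ticks after the wrap

print(ticks_elapsed(now, start))  # 32
print(timed_out(now, start, 20))  # True
print(now >= start + 20)          # False, the naive deadline check never fires
```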

------
Yizahi
Yeah, that happens. Here is an example of an actual bug opened a few years ago
on our system. The dev team spent some time tracking the issue and found that
an incorrect variable type was used for sessions: an unsigned long used for
milliseconds, like in the OP. We joked that we needed to allocate 50 days
to verify the fix :) .
[https://imgur.com/a/HS3fcTj](https://imgur.com/a/HS3fcTj)

------
miles-po
49.7 days? I wish I could get more than 7 out of my MacOS Catalina laptops
before I get a gray screen.

------
OliverJones
If you think this is cool, wait'll you see what un-upgraded UNIX-derived
systems do on Jan 19, 2038 at 03:14:07Z.
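
A small sketch of where that timestamp comes from, assuming time is stored as a signed 32-bit count of seconds since 1970-01-01 UTC:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold
INT32_MIN = -(2**31)   # what the counter wraps to on the next tick

last_ok = EPOCH + timedelta(seconds=INT32_MAX)
after_wrap = EPOCH + timedelta(seconds=INT32_MIN)

print(last_ok.isoformat())     # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())  # 1901-12-13T20:45:52+00:00
```

So 03:14:07Z is the last second that still fits; one second later the clock reads December 1901.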

~~~
OliverJones
Update: new kernels are on top of this problem.
[https://lkml.org/lkml/2020/1/29/355?anz=web](https://lkml.org/lkml/2020/1/29/355?anz=web)
It's the old systems that will suffer. Hopefully only soda machines and not
avionics or intravenous infusion pumps.

------
dooglius
In fairness, I have hit this exact same problem on Linux

------
The_suffocated
This bug reminds me of the plot of the 1996 Japanese crime story "Subete ga F
ni Naru" (The Perfect Insider).

~~~
serf
the story that makes you love an incestuous homicidal genius.

I thought the anime really glossed over that story terribly.

------
hncensorsnonpc
From what time is this? It says "There IS a bug ... NOW patched". Should make
no sense anyway, but I see no time attached to this.

~~~
jaclaz
The cited MS KB has been archived starting November 9, 2004:

[https://web.archive.org/web/20041109020858/http://support.mi...](https://web.archive.org/web/20041109020858/http://support.microsoft.com/kb/q216641/)

It is dated August 2004 (it is revision 3.2) but the patch (for Windows 98)
has a date:

Release Date: Jun-04-1999

~~~
greggturkington
There's a CNN article about it from 1999 too:
[http://www.cnn.com/TECH/computing/9903/05/win98bugfix.idg/](http://www.cnn.com/TECH/computing/9903/05/win98bugfix.idg/)

------
cafard
In one of Jon Bentley's "Pearls" books is a chapter of rules of thumb. One
states that "Pi seconds is a nanocentury."
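
The rule of thumb is easy to sanity-check. A quick sketch, assuming a year of 365.25 days (my choice, the book may use a different year length):

```python
import math

SECONDS_PER_YEAR = 365.25 * 86_400           # assumed Julian year
nanocentury = 100 * SECONDS_PER_YEAR * 1e-9  # one billionth of a century, in seconds

error = abs(math.pi - nanocentury) / nanocentury

print(f"{nanocentury:.5f} s")  # 3.15576 s
print(f"{error:.2%}")          # 0.45%
```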

