
Those Win9x Crashes on Fast Machines - abbeyj
http://www.os2museum.com/wp/those-win9x-crashes-on-fast-machines/
======
phire
The original Mac does a calibration of the floppy drive motor during boot to
measure jitter.

If you are implementing an emulator, you must insert some jitter into the
emulated floppy drive's timing.

Because if there is no jitter, the ROM's calibration code does a division by
zero and crashes.
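
Schematically, the failure looks like this (a hypothetical C sketch; the
pulse timings and constants are invented, and this is not Apple's actual ROM
code):

    /* Hypothetical calibration sketch -- not Apple's ROM code. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Timestamps (in microseconds) of index pulses from the emulated
           drive: perfectly regular, because the emulator adds no jitter. */
        long pulse[4] = {0, 1000, 2000, 3000};
        long jitter = 0;
        for (int i = 1; i < 4; i++)
            jitter += labs((pulse[i] - pulse[i - 1]) - 1000);

        /* jitter == 0 here, so this divide traps -- on a 68000 that is a
           zero-divide exception, i.e. a crash during boot. */
        long scale = 1000 / jitter;
        printf("scale = %ld\n", scale);
        return 0;
    }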

~~~
Wowfunhappy
While perhaps not so great from a defensive programming perspective, Mac OS
feels like a different case since it's only designed to run on specific
hardware.

Modern Mac OS also has all sorts of "bugs" that Hackintosh users need to patch
or otherwise work around. Since we're doing something that was never intended,
I don't really see these as flaws in the OS.

~~~
kelnos
I would still consider them to be timebomb bugs, though. Even if you're
developing for a restricted set of hardware, newer versions of that hardware
could very easily violate some corner-cutting assumptions in the future. I
would rather spend a little more time now to get something right and future-
proof, rather than pass the problem onto future-me, who likely won't have the
context anymore to find the issue quickly, or, worse, future-someone-else, who
doesn't have the context at all.

~~~
yjftsjthsd-h
Yeah, over a long enough time window I think skimping on portability and
correctness will always come back to bite you. Apple could've saved time by
making Darwin only
handle one processor nicely, but then the Intel transition and ARM additions
(iOS is still Darwin, after all) would've hurt more. Windows coasted on x86
for a while, but now that they're targeting ARM I'll bet they're pretty glad
that it was originally built to be portable. Code that only works on the exact
system you need today might be good enough sometimes, but if you survive long
enough you'll want it to be more flexible than all that.

EDIT: I should add, this applies to applications, not just OSs. If you're an
Android dev - will your app run on Android 15? Will it work on ChromeOS? Will
it run on Fuchsia? If you're writing Windows software - will it run on ARM? If
you're making webapps - do they work on Firefox? And maybe it's not worth
the effort, especially if you don't plan to be selling the same software in 5
years, or maybe you think you can just deal with those things when you get
there, but if you plan to still be in business in a decade then you should
plan accordingly.

------
LeoPanthera
There's a patch for this problem, which is particularly useful if you want to
run Windows 95 in a virtual machine.
[https://winworldpc.com/download/c39dc2a0-c2bf-693e-0511-c3a6...](https://winworldpc.com/download/c39dc2a0-c2bf-693e-0511-c3a6e280947e)

Indeed, there's a pre-made VirtualBox image pinned to the top of Reddit's
/r/windows95 if you are lazy.

------
korethr
Holy shit. I feel like this neatly explains why Windows 95 was an utter crash-
fest on the computer I bought just before my freshman year of high school.
With an AMD K6-2 running at 350 MHz, it was the first computer I had that was
all new components instead of the franken-systems built from a dumpster-dive
base
and other dumpster-dived components grafted on. The shop I bought it from
initially put 95 OSR2 on it. And it did like to crash. It wasn't until I
started using Windows 98SE that I started to see anything resembling
stability, and not need to re-install every other month.

If only I had known about AMDK6UPD.EXE back then and been able to understand
the reasons behind the crash and why the patch fixed things.

------
graton
I have to admit I find this type of article about old computers/software
quite interesting, as I recently discovered a backup of mine that contained
source code I wrote in 1993. I was writing assembly language back then, using
a really great library called Spontaneous Assembly - first version 2.0 and
then 3.0. Spontaneous Assembly 3.0 added support for easily writing TSR
(Terminate and Stay Resident) code.

Back in the early 1990s I was in college and working in the computer lab. So I
wrote various little DOS utilities to help us better manage the computers and
the interaction with Novell Netware.

Due to this reminiscing I have even purchased a few tech books from that time.
MS-DOS Encyclopedia, Peter Norton's Programmer's Guide to the IBM PC, and some
others.

I only wish I still had a copy of Spontaneous Assembly 3.0, as it would be fun
to reassemble some of my old code!

~~~
clan
For those who got curious like me, have a look at:
[http://300m.us/docs/computing/SA-3.0a/TOC.htm](http://300m.us/docs/computing/SA-3.0a/TOC.htm)

I am not familiar with how libraries work in the US. Can anyone get a library
card with the Library of Congress? They have the floppy images:

[https://www.worldcat.org/title/spontaneous-assembly-for-cc-a...](https://www.worldcat.org/title/spontaneous-assembly-for-cc-assembly-language-library-for-80x86-based-ms-dos-systems/oclc/32314029)

EDIT: [http://300m.us/docs/computing/](http://300m.us/docs/computing/) has a
purchase link that 404s, but the site itself is still up. Maybe Kevin is the
friendly type?

~~~
graton
Yeah I have emailed Kevin Crenshaw but he stated the project is "abandonware"
at this point. He said to search for it online. So far no luck with version
3.0. It would have been fun to try playing with some old code in a VM :)

I don't live anywhere close to the Library of Congress, so it's not easy for
me to get a copy there :(

------
TwoBit
>"It was somewhat unfortunate that this was called an “AMD fix” (the file
containing the solution was called AMDK6UPD.EXE), even though Microsoft was
clear that this was not a problem in AMD CPUs but rather in their own code."

I'll bet the AMD name was suggested by producers and/or management over the
protests of engineering, with the argument that the public knows this as an AMD
problem and so it's better to call it that regardless of the technical
reality. I've seen this logic many times in my career and do understand
there's some rationale to it.

~~~
Nextgrid
Could it simply be that, since the bug primarily affected AMD CPUs at the
time, it was easier for everyone to call the update the “AMD update” rather
than some cryptic name like “network stack delay loop update”?

~~~
Lammy
I figure it's for the same Microsoft×Intel mutual-back-scratching reason that
made Microsoft try to charge $35 for the fix:
[https://www.theregister.com/1998/11/26/amd_posts_windows_95_...](https://www.theregister.com/1998/11/26/amd_posts_windows_95_k62/)

~~~
WalterGR
Ah, yes, The Register.

 _Faced with the Great Satan of Software's apparent refusal to admit its
mistake and eliminate the charge, AMD has made the fix available from its Web
site free of charge._

------
ghewgill
This is exactly the same timer loop problem as was found in Turbo Pascal
around the same era:
[https://retrocomputing.stackexchange.com/q/12111](https://retrocomputing.stackexchange.com/q/12111)

~~~
RcouF1uZ4gsC
I think you could solve the problem by pushing the "turbo" button on the
computer case, which would reduce your CPU frequency to something like 8 MHz.

~~~
unilynx
All turbo buttons I remember specifically clocked down to 4.77 MHz -
apparently the original 8088 frequency?

~~~
PeterisP
I recall having a turbo button on a 386-based computer that halved the speed -
I don't recall if it was from 40 MHz to 20 or from 80 MHz to 40, something like
that. I also recall seeing computers of that era which had a MHz display on
the front to show the current processor speed.

~~~
einr
The highest-clocked 386 CPU, and one of the most popular ones, was the
Am386DX40, so your computer would likely have had a 40 MHz part.

It is a common misconception that the MHz displays from this era had any
kind of communication with the CPU or motherboard. They don't -- they are dumb
devices that can switch between showing two arbitrary patterns on the LED
display and are "programmed" by painstakingly setting jumpers on the back.
Often, when using 2-digit displays for computers with 3-digit clock speeds,
they would be programmed to display "HI" and "LO" instead of a number.

So when your display showed "20" that doesn't mean the CPU was running at 20
MHz. It might have been, because 386 CPUs always ran at the bus speed and 20
was a common 386 speed, but things get a lot more complicated when you move to
the 486 platform with internal clock multipliers.

My 486 DX4/100 (33 MHz bus speed, 3x multiplier) has a turbo button that when
disengaged lowers the effective speed of the system to something _roughly_
like a 486 DX50. But this is not an exact science and does not in fact mean
that the CPU is running at 50 MHz.

------
thaumasiotes
> According to the Pentium Processor Family Developer’s Manual Volume 3:
> Architecture and Programming Manual (Intel order no. 241430), the LOOP
> instruction [is slow]. The absolute best case when the branch is taken is 6 clock
> cycles. The Intel manual notes that “[t]he unconditional LOOP instruction
> takes longer to execute than a two-instruction sequence which decrements the
> count register and jumps if the count does not equal zero”.

This makes me wonder about three-instruction sequences of increment /
decrement / jump-if-nonzero, and one-instruction sequences of jump-if-nonzero.
What's the point of having the unconditional LOOP instruction in the first
place?
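
For reference, here are the two forms being compared, as a sketch in GNU C
inline assembly for x86 (the delay-loop framing is mine, not from the Intel
manual):

    /* The one-instruction form: LOOP decrements (R/E)CX and branches
       while it is nonzero. */
    static void delay_loop(unsigned long count) {
        __asm__ volatile("1: loop 1b" : "+c"(count));
    }

    /* The two-instruction sequence the Intel manual recommends:
       decrement any register, then jump if nonzero. */
    static void delay_dec_jnz(unsigned long count) {
        __asm__ volatile(
            "1: dec %0\n\t"
            "jnz 1b"
            : "+r"(count));
    }

    int main(void) {
        delay_loop(1000000);      /* burn one million iterations */
        delay_dec_jnz(1000000);
        return 0;
    }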

~~~
saagarjha
Compatibility?

------
miohtama
Turbo Pascal, known for its awesome one-pass compiler, had the same issue:

[https://retrocomputing.stackexchange.com/questions/12111/why...](https://retrocomputing.stackexchange.com/questions/12111/why-did-ms-dos-applications-built-using-turbo-pascal-fail-to-start-with-a-divisi)

------
jeffbee
We still face a related class of problem today. The x86 PAUSE instruction has
wildly varying throughput. On most Intel parts it is 1/8 or so, but on Skylake
Xeon it's 1/141. On Ryzen it's 1/3. I've seen code that makes assumptions
about how much real time must have passed based on PAUSE loops.
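
The defensive pattern is to bound the wait by a real clock rather than by a
PAUSE iteration count. A minimal sketch, assuming x86 and POSIX clock_gettime
(invented for illustration, not the code I saw):

    #include <stdint.h>
    #include <time.h>
    #include <immintrin.h>   /* _mm_pause() */

    static uint64_t now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
    }

    /* Wait until *flag becomes nonzero or timeout_ns elapses.
       Returns 1 if the flag was set, 0 on timeout. */
    static int spin_until(volatile int *flag, uint64_t timeout_ns) {
        uint64_t deadline = now_ns() + timeout_ns;
        while (!*flag) {
            _mm_pause();               /* ~8 to ~140 cycles, model-dependent */
            if (now_ns() >= deadline)  /* bound by time, not iterations */
                return 0;
        }
        return 1;
    }

    int main(void) {
        volatile int flag = 0;
        /* Nothing sets the flag here, so this times out after ~10 ms. */
        spin_until(&flag, 10000000u);  /* 10 ms in ns */
        return 0;
    }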

~~~
acqq
> I've seen code that makes assumptions about how much real time must have
> passed based on PAUSE loops.

Note: here, the PAUSE instruction is not the problem at all, but the "code
that makes assumptions."

Because the "seen" code is not named, I assume it's something internal for
some company?

~~~
jeffbee
Yes, private code. Basically there was some mutex fairness thing that was
written on a SKX and on a Zen CPU where PAUSE is 50x faster it didn't have
good fairness, it was too tight.

------
userbinator
The last time I benchmarked it, which was at the beginning of the i7 era, LOOP
was just as fast (within the margin of error) as dec/jnz - Intel probably
doesn't want to be seen as slower than AMD and didn't care about that timing
loop anymore.
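
For anyone who wants to repeat the measurement, a rough sketch using RDTSC
(GCC/Clang on x86; no serialization or warm-up, so treat the output as
approximate):

    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtsc() */

    #define N 100000000ul    /* iterations per variant */

    int main(void) {
        unsigned long n = N;
        uint64_t t0 = __rdtsc();
        __asm__ volatile("1: loop 1b" : "+c"(n));           /* LOOP */
        uint64_t t1 = __rdtsc();

        n = N;
        __asm__ volatile("1: dec %0\n\tjnz 1b" : "+r"(n));  /* DEC/JNZ */
        uint64_t t2 = __rdtsc();

        printf("LOOP:    %.2f cycles/iteration\n", (double)(t1 - t0) / N);
        printf("DEC/JNZ: %.2f cycles/iteration\n", (double)(t2 - t1) / N);
        return 0;
    }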

~~~
MauranKilom
Couldn't they just microcode it to that?

~~~
caf
That's not how microcode works, but yes - the instruction decoders and
retirement logic are capable enough these days that you can have it decode
into the same sequence of μops, which is almost certainly what happens.

Note that it's not exactly the same thing, because an interrupt can happen
between the decrement and jump for the two-instruction case, but not for the
LOOP case.

------
thrownaway954
i love this site. os/2 was such a huge part of my life in the 90s and the
sole reason i loved computers back then. it's great that this site has
preserved so much of its history.

~~~
kzrdude
Serious question, what was OS/2 and who used it?

~~~
LeoPanthera
Wikipedia's OS/2 article is comprehensive.

[https://en.wikipedia.org/wiki/OS/2](https://en.wikipedia.org/wiki/OS/2)

tl;dr A graphical OS developed by IBM that succeeded DOS and competed with
Windows. Notably, it featured pre-emptive multitasking before Windows did. It
was not a success in the home market but was reasonably successful in big
business, especially finance, for a short amount of time.

~~~
Lammy
And still exists today as ArcaOS!

[https://www.arcanoae.com/](https://www.arcanoae.com/)

~~~
benibela
I have an open source project, where someone decided to compile it on OS/2.

They send me the binaries for OS/2 for every release till 2016

Apparently modern C++ and Qt run there without issues

------
lordnacho
I've written this kind of code myself, where you measure a time delta and
divide something by the delta. It always sticks out, though, that you might
divide by zero (especially if you did it in Java!).

The article says it would have been picked up in code review, and I agree. But
it just seems odd that it wasn't changed right there. Why not just write the
loop so that it keeps running as long as the measured delta is below some
threshold like 10 ms? You also want to minimise the estimation error, which is
easier to do if you divide by a slightly larger number. Consider a loop that
takes between 1 and 2 ms to finish: your estimate will be either x or 2x.
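
A sketch of that fix in C - keep timing until the delta clears a floor, so
the divisor can never be zero (the 10 ms floor and the busy-work loop are
illustrative, assuming POSIX clock_gettime):

    #include <stdio.h>
    #include <time.h>

    #define MIN_DELTA_NS 10000000LL   /* 10 ms floor on the divisor */

    int main(void) {
        struct timespec start, now;
        long iters = 0;
        long long delta_ns = 0;
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            for (volatile int i = 0; i < 100000; i++)
                ;                      /* the work being timed */
            iters += 100000;
            clock_gettime(CLOCK_MONOTONIC, &now);
            delta_ns = (long long)(now.tv_sec - start.tv_sec) * 1000000000LL
                     + (now.tv_nsec - start.tv_nsec);
        } while (delta_ns < MIN_DELTA_NS);  /* never divide by ~0 */
        printf("%.1f iterations/us\n", iters / (delta_ns / 1000.0));
        return 0;
    }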

------
mwcampbell
Do drivers or other kernel code still have this kind of delay loop in current
operating systems, or is everything interrupt-driven now?

~~~
kalleboo
I thought that was the point of BogoMips
[https://en.wikipedia.org/wiki/BogoMips](https://en.wikipedia.org/wiki/BogoMips)
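
BogoMips sizes the delay loop against the timer tick instead of assuming a
CPU speed. A userspace sketch of the same idea (the 10 ms "jiffy" and the
final scaling mimic Linux's loops_per_jiffy conversion; the real
calibrate_delay() differs in detail):

    #include <stdio.h>
    #include <time.h>

    static void delay(unsigned long loops) {
        for (volatile unsigned long i = 0; i < loops; i++)
            ;                         /* the delay loop being calibrated */
    }

    /* CPU ticks consumed by delay(loops). */
    static clock_t timed(unsigned long loops) {
        clock_t t0 = clock();
        delay(loops);
        return clock() - t0;
    }

    int main(void) {
        unsigned long loops = 1;
        /* Double until one delay() lasts at least a 10 ms tick.
           Note: no division anywhere, so a fast CPU can't make it trap. */
        while (timed(loops) < CLOCKS_PER_SEC / 100)
            loops <<= 1;
        printf("~%.2f BogoMIPS-ish\n", loops / 5000.0);
        return 0;
    }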

------
Wowfunhappy
Would it have been possible for Microsoft to test for something like this, or
would it be possible today? For example, is it feasible to slow down time to
simulate an impossibly-fast CPU?

~~~
traverseda
You can do that for Linux userspace apps using the "faketime" utility. It
just intercepts the calls that try to find out the actual system time. Not
sure how that would affect kernelspace, since the kernel is sort of the thing
that decides what time actually _is_.

~~~
Wowfunhappy
> Not sure how that would affect kernelspace, since the kernel is sort of the
> thing that decides what time actually is.

Yes, I'm imagining you'd need to be in a virtualized/emulated environment of
some sort.

------
tzs
Be happy that it only crashed. It could have been worse.

We once got a couple of high-end motherboards for AMD processors, back in the
Win98 era, and I tried to install Win95 on one of them. The box said that
Win98 was required, but it couldn't hurt to try, right? Maybe there would not
be drivers for some of the peripherals, but all I needed was the CD-ROM and
hard disk to work.

The install went fine until it was time for it to reboot. That failed. It
didn't even get to the BIOS initialization screen. It appeared to just be
dead.

Figuring we just got unlucky and got a defective motherboard or processor, I
tried installing on the other one so I could get on with my work.

That one died too.

Eventually I found something about this on the motherboard maker's support
site. The problem was with the device probing during install.

What I'm about to say is not made up. As unbelievable as this might sound to
people who grew up with more modern PCs, at one time they really did work as
I'm about to describe.

The early PC buses had no built-in way for the host to identify what cards
were plugged into the expansion slots. Typically, an expansion card would have
a set of jumpers or DIP switches that could choose between several sets of
possible addresses for the card's registers to appear in I/O space.

The user was expected to keep track of the settings of all cards they
installed, and adjust the jumpers appropriately to avoid conflicts, and record
the settings in a config file that the drivers would read to find out where
their card was.

Later buses, such as EISA and then PCI, provided ways for the host to find
out what is there and how it is configured. But operating systems still needed
to support the old bus, and they wanted to make this as user friendly as
possible.

So systems like Win95 would have a device probe during install that would try
to identify what is on your old bus. They would do this by very carefully
probing the I/O address space.

For example, suppose you know that a particular network card, if present, has
to be at one of 8 addresses, and you know that after power-on or reset
certain bits will be set in its status register and certain bits will be
clear. You can read those 8 possible addresses, looking for the right bit
pattern. If you don't find it, that particular network card is not present. If
you do find it, you can do more tests to confirm it.

Some of those other tests might involve writing to the device registers, and
seeing if it responds the way that network card should.
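
In code, such a probe might look like the following hypothetical sketch
(Linux port-I/O helpers used purely for illustration; the card, base
addresses, and bit masks are all invented - this is not Win95's probe code):

    /* Hypothetical ISA probe sketch. Needs root (ioperm). x86 Linux. */
    #include <stdio.h>
    #include <sys/io.h>   /* ioperm(), inb(), outb() */

    /* Invented candidate base addresses for an imaginary network card. */
    static const unsigned short bases[8] =
        {0x200, 0x220, 0x240, 0x260, 0x280, 0x2a0, 0x2c0, 0x2e0};

    int main(void) {
        if (ioperm(0x200, 0x100, 1)) { perror("ioperm"); return 1; }
        for (int i = 0; i < 8; i++) {
            unsigned char status = inb(bases[i] + 1);  /* status register */
            /* Invented reset signature: bits 7-6 set, bits 1-0 clear. */
            if ((status & 0xc3) != 0xc0)
                continue;              /* not our card at this base */
            /* The risky confirmation step: write something and see if it
               responds like the card should. If a different device lives
               here, this write might be a destructive command to it. */
            outb(0x5a, bases[i]);
            if (inb(bases[i]) == 0x5a)
                printf("card found at %#x\n", bases[i]);
        }
        return 0;
    }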

This is obviously risky. What if it is not that network card, but rather a
disk controller card that just happens to have a register that after reset has
the same bit pattern you expect in the network card's status register? The
value you then write to verify that it is the network card might be the
"FORMAT DISK" command to that disk controller.

And so you had to be very careful with these probes. They had to be done in a
safe order. You'd need to probe for that disk controller before you probed for
that network card.

Those new motherboards contained peripherals that Win95 did not know about,
and so the Win95 probe procedure did not know how to avoid doing bad things to
them.

One of those peripherals was the built-in interface for flashing the BIOS
EEPROM. The Win95 device probe ended up overwriting the BIOS.

~~~
555-5555
It's unfortunate (though understandable) that you bricked both before
discovering the root cause. I remember hot-swapping EEPROM chips to repair a
bricked motherboard from a similar era.

While attempting to upgrade the BIOS, something went wrong. Most likely there
was a bad sector on the boot floppy I used. The result was an unbootable
machine. Solution? Swap in a working EEPROM chip from a compatible
motherboard. Boot to a floppy disk that has a BIOS imaging utility and image
file. Hot-swap the bad EEPROM chip back in. Re-flash the BIOS. Or, if you had
money, you could sometimes purchase a pre-flashed replacement EEPROM chip.

I don't miss those days, but am fortunate to have experienced them. It forced
us to learn more about how a computer really works.

------
TwoBit
>"The issue also illustrates how seemingly solid assumptions made by software
and hardware engineers sometimes aren’t. Software engineers look at the
currently available CPUs, see how the fastest ones behave, and assume that
CPUs can’t get faster by a factor of 100 anytime soon."

Disagree. Where I've worked (Oculus/Facebook and EA) we would never allow such
assumptions in code reviews, regardless of how unlikely the failure may be.
You never allow div/0 unless it's mathematically provable to be impossible.
I'm sure other orgs have the same code review policy, and static analysis
these days would also catch it.

~~~
outworlder
> we would never allow such assumptions in code reviews

Right.

Today we have the benefit of hindsight: we know how fast processors have
become. In the Win3.1 era, no one sane would have predicted this. Even Moore's
Law applied to transistor counts, not processor speeds.

What you should ask is: what other assumptions are you implicitly making that
you are not currently aware of?

~~~
Dylan16807
> In the Win3.1 era, no one sane would have predicted this.

That's a bold claim!

We went from 4-8 MHz 286 chips to 20-50 MHz 486 chips in the decade leading
up to Win3.1's first release. By the time we were approaching Windows 95,
Pentiums were up to 133 MHz.

Those chips _already_ had really fast branch instructions.

So you're already staring down the barrel of calibration taking 15
milliseconds. It's a reasonably obvious step to consider LOOP being a cycle
faster than adding and branching, which takes you all the way down to 7
milliseconds.

So taking that all together, x86 clock speeds have doubled 3-4 times in the
last dozen years. A chip could come out tomorrow that takes 15 or even 7
milliseconds on the calibration loop. Your code breaks if it hits 2.

I think someone sane could have predicted the problem.

------
dfox
I somehow cannot believe that nobody has mentioned the first significant
instance of a similar crash, at least from the DOS days: Borland's
CRT.TPx/conio.a, which consistently caused division by zero on anything faster
than a Pentium MMX. At the time the wisdom was that anything over 200 MHz was
too fast, but in reality anything AMD with a "Pentium Rating" over 200 was
fast enough to cause consistent crashes, while a 266 MHz Pentium MMX was slow
enough that it mostly worked (and anything i686 consistently crashed).

~~~
miohtama
I mentioned :)

------
qwerty456127
I used Windows 98 SE on CPUs up to various Pentium IIIs. There was a problem
with large amounts of RAM (above 512 MB), but it was easy to solve.

I was only forced to switch to Windows XP when I upgraded to a Pentium M
(Dothan) - apart from Safe Mode, I could find no way to run Windows 98 on it.

I would gladly return to Windows 98 now if my hardware and software supported
it.

~~~
Wowfunhappy
Why do you prefer Windows 98 over XP?

~~~
qwerty456127
It had a much cleaner and simpler design. The less complexity and the fewer
useless (to me) features an OS has, the more I like it. Of course I mean the
way it works, not the way it looks.

It also took so little RAM and disk space that it would fit altogether in a
humble corner of my RAM today - I don't really get why modern software needs
so much more.

~~~
eropple
"Clean-and-simple design." OK. So you like an OS with no meaningful memory
segmentation (you could hop into kernel mode by modifying a register), awful
multiprocessing support, and absolutely no defense in depth against the
truckloads of malware that are all over the Internet?

That's certainly a take.

~~~
qwerty456127
It was totally enough for a personal computer. Certainly not enough for a
server. I only ran trusted software and could do whatever I wanted. As for the
Internet - it's a browser's job to sandbox the JavaScript.

~~~
eropple
So no defense in depth and an uncritical reliance on browsers to be perfect?

------
thaumasiotes
> Run 10000h (1,048,576) iterations of the LOOP instruction.

If you wondered about this: 10000h is only 65,536; the decimal value given
(1,048,576) corresponds to 100000h, so I assume that's what was intended.

