What is a hard error, and what makes it harder than an easy error? (microsoft.com)
82 points by luu on Feb 6, 2024 | 44 comments



The original definitions of "hard error" and "soft error" were for disks and go back to the early 1970s at least. A soft error was a recoverable error that could be handled by re-reading the disk sector. A hard error was a permanent disk error that could not be recovered.



Seems in line with how Windows was using them:

> These messages were special because they were generated from inside the I/O system


Interesting that they were actual graphical dialogs with mouse support in 16-bit Windows, but you'd get a text-mode blue screen on Win95. Happened a lot to me due to a faulty CD-ROM drive. D: cannot be read. Abort/retry/fail. What was even the difference between Abort and Fail?


> What was even the difference between Abort and Fail?

A whole generation of computer users has wondered this.

According to Wikipedia[1], "abort" aborted the program (terminated it), and "fail" returned an error code to the program. (Which … probably has a high likelihood of killing it all the same, since a given random I/O is probably pretty non-optional.)

Wikipedia also notes,

> the message has been cited as an example of poor usability in computer user interfaces.

[1]: https://en.wikipedia.org/wiki/Abort,_Retry,_Fail%3F


In DOS the infamous Abort, Retry, (Ignore), Fail was the default handling of "critical errors". Abort would terminate immediately, and fail would return control to the program with an error code.

You could override this however: https://web.archive.org/web/20100206220048/http://webster.cs...


Ahhh. I remember as a kid playing some game and sometimes this would come up. We learned that Fail would allow the game to keep running but the background for the scene simply didn’t appear (I guess that I/O was optional).


Abort terminates the application. Fail returns an error code.


Abort means terminate the program.

Fail means return from the system call with a failure code.

I used to wonder this myself.
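
For anyone who wants the distinction spelled out, here is a minimal sketch in Python. It only simulates the three choices as described in the comments above; it is not how DOS implemented its critical-error handler, and `critical_io`, `make_flaky_read`, and `Aborted` are hypothetical names made up for the illustration.

```python
import errno


class Aborted(SystemExit):
    """Models the program being terminated when the user picks Abort."""


def critical_io(operation, choose):
    """Run operation(); on OSError, ask choose(exc) for 'abort', 'retry' or 'fail'.

    Returns (error_code, result). An error_code of 0 means success.
    """
    while True:
        try:
            return 0, operation()
        except OSError as exc:
            answer = choose(exc)
            if answer == "retry":
                continue  # Retry: attempt the same operation again
            if answer == "fail":
                # Fail: hand an error code back to the caller, which keeps running
                return exc.errno or errno.EIO, None
            # Abort (or anything else): the program never sees a return value
            raise Aborted(f"aborted: {exc}")


def make_flaky_read(failures):
    """Build a fake read that raises OSError `failures` times, then succeeds."""
    state = {"left": failures}

    def read():
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError(errno.EIO, "D: cannot be read")
        return b"sector data"

    return read
```

With "fail", the caller gets `(errno.EIO, None)` back and decides what to do next, which matches the game that kept running without its background. With "abort", the caller never gets control back at all.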


Not sure of the differences, but my understanding of Windows was that it wasn't until Windows XP that they added a "window manager" (think of GNOME or KDE on Linux). Before then, the UI was all a wrapper on top of a command-line application, or so I was told. I assume the blue screen is as minimal as it is since it usually follows some hardware halt, and it's much easier to render without worrying about GPU restrictions and the like. Although the modern ones are impressive with their QR codes.


Welllll...

OK, so from 1985 with Windows 1.0 until 2001 with the release of Windows XP, consumer versions of Windows were built on top of MS-DOS. For the last of those releases -- Windows 95, Windows 98, and Windows Me -- DOS was bundled together with Windows to make a complete operating system product. But before then, up through Windows 3.11, you had to say 'win' at the DOS prompt to get into Windows.

But the thing you have to remember is that DOS was barely an operating system. It had a rudimentary file system, support for a single process, and no memory protection to speak of at all. You could bypass the OS altogether to directly access the hardware; most DOS applications, especially games, did just that for everything except file access. So Windows actually provided a lot of operating system functionality that DOS didn't and called down into DOS routines for file access.

(In fact, you could even chainload an operating system like Linux from DOS. A program called LOADLIN.EXE let you load and run a Linux kernel from the DOS command line; combined with special file system support called 'umsdos' that overlaid Linux file semantics on top of the DOS FAT file system, this meant that you could run full Linux, from DOS, in your existing DOS partition!)

But even then -- starting in Windows 3.1 (or 3.11, I forget which) a new feature emerged called "32-bit disk access". Windows provided its own routines to handle the DOS file system while it was running, eliminating the need to call down to DOS at all. This was optional on Windows 3.x but the default on Windows 9x (including Me). So DOS was relegated to the role of a bootloader for Windows and a compatibility layer you could escape to for the old DOS programs people still wanted to run (like Doom, Duke Nukem 3D, and such).

So Windows started off as a layer on top of DOS, which was command-line driven, but once loaded it became most or all of a graphical, multitasking operating system in its own right. And it had its own, built in window manager; matter of fact it had more of one than Mac OS, in which user application code had to handle things like moving, resizing, and closing windows (for which utility functions were provided).

Windows NT, which consumer versions of Windows from Windows XP onward were built on top of, has existed since 1993 and that's its own thing, having a 32-bit fully preemptive multitasking kernel with memory protection, user permissions, and the whole bit. It was used primarily in server applications before Windows XP.


I did some debugging of an embedded RTOS program under DOS. I had compiler switches so I could compile it for the target or DOS. The latter was way easier to debug.


Thank you, my memory on the history is hazy as heck, it's from word of mouth.


No worries. I keep forgetting how old I am, and that the DOS underpinnings of early Windows may be completely unknown even to those of my successors who work in computing. An entire generation has been born, grown up, graduated college and entered the workforce without having ever seen a DOS-based Windows.


The earliest Windows I used was Windows 95. I remember the "Ski Free" game or whatever it was called; it's been posted about on HN a few times. It wasn't until Windows Millennium Edition (Me) that I really started using a computer daily though, which was an awful OS.

Aside:

You reminded me of my first time using Linux, it was Slackware and I had to type "startx" after logging on to get to the GUI.


I still boot the GUI by hand from the shell. I love the feeling of control it gives me, like I don't have to have that stuff running until I want it. And I was a Slackware user until fairly recently. I use Void, btw. :)


Not sure what you're talking about: the window manager "like Gnome or KDE" was added in Windows 1.0. And while it was kind of a wrapper on top of a command-line application back then, just as it still is on Linux (I believe), Windows 95 got a completely standalone desktop/graphics subsystem which didn't run on top of anything except HAL; the console/DOS subsystem was its foster sibling, so to speak.


I guess the NT equivalent are those error popups that come out of SYSTEM and similar accounts, yet somehow manage to get on the interactive user's session display. They usually have styling that's several Windows versions out of date.


[Abort] [Retry] [Cancel]

What's the difference between abort and cancel?


I guess "Cancel" probably returns a failure code to the caller of the API while "Abort" kills the program, similar to "Abort"/"Fail" dichotomy on MS-DOS.


In the context of Windows, back in the day Cancel referred to basically ignoring the error message box.

If the error kept occurring you would just keep getting dialog windows.

Abort was then to essentially try to kill that particular task/process.


If memory serves right, essentially the same difference as kill -9 and kill.
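
To make the analogy concrete: plain `kill` sends SIGTERM, which a process may catch and act on, while `kill -9` sends SIGKILL, which it cannot. Here is a small sketch, assuming a POSIX system and that it runs in the main thread; `can_install_handler` is a hypothetical helper name. It says nothing about how Windows implemented its buttons.

```python
import signal


def can_install_handler(signum):
    """Return True if this process is allowed to catch the given signal."""
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        # The OS refuses handlers for signals such as SIGKILL.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the original handler back
    return True


if __name__ == "__main__":
    print("SIGTERM catchable:", can_install_handler(signal.SIGTERM))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```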


The wording on the buttons leaves that somewhat unclear.


Isn't it Ignore and not Cancel? Where did you actually see this?


Depends which OS, Windows would often use Cancel in place of Ignore


I have a hard time believing this considering the flag for it in Windows is called MB_ABORTRETRYIGNORE. How old of a Windows version are we talking? Are you sure you've seen this in actual Windows and not in an altered/joke screenshot?
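
For reference, a rough sketch of that flag in use, written in Python via `ctypes`. The constant values are the ones from `winuser.h`; `ask_abort_retry_ignore` and `BUTTON_NAMES` are hypothetical names for the illustration, and the call itself only works on Windows.

```python
import sys

# Values as defined in the Windows SDK header winuser.h.
MB_ABORTRETRYIGNORE = 0x00000002
MB_RETRYCANCEL = 0x00000005
MB_ICONERROR = 0x00000010
IDCANCEL, IDABORT, IDRETRY, IDIGNORE = 2, 3, 4, 5

BUTTON_NAMES = {
    IDCANCEL: "Cancel",
    IDABORT: "Abort",
    IDRETRY: "Retry",
    IDIGNORE: "Ignore",
}


def ask_abort_retry_ignore(text, caption="Error"):
    """Show the three-button box on Windows and return the pressed button's name."""
    if sys.platform != "win32":
        raise NotImplementedError("MessageBoxW is only available on Windows")
    import ctypes

    result = ctypes.windll.user32.MessageBoxW(
        None, text, caption, MB_ABORTRETRYIGNORE | MB_ICONERROR
    )
    return BUTTON_NAMES.get(result, f"unknown ({result})")
```

Note that Cancel only shows up with other flags such as `MB_RETRYCANCEL`; the three-button style returns Abort, Retry, or Ignore.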


Nah you're right actually

Having said that, the article points to a dialog box that says Cancel instead of Ignore. Perhaps the dev got the naming wrong too?


He writes "16-bit Windows", and it's beyond anything I can find documentation for, but he's probably right. That's like pre-1992, I think. But it's been "Ignore" for so long that it's kind of silly to make jokes about it.


Yeah that is fairly old! My memory deceived me


I'm Googling around and don't see such a screenshot at all, for what it's worth. What I see is just Cancel and Retry: https://www.betaarchive.com/imageupload/1261286563.or.360017...


> Where did you actually see this

uhh ...


My comment was too hasty (sorry) but my point was that you seemed to be asking about the behavior of something that doesn't actually exist, and in fact might never have. (?) I'm not sure where Raymond got it from, or whether he was just doing this from memory, but Googling around, I don't see an Abort button here:

https://www.betaarchive.com/imageupload/1261286563.or.360017...

All the 3-button versions of the dialog I recall are Abort/Retry/Ignore, not Cancel. So it's kind of hard to answer "what does Cancel do"!


He was probably doing it from memory then since the dialog box is made in HTML (it's not a screenshot).


    “‘Abort, Retry, Fail?’ was the phrase some wormdog scrawled next to the door of the Edit Universe project room. And when the new dataspinners started working, fabricating their worlds on the huge organic comp systems, we’d remind them: if you see this message, always choose ‘Retry.'”

    — Bad’l Ron, Wakener, Morgan Polysoft (SMAC)


Such errors were called "critical errors" on DOS. They were called critical because the app could not continue execution unless the error was resolved. Windows seems to have picked a softer name.


isn't the opposite of a hard error called a "soft error"?


According to Chen,

> The opposite of the “hard error” message was the “soft error” message, which was your regular MessageBox.

I think it was just an opening hook thing.


Guess it’s the same difference between caught exceptions and panics, you can either recover or you cannot?


No, a hard error in this case is one that is triggered by a hardware interrupt and is not interruptible by the system. This is explained in the article.


It can be kind of hard to route an error that happens deep inside a kernel to an application.

Disk errors happening after a deferred write are a classic example that's still applicable today. Your process might have succeeded a write(2) into cache and exited cleanly, then when it's time to write to disk, something bad happens. There's no way to tell an application that is long gone.
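
A minimal sketch of the usual mitigation, assuming a POSIX-like system: call `fsync` before exiting so that a write-back failure surfaces as an exception while the process is still around to see it. `durable_write` is a hypothetical helper name, and this does not cover every failure mode (for example, the directory entry itself is not synced here).

```python
import os


def durable_write(path, data):
    """Write data to path and force it to stable storage before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this, a successful write() only means the data reached the
        # page cache; a later disk error would have nobody to report to.
        os.fsync(fd)
    finally:
        os.close(fd)
```

If the disk fails during the `fsync`, the call raises `OSError` and the application can still react, which is exactly what is impossible once the process has exited.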


But you would use unbuffered writes if you really cared. Otherwise there’s nothing actionable that can be done by the application unless there was some kind of signal or callback / APC.


All correct. You're making my same points.

I'm illustrating a real world scenario where such a mapping may not be possible.

Further, if the issue is a global problem, like filesystem corruption, it becomes even more ambiguous -- maybe that write could have worked or maybe it wouldn't, but the filesystem may just globally make everything read only and start failing writes.

So my point is that these errors were meant for catastrophic failures that don't map neatly into being handled with the process. In a modern kernel there are way fewer of these conditions than was acceptable in the early 90s. But there are a few similar things that are still common.


Fun fact, your question was 7% the length of the article which clearly answered your question.


What were the odds of that happening?:0



