I've had a few interesting operating system development experiences. Warning: rambling alert!
Circa 1984: I was working at Callan Data Systems, a small 68k workstation maker in the greater Los Angeles area (just outside Thousand Oaks, for those familiar with the region).
We had been using Unix ports from UniSoft, but for the new 68010 and 68020 based systems we were developing we were doing our own port from the AT&T sources. I don't recall if our base was System III or if it was System V Release 1 (I'm sure it was earlier than SVR2, for reasons that will become apparent).
This version of Unix did not support demand paged virtual memory. It was a classic swapping system. I was rewriting the process and memory subsystems to add demand paged virtual memory (that's why I know we were starting from something earlier than SVR2, because SVR2 was when AT&T added demand paged virtual memory support).
It was running quite well, except for one annoying bug: occasionally, when a signal was delivered to a process that had a handler installed for that signal, the process would get some kind of error, like an illegal instruction trap. For instance, hitting control-C in the shell might trigger the bug. There was no sign of memory corruption, and no illegal instruction at the address where it claimed to have been executing.
I spent some long, late evenings with the in-circuit emulator and the logic analyzer, trying to figure out what the hell was going on. Eventually, I was able to determine that it only happened if the signal was delivered while the system was trying to return from handling a page fault for the process that was receiving the signal.
On the original 68000, virtual memory was not supported. When a bus error was generated by an invalid memory access, the exception stack frame that contained information about the error did not contain enough information to restart or resume the failed instruction. You had no choice really except to kill the process. Hence, almost all 68000 systems were pure swapping systems. (It was possible to do on-demand stack space allocation even on the 68000, through a bit of a kludge.)
The 68010 added support for virtual memory. The way it did this was to make a bus error push a special extended exception stack frame. This extended frame contained internal processor state information. When you did the return from exception, the processor recognized that the exception had the extended frame, and restored that state information. (This is called instruction continuation, because the processor continues after the interrupt by resuming processing in the middle of the interrupted instruction. The other major approach, which is what I believe most Intel processors use, is called instruction restart. With that approach, the processor does not save any special internal state information. If it was 90% of the way through a complicated instruction when the fault occurred, it will redo the entire instruction when resuming. Instruction continuation raises some annoying problems on multiprocessor systems.)
The way signals are delivered is that whenever the kernel is about to return to user mode, it does a check to see if the user process has any signals queued. If it does, and they are trappable and the process has a signal handler, the kernel fiddles with the stack, making it so that (1) when it does the return to user mode it will return to the start of the signal handler instead of to where it would have otherwise returned, and (2) there is a made-up stack frame on the stack after that so that when the signal handler returns, it returns to the right place in the program.
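To make the two steps concrete, here is a toy C model of that stack fiddling. It is a sketch under simplifying assumptions, not the actual System V code: the saved user context is reduced to a PC and a stack pointer, user memory is a plain word array, and deliver_signal, the handler address and the "trampoline" (the code that restores the original context when the handler returns) are all hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified saved user context. */
struct user_ctx {
    uint32_t pc; /* where the return to user mode will resume */
    uint32_t sp; /* user stack pointer (byte address, grows down) */
};

/* Push one 32-bit word onto the modeled user stack. */
static void push_word(struct user_ctx *ctx, uint32_t *user_mem, uint32_t w)
{
    ctx->sp -= 4;
    user_mem[ctx->sp / 4] = w;
}

/* Redirect the pending return to user mode into a signal handler. */
static void deliver_signal(struct user_ctx *ctx, uint32_t *user_mem,
                           uint32_t handler, uint32_t trampoline, int signo)
{
    /* (2) Made-up frame: when the handler returns it lands in the
       trampoline, which pops the saved PC and resumes the program. */
    push_word(ctx, user_mem, ctx->pc);         /* original resume point */
    push_word(ctx, user_mem, (uint32_t)signo); /* handler's argument */
    push_word(ctx, user_mem, trampoline);      /* handler's return address */
    /* (1) Return to the start of the handler instead. */
    ctx->pc = handler;
}
```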
This was fine if the kernel was entered via a mechanism that generated a normal interrupt stack frame, such as a system call or a timer interrupt or a device interrupt. When the kernel was entered via a bus error due to a page fault, then the stack frame was that special extended frame, with all the internal processor state in it. When we fiddled with that to make it return to the signal handler, the result was the processor tried to resume the first instruction of the signal handler, but the internal state was for a different interrupted instruction, and if these did not match bad things happened.
The fix? I put a check in the signal delivery code to check for the extended frame. If one was present, I turned on the debug flag and returned from the page fault without trying to deliver the signal. The instruction that incurred the page fault would then resume and complete, the processor would see that the debug flag was on, and would generate a debug interrupt. That gave control back to the kernel, where I could then turn the debug flag off, and during the return from the debug interrupt do the stack manipulation to deliver the signal.
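A sketch of that deferral logic in C, assuming the "debug flag" is the 68k trace (T) bit, bit 15 of the status register. The frame and process structures and push_signal_frame are hypothetical stand-ins for the real kernel's data structures and for the stack manipulation described above.

```c
#include <assert.h>
#include <stdbool.h>

#define SR_TRACE 0x8000u /* 68000/68010 status register trace bit */

struct exc_frame {
    bool extended; /* true for the 68010 bus-error (long) frame */
    unsigned sr;   /* saved status register */
};

struct proc {
    int pending_signal; /* 0 = none */
    bool deferred;      /* trace bit armed to deliver later */
};

/* Hypothetical stand-in for the real signal stack manipulation. */
static int delivered;
static void push_signal_frame(struct proc *p)
{
    delivered = p->pending_signal;
    p->pending_signal = 0;
}

/* Called on every return to user mode. */
static void return_to_user(struct proc *p, struct exc_frame *f)
{
    if (p->pending_signal == 0)
        return;
    if (f->extended) {
        /* Can't rewrite a frame holding mid-instruction state: let the
           faulting instruction complete, then trap again via trace. */
        f->sr |= SR_TRACE;
        p->deferred = true;
        return;
    }
    push_signal_frame(p);
}

/* Trace exception: taken after the resumed instruction completes.
   Its frame is an ordinary short frame, so delivery is now safe. */
static void trace_trap(struct proc *p, struct exc_frame *f)
{
    if (p->deferred) {
        f->sr &= ~SR_TRACE; /* stop single-stepping the process */
        p->deferred = false;
    }
    return_to_user(p, f);
}
```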
Circa 1986: I was working at Interactive Systems Corporation in Santa Monica. ISC had a contract from AT&T to produce the official port of System V Release 3 to Intel's new processor, the 80386. Unfortunately, the contract also called for porting it to the 80286, and that was what I was working on: the kernel, with one other programmer. That was a kludge. We got it working, but there was a strange scheduling bug. If you loaded it down with around 10 processes, each using a lot of memory, so that the system had to make heavy use of virtual memory, what you'd see is that 1 process would get about 90% of the available processing time, 1 process would get essentially no processing time, and the remaining 8 would split the remaining 10% pretty much equally. It would stay this way for a few hours, and then it would thrash especially hard for a short while and go back to the previous pattern, except the processes had shuffled: a different process was getting the 90%, a different one was getting screwed with 0%, and the remaining 10% was shared equally among the rest. So, in a sense, the scheduler was actually quite fair--if you watched it for a week, every process ended up with about the same processor time.
We just could not figure out why the heck it was doing this. We never did solve this. AT&T came to their senses and realized no one wanted a 286 port of SVR3 and dropped that part of the project, and I got moved to the 386 port, where I added an interactive debugger to the kernel, and hacked up the device driver subsystem to allow dynamic loading of drivers at runtime instead of requiring them to be linked in at kernel boot time. (The kernel had grown too big for the real mode boot code, and no one wanted to deal with writing a new boot loader! Eventually, someone bit the bullet and wrote a new, protected mode, boot loader and so we didn't need my cool dynamic device loading system.)
Another part of the project with AT&T was providing a way for 386 Unix to run binaries from 286 Unix (probably System III, but I don't recall for sure). Carl Hensler, the senior Unix guru at ISC, and I did that project. (Carl, after ISC was sold to Kodak and then to Sun, ended up at Sun where he became a Distinguished Engineer on Solaris. He now spends much of his time helping his mechanic maintain his Piper Comanche, which he flies around to visit craft breweries.) The 286 used a segmented memory model. So did the 386, but since segments could be 4 GB, 386 processes only used 3 segments (one code, one data, and one stack), all of which actually pointed to the same 4 GB space. Fortunately, the segment numbers used for those 3 segments did not overlap the segments used in the 286 Unix process model, so we did not have to do any gross memory hacks to deal with 286 memory layout on the 386. We were able to do most of the 286 support via a user mode program, called 286emul. We modified the kernel to recognize attempts to exec a 286 binary, and to change that to an exec of 286emul, adding the path to the 286 program to the arguments. 286emul would then allocate memory (ordinary user process memory) and load the 286 process into it. We added a system call to the kernel that allowed a user mode process to ask the kernel to map segments for it. 286emul used that to set up the segments appropriately.
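The exec redirection amounts to rewriting the argument vector. A minimal sketch, assuming a hypothetical emulator path of /bin/286emul and assuming the original argv[0] is passed through after the program path; detecting that the file is a 286 binary (presumably from its header) is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EMUL_PATH "/bin/286emul" /* hypothetical location */

/* Build { EMUL_PATH, prog_path, argv[0], argv[1], ..., NULL }.
   Returns a malloc'd array (caller frees) or NULL on failure. */
static char **emul_argv(const char *prog_path, char *const argv[])
{
    size_t argc = 0;
    while (argv[argc] != NULL)
        argc++;

    char **out = malloc((argc + 3) * sizeof *out);
    if (out == NULL)
        return NULL;

    out[0] = (char *)EMUL_PATH;
    out[1] = (char *)prog_path; /* tells the emulator what to load */
    for (size_t i = 0; i < argc; i++)
        out[i + 2] = argv[i];   /* original arguments, unchanged */
    out[argc + 2] = NULL;
    return out;
}
```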
Another lucky break was that 286 Unix and 386 Unix used a different system call mechanism. The 286emul process was able to trap system call attempts from the 286 code and handle them itself.
Later, AT&T and Microsoft made some kind of deal, and as part of that they wanted something like 286emul, but for Xenix binaries instead of Unix binaries, and ISC got a contract to do that work. This was done by me and Darryl Richman. It was mostly similar to 286emul, as far as dealing with the kernel. Xenix was farther from 386 Unix than 286 Unix was, so we had quite a bit more work in the 286 Xenix emulator process to deal with system calls, but there was nothing too bad.
There was one crisis during development. Microsoft said that there was an issue that needed to be decided and that it could not be handled by email or by a conference call. We had to have a face to face meeting, and we had to send the whole team. So, Darryl and I had to fly to Redmond, which was annoying because I do not fly. I believe everyone is allowed, and should have, one stubbornly irrational fear, and I picked flying on aircraft that I am not piloting.
So we get to Microsoft, have a nice lunch, and then we gather with a bunch of Microsoft people to resolve the issue. The issue turned out to be dealing with a difference in signal handling between Xenix and Unix. To make this work, the kernel would have to know that a signal was for a Xenix process and treat it slightly differently. So...we needed some way for a process to tell the kernel "use Xenix signal handling for me". Microsoft wanted to know if we wanted this to be done as a new flag on an existing ioctl, or if we wanted to add a new "set signal mode" system call. We told them a flag was fine, and they said that was it, and we could go. WTF...this could not have been done by email or over the phone?
But wait...it gets even more annoying. After we got back, and finished the project, Microsoft was very happy with it. They praised it, and sent Darryl copies of all Microsoft's consumer software products as thanks for a job well done. They sent me nothing.
On the 286emul project, Carl was the lead engineer, and the most experienced Unix guy in the company. If AT&T had decided to give out presents for 286emul, I would have fully understood if they gave them only to Carl. On the Xenix emulator, on the other hand, neither Darryl nor myself was lead engineer, and we had about the same overall experience level (I was the more experienced kernel guy, whereas he was a compiler guru, and I had been on the 286emul project that served as the starting point for the Xenix emulator).
All I can come up with for this apparent snub is that in 1982, when I was a senior at Caltech, Microsoft came to recruit on campus. I wasn't very enthusiastic at the interview (I had already decided I did not want to move from Southern California at that time), and I got one of their brainteaser questions wrong (and when they tried to tell me I was wrong, I disagreed). I don't remember the problem for sure, but I think it might have been the Monty Hall problem. Maybe they recognized me at the face to face meeting as the idiot who couldn't solve their brainteaser in 1982, and so assumed Darryl had done all the work.
Three years later, Microsoft recruited Darryl away from ISC, so evidently they really liked him. (As with Carl, you cannot tell the Darryl story without beer playing a role. After Microsoft, Darryl ran his microbrewery for a while, and wrote a book on beer. I don't know why, but a lot of my old friends from school, and my old coworkers from my prior jobs, brew beer as either a hobby or as a side business, or are seriously into drinking craft beers. I do not drink beer at all, so it seems kind of odd that I apparently tend to befriend people with unusual propensities toward beer.)
 I've heard of one hack that supposedly was actually used by a couple of vendors to do demand paged virtual memory on 68000. They put two 68000s in their system. They were running in parallel, running the same code and processing the same data, except one was running one instruction behind. If the first got a bus error on an attempted memory access, the second was halted, and the bus error handler on the first could examine the second to figure out the necessary state information to restart the failed instruction (after dealing with the page fault). This is one hell of a kludge. (Some versions of the tale say that after the first fixed the page fault, the processors swapped roles. The one that had been behind resumed as the lead processor, and the one that had been in the lead became the follower. I'm not really much of a hardware guy, but I think the first approach, where one processor is always the lead and the other is always the follower, would be easier).
 There was not enough information on the 68000 to figure out how to restart after a bus error in the general case, but you could in special cases. Compilers would insert a special "stack probe" at the start of functions. This would attempt to access a location on the stack deep enough to span all the space the function needed for local variables, struct returns, and function calls. The kernel knew about these stack probes, and so when it saw a bus error for an address in the stack segment but below the current stack, it could look around the return address to see if there was a stack probe instruction, and it could figure out a safe place to resume after expanding the stack.
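The kernel-side decision can be sketched as below. This is a toy model with hypothetical names: it assumes a downward-growing stack segment and 4 KB pages, and it leaves out the part that actually decodes instructions near the return address, reducing it to a probe_seen flag. Because the probe is only a read, resuming after it is safe once the stack has been extended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical model of a process's stack segment (grows downward). */
struct stack_seg {
    uint32_t base;      /* top of the segment (highest address) */
    uint32_t limit;     /* lowest address the stack may grow to */
    uint32_t mapped_lo; /* lowest address currently backed by memory */
};

/* Returns true if the bus error was a stack probe we can satisfy by
   growing the stack; false means a genuine fault (kill the process). */
static bool try_grow_stack(struct stack_seg *s, uint32_t fault_addr,
                           uint32_t user_sp, bool probe_seen)
{
    bool in_segment = fault_addr >= s->limit && fault_addr < s->base;
    bool below_stack = fault_addr < user_sp && fault_addr < s->mapped_lo;
    if (!in_segment || !below_stack || !probe_seen)
        return false;
    s->mapped_lo = fault_addr & ~(PAGE_SIZE - 1); /* grow down to cover it */
    return true; /* resume at the instruction after the probe */
}
```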
 The extended exception frame contains internal processor state information. Different steppings of the same processor model might have different internal state information. After you deal with a page fault for a process, you'll have to resume that process on a processor that has a compatible extended exception frame.
We dealt with the 286 when I was at Mark Williams. They had done the compiler that Intel was using and reselling at the time. When the 286 came out, they were concerned about performance, what with the goofy segment registers and all the various memory models (compact, small, medium, large). They wanted us to guarantee that the performance of the compiled code would be equal to or better than the 8086. Naturally we resisted.
So you must know Ron Lachman.
Oh, also at Mark Williams: the year before I started with them, they demonstrated Coherent (a v7 Unix-alike) on a vanilla IBM PC without any memory protection hardware. It was later also done on the Atari ST.
I have a more recent story, but I hope it's still fun. A few months ago I wrote a rootkit for our super-special viruses class.
The real point of the assignment was to write a self-propagating virus. I had teamed up with a friend who happens to be a fantastic programmer. He promised to cover the virus portion which freed me to go for the bonus marks.
This professor does bonus marks with the democratic method. At the beginning of the class he announces the criteria, then at some point the class votes. In our case the goal was "the most annoying virus".
As it happens, my rootkit won us the bonus marks by a healthy margin. This was something I wasn't prepared for, since the class had learned a quick way to disable the rootkit: delete the kernel module loader before running the virus. When I heard they were doing this I was disheartened. Here was a method I had not thought of, and it was sure to make my work worthless.
As it happens they did this because the rootkit was evil. More evil than I intended.
In fact, in theory the rootkit was benign. The rootkit hooked the open system call and counted files opened from the /bin, /usr/bin, and /sbin folders. A Bloom filter hidden in the dtrace fields of the kernel's task struct prevented double-counting the same file.
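For the curious, the counting half (the benign part) looks roughly like this as a user-space sketch in C. The comment doesn't say how the filter was sized or hashed, so the 64-bit filter, the two seeded FNV-1a hashes and the task_counter structure are all assumptions. A Bloom filter can report false positives, so the count may undercount distinct files but never overcounts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 64 /* assumed: one 64-bit word of spare per-task space */

/* FNV-1a string hash with a seed mixed into the offset basis. */
static uint32_t fnv1a(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

struct task_counter {
    uint64_t bloom; /* tiny Bloom filter of paths already counted */
    unsigned count; /* distinct watched files opened (approximate) */
};

static bool watched(const char *path)
{
    return strncmp(path, "/bin/", 5) == 0 ||
           strncmp(path, "/usr/bin/", 9) == 0 ||
           strncmp(path, "/sbin/", 6) == 0;
}

/* Called for each open(); counts each watched path at most once. */
static void on_open(struct task_counter *t, const char *path)
{
    if (!watched(path))
        return;
    uint64_t m = (1ull << (fnv1a(path, 0u) % BLOOM_BITS)) |
                 (1ull << (fnv1a(path, 0x9e3779b9u) % BLOOM_BITS));
    if ((t->bloom & m) == m)
        return; /* probably seen before (false positives possible) */
    t->bloom |= m;
    t->count++;
}
```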
Then another hook, on write, performed the attack. When a task reached the "too many files opened" threshold, the rootkit caused any writes to return instantly. The goal was to identify antivirus programs while they were disinfecting files and sabotage the disinfection.
In theory this was being nice to the other students since only the strongest students attempted disinfecting files. In practice it was much worse.
I did not realize it at the time but I bet you can guess what happened. For the web developers: stdout, the way to give output back to console, is really a file and output is sent with a write system call. A virus scanner would find and report viruses then hit the threshold and get silenced. Students would find the /bin viruses just fine but soon notice the viruses in /usr/bin were getting missed!
Of course they searched their code for an explanation of how that folder was special and found none. Their programs just stopped working for no logical reason. Pure evil. This I think is how we won the bonus.
As an extra feature the rootkit would kernel panic your computer if you dared to unload it. It did this by re-hooking the system call table in the rootkit's unload handler. Once unloaded any write or open system call would segfault the kernel and the game was up. Nothing like a real rootkit's anti-anti-virus arsenal but it worked.
The year is 1969. The machine is an SDS Sigma 5, soon to be renamed XDS Sigma 5 after Xerox bought SDS from Max Palevsky.
The Sigma 5 had been used in aerospace data collection as it had the capability for data collection and the optional fixed-head disk, ideal for real-time operation.
We were building a system to take analog data from 12-lead electrocardiograms transmitted in three-channel audio FM over telephone lines, sampling the three channels of data from up to 20 phone lines at 500 samples per second, and queueing the data to disk for later collation and feeding to the analysis program as well as writing to tape.
It turns out that the operating system called RBM (Real-time Batch Monitor) would mask out interrupts during certain key events. Since the 500 samples per second was driven by a hardware timer interrupt, we needed that to not be masked out. So with every release of the OS, I had the job of locating all the places that the interrupt masking took place in the OS and changing the instruction so that it wouldn't mask the timer interrupt. This required a careful audit of the OS's use of the timer interrupt to be sure that we weren't exposing an inadvertent race condition. We were worried about skew leading to an appearance of noise on the digitized signal.
So I had the task of changing the card deck and recompiling the kernel.
All our interrupt driven work was done in assembly language. Probably would have used C, but it hadn't been invented yet when we started. But coroutines in that interrupt-rich environment were a damn sight easier in assembler than wrangling with threads in a higher-level language.
You worked on a Sigma 5 in 1969? Where were you? I was in Phoenix, at my first programming job. Summer of '69, Transdata in Phoenix hired me as the night operator. The best part was that they didn't offer their timesharing service at night, so basically I had the computer to myself.
We didn't have anything like electrocardiograms, but the neat thing was to see what kind of program you could write on a single card. Did you have any of the bird chirp cards that would toggle the front panel speaker and put out a whole flurry of bird sounds?
My favorite (or at least most useful) one-card program was when I found the print card: Put this card first in the card reader ahead of a stack of other cards, and it would print out the whole deck!
Only problem was that the print card was single buffered. It would read a card, then print it. Then read the next card, then print it. And on and on, with the card reader waiting while you print, and the printer waiting while you read the next card.
So I figured out a way to double-buffer the print routine: It read the next line while printing the one before, so it was much faster than before.
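The gain is easy to estimate. A rough model, assuming reads and prints take fixed times and the double-buffered loop advances in lockstep (both devices must finish before the next step starts); the function names are illustrative only.

```c
#include <assert.h>

/* Reading and printing strictly alternate: each card pays for both. */
static unsigned single_buffered(unsigned n, unsigned t_read, unsigned t_print)
{
    return n * (t_read + t_print);
}

/* Card i+1 is read while card i prints, so after the first read the
   slower device sets the pace; the last card still has to print. */
static unsigned double_buffered(unsigned n, unsigned t_read, unsigned t_print)
{
    if (n == 0)
        return 0;
    unsigned slower = t_read > t_print ? t_read : t_print;
    return t_read + (n - 1) * slower + t_print;
}
```

With, say, 100 cards, 3 time units per read and 5 per print, that is 800 units single-buffered versus 503 double-buffered.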
The company was Telemed and it was located near Chicago. In fact, for a while it was in an office complex a few hundred yards directly east of runway 27R at O'Hare. Fully laden, Europe-bound 747s would often take off aiming directly overhead. We would be silently chanting "up! up!" as it felt like they were pretty close.
I don't think we did the single card trick. I do remember writing a few utilities like an editor so we could store the programs on disk instead of feeding them in each time. And a crude document processor for internal documentation. And we had a pretty serious engineering effort saving the real-time data to tape. There was always the concern about being sure that we had saved the data by the time the system signaled the hospital technician that it was done so they could unhook the patient.
The extracurricular activity I spent the most time on was porting the XPL compiler from the version done at the University of Washington to run on our particular configuration. It involved converting from a 7-track tape to a 9-track tape, regenerating the operating system to reduce the start address of user programs, and a few other hoops. I had read A Compiler Generator and my career was off on another track.
And we did the evening thing as well. We had two systems for redundancy, but for a while production required both machines. One would do data collection, and the other would run the diagnostic program, which spit out paper tape that was carried over to the teletypes. Since traffic was light at night, we would come in at some ungodly hour for the better part of a year doing the work to combine these systems. If a call came in, we had about an hour to get off the system so they could bring up the diagnostic system to do the analysis and generate the paper tape.
I think I remember the cards--they were binary and half full of holes, as opposed to the others (EBCDIC, I believe).
It's 1987. I'd just spent a few months building a small single-floppy bootable OS for the IBM PC. The purpose of this project was to display a small training demonstration for security-related personnel in a protected environment. There were to be no ways to interrupt/interfere with the system running the training program, and it absolutely could not have been done in DOS or CP/M - it had to be its own standalone system, 100%. It absolutely had to be something that 'could not be copied in a normal computer', where normal was: any of the DOS-booting machines out there in the final location.
So we built a boot-loader, a small kernel, packed the training-material resources into a tiny VM, wrote a VM to process the bytecode and run the training app, and delivered two bootable - albeit 'uncopyable' - floppies with the app - one for the demo, and one for the final installation. The app worked great, but building the floppy required a fair bit of magic hand-waving, back in those days. I retired after giving the delivery-person their two, very valuable floppies, with only thoughts on my mind that perhaps one day I should automate all that sector-placing hand-waving magic ..
So, I get a call from the remote location at 4am, saying that the demo floppy had been placed on top of some magnetic thing accidentally, the person had been fired, and how do we make another copy of the floppy for the install at 7am?
Well, indeed. "We'll have to do a sector-copy. Do any of your DOS machines have debug.com installed, by any chance .." A 7-line assembly routine to do block copies, 15 minutes of waiting-on-hold listening to remote floppy copy noises later, and we had our copy. The scant few hours I had in between dreaming of the routine, and then actually having to explain to a non-technical person over the phone how to assemble it into a working program in a way that wouldn't destroy the only working copy .. well, let's just say I learned a lot of things in that project that I'm still trying to un-learn. ;)
While writing a QnX clone I used DJGPP to bootstrap my fledgling new OS. DJGPP is a 32 bit 'extender': it allows you to write 32 bit code and run it from the 16 bit DOS environment of the time. When the basics were in place and the OS was self hosting I made a mistake somewhere and managed to mess up the system to the point where it would no longer self host.
Having to go back to DJGPP, extracting the files from the (now unmountable) filesystem with the latest working version and then getting back to being self hosting was the stuff of nightmares and I considered giving up several times.
I never realized that the holy grail of self-hosting is also a trap-door until it had swung shut behind me!
In the end it all worked out and I got it back, but from then on I made sure to have at least two 'known good' kernels waiting and I checked the toolchain a lot more carefully to avoid bugs introduced by broken compilers.
I wrote an operating system for a number of different DSPs in 2001. The OS was C++, with different pieces written in assembly as needed. The design of the OS required that we take the lowest priority interrupt to handle rescheduling as we returned from interrupts, and make sure the correct task was running, etc. At the end of that lowest priority interrupt, we wouldn't RTI if a high priority task was switching in; we'd just drop back into user space and handle the rest of the RTI functionality.
One particular part had a fixed size hardware stack. When an interrupt occurred, it'd push stack/frame and a single register (ALU results), and begin executing the ISR. What we were seeing was occasionally we'd blow the hardware stack. Now, I've given enough data here to debug it, clearly we weren't resetting the hardware stack in the low priority interrupt, but at the time we were STUMPED.
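A toy simulation of that failure, assuming a hypothetical 4-entry hardware stack: each low-priority interrupt pushes an entry, and because the exit path jumps into the incoming task instead of executing RTI, nothing ever pops it. The reset inside the loop stands in for the three lines of assembly, whose exact content isn't given.

```c
#include <assert.h>
#include <stdbool.h>

#define HW_STACK_DEPTH 4 /* hypothetical depth */

struct hw_stack {
    int depth;       /* entries currently on the hardware stack */
    bool overflowed; /* set once an interrupt finds the stack full */
};

/* Interrupt entry: the hardware pushes one entry. */
static void irq_entry(struct hw_stack *s)
{
    if (s->depth == HW_STACK_DEPTH)
        s->overflowed = true;
    else
        s->depth++;
}

/* Run n task switches through the lowest-priority interrupt. */
static bool overflows_after(int n, bool reset_fix)
{
    struct hw_stack s = { 0, false };
    for (int i = 0; i < n; i++) {
        irq_entry(&s);   /* interrupt taken from user mode */
        if (reset_fix)
            s.depth = 0; /* discard the entry RTI would have popped */
        /* ...jump into the incoming task without executing RTI... */
    }
    return s.overflowed;
}
```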
I spent a full day working and thinking about the problem, and that night, I dreamt what the answer was, came in, wrote the three lines of assembly, and it all worked perfectly. It was the first (and unfortunately not the last) time I debugged software while sleeping.
Reminds me of a story from back in the mid-90s at my first job. I was working in Motorola's paging products group and helping bring up an ARM7-based microcontroller that we were testing for two-way pagers. I was in the ASIC group that had put together the chip and had written a simple test monitor shell that ran over serial and let us test out the registers.
I was on location at an engineering office in Florida where we were adding features to the monitor to test out new systems. At one point, the builds stopped working -- they never seemed to come out of reset successfully.
After some work with the logic analyzer, watching the memory bus to see what instructions were running in memory, I finally hit on the answer. The linker was placing the init code at the end of our binary, and the latest code we added had pushed the init routines past the first 64K of program code. However, on this microcontroller, some of the address lines were multiplexed with general purpose IO lines, so out of reset, trying to get to higher addresses just wouldn't work; you'd be loading low-memory code instead.
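The aliasing can be modeled in one function. This sketch assumes, from the description, that every address line above A15 comes out of reset in GPIO mode, so only the low 16 bits reach memory; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address actually seen by memory, given whether init code has
   switched the multiplexed upper lines from GPIO to address mode. */
static uint32_t bus_address(uint32_t cpu_addr, bool high_lines_are_address)
{
    /* Out of reset the upper lines are GPIO, so A16 and up are lost
       and a jump to 0x12345 really fetches from 0x2345. */
    return high_lines_are_address ? cpu_addr : (cpu_addr & 0xFFFFu);
}
```

This is why reordering the sections so the init code sits in the first 64K fixes it: that code is reachable before the mode switch and can then enable the upper lines.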
A quick rework of the linker command line to reorder code sections and some modifications to the init code to flip those lines to address mode, and everything started working again.
A product (with a massive, linear power supply) had powerfail detection - when voltage dropped below <x>, ... powerfail. This triggered a digital counter and an R/C network as an intentional race condition to trigger a line going to a PIO on the processor. Whichever finished first won. The software would then shut things down in an orderly fashion. It would wait in that state until reboot.
When it was tested, it was tested with an A/C relay driven by the parallel port on a PeeCee: generate a random number, wait that many milliseconds, turn the relay off (or on).
But field service said the thing would latch in powerfail. I pointed to the automatic (RNG driven) test of powerfail ( run every software release ) and the lead field service guy says "but that relay only switches on zero crossings of the A/C line." Sure enough... we thumbed power strips for two days and got it to latch....
The fix? Add a software counter to the powerfail state. After <n> cycles, it pulled the reset line (which was in I/O space; NEC V25) itself.
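The fix might look something like this. POWERFAIL_MAX_CYCLES stands in for the unspecified <n>, and pull_reset_line is a hypothetical placeholder for the write to the V25's I/O space. If power really is failing, the board dies long before the counter expires; if the dip was brief and the box latched, the counter gets it out.

```c
#include <assert.h>
#include <stdbool.h>

#define POWERFAIL_MAX_CYCLES 1000u /* hypothetical value for <n> */

static bool reset_pulled;
static unsigned powerfail_cycles;

/* Hypothetical placeholder for driving the reset line via I/O space. */
static void pull_reset_line(void)
{
    reset_pulled = true;
}

/* One pass of the powerfail wait loop; returns true to keep waiting. */
static bool powerfail_tick(void)
{
    if (++powerfail_cycles >= POWERFAIL_MAX_CYCLES) {
        pull_reset_line(); /* still alive after <n> cycles: power is back */
        return false;
    }
    return true;
}
```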
There was a board that, over the course of two weeks, would get progressively and perceptibly slower. Looking into it, we found the heap full of tiny allocations that nobody could account for. What had happened was that someone had compiled the file containing an interrupt handler with the C++ compiler instead of the C compiler. The handler ran in kernel context, and the older compiler decided that no data structure had been allocated to keep tabs on exceptions for that thread, so it malloced a tiny chunk in the function prologue every time the interrupt occurred (about 15Hz in practice). Eventually tasks were taking longer and longer to find free blocks in the fragmented heap. The quick solution was to add nothrow to the file ;)
Working at Apple in 1996, I went to a meeting where all departments sent people to talk with the Copeland leadership (Copeland was Apple's attempt at a modern OS that still included the old OS as a first class API). The QuickTime team got into an argument with the leaders when they couldn't promise that the OS could actually give QuickTime a predictable time slice, which would make video almost impossible. I think at that moment everyone realized that Copeland was a disaster (though most already assumed it would be). I left Apple soon thereafter and in the next year (1) CTO Ellen Hancock killed Copeland (2) Apple bought NeXT (3) Steve returned.
Z80 embedded system debug session. Two flipping days working out why an NMI wasn't being captured by the kernel interrupt handler in a communications controller I was working on. The NMI handler religiously stopped working after ten minutes. Turned out some chump had bent the CPU _NMI pin in the socket of the device at the manufacturer and it wasn't contacting the socket reliably so when it warmed up it started floating. However, you couldn't see it with visual inspection. I assumed it was my fault and spent hours with the assembler and Z80 and vendor docs trying to work out why it wasn't working. Got miffed, plugged in a logic analyser which caused it to work perfectly.
Eventually I assumed the CPU was duff, gave a finger to the rules which involved not changing the hardware, yanked it and found the inverted pin. Grr!
Bear in mind this was MILSPEC and had gone through QC, soak and thermal testing.
No, given the early stage of boot at which it crashed, no threading was happening. The randomness was because the initial zeroing out of the kernel's global and static variables might or might not happen, as a result of a physically random process (electrical discharge), instead of being ensured by software.
Most bootloaders (well, a BIOS usually refers to one step before the bootloader, but still) have a pretty primitive command shell, through which you issue the commands telling it how to load the initial kernel (e.g. from storage, or over the network). My guess would be she had to add a line to the boot script that zeroed out the relevant RAM; that, or rewrite the bootloader and add a loop in machine code to zero out the memory.
There was already code written to zero out the BSS shared across all the bootloaders for PowerPC, the call to it had just gotten lost when our enthusiastic fellow kernel dev rewrote bootloaders for platforms they couldn't test. I assume I just added the call to the existing code back in.
No, as the post states, the non-determinism is due to the fact that DRAM cells lose their charge over time unless they are constantly refreshed. When the system is rebooted after having been powered off for a long time, the DRAM cells are all discharged, and thus uninitialized memory will be 0. The kernel was relying on the memory in the bss section to be 0, but was not actually zeroing it out. Therefore, the code would only work if the memory actually was 0 due to being discharged.
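What the missing call boils down to is a loop like the one below. On a real target the bounds would come from linker-script symbols (names such as __bss_start and _end are common, but they vary by toolchain, so treat them as assumptions); here they are plain parameters so the sketch can run in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Zero [start, end) byte by byte. Early boot code often avoids calling
   library routines like memset, so a plain loop is the usual shape. */
static void zero_bss(unsigned char *start, unsigned char *end)
{
    while (start < end)
        *start++ = 0;
}
```

Without this, the kernel only works when a long power-off has happened to leave those DRAM cells reading as 0.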