The Sigma 5 had been used in aerospace data collection as it had the capability for data collection and the optional fixed-head disk, ideal for real-time operation.
We were building a system to take analog data from 12-lead electrocardiograms transmitted in three-channel audio FM over telephone lines, sampling the three channels of data from up to 20 phone lines at 500 samples per second, and queueing the data to disk for later collation and feeding to the analysis program as well as writing to tape.
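For a sense of scale, here is the back-of-the-envelope arithmetic for those figures. It assumes every line is busy and that one timer interrupt covers one sample period for all channels; the actual batching isn't something I've described above, so treat that part as a guess.

```python
# Figures taken from the description above.
CHANNELS_PER_LINE = 3
PHONE_LINES = 20
SAMPLE_RATE_HZ = 500  # per channel

# Values to collect on each timer interrupt (assumes one interrupt per sample period).
samples_per_tick = CHANNELS_PER_LINE * PHONE_LINES
# Aggregate rate with every phone line active.
samples_per_second = samples_per_tick * SAMPLE_RATE_HZ
# Time between timer interrupts, in milliseconds.
tick_period_ms = 1000 / SAMPLE_RATE_HZ

print(samples_per_tick, samples_per_second, tick_period_ms)
```

A masked timer interrupt eats directly into that per-tick window, which is why the masking mattered so much.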
It turns out that the operating system called RBM (Real-time Batch Monitor) would mask out interrupts during certain key events. Since the 500 samples per second was driven by a hardware timer interrupt, we needed that to not be masked out. So with every release of the OS, I had the job of locating all the places that the interrupt masking took place in the OS and changing the instruction so that it wouldn't mask the timer interrupt. This required a careful audit of the OS's use of the timer interrupt to be sure that we weren't exposing an inadvertent race condition. We were worried about skew leading to an appearance of noise on the digitized signal.
So I had the task of changing the card deck and recompiling the kernel.
All our interrupt driven work was done in assembly language. Probably would have used C, but it hadn't been invented yet when we started. But coroutines in that interrupt-rich environment were a damn sight easier in assembler than wrangling with threads in a higher-level language.
Much fun, but then I got interested in compilers.
We didn't have anything like electrocardiograms, but the neat thing was to see what kind of program you could write on a single card. Did you have any of the bird chirp cards that would toggle the front panel speaker and put out a whole flurry of bird sounds?
My favorite (or at least most useful) one-card program was when I found the print card: Put this card first in the card reader ahead of a stack of other cards, and it would print out the whole deck!
Only problem was that the print card was single buffered. It would read a card, then print it. Then read the next card, then print it. And on and on, with the card reader waiting while you print, and the printer waiting while you read the next card.
So I figured out a way to double-buffer the print routine: It read the next line while printing the one before, so it was much faster than before.
And it still fit on a single card!
80 columns should be enough for anyone.
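To put rough numbers on why the double-buffered version was faster, here is a toy timing model. It is only a sketch: the function names and the 100 ms timings are made up for illustration, and the real card was machine code, not Python.

```python
def single_buffered(cards, read_ms, print_ms):
    """Read a card, print it, repeat: the two devices never overlap."""
    return cards * (read_ms + print_ms)


def double_buffered(cards, read_ms, print_ms):
    """Read card n+1 while printing card n: the slower device sets the pace."""
    if cards == 0:
        return 0
    # The first read and the last print have nothing to overlap with.
    return read_ms + (cards - 1) * max(read_ms, print_ms) + print_ms


# Hypothetical timings: 100 ms to read a card, 100 ms to print a line.
print(single_buffered(500, 100, 100))
print(double_buffered(500, 100, 100))
```

When the reader and printer are about equally fast, overlapping them comes close to halving the total time.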
The company was Telemed and it was located near Chicago. In fact, for a while it was in an office complex a few hundred yards directly east of runway 27R at O'Hare. Often a fully-laden Europe-bound 747 would take off, aiming directly overhead. We would be silently chanting "up! up!" as it felt like they were pretty close.
I don't think we did the single card trick. I do remember writing a few utilities like an editor so we could store the programs on disk instead of feeding them in each time. And a crude document processor for internal documentation. And we had a pretty serious engineering effort saving the real-time data to tape. There was always the concern about being sure that we had saved the data by the time the system signaled the hospital technician that it was done so they could unhook the patient.
The extracurricular activity I spent the most time on was porting the XPL compiler from one done at University of Washington to run on our particular configuration. Involved converting from a 7-track tape to a 9-track tape, regenerating the operating system to reduce the start address of user programs, and a few other hoops. I had read A Compiler Generator and my career was off on another track.
And we did the evening thing as well. We had two systems for redundancy, but for a while production required both machines. One would do data collection, and the other would run the diagnostic program, which spit out paper tape that was carried over to the teletypes. Since traffic was light at night, we would come in at some ungodly hour for the better part of a year doing the work to combine these systems. If a call came in, we had about an hour to get off the system so they could bring up the diagnostic system to do the analysis and generate the paper tape.
I think I remember the cards--they were binary and half full of holes, as opposed to the others (EBCDIC, I believe).
Eventually I assumed the CPU was duff, gave a finger to the rules which involved not changing the hardware, yanked it and found the inverted pin. Grr!
Bear in mind this was MILSPEC and had gone through QC, soak and thermal testing.
Having to go back to DJGPP, extracting the files from the (now unmountable) filesystem with the latest working version and then getting back to being self hosting was the stuff of nightmares and I considered giving up several times.
I never realized that the holy grail of self-hosting is also a trap-door until it had swung shut behind me!
In the end it all worked out and I got it back, but from then on I made sure to have at least two 'known good' kernels waiting and I checked the toolchain a lot more carefully to avoid bugs introduced by broken compilers.
Lessons learned the hard way for sure.
So we built a boot-loader, a small kernel, packed the training-material resources into a tiny VM, wrote a VM to process the bytecode and run the training app, and delivered two bootable - albeit 'uncopyable' - floppies with the app - one for the demo, and one for the final installation. The app worked great, but building the floppy required a fair bit of magic hand-waving, back in those days. I retired after giving the delivery-person their two, very valuable floppies, with only thoughts on my mind that perhaps one day I should automate all that sector-placing hand-waving magic ..
So, I get a call from the remote location at 4am, saying that the demo floppy had been placed on top of some magnetic thing accidentally, the person had been fired, and how do we make another copy of the floppy for the install at 7am?
Well, indeed. "We'll have to do a sector-copy. Do any of your DOS machines have debug.com installed, by any chance .." A 7-line assembly routine to do block copies, 15 minutes of waiting-on-hold listening to remote floppy copy noises later, and we had our copy. The scant few hours I had in between dreaming of the routine, and then actually having to explain to a non-technical person over the phone how to assemble it into a working program in a way that won't destroy the only working copy .. well, let's just say I learned a lot of things in that project that I'm still trying to un-learn. ;)
One particular part had a fixed size hardware stack. When an interrupt occurred, it'd push stack/frame and a single register (ALU results), and begin executing the ISR. What we were seeing was occasionally we'd blow the hardware stack. Now, I've given enough data here to debug it, clearly we weren't resetting the hardware stack in the low priority interrupt, but at the time we were STUMPED.
I spent a full day working and thinking about the problem, and that night, I dreamt what the answer was, came in, wrote the three lines of assembly, and it all worked perfectly. It was the first (and unfortunately not the last) time I debugged software while sleeping.
I was on location at an engineering office in Florida where we were adding features to the monitor to test out new systems. At one point, the builds stopped working -- they never seemed to come out of reset successfully.
After work with the logic analyzer to try to watch the memory bus to see what instructions were running in memory, I finally hit on the answer. The linker was placing the init code at the end of our binary, and the latest code we added had pushed the init routines past the first 64K of program code. However, on this microcontroller, some of the address lines were multiplexed with general purpose IO lines, so out of reset, trying to get to higher addresses just wouldn't work; you'd be loading low-memory code instead.
A quick rework of the linker command line to reorder code sections and some modifications to the init code to flip those lines to address mode, and everything started working again.
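The failure mode can be modelled in a few lines. This is a sketch under assumptions: I'm treating the multiplexed pins as reading zero until the init code switches them to address mode, and the addresses are invented for illustration.

```python
ADDR_BITS_AT_RESET = 16  # assumption: only A0-A15 are driven until init code runs


def bus_address(addr, high_lines_enabled):
    """Address that memory actually sees for a CPU-side address."""
    if high_lines_enabled:
        return addr
    # The upper lines are still GPIO, so their bits never reach the bus.
    return addr & ((1 << ADDR_BITS_AT_RESET) - 1)


init_before = 0x0F000  # init code below 64K: fetched correctly
init_after = 0x10400   # hypothetical address once the binary grew past 64K

print(hex(bus_address(init_before, False)))
print(hex(bus_address(init_after, False)))
print(hex(bus_address(init_after, True)))
```

Out of reset, the fetch aimed at the relocated init code lands on whatever happens to live in low memory instead.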
A product ( with a massive, linear power supply ) has powerfail detection - when voltage dropped below <x>, ... powerfail. This triggered a digital counter and an R/C network as an intentional race condition to trigger a line going to a PIO on the processor. Which ever finished first, won. The software would then shut things down in an orderly fashion. It would wait in that state until reboot.
When it was tested, it was tested with an A/C relay driven by the parallel port on a PeeCee. Gen a random number, wait that many milliseconds, turn the relay off ( or on ).
But field service said the thing would latch in powerfail. I pointed to the automatic (RNG driven) test of powerfail ( run every software release ) and the lead field service guy says "but that relay only switches on zero crossings of the A/C line." Sure enough... we thumbed power strips for two days and got it to latch....
The fix? Add a software counter to the powerfail state. After <n> cycles, it pulled the reset line ( which was in I/O space; NEC V25) itself.
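Roughly what that fix amounts to, as a simulation rather than the real V25 code. The function name, the return strings, and the `power_dies_at` parameter are all invented for the sketch.

```python
def powerfail_state(max_cycles, power_dies_at=None):
    """Simulate the shut-down-and-wait state.

    power_dies_at: poll count at which the supply really collapses, or None
                   for a glitch where power never actually goes away.
    Returns how the state was left.
    """
    cycles = 0
    while True:
        cycles += 1
        if power_dies_at is not None and cycles >= power_dies_at:
            return "power lost"   # genuine outage: the CPU simply stops
        if cycles >= max_cycles:
            return "self reset"   # stand-in for pulling the reset line


print(powerfail_state(1000, power_dies_at=50))
print(powerfail_state(1000))
```

Without the counter, the second case (a glitch that trips powerfail but never drops the rails) would loop forever, which is the latch field service was seeing.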
Circa 1984: I was working at Callan Data Systems, a small 68k workstation maker in the greater Los Angeles region (just outside Thousand Oaks, for those familiar with the region).
We had been using Unix ports from UniSoft, but for the new 68010 and 68020 based systems we were developing we were doing our own port from the AT&T sources. I don't recall if our base was System III or if it was System V Release 1 (I'm sure it was earlier than SVR2, for reasons that will become apparent).
This version of Unix did not support demand paged virtual memory. It was a classic swapping system. I was rewriting the process and memory subsystems to add demand paged virtual memory (that's why I know we were starting from something earlier than SVR2, because SVR2 was when AT&T added demand paged virtual memory support).
It was running quite well, except I had this one annoying bug where occasionally when a signal was delivered to a process that had a signal handler installed for that process, the process would get some kind of error, like an illegal instruction trap. For instance, hitting control-C in the shell might hit the bug. There was no sign of memory corruption, and no illegal instructions where it would claim it had been executing.
I spent some long, late evenings with the in-circuit emulator and the logic analyzer, trying to figure out what the hell was going on. Eventually, I was able to determine that it only happened if the signal was delivered while the system was trying to return from handling a page fault for the process that was receiving the signal.
On the original 68000, virtual memory was not supported. When a bus error was generated by an invalid memory access, the exception stack frame that contained information about the error did not contain enough information to restart or resume the failed instruction. You had no choice really except to kill the process. Hence, almost all 68000 systems were pure swapping systems. (It was possible to do on-demand stack space allocation even on the 68000, through a bit of a kludge).
The 68010 added support for virtual memory. The way it did this was to make bus error push a special extended exception stack frame. This extended frame contained internal processor state information. When you did the return from exception, the processor recognized that the exception had the extended frame, and restored that state information. (This is called instruction continuation, because the processor continues after the interrupt by resuming processing in the middle of the interrupted instruction. The other major approach, which is what I believe most Intel processors use, is called instruction restart. With that approach, the processor does not save any special internal state information. If it was 90% of the way through a complicated instruction when the fault occurred, it will redo the entire instruction when resuming. Instruction continuation raises some annoying problems on multiprocessor systems).
The way signals are delivered is that whenever the kernel is about to return to user mode, it does a check to see if the user process has any signals queued. If it does, and they are trappable and the process has a signal handler, the kernel fiddles with the stack, making it so that (1) when it does the return to user mode it will return to the start of the signal handler instead of to where it would have otherwise returned, and (2) there is a made-up stack frame on the stack after that so that when the signal handler returns, it returns to the right place in the program.
This was fine if the kernel was entered via a mechanism that generated a normal interrupt stack frame, such as a system call or a timer interrupt or a device interrupt. When the kernel was entered via a bus error due to a page fault, then the stack frame was that special extended frame, with all the internal processor state in it. When we fiddled with that to make it return to the signal handler, the result was the processor tried to resume the first instruction of the signal handler, but the internal state was for a different interrupted instruction, and if these did not match bad things happened.
The fix? I put a check in the signal delivery code to check for the extended frame. If one was present, I turned on the debug flag and returned from the page fault without trying to deliver the signal. The instruction that incurred the page fault would then resume and complete, the processor would see that the debug flag was on, and would generate a debug interrupt. That gave control back to the kernel, where I could then turn the debug flag off, and during the return from the debug interrupt do the stack manipulation to deliver the signal.
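Here is the control flow of that fix as a small model. It is a sketch only: the class, function names, and log strings are made up, and the real thing was kernel code manipulating 68010 stack frames.

```python
from dataclasses import dataclass, field

NORMAL, EXTENDED = "normal", "extended"  # exception stack frame kinds


@dataclass
class Process:
    pending_signal: bool = False
    debug_flag: bool = False
    log: list = field(default_factory=list)


def return_to_user(proc, frame_kind):
    """Pending-signal check run on the way back to user mode."""
    if not proc.pending_signal:
        return
    if frame_kind == EXTENDED:
        # The frame holds mid-instruction state: leave it alone and ask for
        # a debug interrupt once the faulting instruction has completed.
        proc.debug_flag = True
        proc.log.append("deferred")
        return
    # Ordinary frame: safe to redirect the return into the signal handler.
    proc.pending_signal = False
    proc.log.append("delivered")


def debug_interrupt(proc):
    """Entered after the resumed instruction finishes with the flag set."""
    proc.debug_flag = False
    return_to_user(proc, NORMAL)


p = Process(pending_signal=True)
return_to_user(p, EXTENDED)  # returning from a page fault
debug_interrupt(p)           # one instruction later
print(p.log)
```

The point is that the stack fiddling only ever happens on a normal frame, so the saved internal state always matches the instruction being resumed.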
(continued in reply)
We just could not figure out why the heck it was doing this. We never did solve this. AT&T came to their senses and realized no one wanted a 286 port of SVR3 and dropped that part of the project, and I got moved to the 386 port, where I added an interactive debugger to the kernel, and hacked up the device driver subsystem to allow dynamic loading of drivers at runtime instead of requiring them to be linked in at kernel boot time. (The kernel had grown too big for the real mode boot code, and no one wanted to deal with writing a new boot loader! Eventually, someone bit the bullet and wrote a new, protected mode, boot loader and so we didn't need my cool dynamic device loading system).
Another part of the project with AT&T was providing a way for 386 Unix to run binaries from 286 Unix (probably System III, but I don't recall for sure). Carl Hensler, the senior Unix guru at ISC, and I did that project. (Carl, after ISC was sold to Kodak and then to Sun, ended up at Sun where he became a Distinguished Engineer on Solaris. He now spends much of his time helping his mechanic maintain his Piper Comanche, which he flies around to visit craft breweries). The 286 used a segmented memory model. So did the 386, but since segments could be 4 GB, 386 processes only used 3 segments (one code, one data, and one stack) which all actually pointed to the same 4 GB space. Fortunately, the segment numbers used for those 3 segments did not overlap the segments used in the 286 Unix process model, so we did not have to do any gross memory hacks to deal with 286 memory layout on the 386. We were able to do most of the 286 support via a user mode program, called 286emul. We modified the kernel to recognize attempts to exec a 286 binary, and to change that to an exec of 286emul, adding the path to the 286 program to the arguments. 286emul would then allocate memory (ordinary user process memory) and load the 286 process into it. We added a system call to the kernel that allowed a user mode process to ask the kernel to map segments for it. 286emul used that to set up the segments appropriately.
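The exec redirection is simple to sketch. Everything specific here is an assumption for illustration: the install path, the argument layout, and especially `looks_like_286`, which stands in for whatever header check the kernel really did.

```python
EMULATOR = "/bin/286emul"  # hypothetical install path


def looks_like_286(path):
    # Stand-in for inspecting the executable's header; a real check would
    # look at the file contents, not its name.
    return path.endswith(".286")


def resolve_exec(path, argv):
    """Return the (program, argv) that would really be started."""
    if looks_like_286(path):
        # Run the emulator instead, telling it which 286 program to load.
        return EMULATOR, [EMULATOR, path] + list(argv[1:])
    return path, list(argv)


print(resolve_exec("/usr/bin/old.286", ["old.286", "-v"]))
print(resolve_exec("/bin/ls", ["ls"]))
```

This is much the same trick as running a script through its interpreter: the caller asks for one program and another is started on its behalf.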
Another lucky break was that 286 Unix and 386 Unix used a different system call mechanism. The 286emul process was able to trap system call attempts from the 286 code and handle them itself.
Later, AT&T and Microsoft made some kind of deal, and as part of that they wanted something like 286emul, but for Xenix binaries instead of Unix binaries, and ISC got a contract to do that work. This was done by me and Darryl Richman. It was mostly similar to 286emul, as far as dealing with the kernel. Xenix was farther from 386 Unix than 286 Unix was, so we had quite a bit more work in the 286 Xenix emulator process to deal with system calls, but there was nothing too bad.
There was one crisis during development. Microsoft said that there was an issue that needed to be decided and that it could not be handled by email or by a conference call. We had to have a face to face meeting, and we had to send the whole team. So, Darryl and I had to fly to Redmond, which was annoying because I do not fly. I believe everyone is allowed, and should have, one stubbornly irrational fear, and I picked flying on aircraft that I am not piloting.
So we get to Microsoft, have a nice lunch, and then we gather with a bunch of Microsoft people to resolve the issue. The issue turned out to be dealing with a difference in signal handling between Xenix and Unix. To make this work, the kernel would have to know that a signal was for a Xenix process and treat it slightly differently. So...we needed some way for a process to tell the kernel "use Xenix signal handling for me". Microsoft wanted to know if we wanted this to be done as a new flag on an existing ioctl, or if we wanted to add a new "set signal mode" system call. We told them a flag was fine, and they said that was it, and we could go. WTF...this could not have been done by email or over the phone?
But wait...it gets even more annoying. After we got back, and finished the project, Microsoft was very happy with it. They praised it, and sent Darryl copies of all Microsoft's consumer software products as thanks for a job well done. They sent me nothing.
On the 286emul project, Carl was the lead engineer, and the most experienced Unix guy in the company. If AT&T had decided to give out presents for 286emul, I would have fully understood if they gave them only to Carl. On the Xenix emulator, on the other hand, neither Darryl nor myself was lead engineer, and we had about the same overall experience level (I was the more experienced kernel guy, whereas he was a compiler guru, and I had been on the 286emul project that served as the starting point for the Xenix emulator).
All I can come up with for this apparent snub is that in 1982, when I was a senior at Caltech, Microsoft came to recruit on campus. I wasn't very enthusiastic at the interview (I had already decided I did not want to move from Southern California at that time), and I got some kind of brainteaser question they asked wrong (and when they tried to tell me I was wrong, I disagreed). I don't remember the problem for sure, but I think it might have been the Monty Hall problem. Maybe they recognized me at the face to face meeting as the idiot who couldn't solve their brainteaser in 1982, and so assumed Darryl had done all the work.
Three years later, Microsoft recruited Darryl away from ISC, so evidently they really liked him. (As with Carl, you cannot tell the Darryl story without beer playing a role. After Microsoft, Darryl ran his microbrewery for a while, and wrote a book on beer. I don't know why, but a lot of my old friends from school, and my old coworkers from my prior jobs, brew beer as either a hobby or as a side business, or are seriously into drinking craft beers. I do not drink beer at all, so it seems kind of odd that I apparently tend to befriend people with unusual propensities toward beer).
 I've heard of one hack that supposedly was actually used by a couple of vendors to do demand paged virtual memory on 68000. They put two 68000s in their system. They were running in parallel, running the same code and processing the same data, except one was running one instruction behind. If the first got a bus error on an attempted memory access, the second was halted, and the bus error handler on the first could examine the second to figure out the necessary state information to restart the failed instruction (after dealing with the page fault). This is one hell of a kludge. (Some versions of the tale say that after the first fixed the page fault, the processors swapped roles. The one that had been behind resumed as the lead processor, and the one that had been in the lead became the follower. I'm not really much of a hardware guy, but I think the first approach, where one processor is always the lead and the other is always the follower, would be easier).
 There was not enough information on the 68000 to figure out how to restart after a bus error in the general case, but you could in special cases. Compilers would insert a special "stack probe" at the start of functions. This would attempt to access a location on the stack deep enough to span all the space the function needed for local variables, struct returns, and function calls. The kernel knew about these stack probes, and so when it saw a bus error for an address in the stack segment but below the current stack, it could look around the return address to see if there was a stack probe instruction, and it could figure out a safe place to resume after expanding the stack.
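The decision the bus error handler had to make looks something like this. It is a sketch with invented names and addresses; it skips page alignment and the actual instruction decoding, which is summarized by the `is_stack_probe` argument.

```python
def handle_bus_error(fault_addr, stack_limit, stack_floor, is_stack_probe):
    """Decide what to do with a bus error; returns (action, new_stack_limit).

    stack_limit:    lowest address currently mapped for the stack (grows down).
    stack_floor:    lowest address the stack is ever allowed to reach.
    is_stack_probe: True if the faulting instruction was a compiler probe.
    """
    below_stack = stack_floor <= fault_addr < stack_limit
    if below_stack and is_stack_probe:
        # Known-restartable case: extend the stack down to the probed
        # address and resume after the probe.
        return "grow", fault_addr
    # Anything else cannot be safely restarted on a 68000.
    return "kill", stack_limit


print(handle_bus_error(0x7000, 0x8000, 0x4000, True))
print(handle_bus_error(0x7000, 0x8000, 0x4000, False))
```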
 The extended exception frame contains internal processor state information. Different steppings of the same processor model might have different internal state information. After you deal with a page fault for a process, you'll have to resume that process on a processor that has a compatible extended exception frame.
We dealt with the 286 when I was at Mark Williams. They had done the compiler that Intel was using and reselling at the time. When the 286 came out, they were concerned about performance, what with the goofy segment registers and all the various memory models (compact, small, medium, large). They wanted us to guarantee that the performance of the compiled code would be equal to or better than the 8086. Naturally we resisted.
So you must know Ron Lachman.
Oh, also at Mark Williams, the year before I started with them they demonstrated Coherent (v7 unix-alike) on a vanilla IBM PC without any memory protection hardware. Later also done on the Atari ST.
I used to work with this guy -> http://www.multicians.org/thvv/
He was really fantastic to work with, a great project manager, great stories, and a good example of how to stay relevant in the tech industry for 40 years.
The real point of the assignment was to write a self-propagating virus. I had teamed up with a friend who happens to be a fantastic programmer. He promised to cover the virus portion which freed me to go for the bonus marks.
This professor does bonus marks with the democratic method. At the beginning of the classes he announces the criteria then at some point the class votes. In our case the goal was "the most annoying virus".
As it happens my rootkit won us the bonus marks by a healthy margin. Something I wasn't prepared for since the class learned a quick way to disable the rootkit. What they would do is delete the kernel module loader before running the virus. When I heard they would do this I was disheartened. Here was a method I had not thought of and was sure to make my work worthless.
As it happens they did this because the rootkit was evil. More evil than I intended.
In fact in theory the rootkit was benign. The rootkit hooked the open system call and counted opened files from the /bin, /usr/bin, and /sbin folders. A bloom filter hidden in the kernel's task struct's dtrace fields prevented double counting the same file.
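For anyone who hasn't met a bloom filter, this is the de-duplicated counting idea on its own, in plain user-space Python. It is a sketch, not the course code: the filter size, hash count, and use of SHA-256 are arbitrary choices for the example, and nothing here touches the kernel.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, item):
        # Derive several bit positions per item from salted hashes.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, item):
        """Insert item; return True if it was possibly already present."""
        seen = True
        for pos in self._positions(item):
            if not (self.field >> pos) & 1:
                seen = False
                self.field |= 1 << pos
        return seen


WATCHED = ("/bin/", "/usr/bin/", "/sbin/")
seen, count = BloomFilter(), 0
for path in ["/bin/ls", "/bin/cat", "/bin/ls", "/etc/passwd", "/usr/bin/env"]:
    # Count a watched file only the first time the filter sees it.
    if path.startswith(WATCHED) and not seen.add(path):
        count += 1
print(count)
```

A bloom filter can report a false "already seen" but never a false "new", so the count can only err on the low side. The appeal is that the whole set fits in a fixed handful of bits.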
Then another hook, write, performed the attack. When a task reached the "too many files opened" threshold the rootkit caused any writes to return instantly. The goal was to identify anti-virus programs when they were disinfecting files and sabotage their disinfectant.
In theory this was being nice to the other students since only the strongest students attempted disinfecting files. In practice it was much worse.
I did not realize it at the time but I bet you can guess what happened. For the web developers: stdout, the way to give output back to console, is really a file and output is sent with a write system call. A virus scanner would find and report viruses then hit the threshold and get silenced. Students would find the /bin viruses just fine but soon notice the viruses in /usr/bin were getting missed!
Of course they searched their code for an explanation how that folder was special and found none. Their programs just stopped working for no logical reason. Pure evil. This I think is how we won the bonus.
As an extra feature the rootkit would kernel panic your computer if you dared to unload it. It did this by re-hooking the system call table in the rootkit's unload handler. Once unloaded any write or open system call would segfault the kernel and the game was up. Nothing like a real rootkit's anti-anti-virus arsenal but it worked.
I did write up a post about the rootkit and the course if anyone wants more details: http://danieru.com/2013/12/28/cpsc-527-or-how-i-learned-to-s...
Most bootloaders (well, a BIOS usually refers to one step before the bootloader, but still) have a pretty primitive command shell, through which you issue the commands telling it how to load the initial kernel (e.g. from storage, or over the network). My guess would be she had to add a line to the boot script that zeroed out the relevant RAM; that, or rewrite the bootloader and add a loop in machine code to zero out the memory.