
While Linux would have certainly killed Coherent eventually, that's not quite the case. First, Coherent was out long before Linux. Coh was around long enough for AT&T to have sent Dennis Ritchie to Chicago to inspect the code and evaluate it for copyright claims. Coherent ran on the PDP11 and on the 80286. Linux became a real force in the Unix market around 1998. MWC went out of business in Feb 1995. The first round of layoffs at MWC happened in Oct, 1994.

A perfect storm of several things killed Coherent. The two biggest problems were:

The customers dinging Mark Williams Company in the newsgroup mainly complained about the lack of TCP/IP networking. This happened because MWC had done a customer poll to see which big feature should come next: TCP/IP or X11. X11 won.

The real or perceived drop in quality of the product. This one is hard to explain. Coherent 3.10 and 4.0 had been solid V7 Unix clones with V7 sensibilities. When 4.2.05 shipped it included a really nasty disk driver bug that basically destroyed your file system beyond the ability of fsck to fix. The bug was triggered when your drive went into a very common thermal recalibration mode. This mode was rare or hadn't existed during the days of MFM/RLL/ESDI drives but became common with ATA drives, especially as the market got flooded with cheap 504MB drives. While the bug was fixed somewhere between 4.2.10 and 4.2.14, the damage to Coherent's reputation was done.




> The bug was triggered when your drive went into a very common thermal recalibration mode

As the person responsible (alas), my specific recollection of this particular bug was that the root cause wasn't thermal recalibration, but rather UDMA signalling errors.

Prior versions of Coherent using PIO mode had excruciatingly slow access, and when adding support for UDMA I also added support for the disk driver to recognise sequential access and issue multisector transfer requests; this boosted performance fairly massively, something like 3-4 times for some common things, and it was run for a fairly long time in-house and by beta testers with no trouble before it shipped.
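To make the "recognise sequential access and issue multisector transfer requests" idea concrete, here is a minimal sketch in C of how a driver might coalesce adjacent requests. This is not the Coherent driver source. The names `blk_req`, `try_merge`, and `MAX_MERGED_SECTORS` are hypothetical, and the cap of 256 sectors is an assumption chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound on sectors per merged transfer (illustrative). */
#define MAX_MERGED_SECTORS 256u

/* Hypothetical block request: a run of consecutive sectors. */
struct blk_req {
    uint32_t lba;    /* first sector of the run */
    uint32_t count;  /* number of sectors in the run */
};

/*
 * Try to append `next` to `cur`. Succeeds only when `next` starts exactly
 * where `cur` ends (sequential access) and the merged run stays within the
 * cap. Returns true if `cur` was extended, false if it was left unchanged.
 */
bool try_merge(struct blk_req *cur, const struct blk_req *next)
{
    if (next->lba != cur->lba + cur->count)
        return false;                       /* not sequential */
    if (cur->count + next->count > MAX_MERGED_SECTORS)
        return false;                       /* merged run would be too long */
    cur->count += next->count;
    return true;
}
```

A request for sectors 10-11 followed by one for sectors 12-15 would merge into a single six-sector transfer, while a later request starting at sector 20 would not.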

The problem though, was a small - literally one line - arithmetic error when the drive end of things reported a UDMA transfer error had occurred in the middle of a multisector operation; the error-handling code that set up a retry of the operation didn't compute the start kernel address correctly when a whole bunch of transfers had been merged (and some subset had worked).
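As an illustration of the arithmetic involved, here is a minimal sketch in C of setting up a retry after part of a merged transfer has succeeded. It is not the original driver code and does not reproduce the actual faulty line. The names `xfer_req` and `retry_after_error`, the 512-byte sector size, and the assumption that the merged transfer targets one contiguous kernel buffer are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* assumed sector size */

/* Hypothetical merged request covering several sequential sectors. */
struct xfer_req {
    uint32_t lba;    /* first sector on disk */
    uint32_t count;  /* sectors still to transfer */
    uint8_t *buf;    /* kernel address for the first sector's data */
};

/*
 * Build the retry request after `done` sectors of `req` completed and the
 * next one failed. The disk start, the remaining count, and the kernel
 * buffer address must all advance by the same number of sectors.
 */
struct xfer_req retry_after_error(struct xfer_req req, uint32_t done)
{
    struct xfer_req retry;

    assert(done <= req.count);  /* cannot complete more than was requested */
    retry.lba   = req.lba + done;
    retry.count = req.count - done;
    retry.buf   = req.buf + (size_t)done * SECTOR_SIZE;
    return retry;
}
```

If the buffer address advanced by a different amount than the disk position, every retried sector would land at the wrong offset in memory, which is consistent with cached data ending up skewed relative to the disk.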

The primary problem with the UDMA modes was sensitivity to correct cable termination - see https://en.wikipedia.org/wiki/Parallel_ATA#Cable_select for some of that; basically, signal reflections from parallel ATA cable runs that didn't have terminating resistors made things electrically marginal and some systems would have really excessive numbers of UDMA CRC faults as a consequence, and given sufficiently high error rates and really bad timing that could end up polluting the buffer cache with stuff that was skewed by a sector :-(

The big thing (on top of not having any in-house hardware that triggered this specific bug) was the sheer volume of work required for those releases, since getting from what was basically a fairly vanilla Seventh Edition UNIX to where it needed to be to start running large pieces of third-party code expecting POSIX was a big lift. Since there weren't many people, everyone was having to wear lots of hats; for instance, aside from kernel work I did a huge amount of work for POSIX.1 and .2 compatibility and on top of doing the underlying code changes (which ranged all over the system, particularly for some of the stuff we ran into Autotools scripts relying on) all of those needed documenting, too.

[ Fred Butzen did amazing work writing the actual manpage text and making it really easy to understand - he justly deserved the credit for the quality of the manual in terms of its readability. But the scale of the changes needed to bring so many parts and pieces from V7 to POSIX meant lots and lots and lots of work trying to iterate over docs for technical accuracy at the same time as having to redesign all the affected parts and pieces. It was, in a word, exhausting. ]


I don't remember the exact cause per se beyond Bob saying that some drives did a thermal recalibration and if this occurred during a multi-sector transfer, the filesystem got shot to hell. Everyone at MWC had taken a crack at it and you're right, it didn't show up in any of the MWC equipment. At some point in time I think I bought a brand new WD 504MB disk for my home box and I discovered that I could replicate the conditions of the bug regularly. I do remember having a eureka moment on a Saturday morning when I realized that the issue was error handling after a multi-sector transfer gone bad. However, that was one of the last things I did on the MWC payroll. If I remember right, the first round of layoffs was in October of 1994 and I was in them along with Ed Bravo and a few others. Ed and I ran down to a pub that Addison Snell and Jeff Day had shown me a few weeks earlier and threw some darts.

PC hardware was all over the map in those days. I didn't remember this bug being tied to cabling but I only worked it to the point where we recognized that the cause was not handling an error in multi-sector transfers correctly. I do remember putting Scatter/Gather handling into the SCSI driver so that SCSI drives could do the same multi-sector trick. I also dimly remember that Louis Gilberto had to patch my driver for a bug afterwards and Hal said that he didn't have kind words for me.


Thank you for sharing the story! Even these days I enjoy hearing about the company that produced the first Unix system that I used at home (Coherent 3.0). It was my main operating system for more than 10 years and, looking back, I have never enjoyed computing more than in those days.

Regarding the driver bug, I guess I was lucky, because I still used a MR-535 MFM disk in the late 1990's. I think I later upgraded to a 100MB IDE disk, but was still lucky.

Personally I was not even happy about the move from V7 to POSIX, because I enjoyed the simplicity of V7 very much, but things started to change and supporting POSIX was certainly a necessity at that time.

Anyway, thanks for contributing to a great OS! I still keep a VM with 3.2 around and use it regularly.


> I think I later upgraded to a 100MB IDE disk, but was still lucky.

Well, it really wasn't luck as much as keeping with the electrical specs. Do that, you'd never see a problem, and the later IDE "cable select" schemes did really help to mitigate a lot of the damage from improperly terminated cables.

> Anyway, thanks for contributing to a great OS!

Well, other than living in infamy due to introducing that bug, I didn't start there until the push to turn 4.0 from a tech demo into a real product. So really all the credit for 3.2 and earlier, which set the foundation for Coherent, belongs to the other guys, many of whom were long gone by the time I got there, like Dave Conroy (who wrote the MicroEMACS I loved to use) and Randall Howard (who went on to found MKS). There were some great people from the earlier days still there though, like Norm Bartek, Hal Snyder and La Monte Yarroll, who were there when I joined, and of course Steve Ness, who was the sole man behind the MWC C Compiler (much as Fred was responsible for that remarkable manual).

Also worth a mention, among all the other notable characters I remember fondly is one of the support/QA folks at MWC: Jim Leonard aka Trixter, who became a notable demoscene figure - https://trixter.oldskool.org/2015/04/07/8088-mph-we-break-al...


You were then responsible for my first paid "unix" contract: data recovery on some pooched Coherent disks for a Port of Seattle contractor.

What a small world!


> Coherent ran on the PDP11 and on the 80286.

And on the 8086 and on the Z8000.

Did not know about the disk driver bug! I guess I was lucky, because I used small MFM disks even when IDE disks became ubiquitous. Anyway, I wonder why I never heard about it. I read comp.os.coherent every day.



