A perfect storm of several things killed Coherent. The two biggest were:
The customers dinging Mark Williams Company in the newsgroup mainly
complained about the lack of TCP/IP networking. This happened because
MWC had done a customer poll to see which big feature should come next:
TCP/IP or X11. X11 won.
The real or perceived drop in quality of the product. This one is hard
to explain. Coherent 3.10 and 4.0 had been solid V7 Unix clones with
V7 sensibilities. When 4.2.05 shipped, it included a really nasty disk
driver bug that basically destroyed your file system beyond the
ability of fsck to fix. The bug was triggered when your drive went
into a very common thermal recalibration mode. This mode was rare or
hadn't existed during the days of MFM/RLL/ESDI drives, but became
common with ATA drives, especially as the market got flooded with cheap
504MB drives. While the bug was fixed somewhere between 4.2.10 and
4.2.14, the damage to Coherent's reputation was done.
As the person responsible (alas), my specific recollection of this particular bug was that the root cause wasn't thermal recalibration, but rather UDMA signalling errors.
Prior versions of Coherent using PIO mode had excruciatingly slow access, and when adding support for UDMA I also added support for the disk driver to recognise sequential access and issue multisector transfer requests; this boosted performance fairly massively, something like 3-4 times for some common things, and it was run for a fairly long time in-house and by beta testers with no trouble before it shipped.
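To make "recognise sequential access and issue multisector transfer requests" concrete, here is a minimal C sketch of merging queued requests into one multisector command. It is not Coherent's driver code: `struct req`, `merge_run`, and the `MAX_MULTI` limit are hypothetical names chosen for illustration, and a real driver would derive its per-command limit from the drive and controller rather than hard-coding it.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SIZE 512
#define MAX_MULTI 16  /* hypothetical per-command sector limit */

/* Hypothetical pending-request record; not Coherent's actual structure. */
struct req {
    unsigned long lba;   /* first sector on disk */
    unsigned count;      /* number of sectors */
    char *buf;           /* kernel buffer for this request */
};

/* Return how many leading requests in q[0..n) form one sequential run
 * that could be issued as a single multisector command; the run's total
 * sector count is stored in *total. */
size_t merge_run(const struct req *q, size_t n, unsigned *total)
{
    size_t i;
    unsigned sectors;

    assert(total != NULL);
    if (n == 0) {
        *total = 0;
        return 0;
    }
    sectors = q[0].count;
    for (i = 1; i < n; i++) {
        /* The next request must start exactly where the previous one ended. */
        if (q[i].lba != q[i - 1].lba + q[i - 1].count)
            break;
        /* Don't exceed what one command may transfer. */
        if (sectors + q[i].count > MAX_MULTI)
            break;
        sectors += q[i].count;
    }
    *total = sectors;
    return i;
}
```

With requests at sectors 100 (4 sectors), 104 (2 sectors) and 200 (1 sector), the first two merge into one 6-sector command and the third starts a new one. Note that the merged requests keep their own buffers, which is what makes error recovery in the middle of a run fiddly.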
The problem, though, was a small (literally one-line) arithmetic error when the drive end of things reported that a UDMA transfer error had occurred in the middle of a multisector operation; the error-handling code that set up a retry of the operation didn't compute the start kernel address correctly when a whole bunch of transfers had been merged (and some subset had worked).
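The retry arithmetic can be illustrated with a second hypothetical sketch that reuses the same made-up `struct req`. When a merged run fails partway through, the driver has to work out which request holds the first unfinished sector and resume at an offset into *that* request's buffer, because merged requests usually point at unrelated kernel buffers. The comment above only says the start address was computed incorrectly; offsetting from the first request's buffer, as noted in the code comment, is one plausible way to get it wrong, not a claim about what the actual line was. `resume_point` and `struct resume` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SIZE 512

/* Hypothetical pending-request record; not Coherent's actual structure. */
struct req {
    unsigned long lba;   /* first sector on disk */
    unsigned count;      /* number of sectors */
    char *buf;           /* kernel buffer for this request */
};

/* Where to restart after `done` sectors of a merged run completed. */
struct resume {
    size_t index;        /* request containing the first unfinished sector */
    unsigned long lba;   /* disk sector to restart at */
    char *addr;          /* kernel address to restart at */
    unsigned left;       /* sectors still owed to that request */
};

struct resume resume_point(const struct req *run, size_t n, unsigned done)
{
    struct resume r;
    size_t i = 0;

    /* Skip the requests that completed in full. */
    while (i < n && done >= run[i].count) {
        done -= run[i].count;
        i++;
    }
    assert(i < n);  /* a fully completed run has nothing to retry */

    r.index = i;
    r.lba = run[i].lba + done;
    /* Offset into this request's own buffer. Computing
     * run[0].buf + completed * SECTOR_SIZE instead would only be right
     * if every merged buffer happened to be contiguous in memory. */
    r.addr = run[i].buf + (size_t)done * SECTOR_SIZE;
    r.left = run[i].count - done;
    return r;
}
```

For a run of a 4-sector and a 2-sector request where 5 sectors completed, `resume_point` lands in the second request, at sector 105 and one sector into that request's buffer, with 1 sector left to transfer.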
The primary problem with the UDMA modes was sensitivity to correct cable termination - see https://en.wikipedia.org/wiki/Parallel_ATA#Cable_select for some of that; basically, signal reflections from parallel ATA cable runs that didn't have terminating resistors made things electrically marginal and some systems would have really excessive numbers of UDMA CRC faults as a consequence, and given sufficiently high error rates and really bad timing that could end up polluting the buffer cache with stuff that was skewed by a sector :-(
The big thing (on top of not having any in-house hardware that triggered this specific bug) was the sheer volume of work required for those releases, since getting from what was basically a fairly vanilla Seventh Edition UNIX to where it needed to be to start running large pieces of third-party code expecting POSIX was a big lift. Since there weren't many people, everyone was having to wear lots of hats; for instance, aside from kernel work I did a huge amount of work for POSIX.1 and .2 compatibility, and on top of doing the underlying code changes (which ranged all over the system, particularly for the things we found Autotools scripts relying on) all of those needed documenting, too.
[ Fred Butzen did amazing work writing the actual manpage text and making it really easy to understand - he justly deserved the credit for the quality of the manual in terms of its readability. But the scale of the changes needed to bring so many parts and pieces from V7 to POSIX meant lots and lots and lots of work trying to iterate over docs for technical accuracy at the same time as having to redesign all the affected parts and pieces. It was, in a word, exhausting. ]
PC hardware was all over the map in those days. I don't remember this bug being tied to cabling, but I only worked it to the point where we recognized that the cause was mishandling an error in multi-sector transfers. I do remember putting scatter/gather handling into the SCSI driver so that SCSI drives could do the same multi-sector trick. I also dimly remember that Louis Gilberto had to patch my driver for a bug afterwards, and Hal said that he didn't have kind words for me.
Regarding the driver bug, I guess I was lucky, because I still used an MR-535 MFM disk in the late 1990s. I think I later upgraded to a 100MB IDE disk, but was still lucky.
Personally, I was not even happy about the move from V7 to POSIX, because I enjoyed the simplicity of V7 very much, but things started to change and supporting POSIX was certainly a necessity at that time.
Anyway, thanks for contributing to a great OS! I still keep a VM with 3.2 around and use it regularly.
Well, it really wasn't luck as much as keeping with the electrical specs. Do that, you'd never see a problem, and the later IDE "cable select" schemes did really help to mitigate a lot of the damage from improperly terminated cables.
> Anyway, thanks for contributing to a great OS!
Well, other than living in infamy due to introducing that bug, I didn't start there until the push to turn 4.0 from a tech demo into a real product. So really all the credit for 3.2 and earlier, which set the foundation for Coherent, belongs to the other guys, many of whom were long gone by the time I got there, like Dave Conroy (who wrote the MicroEMACS I loved to use) and Randall Howard (who went on to found MKS). There were some great people from the earlier days still there, though, like Norm Bartek, Hal Snyder and La Monte Yarroll, who were there when I joined, and of course Steve Ness, who was the sole man behind the MWC C Compiler (much as Fred was responsible for that remarkable manual).
Also worth a mention, among all the other notable characters I remember fondly is one of the support/QA folks at MWC: Jim Leonard aka Trixter, who became a notable demoscene figure - https://trixter.oldskool.org/2015/04/07/8088-mph-we-break-al...
What a small world!
And on the 8086 and on the Z8000.
Did not know about the disk driver bug! I guess I was lucky, because I used small MFM disks even when IDE disks became ubiquitous. Anyway, I wonder why I never heard about it. I read comp.os.coherent every day.