
Brittle systems - vezzy-fnord
http://blog.dshr.org/2015/06/brittle-systems.html
======
amsha
Security and fault-tolerance are cost centers, so unless they are explicit
features of a deliverable they will be ignored. Planes, cars, and banks all
have anomaly detection and auditing because people demand it. They understand
the risks of failure. Conversely, very few people are demanding remote
loopback facilities for IP. That's not a criticism of fault-tolerance, but if
it's important then we should communicate more effectively. Why is fault-
tolerance important? What goes wrong without it? What catastrophes could have
been averted if we had considered it?

Edit for clarity

------
nickpsecurity
This is a problem that goes way back. Look at Burroughs B5000, System/38,
KeyKOS, VMS clustering, NonStop... all architectures that prevented or easily
recovered from all sorts of problems. The market almost always chose against
them, with only two still marketable. There are currently inexpensive CPUs,
esp. embedded PPC, supporting lock-step along with high-reliability OSes. They
are avoided in most deployments, even for important systems. The old security engineering
techniques of specifying all good/bad states, simplified implementation, non-
bypassable TCBs, covert channel analysis, use of guards, and so on have been
largely ignored in the security industry despite empirical evidence of their
benefit. There's a tiny niche in defense & safety-critical, as the author says,
that still knows some of this stuff. Example [1].

Everything mainstream, proprietary or FOSS, just seems thrown together for a
variety of reasons with few exceptions. Even stuff that needs to be better
doesn't get that way. Further, I'm not even sure most have ever heard of the
approaches that work: can't even get a good start without a good foundation to
build on. I think the only solution will be a killer app needing resilience
that does its thing right, uses all the right techniques, is affordable, is
easily extended, and creates awareness of good engineering practices when
people try to imitate it.

I can't see anything else working. Btw HN readers, regarding Byzantine etc.,
the linked document shows Boeing's Survivable Spread leveraged a trusted
component to reduce the fault-tolerance cost to f+1 replicas for f failures,
minus a few use cases. Do any readers up to date on FT research know of advances
in the past few years for similarly minimal-cost schemes for FT or BFT?

[1]
[http://www.dtic.mil/dtic/tr/fulltext/u2/a425566.pdf](http://www.dtic.mil/dtic/tr/fulltext/u2/a425566.pdf)

~~~
vezzy-fnord
It seems like the only real survivors of that era are IBM and their big iron,
particularly the POWER architecture. Pretty much everything virtualization-
related can be traced back to IBM and the containers that people are swooning
over today can be traced to LPARs in OS/400, though earlier examples can be
found if you stretch the definition.

The Burroughs B5000 blew my mind when I read about it after Alan Kay mentioned
it. To think that it used ALGOL as its machine language (among other things)
_in 1961_ makes you want to weep when you see things like MMX and SSE
instructions being the hot thing of the present.

~~~
listic
In Intel's defense, there _is_ a reason they resort to their approach of
gradual improvements. In 1981, after a couple of successful iterations on
their CPU, they _did_ make a clean break with the past with iAPX 432 [1] - an
entirely new 32-bit CPU designed to be programmed entirely in high-level
languages. Hardly anyone seems to remember it, and for a reason: it failed,
hard. Intel seems to have learned the lesson and hasn't done this since.

[1]
[https://en.wikipedia.org/wiki/Intel_iAPX_432](https://en.wikipedia.org/wiki/Intel_iAPX_432)

~~~
vezzy-fnord
Interesting, though this shouldn't be an indictment of HLAs in general, but
rather of the iAPX 432 in particular. Same way the Mach server was only the tip of the
iceberg in the sphere of microkernels.

~~~
nickpsecurity
True: the LISP machines, Wirth's Lilith w/ M-code processor, ASOS embedded Ada
system, JOP embedded Java processor, and Azul Systems' Vega processors show
HLAs can work just fine. Even better than competing offerings in some ways. :)

------
guard-of-terra
In the future, almost every action a user might take on an information system
ought to be reversible, cleanly and easily, once a higher-priority key is
presented. That will make e.g. account takeover not worth the hassle.

But right now all the systems we use are frighteningly brittle.

~~~
btown
Even if everything was one system, how do you handle side effects? An order I
make on Seamless causes physical inventory to be consumed. That's just the
nature of the system. No way to undo that. So there's still a reason for
account takeover there, if someone else could consume on my dime by hacking my
account temporarily.

