
The Good, the Bad, and the Ugly: The Unix Legacy - orib
http://herpolhode.com/rob/ugly.pdf
======
pavelludiq
"1. What is the best thing about Unix? A: The community. 2. What is the worst
thing about Unix? A: That there are so many communities."

I can't say I agree. Why are there so many BSDs, why are there even more
Linux distros, and why are there so many 'buntus? Well, it's my opinion that
having that many choices is an advantage. I can't stand 90% of the Linux
distributions, but some people might think that my choice (Kubuntu) is a bad
one, and some would say that choosing Linux instead of FreeBSD or OpenBSD is a
bad idea. Choice is good; sometimes more choices are confusing, but if you just
go for the most popular ones and ignore the less popular ones, they're easy to
make. Will we have one unified Unix one day? If it runs GNOME I'll switch
to Vista; no thanks, Ubuntu, KDE is MY choice. It's good to have one.

~~~
silentbicycle
The Unix culture intrinsically values having so many choices, which is a
consequence, I think, of historically being a platform by programmers/for
programmers. Some people are picky about tools, and it _encourages them to
make their own_. It's highly unlikely that it will unify in any real sense as
long as its core design attracts people who bristle at the thought of being
stuck using anything besides KDE, or GNOME, or Fluxbox, or ratpoison, or dwm,
...

I think it's a really good thing that different distributions tend to focus on
particular goals, e.g. security as a design principle in OpenBSD, portability
in NetBSD, being available as a free OS (in various senses) with Debian and
Ubuntu, etc. (I really don't want this to turn into an argument about
different distros; I know I'm trailing off there, I had a harder time
summarizing Debian). Anyway, different foci lead to different discoveries,
bugfixes, etc., but _due to the pervasive Unix culture_ they are shared.

Still, that avalanche of necessary choices will probably keep it thriving in
different niches from a system designed to have a consistent face for
everyone. It's harder to teach people to use, harder to market, etc.

------
gaius
_by making all hardware equivalent, good software enabled bad hardware._

An excellent point.

~~~
tptacek
Is it really? High-end X86 server hardware is quite good.

~~~
orib
Compared to most architectures, X86 sucks pretty badly. Thanks to good
compilers, though, you don't have to care about the lameness of the
instruction set (eg, there are ambiguities in the encoding that need to be
disambiguated by register prefixes, there are lots of subtle and pointless
variations in the instructions, etc.) Good software lets you pretend that the
instruction set you're using isn't actually complete crap.

~~~
tptacek
It's definitely the conventional wisdom that X86 is inferior to MIPS, SPARC,
and PowerPC, but I rarely see that opinion backed up.

There is a lot of legacy crap tacked on to X86 instructions, which X86 CPUs
don't actually execute directly. There's also legacy crap built into other
instruction sets, like register windows in SPARC, predicated instructions in
ARM, and bloated call frames on PowerPC.

Can you be more specific about what you don't like about the X86 architecture?

(A couple years ago, you'd have an easy win with "the bus").

~~~
orib
_Edit: Some people seem to have missed my point; in summary: x86 is ugly (and
below is why I think so) but we don't care because compilers enable us to just
forget about what ISA we're using. This is a good thing. This is the way it
should be. But it's also an example of what the OP was talking about -- bad
hardware design (in this case the x86 ISA; the actual hardware is quite good)
not mattering because software is sufficiently good_

Sure. Some of these I'm not 100% sure of, so I hope I don't make myself look
too stupid (sorry, don't have access to the docs right now) but here goes:

    
    
      - Too few registers
           Mostly fixed in x86_64, but still an issue if you 
           want to support 99% of the machines out there still 
           on 32 bits
    
      - Overlapping register classes
           This makes the register allocator's job much more
           difficult, since not only do registers interfere with
           other registers which hold the same mode of variable,
           they sometimes -- but not always -- interfere with
           other registers of the same class.
    
      - Irregular instructions
           If you use some registers, they're encoded
           in fewer bytes than others, so the optimizer/allocator
           has yet another parameter to take into account.
    
      - Fixed registers for integer multiplication/division
           You're forced to clobber %eax/%edx and one other reg of
           your choice. This causes more spills than it should,
           and makes the job of the register allocator harder.
    
      - Way too many jump variants
           I think I counted 63 variants of the jump instruction,
           and you can encode the same one multiple times, which
           gives you hundreds of ways of encoding a jump.
    
      - Complicated encoding
           First, it's a variable-length encoding. You can have
           one-, two-, or three-byte instruction opcodes, with
           various prefixes and suffixes.
    
      - Ambiguity in some cases
           Instructions can sometimes take the same initial byte
           sequences as a prefix, despite not actually having
           that prefix switched on; when decoding you have to
           figure out the rest of the instruction before you
           decide if you've got a prefix.
    
      - Useless Instructions (minor annoyance)
           x86 has lots (and lots) of instructions that are
           unused by compilers, are slower than simply doing
           the smaller instructions that they are composed of,
           but are still around for compatibility. Sure, it's
           needed, but do I have to like it?
    

Really, x86 implementations these days are quite good, but the instruction set
is not pretty. x86 has more cruft than most other architectures out there.
It's certainly not impossible to write a good compiler for it, it's just a
whole lot harder, especially when it comes to doing register allocation. (x86
isn't even too painful to write by hand!)

~~~
ajross
I think you're overstating things. Generating x86 output from an intermediate
language really isn't that bad. It's not what you would come up with if you
were designing an ISA from scratch, but it's hardly rocket science either.

More to the point: the real determining factor for "CPU uberness" isn't the
ISA at all, it's the process technologies. Modern x86 CPUs aren't exactly
handicapped; in fact they're pretty much the best parts you can buy in almost
all major market areas. Other architectures at this point are increasingly
niche parts: very low power (the ARM family, although Intel is moving into
this market as we speak), or very high parallelism (Sun's Niagara, IBM's
Cell),
etc... If any of the stuff you're complaining about really mattered, it ought
to be a competitive advantage to someone, no? But it's verifiably not.

~~~
orib
I fully agree, but I was asked for why I don't like the x86 architecture, so I
gave an answer.

Current x86 CPUs are pretty awesome when it comes down to it. The ISA is quite
hairy, and that makes writing tools for them quite a bit more painful than
for, e.g., MIPS, but because compilers are good you don't feel the pain of
using x86 anymore, so nobody cares -- or should care -- that they're actually
using x86.

------
iigs
from the pdf:

 _cat . doesn't work anymore (readdir vs. read)._

Does anyone know when this last worked? I'd love to install whatever version
of UNIX had this and see the output from it.

~~~
tptacek
It's binary gibberish. It's also the answer to an old DE Shaw interview
question --- how do you tell whether you're on an NFS-mounted filesystem or
not? ("cat ." will fail on NFS).

~~~
nailer
Interesting. Was there no 'mount' command, or for those of us who like to save
commands, 'df .'?

~~~
tptacek
Not in the terms of the question, no.

------
newt0311
Another excellent point: C. C is absolutely wonderful for writing low-level
code like kernels, device drivers, and language interpreters. For anything
else, it is atrocious. The problem is that these two spheres require
drastically different tool sets. Kernels and the like need to be exposed to
the underlying implementation because they manipulate low-level devices.
Application software, on the other hand, should be completely insulated from
such low-level details of the system.

~~~
ajross
Honestly, that conventional wisdom is a little hackneyed.

I'm not sure most folks would call C "wonderful" for low level code. Its
warts are well known, and really don't need to be there: the structure syntax
is needlessly complicated (. and -> are separate operators for no good
reason); the calling conventions are backwards to support a varargs facility
that almost no one understands; they underspecified the type conventions,
leading to perennial confusion about the sizes of ints and longs or the
signedness of char; etc... If you were starting from scratch to write a
language to live in the same space, you probably would fix most of those.

But conversely, it's not "atrocious" for "application" code either. Plenty of
good software has been and will continue to be written in C. It's probably a
poor choice for code that handles mostly string data. It doesn't have a
whole lot of advantages for an architecture with lots of parallelism (i.e. one
which doesn't need to worry about single-CPU performance issues). Since those
two criteria generally define "web development", you don't see a whole lot of
C used for the kind of problems YC startups work on. Just recognize that those
define a _very_ limited notion of "application", and that lots of people in
the real world are working on "applications" for which C et al. are very
useful, eminently practical language choices.

Really, C is there to abstract the CPU. If the CPU has an instruction to do
something, you'll probably find it as an operator in C. If it doesn't, you
won't. If your problem is appropriately targeted at that level of
abstraction, you'll think C is great. If it's not, you won't. I guess that's
hardly a surprise.

~~~
stcredzero
I've worked on a large client/server property/casualty insurance application
written in C. I've also written programs and middleware in C++, Java, Perl,
and Smalltalk.

C is great for getting the CPU to do something at a somewhat low level, but
still having some ability to abstract. If you want lots of abstraction my
experience is that C++, Java, and Smalltalk are much better choices. Of those,
C++ gives you the most leeway for getting yourself into deep trouble that's
hard to debug. Smalltalk allows you to blow up or lock up the world with one
errant statement, but messing with those parts of the library is rare, and you
can get everything back anyhow because your source code and every change to
the image is kept in something like a transaction log. Java gives you a lot of
the benefits of Smalltalk, but saddles you with a syntax that was designed to
allow low level programming with somewhat high level abstraction, even though
you aren't doing the former and want the latter in spades.

