Between 1970 and 2010, if you could design special-purpose hardware that ran 10 times faster than the state of the art, you needed to get it to market in volume in under 3 years.
If you took any longer, the general-purpose CPUs from Intel would by that point be within spitting distance of your superior architecture, at a fraction of the cost.
That's what happened to Symbolics: general-purpose PCs could run their software faster than the dedicated machines they designed.
IMO, what killed them off were caches.
First, because more than half their shtick was that they ran their VM interpreter in microcode in order to avoid the bus contention between the interpreter and the code it was trying to run. When the inner loop of an interpreter can fit into icache, that gain is moot. (This is the same reason those seemingly silly single-instruction memcpy and strcpy operations you see in just about every CISC exist. They weren't that silly at the time: about half your bandwidth would otherwise have been ifetch.)
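To make the icache point concrete: the entire dispatch loop of a bytecode interpreter is tiny. A rough sketch (Python, with made-up opcodes; nothing here is specific to the Symbolics microcode):

    # Toy bytecode interpreter: the fetch/dispatch/execute loop below is a
    # handful of instructions, so on a cached CPU it stays resident in icache
    # and no longer competes with the interpreted program for memory bandwidth.
    def run(code):
        stack, pc = [], 0
        while pc < len(code):
            op, arg = code[pc]
            pc += 1
            if op == 'PUSH':
                stack.append(arg)
            elif op == 'ADD':
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == 'JUMP':
                pc = arg
        return stack

    print(run([('PUSH', 2), ('PUSH', 3), ('ADD', None)]))   # -> [5]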
And, less importantly at the time: in a caching architecture, their naive implementation with lots of pointer chasing is about the worst case from a data-flow perspective.
So the Lisp machines were targeting a fairly small segment of the market that might have exploded into the mainstream but didn't, so they had less income to reinvest and therefore struggled to maintain their lead over the general purpose/mass market hardware vendors.
Symbolics had C, Pascal and Fortran compilers written in Lisp running on their machines. Others developed an Ada system for these machines. The drawback for all of those: they were expensive and not that fast. The interesting part: they allowed interactive development in these languages, and one could run some C software a bit more safely thanks to runtime type checking. Examples of non-Lisp programs running on them: TeX and the usual X11 server.
The explosion did not happen during their lifetime, because the whole package was expensive (RAM and disk were very expensive), it was tied to higher-end hardware with lots of memory, it was never ported to other architectures on the metal (an Alpha machine, itself already expensive, would have been fast enough to run a full Lisp OS, but all one had was an emulator for it; there was not enough manpower to make it run natively), and because the OS was mostly limited to one person, one GUI and one Lisp per machine: it was no real server OS with securely (somehow) separated programs.
I think something like Lisp/Smalltalk/SomethingNewOS machines might have another shot soon, especially with the loss of gains from Moore's law.
Luckily, there’s a way out: have your lisp machine transpile to blub.
It is the mediocrities with delusions of grandeur that do.
Citation needed. This is the exact opposite of my experience in life so far. Anecdotal maybe, but I met quite a few people from >10 different countries.
So given the choice between "mainly LISP" and "LISP, C, Pascal, etc." from larger vendors that were virtually guaranteed to be around in 5 years, it was a rational choice to choose the bog-standard machines.
Symbolics had C, Fortran, Pascal, and Prolog compilers. Scott L. Burson (on HN as ScottBurson) wrote a C compiler that ran on Symbolics and also TI and LMI machines (https://cliki.net/Zeta-C). So there were two C compilers for the Lisp machines.
Some of the reasons given were unique to Symbolics, others more general. LMI went under earlier, and the other two Lisp machine manufacturers (TI and Xerox) stopped manufacturing them but they had other products.
A unikernel like MirageOS is basically an OCaml Machine, except that you don't develop software within it. Someone could create a Lisp on Xen equivalent.
But generally I would agree that the 68020 + MMU would be mostly okay, given that MCL later ran fine on the 68030 and 68040. But the MCL runtime was quite a bit weaker: cooperative multitasking, less CLOS and less capable memory management.
They were expensive and didn't run Unix or MS-DOS.
but they were expensive. like really expensive. not just a few 10s of k to buy, but you basically had to have a really fat support contract to keep them running. and they were really single user, so to have a lab supporting 8 users was well over a million-dollar investment. a deskside sun with terminals or pizza box sun 3s as stations was a lot cheaper. and they were fragile. i'm pretty sure one of the ones we had was wire wrapped.
and they were slow. I may be misremembering, but the flashing status line at the bottom of the monitor would sometimes say the equivalent of 'i am performing a floating point operation now'. we would often reboot them overnight (it took a couple hours) so we could run for a while in the morning before the GC started bogging everything down. the little one they came out with near the end, I think it was the 3620, was largely unusable.
the difference was so marked in the end that if you wanted a lisp and didn't want the ferrari model, you could get a mac or a sun and run lucid or allegro and such for basically 1/10th the cost. and it would be faster.
just an additional historical note, the other vertical they had besides DoD AI was graphics. for an additional 20k (I think) you could get a 24 bit frame buffer with a nice trinitron monitor and they had some really pleasant software to work with.
A mid 80s Symbolics Lisp Machine boots in three minutes into a customized Lisp world.
> the little one they came out with near the end, I think it was the 3620
The 3620 was hardly near the end in 1986. In 1988 a completely new generation with an in-house-designed microprocessor was released, and the last models of those were released in 1992.
Some of the rest of what you say is also not really accurate.
How is this a limitation in the hardware? The operating system could switch between tasks using context switching, right? Or are you suggesting these machines didn't have any mechanism to make sense of time/ticks, or didn't have interrupts?
The MIT/LMI/TI ones didn't take interrupts; the microcode needed to poll the interrupt register at suitable points.
They can be. For example, multi-tape Turing machines can be nice for modelling certain problems (e.g. "monotone Turing machines", with a read-only tape, a write-only tape and a normal read/write tape, appear a lot in algorithmic information theory).
> if made right you could cut the machine in half and get two lisp machines
I get where you're coming from in theory, but that's quite far removed from the common usage of the term "lisp machines" (i.e. those which were sold in the 80s), which weren't particularly different from other computers of the time, except for being optimised (in tandem with the OS) for particular workloads (i.e. Lisp programs).
When we look at what computation is, physically, we find it's a lot like a Turing machine underneath (physical rewriting systems), albeit with different kinds of structures that are being rewritten.
Anything involving trees/terms in general seems to be rather messy in terms of physical computation, because there's no straightforward way to represent a manipulatable tree regardless of the medium.
Part of the reason TM-like computing devices sprang up is that strings, i.e. linear sequences of symbols, are surprisingly easy to represent and manipulate physically. Turing himself appealed to physical intuition in his original paper.
Just my thoughts as a person who's tried to bootstrap his own LC-based computing environment. I did succeed, but I'm not entirely happy with the results.
There are ways of composing Turing Machines, much like composing functions to yield higher abstraction levels in the Lambda Calculus. It comes in the form of building larger instructions out of simpler machines.
I don't think humans think in a LC-like manner, I think we think compositionally: small parts make up big parts, use the parts you make to make bigger parts. Whether those come in the form of a physical machine or an applicative system doesn't matter, what matters is how you string primitives together.
Check this out. This walks you through building up a more usable language out of the rough-and-ready Turing Machine. There are many ways to go about this idea of composition, but this is a good introduction.
Lambda Calculus, Turing Machines, etc. are no different in this regard. One has its steps realized in reduction and substitution rules, the other has its steps realized in movements of the head and manipulations of the tape.
You'll need more to disprove my hypothesis. All software is built compositionally, up from smaller pieces. It's why we commonly refer to everything as a "software stack": we can reason about smaller pieces making up larger ones. If you examine most of the code written today, this very minute, you'll see it's compositional, in the sense that larger codebases are built from components.
Neither LC nor TMs are "natural ways of thought"; they're mechanisms. Turns out that TMs won out not only because they admit an easy physical implementation, but also because they're compositional. I can wire up two TMs and get them to perform operations in sequence, perform conditional branching, etc.
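To illustrate the wiring-up, here's a loose sketch (Python, treating each "machine" as a tape-to-tape function; the machines and names are invented for illustration and gloss over real TM bookkeeping):

    # Each "machine" consumes a tape (here just a list of symbols) and yields
    # a new tape; sequencing and branching are then ordinary composition.
    def write_one(tape):            # machine A: append a 1
        return tape + [1]

    def double(tape):               # machine B: duplicate the tape
        return tape + tape

    def seq(m1, m2):                # run m1, then m2 on its output
        return lambda tape: m2(m1(tape))

    def branch(pred, m1, m2):       # pick a machine based on the tape
        return lambda tape: m1(tape) if pred(tape) else m2(tape)

    grow = seq(write_one, double)
    print(grow([1]))                                               # -> [1, 1, 1, 1]
    print(branch(lambda t: len(t) > 2, write_one, double)([1]))    # -> [1, 1]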
All software is compositional, regardless of the medium, and all computation is purely mechanical, regardless of the model of computation, because unless you want to stare at a lambda expression and infer the value, you need to perform some kind of step-wise reduction. That, after all, was Church's point.
I agree. I think it's like Moravec's paradox: we tend to forget how hard it was to first grasp the concepts of programming, and go on to treat whatever's most familiar as being "obvious", even if it's not.
Still, there are a few specific points I take issue with:
> Lambda Calculus, Turing Machines, etc. are no different in this regard. One has its steps realized in reduction and substitution rules, the other has its steps realized in movements of the head and manipulations of the tape.
There is a very big difference between TMs and LC in this regard: each step of a TM takes a constant amount of resources (time, energy, whatever https://en.wikipedia.org/wiki/Blum_axioms ); whilst a single beta-reduction "step" in LC may require an arbitrary amount of work (depending on how many times a variable occurs, whether we need to rename to avoid name capture, etc.). My (perhaps naive) assumption is that the work required for beta-reduction is only bounded by the busy beaver function.
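To make the "arbitrary amount of work per step" point concrete, here's a rough sketch (Python, with a naive capture-unaware substitution; the term representation is made up): a single beta step on (lambda x. x x x) M copies M three times, so the cost of that one "step" grows with the size of M:

    # Terms: ('var', name) | ('lam', name, body) | ('app', fun, arg)
    def size(t):
        if t[0] == 'var':
            return 1
        if t[0] == 'lam':
            return 1 + size(t[2])
        return 1 + size(t[1]) + size(t[2])

    def subst(t, name, value):
        # Naive substitution: every occurrence of `name` gets a full copy of `value`.
        if t[0] == 'var':
            return value if t[1] == name else t
        if t[0] == 'lam':
            return ('lam', t[1], subst(t[2], name, value))
        return ('app', subst(t[1], name, value), subst(t[2], name, value))

    def beta(redex):
        # One "step": ((lam x. body) arg) -> body[x := arg]
        _, (_, x, body), arg = redex
        return subst(body, x, arg)

    triple = ('lam', 'x', ('app', ('app', ('var', 'x'), ('var', 'x')), ('var', 'x')))
    term = ('var', 'a')
    for _ in range(5):
        term = beta(('app', triple, term))
        print(size(term))    # 5, 17, 53, 161, 485: each single step costs more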
This is probably why most computational complexity research sticks with machine models (operational semantics) like TMs. Still, there are alternative approaches which are very natural for LC-style programming, like "cost semantics" (essentially an alternative set of reduction rules for LC, which evaluate a program into its algorithmic complexity, rather than into its return value).
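A toy version of that cost-semantics idea, as I understand it (Python, with a made-up arithmetic expression language rather than full LC; the real thing mirrors beta-reduction, the toy just shows the shape: same recursion, different "answer"):

    # expr is either a number or ('add', left, right)
    def value(expr):
        if isinstance(expr, int):
            return expr
        _, left, right = expr
        return value(left) + value(right)

    def cost(expr):
        # Same shape of recursion, but the result is how much work evaluation does.
        if isinstance(expr, int):
            return 0
        _, left, right = expr
        return cost(left) + cost(right) + 1   # one unit per addition performed

    e = ('add', ('add', 1, 2), 3)
    print(value(e), cost(e))                  # -> 6 2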
> Turns out that TMs won out not only because they admit an easy physical implementation, but also because they're compositional.
I'd hesitate to call TMs compositional. As I wrote at https://cstheory.stackexchange.com/questions/21705/what-is-t... TMs can't really be re-used. Consider a TM (or a program for a universal TM) for adding two numbers: it's hard to actually use that as part of any other TM/program, since it may clobber the rest of the tape; we need to set up the contents of the tape, the state and the read head just right before invoking it, and clean up afterwards; those steps may require shuttling the rest of the tape contents along to make room; and so on. In contrast, LC lives in an abstract world where subexpressions can expand and contract without bumping into each other, terms can be duplicated and rearranged without having to drag them across the intermediate symbols, etc.
A doctor doesn't get to destructure you, and then construct a new you that is exactly like you except without the bacterial infection.
The runtime data crunching, GC, and optimization techniques aren't substantially different from other modern dynamic (or semi-dynamic) languages, though Lisp tends to have more features (dynamic bindings, complex numbers, in-language macros, etc). Cons cells at runtime are usually used as plain linked lists, as any dynamic list/array would be in other languages.
The basis of Lisp is the Lambda Calculus, and a generalization of that gets you term rewriting. Both the basis and the generalization require a tree (or a graph, in the case of cons cells) to get on with the computation, because that's the "state" that you're working with.
The problem is that while these things are pretty easy to work with in the minds of humans, implementing them physically has you relying on things like the RAM model, which, while pretty far removed from the Turing Machine, still has machine-like qualities.
That's my argument: we can't really build "Lisp Machines"; we can only build "Random Access Machines with Lisp Interfaces", because there's no physically implementable version of a tree or graph memory outside of linking randomly-accessible cells.
If you want to manually optimize such a linked data structure on top of an array of node structs using offsets, or hash tables, or whatever, you can do that in Lisp or any other language as well.
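A rough illustration of that offsets-into-an-array approach (Python, everything here invented for the sketch; in a systems language the payoff would be contiguity and cache locality, which Python itself can't really deliver):

    # Cons cells as parallel preallocated arrays; a "pointer" is just an index.
    CAPACITY = 1_000_000
    CAR = [None] * CAPACITY
    CDR = [None] * CAPACITY
    NIL = -1
    free = 0

    def cons(a, d):
        global free
        i = free
        free += 1
        CAR[i], CDR[i] = a, d
        return i

    def from_list(xs):
        head = NIL
        for x in reversed(xs):
            head = cons(x, head)
        return head

    def total(head):
        # Traversal is index arithmetic over two flat arrays rather than
        # chasing heap pointers scattered by a general-purpose allocator.
        s = 0
        while head != NIL:
            s += CAR[head]
            head = CDR[head]
        return s

    print(total(from_list(range(10))))   # -> 45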
Any form of non-serialized memory access will penalize you in today's memory hierarchies, no matter the language. Care can be taken in any language to have cache-aligned data structures and preprocessing that attempts to linearize the most common access patterns, usually with preallocated arrays. But generally that level of fine-grained micro-optimization isn't needed, and basic data-structure best practices are language-agnostic.
All the terms in lambda calculus are functions; there are no conses, symbols, numbers, strings, nothing. (Immutable) cons cells can be simulated in lambda calculus using functions. That is, functions resembling cons, car and cdr can be expressed such that car(cons(x, y)) = x, and cdr(cons(x, y)) = y.
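A minimal sketch of that simulation (Python lambdas standing in for raw LC terms; the names are just the conventional ones):

    cons = lambda x, y: lambda f: f(x, y)   # a pair is a function awaiting a selector
    car  = lambda p: p(lambda x, y: x)
    cdr  = lambda p: p(lambda x, y: y)

    p = cons(1, cons(2, None))
    print(car(p))         # -> 1
    print(car(cdr(p)))    # -> 2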
Lisp arises from the realization that expressions built out of trees that consist of pairs can be executed using an eval function, in a way that incorporates some concepts from lambda calculus, such as anonymous functions.
McCarthy himself noted this about Lambda Calculus and came up with EVAL. In fact several people have also noticed this. They simply shrugged and delta-ruled all the things.
Not at all. In fact, electronic circuitry is quite similar to "LC style" programming; for example, we can think of both as a sort of "pipeline for data to flow through".
On the other hand, Turing machines, RAM machines, etc. aren't particularly well suited to being implemented electronically. We end up introducing clock signals, addressing schemes, etc. and it all ends up very complicated, slow and power hungry. I think the only reason we put up with it at all is because electronics, especially ICs, are so incredibly small and fast to begin with that these inefficiencies aren't too noticeable at the human scale.
I don't think that's the case at all. We see sequential TM-like processes when we look at mainstream electronic computers, but that's kind of a circular argument. When we look at physical processes in general, the computations they're performing tend to be massively parallel.
Pure computation (e.g. lambda calculus, and especially combinatory logic) is well suited to running on massively parallel circuitry (e.g. https://en.wikipedia.org/wiki/Graph_reduction_machine ).
It's perhaps also worth mentioning that such languages can be interpreted as (descriptions of) circuitry! E.g. see http://conal.net/papers/compiling-to-categories
There is also the assq chip, meant as a co-processor for the Scheme-81 chip design (which doesn't appear to have an AI Memo), as told by the fascinating Phil Agre (later of Red Rock Eater News Service fame):
I'm sorry I missed out on this: