Hacker News new | past | comments | ask | show | jobs | submit login
The Scheme Machine (1994) [pdf] (burgerrg.github.io)
78 points by steven741 7 months ago | hide | past | web | favorite | 55 comments

Quick question: why did we stop producing lisp machines, or any other machine more closely related to Lambda calculus model of computation than a Turing machine? Is it merely cultural, or are there technical reasons as to why we stopped producing them competitively?

The target markets were too small (or even were shrinking) to justify the investments necessary to keep the architectures going. Keep in mind that all in all only around 10k machines were produced over the technology lifetime (75-92). That's not a large number. Lisp machines started with hand-made CPUs and ended with special micro-processors (TI, Symbolics, ...). The jump to Lisp-supporting newer RISC (or similar) architectures did not happen, because the main sponsors (DARPA, Military, etc.) did no longer saw them as interesting - applications could run on workstations and high-end PCs. Next-gen CPUs were in the making in end 80s / early 90s (Symbolics, Xerox, LMI, SPUR...) but they did not reach the market - money was running out. The whole thing had its last incarnation in commercial emulators: Medley (the Interlisp-D from Xerox) for Intel+SPARC and Open Genera (Symbolics) for DEC Alpha.

In short Moore's Law.

Between 1970 and 2010 if you could design special purpose hardware that ran 10 times faster than the state of the art you would need to get it to market in volume in under 3 years.

If you took any longer the general purpose CPUs from Intel would by that point be within spitting distance of your superior architecture, at a fraction of the cost.

That's what happened to Symbolics, general purpose PC's could run their software faster than the dedicated machines they designed.

IDK. There were plenty of competitors to x86, both old and new, well into the late 90s. Lisp machines died off well before that. Moore's law was an inevitable churn, but it was predictable (that was the point). You knew what gate count you could get at what cost in the future, so you could plan ahead still.

IMO, what killed them off were caches.

First because more than half their shtick was that they ran their VM interpreter in microcode in order to avoid the bus contention between the interpreter and the code it was trying to run. When the inner loop of an interpreter can fit into icache, that gain is moot. (This is the same reason why those silly memcpy and strcpy single instructions you see in just about every cisc exist. They weren't that silly at the time, about half your bandwdith would be ifetch otherwise).

And less importantly at the time, in a caching architecture their naive implementation with lots of pointer chasing is about worse case from a data flow perspective.

Also if you make hardware absolutely optomised for Lisp then all the C, FORTRAN, and Cobol (this was the 1970's!) programmers aren't particularly interested because it does nothing for their code.

So the Lisp machines were targeting a fairly small segment of the market that might have exploded into the mainstream but didn't, so they had less income to reinvest and therefore struggled to maintain their lead over the general purpose/mass market hardware vendors.

The early architectures were micro-programmable. For example the Xerox Lisp Machines were using the exact same hardware as the Smalltalk or the Xerox office systems running Mesa-software. See: https://en.wikipedia.org/wiki/Xerox_Star

Symbolics had C, Pascal and Fortran compilers written in Lisp running on their machines. Others developed an Ada system for these machines. Drawback for all those: expensive and running not that fast. The interesting part: they allowed interactive development in these languages and one could run some C software a bit more safely due to runtime type checking. Examples for non-Lisp programs running on them: TeX and the usual X11 server.

The explosion did not happen during their lifetime, because the whole package was expensive (RAM&Disk was very expensive), it was tied to higher-end hardware with lots of memory, it has never been ported to other architectures on the metal (an Alpha machine - expensive already - would have been fast enough to run a full Lisp OS - but all one had was an emulator for it ... there was not enough manpower to make it run natively) and because the OS was mostly limited to one person, one GUI and one Lisp per machine - it was no real server OS with securely (somehow) separated programs.

Possible additional point: going into the 80s/90s (the last gasps of "alternative" personal computing architecture), there were not many widely-adopted standards across computing. This is why we all bitched about file incompatibility across systems etc. But now we have all of this great stuff: format standards, APIs, hell, even TCP/IP is a universal standard that allows all manner of exotic hardware to speak to the rest of the world.

I think something like Lisp/Smalltalk/SomethingNewOS machines might have another shot here soon, especially with the loss of gains from Moore's law

I would buy a lisp machine the moment someone releases it. Of course, I'd be interested in actual competitive personal/embedded CPUs, not hobbyists attempts.

I'd love to have a true Lisp or Smalltalk OS.

Closest thing in development at the moment: https://github.com/nopsys/CogNOS

> Also if you make hardware absolutely optomised for Lisp then all the C, FORTRAN, and Cobol (this was the 1970's!) programmers aren't particularly interested because it does nothing for their code.

Yes, but it's different now. Now our programming languages are closer to lisp than C. We use javascript, python, ruby, haskell and throw functions all over the place and they come with a cost of frames etc. Even "low level" languages like C++ and Java recently gained powerful capabilities to express lambdas, and C++ programmers use lambdas and std::function pretty often. Why not experiment with a machine that can handle function passing better Turing machines. It seems like if you use enough high-level structures in your code, even a "slower" lisp machine can run your faster than "fast" classical Turing-machine-like-CPU.

This is not even for general purpose CPUs. There are embedded devices people write python or javascript code on. Why not make those CPUs lisp machines?

Also Lisp sucks in large groups. The simplicity is both its biggest strength and weakness. There are fewer smart people by definition, and the smartest people tend to want to make money. The net result is that you end up with blub powering the world.

Luckily, there’s a way out: have your lisp machine transpile to blub.

The smartest people I know are not making any money.

It is the mediocrities with delusions of grandeur that do.

> the smartest people tend to want to make money.

Citation needed. This is the exact opposite of my experience in life so far. Anecdotal maybe, but I met quite a few people from >10 different countries.

The reason is that LISP implementations that ran on "bog standard" hardware such as Motorola 68K, Intel 386 (and higher, but the 386 was released in 1985) and Sun's SPARC machines, could run reasonably well, while also being able to run compilers for other languages and a whole host of other apps. (Yes, there was a C compiler for the LISP machines from Symbolics.)

So given the choice between "mainly LISP" and "LISP, C, Pascal, etc." from larger vendors that were virtually guaranteed to be around in 5 years, it was a rational choice to choose the bog-standard machines.

> Yes, there was a C compiler for the LISP machines from Symbolics.

Symbolics had C, Fortran, Pascal, and Prolog compilers. Scott L. Burson (on HN as ScottBurson) wrote a C compiler that ran on Symbolics and also TI and LMI machines (https://cliki.net/Zeta-C). So there were two C compilers for the Lisp machines.

These articles explain the demise of Symbolics:



Some of the reasons given were unique to Symbolics, others more general. LMI went under earlier, and the other two Lisp machine manufacturers (TI and Xerox) stopped manufacturing them but they had other products.

One road not taken would have been to port the Lisp Machine software to a commercial CPU, not on top of a multitasking Operating System. Then add a native compiler. A fast 68020 plus a custom MMU would have been competitive with the microcoded machines.

A unikernel like MirageOS is basically an OCaml Machine, except that you don't develop software within it. Someone could create a Lisp on Xen equivalent.

The view at Symbolics was mostly (AFAIK) that 32bit were not enough for their software - their microprocessor was a 40bit + 8bit ECC architecture. They also ported their software to the DEC Alpha, because that was a 64bit machine.

But generally I would agree that the 68020 + MMU would be mostly okay - given that MCL then ran fine under 68030 + 68040. But the MCL runtime was quite a bit weaker: cooperative multitasking, less CLOS and a less capable memory management.

The very last version of the Symbolics machine wasn't hardware but instead was a virtual machine running on the Alpha processor. So I would say it was a road that was taken, though it was not a success (probably bad timing since I agree it was a good idea).


The VLM doesn't have a native (to Alpha) compiler.

> why did we stop producing lisp machines

They were expensive and didn't run Unix or MS-DOS.

they did work ok in an unix environment. they had ethernets, you could run them with NFS, you could telnet into them sorta.

but they were expensive. like really expensive. not just a few 10s of k to buy, but you basically had to have a really fat support contract to keep them running. and they were really single user, so to have a lab supporting 8 users was a well over a million dollar investment. a deskside sun with terminals or pizza box sun 3s as stations was alot cheaper. and they were fragile. i'm pretty sure one of the ones we had was wire wrapped.

and they were slow. I may be misremembering, but the flashing status line at the bottom of the monitor would sometimes say the equivalent of 'i am performing a floating point operation now'. we would often reboot them overnight (it took a couple hours) so we could run for a while in the morning before the GC started bogging everything down. the little one they came out with near the end, I think it was the 3620, was largely unusable.

the difference was so marked in the end that if you wanted a lisp and didn't want the ferrari model, you could get a mac or a sun and run lucid or allegro and such for basically 1/10th the cost. and it would be faster.

just an additional historical note, the other vertical they had besides DoD AI was graphics. for an additional 20k (I think) you could get a 24 bit frame buffer with a nice trinitron monitor and they had some really pleasant software to work with.

> we would often reboot them overnight (it took a couple hours)

A mid 80s Symbolics Lisp Machine boots in three minutes into a customized Lisp world.

> the little one they came out with near the end, I think it was the 3620

The 3620 was hardly near the end in 1986. In 1988 a completely new generation with a in-house designed microprocessor was released and the last models of those were released in 1992.

Some of the rest is also not really what you say.

> they were really single user

How is this a limitation in the hardware? Operating system could switch between tasks using context switching right? Or are you suggesting these machines didn't have any mechanism to make sense of time/ticks or didn't have interrupts?

There was no concept of name/address space or device virtualization in the Lisp Machine operating systems. I don't know the instruction set, but I have never seen references to privileged instructions or protection rings. Not that that would be an impediment - you would really want to do virtualization at the software level in a Lisp OS anyway.

They had a single address space, tasks were more like threads in a modern system.

The MIT/LMI/TI ones didn't take interrupts, the microcode needed to poll the interrupt register at suitable points.

It could have been due to their lack of speed and performance. Can't run multiple users' software if it can barely run one user's software

more importantly the whole point of the symbolics was the environment. one hi definition mono display and one space keyboard meant one 3600

The TI Explorer LX had a 68020 co-processor running Unix: http://lemonodor.com/archives/2002/10/ti_explorer_fam.html

My guess is that because the accent was put on serial computations. I may be wrong but for for me the big selling point of a(n ideal) lisp machine is that if made right you could cut the machine in half and get two lisp machines. Conversely, two lisp machines working together are a bigger lisp machine, while two Turing machines working together are not a Turing machine. Maybe now is the time for these machines, or better machine parts randomly assembled by the big blind watchmaker [0] over the web.

[0] https://en.wikipedia.org/wiki/The_Blind_Watchmaker

> two Turing machines working together are not a Turing machine

They can be. For example multi-tape turing machines can be nice for modelling certain problems (e.g. "monotone turing machines", with a read-only tape, a write-only tape and a normal read/write tape appear a lot in algorithmic information theory).

> if made right you could cut the machine in half and get two lisp machines

I get where you're coming from in theory, but that's quite far removed from the common usage of the term "lisp machines" (i.e. those which were sold in the 80s), which weren't particularly different from other computers of the time, except for being optimised (in tandem with the OS) for particular workloads (i.e. Lisp programs).

True for both comments. In the case of the "ideal" lisp machine, in my mind should be something which rewrites local patterns in memory, randomly, in many places. It is possible, theoretically. Multiheaded TMs can also be seen in the same way, provided the state of the heads are also actually on the tape. Both models are particular rewrite systems and among them the lambda calculus based one is simpler (has fewer rewrites) than the TM one. Now that we really feel the need for decentralized computing, maybe some simple generic hardware (much simpler than a TM style processor) could be more fit than what we have now.

Because the Lambda calculus is, in large part, rather messy to implement when it comes to physical computing.

When we look at what computation is, physically, we find it's a lot like a Turing machine underneath (physical rewriting systems), albeit with different kinds of structures that are being rewritten.

Anything involving trees/terms in general seems to be rather messy in terms of physical computation, because there's no straightforward way to represent a manipulatable tree regardless of the medium.

Part of the reason TM-like computing devices sprung up is because strings, i.e linear sequences of symbols, are surprisingly easy to represent and manipulate physically. Turing himself appealed to physical intuition in his original paper.

Just my thoughts as a person who's tried to bootstrap his own LC-based computing environment. I did succeed, but I'm not entirely happy with the results.

I'm under the impression that LC is closer to how humans think of computation, and it seems that even though it's easier to implement TM as a bare-metal machine, isn't it a trade-off? If humans mostly produce LC-like high-level code, then compile them to TM-like machine code, wouldn't it be better at some point to produce LC-like machines? Because even though they're slower at face value, wouldn't it be faster on "human software".

This comment assumes 'humans mostly produce LC-like high-level code', but I think this is valid; other than C, I cannot think of a language in which programmers don't use lambdas, higher-order functions ubiquitously. It seems like our CPUs are optimized for C, but I'm curious how would it change if we optimize them for javascript, python or haskell...

Here's a test: do you think in terms of step-by-step instructions to solve a process?

There are ways of composing Turing Machines much like composing functions to yield higher abstraction levels in the Lambda Calculus. It comes in the form of building up larger forms of instructions out of simpler machines.

I don't think humans think in a LC-like manner, I think we think compositionally: small parts make up big parts, use the parts you make to make bigger parts. Whether those come in the form of a physical machine or an applicative system doesn't matter, what matters is how you string primitives together.

Check this[1] out. This walks you through building up a more usable language out of the rough-and-ready Turing Machine. There are many ways to go about this idea of composition, but this is a good introduction.


I disagree with what you're suggesting. It's tempting to think humans think procedurally, because that's how we study algorithms. I think this is very misleading. Algorithm is an abstraction of a step-by-step process; studying algorithms in an applicative fashion would be harder simply because they're different sort of things (algorithm is abstracted out of something else). Computation, on the other hand, is a larger category where you're not interested in a particular algorithm in isolation; instead, you have N algorithms and M data.

Your hypothesis doesn't pass any test, if you look at most of the code written today, this very minute, you'll see that it is mostly written in LC-like manner. I'm also not suggesting it is purely applicative, but most PLs seem to be a cross between two models, LC getting more traction recently (look at programming languages I listed above). Compare how popular C was a few decades ago vs now. Our web browsers' native language is javascript, which is essentially lisp in a C-suit, and people started using the same language on desktop and even in embedded devices. Where is C, Fortran, Algol?

If you can't define an algorithm as a step-by-step process, what can you define it as? That's the whole point: it is a process, manifested as sets of discrete steps, towards an end result by following sets of rules.

Lambda Calculus, Turing Machines, etc. are no different in this regard. One has its steps realized in reduction and substitution rules, the other has its steps realized in movements of the head and manipulations of the tape.

You'll need more to disprove my hypothesis. All software is built compositionally, up from smaller pieces. It's why we commonly refer to everything as a "software stack": we can reason about smaller making up larger. If you examine most of the code written today, this very minute, you'll see it's compositional, in the way that larger codebases are built from components.

Neither LC or TMs are "natural ways of thought", they're mechanisms. Turns out that TMs won out not only because they admit an easy physical implementation, but also because they're compositional. I can wire up two TMs and get them to perform operations in sequence, perform conditional branching, etc.

JavaScript is hardly Lisp in a C-suit, but that's beside the point, because C is pretty terrible (despite it being my favorite) regardless of what paradigm you choose. I'm not going to devote much of this comment towards that.

All software is compositional, regardless of the medium, and all computation is purely mechanical, regardless of the model of computation, because unless you want to stare at a lambda expression and infer the value, you need to perform some kind of step-wise reduction. That, after all, was Church's point.

> Neither LC or TMs are "natural ways of thought"

I agree. I think it's like Moravec's paradox: we tend to forget how hard it was to first grasp the concepts of programming, and go on to treat whatever's most familar as being "obvious", even if it's not.

Still, there are a few specific points I take issue with:

> Lambda Calculus, Turing Machines, etc. are no different in this regard. One has its steps realized in reduction and substitution rules, the other has its steps realized in movements of the head and manipulations of the tape.

There is a very big difference between TMs and LC in this regard: each step of a TM takes a constant amount of resources (time, energy, whatever https://en.wikipedia.org/wiki/Blum_axioms ); whilst a single beta-reduction "step" in LC may require an arbitrary amount of work (depending on how many times a variable occurs, whether we need to rename to avoid name capture, etc.). My (perhaps naive) assumption is that the work required for beta-reduction is only bounded by the busy beaver function.

This is probably why most computational complexity research sticks with machine models (operational semantics) like TMs. Still, there are alternative approaches which are very natural for LC-style programming, like "cost semantics" (essentially an alternative set of reduction rules for LC, which evaluate a program into its the algorithmic complexity, rather than into its return value).

> Turns out that TMs won out not only because they admit an easy physical implementation, but also because they're compositional.

I'd hesitate to call TMs compositional. As I wrote at https://cstheory.stackexchange.com/questions/21705/what-is-t... TMs can't really be re-used. Consider a TM (or program for a universal TM) for adding two numbers: it's hard to actually use that as part of any other TM/program, since it may clobber the rest of the tape, we need to setup the contents of the tape, state and read-head just right before invoking it, and clean up afterwards, those may require that we shuttle the rest of the tape contents along to make room, and so on. In contrast, LC lives in an abstract world where subexpressions can expand and contract without bumping into each other, terms can be duplicated and rearranged without having to drag them across the intermediate symbols, etc.

I think you're right w.r.t re-usability, though I've found semi-Thue systems more useful for formulating Turing Machines in order to avoid shifting things around. The string can just expand as needed, though the complexity probably changes.

Humans think procedurally because that's how the real world works. Assembling something, cooking, developing land, treating a patient, you name it.

A doctor doesn't get to destructure you, and then construct a new you that is exactly like you except without the bacterial infection.

Cons-based tree data structures have little to do with data processing in Lisp. Specifically, it is a compile time feature (which is available at runtime, too) that source code is contained in these structures in a sort of AST gives it its metaprogramming capabilities.

The runtime data crunching, GC, and optimization techniques aren't substantially different from other modern dynamic (or semi-dynamic) languages, though Lisp tends to have more features (dynamic bindings, complex numbers, in-language macros, etc). Cons cells at runtime are usually used as plain linked lists, as any dynamic list/array would be in other languages.

On a related note, the SPEC CINT92 benchmark suite included running a tree walking algoritm in interpreted XLISP. I have no idea what the benchmark author thought it demonstrated.

The problem is all of this relies on a representation of the data structure you're manipulating. What exactly is the structure that a Lisp machine is manipulating?

The basis of Lisp is the Lambda Calculus, and a generalization of that gets you term rewriting. Both the basis and the generalization of Lisp admit to requiring a tree (or a graph, in the case of Cons-cells) to get on with the computations, because that's the "state" that you're working with.

The problem is that while these things in the minds of humans are pretty easy to work with, implementing them physically has you rely on things like the RAM model, which, while pretty far away from the Turing Machine, still has machine-like qualities.

That's my argument. We can't really build "Lisp Machines", we can build "Random Access Machines with Lisp Interfaces", because there's no physically implementable version of a tree or graph memory outside of linking randomly-accessible cells.

I have no idea how any of that is unique to Lisp or lambda calculus. If you want a tree or graph structure in another language, you'll have random access of nodes in memory, too.

If you want to manually optimize such a linked data structure on top of an array of node structs using offsets, or hash tables, or whatever, you can do that in Lisp or any other language as well.

Any form of non-serialized memory access will penalize you in today's memory heirarchies, no matter the language. Care can be taken in any language to have cache-aligned data structures and preprocessing to attempt to linearize the most common access patterns, usually with preallocated arrays. But generally that level of fine-grained microoptimization isn't needed, and basic data structure best practices are language-agnostic.

The basis of Lisp isn't lambda calculus. Lambda calculus doesn't have a concept of its own expressions being data manipulated by lambda calculus. There is no quote operator in lambda calculus.

All the terms in lambda calculus are functions; there are no conses, symbols, numbers, strings, nothing. (Immutable) cons cells can be simulated in lambda calculus using functions. This being in the sense that functions resembling cons, car and cdr can be expressed such that car(cons(x, y)) = x, and cdr(cons(x, y)) = y.

Lisp arises from the realization that expressions built out of trees that consist of pairs can be executed using an eval function, in a way that incorporates some concepts from lambda calculus, such as anonymous functions.

> Lambda calculus doesn't have a concept of its own expressions being data manipulated by lambda calculus. There is no quote operator in lambda calculus.

McCarthy himself noted this about Lambda Calculus and came up with EVAL. In fact several people have also noticed this. They simply shrugged and delta-ruled all the things.

> implementing them physically has you rely on things like the RAM model

Not at all. In fact, electronic circuitry is quite similar to "LC style" programming, for example we can think of both as a sort of "pipeline for data to flow through".

On the other hand, Turing machines, RAM machines, etc. aren't particularly well suited to being implemented electronically. We end up introducing clock signals, addressing schemes, etc. and it all ends up very complicated, slow and power hungry. I think the only reason we put up with it at all is because electronics, especially ICs, are so incredibly small and fast to begin with, that these inefficiencies aren't too noticable at the human scale.

> When we look at what computation is, physically, we find it's a lot like a Turing machine underneath (physical rewriting systems), albeit with different kinds of structures that are being rewritten.

I don't think that's the case at all. We see sequential TM-like processes when we look at mainstream electronic computers, but that's kind of a circular argument. When we look at physical processes in general, the computations they're performing tend to be massively parallel.

Pure computation (e.g. lambda calculus, and especially combinatory logic) is well suited to running on massively parallel circuitry (e.g. https://en.wikipedia.org/wiki/Graph_reduction_machine ).

It's perhaps also worth mentioning that such languages can be interpreted as (descriptions of) circuitry! E.g. see http://conal.net/papers/compiling-to-categories

Here's a similar older project done at MIT AI Lab, another Scheme CPU called the Scheme86: https://dspace.mit.edu/handle/1721.1/6468

don't forget what must by definition be the first scheme hardware design (lambda: the ultimate opcode) http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-...

And the follow-on implementation: The Scheme-79 Chip https://dspace.mit.edu/handle/1721.1/6334

There is also the assq chip, meant as a co-processor for Scheme-81 chip design (which doesn't appear to have an AI Memo), as told by the fascinating Phil Agre (later of Red Rock Eater News Service fame): https://dspace.mit.edu/bitstream/handle/1721.1/41168/AI_WP_2...

I'm sorry I missed out on this: https://www.artsy.net/artwork/gerald-sussman-scheme

For all those downvoting my comments, why? Am I not contributing to the discussion?

I'm thinking that your extraordinary rudeness to Carl Hewitt, one of our elders, on the Actor thread was not appreciated.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact