
The Future of the LispM - blue1
http://arrdem.com/2014/11/28/the_future_of_the_lispm/
======
Animats
LISP machines failed for a number of reasons, only some of which are LISP-
related.

The Symbolics refrigerator-sized LISP machine was one of the most expensive
single-user machines ever built. It was designed as the ultimate hacker toy,
with an MIT Space Cadet keyboard.[1] Price/performance wasn't that great.
Reliability was poor. Service was awful, because it needed on-site servicing
and Symbolics didn't have enough local offices. The original garbage collector
could take _hours_ , because the virtual memory and the garbage collector did
not play well together. Symbolics as a company was quite arrogant. They had
the attitude "We are the one gateway to AI".

Soon, LISP compilers for UNIX workstations were written, and Symbolics lost
their exclusivity. Later Symbolics products were better, but unnecessary.
Around that time, the expert systems bubble deflated. So did the LISP
industry.

(I once did a lot of work in LISP, but mostly on Franz LISP on Sun
workstations. I used an early Symbolics refrigerator briefly. Cool, but
inefficient.)

[1] [http://en.wikipedia.org/wiki/Space-cadet_keyboard#mediaviewer/File:Space-cadet.jpg](http://en.wikipedia.org/wiki/Space-cadet_keyboard#mediaviewer/File:Space-cadet.jpg)

------
FullyFunctional
Some basic misunderstanding here: "However, as a stack machine architecture,
there is no opportunity for the instruction level parallelism exploited by
modern architectures as the exact state of the stack which is depended on by
every instruction changes with every instruction. This forces all program
execution to wait during slow operations like main memory reads, ..."

This is of course nonsense. Even an in-order implementation with non-blocking
caches will let execution proceed in parallel with a load until the result is
demanded. An actual out-of-order implementation will rename the (implicit)
registers and proceed as normal.
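
As a rough C++ illustration of that overlap (hypothetical and simplified; the
actual scheduling happens in hardware, the source just shows that nothing
depends on the load until the final line):

    // The load of *p can be in flight while the independent arithmetic below
    // executes; the pipeline only has to wait at the point where the loaded
    // value is actually consumed.
    int overlap_demo(const int* p, int x, int y) {
        int loaded = *p;               // load issued; may miss in cache
        int other  = x * y + (x ^ y);  // independent work, proceeds regardless
        return loaded + other;         // first use of the load result
    }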

The _only_ issue with stack architectures is that they are awkward (but not
impossible) for compilers to generate optimized code for.

EDIT: I agree with the basic motivation for pushing all the way to hardware (just
not for Lisp). That's why I work on
[https://github.com/tommythorn/Reduceron](https://github.com/tommythorn/Reduceron)

------
rdmckenzie
Author here.

Oh. HN frontpage. That's cool I guess.

Well I have plans for today, so I'm sorry that I won't be sitting here
defending my article.

However, if you hit me up on Twitter at the link at the bottom, or email me at
my listed email address, I'd be more than happy to respond and debate this
piece in due time.

Cheers! Reid

------
white-flame
The cache locality of linked list cells is a red herring. Garbage collectors
seek to keep chained cons cells consecutive in memory.

Also, does the author not understand that most Common Lisp implementations are
already built on assemblers implemented in Lisp, generally with portable
intermediate representation DSLs generating native machine code?

~~~
bitwize
The big lesson of high-performance computing has been this: if you have a
garbage collector, you lose! Once you surrender control of how memory gets
used to a GC, you WILL take a performance hit. If you need to crunch through
lots of data items and do it fast, use C++ and std::vector.
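
For what it's worth, the pattern being described is roughly the following (a
trivial, hypothetical sketch; the crunch function is just an illustrative
name):

    #include <numeric>
    #include <vector>

    // Contiguous storage, no per-element allocation, no GC involvement:
    // the whole loop is one linear scan over a flat buffer.
    double crunch(const std::vector<double>& items) {
        return std::accumulate(items.begin(), items.end(), 0.0);
    }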

~~~
white-flame
If you have a constrained situation where the iteration is known to run over a
range of fixed-type, consecutive data, any language should be able to generate
the exact same optimized assembly as a manual C++ vector iteration.

Also, if it's large and you're bound by compute time rather than memory
bandwidth, iterating a single vector will be slower than splitting it up
between threads.
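
A minimal sketch of that splitting, assuming plain std::vector and std::thread
(hypothetical and untuned, e.g. no care is taken about false sharing):

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Split one vector's index range across nthreads workers; each worker
    // sums its own chunk into a private slot, then the partial sums are
    // combined.  Assumes nthreads >= 1.
    double parallel_sum(const std::vector<double>& v, unsigned nthreads) {
        std::vector<double> partial(nthreads, 0.0);
        std::vector<std::thread> workers;
        const std::size_t chunk = (v.size() + nthreads - 1) / nthreads;
        for (unsigned t = 0; t < nthreads; ++t) {
            workers.emplace_back([&, t] {
                const std::size_t begin = t * chunk;
                const std::size_t end   = std::min(v.size(), begin + chunk);
                for (std::size_t i = begin; i < end; ++i)
                    partial[t] += v[i];
            });
        }
        for (auto& w : workers) w.join();
        double total = 0.0;
        for (double p : partial) total += p;
        return total;
    }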

Also, if the data you're working on is being streamed in, having to make all
the choices about managing the allocations & uses of std::vector buffers is
much less useful than having the system balance things heuristically in a more
managed environment.

So, no, C++'s std::vector isn't a silver bullet outside of microbenchmarks,
which other languages can (or should be able to) match anyway. And vector
iteration has nothing to do with GC.

~~~
jblow
"any language should be able to generate the exact same optimized assembly as
a manual C++ vector iteration"

This is absolutely, massively untrue. If you try making compilers sometime,
you will see how very hard it is for compilers to be sure about anything.

For example: Are you calling a function anywhere inside that iteration? Is
this a copying collector? Could anything that function does (or anyone it
calls) possibly cause a GC, or cause us to be confused enough that we can't
tell whether a GC might happen or not? Then you need read barriers on all your
operations in this function, i.e. your iteration is going to be slow.
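
For anyone unfamiliar, a read barrier in a moving collector looks roughly like
this (a hypothetical, heavily simplified Brooks-style sketch, not any
particular runtime's API):

    // Every heap object carries a forwarding pointer.  Normally it points at
    // the object itself; after the collector copies the object it points at
    // the new location.  Every access goes through the extra indirection.
    struct Object {
        Object* forward;   // == this, unless the object has been moved
        int     payload;
    };

    inline Object* read_barrier(Object* obj) {
        return obj->forward;           // one extra dependent load per access
    }

    int get_payload(Object* obj) {
        return read_barrier(obj)->payload;
    }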

"Also, if your data to work on is being streamed in, having to make the
choices in managing the allocations & uses of std::vector buffers is much less
useful than having the system heuristically balance in a more managed
environment."

Also absolutely, massively untrue. Your application knows more about its use
cases than the generic systems it is built on (which must handle many many
different kinds of programs). Because your application knows what is meant to
happen, it can make much better performance decisions.

~~~
white-flame
Regarding iteration:

First, if something within an iteration calls out and performs memory
allocation, then any micro-optimizations are likely to be dwarfed by that
anyway. :) But most of the GCs I've worked with do not use read barriers. If a
GC occurs which moves the iterated structure, all pointers into that area are
updated during the pause, and the iteration continues unaware that the
movement has happened.

There certainly are benefits to read-barrier systems, for instance in
idealized fully concurrent non-pausing GC, but they're certainly not
universal. In particular, they're the vast minority in terms of Lisp systems.
(Of course, they can be zero-overhead in a LispM.)
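
In crude outline, the non-read-barrier alternative described above looks
something like this (a hypothetical sketch of the stop-the-world fixup pass,
ignoring how liveness and the actual copying are handled):

    #include <unordered_map>
    #include <vector>

    struct Obj { int value; Obj* next; };

    // Filled in while live objects are copied: old address -> new address.
    using Forwarding = std::unordered_map<Obj*, Obj*>;

    // With the mutator threads paused, every root slot (and, in a real
    // collector, every pointer field reachable from the roots) is rewritten
    // once.  When the mutators resume, ordinary pointer loads need no checks.
    void fix_roots(std::vector<Obj**>& roots, const Forwarding& fwd) {
        for (Obj** slot : roots) {
            auto it = fwd.find(*slot);
            if (it != fwd.end())
                *slot = it->second;
        }
    }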

Regarding heuristics:

Yes, I should convey a bit more context. In Lisp development, it's hard to
draw a line where the environment stops and user code starts. Plus, my
thinking is from heavy server deployments, not tightly fixed-path systems like
games & supercomputing, where the universe of possibilities through a code
path tends to be smaller and manually manageable.

If scalability, workloads, and execution environments vary wildly, predicting
best performance by hand-tweaking and enumerating particular situational
decisions quickly hits diminishing returns and can even end up regressing
performance. Unfortunately, this reflects a lot of C-family programming
styles. I've moved way too many systems (even in C++) away from such designs
toward a "just code your application, let the system worry about how to
optimize it" approach, and have seen performance improve, code size collapse,
and status awareness get far better.

~~~
jblow
Look, what you are saying just doesn't work. What happens when the pointers
are in registers? What happens when the loop is occurring in a thread running
on another core?

Yes, you can make GC work in these situations, but you are going to pay for
it. In perf.

I have to say frankly I do not believe any of the words in your last paragraph
at all.

~~~
the_why_of_y
It is conceptually very simple: the GC stops the mutator threads while
compaction is in progress and objects are moved around. Throughput-focused GCs
generally do that; better ones do the work in parallel with multiple GC
threads.

This has an obvious impact on performance that needs to be compared with the
mutator calling malloc/free in a non-GC system. The function isn't called
"free" because it runs in zero time, you know.

