The Future of the LispM (arrdem.com)
56 points by blue1 on Feb 7, 2015 | 15 comments



LISP machines failed for a number of reasons, only some of which are LISP-related.

The Symbolics refrigerator-sized LISP machine was one of the most expensive single-user machines ever built. It was designed as the ultimate hacker toy, with an MIT Space Cadet keyboard.[1] Price/performance wasn't that great. Reliability was poor. Service was awful, because it needed on-site servicing and Symbolics didn't have enough local offices. The original garbage collector could take hours, because the virtual memory and the garbage collector did not play well together. Symbolics as a company was quite arrogant. They had the attitude "We are the one gateway to AI".

Soon, LISP compilers for UNIX workstations were written, and Symbolics lost their exclusivity. Later Symbolics products were better, but unnecessary. Around that time, the expert systems bubble deflated. So did the LISP industry.

(I once did a lot of work in LISP, but mostly on Franz LISP on Sun workstations. I used an early Symbolics refrigerator briefly. Cool, but inefficient.)

[1] http://en.wikipedia.org/wiki/Space-cadet_keyboard#mediaviewe...


Some basic misunderstanding here: "However, as a stack machine architecture, there is no opportunity for the instruction level parallelism exploited by modern architectures as the exact state of the stack which is depended on by every instruction changes with every instruction. This forces all program execution to wait during slow operations like main memory reads, ..."

This is of course nonsense. Even an in-order implementation with non-blocking caches will let execution proceed in parallel with a load until the result is demanded. An actual out-of-order implementation will rename the (implicit) registers and proceed as normal.

The _only_ issue with stack architectures is that they are awkward (but not impossible) for compilers to generate optimized code for.

EDIT: I agree with the basic motivation for pushing all the way to hardware (just not for Lisp). That's why I work on https://github.com/tommythorn/Reduceron


Author here.

Oh. HN frontpage. That's cool I guess.

Well I have plans for today, so I'm sorry that I won't be sitting here defending my article.

However, if you hit me up on Twitter at the link at the bottom, or email me at my listed email address, I'd be more than happy to respond and debate this piece in due time.

Cheers! Reid


The cache locality of linked list cells is a red herring. Garbage collectors seek to keep chained cons cells consecutive in memory.

Also, does the author not understand that most Common Lisp implementations are already built on assemblers implemented in Lisp, generally with portable intermediate representation DSLs generating native machine code?
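
You can see this from the REPL; a minimal sketch, assuming an implementation with a native-code backend such as SBCL (the exact disassembly you get is implementation- and platform-specific):

    ;; CL:DISASSEMBLE is in the standard; on a native-compiling implementation
    ;; it prints the machine code that the Lisp-hosted compiler backend emitted
    ;; for the function you hand it.
    (disassemble
     (compile nil
              '(lambda (x y)
                 (declare (type fixnum x y)
                          (optimize (speed 3) (safety 0)))
                 (the fixnum (+ x y)))))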


[deleted]


I'm not talking about cdr-coding, but simply the placement of regular 2-element cons cells and/or their contents next to each other in memory. This is often a "happy accident" side-effect when copying collectors fill their destination space based on the traversal order of heap objects.

This of course applies to any connected chain of objects which can freely coexist in a GC space, not just cons cells.
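
A quick way to see it for yourself, as a sketch: assuming SBCL, whose generational collector copies live objects, and its internal sb-kernel:get-lisp-obj-address accessor (internal, so subject to change), you can print the raw addresses of the cells in a chain after a full collection. They typically come out adjacent, 16 bytes apart on x86-64.

    ;; SBCL-specific sketch; results vary by implementation and GC state.
    (defparameter *xs* (loop for i below 8 collect i))

    (sb-ext:gc :full t)   ; force a full collection so the list gets copied

    ;; LOOP's FOR...ON walks the successive cons cells themselves.
    (loop for cell on *xs*
          do (format t "~x~%" (sb-kernel:get-lisp-obj-address cell)))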


Which garbage collectors are you talking about, exactly? I presume you have particular Common Lisp implementations in mind, given your second comment. As the author discusses, there are more lisps than Common Lisp, and certainly more varieties of fundamental data structures than cons cells (regardless of the specializations certain implementations have towards optimizing them). The provided link to the Steele talk on optimizing parallel computations over diverse collections is quite relevant.

(Meta: is creating a sock puppet account really necessary to rag on a blog post?)


The big lesson of performance computing has been this: if you have a garbage collector, you lose! Once you surrender control of how memory gets used to a GC, you WILL take a performance hit. If you need to crunch through lots of data items and do it fast, use C++ and std::vector.


If you have a constrained situation such that the iteration is known over a range of fixed-type consecutive data, any language should be able to generate the exact same optimized assembly as a manual C++ vector iteration.
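
For instance, a minimal Common Lisp sketch (assuming SBCL or a similar native compiler): with the declarations below, the loop compiles to the same kind of unboxed, pointer-bumping machine loop you'd get from iterating a C++ vector.

    (defun sum-doubles (v)
      "Sum a specialized vector of double-floats; no boxing inside the loop."
      (declare (type (simple-array double-float (*)) v)
               (optimize (speed 3) (safety 0)))
      (let ((acc 0d0))
        (declare (type double-float acc))
        (loop for i of-type fixnum below (length v)
              do (incf acc (aref v i)))
        acc))

    ;; (sum-doubles (make-array 1000000 :element-type 'double-float
    ;;                                  :initial-element 1d0))  ; => 1000000.0d0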

Also, if it's large and you're bound by compute time rather than memory bandwidth, iterating a single vector will be slower than splitting it up between threads.

Also, if your data to work on is being streamed in, having to make all the choices about managing the allocation and reuse of std::vector buffers is much less useful than letting a managed runtime balance those decisions heuristically.

So, no, C++ std::vector isn't a silver bullet outside of microbenchmarks, which other languages can (or should be able to) match anyway. And vector iteration has nothing to do with GC.


"any language should be able to generate the exact same optimized assembly as a manual C++ vector iteration"

This is absolutely, massively untrue. If you try writing compilers, you will see how very hard it is for a compiler to be sure about anything.

For example: Are you calling a function anywhere inside that iteration? Is this a copying collector? Could anything that function does (or anyone it calls) possibly cause a GC, or cause us to be confused enough that we can't tell whether a GC might happen or not? Then you need read barriers on all your operations in this function, i.e. your iteration is going to be slow.

"Also, if your data to work on is being streamed in, having to make the choices in managing the allocations & uses of std::vector buffers is much less useful than having the system heuristically balance in a more managed environment."

Also absolutely, massively untrue. Your application knows more about its use cases than the generic systems it is built on (which must handle many many different kinds of programs). Because your application knows what is meant to happen, it can make much better performance decisions.


The idea that GC as such requires read barriers in the mutator is nonsense.

You may need read barriers if you do concurrent compaction of the heap, as a lot of the "real-time" collectors do.

But state of the art collectors like G1 can do even concurrent marking (but stop-the-world compaction) without mutator read barriers.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63....


Regarding iteration:

First, if something within an iteration calls out and performs memory allocation, then micro-optimizations are likely to be dwarfed anyway. :) But most of the GCs I've worked with do not use read barriers. If a GC occurs which moves the iterated structure, all pointers into that area are updated and the iteration continues unaware that the movement has happened.

There certainly are benefits to read-barrier systems, for instance in idealized fully concurrent non-pausing GC, but they're certainly not universal. In particular, they're the vast minority in terms of Lisp systems. (Of course, they can be zero-overhead in a LispM.)

Regarding heuristics:

Yes, I should convey a bit more context. In Lisp development, it's hard to draw a line where the environment stops and user code starts. Plus, my thinking is from heavy server deployments, not tightly fixed-path systems like games & supercomputing, where the universe of possibilities through a code path tends to be smaller and manually manageable.

If scalability, workloads, and execution environments vary wildly, predicting best performance by hand-tweaking and enumerating particular situational decisions quickly hits diminishing returns, and can even end up regressing performance. Unfortunately, this reflects a lot of C-family programming styles. I've moved way too many systems (even in C++) away from such designs to a "just code your application, let the system worry about how to optimize it" approach, and have seen performance improvements, as well as code size collapse and far better status awareness.


Look, what you are saying just doesn't work. What happens when the pointers are in registers? What happens when the loop is occurring in a thread running on another core?

Yes, you can make GC work in these situations, but you are going to pay for it. In perf.

I have to say frankly I do not believe any of the words in your last paragraph at all.


It is conceptually very simple: the GC stops the mutator threads while compaction is in progress and objects are moved around. Throughput-focused GCs generally do that; better ones do it in parallel with multiple GC threads.

This has an obvious impact on performance that needs to be compared with the mutator calling malloc/free in a non-GC system. The function isn't called "free" because it runs in zero time, you know.


> What happens when the pointers are in registers? What happens when the loop is occurring in a thread running on another core?

These are all ancient, solved problems. When any interruption occurs, CPU state is spilled onto the stack. Depending on the architecture, the current register usage is either part of the language ABI or determined by the code location indicated by the PC.

> Yes, you can make GC work in these situations, but you are going to pay for it. In perf.

I'm not a big fan of Java the language, but the JVM gets incredible performance doing stuff exactly like this. That, and optimization features like inlining virtual function calls (an area where it is distinctly faster than C++), should be a huge inspiration and impetus towards higher-level machine optimization, rather than the human hand-holding the machine through its execution details.


Having GC does not mean that you need to surrender all control. You just need to know which operations allocate memory (consing) and which are destructive. In Common Lisp you often have both variants available, like reverse & nreverse.
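
A minimal illustration (standard Common Lisp; NREVERSE is allowed to destroy its argument, so it can recycle the existing cons cells instead of allocating fresh ones):

    (let ((xs (list 1 2 3 4)))
      ;; REVERSE allocates a brand-new list; XS is left untouched.
      (values (reverse xs) xs))      ; => (4 3 2 1), (1 2 3 4)

    (let ((xs (list 1 2 3 4)))
      ;; NREVERSE may reuse XS's own cells; only its return value is reliable,
      ;; and the old value of XS should be treated as scrambled afterwards.
      (nreverse xs))                 ; => (4 3 2 1)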



