The Symbolics refrigerator-sized LISP machine was one of the most expensive single-user machines ever built. It was designed as the ultimate hacker toy, with an MIT Space Cadet keyboard. Price/performance wasn't that great. Reliability was poor. Service was awful, because it needed on-site servicing and Symbolics didn't have enough local offices. The original garbage collector could take hours, because the virtual memory and the garbage collector did not play well together.
Symbolics as a company was quite arrogant. They had the attitude "We are the one gateway to AI".
Soon, LISP compilers for UNIX workstations were written, and Symbolics lost their exclusivity. Later Symbolics products were better, but unnecessary. Around that time, the expert systems bubble deflated. So did the LISP industry.
(I once did a lot of work in LISP, but mostly on Franz LISP on Sun workstations. I used an early Symbolics refrigerator briefly. Cool, but inefficient.)
This is of course nonsense. Even an in-order implementation with non-blocking caches will let execution proceed in parallel with a load until the result is demanded. An actual out-of-order implementation will rename the (implicit) registers and proceed as normal.
The _only_ issue with stack architectures is that they are awkward (but not impossible) for compilers to generate optimized code for.
EDIT: I agree with the basic motivation for pushing all the way to hardware (just not for Lisp). That's why I work on https://github.com/tommythorn/Reduceron
Oh. HN frontpage. That's cool I guess.
Well I have plans for today, so I'm sorry that I won't be sitting here defending my article.
However if you hit me up on twitter at the link at the bottom or email me at my listed email address I'd be more than happy to respond and debate this piece in due time.
Also, does the author not understand that most Common Lisp implementations are already built on assemblers implemented in Lisp, generally with portable intermediate representation DSLs generating native machine code?
This of course applies to any connected chain of objects which can freely coexist in a GC space, not just cons cells.
(Meta: is creating a sock puppet account really necessary to rag on a blog post?)
Also, if the vector is large and you're bound by compute time rather than memory bandwidth, iterating it on a single thread will be slower than splitting the work between threads.
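To make that concrete, here's a minimal sketch of the split-across-threads idea: each thread reduces its own contiguous slice of the vector, and the partial results are combined at the end. The function name and chunking scheme are my own illustration, not from any particular library.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sketch: sum a large vector by splitting it into nthreads
// contiguous chunks, each reduced by its own thread.
std::int64_t parallel_sum(const std::vector<std::int64_t>& v, unsigned nthreads) {
    std::vector<std::int64_t> partial(nthreads, 0);
    std::vector<std::thread> workers;
    std::size_t chunk = v.size() / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t lo = t * chunk;
        // Last thread takes any remainder.
        std::size_t hi = (t == nthreads - 1) ? v.size() : lo + chunk;
        workers.emplace_back([&, lo, hi, t] {
            partial[t] = std::accumulate(v.begin() + lo, v.begin() + hi,
                                         std::int64_t{0});
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```

This only pays off when the per-element work dominates; if you're memory-bandwidth-bound, extra threads won't help.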
Also, if your data to work on is being streamed in, having to make the choices in managing the allocations & uses of std::vector buffers is much less useful than having the system heuristically balance in a more managed environment.
So, no, C++ std::vector isn't a silver bullet except in microbenchmarks, which other languages can (or should be able to) match. And vector iteration has nothing to do with GC.
This is absolutely, massively untrue. If you ever try writing a compiler, you will see how hard it is for compilers to be sure about anything.
For example: Are you calling a function anywhere inside that iteration? Is this a copying collector? Could anything that function does (or anyone it calls) possibly cause a GC, or cause us to be confused enough that we can't tell whether a GC might happen or not? Then you need read barriers on all your operations in this function, i.e. your iteration is going to be slow.
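For anyone unfamiliar with the term, here's a toy sketch of what a read barrier looks like under a copying collector. All names are hypothetical; the point is just that every pointer load inside the iteration has to check whether the collector has moved the object.

```cpp
#include <cassert>

// Toy object header for a copying collector: when the collector moves an
// object, it leaves a forwarding pointer at the old location.
struct Obj {
    Obj* forward = nullptr;  // non-null once the object has been moved
    int  value   = 0;
};

// The read barrier: executed on every pointer read, it chases the
// forwarding pointer if the object has been relocated.
inline Obj* read_barrier(Obj* p) {
    return p->forward ? p->forward : p;
}

// Iteration code pays this check on each element access.
int read_value(Obj* p) {
    return read_barrier(p)->value;
}
```

That extra branch per access is the slowdown being described above.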
"Also, if your data to work on is being streamed in, having to make the choices in managing the allocations & uses of std::vector buffers is much less useful than having the system heuristically balance in a more managed environment."
Also absolutely, massively untrue. Your application knows more about its use cases than the generic systems it is built on (which must handle many many different kinds of programs). Because your application knows what is meant to happen, it can make much better performance decisions.
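A small example of the kind of decision an application can make that a generic runtime can't: if you know up front exactly how many elements you'll append, you can reserve capacity once instead of relying on the container's generic growth policy. The function name here is just for illustration.

```cpp
#include <cassert>
#include <vector>

// Sketch: the application knows it will append exactly n elements, so it
// reserves capacity up front, avoiding the repeated reallocate-and-copy
// cycles a generic doubling policy would perform.
std::vector<int> fill_known_size(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);  // one allocation, sized by application knowledge
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v;
}
```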
You may need to require read barriers if you do concurrent compaction of the heap, as a lot of the "real-time" collectors do.
But state-of-the-art collectors like G1 can even do concurrent marking (though with stop-the-world compaction) without mutator read barriers.
First, if something within an iteration calls out and performs memory allocation, any micro-optimizations are likely to be dwarfed anyway. :) But most of the GCs I've worked with do not use read barriers. If a GC occurs that moves the iterated structure, all pointers into that area are updated and the iteration continues unaware that the move has happened.
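A toy sketch of that alternative, with entirely hypothetical names: at a stop-the-world collection, the collector rewrites every root pointer into the moved region, so the mutator resumes with no per-read barrier at all.

```cpp
#include <cassert>
#include <vector>

struct Node { int value; };

// Hypothetical stop-the-world fixup: rewrite every root pointer that points
// into the old region so it points at the corresponding slot in the new
// region. After this, the mutator needs no read barrier.
void fixup_roots(std::vector<Node*>& roots,
                 Node* old_base, Node* new_base, std::size_t count) {
    for (Node*& r : roots) {
        if (r >= old_base && r < old_base + count) {
            r = new_base + (r - old_base);  // relocate into the new region
        }
    }
}
```

Real collectors of course find pointers via stack maps and heap tracing rather than an explicit root list; this just shows the pointer-rewriting idea.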
There certainly are benefits to read-barrier systems, for instance in idealized fully concurrent non-pausing GC, but they're certainly not universal. In particular, they're the vast minority in terms of Lisp systems. (Of course, they can be zero-overhead in a LispM.)
Yes, I should convey a bit more context. In Lisp development, it's hard to draw a line where the environment stops and user code starts. Plus, my thinking is from heavy server deployments, not tightly fixed-path systems like games & supercomputing, where the universe of possibilities through a code path tends to be smaller and manually manageable.
If scalability, workloads, and execution environments vary wildly, predicting the best performance via hand-tweaking and enumerating particular situational decisions quickly yields diminishing returns and can even regress performance. Unfortunately, this reflects a lot of C-family programming styles. I've moved way too many systems (even in C++) away from such designs toward "just code your application, let the system worry about how to optimize it", and have seen performance improvements, as well as collapsing code size and far better status awareness.
Yes, you can make GC work in these situations, but you are going to pay for it. In perf.
I have to say frankly I do not believe any of the words in your last paragraph at all.
This has an obvious impact on performance that needs to be compared with the mutator calling malloc/free in a non-GC system. The function isn't called "free" because it runs in zero time, you know.
These are all ancient, solved problems. When an interruption occurs, CPU state is spilled onto the stack. Depending on the architecture, the current register usage is either part of the language ABI or determined from the code location indicated by the PC.
> Yes, you can make GC work in these situations, but you are going to pay for it. In perf.
I'm not a big fan of Java the language, but the JVM gets incredible performance doing stuff exactly like this. That and optimization features like inlining virtual function calls (a distinctly faster-than-C++ feature) should be a huge inspiration and impetus towards higher level machine optimization, rather than the human hand-holding the machine through its execution details.