My history with Forth and stack machines (2010) (yosefk.com)
80 points by sph on Aug 6, 2023 | 100 comments



People interested in Forth might want to also check out PostScript (similar and stack-based). I recently wrote a blog post with a very brief PostScript tutorial and an example program translated line-by-line to Python https://kenschutte.com/postscript-vs-python/
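For a quick taste (a minimal sketch of my own, not from the post): PostScript pushes operands first and applies operators after, so drawing a square looks like this:

    % draw a 100-point square, outline only
    newpath
    100 100 moveto
    200 100 lineto
    200 200 lineto
    100 200 lineto
    closepath
    stroke
    showpage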


Well, my favorite stack based language was RPL on the HP. Simply because it fit well in the scope of the environment that it was used. It was really easy to develop and debug on the calculator (I know many did very large projects working with editors on the PC, but I did all my work on the machine itself).

It lacked any kind of composite structure outside of lists and arrays. No structs or records. But that was never a real barrier.

It was just a nice, high level stack machine with a cool library and toolset.


My favorite stack based language is dc. It's also the oldest language still found on Unix systems, predating even C (the original version was written in B on a PDP-11 - even before it had an assembler).

https://www.cs.dartmouth.edu/~doug/reader.pdf

> Almost everybody in the group has done languages. Thompson and Ritchie built assemblers from scratch. On the tiny PDP-7 the assembler was supplemented by tmg, Doug McIlroy’s version of Bob McClure’s compiler-compiler. Using it Thompson wrote B, the evolutionary link between Martin Richards’s (Cambridge University) BCPL and C. The PDP-11 assembler, a desk calculator dc, and B itself were written in B to bootstrap the system to the PDP-11. Because it could run before the disk had arrived, dc—not the assembler—became the first language to run on our PDP-11. Soon revised to handle arbitrary- precision numbers (v1), dc was taken over by Bob Morris and Lorinda Cherry. It now ranks as the senior language on UNIX systems.


huh, i didn't realize until just now that dc had arrays, the : and ; operations

i guess that makes it a far more practical platform for applications programming than i had always thought

unfortunately afaik it still can't take character input. i guess you could pipe its input through something that converts it into a stream of numbers?
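(For the curious: in GNU dc, `:a` pops an index off the top and stores the next value into array a at that index, and `;a` pops an index and pushes the stored value back. A quick illustration:)

    $ echo '42 5:a  99 6:a  5;a 6;a + p' | dc
    141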


As a big RPN and HP calc fan forever, and a recent Forth fan, I was a bit confused by my initial foray into RPL. It seemed like it allowed infix notation, and that threw me off at first glance.

Did I bail too soon?

I am a big stack machine convert and have had an HP RPN calc since I was 12.


The HP-48 certainly could do infix style calculations and formulas (the solvers work with these).

But from the prompt it’s not particularly natural. Algebraic entry is a second class citizen.

But do look at RPL. Anything that lets you push anonymous functions onto the stack can’t be all bad.


The first thing I ever wrote in Perl was a CGI script that printed out a form in Postscript. I wrote ps to draw boxes and fit the text the user had entered.

It wasn't pretty to mix perl and ps, and I don't think I have ever written anything as wacky as that code, but it was fun.

Folie de la jeunesse (the folly of youth).


i don't think postscript is very similar to forth, although they are both imperative and stack-based, which gives their syntax a superficial similarity

what's more important is that (in the terminology of eopl) their expressed values and denoted values are completely different
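for example (my sketch, not from eopl): in postscript a procedure body is itself a first-class value you can leave on the stack, while in forth you have to reach for execution tokens to get a similar effect

    % postscript: { ... } pushes an executable array, an ordinary value
    3 { 2 mul } exec         % leaves 6 on the stack
    /double { 2 mul } def    % def just binds that value to a name

    \ forth: words aren't expressed values; you use execution tokens instead
    : double 2 * ;
    3 ' double execute       \ also leaves 6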


Thanks, that's going on my reading list.

Question to the more educated among us: I read that Apple's UI was based on Display PostScript... how similar is it to original PostScript, and why was it abandoned?


Apple's UI has been based on Display PDF since macOS 10 (Mac OS X, under the old branding). This is the royalty-free replacement for the Display PostScript of OPENSTEP and NeXTSTEP, which became the foundation for post-9 macOS after Apple absorbed NeXT.

@DonHopkins is the most knowledgeable person on this topic here, since he worked on another PostScript-based graphics system, NeWS, for the competitor Sun Microsystems.


I have pinged Don (though there is a fair chance he's already sleeping).


At 1:30 am on a Monday morning? No way!


Hehe. Ok :) Good morning, Don.


NeWS is an interesting animal. I loved going through the original PDF manuals for it.


That’s interesting. I don’t know anything about it but maybe it means it was based on PS “drawing model” more than the actual PS language. (ie the basic primitives and how stroke, fill, etc work)


It was used as the display language for NeXT and some Unix workstations. From a quick google, it was a direct dialect of PostScript, with the main changes being the use of efficient 32-bit integer IDs in place of strings and similar performance optimizations.


There's an important distinction between the PostScript programming language, and the PostScript imaging model. You can have one without the other. There are a lot of problems using a full blown programming language as a graphics model.

PDF is like PostScript without the programming language, just the imaging model. (But with a lot of extra crap thrown in so haphazardly and indiscriminately that the Adobe PDF Viewer browser plugin traditionally requires downloading patches almost every day!)

But you can "distill" (partially evaluate) a PostScript program into a PDF document by executing it, and recording the calls to the PostScript imaging model that it makes (and you can also optimize them too -- see Glenn Reid's and Acrobat's "Distillery").

Apple made a library called Quartz that was like PDF for the screen, without the programming language part that Display PostScript and NeWS supported. And Quartz evolved into Core Graphics in Cocoa, which is used today.

PostScript calls it the "Stencil / Paint Imaging Model". There are other libraries and APIs inspired by PostScript, like Quartz, Core Graphics, Keith Packard's Cairo graphics library, and the browser's Canvas API (often implemented on top of Cairo).

I wrote up a comparison of Forth and PostScript, and some history and links about PostScript and the languages that inspired it (Brian Reid's retrospective he posted to laser-lovers is especially interesting and first-hand) here:

https://news.ycombinator.com/item?id=21968175

>Kragen is right that PostScript is a lot more like Lisp or Smalltalk than Forth, especially when you use Owen Densmore's object oriented PostScript programming system (which NeWS was based on). PostScript is semantically very different and much higher level than Forth, and syntactically similar to Forth but uses totally different names (exch instead of swap, pop instead of drop, etc).

>But its most obvious similarities with Forth are based on generic Computer Science concepts like stacks, which is not something particularly unique to Forth itself, so saying PostScript was influenced by Forth ignores other much more influential languages like Interpress, JaM, and Design System, which directly influenced PostScript and were designed by the same people. Also, PostScript is not threaded like Forth. But it's homoiconic, unlike Forth and [like] Lisp.

[...]

>Brian Reid wrote about page independence, comparing Interpress' and PostScript's different approaches. Adobe's later voluntary Document Structuring Conventions actually used PostScript comments to make declarations and delimit different parts of the file -- it wasn't actually a part of the PostScript language, while Interpress defined pages as independent so they couldn't possibly affect each other:

https://groups.google.com/forum/#!topic/fa.laser-lovers/H3us...

>By now you can probably see the fundamental philosophical difference between PostScript and Interpress. Interpress takes the stance that the language system must guarantee certain useful properties, while PostScript takes the stance that the language system must provide the user with the means to achieve those properties if he wants them. With very few exceptions, both languages provide the same facilities, but in Interpress the protection mechanisms are mandatory and in PostScript they are optional. Debates over the relative merits of mandatory and optional protection systems have raged for years not only in the programming language community but also among owners of motorcycle helmets. While the Interpress language mandates a particular organization, the PostScript language provides the tools (structuring conventions and SAVE/RESTORE) to duplicate that organization exactly, with all of the attendant benefits. However, the PostScript user need not employ those tools.
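(To make the naming point above concrete, a quick side-by-side sketch of mine; both snippets leave 16 on the stack:)

    % PostScript
    /square { dup mul } def
    3 4 square exch pop

    \ Forth
    : square dup * ;
    3 4 square swap drop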


> Acrobat's "Distillery"

Minor nit, it was called Distiller. I worked on it for about a year, not so much on the core processing code, but on the periphery - user interface, adding an API, etc.

When I first started work on it, one of my new colleagues walked by and asked what I'd be working on. I said "Distiller", and they gave me a sad look, like "you just got the worst assignment here."

> PDF is like PostScript without the programming language, just the imaging model.

Of course PDF does have a programming language, a very old version of JavaScript, with an API that Acrobat/Reader provides.

It's mostly used for simple stuff like calculating and updating one form field from your input in another.

You can tell when you look at the API that it was designed 20-25 years ago.

> (But with a lot of extra crap thrown in so haphazardly and indiscriminately that the Adobe PDF Viewer browser plugin traditionally requires downloading patches almost every day!)

Browsers don't use the Adobe plugin any more, thank goodness.

But on the "haphazard and indiscriminate" part, yes, guilty as charged.

One of my later projects at Adobe was to build a multimedia plugin that could embed any of the popular media players of the day: Real Player, QuickTime, Windows Media Player, one or two others I don't remember, and even Flash.

I built the main plugin which provided a JavaScript API you could access from your PDF and a way to add a "player plugin" for each media player, and teamed up with colleagues to build each of the individual player plugins.

No one at Adobe around 2002 thought Flash was important, so I kind of had to do a stealth minimal implementation.

It was definitely a fun project, and also a great example of the "haphazard and indiscriminate" additions you referred to.


"Distillery" was actually the name of the original PostScript version that Glenn Reid posted to usenet in the 1980's. Here is a copy of his "release 10 of the Distillery" from 10 Mar 1989:

https://donhopkins.com/home/archive/postscript/newerstill.ps...

It was actually John Warnock's original idea, which I learned about from Owen Densmore's paper about using NeWS as a "linguistic motherboard":

https://news.ycombinator.com/item?id=28115946

>Glenn Reid wrote a PostScript partial evaluator in PostScript that optimized other PostScript drawing programs, called "The Distillery". You would send still.ps to your PostScript printer, and then send another PostScript file that drew something to the printer. The first PostScript Distillery program would then partially evaluate the second PostScript drawing program, and send back a third PostScript program, an optimized drawing program, with all the loops and conditionals unrolled, calculations and transformations pre-computed, all in the same coordinate system.

>It was originally John Warnock's idea, that Glenn implemented. And it led to Adobe Acrobat's "Distiller". Acrobat is basically PostScript without the programming language.

[...]

https://news.ycombinator.com/item?id=33827923

>Owen Densmore recounted John Warnock's idea that PostScript was actually a "linguistic motherboard". (This was part of a discussion with Owen about NeFS, which was a proposal for the next version of NFS to run a PostScript interpreter in the kernel. More about that here:)

https://news.ycombinator.com/item?id=30558915

>Glenn Reid described his classic "Grandfather Clock" metaphor to comp.lang.postscript that really opened my eyes about writing code in interpreted languages like PostScript (and JavaScript, while JIT'ed at runtime, still makes calling native code expensive), and balancing optimization with readability. With modern JavaScript/canvas, JavaScript code that calls other JavaScript code runs really fast, but calling between JavaScript and built-in code is still slow, so it's good to have the built-in code do as much as possible when you do (like rendering a long string of text at once, instead of rendering it word by word like many PostScript drivers would):

>Glenn's post in a comp.lang.postscript discussion about PostScript programming style and optimization:

https://groups.google.com/forum/#!search/%22glenn$20reid%22$...

    From: Glenn Reid (Adobe Systems)
    Newsgroups: comp.lang.postscript
    Subject: Re: An Idea to Help Make Postscript Easier to Read (and Write)
    Date: 10 Sep 88 17:26:24 GMT

    You people tend to forget that the PostScript language is interpreted.
    It is well and good to use tools to convert to and from PostScript,
    but it is not quite as "transparent" as we all might think.
    I like to think of a big grandfather clock, with the pendulum swinging.
    Each time pendulum swings, the PostScript interpreter gets to do one
    operation.  The "granularity" of the clock is nowhere near the speed
    of a microprocessor instruction set, and any comparison with assembly
    languages doesn't make sense.

    The difference between:

            0 0 moveto

    and

            0 0 /arg2 exch def /arg1 exch def arg1 arg2 moveto

    can sort of be measured in "ticks" of the interpreter's clock.  It's
    not quite this simple, since simply pushing a literal is faster than
    executing a real PostScript operator, but it is a rough rule of thumb.
    It will take about three times as long to execute the second of these
    in a tight loop, and about five times as long if it is transmitted and
    scanned each time.  My rule of thumb is that if you have roughly the
    same number of tokens in your stack approach as you do with your 'exch
    def' approach, the 'exch def' is likely to be much more readable and
    better.  Otherwise, I usually go with the stack approach.

    One other thing of note is that if you have too much stack manipulation
    going on, it may well be symptomatic of a problem in the original program
    design.

    Also, most procedures don't do any stack manipulation at all, they
    simply use their arguments directly from the stack.  In this situation,
    it is especially wasteful (and confusing, I think) to declare
    intermediate variables.

    Compare:

    % sample procedure call:
            (Text) 100 100 12 /Times-Roman SETTEXT

    % approach 1:
            /SETTEXT { %def
                findfont exch scalefont setfont moveto show
            } def

    % approach 2:
            /SETTEXT { %def
                /arg5 exch def
                /arg4 exch def
                /arg3 exch def
                /arg2 exch def
                /arg1 exch def
                arg5 findfont arg4 scalefont setfont
                arg2 arg3 moveto arg1 show
            } def

    Which of these is easier for you to understand?

    Anyway, I think the discussion is a good one, but let's not forget
    that PostScript is an interpreted language.  And I don't think
    it is terribly hard to use and understand, if it is written well.

    Glenn Reid
    Adobe Systems


> "Distillery" was actually the name of the original PostScript version that Glenn Reid posted to usenet in the 1980's.

Of course! I would never dispute that, and thank you for all the interesting historical details.

I was only saying that when I worked on it some 10 years later, the "y" had disappeared from the name, and it was called Acrobat Distiller.

https://helpx.adobe.com/acrobat/using/creating-pdfs-acrobat-...


i've been fiddling with a series of prototype interpreters for a simple stack machine over the last year or so, and i've come to a few tentative conclusions:

1. it's probably a win to implicitly stick the arguments to a subroutine into local variables on subroutine entry, as smalltalk-80 and elisp and java do, rather than letting the subroutine do that itself if it wants to, as forth and postscript and hp calculators do. this is mostly for convenience and readability, but it also affects efficiency and safety. in the case of a straightforward stack interpreter, each stack operation takes about the time of 10 machine instructions. for example, in http://canonical.org/~kragen/sw/dev3/hadixmv.S, the addition bytecode instruction is implemented by the following arm assembly code

        ldmdb r4!, {r3} @ load the stack's second item from r4 into scratch r3
        add r5, r5, r3  @ add the loaded number to the top of stack
        dispatch
here `dispatch` is an assembly macro that expands (in my un-checked-in version) to

        ldrb r3, [r6], #1         @ load the next instruction, post-indexed
        ldr pc, [r7, r3, lsl #2]  @ index the bytecode table pointed to by r7
so this is four instructions, but three of them access memory, making them slower (on my target platform), and the last one usually causes a pipeline bubble, so this takes typically 10 clocks on my target chip

a few bytecode instructions are a bit faster than this, a few are much slower, and the average seems to come out to about 15 clocks, while an average instruction is maybe closer to 1.2 clocks, depending on your code. so there's a strong temptation to do anything that will allow you to waste less time in bytecode dispatch

i think glenn is strawmanning a bit here with his arg5. i think you'd write something like

    /settext [/text /x /y /size /font] {
      font findfont  size scalefont  setfont
      x y moveto  text show
    } fn
that's still 10 calls inside the subroutine instead of 6, but glenn only got down to 6 because he put the arguments in almost the most convenient order

2. probably a super dumb jit compiler, if well optimized, will never be much slower than a straightforward bytecode interpreter, and in common cases will be about three times faster, but the overall system performance will still be about three times slower than well-optimized assembly. and i can write the compiler in the language i'm implementing (though i can't optimize it that way) which is a lot more convenient than writing it in assembly. even arm assembly, which is super sweet. for this reason, for many years now, the jvm 'interpreter' (what it uses for interpretation of everything that hasn't yet been found to be a hotspot) is a super dumb jit compiler and not actually an interpreter

this tends to undermine the performance argument for automatically creating local variables like this, but i think not completely

3. stack machines have more compact code than register machines, but because you need about twice as many operations to do a given job, the interpreter dispatch overhead is twice as expensive. my primary objective is code density, and although i haven't compiled anything big yet, i think i can get about 2/3 the code size of thumb-2 or rv32c. but a register-based code is probably better when code density isn't the primary objective
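to make point 3 concrete, here's x = a*b + c*d in two hypothetical encodings (a sketch, not any real isa): the stack version needs more instructions, but each can be a single byte, while the register version needs fewer instructions that each encode three operand fields

    ; stack machine: 8 small ops
    push a
    push b
    mul
    push c
    push d
    mul
    add
    pop x

    ; register machine: 3 wider ops
    mul r0, a, b
    mul r1, c, d
    add r2, r0, r1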


Don, on any Lisp or Forth or other post on niche languages on this site, there is a long comment from you going into deep detail about its history and evolution.

Your wealth of knowledge is incredible, and it is thanks to hackers like you that I've been spending the past year rediscovering the roots of computing, the wonders of Lisp, Smalltalk and, this past week in particular, Forth ( HONK ) .

These days my ~/Desktop and printer spool are full of papers from the 70s and 80s that, compared to the clunky world of Javascript and UNIX clones, feel like science fiction.

So thank you for broadening our horizons, and teaching code monkeys like me the lost art of programming computers. Your contributions are invaluable.


Don is a veritable walking encyclopedia of whole chapters of computer history. Priceless.


Don, is there anything of NeWS or other toolkits that would work on modern-day X11 on Linux/BSDs?

Would a Display PostScript (like NeXTStep or the Solaris provided X11 server had) be needed to run NeWS?


Yeah, I've been working on Noticias, which is an implementation of NeWS in TypeScript targeting the browser.

Initially, I used HTML5 Canvas to draw, but I'm now in the process of switching over to SVG. Once I have demos that I can show, I'll be making the repo public.

If you're interested in working on it, please reach out to me: duncanmak@gmail.com.


Check out Duncan Mak's "Noticias - an implementation of NeWS for the Web":

https://github.com/duncanmak/noticias


Hi Don, terribly sorry but that URL doesn't load for me - does it load for you?


Whoops, it must be a private repo! I'll encourage him to announce it here when it's ready.


I once wrote a postscript script that took in dimensions and generated CAD drawings. It was quite fun!


Just fyi, typo on this line:

>it this case, it’s a built-in


Related:

My history with Forth and stack machines (2010) - https://news.ycombinator.com/item?id=26203518 - Feb 2021 (25 comments)

My history with Forth and stack machines (2010) - https://news.ycombinator.com/item?id=21153555 - Oct 2019 (33 comments)

My history with Forth and stack machines (2010) - https://news.ycombinator.com/item?id=8869150 - Jan 2015 (26 comments)

My history with Forth and stack machines (2010) - https://news.ycombinator.com/item?id=8146306 - Aug 2014 (17 comments)

My History With Forth And Stack Machines - https://news.ycombinator.com/item?id=3963896 - May 2012 (11 comments)


I always recommend the book Starting Forth [0].

It has the most charming illustrations I've ever seen in a text book.

[0] https://www.forth.com/starting-forth/


Also Thinking Forth by Leo Brodie is an excellent style and philosophy book!

https://thinking-forth.sourceforge.net/

http://prdownloads.sourceforge.net/thinking-forth/thinking-f...


Is there any reason stack-based CPUs fell out of favor?

I'm fascinated by RTX2010 [0], a radiation-hardened processor which uses Forth and has been deployed on several space missions.

[0] https://en.wikipedia.org/wiki/RTX2010


I've been reading into this today, coincidentally. It seems the main idea was that register-based machines are faster, mostly because local variables in many languages map to registers, and because of memory latency when the stack lives in memory. But that seems to be false based on more recent research.

This is a good read on the subject: https://uwspace.uwaterloo.ca/bitstream/handle/10012/10810/La...

Also, check out the GA144: https://www.greenarraychips.com/

Also, it's not clear a superscalar stack machine would be possible. So it might not be suitable for replacing x86 soon ;)


> Also, it's not clear a superscalar stack machine would be possible.

The "Boost: Berkeley's Out-of-Order Stack Thingy" (Steve Sinha, Satrajit Chatterjee and Kaushik Ravindran) [1] effectively translates stack code into RISC as part of instruction decoding and issue. This is similar to how "Design of a Superscalar Processor Based on Queue Machine Computation Model" (Shusuke Okamoto, Hitoshi Suzuki, Atusi Maeda, Masahiro Sowa) [2] does the same thing but for a queue machine instead of a stack machine.

[1]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...

[2]: https://ieeexplore.ieee.org/document/799499


Is the "belt" of the Mill CPU a queue architecture or is a queue architecture something else?


Yes. The Mill's belt is a queue. The Mill is a queue machine.


Interesting, thank you! More for the reading list :)


Can you give some insight on how the J1 fits in all this? The presentation explaining what drove its invention suggests it actually was faster than the register-based alternative, although I'm sure that is also partially because it only focused on the instructions needed for its task.

[0] https://excamera.com/sphinx/article-j1a-swapforth.html

[1] http://www.forth.org/svfig/kk/11-2010-Bowman.pdf


From how I understand the presentation, they ran into trouble because of code size. Basically, stack code is much denser because instructions don't need register operands, and given that the drop from 16 kbytes to 6 kbytes is more than just the halving you'd get from going 32-bit to 16-bit, it was apparently a bit more efficient for implementing their program too, I guess.

The size was a concern because of the limited capacity (16 kbytes) of BRAM modules on an FPGA.

I can't say why it performs so much better, though; maybe just because it's simpler and 16-bit, so it's smaller and can be clocked faster on the FPGA? That is just a guess, though.

I'm not sure it's fair to compare the J1 to the BOOM; the latter does a lot more, I'd think. That being said, it shows that for such use cases a stack machine can be better suited and much less complex to design, maintain, and work with.


the summary of laforest's dissertation says that a small stack machine is much faster (than a presumably in-order MIPS) "for deeply nested or recursive code", but "worse for iterative code"

quite aside from the new results it presents, it seems to be a much more comprehensive answer to the question of why stack machines fell out of favor than i've seen before


I've definitely seen that explanation before. I remain slightly skeptical, but that's mostly because I secretly hope stack machines could "win" some day.


i mean the rest of the dissertation, not its abstract


Compared to a register machine, a stack machine needs to execute extra instructions to shuffle the stack around. If you tried to do something clever to make this superscalar, you'd end up with something very much like shadow registers anyhow.

I too found Charles Moore's CPU designs interesting. One possibility I thought about for a bit was a VLIW/stack hybrid, where you'd have bundles of, say, 8 stack instructions each executing concurrently in their own lane, and some instructions for moving values between lanes. It'd be a weird thing to program for, but potentially quite fast for how minimal it'd be.


Here's Chuck Moore talking about the 144-core CPUs his startup is developing: https://youtu.be/0PclgBd6_Zs

That's an incredible design I would love to get my paws on.


Oh thanks, I'll check that out.

I was musing about this stuff back when reading Hennessy and Patterson in the early 00's. Moore's homepage was really interesting and inspiring, that someone could work idiosyncratically and largely individually and actually get chips to tape out. Cool to hear he's still going strong.


you can just order a ga144 from them last i checked


WASM is a stack-based "CPU" and arguably the future of the Web. So is Python's VM. I think Java's and Erlang's are too, but don't quote me on that. A stack-based CPU is very easy to implement (or rather, registers are very complex hardware-wise), and a stack-based VM is relatively easy to optimise for a register CPU.

Seems to me registers are a fancy hardware implementation detail, that software at higher abstraction can and should avoid unless maximum performance is required. It was surprising for me to learn recently that on modern x86 CPUs, named registers actually reference hidden, dynamically allocated physical registers, a bit like virtual memory is an indirection layer over physical memory.

It's levels of indirection all the way down...


Here's some ideas about WASM forth I wrote about previously:

https://news.ycombinator.com/item?id=34374057

>This is a great article! I love his description of Forth as "a weird backwards lisp with no parentheses".

>Reading the source code of "WAForth" (Forth for WebAssembly) really helped me learn about how WebAssembly works deep down, from the ground up.

>It demonstrates the first step of what the article says about bootstrapping a Forth system, and it has some beautiful hand written WebAssembly code implementing the primitives and even the compiler and JavaScript interop plumbing. We discussed the possibility of developing a metacompiler in the reddit discussion.

[...]

Check out this beautiful hand written wasm code, a dynamic wasm based Forth compiler:

https://github.com/remko/waforth/blob/master/src/waforth.wat

It compiles new Forth words into teenie tiny little wasm modules, and hooks out through JavaScript in the browser to create and link them in to the main module, so it can call the code it just compiled. Wasm doesn't let you modify compiled code, but you can compile new code and link it in and execute it!


I’ve been wrestling with WASM bytecode and text format off and on for the last couple of years, wanting to do something assembly-ish without writing off doing it for the web.

Personally, I’d say it’s at least as much a register machine as a stack machine, and the ways in which it is one vs the other at times sure had me running in circles.

I’ve seen plenty of quotes about how it’s not supposed to be a real assembly, how you’re not supposed to deal with its text format or bytecode, and how it and its documentation are for compiler writers not typical programmers.

Which is exactly the take that I think is so wrong with our status quo of opaque, overly abstracted technologies that are highly optimized in their parts yet so disappointing in the whole.

As a Forth substitute, I was sorely disappointed. I did learn a lot in my wanderings, and what WebAssembly mostly amounts to is an implementation for running a C-style architecture. Which is pretty much what it came from. Not that I’m against that, but I sure ran in circles trying to make it more Forth-like.

And given its raison d’être is to be more performant than Javascript, it seems to be faster in some benchmarks while slower in others. I think this is more a testament to how optimized V8 and its brethren are than a slam on WebAssembly.


I was hyped for WASM as well, but the more I read the spec and follow the standardisation process, the less bullish I am about it as a long-term piece of tech. I would be surprised if it is still around in 30 years (compared to stuff like UNIX, C, or even Forth).

Its closeness to Javascript, and the fact that it's basically managed by companies that want to keep the Web as complicated as possible so they can enjoy their monopoly on it, turn me off big time.

And lastly, the only thing it has in common with Forth is being stack-based. Computing has become an unsustainable pile of buggy abstractions on top of one another, and WASM is simply another layer on top of this pile of dung, rather than a return to the basics. I get why it wants to give programmers nothing more than a rounded pair of scissors, but for a glorified VM it's incredibly over-engineered. Art has never been created by a committee of commercial entities.

I had in mind to write an OS that runs WASM natively, but these days I'd rather put a persistent, programmable Forth interpreter that gives you full access to the hardware at the lowest level, and build up from there.


JVM, Python and WASM are more like the SECD machine [0] than a plain stack machine, through their use of a local environment. BEAM (for Erlang) is a register machine. And the JVM and WASM get compiled to code for register machines, with the VM stack not present (in a concrete form) at runtime.

> a stack-based VM is relatively easy to optimise for a register CPU.

You get SSA for almost free (just phi the stacks at control joins), but after making SSA everything is the same regardless of stack- or register- VM.

[0] https://academic.oup.com/comjnl/article/6/4/308/375725?login...


So the basic principle is that stack machines have smaller instructions, but require extra instructions to shuffle values around on the stack. This is why register machines won out in hardware, and stack machines are popular for virtual machine intermediate representations (which get compiled or JITed down to register code in the end on basically all machines today).

If you wanna learn about how register renaming works, look into Tomasulo's algorithm. State of the art chips are a lot more complex, but that's a good historic introduction to the fundamental ideas.


I worked for a company that was building a prototype using the predecessor (the 'NOVIX'), which was incredibly fast for the day.


I really wanted one of those! I held one in my hands one time when I was visiting the MIT-AI Lab -- it was Norman Margolus's, who made the CAM-6 (which was programmed in Forth) with Tommaso Toffoli. He wouldn't let me have it, but he let my friend and me borrow a CAM-6 for a weekend science fiction convention to trip out on.

https://news.ycombinator.com/item?id=8860786

https://news.ycombinator.com/item?id=14469113

https://news.ycombinator.com/item?id=34561910

Rudy Rucker writes about his CAM-6 in the CelLab manual:

http://www.fourmilab.ch/cellab/manual/chap5.html

Computer science is still so new that many of the people at the cutting edge have come from other fields. Though Toffoli holds degrees in physics and computer science, Bennett's Ph.D. is in physical chemistry. And twenty-nine year old Margolus is still a graduate student in physics, his dissertation delayed by the work of inventing, with Toffoli, the CAM-6 Cellular Automaton Machine.

After watching the CAM in operation at Margolus's office, I am sure the thing will be a hit. Just as the Moog synthesizer changed the sound of music, cellular automata will change the look of video.

I tell this to Toffoli and Margolus, and they look unconcerned. What they care most deeply about is science, about Edward Fredkin's vision of explaining the world in terms of cellular automata and information mechanics. Margolus talks about computer hackers, and how a successful program is called “a good hack.” As the unbelievably bizarre cellular automata images flash by on his screen, Margolus leans back in his chair and smiles slyly. And then he tells me his conception of the world we live in.

“The universe is a good hack.”

[...]

Margolus and Toffoli's CAM-6 board was finally coming into production around then, and I got the Department to order one. The company making the boards was Systems Concepts of San Francisco; I think they cost $1500. We put our order in, and I started phoning Systems Concepts up and asking them when I was going to get my board. By then I'd gotten a copy of Margolus and Toffoli's book, Cellular Automata Machines, and I was itching to start playing with the board. And still it didn't come. Finally I told System Concepts that SJSU was going to have to cancel the purchase order. The next week they sent the board. By now it was August, 1987.

The packaging of the board was kind of incredible. It came naked, all by itself, in a plastic bag in a small box of styrofoam peanuts. No cables, no software, no documentation. Just a three inch by twelve inch rectangle of plastic—actually two rectangles one on top of the other—completely covered with computer chips. There were two sockets at one end. I called Systems Concepts again, and they sent me a few pages of documentation. You were supposed to put a cable running your graphics card's output into the CAM-6 board, and then plug your monitor cable into the CAM-6's other socket. No, Systems Concepts didn't have any cables, they were waiting for a special kind of cable from Asia. So Steve Ware, one of the SJSU Math&CS Department techs, made me a cable. All I needed then was the software to drive the board, and as soon as I phoned Toffoli he sent me a copy.

Starting to write programs for the CAM-6 took a little bit of time because the language it uses is Forth. This is an offbeat computer language that uses reverse Polish notation. Once you get used to it, Forth is very clean and nice, but it makes you worry about things you shouldn't really have to worry about. But, hey, if I needed to know Forth to see cellular automata, then by God I'd know Forth. I picked it up fast and spent the next four or five months hacking the CAM-6.

The big turning point came in October, when I was invited to Hackers 3.0, the 1987 edition of the great annual Hackers' conference held at a camp near Saratoga, CA. I got invited thanks to James Blinn, a graphics wizard who also happens to be a fan of my science fiction books. As a relative novice to computing, I felt a little diffident showing up at Hackers, but everyone there was really nice. It was like, “Come on in! The more the merrier! We're having fun, yeeeeee-haw!”

I brought my AT along with the CAM-6 in it, and did demos all night long. People were blown away by the images, though not too many of them sounded like they were ready to a) cough up $1500, b) beg Systems Concepts for delivery, and c) learn Forth in order to use a CAM-6 themselves. A bunch of the hackers made me take the board out of my computer and let them look at it. Not knowing too much about hardware, I'd imagined all along that the CAM-6 had some special processors on it. But the hackers informed me that all it really had was a few latches and a lot of fast RAM memory chips.


Toffoli and Margolus wrote a fine book about things you could do with their machine. It was more fun to read than Wolfram's later "New Kind of Science".


"Cellular Automata Machines: A New Environment for Modeling" is one of my favorite books of all time! It shows lots of peculiarly indented Forth code.

https://donhopkins.com/home/cam-book.pdf

CAM6 Simulator Demo:

https://www.youtube.com/watch?v=LyLMHxRNuck

Forth source code for CAM-6 hardware:

https://donhopkins.com/home/code/tomt-cam-forth-scr.txt

https://donhopkins.com/home/code/tomt-users-forth-scr.txt


Thanks for hosting those.

This isn't a CAM simulator, but I once wrote a little hack of a CA playground inspired by it. Table-driven in a similar way, user-scriptable with JS, includes neighborhoods like the Margolus neighborhood. https://github.com/darius/js-playground/blob/master/ca.js and the associated ca.html.


Neat! Did you know that Rucker is also an SF writer who wrote SF books about cellular automata? If not, check out 'The Hacker and the Ants'.


I love his work, both SF and math and CA! I met Rudy at the Hackers conference decades ago and gave him a demo of an early version of my CAM6 simulator running on my SparcStation 2, and he showed me Cellab that he and John Walker developed at Autodesk, and we've been exchanging ideas about CA since then. A few years ago he pointed out a very subtle bug in my CAM6 heat diffusion code that I fixed.

Flickercladding:

https://www.fourmilab.ch/cellab/manual/rules.html#Flick

>Flick is named after “flickercladding,” the CA skin which covers the robots in my books Software and Wetware. In Flick, we see an AutoShade®d office whose rug is made of flickercladding that runs the TimeTun rule. You can tell which parts of the picture are “rug” because these cells have their bit #7 set to 1.

        Flickercladding Interior Decoration

        Conceived by Rudy Rucker
        Drawn by Gary Wells
        Modeled with AutoCAD
        Rendered by AutoShade
        Perpetrated by Kelvin R. Throop.

        In this rule, we only change the cells whose high
        bits are on. These cells are updated according to
        the TimeTun rule.
http://www.technovelgy.com/ct/content.asp?Bnum=299

>Some looked humanoid, some looked like spiders, some looked like snakes ...All were covered with flickercladding, a microwired imipolex compound that could absorb and emit light.

https://en.wikipedia.org/wiki/Wetware_(novel)

>The plot goes disastrously awry, and a human corporation called ISDN retaliates against the boppers by infecting them with a genetically modified organism called chipmold. The artificial disease succeeds in killing off the boppers, but when it infects the boppers' outer coating, a kind of smart plastic known as flickercladding, it creates a new race of intelligent symbiotes known as moldies — thus fulfilling Berenice's dream of an organic/synthetic hybrid.

https://news.ycombinator.com/item?id=15546769

DonHopkins on Oct 25, 2017, on: Boustrophedon

The Floyd Steinberg error diffusion dithering algorithm can use a boustrophedonous scan order to eliminate the diagonal geometric artifacts you get by scanning each row the same direction.

https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dither...

"In some implementations, the horizontal direction of scan alternates between lines; this is called "serpentine scanning" or boustrophedon transform dithering."

I implemented some eight bit cellular automata heat diffusion rules with error diffusion, which accumulated an unfortunate drift up and to the right because of the scan order.

Rudy Rucker pointed out the problem:

https://web.archive.org/web/20180909074032/http://donhopkins...

"Rudy Rucker: I feel like you might have some kind of bug in your update code, an off-by-one thing or a problem with the buffer flipping. My reason is that I see persistent upward drift in the action, like if I mouse drag a blob it generally moves up. Also the patterns appearing in the blob aren't uniform. I mean...this IS supposed to be the 2D Rug rule, isn't it?"

So instead of scanning back and forth boustrophedoniously (which wouldn't eliminate the vertical drift, just the horizontal drift), I rotated the direction of scanning 90 degrees each frame ("spinning scan") to spread the drift out evenly in all directions over time.

https://github.com/SimHacker/CAM6/blob/master/javascript/CAM...

    // Rotate the direction of scanning 90 degrees every step,
    // to cancel out the dithering artifacts that would cause the
    // heat to drift up and to the right.
That totally canceled out the unwanted drifting and geometric dithering artifacts! That made it possible to cultivate much more subtle (or not-so-subtle) effects, like dynamically switching per-cell between different anisotropic convolution kernels (see the "Twistier Marble" rule for an extreme example).

http://donhopkins.com/home/CAM6

CAM6 Demo:

https://www.youtube.com/watch?v=LyLMHxRNuck

Demo of Don Hopkins' CAM6 Cellular Automata Machine simulator.

Live App: https://donhopkins.com/home/CAM6

Github Repo: https://github.com/SimHacker/CAM6

Javacript Source Code: https://github.com/SimHacker/CAM6/blob/master/javascript/CAM...

Comments from the code:

    // This code originally started life as a CAM6 simulator written in C
    // and Forth, based on the original CAM6 hardware and compatible with
    // the brilliant Forth software developed by Toffoli and Margolus. But
    // then it took on a life of its own (not to mention a lot of other CA
    // rules), and evolved into supporting many other cellular automata
    // rules and image processing effects. Eventually it was translated to
    // C++ and Python, and then more recently it has finally been
    // rewritten from the ground up in JavaScript.
    // The CAM6 hardware and Forth software for defining rules and
    // orchestrating simulations is thoroughly described in this wonderful
    // book by Tommaso Toffoli and Norman Margolus of MIT.
    // Cellular Automata Machines: A New Environment for Modeling
    // Published April 1987 by MIT Press. ISBN: 9780262200608.
    // http://mitpress.mit.edu/9780262526319/


did you ever try the cytocomputer

(i think your text means that rucker tried the cam-6, not that you did, but i'm not entirely sure)


No I haven't heard of that -- I'll check it out! Thanks for the juicy tip.

I played with the CAM-6 in Norman Margolus's office at MIT, and a friend of mine who worked for him brought one to a science fiction convention where we tripped out on it all night in a hotel room! I saved a copy of the floppies full of Forth code. (linked above)


it might be a little hard to check it out 45 years later but i bet chip morningstar would be happy to tell you about it


register machines need to run about half as many instructions on average to do the same thing, which means that at the same clock speed a 1-cycle-per-clock cpu will go about twice as fast with a register instruction set. countering that is that the stack machine doesn't need multiplexers on the inputs to the alu, so it may actually be able to run at a higher clock speed if those multiplexers were part of the critical path, and the stack machine will occupy less silicon area, so you may be able to have more of them (but usually ram takes a lot more area than your cpu)

for out-of-order and superscalar processing, register machines have an advantage in that you can more easily have successive instructions that don't interfere, simply by not using the same registers. generally in a stack machine instruction it's hard to avoid using the result of the previous instruction; you need an extra instruction to do it (`over` or `swap` or something)
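for example (standard forth, my sketch): with a b c d on the stack, computing a*b + c*d forces you to park one product out of the way on the return stack

    \ ( a b c d -- a*b+c*d )
    * >r    \ compute c*d, stash it on the return stack
    *       \ compute a*b
    r> +    \ fetch c*d back and add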

but i suspect it's partly just path-dependence: koopman's new wave of stack machines were already struggling upwind against 30 years of ibm 360, intel 8080, dec pdp-11, motorola 68000, dg nova, sun/fujitsu sparc, etc., all register machines, and so a rich store of knowledge had built up about them in compiler backends, assembly programmers, and cpu designers. maybe if chuck had designed the rtx2010 in 01973 instead of 01988 the story would have been different

maybe worth noting that all of the jvm, microsoft's clone of it (the cil), and wasm (as sph pointed out) are stack-based, and arm for a while had a 'jazelle' instruction set which interpreted many jvm bytecodes directly (trapping to software for more complex ones), so in a sense stack-based programs are more widespread now than they ever were before; that's what you see inside a .class file. they just are usually translated into some other instruction set before execution


in https://news.ycombinator.com/item?id=37025644 boesboes links to https://uwspace.uwaterloo.ca/bitstream/handle/10012/10810/La... which is a bachelor's of independent studies thesis from several years back which covers the history of stack machines in considerable detail, including, most fascinatingly, zuse's z4, with its 64-word memory, 8-bit instruction tape, and 2-level stack

also several people have pointed out that the stack machines most commonly used in practice, including the ones i mentioned above, are hybrids; they have architectural registers (often called local variables) but you have to copy them onto the stack to operate on them


I really ought to read that. The author was a livejournal friend at the time. (Later work: http://fpgacpu.ca/)


hey, good to hear from you


Will answer email (sorry), I haven't forgotten.


no apology necessary, i'm delighted to hear from you whenever


For people getting a 503: https://archive.li/xh0aq


Netflix's Atlas, a telemetry service, uses a stack language for querying metrics. The language has a relatively steep learning curve, but once one groks it, it is surprisingly easy to compose complex OLAP queries.


If anyone else is interested, https://netflix.github.io/atlas-docs/overview/#stack-languag... is the stack language for querying.
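Queries are comma-separated RPN tokens, where bare words push values and :words apply operators; from memory of the docs (so treat the metric name as a hypothetical example), a query looks something like:

    name,ssCpuUser,:eq,:sum

That pushes the key "name" and the value "ssCpuUser", pops them into an equality match, and sums the matching time series.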


Interesting, I’ll check it out.

One interesting insight I had with stack languages is that they might lend themselves to an AST-like visual approach. The tree <-> postfix duality was an interesting aha for me that I'd long forgotten since that sophomore data structures class. :)
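The duality in a nutshell: the postfix string is just a post-order traversal of the expression tree. For (a + b) * c:

        *
       / \
      +   c        post-order:  a b + c *
     / \
    a   b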


This looks very similar to RRDTool's query language. Is this a fork, or was it just inspired by RRDTool?


In that case, probably a fork, as Netflix used RRDTool extensively before Atlas.


I've started building a small desk clock project based on a Rockwell R65F11 embedded FORTH processor, in an effort to force myself to actually do a nontrivial project in FORTH. It's not progressing quickly, but it's also not a priority. Several folks whose opinions matter to me were big FORTH hackers in the late 70s thru the 90s, so I figure there must be something there worth hacking on!


My impression so far is that, in general, Forth is practically limited to embedded/microcontroller development.

For us web/mobile/desktop app devs, besides:

- 8th (https://8th-dev.com)

- Factor (https://factorcode.org)

Any suggestions on which implementation we should look at?


    while(true) {
     switch(*ip) {
      //arithmetics (+,-,*...):
      case PLUS: ds.push(ds.pop() + ds.pop()); ++ip;
      //stack manipulation (drop,swap,rot...):
      case DROP: ds.pop(); ++ip;
    ...
Hmm... shouldn't each case contain a break? It doesn't seem like a fall through is a good idea here.


Well, it's not stated to be C, and it isn't anywhere near complete.

I read it more as C-ish pseudocode that just so happens to be 'valid' C.


No. This is a kind of Duff's device [0], although it is hard to identify. In essence, it's attempting to manually unroll the loop (the while(true)) across the switch cases, so that if you have a PLUS with a DROP right after it, execution can just run linearly (that is, fall through to case DROP), without having to go back to the top of the loop, re-enter the switch statement, and dispatch again to reach case DROP.

0: https://en.wikipedia.org/wiki/Duff%27s_device


You are probably over-interpreting this interpreter.


Mitch Bradley's CForth has an infinite while loop as the inner interpreter with a switch statement inside, that uses a macro "#define next continue" to jump to the next instruction in the infinite while loop, instead of break.

https://github.com/MitchBradley/cforth/blob/master/src/cfort...
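The shape of that pattern, boiled down (a minimal sketch, not CForth's actual code):

    /* Token-threaded inner interpreter: an infinite loop around a switch,
       where "next" continues the loop instead of break. */
    #define next continue

    enum { PLUS, DROP, HALT };

    long inner_interpreter(const unsigned char *ip, long *sp)
    {
        while (1) {
            switch (*ip++) {
            case PLUS: sp[-2] += sp[-1]; --sp; next;
            case DROP: --sp; next;
            case HALT: return sp[-1]; /* result is top of stack */
            }
        }
    }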

Mitch's CForth is about as efficient (and well designed and refined) a Forth interpreter as you can possibly implement in portable C.

Mitch's OpenFirmware is a more traditional industrial strength Forth implementation, extremely advanced, feature rich, and well designed, that runs on many different processors and hardware, has assemblers for different CPUs, and a metacompiler that supports many different CPUs, word sizes, byte orders, threading techniques, top level interactive control structures, headerless code, machine independent bytecode, read-only firmware, portable device drivers, hardware and driver configuration, and many hardware platforms.

https://github.com/MitchBradley/openfirmware

This is an implementation of Open Firmware, a processor-independent firmware design. Open Firmware is specified by IEEE 1275-1994, Standard for Boot (Initialization, Configuration) Firmware. The IEEE standard designation lapsed in 1999, when the Open Firmware working group declined to go through the rather time-consuming IEEE reaffirmation process, but the design lives on.

This implementation was written primarily by Mitch Bradley, the original author of Open Firmware. The bulk of the recent development was done at Firmworks by Bradley and colleagues. It traces its roots back to the original Open Boot firmware implementation that Bradley developed at Sun Microsystems in the late 80s and early 90s. That in turn had roots in Forthmacs, a Forth programming language implementation developed and marketed during the early 80s by Bradley Forthware. And Forthmacs, in turn, owes a debt of gratitude to the public domain Forth implementations F83, developed by Michael Perry, and MVP Forth, by Glen Haydon. Lilian Walter wrote the USB stack.

https://news.ycombinator.com/item?id=29279022

DonHopkins on Nov 19, 2021, on: Forth vs Lisp

The inner interpreter of Mitch Bradley's portable and efficient CForth is simply implemented with a big switch statement, in the function called "inner_interpreter".

The C compiler typically compiles that code with the big switch statement into an efficient jump table, behind the scenes.

https://github.com/MitchBradley/cforth/blob/master/src/cfort...

Properly speaking, that's a token-threaded interpreter, since it switches on token numbers in the code field instead of indirecting through function pointers to the C equivalents of "code" words.

https://en.wikipedia.org/wiki/Threaded_code#Token_threading

https://news.ycombinator.com/item?id=27484426

DonHopkins on June 12, 2021

From https://news.ycombinator.com/item?id=21823081

I wrote malloc.fth for Mitch Bradley's ForthMacs, which ended up in OpenFirmware:

https://github.com/openbios/openfirmware/blob/master/ofw/cor...

[...]

https://news.ycombinator.com/item?id=21822840

DonHopkins on Dec 18, 2019, on: Thoughts on Forth Programming

Here's the interview with Mitch Bradley saved on archive.org:

https://web.archive.org/web/20120118132847/http://howsoftwar...

I've previously posted some stuff about Mitch Bradley -- I have used various versions of his ForthMacs / CForth / OpenFirmware systems, and I was his summer intern at Sun in '87!

Mitch is an EXTREMELY productive FORTH programmer! He explains that FORTH is a "Glass Box": you just have to memorize its relatively simple set of standard words, and then you can have a complete understanding and full visibility into exactly how every part of the system works: there is no mysterious "magic", you can grok and extend every part of the system all the way down to the metal. It's especially nice when you have a good decompiler / dissassembler ("SEE") like ForthMacs, CForth, and OpenFirmware do.

https://news.ycombinator.com/item?id=9271644

[...]


I spent a year at Harris Semi’s RTX group. A previous gig as a Forth programmer plus undergrad work in assembly programming came in handy.

I really wish RTX had caught on with commercial clients, but the lack of tools made it tough for most folks.


Anyone looked at WASM, especially its text code format?

I feel like it so badly wants to look something like forth but they couldn't make themselves do it. It would be so much more concise.
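For comparison, here's about the smallest complete module in the standard text format; the function body really is just stack code, wrapped in s-expressions:

    (module
      (func (export "square") (param i32) (result i32)
        local.get 0
        local.get 0
        i32.mul))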


There was a wasm predecessor (by, well, me) with a bit more of the Forth flavor: https://github.com/darius/idel

From >20 years ago and my main regret is a failure to carry it through to be more than a demo.


They wanted to make it low friction to learn for their audience who are largely javascript programmers. I think it's a reasonable tradeoff even if I personally would prefer a more combinator style syntax.


Was that really the intent? I made a longer rant elsewhere in this thread, but I sure got the impression it was really for compiler implementers and not for anyone actually writing in it. It's sort of a false Lisp and a false Forth, and I really should not have been afraid of the ARM vs x86 split and just brushed up on a real assembly. :)


WASM grew out of the earlier asm.js, which was explicitly designed as a subset of javascript: http://asmjs.org/spec/latest/ The core idea there was to identify a subset of JS that was high performance on browsers and could be used as a compile target. WASM appeared because once asm.js got some momentum, people naturally asked "hey if browsers could implement just a couple extra things how much better could we make this?"
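For reference, an asm.js module was literally plain JavaScript with a coercion idiom serving as type annotations (a minimal sketch):

    function AsmModule(stdlib, foreign, heap) {
        "use asm";
        function add(a, b) {
            a = a | 0;          // parameter declared int32 by coercion
            b = b | 0;
            return (a + b) | 0; // return type int32, same trick
        }
        return { add: add };
    }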


wasm isn't compatible with asm.js or with js tho

they could have totally made it forthier if they thought that was a good idea, but their objective was rapid generation of efficient, secure code


Except that would be strongly against the familiarity of the community momentum already built.


what they actually came up with was just as strongly against the familiarity of the community momentum already built, wasn't it?

what features do you have in mind of the current wasm spec that are (1) in common with asm.js and (2) not in common with a hypothetical more-forth-like spec for efficient secure execution in browsers?


Is the pool of JS programmers who are just going to casually learn an assembly language - and are totally cool with s-expressions - really that large?

Seems like a waste of time to make your text format 3 times longer than it should be to cater to these unicorns.


I think the target audience probably has no choice. They need performance, but their background might be web.


I guess my point is their minds are going to be melted anyway by 1. low-level details, 2. a stack machine, and 3. s-expressions. Might as well use a text format that at least fits the domain.


Found in yesterday's https://ftp.funet.fi submission.

1984. Made with lex.

https://ftp.funet.fi/pub/languages/forth/c-forth.tar.Z


The most modern up-to-date version of Mitch Bradley's C-Forth is here:

https://github.com/MitchBradley/cforth

As Mitch's summer intern at Sun in 1987, I used CForth to make an interactive extension language for CADroid (a printed circuit design program from Lucasfilm that Sun used internally).

https://uspto.report/TM/73486026

Oh cool, the CADroid trademark's canceled -- that would be nice to recycle for a cellular automata robot!


i think i finally got to the point where forth was usable for me in 02019

https://dercuano.github.io/notes/forth-assembling.html#addto...

all it really took was for me to stop using stack manipulation operators and treat forth as an interactive user interface rather than a programming language


That's an incredible blog! I have two wishes now: that you would continue to write your notes because I've skimmed a few and they were deeply interesting to me, and that you would add an RSS feed, so I can add it to Feedbin.

But in any case, thanks. Lot of interesting ideas in that article.


glad you like it! it's more a book than a blog; its contents are final and immutable, so an rss feed would be of very limited interest. the sequels so far are derctuo and dernocua



