
The J1 Forth CPU (2010) - panic
http://excamera.com/sphinx/fpga-j1.html
======
Animats
Forth machines have been around since Moore designed one in the mid-1980s.[1]
He used to have a development system where the only input was three buttons,
used in a chorded fashion, and the CPU was generating NTSC video in software.
Now that's minimalism.

In the first hardware implementation, integer divide was botched and gave
wrong answers for odd divisors. The data sheet argued that most divisors are
even, and provided a subroutine for correct division when needed.

[1] [http://www.cpushack.com/2013/02/21/charles-moore-forth-stack-processors/](http://www.cpushack.com/2013/02/21/charles-moore-forth-stack-processors/)

------
dewster
Or try my Hive soft processor core:

[http://opencores.org/project,hive](http://opencores.org/project,hive)

I need to update the paper and code; the core is at v9.03, and the simulator
can now parse assembly code as input.

IMO canonical stack processors aren't a good substitute for register-based
processors. Hive is a stack / register hybrid that makes the two-operand
architecture more efficient. And I don't know why barrel processors aren't
taking over the world - they make too much sense, I suppose.

~~~
david-given
About a year ago we had this thread:

[https://news.ycombinator.com/item?id=9920760](https://news.ycombinator.com/item?id=9920760)

It's a thesis describing a C compiler for stack machines. (The link in the
article is dead now. See [https://www-users.cs.york.ac.uk/chrisb/main-pages/publications/ShannonThesis2006.pdf.1](https://www-users.cs.york.ac.uk/chrisb/main-pages/publications/ShannonThesis2006.pdf.1))

It achieves code density comparable to gcc-compiled x86 (albeit for a
fictional architecture).

The really interesting bit is Appendix E, which proposes an architecture for a
fast stack machine. It uses a fairly traditional 16-bit instruction encoding,
except that instead of naming registers, each 4-bit operand slot contains an
encoded stack manipulation: a rot or a pick. This should give even better code
density while still being easy to implement.
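
To make that concrete, here's a rough sketch of pulling apart such an
instruction word. This is my own reconstruction from the description above,
not Shannon's actual encoding, and the slot layout is a guess:

    \ nth 4-bit operand slot of a 16-bit instruction word
    \ (slot 0 = least significant nibble; $ hex is gforth-style)
    : slot ( instr n -- op )  4 * rshift $F and ;

    \ one plausible split of a slot into operation and stack depth
    : slot-op    ( op -- n )  2 rshift ;   \ e.g. 0=pick, 1=rot, ...
    : slot-depth ( op -- n )  3 and ;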

Unfortunately I've never tracked down a copy of his compiler...

~~~
throwaway000002
Thanks, david, for providing that reference. I missed the discussion the first
time around.

Perhaps you, or someone here, would know: has anyone asked, and tried to
analytically answer, what primitive operations an ALU ought to have? I mean,
everything could be coded as a look-up table, but given code "on average",
what should be available in hardware?

It's a strange question that requires you to pose it properly to even begin to
answer it. For example, in the kind of stuff I do, popcnt and other "set-like"
bit operations, e.g. most-significant set bit, are important enough that I
just wish they were built in.
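
For a sense of what these cost without hardware support, here's roughly what
you end up writing instead (standard Forth; popcount uses the
clear-the-lowest-set-bit trick):

    \ population count: clear the lowest set bit until zero
    : popcount ( u -- n )
       0 swap  begin dup while  swap 1+ swap  dup 1- and  repeat drop ;

    \ index of the most-significant set bit (-1 if none)
    : msb ( u -- n )
       -1 swap  begin dup while  swap 1+ swap  1 rshift  repeat drop ;

That's a loop iteration per bit, versus a single cycle in hardware.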

Mind you, I'm doing high-performance data processing, so perhaps it's outside
the scope of "average" code, but I disagree.

~~~
dewster
"...has anyone asked and tried to analytically answer what primitive
operations an ALU ought to have?"

Very good question. I think this is where the "art" of computer design starts,
and it generally gets short shrift. For instance, early ARM cores lacked a
leading-zero-count instruction, which is a pretty wild omission because it has
tons of uses (particularly for floating point) and is fairly expensive to
implement in software using other primitives.
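
To illustrate "fairly expensive": a binary-search version, assuming 32-bit
cells and gforth-style $ hex literals (my own sketch, not Hive code), takes a
couple dozen primitive operations where hardware needs one:

    \ count leading zeros, 32-bit cells assumed
    : clz ( u -- n )
       dup 0= if drop 32 exit then
       0 swap
       dup $FFFF0000 and 0= if 16 lshift swap 16 + swap then
       dup $FF000000 and 0= if  8 lshift swap  8 + swap then
       dup $F0000000 and 0= if  4 lshift swap  4 + swap then
       dup $C0000000 and 0= if  2 lshift swap  2 + swap then
       $80000000 and 0= if 1+ then ;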

I know it's not scientific, but I learned what to put in the Hive ALU by
programming various algorithms I figured I would need at some point. I just
developed a bunch of floating point subroutines for Hive (cos, sin, sqrt, 2^x,
log2, 1/x, etc., not in the paper yet) and, short of adding a floating point
pipeline, could only really justify adding an opcode that returns +1 / -1
based on the sign bit (for use in storing sign and doing absolute value). My
seat-of-the-pants rule is that if it has broad applicability, saves time and
code space, and isn't too costly in terms of hardware / speed, then I put it
in. Otherwise I leave it out.
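
For the curious, the idiom that opcode collapses looks something like this (my
naming, not Hive's actual mnemonics):

    \ +1 / -1 from the sign bit: a compare-and-branch in software,
    \ a single opcode in Hive
    : sgn ( n -- -1|1 )  0< if -1 else 1 then ;

    : absval ( n -- u )  dup sgn * ;   \ abs via sign multiply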

IMO, more of computer engineering should focus on what to leave out.

~~~
throwaway000002
I had a glance at your Hive paper. Seems like very interesting work.

I don't have all the know-how, but if I had the resources, I'd like to build a
system where I push compute to a node that is essentially an M.2 SSD glued to
a custom core glued to 10G networking. Wire it all up in a Clos network, use
IPv6 and treat it as the address space, throw in a few routing tricks, and
call the whole thing a computer.

One day I'll build it. Hopefully I can get to that scale at some point.

~~~
dewster
I'm not sure why more processors aren't tightly integrated into memory (and
vice versa). Physically separating processor and memory with slow package pins
means caches, and all the real-time headaches, complexity, and die size / cost
issues they bring with them.

When I assemble a new PC I almost never increase the memory down the road, so
it might as well be on the processor die.

~~~
noir_lord
They probably will be. Integration is the future, once all the other
low-hanging (and not so low) fruit has been picked.

------
nickpsecurity
A tiny, open CPU is good. Running Forth is not so good. People should also
look at this one:

[http://www.jopdesign.com](http://www.jopdesign.com)

Their Java processor outperforms most of the others in a small amount of
space. It's open. It doesn't necessarily have to run Java, as Oberon or
something similar could probably be ported to it. The papers section is full
of good stuff.

~~~
Volt
>Running Forth is not so good.

Why not?

~~~
nickpsecurity
It's an untyped stack language. A strongly-typed language gives you more
readability, safety/security, and optimization opportunities. Dynamic, but
still safe, typing plus hygienic macros can be used to make that even
higher-level and more productive. Plus, there's tons of tooling, talent, and
libraries for a few good ones. Forth is behind the curve in almost every
respect.

Forth is the way it is mainly because Chuck Moore wants it that way. It's his
preference. Tiny cores, typeless, hand-optimized to near metal, primitive VLSI
tooling, 18-bit CPUs... nothing justified by scientific or engineering
argument, and all surpassed by other work in cost-benefit analysis. There's a
nice write-up here by one person who delved deep into it to see the good and
the bad of it:

[http://yosefk.com/blog/my-history-with-forth-stack-machines.html](http://yosefk.com/blog/my-history-with-forth-stack-machines.html)

~~~
dewster
Thanks for the blog pointer; an interesting read.

As the blog author points out, local variables are a huge issue on a stack
machine because they lead to stack thrash. ANY stack manipulation must be seen
as fundamentally inefficient, and all that manipulation makes Forth a
naturally obfuscated, write-only language, IMO.
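
A concrete example of the thrash: squared distance between two points. Nearly
half the words below are pure shuffling, i.e. cycles in which the ALU computes
nothing:

    \ ( x1 y1 x2 y2 -- d^2 )  rot / -rot / dup move data, compute nothing
    : dist2  rot - dup *  -rot - dup *  + ;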

I understand the complexity push-back, particularly in the processor / SW
worlds where so much of it seems actually harmful, but the cult that's formed
around Chuck Moore is kinda weird, and it makes it hard for noobs to get a
balanced picture of computing.

~~~
progman
> local variables are a huge issue on a stack machine

Forth routines are usually short so they don't need local variables.

> ANY stack manipulation must be seen as fundamentally inefficient

Why? I remember developing Forth code on a 6502 machine, and the code was just
ten times slower than native assembly. That's not bad for a byte-code
interpreter. And the byte code is extremely compact, which makes Forth very
suitable for tiny systems.

~~~
dewster
> Forth routines are usually short so they don't need local variables.

Variables have to go somewhere. In a zero-operand machine they go on the
stack. Good luck finding them amidst all the confusing rolling, picking,
duping, etc.

> Why?

Because the ALU is just lying there doing nothing while the stack is having
its spine manipulated (see above).

~~~
astrobe_
> Variables have to go somewhere. Good luck finding them amidst all the
> confusing rolling, picking, duping, etc.

Of course you don't want to use roll or pick. It's tutorial-level knowledge
that you shouldn't use them.

What you do when you see that coming is offload something to a global
variable. The "something" is often the central topic of some part of the
program, a file handle for instance, so it makes sense to put it in a variable
because it's needed in many places.
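
A sketch of that idiom with the standard file words (the names and the file
here are mine, for illustration):

    variable log-fid   \ the "central topic": one global handle

    : open-log  ( -- )        s" app.log" r/w open-file throw log-fid ! ;
    : log       ( addr u -- ) log-fid @ write-line throw ;
    : close-log ( -- )        log-fid @ close-file throw ;

Every word that needs to log just references log-fid; nothing gets threaded
through the stack.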

~~~
progman
> Of course you don't want to use roll or pick. That's tutorial-level
> knowledge that you shouldn't use them.

I think the opposite is true. Basic, assembler, and Forth were my first
languages. Forth was the most powerful of them, and it showed me best, on bare
metal, how a computer works.

------
AstroJetson
I have one of these on my Gameduino board. Since my early computing experience
was on Burroughs stack machines, I've always liked stack machines and Forth. I
had Forth on my early 8080 systems, and it didn't take long to build up a
decent collection of words that made programming go pretty fast.

The J1 was an interesting processor to work with, and the speed was very
impressive. They produced a pretty nice dev environment for processing the
code with the Arduino.

------
hacknat
Not to be "that guy", but can someone explain to me what about this
architecture is Forth specific? I would have thought that as a stack based
language that most architectures would align pretty closely with its primitive
commands/functions (i.e. Push-Pop-Jmp)

~~~
dewster
People who design stack processors like to say they're Forth or Java machines.
IMO it's more buzzword than anything else.

I think we know by now that tailoring a processor to a particular language
isn't particularly efficient (though the opposite, tailoring the language to
the processor, i.e. assembly, is).

~~~
pjmlp
Current processors are tailored to the C line of languages, as they kind of
follow on from the PDP-11 architecture.

And yet, given the way C exploits have plagued the industry, Intel, ARM, and
CHERI are actually adding tagged instructions that allow C compilers to
generate code that validates array and pointer accesses at the hardware level.

Just like the language-specific processors that you say aren't efficient.

~~~
dewster
It's hard for me to see that CPUs have historically and substantially pandered
specifically to C. IMO adding hardware crap to fix software crap should be
strenuously resisted.

Early on there were efforts to build custom Lisp machines, etc., which were
abandoned when CPUs became good enough and general-purpose enough. Using a
stack processor as a stack-language target seems natural until you see all the
inefficient stack gymnastics that go on at the lowest level.

~~~
pjmlp
Yeah, but unfortunately that software crap is going to stay with us for a long
time.

Even I, who usually bash C here, use it when customers or a specific project
require it. I just try to follow all best practices to make it as safe as I
can, third-party code aside.

However, I am convinced that the Lisp Machines, the Xerox PARC and Burroughs
micro-architectures, the Rational Ada machines, and the i432 could have been
much better.

Many times technology solutions fail not because they aren't good, but rather
because people aren't willing to invest the time they require to become good
enough.

Take JavaScript JITs, for example: if it weren't for the research money that
Google and other vendors have been willing to invest, no one would have
believed they could achieve the execution speed they have nowadays.

I also remember when Z80 coders could easily outperform the C, Pascal, Basic
and Modula compilers for 8-bit micros.

~~~
dewster
Yes, but even with the most meticulously hand-coded assembly, a canonical
stack machine will starve the ALU ~1/3 of the time.

Better to code in whatever is most efficient at the top level, implement the
HW in whatever is most efficient at the bottom level, and automate the middle
ground.

------
algorithm314
In action on an iCE40 FPGA using IceStorm:
[http://www.excamera.com/sphinx/article-j1a-swapforth.html](http://www.excamera.com/sphinx/article-j1a-swapforth.html)

