
Robot Game: Comparing 6502 C, Assembly, and Forth - druzyek
http://calc6502.com/RobotGame/summary.html
======
compiler-guy
The 6502 is still doing hundreds of millions of units a year, and is still a
terrific introductory processor that is comparatively easy to wrap your head
around, especially if you are interested in board bring up from scratch.
(Compare the requirements to boot a 68000 to a 6502 into a nop loop, for
example.)

The 6502 is far from dead.

~~~
chris_st
Glad to hear it! I first learned assembly for 6502 on the Apple ][, and found
it indeed easy to wrap my head around. Great machine, great architecture, TED
II Editor/assembler was amazing...

~~~
dhosek
The great thing about the Apple ][ was that it was largely the work of Woz and
thus it was very much possible for a single person to fully understand the
whole thing. Even the addition of AppleSoft BASIC and DOS didn't keep it from
being comprehensible.

------
siraben
This nicely provides empircal evidence on the slowdown of using Forth, which
seems to be around 10 to 20 times slower than optimized assembly. I was
initially surprised by how poorly Forth performed on all counts of speed,
memory usage and development time. For something as complex as a game the lack
of a type system and postfix nature make Forth quite unsuitable.

When I use Forth, it's often for high-level applications[0] with the
performance critical words written in assembly. So this keeps the complexity
relatively low as opposed to writing the whole thing in Forth.

[0]
[https://github.com/siraben/zkeme80/blob/master/src/bootstrap...](https://github.com/siraben/zkeme80/blob/master/src/bootstrap-
flash1.fs)

~~~
asguy
If you program Forth like it's C, and then use a non-optimizing compiler,
you're going to have a bad time (e.g. 10/20x slowdowns).

I've written Forth compilers that generated output that was ~25% slower than
optimized assembly, without putting that much work into it. There are
companies that will sell you a much better compiler [0], at prices even
approachable to hobbyists.

[0] - [https://www.mpeforth.com/software/pc-systems/vfx-forth-
for-l...](https://www.mpeforth.com/software/pc-systems/vfx-forth-for-linux/)

~~~
exikyut
FWIW: The URL is being truncated right at the "...for-linux" part. I was
mildly surprised.

This particular distribution also ships with GTK+.

That being said, a simple "3 2 1 ?" (I was unsure what word would dump the
stack) SIGSEGVs quite consistently.

It's also 32-bit, and the backtrace handler produces "XXXX:XXXX" style
addresses (?).

~~~
asguy
You’re dereferencing an invalid pointer with a variable lookup; try “3 2 1 .s”
... there’re some good ANS Forth gems here [0].

[0] - [https://1scyem2bunjw1ghzsf1cjwwn-wpengine.netdna-
ssl.com/wp-...](https://1scyem2bunjw1ghzsf1cjwwn-wpengine.netdna-ssl.com/wp-
content/uploads/2018/01/Starting-FORTH.pdf)

------
james_hague
When I used to write games in 6502 assembly, I'd use a more native style
rather than modeling the code after a high-level language. For example, I'd
maintain global settings instead of passing lots of parameters, often having
these live across many function calls. I wouldn't pass more parameters than
fit in A/X/Y. The main loop was sometimes a big run of inline code instead of
having many routines that only get called once. And so on.

Nice project! Looks like it was fun to work on.

------
mwcampbell
I wonder if the author is familiar with the PLASMA [1] language and bytecode
VM for 6502-based Apple computers. That might be another interesting option to
add to the comparison.

[1]: [https://github.com/dschmenk/PLASMA](https://github.com/dschmenk/PLASMA)

~~~
druzyek
PLASMA is a really neat project! I considered it and a few other systems
before settling on C, assembly and Forth. Ultimately I left it out because I
was focusing on performance not size, so a byte-code VM would not be a good
fit.

------
orphean
Interesting that OP ignored ca65 and went to NASM for the second assemble
option especially since they were already using cc65.

Nice use of High Level Assembly with the macros!

------
Koshkin
Unfortunately, it seems that for all practical purposes, and as much as we
love it, 6502 in all its (still existing) incarnations is all but dead.Even
MIPS has been dying, albeit slowly and painfully. The unpleasant truth is that
as we now all know that ARM is currently the only sane choice for a hobbyist
project (see, for example, Color Maximite 2), and we can only hope that RISC-V
will someday replace it in the hands of CPU aficionados, but in the times when
for a few pennies we can have an 8-pin ARM microcontroller, this is a no-
brainer. (ESP32, being quite popular, due to its highly integrated nature and
the focus on IoT is a somewhat different proposition.)

~~~
LIV2
"What is sane for a hobbyist project" depends on the project and the wills of
the designer.

I've built a couple of 6502 based systems now and added VGA output and floppy
support which probably would've been easier with an ESP32 or something but my
goal was to learn how to do this the hard way.

------
microtherion
Is the mult5 example an actual piece of code of your program?

Any multiplication where one multiplicand is a constant should be much faster
with shifts and adds than with a loop. mult5, e.g. is: x+(x<<2)

~~~
druzyek
Right, I do this in the assembly versions everywhere I multiply by a constant.
mult5 is just an example.

------
Marazan
I don't know Forth on the 6502 but I think the statement that STC is the
fastest form of Forth is not always true. But it will lead to larger programs.

------
albertzeyer
It's kind of a funny coincidence that I also wrote a very similar looking game
which I also called "Robot Game" at some point, and I also rewrote it at
multiple points in time in different languages:

* Starting in Visual Basic 3 in 1999 ([http://www.az2000.de/projects/robot/](http://www.az2000.de/projects/robot/)),

* then later in Object Pascal / Lazarus in 2006 ([http://www.az2000.de/projects/robot2/](http://www.az2000.de/projects/robot2/)),

* and recently in Python in 2017 ([https://github.com/albertz/PyOverheadGame](https://github.com/albertz/PyOverheadGame)).

------
astrobe_
> This brings us to one of the main shortcomings of Forth. If you need to
> access more than three local variables at once (in the body of a loop for
> example) there is just no convenient way to do so. [...] Consider the
> variables used in this C function that draws 1 bit-per-pixel images with
> optional rounded corners: > int DrawTile1bpp

Looking at the C version, the argument "tile" is used once in the function, to
get the pointer to the tile. The pointer could be passed directly: one less
local. x and y are passed only to calculate an address, this address could be
passed directly - there are only three calls to this function, it might be
worth the DRY exception.

t0 seems to be always 0, except when it is undefined.

The t_height, t_with are just offsets in the tile structure. This locals can
go away too - probably eating the cost of fetching them each time is more
effective than the cost of all the stack juggling done to avoid it. This
reminds me a word from a Chess great master: "don't waste time trying to save
one".

trans_row only exists to set skip_pixel, it seems one of them can go away. The
logic seems to be "unless trans_row and something, do something". The C
version might actually be more verbose than necessary.

Finally, the edge_style complicates a lot the logic. Using one function to do
different things because it is so simple to "just add another parameter" is
typical in many HLL, and often result in awful spaghetti code.

What a Forth programmer would do is to write one function for each edge style
and see what the have in common. More often than not, they share a lot of
subwords, so that having one function for each case is not more expensive than
one function for all cases.

I suggest doing that with the C version and see what happens.

> Without parenthesis or commas, there is no way to tell just by looking at
> the code which of those are functions, which ones are variables, which of
> them are inputs to the others, or what any of them return. Forth fans will
> say that that information is available in the stack comment for the word's
> declaration, but that assumes that the programmer bothered to create one.
> Even if they did, you have to search in several places in the file to figure
> out what is instantly obvious if the same code were written in a language
> like C or Python. It's easy to see why people criticize Forth for being
> "write only." If you translated the line above into C, it could be doing any
> of the following:

Actually even in C some follow certain naming conventions - like #defines
being all caps, member things being prefixed by m_, etc. And you can do that
in Forth too. Once again, "write only" has more to do with the author than wit
the language.

~~~
druzyek
Hi astrobe_, interesting points! I disagree with you on a few things:

> Looking at the C version, the argument "tile" is used once in the function,
> to get the pointer to the tile. The pointer could be passed directly: one
> less local.

The argument "tile" is an 8-bit index, which is faster to pass than the 16-bit
pointer to the tile. More importantly, that index is used for things other
than looking up the tile pointer, such as finding color pairs to recolor the
tile. In other words, you could pass the 16-bit tile pointer directly, but
you'd still need a way to correlate that pointer with its matching color data,
and that system would be slower than using the same 8-bit index for both.

> x and y are passed only to calculate an address, this address could be
> passed directly

That would be more efficient and save a handful of cycles when the function is
called. This does not answer the original criticism, however. As I mentioned
on the page, x and y go away after the pointer is calculated, so I'm not
counting those as variables that need to be available in the loop body. You
still have over a dozen variables that you can't manage efficiently in Forth.

> t0 seems to be always 0, except when it is undefined.

Good point!

> The t_height, t_with are just offsets in the tile structure. This locals can
> go away too

I don't think I see your point. If you're going for performance, doing one
array access and saving the value saves a lot of cycles compared to indexing
into the array each iteration, which eats a lot of cycles.

> trans_row only exists to set skip_pixel, it seems one of them can go away.
> The logic seems to be "unless trans_row and something, do something". The C
> version might actually be more verbose than necessary.

You could get rid of one of them at the expense of readability. Or maybe there
is a good name that would indicate both purposes.

> Finally, the edge_style complicates a lot the logic. Using one function to
> do different things because it is so simple to "just add another parameter"
> is typical in many HLL, and often result in awful spaghetti code.

I don't think this function is awful spaghetti code. Yes, the logic is
complicated. I mention on the page that I encoded the data for some tiles as 1
bit per pixel to save room and decided to make rounded transparent corners in
code depending on a flag stored for each tile. This function is a compromise
to save memory. Otherwise, you could just encode the tile like any other and
have three colors including transparency but take up 8 times more memory.

> Actually even in C some follow certain naming conventions - like #defines
> being all caps, member things being prefixed by m_, etc. And you can do that
> in Forth too. Once again, "write only" has more to do with the author than
> wit the language.

I really disagree on this one! All caps and so on for constants is nice but
still does not tell you how many values, if any, a word pops and how many it
leaves behind. That alone earns the "write only" label. For example: CONST_A
m_foo CONST_B m_bar. What does m_foo return and what arguments does m_bar
take?

Thanks for the comments!

~~~
astrobe_
To be clear, I don't think this function is spaghetti code.

I believe you are victim of premature optimization. You knew Forth would be
slow because interpreted, so you feared for performance. Fear often makes you
do the wrong things.

The nice thing about Forth is that speed is often correlated to the number of
words in a definition, so you can start writing for source/memory compactness
all while partitioning the program into meaningful words as much as you can.
Then assess the situation on the performance side.

> I really disagree on this one! All caps and so on for constants is nice but
> still does not tell you how many values, if any, a word pops and how many it
> leaves behind

Just like in Lua or Go you cannot tell how many values a function returns, and
in most dynamically typed languages, what are the types of the input
arguments. Furthermore the name alone cannot tell you about the behavior for
corner cases either, like what happens if you pass a null pointer as a second
argument to a string concatenation function: no-op or crash? You cannot tell
until you check out the docs or read the source of the function.

Chuck Moore even claims that "stack pictures" (comments describing the stack
effects of a word) are unnecessary; the effects should be obvious from the
definition or if it needs more elaborate commentary, it should be documented
somewhere else.

Documentation is something people don't take seriously. When you see that the
whole documentation of a project consists in Doxygen files this is usually a
bad sign. It might be pretty and nice at first glance, but there is more to
documentation than documenting the interface of functions and dropping a few
lines of introduction on top of that. Documentation should tell a story. And
for this story to be well told, you cannot follow the layout of the code.

So to answer your question, what a word does should be obvious from the name
or from its definition. Someone said naming things is one of the two hard
problems in CS. This is where Forth shows you the real problems and demands
thoughtful solutions. 47 characters-long names are certainly not the solution.
Documented naming conventions and making sure a word does not do too many
things are.

~~~
druzyek
> I believe you are victim of premature optimization. You knew Forth would be
> slow because interpreted, so you feared for performance. Fear often makes
> you do the wrong things.

As I explain on the page, the Forth I'm using is not interpreted. It generates
STC, and if the body of the word is smaller than a particular size which you
set, it will inline the code. Someone measured the dispatch overhead for
fetching the next word in FIG-Forth for the 6502 at over 80 cycles. STC only
needs 12 for a JSR/RTS pair and potentially 0 if the word is inlined. In any
case, if I'm trying to make each version as fast as I know how, it wouldn't
matter if I knew that it was interpreted or not. On the other hand, I'm happy
to look at any place in the source you find where it seems fear made me do the
wrong thing.

> The nice thing about Forth is that speed is often correlated to the number
> of words in a definition, so you can start writing for source/memory
> compactness all while partitioning the program into meaningful words as much
> as you can. Then assess the situation on the performance side.

Speed is correlated with number of words in that it goes down when you do a
lot of factoring. Admittedly this depends on the architecture and penalty for
subroutine calls. I was all set up to factor all my words down to a line or
two after reading Starting Forth but switched to longer but much faster words
after starting this project.

> Just like in Lua or Go you cannot tell how many values a function returns,
> and in most dynamically typed languages, what are the types of the input
> arguments. Furthermore the name alone cannot tell you about the behavior for
> corner cases either, like what happens if you pass a null pointer as a
> second argument to a string concatenation function: no-op or crash? You
> cannot tell until you check out the docs or read the source of the function.

I've never used Lua or Go so I can't comment on that. This does not answer the
original criticism. Sure, there are pieces of information in every language
you don't get until you look at the documentation. The point still stands that
you can tell what arguments a C function takes and what it's returning at a
glance while you can't in Forth.

> Documentation is something people don't take seriously. When you see that
> the whole documentation of a project consists in Doxygen files this is
> usually a bad sign. It might be pretty and nice at first glance, but there
> is more to documentation than documenting the interface of functions and
> dropping a few lines of introduction on top of that. Documentation should
> tell a story. And for this story to be well told, you cannot follow the
> layout of the code.

Good point

> So to answer your question, what a word does should be obvious from the name
> or from its definition. Someone said naming things is one of the two hard
> problems in CS. This is where Forth shows you the real problems and demands
> thoughtful solutions. 47 characters-long names are certainly not the
> solution. Documented naming conventions and making sure a word does not do
> too many things are.

This is something different than what I'm pointing out. Yes, good naming will
tell you what it's doing, but it doesn't tell you HOW it's doing it. Even if
the word is kept short and doesn't do too much, the programmer has a lot of
freedom in how they choose to use the stack, so even in well-written Forth,
the word name gives no indication of what the word does to the stack. It's
understandable then that you don't want a word taking 13 arguments, but once
we get to the point where a valid argument could be made to have a word either
take one item off the stack or two, we are stuck in the same morass where we
can't tell what anything takes or returns.

astrobe_, I picked DrawTile1bpp as an example where Forth is especially
unwieldy. Would you like to rewrite it as an example? Maybe if someone
experienced like you took a swing at it, it would better reflect what Forth
can do on the 6502.

~~~
astrobe_
> Someone measured the dispatch overhead for fetching the next word in FIG-
> Forth for the 6502 at over 80 cycles. STC only needs 12 for a JSR/RTS pair
> and potentially 0 if the word is inlined.

I have implemented STC with cod inlining for 8086 a long time ago, as well as
DTC and bytecode, in assembly, C and Forth (sic), so I did my share of cycle
counting.

Sorry if I project my younger self on you, but cycle count is not the best
approach to factoring. Factoring in Forth is about spotting redundancies. This
is a bit different from what people mean today by "(re)factoring", which is
more about how one splits a task, a program, into functions or classes. This
should be called "restructuring" instead. Refactoring in Forth is really a
compression process, even when you could not care less about the size of your
object code.

The best approach is to write your definitions the way you like, mostly
ignoring speed and cycles and byte count. This should result in clean, elegant
definitions. This first step reveals some short words like your @+1 that could
be good candidates for an implementation in assembler. Some other words can be
inlined using IMMEDIATE. Some others can be de-factored as a result of this
refactoring-for-speed process. Doing it that would have made you realize
t0_ppp is always 1, just like t0 is in the C version.

The Python basis for this port needs some optimizations to begin with. If I am
not mistaken, all tiles are square. No need for a t_height or a t_width. No
need for both in the tiles structure either.

The C port of the Python program can be improved. Tiles are defined, then
there's a big array of pointers to those tiles. This is necessary because the
tile data does not have a constant length, but it seems to me preferable to
have an array of structures defining the geometry and colors of the tiles, and
a pointer to the pixel data. The size is the same, just a different layout,
but I expect that delaying the indirection from fetching the tile to fetching
the data of the tile to be more convenient for the code.

Spotting unneeded complexity is harder than one would expect. Past the point
of complexity being the consequence of laziness or lack of skill or lack of
time, you find out that complexity also results from (bad) habits. You need to
configure a bunch of paths for your application, you start writing routines to
read them from an initialization file. Then after a while you realize you are
using an interpreted language, so you could have sourced the configuration
file directly, duh. I'm still making that kind of mistake after years of
practice.

Simplification is a progressive process. Jeff Fox explains it better than I
would in the third chapter of his essay on Forth [0].

> we are stuck in the same morass where we can't tell what anything takes or
> returns

I am puzzled to hear this complain from someone who can program in assembly.
Forth and assembly are both un-typed, "no declaration needed" languages. What
makes those properties acceptable in asm but not in Forth? Your expectations
of Forth being a higher level language maybe?

In any case, all I can do is tell you that with practice, it is not as a big
deal as you think, just like the weird symbols in APL are not a big deal, just
like parenthesis in Lisp are not a big deal.

> Would you like to rewrite it as an example?

Sorry but no. I have no motivation to do that - I dislike coding challenges
and prefer to code things that are actually useful for me - and nothing to
gain from it.

[0]
[http://www.ultratechnology.com/forth.htm](http://www.ultratechnology.com/forth.htm)

~~~
druzyek
> The best approach is to write your definitions the way you like, mostly
> ignoring speed and cycles and byte count. This should result in clean,
> elegant definitions. This first step reveals some short words like your @+1
> that could be good candidates for an implementation in assembler.

Yikes! When I get into debates about this, writing part of the program in
assembly usually comes up eventually. This is a good example of how
inefficient the system is when you have to write assembly to do what would be
*ptr++ in C. Another example is something like "swap 5 + swap" which burns all
kinds of cycles. Someone recommended I rewrite this in assembly as well
although it would only be something like "x+=5" in C. The absurdity of that
speaks for itself.

>The Python basis for this port needs some optimizations to begin with. If I
am not mistaken, all tiles are square. No need for a t_height or a t_width. No
need for both in the tiles structure either.

Sure, but storing those as one value instead of two would also slightly speed
up the C and assembly versions. Forth would still be just as slow and
inefficient compared to the other languages if you made that change.

> The C port of the Python program can be improved. Tiles are defined, then
> there's a big array of pointers to those tiles. This is necessary because
> the tile data does not have a constant length, but it seems to me preferable
> to have an array of structures defining the geometry and colors of the
> tiles, and a pointer to the pixel data. The size is the same, just a
> different layout, but I expect that delaying the indirection from fetching
> the tile to fetching the data of the tile to be more convenient for the
> code.

No, the color data varies as well since some tiles have no color pairs and the
rest have a varying number. Also, different color pairs can be applied to the
same tile depending on the situation, so there is no 1:1 correspondence that
would make your suggestion make sense here. In any case, adding one level of
indirection to save memory and keep the system organized by passing an 8-bit
index instead of a 16-bit pointer adds a few dozen cycles to a function that
takes around 20,000 to draw the tile. Do you mean that if the C and Forth
versions were reorganized, there wouldn't be such a large speed disparity
between the two? That is highly doubtful.

>I am puzzled to hear this complain from someone who can program in assembly.
Forth and assembly are both un-typed, "no declaration needed" languages. What
makes those properties acceptable in asm but not in Forth? Your expectations
of Forth being a higher level language maybe?

The original criticism is that you can't tell what anything takes or returns.
If you look at the assembly, you'll see that the function calls there are just
as clear as in C since we have named arguments and can instantly see what
happens with the return value after the subroutine call. It's very telling
that it's a lot easier to implement a usable scheme for local variables in
assembly, primitive as it is, while Forth still lacks this.

> Sorry but no. I have no motivation to do that - I dislike coding challenges
> and prefer to code things that are actually useful for me - and nothing to
> gain from it.

Fair enough. How about an existing example you could point to then? There are
a lot of unbelievable claims about the performance of Forth compared to C and
assembly, but no one seems to be able to prove any of this. That's one of the
main motivations for doing this project.

