
Clocking a 6502 simulator to 15GHz - scarybeast
https://scarybeastsecurity.blogspot.com/2020/04/clocking-6502-to-15ghz.html
======
kabdib
I was told by Leonard Tramiel (who was my manager at Atari for a while) that
the world record for a production 6502 was 25 MHz. This was demonstrated one
Friday evening, some time after the beer fridge had been opened in one of the
labs.

I don't know if they applied any kind of external cooling, or what the
benchmark was. Probably it was "keep cranking up the clock until pins stop
wiggling or smoke comes out." Not very scientific, but quite entertaining.

~~~
paulmd
"until pins stop wiggling"?

~~~
variaga
The pins don't physically wiggle. "pins wiggling" is a common metaphor for
"the voltage level on a pin is changing".

As a signal driver is toggled at increasing frequencies ('cranking up the
clock'), the signal amplitude (voltage difference between the 'high' and 'low'
levels) starts to drop. At a high enough frequency, the signal will be
indistinguishable from noise and 'stops wiggling'.

~~~
duskwuff
> At a high enough frequency, the signal will be indistinguishable from noise
> and 'stops wiggling'.

It's not that the signal will be indistinguishable from noise, but that the
CPU will stop working correctly, so its outputs will stop toggling (or will
toggle in unexpected ways).

------
segfaultbuserr
It's an interesting article, but...

Better title: Clocking a 6502 Simulator to 15 GHz. There are multiple efforts
to recreate the physical 6502 CPU on modern hardware; this is not one of them
and should not be confused with them.

~~~
arriu
I was a bit confused and expected to see some elaborate liquid cooling
nonsense to get the poor chip up to 15 GHz.

------
JshWright
I realize I'm late to the party, but I've really been enjoying Ben Eater's
series on building a simple computer with a 6502.

[https://www.youtube.com/playlist?list=PLowKtXNTBypFbtuVMUVXN...](https://www.youtube.com/playlist?list=PLowKtXNTBypFbtuVMUVXNR0z1mu7dp7eH)

~~~
louwrentius
Yes, it's awesome.

------
halotrope
After stumbling on Ben Eater's “Hello world from scratch” [1] I went out and
bought the CPU, some parts, and breadboards. The chip is only a few dollars. It
is highly recommended if you want to dive down into computers and digital
logic on first principles. Also great fun to get a break from all the screens
and layers upon layers of software that I have to deal with daily.

1\. [https://youtu.be/LnzuMJLZRdU](https://youtu.be/LnzuMJLZRdU)

~~~
dodo6502
Kind of plugging my own project here, but I too am a software developer who
found great joy in breaking away from all the layers of abstraction and
working directly with the hardware. I created a portable game system with the
6502:

[http://www.dodolabs.io/](http://www.dodolabs.io/)

~~~
halotrope
This is awesome!

------
__s
For more "very fast simple CPU" architecture, see _50,000,000,000 Instructions
Per Second: Design and Implementation of a 256-Core BrainFuck Computer_ :
[https://people.csail.mit.edu/wjun/papers/sigtbd16.pdf](https://people.csail.mit.edu/wjun/papers/sigtbd16.pdf)

------
russellbeattie
Huh... I hadn't considered it before, but Bender's brain _could_ actually be a
6502, just being run at an insanely high clock speed. A few petahertz should
be able to handle the AI involved, no?

Planck time is like 10^-43 seconds, so there's lots of room to divvy up a
second for more processing power given advanced technologies...

~~~
gregoryl
If the hardware is advanced enough to do that, the AI software is similarly
advanced, and a basic 6502 can produce a Bender like AI without breaking a
sweat!

~~~
hvidgaard
More advanced software would in all likelihood require significant
calculations, rendering a basic 6502 useless.

~~~
russellbeattie
Yeah... Maybe we'd need to bump up the clock to exahertz to make up for the
loss of precision and constant memory access. The top supercomputer is
already at 148 petaflops, so we'd need some more headroom for general AI.

Of course, petahertz (10^15 cycles per second) is _already_ roughly the
frequency at which an electron circles around a hydrogen atom, so we may not be able to use
electricity any more...
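As a rough sanity check of that figure, assume the textbook Bohr model of hydrogen, which simplifies real atomic physics. In that model the ground-state orbital frequency is the electron's speed divided by the circumference of its orbit:

```latex
f = \frac{v}{2\pi a_0} \approx \frac{2.19\times10^{6}\ \mathrm{m/s}}{2\pi \times 5.29\times10^{-11}\ \mathrm{m}} \approx 6.6\times10^{15}\ \mathrm{Hz}
```

So "petahertz" is the right order of magnitude.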

------
londons_explore
It would be interesting to compare this project to simply converting 6502
assembly into LLVM IR, and letting clang's optimization passes work their
magic.

Obviously self modifying code would be hard to handle, but every other case
ought to work, and the auto-vectorization ought to do amazing things to some
loop-heavy code.

~~~
woodrowbarlow
A 6502 backend for LLVM has been attempted a couple of times[1][2], but the
fact that the 6502 has only three general-purpose registers (A, X and Y)
imposes severe limitations w.r.t. LLVM's calling conventions.

[1] [https://github.com/c64scene-ar/llvm-6502](https://github.com/c64scene-ar/llvm-6502)

[2] [https://github.com/beholdnec/llvm-m6502](https://github.com/beholdnec/llvm-m6502)

~~~
azernik
Other way around - the idea is to create an LLVM _frontend_ for 6502 machine
code, and then transpile that code to run on a modern CPU architecture.
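To make the frontend idea concrete, here is a hand-written sketch of what lifting a short 6502 page-copy loop into C might look like. It is not output from any real tool. The 6502 registers become locals and the 6502 address space becomes a byte array. `Cpu6502` and `lifted_copy_page` are hypothetical names. The sketch also assumes the loop's own code is never modified while it runs, which is exactly the self-modifying-code caveat raised above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hand-lifted version of this 6502 loop (assumes no
 * self-modifying code):
 *
 *         LDX #$00
 *   loop: LDA src,X
 *         STA dst,X
 *         INX
 *         BNE loop
 *
 * Registers become locals; the 64 KiB 6502 address space is a plain array. */
typedef struct {
    uint8_t a, x;
    uint8_t z;              /* zero flag only; other flags omitted for brevity */
    uint8_t mem[65536];
} Cpu6502;

void lifted_copy_page(Cpu6502 *cpu, uint16_t src, uint16_t dst)
{
    assert(cpu);
    uint8_t a = cpu->a, x = 0;                   /* LDX #$00 */
    uint8_t z;
    do {
        a = cpu->mem[(uint16_t)(src + x)];       /* LDA src,X */
        cpu->mem[(uint16_t)(dst + x)] = a;       /* STA dst,X */
        x++;                                     /* INX: wraps 255 -> 0 */
        z = (x == 0);                            /* INX sets Z */
    } while (!z);                                /* BNE loop */
    cpu->a = a;
    cpu->x = x;
    cpu->z = z;
}
```

Once the interpreter dispatch is gone, an optimizer such as clang's sees an ordinary byte loop. Whether it can vectorize that loop depends on proving that the source and destination ranges don't overlap, and nothing in this sketch guarantees that.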

~~~
raverbashing
That sounds like an interesting project!

You would probably want to add some tricks directly there, maybe register
renaming (I don't know if LLVM does "variable renaming", let's put it this
way).

------
zentiggr
GEOS would have been much more responsive...

------
jsd1982
To solve the $FF page-wrapping problem, I wonder if it would work to
double-map each 6502 page to x64 host pages side by side. I assume a word read
at $xxFF would straddle the two mapped pages, effectively reading the second
byte at $xx00. You'd have to map to host page boundaries, of course, and
probably offset all reads/writes to the end of the host page at $3F00.
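Here is a minimal sketch of the mirroring idea. To keep it portable, it uses a plain 512-byte array instead of real double-mapped host pages. Each 256-byte 6502 page is stored twice, back to back. A 16-bit read at offset $FF then picks up its high byte from offset $00 of the same page, which matches the 6502's in-page wrap (zero-page pointers at $FF, or `JMP ($xxFF)`). `MirroredPage`, `page_write` and `page_read_word` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: a plain array stands in for two host mappings of one page.
 * bytes[0..255] is the page, bytes[256..511] mirrors it, so a word read
 * at offset $FF sees offset $00 as its high byte. */
typedef struct {
    uint8_t bytes[512];
} MirroredPage;

void page_write(MirroredPage *p, uint8_t offset, uint8_t value)
{
    /* Both copies must be updated to keep the mirror in sync; with real
     * double-mapped host pages the MMU would make this second store free. */
    p->bytes[offset] = value;
    p->bytes[offset + 256] = value;
}

uint16_t page_read_word(const MirroredPage *p, uint8_t offset)
{
    uint16_t lo = p->bytes[offset];
    uint16_t hi = p->bytes[offset + 1];   /* offset $FF reads the mirror of $00 */
    return (uint16_t)(lo | (hi << 8));
}
```

With genuine double mapping (two virtual mappings of the same physical memory), the second store in `page_write` disappears. On a little-endian host like x86-64, the two byte loads could also become a single unaligned 16-bit load.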

------
userbinator
I believe VMware, without hardware support for virtualisation, also falls back
to "binary translation" and similarly gets tripped up by self-modifying code
(SMC). I don't recall the details right now, but one of the ways to detect it
was to modify an instruction in an obscure way that the developers had
forgotten about.
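For illustration, here is a minimal sketch of one general way a binary translator can notice self-modifying code. It records which 6502 pages have had code translated from them and checks that flag on every emulated store. This is a toy built on assumptions, not VMware's mechanism or the article's. `Machine`, `mark_translated` and `emulated_store` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SMC guard: one flag per 256-byte 6502 page recording whether
 * translated (JIT) code was generated from it. Not any real product's design. */
#define NUM_PAGES 256   /* 64 KiB / 256 bytes */

typedef struct {
    uint8_t mem[65536];
    bool page_has_code[NUM_PAGES];
    unsigned invalidations;   /* number of flushes, for illustration */
} Machine;

/* Called when the translator compiles code starting at addr. */
void mark_translated(Machine *m, uint16_t addr)
{
    m->page_has_code[addr >> 8] = true;
}

static void invalidate_page(Machine *m, uint8_t page)
{
    /* A real translator would discard its compiled blocks for this page. */
    m->page_has_code[page] = false;
    m->invalidations++;
}

void emulated_store(Machine *m, uint16_t addr, uint8_t value)
{
    uint8_t page = (uint8_t)(addr >> 8);
    m->mem[addr] = value;
    if (m->page_has_code[page])   /* store hit translated code: SMC */
        invalidate_page(m, page);
}
```

Granularity is a trade-off. Per-page flags are cheap to check but cause spurious invalidations when code and data share a page. Finer-grained tracking is more precise but costs more memory and bookkeeping.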

------
PaulHoule
I want to see a 6502-alike clocked to 15GHz with an exotic semiconductor such
as GaAs, InP, SiGe, etc.

~~~
undersuit
Probably wouldn't be able to see it with current fabrication technologies and
historic transistor counts.

~~~
PaulHoule
Exotic materials, other than maybe SiGe, use fabrication techniques less
advanced than Si, and the transistor counts are much lower.

The Department of Defense funded an SBIR grant in the late 1990s to produce an
InP-based microprocessor; given the limits of the time, it would have been
closer to a 6502 than a Pentium. There has been no word of such a thing since,
which leads me to conclude that the topic is classified.

The worst limitation of a 6502-era chip is that it has no instruction cache,
so instruction reads fight with data for memory bandwidth. You might even
consider a Harvard architecture where the instructions go on a different bus.
Without an I-cache there is no point in pipelining, but there is a lot of
pressure to implement CISCy instructions such as the string copy operation
from the 8086 line.
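Here is a rough illustration of that instruction-fetch pressure, using a simple page-copy loop (`LDA abs,X` / `STA abs,X` / `INX` / `BNE`). Per iteration, it counts opcode and operand bytes fetched against data bytes read or written. The table and `tally_loop` are hypothetical. The count also ignores the 6502's extra dummy bus cycles, so it understates total bus traffic.

```c
#include <assert.h>

/* Per-iteration byte counts for a 6502 page-copy loop. Only instruction
 * bytes fetched and data bytes moved are counted; dummy cycles are ignored. */
typedef struct {
    const char *mnemonic;
    int opcode_bytes;   /* opcode + operand bytes fetched */
    int data_bytes;     /* bytes read or written as data */
} InsnCost;

static const InsnCost copy_loop[] = {
    {"LDA abs,X", 3, 1},
    {"STA abs,X", 3, 1},
    {"INX",       1, 0},
    {"BNE rel",   2, 0},
};

void tally_loop(int *fetch_bytes, int *data_bytes)
{
    assert(fetch_bytes && data_bytes);
    *fetch_bytes = 0;
    *data_bytes = 0;
    for (unsigned i = 0; i < sizeof copy_loop / sizeof copy_loop[0]; i++) {
        *fetch_bytes += copy_loop[i].opcode_bytes;
        *data_bytes += copy_loop[i].data_bytes;
    }
}
```

For this loop, `tally_loop` reports 9 instruction bytes against 2 data bytes for each byte copied. Most of the useful bus traffic is therefore instruction fetch.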

The other issue is that there is no DRAM replacement with exotic materials,
and all the difficulties with interconnect latency get a lot worse than they
already are. It's clearer how to make SRAM, so having somewhere between 64 KB
and 1 MB of SRAM on die seems likely for an exotic-material CPU.

Of course, armchair CPU designers are more likely to make progress with
transport-triggered architectures and FPGAs in 2020.

------
orionblastar
The Mega65 runs a 6502 at 50 MHz and is compatible with the Commodore 65, plus
a C64 mode. [http://mega65.org](http://mega65.org)

------
tasty_freeze
Now someone needs to write an x86 emulator in 6502 asm and boot Windows.

------
fortran77
I'm not 100% sure where the 15 GHz equivalent speed calculation comes from.

~~~
segfaultbuserr
The author showed it at the end of the article. It's the "effective speed"
reported by some benchmark programs (calling subroutines, running for loops,
iterating over a string, etc.). These are simple programs that can be highly
optimized by a simulator on modern x86_64. Real-world programs, like games,
are slower, as acknowledged in the article.

~~~
scarybeast
A lot of BBC BASIC programs, doing real work (e.g. Mandelbrot drawing etc.),
should have a shot at 10GHz. Games are slower because they are hammering
hardware registers external to the JIT (sound, graphics, keyboard polling,
timing, etc.)
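Here is a minimal sketch of that split, assuming a JIT whose compiled code calls out for I/O. RAM loads can stay on a fast path. Loads from the BBC Micro's memory-mapped I/O region ($FC00-$FEFF, the FRED, JIM and SHEILA pages) have to go through device emulation instead. `Bbc`, `emulated_load` and the placeholder `device_read` are hypothetical names, not taken from the article's emulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fast/slow path split for emulated loads. */
typedef struct {
    uint8_t mem[65536];
    unsigned slow_path_hits;
} Bbc;

static bool is_io_address(uint16_t addr)
{
    return addr >= 0xFC00 && addr <= 0xFEFF;   /* FRED, JIM, SHEILA */
}

static uint8_t device_read(Bbc *b, uint16_t addr)
{
    b->slow_path_hits++;
    (void)addr;
    return 0xFF;   /* placeholder: a real emulator would query the VIA, CRTC, etc. */
}

uint8_t emulated_load(Bbc *b, uint16_t addr)
{
    if (is_io_address(addr))
        return device_read(b, addr);   /* slow: leaves compiled code */
    return b->mem[addr];               /* fast: could be a single host load */
}
```

A game that polls the keyboard or pokes sound and video registers every frame keeps landing in the slow branch. A Mandelbrot loop in BBC BASIC mostly stays in the fast one.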

My laptop is an ancient 5th gen i5 with 2 keys having fallen off, so games are
down in the 2GHz - 3GHz range for me. (Perhaps the missing keys make all the
difference.)

~~~
dr_zoidberg
I understand that some people look suspiciously at the 15GHz mark, especially
considering this was run on a 4.5GHz processor. My understanding is that these
benchmarks compare against how long the same work would've taken on a stock
1MHz 6502, and calculate the "clock speed" obtained as a ratio. So if I'm
getting my result 10,000 times faster than a standard 6502, I'm at 10GHz.
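The ratio is simple enough to write down as a tiny function. `effective_mhz` is a hypothetical name. The numbers in the usage note are invented for illustration and are not taken from the article's benchmarks.

```c
#include <assert.h>

/* Effective clock: 6502 cycles' worth of work completed per second of host
 * time, in MHz. A stock 1 MHz 6502 scores exactly 1.0 by this measure. */
double effective_mhz(double emulated_6502_cycles, double host_seconds)
{
    assert(host_seconds > 0.0);
    return emulated_6502_cycles / host_seconds / 1e6;
}
```

For example, a workload that takes 10 seconds on a 1 MHz 6502 is 10,000,000 cycles. Finishing it in 1 ms of host time gives `effective_mhz(10e6, 0.001)` of about 10,000 MHz, i.e. an effective 10 GHz.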

I also understand that this is possible because the emulator is running on a
superscalar processor. Not sure if multicore has anything to do with it (the
post specifically mentions the high single-core performance of the processor
used). Still, considering that processors back in the 6502 era had just one
execution port, while today's superscalar cores have many (I think 8? I've
really lost track of what's usual these days), the figure makes sense, without
involving any kind of multithreading.

Kudos to the authors of the emulator for having a super-optimized system that
can effectively and efficiently emulate its target!

~~~
scarybeast
I like the framing here, that of seeing this as a showcase of modern
superscalar improvements. And yes, it's about single core performance only.

What is particularly interesting to me is how thoroughly superscalar "wins".
Because of complexities with 6502 -> x64 mapping, and handling self-modifying
code in particular, some of the most common 6502 instructions explode to
multiple x64 instructions. Despite that huge extra instruction load, the
translation still manages to run at much greater speed than a 1:1 instruction
ratio.
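Here is a back-of-the-envelope model of how this can still beat a 1:1 ratio. The symbols are assumptions:

- F: host clock.
- k: sustained host instructions per clock.
- N: average host instructions emitted per 6502 instruction.
- c: average cycles per instruction on a real 6502 (documented 6502 instructions take 2 to 7 cycles).

The host then completes F·k/N 6502 instructions per second, each worth c real 6502 cycles. The calculation below uses the 4.5 GHz host and 15 GHz result mentioned in this thread, with illustrative (not measured) values c ≈ 3 and N ≈ 3:

```latex
f_{\mathrm{eff}} = \frac{F\,k\,c}{N}
\quad\Longrightarrow\quad
k = \frac{f_{\mathrm{eff}}\,N}{F\,c} \approx \frac{15\ \mathrm{GHz} \times 3}{4.5\ \mathrm{GHz} \times 3} \approx 3.3
```

Under these assumptions, the host needs to sustain about 3.3 instructions per clock. That is plausible for tight, predictable code on a modern wide core. The model ignores dynamic optimizations that remove or merge 6502 instructions entirely, which effectively lower N.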

Modern processors do not run on electrons. They run on unicorn tears and
magic.

~~~
saagarjha
Note that there is also a speedup from dynamic optimization.

------
RoutinePlayer
X86 ... not x64

~~~
ajross
The architecture never had a good name. AMD originally called it "x86-64" (but
not AFAIK "AMD64", even though lots of other people did), but "x86_64" is most
common in the open source world (I guess because the underscore makes it legal
as a C symbol). "x64" is what Sun and Microsoft decided to use. Intel has
called it "ia32e", "EM64T" and "Intel 64" at various times.

I think this article gets a pass.
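Compilers disagree on the name too. This small sketch relies on predefined macros that are set when targeting the architecture: GCC and Clang define `__x86_64__`, and MSVC defines `_M_X64`. `target_arch_name` is a hypothetical helper, and the other branches exist only so the example compiles elsewhere. Note that `__x86_64__` is a valid C identifier, whereas a hyphenated name like `x86-64` could not be.

```c
#include <assert.h>

/* Report the compile target using compiler-predefined macros. */
const char *target_arch_name(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86 (32-bit)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "AArch64";
#else
    return "other";
#endif
}
```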

