Indeed. I took a computer engineering class in undergrad. The capstone project was implementing from scratch a multi-pipeline RISC CPU (including the ALU and a very basic L1 cache) in Verilog, which we flashed to FPGAs and then programmed to play checkers using hand-compiled C. The FPGA was easier to flash and debug than the Spartan-6 mentioned in TFA, but was significantly more expensive as well.
It was a brutal class, but it totally demystified computers and the whole industry in a way that made me feel like I really could understand the "whole" stack. Nothing scares me any more, and I no longer fear "magic" in software or hardware.
That was a difficult yet extremely rewarding class. My wife, then girlfriend, still remembers that semester because she barely saw me.
IIRC, bonus points were given to the team with the highest clock speed. I didn't win, but I seem to remember mine being somewhere in the 18MHz range and the winner in the low to mid 20s.
You sound exactly like my professor for computer architecture. “Computers are not magic!” He mentioned this at least once during every lecture, and my experience was similar to yours.
 Profiles of the Future, Arthur C. Clarke
The atoms below that are relatively straightforward, and the fact that they're made up of smaller building blocks is fine, I guess; it's increasingly irrelevant to how a computer works.
And going up, everything starts to get very non-magical as you turn response curves into binary signals and then string gates together.
Btw, I read your word "below" as "above".
(And as for the difficulty of making walls too thin lest electrons leak through: you don't need to invoke tunneling to explain that.)
Also, I never understood things like slew rates, noise, or gain bandwidth. Too high up the stack.
I started with GW-BASIC, then QBasic, and moved up to "Visual Basic for DOS" (ncurses-style UI stuff, similar to Turbo Pascal IIRC; I don't think many people even knew it existed, since Win 3.11 was already big, but I loathed it and only switched to it because of... Trumpet Winsock for the internet!). But that did not stop me from playing with driving serial (COM) ports or parallel (LPT) ports with printer escape sequences. Not really FPGA-level low (not even close), but DOS-based stuff was really easy to start hacking up!
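For anyone who never got to play with this: driving the parallel port under DOS was just a handful of port writes. A minimal sketch in Turbo C style (the 0x378 base address is the usual LPT1, but yours may differ; outportb/inportb come from dos.h):

    #include <dos.h>   /* outportb()/inportb() in Turbo C / Borland C */

    #define LPT_DATA    0x378  /* data latch                         */
    #define LPT_STATUS  0x379  /* bit 7 = BUSY, inverted in hardware */
    #define LPT_CONTROL 0x37A  /* bit 0 = STROBE                     */

    void lpt_putc(unsigned char c)
    {
        while (!(inportb(LPT_STATUS) & 0x80))   /* wait until printer not busy */
            ;
        outportb(LPT_DATA, c);                  /* put the byte on the data lines */
        outportb(LPT_CONTROL, inportb(LPT_CONTROL) | 0x01);   /* pulse STROBE... */
        outportb(LPT_CONTROL, inportb(LPT_CONTROL) & ~0x01);  /* ...and release  */
    }

    /* e.g. an Epson ESC/P "bold on" escape sequence: */
    void bold_on(void) { lpt_putc(27); lpt_putc('E'); }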
One day I was working at a different branch, filling in for someone on sick leave, and I spent an hour trying to get a PC I had just built to boot reliably. Every time, it crashed after a minute or so. I didn't get it; everything seemed fine.
Eventually it turned out the "cheat sheet" they had of the jumpers was upside down over there. Someone had copy/pasted the pictures to match the orientation of how they usually had the PC on the desk, rather than upside-down as I was used to. But the text was the right way up. So the whole thing looked the same as in our branch, except it wasn't.
It was a square block, so I hadn't noticed the orientation was different. Turned out I had the 100 MHz Pentium configured for 180 MHz. Oops. That wasn't even an officially supported speed of the motherboard, but the BIOS messages indicated it (which I only noticed afterwards).
As we didn't want to sell this CPU after the torture I had put it through, we decided to use it for a display box instead, and we tried to keep it running as long as possible by using compressed-air cans upside down to blow dry ice on it :D It actually ran reliably until the can ran out. Only later did I find out that liquid-nitrogen extreme overclocking was actually a thing :D
So during a long download, I didn't need all 40 MHz screaming along (and heating up the chip to the point that it needed a cooling fan -- a COOLING FAN, can you imagine a CPU running so fast it couldn't cool itself on ambient air?), so I decided to see if the clock generator jumpers were hot-pluggable.
Lo and behold, they were! I could reach in and seamlessly downclock the CPU to 8MHz (which was just one jumper-cap different than the 40MHz setting), which was still plenty to service the UART FIFO interrupt. Unplug the CPU fan too, which made the machine silent. Turn the monitor off, kick back in my chair, and take a catnap. The Telemate terminal software would play a little tune when a download finished, which would wake me up, I'd turn the monitor back on, open a DOS prompt, start unzipping the file, and then reach in and clock the CPU back up so the pkunzip process would finish in a timely manner.
It would do 50MHz but the upper half of RAM would disappear, so there weren't a lot of workloads appropriate for that configuration....
This feels super nitpicky, but I'm curious about your setup, and whether you're remembering the clock speed wrong or the fan was actually completely extraneous: in fact, neither of the common 40 MHz 486 parts, the Cyrix Cx486DX40 or the AMD Am486DX40, required a fan.
The first 486-class CPU that pretty much always ran with active cooling was the DX4/100. Even the DX2/66 could run fanless if you had half decent airflow.
But consensus among everyone _but_ the manufacturers was that additional cooling couldn't hurt. (A representative opinion can be found in Upgrading And Repairing PCs, whatever edition was current at the time.) Running right at the top of Tcasemax wasn't good for longevity in terms of electromigration within the chip itself, nor for the capacitors and other components in the neighborhood. Thermal goop wasn't commonplace yet, but the little heatsinks and fans sold like hotcakes (har!) at the local computer shows. Plain aluminum heatsink, clear (polystyrene?) fan, with a holographic "CRYSTAL COOLER" sticker on top. I still see the fans around, but without the shiny sticker.
The Am486DX-40 was my favorite chip. With a VLB video card (Trident 9400CXi) that worked well on the 40MHz bus, its pure pixel-pushing power ran rings around 33MHz-bus systems regardless of their core clock, and that included the P-75. I later got the impression that I lucked out with that Trident card, as almost everyone else with a 40 or 50MHz VLB machine had tales of woe and flakiness.
If you're not already familiar with it, you'll likely enjoy this trip down memory lane: https://redhill.net.au/ig.html
Yes, running a 40 or 50 MHz bus made a huge difference, especially if you could get VLB graphics running reliably on it. I'm into collecting and tinkering with 486-era machines for nostalgia's sake, and often I see things like DX2 or DX4 systems with plain cheapo 16-bit ISA graphics cards and think: such wasted potential...
Luckily I did not fry it, and after adding a cooler it worked just fine.
I don't remember if AMD or Cyrix CPUs were worse though.
The speed on my Apple FastChip is adjustable in real time too. It's neat to just dial a speed appropriate for the application at hand.
for other interested parties:
If you are interested in programming your Apple in assembly, you can ask nicely for your FastChip to include a 65816 processor. It's going to act like a 65802, due to hardware limitations, but otherwise, yeah, you get the 16-bit instructions to use.
I've not had any compatibility trouble with mine, which is a 65816.
It was fun to turn it on for games that used timing loops for frame rendering, to make the games twice as fast :)
They originated with "Turbo XT" class machines which ran an 8088 but at 8, 10 or 12 MHz -- faster than a real IBM PC/XT. Turbo on meant a faster machine, and turbo off meant 4.77 MHz -- fully compatible with timing-sensitive PC software.
Later, in the 386/486 whitebox PC era, some machines had the buttons wired wrong and now it's a meme that turbo made the computer go slower, but that was never true for systems built correctly.
The game was intended for a stock 7 MHz machine, pre-1990. And it was perfectly timed to that speed.
Sidebar: Ultima games are great. Did you see Nox Archaist?
I loved modifying CONFIG.SYS with a hex editor to translate the MS-DOS 6.2 (?) boot menu.
> With the introduction of CPUs which ran faster than the original 4.77 MHz Intel 8088 used in the IBM Personal Computer, programs which relied on the CPU's frequency for timing were executing faster than intended. Games in particular were often rendered unplayable. To provide some compatibility, the "turbo" button was added. Engaging turbo mode slows the system down to a state compatible with original 8086/8088 chips.
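For anyone wondering what that speed-dependent code actually looked like: often it was nothing more than a busy-wait loop whose count had been tuned by hand on a 4.77 MHz machine. A rough sketch (the constant is made up; real games calibrated it by eye):

    /* Delay "one frame" by spinning the CPU. The loop count was tuned
       on a 4.77 MHz 8088; run the same loop on a faster CPU and it
       finishes several times sooner, so the whole game speeds up. */
    void frame_delay(void)
    {
        volatile unsigned int i;   /* volatile: don't optimize the loop away */
        for (i = 0; i < 9000; i++)
            ;                      /* magic number, hand-tuned */
    }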
I never had such a button in my PCs (first one was a 386SX) but I did see it on other PCs and always wondered what it did... => today I finally found that out :P
Why? What you did is basically a burn-in test. All manufacturers torture their hardware by locking it in a hot room for several days at max speed to see if it fails. The basic theory is that if it's able to survive the torture test, then it's less likely to fail once it's been sold to the customer. Parts for things like space missions go through even more severe torture tests, where they're bombarded by radiation and every horrible thing you can imagine, and that actually makes the price go up!
Just for fun, I loaded Red Hat 5.2 onto the machine and it ran just fine. The syslog was full of bizarre errors, chattering the whole time too.
MS's reasoning was the low quality of the countless low-end power supplies, and maybe of the voltage regulator modules on mainboards, being 'unreasonably' stressed by load changes that fast.
Does anyone know of good write ups or explanations of what makes the 6502 so reliable and what competition it had in being chosen for medical applications?
I taught myself BASIC, assembler, graphics programming and game programming on that machine over a period of about four years of hacking around on it (including hand-commenting some significant chunks of the ROM). By the time I retired it for a shiny new Amiga 1000 in 1986, I'd upgraded it to 256K of bank-switched RAM with a soldered-in hack board, added four floppy drives and various I/O boards, learned OS-9 (a UNIX-inspired multi-tasking, multi-user OS) and hacked in my own extensions to the ROM OS (including adding my own new commands and graphics modes to the BASIC interpreter).
It started out as a lot of trial and error but, on later reflection, ended up being a surprisingly thorough grounding in computer science from which to launch my career. That 6809 machine was also the last time I really felt like I was aware of everything happening in a computer from interrupts to registers to memory mapping down to the metal.
Would be interesting to make a 32 bit Apple 2 style computer. Include a ROM for a means to boot, and leave everything else simple, with some nice slots. Could be a great development / learning machine.
One of the bigger challenges is integrating peripherals. I got bogged down trying to do SD Card interfacing. There are off the shelf bits of IP from Xilinx, etc. you can use to do this, but that sort of defeats the purpose of the exercise.
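For what it's worth, the part that bogs most people down is the SPI-mode initialization handshake, which is fussy but well documented. Roughly, in C (spi_xfer, cs_low and cs_high are hypothetical board-specific helpers; the fixed CRCs for CMD0 and CMD8 come from the SD spec):

    /* Hypothetical helpers: exchange one byte over SPI, drive chip select. */
    extern unsigned char spi_xfer(unsigned char out);
    extern void cs_low(void), cs_high(void);

    static unsigned char sd_cmd(unsigned char cmd, unsigned long arg, unsigned char crc)
    {
        unsigned char r;
        int i;
        spi_xfer(0x40 | cmd);
        spi_xfer(arg >> 24); spi_xfer(arg >> 16); spi_xfer(arg >> 8); spi_xfer(arg);
        spi_xfer(crc);
        for (i = 0; i < 8; i++) {            /* poll for the R1 response */
            r = spi_xfer(0xFF);
            if (!(r & 0x80)) break;
        }
        return r;
    }

    int sd_init(void)
    {
        int i;
        cs_high();
        for (i = 0; i < 10; i++) spi_xfer(0xFF);    /* 80 clocks with CS high */
        cs_low();
        if (sd_cmd(0, 0, 0x95) != 0x01) return -1;  /* CMD0: go idle          */
        if (sd_cmd(8, 0x1AAUL, 0x87) == 0x01)       /* CMD8: voltage check    */
            for (i = 0; i < 4; i++) spi_xfer(0xFF); /* discard rest of R7     */
        do {                                        /* ACMD41 until ready     */
            sd_cmd(55, 0, 0xFF);                    /* CMD55: app-cmd prefix  */
        } while (sd_cmd(41, 0x40000000UL, 0xFF) != 0x00);
        return 0;
    }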
I think modern machines started their slide into mind-boggling complexity when bus speed and CPU speed outstripped RAM speed. So much complexity and unpredictability is in all the infrastructure built around cache.
Something like an Amiga or Atari ST was still not hard to understand all/most of, despite being 16/32 bits.
I have a couple of those. The HYDRA was a lot of fun.
A search of the internet didn't turn up what I was looking for, but I'm very new at hardware work. Perhaps newer chips have such tight timing requirements that you can't work with them without using an SoC?
If you're just looking to breadboard up a computer but don't want to go back to 8-bit processors, the Motorola MC68000 / MC68008 used in the original Apple Macintosh is a 32-bit processor in a DIP package running at a manageably low frequency and can be found on eBay inexpensively.
No. The busses used to access peripherals and memory are not suitable for off-die use. (This goes for all ARM cores, not just Cortex-M0.)
It's quite a bit above the Apple ][ in terms of power.
(and shameless plug, my own 'remix' with better performance and more features: https://floooh.github.io/visual6502remix/)
[Edit: It's WebGL. I don't have WebGL in QubesOS :(.]
I wonder if it's just a function of the time. I imagine anything designed new now would use an ARM based microcontroller but likely when many of these systems were originally designed those were much less common and more expensive.
ARM Cortex M0/M0+ blows AVR out of the water, and is usually cheaper except for the very lowest end AVR parts. Generally will use less power, too. And that's assuming your unit counts are so high that firmware developer time is free.
Of course, it's getting impossible to find 5V VCC ARM parts, so that's something that would steer you towards AVR if having a 5V micro really makes your system a bunch simpler.
I ported an AVR code base to a Cortex M4 last year, and some of the inlined asm didn’t translate. I ended up having to use inlined C instead. So, my 120 MHz M4 chip struggled to do what a 90 MHz AVR did no problem.
AVR32 was neat, but has lost all commercial relevance.
There's an FPGA soft core called nextz80 that's supposed to do 4x more per clock cycle than a normal Z80.
"Works at up to 40MHZ on Spartan XC3S700AN speed grade -4) - performances similar or better than a real Z80 running at 160Mhz."
What struck me is how lean early tools and applications really are. Many are just usable at 1 MHz.
At 16 MHz, things are generally luxurious. Doing graphics, or writing, even running programs in BASIC all make sense and perform.
100 MHz is crazy! Frankly, one could add to the software, take advantage of the fast electronic storage available today, and get real things done.
I wonder just what people will end up doing on a BBC Micro or Apple 8-bit machine fitted with one of these.
I want one! Fun project.
The original article touches on this: the difficulty of interfacing with an Atari 8-bit, C64, etc.
Memory and peripheral access would be seriously wait-stated though. 50 cycles of action doesn't do you much good if memory is slow. Especially when you consider that programs for the 65xx made heavy use of zero page / direct page as an extra bank of pseudo-registers.
So you'd end up implementing some kind of cache, or memory mirroring, or just moving the whole of RAM in the FPGA... and then you start to wonder why you didn't just do the whole thing in FPGA as a C64 SoC.
All reads could be from fast RAM.
Then we have hardware registers and external DMA; those have to be handled specially.
And the Apple has slow RAM and fast RAM in a similar way. Really, to get the machine to run at 16 MHz, it's necessary to copy code into the fast RAM on board the card, leaving system RAM unused.
The Color Computer, Apple 2 and some others were made in a simpler way that did not interrupt the CPU for refresh and/or video access cycles. That makes projects like this easier.
However for "highly integrated" home computers like the Atari 8-bitters and C64 I guess this wouldn't be of much use, because most games and demos depend on proper CPU timing, even when not accessing memory mapped IO regions (for instance in wait-loops to get to the right raster position before reprogramming the video output).
This ends up becoming a very fun design problem when you do it with integrated circuits!
Correct, and even more - the Commodore 64 used a 6510 processor, not a 6502 processor. They're similar but there are significant differences.
Beyond that, a 6510 isn't the only thing you really need to emulate a Commodore 64. You also need a SID chip (MOS 6581) for sound, a MOS VIC-II for display, and a number of other things.
It would be quite easy to modify the design to full 6510, which just has a few more pins dedicated to IO. The biggest issue is properly emulating the bank switching, which they have done for other hardware, but the 6510 has a more complicated scheme.
The bigger problem is that all the RAM is really used for "IO" (in theory anyway) on the C64, as the VICII can remap the character generator (font) location, where it pulls sprites from, and where the screen content is stored.
So a static memory map is insufficient if you want it to just plug into the CPU socket and work.
I always wondered in those days why the disk drives for 8-bit computers were so crazy expensive. In Holland they cost more than the computers they were meant for.
But only later I learned that they were basically another whole computer themselves. Plus the drive mechanism of course, which also wasn't cheap (but not nearly expensive enough to warrant the high price).
It was the same for the Atari 800XL I had, I never owned a commodore 64.
It came out before any of those other home machines, and yet had the cheapest floppy disk storage from 1978 onward. That was largely due to Steve Wozniak's brilliant disk controller design, which did away with everything but some simple glue logic and a couple of ROM chips, handling everything else in software.
Of course, the Apple II had real expansion slots, obviating the need for using a serial connection, too.
From what I can tell, while the Apple II family had a much higher up-front cost, the more serious you were about computing, the more the low-priced home machines with expensive peripherals worked against you in the long run.
OTOH, the time-critical hack that allowed it also made it nearly impossible for Apple to upgrade the II without breaking backwards compatibility. The only Apple II with a faster 6502 is the //c+, and that's because it has the crazy Zip Chip acceleration logic on the motherboard.
The mechanics were also somewhat expensive. In Brazil, an Apple II drive was often as expensive as an Apple II clone.
What makes the intelligent drives a great idea is how easy it is to emulate them - you emulate a nice protocol. When you have to emulate, say, an Apple II drive, you need to emulate the delays the drive mechanics introduce, as well as the head electronics, because the Apple II's 6502 is reading the head and assembling the bits. That's also why accelerating an Apple II requires you to slow it down for a longer time every time it accesses the IO region - because the disk needs to revolve in the exact time the 6502 takes to run some amount of code. With an intelligent peripheral, it doesn't matter you don't wait several seconds between commands, as long as you only issue them at the required speeds.
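To make the contrast concrete, emulating an intelligent drive boils down to a command dispatcher, roughly like this sketch (the command codes and helper functions are illustrative, not the actual Commodore protocol):

    /* Answering commands instead of modeling head-level bit timing.
       disk_image, send_bytes, recv_bytes and sector_offset are
       hypothetical helpers over an in-memory disk image. */
    extern unsigned char disk_image[];
    extern void send_bytes(const unsigned char *p, int n);
    extern void recv_bytes(unsigned char *p, int n);
    extern long sector_offset(int track, int sector);

    enum { CMD_READ_SECTOR = 1, CMD_WRITE_SECTOR = 2 };  /* made-up codes */

    void handle_command(int cmd, int track, int sector)
    {
        switch (cmd) {
        case CMD_READ_SECTOR:   /* no rotation delays, no head electronics */
            send_bytes(disk_image + sector_offset(track, sector), 256);
            break;
        case CMD_WRITE_SECTOR:
            recv_bytes(disk_image + sector_offset(track, sector), 256);
            break;
        }
    }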
The real hacker spirit is alive.
They're compatible enough that you can drop a 6510 into it with no problems (I tested that, to my parents' great despair). You can also swap the IO chips, I think, with various effects (you can at least drop the Amiga CIA chips into a C64 - you lose the realtime clock nobody uses, but gain timers).
Putting a 6502 into a C64 may or may not work, for some values of "work" - I don't recall what the default for the bank switching would be, but the tape drive certainly wouldn't work (the GPIO lines on the 6510 are used for bank switching the ROM, and for the tape). But it should be quite easy to make it work except for the tape drive. You just need to ensure the right voltage on 3 pins for the ROM bank switching (various software that expects to be able to change it will fail, though).
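For reference, those three lines are LORAM, HIRAM and CHAREN (bits 0-2 of the 6510 port at $0001), and the visible memory follows roughly this decode (a simplified sketch; Ultimax mode and the cartridge lines are ignored):

    /* Simplified C64 memory decode from the 6510 port bits:
       LORAM = bit 0, HIRAM = bit 1, CHAREN = bit 2. */
    enum src { RAM, BASIC_ROM, KERNAL_ROM, CHAR_ROM, IO };

    enum src decode(unsigned addr, unsigned char port)
    {
        int loram = port & 1, hiram = port & 2, charen = port & 4;

        if (addr >= 0xA000 && addr <= 0xBFFF && loram && hiram) return BASIC_ROM;
        if (addr >= 0xE000 && hiram)                            return KERNAL_ROM;
        if (addr >= 0xD000 && addr <= 0xDFFF && (loram || hiram))
            return charen ? IO : CHAR_ROM;
        return RAM;                 /* everything else (and all writes) */
    }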
A few games that glitch when there's too much stuff going on at the same time might run smoother.
Demos are likely to mostly not work because any remotely fancy effect tends to depend on much more precise timing, though.
It surprised the end users who’d been ignoring the original SysBeep(1) sounds the application previously used.
100MHz. The software you could run! Add a megabyte or so of full speed, pageable RAM expansion. Every computer language right up to C++ (if it works on the 8-bit Arduino it could be shoehorned into a fast 6502 - limited stack? Who cares, just do the big stack in software. Special zero page? Just use it as glorified CPU registers).
What's the point really? But awesome all the same.
Why wouldn't a faster machine with more RAM be a multitasking machine? (Obviously without extras you are limited regarding security etc, but plenty early multitasking machines didn't have that)
It's easy to write a scheduler for a 6502 as there's so little to save, though you'll need to be very careful about stack usage, and you might do better with a specialised scheduler (e.g. for C64 BASIC) as a lot of code you might want to run may store additional state in fixed locations.
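To illustrate "so little to save": the entire per-task CPU context is basically this (a sketch; the catch is that all tasks share the single 256-byte hardware stack at $0100-$01FF, which is where the care about stack usage comes in):

    /* Everything a 6502 context switch must preserve, modeled in C. */
    struct task_context {
        unsigned char  a, x, y;   /* accumulator and index registers */
        unsigned char  p;         /* processor status flags          */
        unsigned char  sp;        /* stack pointer into page $01     */
        unsigned short pc;        /* program counter                 */
        /* plus the task's slice of the shared hardware stack page,
           and whatever zero-page bytes it uses as pseudo-registers */
    };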
The Game Boy Color has a 16-bit address space and almost all its games are 1MB or larger (although that's ROM rather than RAM). The largest game is 8MB in size - which is managed as 512 banks of 16KB each.
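The trick is the mapper: the CPU only ever sees a fixed 16KB bank at $0000-$3FFF and a switchable 16KB window at $4000-$7FFF, and the cartridge's MBC5 chip holds a 9-bit register choosing which of the up-to-512 banks appears in that window. A sketch of the address math:

    /* MBC5-style ROM banking: a 9-bit bank register selects one of up
       to 512 x 16KB banks for the CPU's $4000-$7FFF window. */
    unsigned char read_rom(unsigned addr, unsigned bank,
                           const unsigned char *rom)
    {
        if (addr < 0x4000)
            return rom[addr];                           /* bank 0, fixed   */
        return rom[bank * 0x4000UL + (addr - 0x4000)];  /* switched window */
    }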
There were plenty of machines kitted out with that amount of RAM. S-100 bus and other multi-user systems in the '70s and '80s could handle dozens of simultaneous users. It's cooperative multitasking, not preemptive multitasking.
Someone who is more active in this field may have a more accurate and broader view than I do.
Is one of the most recent and, for me, significant developments. Note that for companies that use FPGAs, none of the above is considered a hurdle (though their engineers may have a different opinion), and that the hobbyist/hacker market for FPGAs is so insignificant compared to the professional one that the vendors do not care about catering to it.
I think the biggest development, though, is that there's enormously more off-the-shelf Verilog and VHDL, not just on OpenCores like 20 years ago, but also on GitLab, GitHub, and so on. Easy examples are CPUs like James Bowman's J1A, the VexRiscv design used in Bunnie's Precursor: https://github.com/SpinalHDL/VexRiscv (as little as 504 Artix-7 LUTs and 505 flip-flops), and Google's OpenTitan.
But from my POV the more interesting reason for using an FPGA is for things that aren't CPUs. For example, the SUMP logic analyzer and its progeny OLS https://sigrok.org/wiki/Openbench_Logic_Sniffer (32 channels at 200MHz), although I think both of these require the proprietary vendor tools to synthesize. I'm gonna go out on a limb here and guess that reliably buffering up data at 6.4 gigabits per second is not a thing that any CPU can do, even one that isn't a softcore; CPUs that run at speeds high enough to potentially do it invariably depend on cache hierarchies that monkeywrench your timing predictability.
As I said, though, I'm not active in the field, so all I know is hearsay.
I love people who throw themselves at a problem that has no real use, just to master the technology. Kudos!!
The board design is a masterpiece too. Really clean.
"Pentium overdrive": https://en.wikipedia.org/wiki/Pentium_OverDrive
Your suggestion is closer to a grid computer but even then I don't think an unmodified 6502 would be a great choice because the memory model (or lack thereof) would really restrict performance.
Intel did entertain a similar thought for a while, as far as I can understand: https://semiaccurate.com/2012/08/28/intel-details-knights-co...
The LAN controller used to make the Beowulf cluster would probably have more compute (and memory) than the 6502 itself.
The Intel cores in the linked article have distinct L1 data and instruction caches inside them, and associated L2 caches, which makes a big difference in comparison to the 6502.
(link pulled from the references section of the post)
If you were to disable the use of the on-chip RAM it'd be stalled far more than half the time, as it'd be unable to fetch instructions fast enough.
EDIT: Actually you're right that there's a problem with the bank switching here, since it tries to mirror the system RAM/ROM, and it won't be able to, as it has only 64K on-chip RAM. You could conceivably get it to work by designating the entire address space as an "IO area", but it'd totally kill performance.
"It may be possible and worthwhile to also support some slightly later machines: The Acorn BBC Micro, Atari 400 and 800, and maybe the Commodore C64 come to mind."
So support is definitely under consideration.
If you disable the RAM mirroring, all you need to make it compatible is to map the 6 IO pins to address $1. That "solves" the bank switching, but at the cost of killing performance totally as the chip will be starved for memory access most of the time.
Judging from his pictures, he's using a version of the Spartan 6 (XC6SLX9) that has 72KB on-chip RAM, though, so unless he's using any of the RAM for anything else, he could still mirror both the 64KB RAM and the KERNAL and BASIC ROMs. But he'd also need to keep track of various VICII registers to know which areas to designate as "IO areas" to pass through writes for, given the VICII can address memory "everywhere" for sprites, fonts and bitmap data depending on what you write to different registers. Since that can change at any time, it'd involve a lot of "fun" logic to flush data from the on-chip cache to the C64 memory whenever a register changes.
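Modeled in C rather than HDL, the write path of such an accelerator would need roughly this shape (a sketch; all names are made up, and the VICII's pointer registers are reduced to a single vic_watches() test that would be re-derived whenever the $D000 registers change):

    /* Every write lands in fast on-chip RAM, but writes to regions the
       VICII might fetch from (screen, font, sprite data) must also go
       out to the real C64 bus at slow-bus speed. */
    extern unsigned char fast_ram[65536];
    extern void slow_bus_write(unsigned addr, unsigned char v);
    extern int  vic_watches(unsigned addr);  /* derived from VICII/CIA2 regs */

    void cpu_write(unsigned addr, unsigned char v)
    {
        fast_ram[addr] = v;              /* always keep the fast copy   */
        if ((addr & 0xF000) == 0xD000) { /* IO page: pass through, and  */
            slow_bus_write(addr, v);     /* re-derive the watch regions */
            return;
        }
        if (vic_watches(addr))           /* VICII-visible RAM: mirror   */
            slow_bus_write(addr, v);
    }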
The control and protocol handling part of the chip was a modified 6502 core with a sort of MMU and single-cycle zero page registers. The whole thing was clocked at 33 MHz, probably making it the fastest 6502 in production at that time. Not that we sold that many of the devices...
As an old C64 scener, I really enjoyed being able to code the application SW using my favorite ASM tools. Though most of the code we actually compiled from C using cc65 and then hand-tuned to fit the mem constraints.
Today a simple Cortex M0+ MCU (with internal AES core) would be able to do what we did, and probably be smaller and require less power.
Modern Cortex M0+ chips are probably manufactured on 90 nm or 65 nm process nodes (or possibly even smaller - but then the die size becomes I/O-bound, though you can add more memory easily without driving up the die size). They have a much lower core supply voltage, much better I/Os, and low-power modes.
In our specific case, we used, I believe, a 250 nm or possibly even 350 nm ASIC process. And size in this specific case was also related to the package. We used a QFN. Today you can get an M0+ based MCU with a low number of exposed I/Os in a small BGA or WCP package that is just a few mm2.
The idea we had at the start was a chip small enough to fit inside the connector of a serial wire, requiring so little power that it wouldn't need external power (basically harvesting), fast enough not to reduce the bitrate, adding very low and fixed latency, and (after configuration) totally transparent as seen from the application. Basically a secure serial cable, reduced to an extra cable connector inserted between an IoT/SCADA device and its serially connected modem.
Due to the process node we didn't really get there. But today this is basically feasible using off-the-shelf MCUs.
I actually found a few of the chips and one of the cable connectors/dongles yesterday. So I still have a few 33 MHz 6502s ;-)
You can buy them still:
I like this idea. This may bring a new level of repairability for devices whose chips will, one day, no longer be manufactured.