Windell and I talked about building a discrete 7400 (quad two-input NAND gate); in theory you can build any logic you want out of enough NAND gates :-) but implementing the CPU of the PDP-8 that way was going to be a very, very big arrangement. At 13 inches on a side, this is pretty manageable. A lot of boards for the MicroVAX are larger than that.
Now the question is if you make a proportionally sized 40 pin package for it, how big is the Apple II motherboard? :-)
Nicely done, congrats on an awesome project. I must go to Maker Faire now to see this in person.
On the other hand, something like a KIM-1 would be totally doable, although there would have to be changes to the software timing loops in the cassette and TTY interfaces. Oh, and I suppose if you clocked it below 10 kHz or so, the keypad debounce routines might need some work. But from memory there would be no hardware changes necessary to downclock a KIM-1. There is a hardware PLL in the cassette interface input, but all it does is sniff whether the incoming tone is above or below some frequency.
A KIM-1 was a rather simple 6502 single board computer from the dawn of the microcomputer era.
As for a 2016 design in fewer transistors, that's unlikely. You need a certain number of transistors for each basic function, so there's a floor: if you want all the features of a 6502 (such as they were), you can't possibly do it without some minimum number X of transistors.
But what 2016's tech would bring is a 6502 that, instead of being clocked at 1.5–2 MHz, might be clocked at a couple of GHz. Performance-wise, such a chip would be vastly faster than an original 6502, but would pale in comparison to a modern Intel chip with all the extras (cache, instruction translation, out-of-order issue, branch prediction, etc.) included in those chips. The 6502 would also be quite acutely sensitive to the speed of memory, so the chip would not be able to run any faster than RAM could feed it data (it has only three user-accessible 8-bit registers, and only one of those can be used for computations). It worked with 1970s tech because 1970s memories were as fast as it was, so it was not slowed down by a huge memory-vs-CPU speed differential. This lack of registers is where its design diverges from RISC designs, as detailed by Patterson.
What a modern 6502 design might do, however, is be extremely power efficient. A modern 2016 CMOS design clocked at 1970's speeds might use very little power. Whether it would beat ARM in that market is an unknown.
Its biggest limitation for a 2016 design that is true to the original is being only an 8-bit chip with only a 16-bit address bus. Having at most 64 KB of RAM on one's CPU in 2016 is going to crimp the range of solutions it might be useful for vs. using an ARM chip for the same job.
A new 65C02 clocks up to 20 MHz (apparently) without any real issues. And like you said, WDC can provide custom cores that do much more.
For most applications in 2016 that could actually use a 6502, they probably don't need it to run that fast (and more speed would mean more power usage). New pin-compatible 6502s are still sold today for embedded applications: http://www.tomshardware.com/news/mouser-6502-motorola-6800-c...
But if not for pin-compatibility, instead of being 4.3mm x 4.7mm, a 6502 could be a speck of dust, with the only size constraints being connections to other components.
> What a modern 6502 design might do, however, is be extremely power efficient. A modern 2016 CMOS design clocked at 1970's speeds might use very little power. Whether it would beat ARM in that market is an unknown.
The 6502 design from 2012 that I linked above uses 300µA.
> The 6502 would also be quite acutely sensitive to the speed of memory, so the chip would not be able to run any faster than RAM could feed it data (it has only three user-accessible 8-bit registers, and only one of those can be used for computations). It worked with 1970s tech because 1970s memories were as fast as it was, so it was not slowed down by a huge memory-vs-CPU speed differential.
The 6502 also only addressed 64kB of memory (though variants like the 6509 supported up to 1MB via multiple banks). With 2016 technology, you could easily supply all the RAM a 6502 could ever want or use as on-die SRAM that matches the CPU speed.
Well, a thousand times faster.
If you look at the instruction set, you can see that it is relatively horizontal and pretty straight forward to decode and implement with a simple sequencer.
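To make the "horizontal and straightforward to decode" point concrete: most (not all) 6502 opcodes split into regular aaabbbcc bit fields. A quick sketch, with the cc = 01 operation table given from memory, so treat the details as illustrative rather than authoritative:

```python
# Sketch of the 6502's regular opcode layout: most (not all) opcodes
# split into "aaabbbcc" fields, where cc picks an instruction group,
# aaa the operation within the group, and bbb the addressing mode.

def decode_fields(opcode):
    """Split an opcode byte into its aaa / bbb / cc bit fields."""
    aaa = (opcode >> 5) & 0b111
    bbb = (opcode >> 2) & 0b111
    cc = opcode & 0b11
    return aaa, bbb, cc

# Operations of the cc = 01 group, indexed by aaa:
GROUP_01_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]

aaa, bbb, cc = decode_fields(0xAD)   # 0xAD is LDA absolute
print(GROUP_01_OPS[aaa])             # -> LDA
```

A decoder built as a sequencer plus a PLA over fields like these needs far less logic than a fully irregular instruction set would.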
Of course, it was implemented in NMOS like most of its contemporary microprocessors, so it was a power hog by today's standards, but the transistor count per gate was lower. The 1802 was CMOS, but it was aimed at applications that required space-grade parts, so it was low power and had transistors the size of a cow turd in order to be more resistant to alpha-particle hits. It was also slow.
For comparison, at the time I was playing with 6502's as a hobby, I was at my day job a mainframe CPU logic designer working in 100K ECL -- I was working on a machine roughly equivalent to the Cray-1, and our gate count was roughly 250,000 gates.
The logic design for a CPU the size of a 6502 is really not too bad -- sort of the scale of a largish homework assignment as a semester final -- constructing something like it in a gate level simulator might be an interesting exercise for the motivated.
You're around an order of magnitude off. It's 3.5K transistors, and slightly less than half of those are NMOS pullups, with many of the gates containing 2-3 or more transistors, so it's less than 1K gates. That of course depends on what you count as a "gate", since many of them are compound-gates like AND-OR-INVERTs and there's also transmission-gate logic too. The decode PLA also has tons of large-input gates (hence many transistors-per-gate).
This article may be related:
The transistor count itself was very low even for its day, so while that isn't my forte, I would guess that gains in that metric wouldn't be all that great.
Why fewer transistors though?
The F18A is a very eccentric design, though: it has 18-bit words (and an 18-bit-wide ALU, compared to the 6502's 8, which is a huge benefit for multiplies in particular), with four five-bit instructions per word. You'll note that this means that there are only 32 possible instructions, which take no operands; that is correct. Also you'll note that two bits are missing; only 8 of the 32 instructions are possible in the last instruction slot in a word.
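The slot arithmetic above can be made concrete with a toy unpacker for those 18-bit words (5 + 5 + 5 + 3 bits). This ignores any encoding quirks of the real silicon, so take it purely as an illustration of the bit budget:

```python
# Illustration of the F18A word layout: four opcode slots of 5, 5, 5,
# and 3 bits.  The final slot supplies only the top 3 bits of a 5-bit
# opcode (low two bits implied zero), so just 8 of 32 opcodes fit there.
# This ignores any encoding quirks of the real hardware.

def unpack_word(word):
    assert 0 <= word < (1 << 18)
    slot0 = (word >> 13) & 0b11111
    slot1 = (word >> 8) & 0b11111
    slot2 = (word >> 3) & 0b11111
    slot3 = (word & 0b111) << 2    # low two opcode bits forced to zero
    return slot0, slot1, slot2, slot3

print(unpack_word(0b10101_01010_11111_001))   # -> (21, 10, 31, 4)
```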
Depending on how you interpret things, the F18(A) has 20 18-bit registers, arranged as two 8-register cyclic stacks, plus two operand registers which form the top of one of the stacks, a loop register which forms the top of the other, and a read-write register that can be used for memory addressing. (I'm not counting the program counter, write-only B register, etc.)
Each of the 144 F18A cores on the GA144 chip has its own tiny RAM of 64 18-bit words. That, plus its 64-word ROM, holds up to 512 instructions, which isn't big enough to compile a decent-sized C program into; nearly anything you do on it will involve distributing your program across several cores. This means that no existing software or hardware development toolchain can easily be retargeted to it. You can program the 6502 in C, although the performance of the results will often make you sad; you can't really program the GA144 in C, or VHDL, or Verilog.
The GreenArrays team was even smaller than the lean 6502 team. Chuck Moore did pretty much the entire hardware design by himself while he was living in a cabin in the woods, heated by wood he chopped himself, using a CAD system he wrote himself, on an operating system he wrote himself, in a programming language he wrote himself. An awesome feat.
I don't think anybody else in the world is trying to do a practical CPU design that's under 100 000 transistors at this point. DRAM was fast enough to keep up with the 6502, but it isn't fast enough to keep up with modern CPUs, so you need SRAM to hold your working set, at least as cache. That means you need on the order of 10 000 transistors of RAM associated with each CPU core, and probably considerably more if you aren't going to suffer the apparent inconveniences of the F18A's programming model. (Even the "cacheless" Tera MTA had 128 sets of 32 64-bit registers, which works out to 262144 bits of registers, over two orders of magnitude more than the 1152 bits of RAM per F18A core.)
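The bit counts in that parenthetical check out; a quick sanity check using only the figures quoted above:

```python
# Sanity check on the figures above: register bits in the Tera MTA
# versus RAM bits per F18A core.

mta_bits = 128 * 32 * 64          # 128 register sets x 32 regs x 64 bits
f18a_ram_bits = 64 * 18           # 64 words x 18 bits of RAM per core

print(mta_bits)                          # -> 262144
print(f18a_ram_bits)                     # -> 1152
print(round(mta_bits / f18a_ram_bits))   # -> 228, two-plus orders of magnitude
```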
So, if you devote nearly all your transistors to SRAM because you want to be able to recompile existing C code for your CPU, but your CPU is well under 100k transistors like the F18A or the 6502, you're going to end up with an unbalanced design. You're going to wish you'd spent some of those SRAM transistors on multipliers, more registers, wider registers, maybe some pipelining, branch prediction, that kind of thing.
There are all kinds of chips that want to embed some kind of small microprocessor using a minimal amount of silicon area, but aren't too demanding of its power. A lot of them embed a Z80 or an 8051, which have lots of existing toolchains targeting them. A 6502 might be a reasonable choice, too. Both 6502 and Z80 have self-hosting toolchains available, too, but they kind of suck compared to modern stuff.
If you wanted to build your own CPU out of discrete components (like this delightful MOnSter!) and wanted to minimize the number of transistors without regard to the number of other components involved, you could go a long way with either diode logic or diode-array ROM state machines.
Diode logic allows you to compute arbitrary non-inverting combinational functions; if all your inputs are from flip-flops that have complementary outputs, that's as universal as NAND. This means that only the amount of state in your state machine costs you transistors. Stan Frankel's Librascope General Precision LGP-21 "had 460 transistors and about 300 diodes", but you could probably do better than that.
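A little truth-table demo of that universality argument (pure software, of course; the diode_* functions merely stand in for what a diode AND/OR network computes):

```python
# Diode networks compute only non-inverting AND / OR.  But if each state
# flip-flop also exposes its complement /Q, De Morgan lets you push all
# inversions back to the inputs: NAND(a, b) = OR(/a, /b).

def diode_and(*inputs): return int(all(inputs))
def diode_or(*inputs):  return int(any(inputs))

def nand_from_diodes(a, not_a, b, not_b):
    # No inverting gate used; the complements come "for free" from /Q.
    return diode_or(not_a, not_b)

for a in (0, 1):
    for b in (0, 1):
        assert nand_from_diodes(a, 1 - a, b, 1 - b) == int(not (a and b))
print("NAND reproduced from non-inverting gates")
```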
Diode-array ROM state machines are a similar, but simpler, approach: you simply explicitly encode the transition function of your state machine into a ROM, decode the output of your state flip-flops into a selection of one word of that ROM, and then the output data gives you the new state of your state machine. This costs you some more transistors in the address-decoding logic, and probably costs you more diodes, too, but it's dead simple. The reason people do this in real life is that they're using an EPROM or similar chip instead of wiring up a diode array out of discrete components. (The Apple ][ disk controller was a famous example of this kind of thing.)
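The same idea as a toy simulation: the "ROM" here is just a Python dict playing the role of the diode array, and the example machine (my own arbitrary choice, not from any real design) spots the input pattern 1, 0, 1:

```python
# Toy version of the diode-ROM state machine idea: the "ROM" maps
# (current state, input bit) to (next state, output bit).  In hardware
# the table would be a diode array addressed by the state flip-flops.
# This example machine detects overlapping occurrences of 1, 0, 1.

ROM = {
    # (state, input): (next_state, output)
    (0, 0): (0, 0),
    (0, 1): (1, 0),   # saw "1"
    (1, 0): (2, 0),   # saw "1,0"
    (1, 1): (1, 0),
    (2, 0): (0, 0),
    (2, 1): (1, 1),   # saw "1,0,1": emit 1, overlap back to state 1
}

def run(bits):
    state, outputs = 0, []
    for b in bits:
        state, out = ROM[(state, b)]
        outputs.append(out)
    return outputs

print(run([1, 0, 1, 0, 1]))   # -> [0, 0, 1, 0, 1]
```

Swapping in an EPROM for the diode array changes nothing about the logic, only the component count.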
Example: the 8051. The original had ~50K transistors, about 30K of them in the CPU core. 50K transistors ≈ 17–25K NMOS gates. Performance was ~0.0095 DMIPS/MHz, with 12 clocks per machine cycle (microcoded).
Current state of the art is http://www.dcd.pl/ipcores/56/
Fast: the DQ80251 at 13K/20K gates (CPU core / whole microcontroller), 0.70579 DMIPS/MHz ≈ 75 times faster per MHz. It can be clocked at >300 MHz when implemented in an ASIC.
Small: the DT8051 at 5,600 gates ≈ 12–17K transistors in NMOS (of course nobody does that anymore), ~23–34K transistors in CMOS, 0.0763 DMIPS/MHz ≈ 8 times faster per MHz. It can be clocked at >200 MHz when implemented in an ASIC.
But this is a legacy, inefficient design. ARM laughs at it: the Cortex-M0 is 12K gates and >1 DMIPS/MHz at a tenth of the power, and the Cortex-M4 at 65K gates reaches 1.9 DMIPS/MHz.
In those days if you were really interested in computers you tended to go lower level and learn assembler.
Let's not forget the Apple I
EDIT: and the Apple II, for that matter...
Mine still runs, not ran :)
I'm confused by this bit - does this mean that there's a bug in Visual6502?
Hardware is weird.
But I was thinking of gate capacitance, some voodoo impedance matching (or even mismatching) to prevent trouble from some reflection somewhere. In that case, the discrete board can't be expected to have the same set of problems.
But even if the transistors played
There are signs of this patching in other parts of the chip too, like this one:
(sorry for the useless comment, but really, I love this so much)
What would be really nice to see is a photo of it sitting next to a real 6502 for size comparison.
If this actually works I wonder if he'd ever consider doing a kit. Would make an awesome display piece.
> Is it expensive?
> It is definitely not cheap to make one of these. If we had to ballpark what one of these would sell for — assembled and working — it would certainly be larger than $1k and smaller than $5k.
I think this is out of range for many hobbyists and even schools and the like. Projects like the ErgoDox show that kits in the range of a few hundred bucks can sell well.
> While the circuit board itself is large and a little bit expensive, the cost is actually dominated by the component and assembly costs of an extremely large number of tiny components, each of which is individually quite inexpensive. Add to that the setup and test costs of building complex things like these in small batches, and you'll immediately see how it adds up.
So, besides (possibly community-driven) bulk buying, the only way to bring the price down below USD 1000 is a kit version.
> Is there going to be a soldering kit version of this?
> No. (But on the other hand, "Anything is a soldering kit if you're brave enough!")
This brings me to my question: Is soldering this even realistic?
Did you solder the prototype yourself? How long did that take?
I soldered the SMD diodes of a few Ergodoxen (76 for a board) and it gets boring quickly.
Can't imagine doing 4304 parts.
Disclaimer: I'm getting old. In the old days we soldered our S-100 computers, and something like an 18-slot backplane had 1,800 connections, and generally worked. It was not unusual for a single card to have maybe 500 or so IC pins, 50 decoupling caps (100 joints), and let's say 100 pins for jumpers and connectors. So it would be equivalent to making an entire S-100 computer. I would estimate many tens of hours total.
Also I never did it but no small number of people soldered up IBM PC clone motherboards. There were also clones of Apple-II and TRS-80 model 1 in kit form.
Surface mount is a lot easier because once you learn how (and after 1,000 or so components you'll be pretty good) there is no more flipping the board upside down over and over or snipping off wires. After 0204 RF chokes and microwave capacitors, it's nice to slum with giant digital logic parts so big you can pick them up with your fingers. Some of the larger IC packages (around, say, 100-pin TQFP size) are so large that the device is no longer affected by solder surface tension.
You rapidly learn tricks like using the same brand of IC socket throughout the whole board and keeping a wooden board around the size of the PCB so you can stuff, cover, flip, and solder all the IC sockets simultaneously. Another trick is to always remove the flux, not because it electrically matters but because you can't do it without 100% inspection of each joint, and you'll probably find one or two joints to clean up per board. Because you probably don't own a wave solder machine, it also saves time to solder bypass caps on the solder side, but beware of clearance issues: the board might not fit anymore, LOL.
On my infinite list of things to build is the transistor clock, with around 2,700 solder joints. Totally doable. That might be a good place to start.
P.S. I'd love to see a 4004 kit too.
What about an 8080 or 6800? Or Z-80.
52926 BYTES FREE
Welcome to BASIC, Ver. 1.3
<TDL Z-80 8-K VERSION>
Soon: how to win at Lunar Lander.
A 6800 or 8080 at the same density would be slightly larger, and a Z80 is more than twice as large. Even the first Pentium has almost 1000x more transistors.
It sounds like the process of analyzing the Z80 in a similar way is starting: http://www.visual6502.org/wiki/index.php?title=Z8400
Now that Intel and others are facing issues at ~5-10nm scale, we will be facing a similar "clock" problem again. There are a few paths forward including: i) smarter microprocessor design tuned to intended application use, and ii) increased parallelization of tasks across multiple cores/cpus/machines.
Edit: nanometer -> sub-micrometer
707 x .32m ~= 230m
evilmadscientist's datasheets are fantastic tools for learning
Are there higher-drive-strength transistors available that could make it faster?
PS. love this project.