I remember first visiting this page when I was 11 years old and I couldn't understand a thing. I revisited it multiple times over the years, and it contributed greatly to my understanding of CPUs; my first self-designed Logisim CPU was heavily based on it :)
> Eight 74172s provide eight 16-bit registers in a three-port register file. This file may simultaneously write one register ("A"), read a second ("B"), and read or write a third ("C").
It really does do all that with just the 74172s. The 74172 is a register file containing eight 2-bit words, with multiple ports for reading and writing, split into two sections.
Section 1 has independent read and write ports. The write port consists of data input DA[1..0], address AA[2..0] and write enable ~WEA. If ~WEA is low, data is written from DA to the register selected by AA on the positive edge of the clock. The read port consists of data output QB[1..0], address AB[2..0], and read enable ~REB. When ~REB is low, the contents of the register selected by AB are output on QB.
Section 2 has another set of read and write ports, but this time with a common address. The write port is data input DC[1..0], the read port is data output QC[1..0], the shared address is AC[2..0], and the read and write enables are ~REC and ~WEC.
In the PISC, there are eight of these chips with all their control lines tied together, so you get a single 8x16 register file with all of the features described above.
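If it helps to see it in code, here's a minimal sketch in Python of how the ganged-up file behaves; the class and method names (`RegisterFile`, `write_a`, `read_b`, etc.) are my own, not from the 74172 datasheet or the PISC docs:

```python
class RegisterFile:
    """Toy model of eight 74172s ganged into an 8 x 16-bit three-port file.

    Section 1: independent write port (DA/AA/~WEA) and read port (QB/AB/~REB).
    Section 2: shared-address port (DC/QC/AC) with separate ~WEC/~REC enables.
    Enables are active low; writes happen on the clock edge, so the model
    latches pending writes and commits them in clock_edge().
    """

    def __init__(self):
        self.regs = [0] * 8
        self._pending = []          # (address, data) pairs latched before the edge

    # --- Section 1 ---
    def write_a(self, aa, da, wea_n=0):
        if wea_n == 0:              # active-low write enable
            self._pending.append((aa, da & 0xFFFF))

    def read_b(self, ab, reb_n=0):
        if reb_n == 0:              # active-low read enable
            return self.regs[ab]
        return None                 # outputs disabled (high-Z on the real chip)

    # --- Section 2 (common address AC) ---
    def write_c(self, ac, dc, wec_n=0):
        if wec_n == 0:
            self._pending.append((ac, dc & 0xFFFF))

    def read_c(self, ac, rec_n=0):
        if rec_n == 0:
            return self.regs[ac]
        return None

    def clock_edge(self):
        """Commit all enabled writes on the positive clock edge."""
        for addr, data in self._pending:
            self.regs[addr] = data
        self._pending.clear()
```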
> In a single clock cycle, the following occurs:
> a) one register is output to the Address bus and the ALU's A input;
...using section 1's read port.
> b1) another register may be output to the Data bus and the ALU's B input; or
...using section 2's read port.
> b2) data from memory may be input to another register;
...using section 2's write port (a sketch of the full cycle follows below).
> c) an ALU function is applied to A (and perhaps B) and the result is stored in the first (address) register.
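Putting those steps together, a single clock cycle against the toy model above looks roughly like this. The port used for the write-back in (c) is my inference (section 1's write port, sharing the AA address), and `alu` is just a placeholder:

```python
def clock_cycle(rf, alu, a_addr, b_addr, mem_data=None):
    """One PISC-style cycle against the toy RegisterFile above."""
    # a) register A out to the address bus and the ALU's A input (section 1 read)
    a_value = rf.read_b(a_addr)

    if mem_data is None:
        # b1) register B out to the data bus and the ALU's B input (section 2 read)
        b_value = rf.read_c(b_addr)
    else:
        # b2) data from memory written into another register instead (section 2 write)
        rf.write_c(b_addr, mem_data)
        b_value = None              # the ALU may ignore B in this case

    # c) ALU result stored back into the first (address) register (section 1 write)
    rf.write_a(a_addr, alu(a_value, b_value))

    rf.clock_edge()
```

So, for example, `clock_cycle(rf, lambda a, b: (a + b) & 0xFFFF, 3, 5)` would add R5 into R3 in one cycle.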
I guess that the 6502 has about four times as many NAND gates as the Gigatron. I understand that some people are working on a 6502 emulator for the Gigatron. The Gigatron is a (rather extreme) RISC processor: it has no microcode, and instruction decoding consists of a simple diode matrix. This results in many instructions that perform the same operation, or no operation at all. The Gigatron runs a program that generates the VGA signal (at reduced resolution) along with an emulator for a 16-bit CISC processor; the actual programs executed by the Gigatron are written for that emulator.
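To make the diode-matrix point concrete, here is a rough sketch of what that kind of decode amounts to: a fixed combinational mapping from instruction bits to control lines. The field layout and control-line names below are invented for illustration and are not the Gigatron's actual encoding:

```python
# Toy diode-matrix decode: each control line is a fixed function of a few
# opcode bits, exactly the kind of wiring a diode matrix gives you.
def decode(opcode):
    op   = (opcode >> 5) & 0b111   # "operation" field (invented layout)
    mode = (opcode >> 2) & 0b111   # "addressing mode" field
    bus  = opcode & 0b11           # "bus select" field
    return {
        "alu_add":   op == 0b001,
        "alu_and":   op == 0b010,
        "write_ram": mode == 0b110,
        "bus_ram":   bus == 0b01,
        "load_acc":  op in (0b000, 0b001, 0b010),
    }

# Because there is no "illegal instruction" logic, opcodes that share the same
# field values decode to identical control signals, and some combinations
# assert nothing useful at all -- hence the many aliases and no-ops.
```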
A similar approach to demonstrate a GPU would be great. Any recommendations?
Well, let me rephrase. A GPU these days has two distinct features: graphics processing and GPGPU. I'm less interested in the graphics part (since that pipeline could be studied in software, and in hardware it's very specialized/ASICy).
So I'm really interested in the massively-parallel GPGPU aspect of a GPU.
These kinds of projects always take you up to where CPUs were in the early 1950s on Large Systems or the 1970s in the home computing world: Single-issue processors with no memory protection or privilege levels. They work, in that you can write useful software for those systems, but taken as a way to explain how CPUs work in a holistic fashion they fall well short. They simply aren't complex enough to explain why Meltdown happens, for example: Since there is no concept of privilege to begin with, you can't use them to explain privilege level violations. More prosaically, you can't explain a "cold cache" when the processor doesn't have a cache which can be cold.
This is demoralizing for the poor sods who think they're going to learn how CPUs work and end up with a CPU design which is decades out of date, with no way to extend it to even a thirty-year-old design. "You can't get there from here" is the bane of tutorials which explain the basics and then stop.
Still, understanding any general-purpose CPU gets you most of the way there. Virtual memory and memory protection aren't terribly far off; they could be implemented using paging. That would give you a very simple but feature-complete system: a basic model, yet a working one, and a union between theory and practice.
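As a sketch of how little is needed, here's a toy single-level page table with present/write/user bits; the field layout is made up, but it's enough to demonstrate both memory protection and a privilege check:

```python
# Minimal single-level page table sketch (field layout invented for illustration).
PAGE_SIZE = 256          # tiny pages to keep the toy small

class PageFault(Exception):
    pass

def translate(page_table, vaddr, is_write, user_mode):
    """Translate a virtual address, enforcing present/write/user bits."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)           # {"frame": n, "w": bool, "user": bool}
    if entry is None:
        raise PageFault(f"page {vpn} not present")
    if is_write and not entry["w"]:
        raise PageFault(f"write to read-only page {vpn}")
    if user_mode and not entry["user"]:
        raise PageFault(f"user access to supervisor page {vpn}")
    return entry["frame"] * PAGE_SIZE + offset

# e.g. translate({0: {"frame": 3, "w": False, "user": True}}, 0x12,
#                is_write=False, user_mode=True) -> 0x312,
# while a write to the same page raises PageFault.
```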
Meltdown comes down to memory and cache shenanigans: it's more a case of how things can go wrong when you optimize too hard. While important, that feels like a separate line of thinking.
That seems a bit harsh. You have to start somewhere, and this teaches basic stuff like an ALU, clock, accumulator, etc. Some homebrew CPUs even have memory bank switching.
I don't intend to be harsh, but I am disappointed that there's apparently no ramp-up from the basics to a modern CPU design, that you get taken to some point where everything works but there's no path to stuff like out-of-order execution.
- H&P does a very good job of giving updated arch knowledge, but it's mainly "theoretical"
- In order to build/simulate modern arch features, a lot of work needs to be done.
So we have simulators like gem5. Do you think what's seriously lacking is, for some expert to sit down with H&P and gem5, implement a ton of modern features, analyze them, and write a big textbook about it along the lines of 'Practical Computer Architecture'?
> Do you think what's seriously lacking is, for some expert to sit down with H&P and gem5, implement a ton of modern features, analyze them, and write a big textbook about it along the lines of 'Practical Computer Architecture'?
That would certainly help. Another good step in the right direction would simply be taking a processor with a good mix of modern features and giving a guided tour. Everything is done by something, so show all of those somethings, how each one works, and how they fit together into a complete design.
Consider researching different college CPU architecture classes. My friend took one that involved building a CPU with features like out-of-order execution, etc., while mine did not.
I am speaking out of ignorance, but my understanding is that modern graphics pipelines consist of software "shader" programs running on the GPGPU hardware. The GPU makers aren't including vast numbers of compute cores just as a bonus feature; they're how the graphics part works. Each little core runs the same shader program to render its set of pixels, reading from and writing to different offsets in shared memory buffers. The "general purpose" use basically boils down to writing a shader program that does useful math rather than drawing pretty pictures.
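Stripped of the hardware, that model is just "run the same function over every index, writing to different offsets". A rough Python sketch of that shape (the names `shade` and `dispatch` are mine, and threads stand in for the thousands of GPU lanes):

```python
from concurrent.futures import ThreadPoolExecutor

def shade(index, src, dst):
    """One 'shader invocation': same code for every index, different offsets."""
    dst[index] = src[index] * 2 + 1       # pretty pictures or useful math, same shape

def dispatch(shader, n, src, dst, workers=8):
    """Launch n invocations of the same shader, roughly like a GPU dispatch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda i: shader(i, src, dst), range(n)))

src = list(range(16))
dst = [0] * 16
dispatch(shade, len(src), src, dst)
print(dst)
```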
2. Polygon rasterization (often limited to points and triangles).
3. Texture sampling. This is accessible in CUDA (I have ~zero experience with other GPGPU systems).
4. AFAIK also for blending, though this might have stopped by now.
5. Video codec decoding (MPEG-2, H.263, H.264, H.265, VP8, VP9, soon AV1), and also encoding for some of them.
Nvidia RTX also includes ray-tracing hardware that handles that task more efficiently (I presume by using fixed logic for dispatching memory/cache-aware computations, e.g. content-addressable memory and such).
Most things are handled by the shader cores. They are 1024-bit SIMD with lane masking until Volta, and a more flexible/arbitrary fork/join since Turing (not all Turing parts have the ray-tracing hardware), which also brought a scalar execution port with it (a bit like if amd64 had only ever had AVX instructions and later gained traditional RAX/RDX etc. with their own opcodes). AMD GCN AFAIK has a quite explicit SIMD architecture, with a scalar execution port since inception; also 1024-bit, IIRC.
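For anyone unfamiliar with the term, "SIMD with lane masking" roughly means every lane executes both sides of a branch, and a per-lane mask decides which result each lane keeps. A toy sketch of that behaviour (lane count and the example condition are arbitrary):

```python
# Toy SIMT lane masking: every lane executes both branch paths; the mask
# decides which lanes keep which result.  32 lanes of 32 bits is roughly the
# 1024-bit width mentioned above.
LANES = 32

def masked_branch(values):
    mask = [v % 2 == 0 for v in values]          # per-lane condition
    then_result = [v // 2 for v in values]       # all lanes run the "then" path
    else_result = [3 * v + 1 for v in values]    # all lanes run the "else" path
    return [t if m else e for m, t, e in zip(mask, then_result, else_result)]

print(masked_branch(list(range(LANES))))
```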
Now that you mention it, could you point to any reading resources for getting into the nitty-gritty of the fixed-hardware part of the GPU?
From my interactions with some folks, I'm under the impression that this stuff is kept proprietary and that neither nVidia nor AMD reveals the details.
I'm just wondering if there are some general principles related to the fixed hardware aspect of the graphics pipeline of a GPU, that are compiled in a text or review paper or something.