Clocks for Software Engineers (zipcpu.com)
434 points by mr_tyzic on Sept 19, 2017 | 59 comments

One of my favorite undergrad electrical engineering classes [0] took an innovative approach to introducing this. Instead of learning about clocks/pipelines and HDL at the same time, we only looked at the former. We created our own simulators for an ARM subset, fully in C, where there was only a single for/while loop allowed in the entire codebase, representing the clock ticks. Each pipeline stage, such as Instruction Fetch, would read from a globally instantiated struct representing one set of registers, and write to another one. If you wanted to write to the same place you read from, you could only do so once, and you'd better know exactly what you were doing.

Because we didn't need to learn a new language/IDE/environment at the same time that we learned a new paradigm, we were able to keep our feet on solid ground while working things out; we were familiar with the syntax, so as soon as we realized how to "wire something up," we could do so with minimal frustration and no need/ability to Google anything. Of course, it was left to a subsequent course to learn HDL and load it on real hardware, but for a theoretical basis, this was a perfect format. Much better than written tests!

[0] http://www.cs.princeton.edu/courses/archive/fall10/cos375/de... - see links under Design Project, specifically http://www.cs.princeton.edu/courses/archive/fall10/cos375/Cp...
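A minimal sketch of the single-loop, two-register-set idea described above, in Python rather than the course's C. The three stage names and the arithmetic each stage performs are invented for illustration and are not taken from the course materials.

```python
# Hypothetical three-stage toy pipeline. Each stage reads only from the
# current register set and writes only to the next one.

def tick(cur, inputs):
    """Compute the next register set from the current one (one clock tick)."""
    nxt = dict(cur)
    # Stage 1: "fetch" the next input value, if any remain.
    nxt["fetch"] = inputs.pop(0) if inputs else None
    # Stage 2: "decode" doubles whatever fetch latched on the previous tick.
    nxt["decode"] = None if cur["fetch"] is None else cur["fetch"] * 2
    # Stage 3: "execute" adds one to whatever decode latched on the previous tick.
    nxt["execute"] = None if cur["decode"] is None else cur["decode"] + 1
    return nxt


def run(program, cycles):
    """Run the toy pipeline for a number of clock ticks and collect results."""
    regs = {"fetch": None, "decode": None, "execute": None}
    inputs = list(program)
    results = []
    for _ in range(cycles):  # the only loop: one iteration per clock tick
        regs = tick(regs, inputs)
        if regs["execute"] is not None:
            results.append(regs["execute"])
    return results
```

Because every stage reads `cur` and writes `nxt`, the order of the three assignments inside `tick` does not matter, which is roughly the property the two register sets were meant to enforce. With a three-item program, the first result only appears on the third tick.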

I happened to do something like this in my CPU design class too. In my case I knew that writing a simulator of our design in C would be trivial compared to actually making the CPU itself + it could be used to test code much more easily (we had to write a GCD routine to "prove" it works).

You're right that it helps a LOT when it comes to implementing the actual hardware.

(I also put it through Vivado HLS, but I wasn't able to sneak that past the professor, rats! :)

I wish I had known about Verilator back then; I could have compiled the Verilog into C++ and run my simulator test suite against it!

That's a pretty interesting approach. I may have to borrow that idea.

I want to write a letter to my alma mater now :l

A very good idea.

You really should include a trigger warning with comments like this -- I still have nightmares about that class.

When I took Berkeley's EECS151 class (Introduction to Digital Design and Integrated Circuits), the first lecture actually did not go over clocks. Instead, it went over the simple building blocks of circuits: inverters, logic gates, and finally combinational logic blocks that are made up of the previous two. These components alone do not need a clock to function, and their static functions are subject only to physical limitations, such as how quickly transistors switch and signals travel, which we package into something called propagation delay. It is entirely possible to build clockless circuits, otherwise known as asynchronous circuits.

From the perspective of an electrical engineer and computer scientist, asynchronous circuits theoretically can be faster and more efficient. Without the constraint of a clock slowing down an entire circuit for its slowest component, asynchronous circuits can instead operate as soon as data is available, while spending less power on overhead functions such as generating the clock and powering components that are not changing state. However, asynchronous circuits are largely the plaything of researchers, and the vast majority of today's circuits are synchronous (clocked).

The reason we use synchronous circuits, which may relate to the reason why many students learning circuits often try to make circuits without clocks, is abstraction. Clocked circuits can have individual components/stages developed and analyzed separately. You leave problems that do not pertain to the function of a circuit, such as data availability and stability, to the clock of the overall circuit (clk-to-q delay, hold time, etc.), and can focus on functionality within an individual stage. As well, components of a circuit can be analyzed by tools we've built to automate the difficult parts of circuit design, such as routing, power supply, and heat dissipation. This makes developing complex circuits with large teams of engineers "easier." The abstraction of synchronous circuits is one step above asynchronous circuits. Without a clock, asynchronous circuits can run into problems where outputs of components are actually wrong for a brief moment due to race conditions, a problem which synchronous circuit design avoids by holding information between stages stable until everything is ready to go.

The article's point that hardware design begins with the clock is useful when you are trying to teach software engineers, who are used to thinking in a synchronous, ordered manner, about practical hardware design, which is done entirely with clocks. However, it is not the complete picture when trying to create understanding of electrical engineering from the ground up. Synchronous circuits are built from asynchronous circuits, which were built from our understanding of E&M physics. Synchronous circuits are then used to build our ASICs, FPGAs, and CPUs that power our routers and computers, which run instructions based on ISAs that we compile down to from higher-level languages. It's hardly surprising that engineers who are learning hardware design build clockless circuits - they aren't wrong for designing something "simple" and correct, even if it isn't currently practical. They're just operating on the wrong level of abstraction, which they should have a cursory knowledge of so synchronous circuits make sense to them.

An asynchronous multicore chip http://www.greenarraychips.com

NO CLOCKS: Most computing devices have one or more clocks that synchronize all operations. When a conventional computer is powered up and waiting to respond quickly to stimuli, clock generation and distribution are consuming energy at a huge rate by our standards, yet accomplishing nothing. This is why “starting” and “stopping” the clock is a big deal and takes much time and energy for other architectures. Our architecture explicitly omits a clock, saving energy and time among other benefits.

> saving energy and time among other benefits

Wouldn't (bitcoin) miners be a good fit for async circuits? Simple logic, low power, high performance.

The quote touts an advantage whilst waiting for input. I take that to mean having really good idle performance. Miner ASICs don't make money when idle.

I don't know the details of the circuitry needed for bitcoin hashing, but it can contain steps that are not executed continuously and hence sit idle until they have some work to do (i.e. they have some inputs). Reducing the power consumption of a miner directly translates into making more money.

Pipelining would make those idle circuits useful rather than less costly

And as far as I know, most of these are heavily pipelined already.

I mean could doing away with the clock improve signal propagation time while reducing consumption while hashing in theory? Maybe even reduce chip size?

You're reading my thoughts. I've been thinking about using that chip for mining for 5 days already (since advertising it in another thread: https://news.ycombinator.com/item?id=15249289).

This chip has other good properties in addition to "no clock".

I'm not surprised that software engineers find these concepts difficult to understand at first -- it's a very different way of thinking, and everyone has to start somewhere. But I do find it kind of odd that someone would jump straight into trying to use an HDL without already knowing what the underlying logic looks like. (My CS degree program included a bit of Verilog programming, but it only showed up after about half a semester of drawing gate diagrams, Karnaugh maps and state machines.)

Does this confusion typically happen to engineers who are trying to teach themselves hardware design, or is it just an indication of a terribly-designed curriculum?

How often do people jump into JavaScript coding without having the faintest idea of how a CPU or anything else works? When high-level facilities are made available, there are always going to be people who go to the high-level facility without having any understanding of the actual consequences of what they are doing. I'd recommend the software folks who want to get into hardware to check out the book 'Elements of Computing Systems' and the NAND2Tetris project that it's based on. It's quite easy to follow along with building your own quasi-HDL things run in a simulator (with no generative for-loop stuff, you have a clock line and you use it), then build up from there to put together a CPU, memory, develop a basic OS, implement a programming language, and eventually play a game of Tetris! All with nothing but NAND gates and the convenience of being able to 'use' those tons of gates without actually having to deal with them physically.
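To give a flavor of the "nothing but NAND gates" idea, here is a small sketch in Python. It is not the book's HDL or its simulator, and the function names are invented for illustration; it only shows the standard constructions of a few gates from NAND.

```python
def nand(a, b):
    """The only primitive: output is 0 only when both inputs are 1."""
    return 1 - (a & b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))


def xor_(a, b):
    # Classic four-NAND XOR construction.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    """Returns (sum, carry) for two one-bit inputs."""
    return xor_(a, b), and_(a, b)
```

Everything above `nand` is just wiring, so a larger block such as `half_adder` never needs a new primitive.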

Bit of both, really. It's certainly a different way of thinking. But I encountered fellow students struggling with this when we were undergrads.

It doesn't help that Verilog can be used as an imperative language executing statements in order, nor that it has two different types of assignment operator of which only one should be used, nor that the simulator does not enforce realistic restrictions.

Nor does it help that the practical way to write Verilog/VHDL is to decide what output you want the compiler to give then work backwards to the HDL input. (I've met actual design companies that turned this into a rigid waterfall flow - full block diagram before you write a line of HDL)

>Does this confusion typically happen to engineers who are trying to teach themselves hardware design, or is it just an indication of a terribly-designed curriculum?

Your curriculum seems very sensible to me: you learned the basics first. After all, HDLs are just describing the underlying circuit; you need to know how to design it first.

The basis of most confusion in my experience is that for most people (including pure electronic engineers) their "first contact" with anything digital is through software programming which, despite the recent trend in multicores and parallel execution, is highly sequential.

When you try HDL for the first time after years of programming CPUs, it's natural to approach it with the same thought process. It just takes time to adapt to a new "way of thinking", nothing more. I'm sure someone who learns HDL design without prior software experience will face the same issues when they later decide to jump to software engineering.

I say "terribly-designed curriculum".

Maybe engineers need to be introduced to the synthesis tools at the same time as the simulator tools.

Simulating RTL is only an approximation of reality. So emphasizing RTL simulation is bad. You see it over and over though. People teach via RTL simulation.

Synthesis is the main concern. Can the design be synthesised into HW and meet the constraints? Because all the combinatorial logic gets transformed into something quite different in an FPGA.


> The reality is that no digital logic design can work “without a clock”. There is always some physical process creating the inputs. These inputs must all be valid at some start time – this time forms the first clock “tick” in their design. Likewise, the outputs are then required from those inputs some time later. The time when all the outputs are valid given for a given set of inputs forms the next “clock” in a “clockless” design. Perhaps the first clock “tick” is when the set the last switch on their board is adjusted and the last clock “tick” is when their eye reads the result. It doesn’t matter: there is a clock.

Put another way, combinatorial systems (the AND/OR/etc[1] logic gates that form the hardware logic of the chip) have a physical propagation delay: the time it takes for the input signals at a given state to propagate through the logic and produce a stable output.

Do not use the output signal before it is stable. That way lies glitches and the death of your design.

Clocks are used to tell your logic: "NOW your inputs are valid".

The deeper your combinatorial logic (the more gates in a given signal path), the longer the propagation delay. And the maximum propagation delay across your entire chip[2] determines your minimum clock period (and thus maximum clock speed).

There exist clockless designs, but they get exponentially more complicated as you add more signals and the logic gets deeper. In a way, clocks let you "compartmentalize" the logic, simplifying the design.

[1] What's the most widespread fundamental gate in the latest fab processes nowadays? Is it NAND?

[2] or at least clock domain
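As a rough worked example of how the slowest path sets the clock period, here is a sketch in Python. The gate delays and the two paths are made-up numbers chosen for illustration, and `register_overhead_ns` is a hypothetical catch-all for flip-flop setup and clock-to-output time.

```python
# Hypothetical per-gate propagation delays in nanoseconds (illustrative only).
GATE_DELAY_NS = {"and": 0.5, "or": 0.5, "xor": 0.75, "not": 0.25}


def path_delay_ns(path):
    """Total delay of one register-to-register path, given its gates in order."""
    return sum(GATE_DELAY_NS[gate] for gate in path)


def min_clock_period_ns(paths, register_overhead_ns=0.0):
    """The clock can be no faster than the slowest path plus register overhead."""
    return max(path_delay_ns(p) for p in paths) + register_overhead_ns


def max_clock_mhz(paths, register_overhead_ns=0.0):
    """Convert the minimum period in ns to a maximum frequency in MHz."""
    return 1000.0 / min_clock_period_ns(paths, register_overhead_ns)


# Two assumed paths in the same clock domain: a shallow one and a deep one.
PATHS = [
    ["and", "or"],
    ["xor", "xor", "and", "not"],
]
```

Here the deep path takes 2.25 ns, so with 0.25 ns of register overhead the period cannot go below 2.5 ns, i.e. about 400 MHz, no matter how fast the shallow path is.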

> but they get exponentially more complicated as you add more signals and the logic gets deeper

Clockless designs usually use some other mechanism for marking ready signals. Many of those mechanisms compose linearly and can even interconnect better than clocked designs, because they don't need to care about clock domains.

> What's the most widespread fundamental gate in the latest fab processes nowadays? Is it NAND?

Typical standard cell libraries have hundreds of cells, including cells representing logic such as muxes, full adders, and other frequently occurring clusters of gates. Logic is mapped to these in the way the tool finds most optimal. So I don't think it's right to say that there is a fundamental gate in modern processes.

Unless you are looking at it from the electrical perspective in which case the fundamental gate is the inverter.

Thanks! I'm probably confusing it with NAND vs NOR flash memory.

It's been a while since I last talked about this with my HW Eng friends :)

I finally understand why overclocking leads to an unstable system!

Thank you.

And increasing the voltage reduces the propagation delay, letting you shorten the clock period (increase frequency). This only works so far, however, before you run into other problems.

This is such an important notion.

Another way I try to explain hardware design for people coming from a software background:

You get one chance to put down in hardware as many functions as you want. You cannot change any of them later. All you can do later is sequence them in whatever order you need to accomplish your goal.

If you think of it this way, you realize that the clock is critical (that's what makes sequencing possible), and re-use of fixed functions introduces you to hardware sharing, pipelining, etc.

But it's hard to grasp.

And here's "Clocks for Hardware Engineers": [1]

[1] http://lamport.azurewebsites.net/pubs/time-clocks.pdf

Not really related to clocks (other than the fact I was watching one while waiting for that to load), but that link seemed very slow to load. I haven’t seen a link to azurewebsites before, but I’m assuming that’s some sort of static file hosting on Microsoft’s Azure platform?

Loaded fine for me, maybe it's been cached now, Azure end.

It is a pdf though which can be slower.

Reading this would actually tremendously help software engineers improve their concurrent/parallel software design skills as well. I never had a particular desire to do hardware (my degree is CS) but some of the best C/C++ programmers who were able to squeeze out every last ounce of performance truly understood not just software languages but also computer architecture and I might even go as far as saying understood physics to a large extent very well. The LMAX software architecture is a product of this kind of hardware+software understanding. Awesome article.

"The reality is that no digital logic design can work 'without a clock'. "

This is not true.

"HDL based hardware loops are not like this at all. Instead, the HDL synthesis tool uses the loop description to make several copies of the logic all running in parallel."

This is not true as a general statement. There are for loops in HDLs that behave exactly like software loops. And there are generative for loops that make copies of logic.

Also, the "everything happens at once" is not true either. In fact, without the delay between two events happening, synchronous digital design would not work (specifically, flip-flops would not work).

I guess a more accurate statement would be "no practical digital logic design can work without a clock (unless you are doing seldom-used, generally undesirable asynchronous design)".

HDLs do have loops, but they are for testbench purposes only; in a hardware implementation, non-generate loops would not be implemented!

I think by saying "Everything happens at once" the author meant that all your code executes at once. He is obviously trying to get the one-line-at-a-time sequential mindset out of people used to software.

I would have said synchronous digital logic requires a clock. Clearly combinational logic does not. The type of design you do is purely dependent on your needs. Asynchronous design is neither good nor bad. It's what you need or it isn't.

Here is a perfectly fine, synthesizable parity generator using a non generative for loop (vhdl):

  library ieee;
  use ieee.std_logic_1164.all;
  entity for_parity is
    port (input  : in  std_logic_vector (7 downto 0);
          even   : out std_logic;
          odd    : out std_logic);
  end for_parity;
  architecture for_arch of for_parity is
    signal even_parity : std_logic;
  begin
    -- Combinational process: re-evaluated whenever input changes.
    calc: process (input) is
      variable temp : std_logic;
    begin
      temp := '0';
      -- XOR all eight input bits together, one bit per iteration.
      for i in 0 to 7 loop
        temp := temp xor input(i);
      end loop;
      even_parity <= temp;
    end process calc;

    even <= even_parity;
    odd  <= (not even_parity);
  end for_arch;
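For readers coming from software, one way to picture what happens to a loop like the one above is that the tool flattens it into a fixed chain of XOR gates, with no iteration left at run time. The Python below is only an analogy under that assumption, not a model of any particular synthesis tool, and the function names are invented for illustration.

```python
def parity_loop(bits):
    """Software-style view: iterate over the eight bits, like the VHDL process."""
    temp = 0
    for i in range(8):
        temp ^= bits[i]
    return temp


def parity_unrolled(bits):
    """Hardware-style view: the same computation as one fixed XOR chain."""
    return (((((((0 ^ bits[0]) ^ bits[1]) ^ bits[2]) ^ bits[3])
               ^ bits[4]) ^ bits[5]) ^ bits[6]) ^ bits[7]


def to_bits(value):
    """Split an 8-bit integer into a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(8)]
```

The two functions agree for every 8-bit input, which is the sense in which a fixed-bound loop can describe purely combinational hardware.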

As for the last point, I make extensive use of sequential logic, a concept made possible by delays through transistors, and I can't think of a more confusing thing to say to someone than "it all happens at once" because it does not.

I don't think asynchronous logic is undesirable.

For example, the GA144[1] is a practical computer implemented entirely in asynchronous logic.

Its asynchronous nature is one of its features; benefits include lower power consumption, faster speed, and lower electromagnetic interference.


I meant undesirable in the sense that it takes longer to develop and is harder to debug, but you are absolutely right about the power consumption and speed.

> Also, the "everything happens at once" is not true either. In fact, without the delay between two events happening, synchronous digital design would not work (specifically, flip-flops would not work).

I think the author means something more along these lines: all of your edge-triggered state machines in the same clock domain execute at once. It is unlike a traditional programming language, where statements execute sequentially.
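A small sketch of that difference, in Python, using two registers that are wired to swap values. The register names and both functions are invented for illustration; the first models every register sampling the old state on the same clock edge, the second models ordinary top-to-bottom statement execution.

```python
def clock_edge(regs):
    """All registers sample the old state, then update together."""
    old = dict(regs)
    return {"a": old["b"], "b": old["a"]}


def sequential_statements(regs):
    """The same two assignments executed one after the other, software style."""
    regs = dict(regs)
    regs["a"] = regs["b"]  # a is overwritten first...
    regs["b"] = regs["a"]  # ...so b just reads back the new value of a
    return regs
```

Starting from `{"a": 1, "b": 2}`, the clocked version swaps the values, while the sequential version ends with both registers holding 2.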

yep, there were some clockless ARM cpus (never really took off though)

asynchronous digital logic forms part of any reasonable digital circuit design course :)

I liked the article, but I feel like an argument for why you need a clock was really never made.

This is definitely how I felt as well. The discussion is kind of circular, "You need a clock because doing things without a clock doesn't fit well into a clocked paradigm."

It's not an argument against clockless design, but an observation on the limitations of doing everything in a single clock tick of an already clocked chip.

We can surely build a circuit without any clock, but what challenges do we run into? How does imposing discrete clock steps help? What exactly is a clock? I would have liked to see the discussion drop a level or two of abstraction to EE or something.

Right, so what the author hinted at, but never really said, is that without a clock it is difficult to prevent spurious intermediate values output by circuits from having unwanted effects; for example, unbalanced combinational circuits typically shift between multiple intermediate values as the signals propagate through. Such hazards are normally avoided, though I can imagine such hazards might potentially contain information that could have adaptive advantage (in evolutionarily derived circuitry)
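A toy illustration of such a spurious intermediate value, in Python. It assumes a unit delay per gate and uses the textbook hazard circuit y = a AND (NOT a), which should ideally always be 0; the function name, the waveform, and the three-step clock period are all assumptions made for this sketch.

```python
def simulate(a_wave):
    """Unit-delay simulation of y = a AND (NOT a); returns the y waveform."""
    # Start with the circuit settled for the initial input value.
    not_a, y = 1 - a_wave[0], 0
    y_wave = []
    for a in a_wave:
        y_wave.append(y)
        # Both gates compute from the values present now; the right-hand side
        # is evaluated before assignment, so results appear one step later.
        not_a, y = 1 - a, a & not_a
    return y_wave


# The input changes only at assumed clock edges (t = 0, 3, 6).
A_WAVE = [0, 0, 0, 1, 1, 1, 0, 0, 0]
Y_WAVE = simulate(A_WAVE)

# A register clocked every 3 steps samples y only at the edges.
SAMPLED = Y_WAVE[::3]
```

The raw waveform briefly goes to 1 one step after `a` rises, because the inverter's output has not caught up yet. Sampling only at the clock edges, after the logic has settled, never sees that pulse.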

I am guessing it's because the clock ensures that the delay is always the same for all components, but that also means a single clock cycle must take as long as the slowest component. With an asynchronous design every component has a different delay, but you still have to ensure that all input signals to a component arrive at roughly the same time.


Learning to think in parallel, and understand and design for procedures that don't run sequentially, would be good practice for concurrent runtimes and distributed systems too. Not only for HDLs.

The zipcpu blog posts never cease to amaze me; the content is so good. As a sw developer who plays around in Verilog in my free time, the posts are extremely helpful to me. I just want to tip my hat to the author(s?), thanks!

Since we're getting meta on the post, I find the tone to be haughty and demeaning.

"I’ll spare him the embarrassment of being named, or of linking to his project. Instead, I’m just going to call him a student. (No, I’m not a professor.) This 'student'..."

Why was this needed? What purpose did it serve the greater article except for the author to espouse their own superiority?

This kind of stuff needs to get the hell out of engineering. It turns away potentially many brilliant people that could join the field but fear the rejection by peers.

Would you still want it gone if it caused a net gain in proficient engineers, but a net loss in total engineers?

> Would you still want it gone if it caused a net gain in proficient engineers, but a net loss in total engineers?

Do you have any reason to believe that arrogance and proficiency are correlated?

Oh, you're one of those.

Thanks for the compliment!

I'm actually the only one who has posted on the site.

He also hangs out and is pretty helpful on some FPGA-related IRC channels.

Hi, what are some of the FPGA-related IRC channels?

Freenode has several FPGA related channels. Among them are ##fpga, ##verilog, and ##vhdl.

BTDTBTTS. Way back when in uni, we had to design a working CPU with everything at the time but superscalar, MIMD and reservation/retire. Pipelined CPUs can get faster clock rates by splitting up hardware into more (smaller) stages, but at the expense of total latency (due to adding pipeline registers) AND slower pipeline stalls on branch prediction misses (pipeline has to be emptied of wrong micro-ops). The overall CPU can only be as fast as the slowest stage.

It looks like this, where Si is the mostly combinational logic for a stage and Pi are the pipeline registers between stages (nearly all signals between stages should be buffered by pipeline regs). IO is omitted but it's the same overall architecture.

    Clk --------+---------------+---... ....
                |               |
                \               \
      +-> S0 --> |P0| --> S1 --> |P1| --> .... --+
      |                                          |
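The trade-off described above can be put into rough numbers. This Python sketch assumes each stage Si has a fixed combinational delay and each pipeline register Pi adds a fixed overhead; the function name and all the delay values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pipeline_timing(stage_delays_ns, register_delay_ns):
    """Return (clock period, total latency) in ns for a simple linear pipeline.

    The clock must accommodate the slowest stage plus one register, and an
    item takes one full clock period per stage to travel through the pipeline.
    """
    period = max(stage_delays_ns) + register_delay_ns
    latency = period * len(stage_delays_ns)
    return period, latency


# Assumed numbers: 8 ns of logic, either as one stage or split into four.
UNPIPELINED = pipeline_timing([8.0], 0.5)
PIPELINED = pipeline_timing([2.0, 3.0, 1.5, 1.5], 0.5)
```

With these numbers the clock period drops from 8.5 ns to 3.5 ns, set by the slowest 3.0 ns stage, while the end-to-end latency rises from 8.5 ns to 14.0 ns because of the extra registers and the uneven split.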

Figure 5 in that article pretty much summarizes the main point - if you show that to the original (hypothetical?) student, then this should be sufficient to make them understand the downsides of their design.

How does one go about starting a project in an HDL? I have always wanted to design and build a CPU but I've never figured out how to set up the "build chain" for VHDL. How do you implement, compile, and test different features? Is there an IDE?

Understanding the basics is important but I'm held up before the basics even start mattering.

Conceptually, this is the same idea as concurrent network programming with futures, yes?

Immediately thought it was referring to this https://news.ycombinator.com/item?id=15282967
