
Clocks for Software Engineers - mr_tyzic
http://zipcpu.com/blog/2017/09/18/clocks-for-sw-engineers.html
======
btown
One of my favorite undergrad electrical engineering classes [0] took an
innovative approach to introducing this. Instead of learning about
clocks/pipelines and HDL at the same time, we only looked at the former. We
created our own simulators for an ARM subset, fully in C, where there was only
a single for/while loop allowed in the entire codebase, representing the clock
ticks. Each pipeline stage, such as Instruction Fetch, would read from a
globally instantiated struct representing one set of registers, and write to
another one. If you wanted to write to the same place you read from, you could
only do so once, and you'd better know exactly what you were doing.
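
In rough outline, the structure was something like this (a from-memory
sketch with illustrative names, not the actual course code):

    /* Sketch of the single-loop simulator structure. Each stage reads
       only the "current" registers and writes only the "next" ones;
       the lone loop is the clock. */
    #include <stdint.h>
    #include <stdio.h>
    
    typedef struct {
        uint32_t pc;        /* program counter */
        uint32_t if_id_ir;  /* instruction latched between IF and ID */
        /* ... one field per pipeline register ... */
    } Registers;
    
    static uint32_t imem[256];   /* toy instruction memory */
    static Registers cur, next;
    
    /* Each stage reads ONLY from cur and writes ONLY to next. */
    static void fetch(void)  { next.if_id_ir = imem[(cur.pc / 4) % 256];
                               next.pc = cur.pc + 4; }
    static void decode(void) { /* reads cur.if_id_ir, writes next.id_ex_... */ }
    
    int main(void) {
        for (int tick = 0; tick < 100; tick++) {  /* the one allowed loop */
            fetch();     /* stages "run at once": every stage reads cur   */
            decode();    /* and writes next, so call order doesn't matter */
            cur = next;  /* the clock edge: latch next state into current */
        }
        printf("final pc = %u\n", (unsigned)cur.pc);
        return 0;
    }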

Because we didn't need to learn a new language/IDE/environment at the same
time that we learned a new paradigm, we were able to keep our feet on solid
ground while working things out; we were familiar with the syntax, so as soon
as we realized how to "wire something up," we could do so with minimal
frustration and no need/ability to Google anything. Of course, it was left to
a subsequent course to learn HDL and load it on real hardware, but for a
theoretical basis, this was a perfect format. Much better than written tests!

[0]
[http://www.cs.princeton.edu/courses/archive/fall10/cos375/de...](http://www.cs.princeton.edu/courses/archive/fall10/cos375/descrip.html)
\- see links under Design Project, specifically
[http://www.cs.princeton.edu/courses/archive/fall10/cos375/Cp...](http://www.cs.princeton.edu/courses/archive/fall10/cos375/Cproject10.pdf)

~~~
aarongolliver
I happened to do something like this in my CPU design class too. In my case,
I knew that writing a simulator of our design in C would be trivial compared
to actually making the CPU itself, plus it could be used to test code much
more easily (we had to write a GCD routine to "prove" it worked).

You're right that it helps a LOT when it comes to implementing the actual
hardware.

(I also put it through Vivado HLS, but I wasn't able to sneak that past the
professor, rats! :)

I wish I had known about Verilator back then; I could have compiled the
Verilog into C++ and run my simulator test suite against it!

------
Joking_Phantom
When I took Berkeley's EECS151 class (Introduction to Digital Design and
Integrated Circuits), the first lecture actually did not go over clocks.
Instead, it went over the simple building blocks of circuits - inverters,
logic gates, and finally combinational logic blocks made up of the previous
two. These components alone do not need a clock to function; their static
behavior is subject only to physical limitations such as the speed of signal
propagation, which we package into a number called propagation delay. It is
entirely possible to build clockless circuits, otherwise known as
asynchronous circuits.

From the perspective of an electrical engineer and computer scientist,
asynchronous circuits can theoretically be faster and more efficient. Without
the constraint of a clock slowing an entire circuit down to its slowest
component, asynchronous circuits can instead operate as soon as data is
available, while spending less power on overhead such as generating and
distributing the clock and driving components that are not changing state.
However, asynchronous circuits are largely the plaything of researchers, and
the vast majority of today's circuits are synchronous (clocked).

The reason we use synchronous circuits, which may also be the reason many
students learning circuits try to build them without clocks, is abstraction.
Clocked circuits can have individual components/stages developed and analyzed
separately. Problems that do not pertain to the function of a stage, such as
data availability and stability, are left to the clocking discipline of the
overall circuit (clk-to-q delay, hold time, etc.), so you can focus on
functionality within an individual stage. Components of a clocked circuit can
also be analyzed by the tools we've built to automate the difficult parts of
circuit design: routing, power supply, heat dissipation, and so on. This
makes developing complex circuits with large teams of engineers "easier." The
abstraction of synchronous circuits sits one step above asynchronous
circuits. Without a clock, the outputs of components can be briefly wrong due
to race conditions, a problem synchronous design avoids by holding the
information between stages stable until everything is ready to go.

The article's point that hardware design begins with the clock is useful when
you are teaching software engineers, who are used to thinking in a
synchronous, ordered manner, about practical hardware design, which is done
entirely with clocks. However, it is not the complete picture if you are
trying to build an understanding of electrical engineering from the ground
up. Synchronous circuits are built from asynchronous circuits, which were
built from our understanding of E&M physics. Synchronous circuits are then
used to build the ASICs, FPGAs, and CPUs that power our routers and
computers, which run instructions from ISAs that we compile down to from
higher-level languages. It's hardly surprising that engineers who are
learning hardware design build clockless circuits - they aren't wrong for
designing something "simple" and correct, even if it isn't currently
practical. They're just operating at the wrong level of abstraction, one they
should know at least cursorily so that synchronous circuits make sense to
them.

~~~
avodonosov
An asynchronous multicore chip
[http://www.greenarraychips.com](http://www.greenarraychips.com)

 _NO CLOCKS: Most computing devices have one or more clocks that synchronize
all operations. When a conventional computer is powered up and waiting to
respond quickly to stimuli, clock generation and distribution are consuming
energy at a huge rate by our standards, yet accomplishing nothing. This is why
“starting” and “stopping” the clock is a big deal and takes much time and
energy for other architectures. Our architecture explicitly omits a clock,
saving energy and time among other benefits._

~~~
a_imho
_saving energy and time among other benefits_

Wouldn't (bitcoin) miners be a good fit for async circuits? Simple logic, low
power, high performance.

~~~
rocqua
The quote touts an advantage whilst waiting for input. I take that to mean
having really good idle performance. Miner ASICs don't make money when idle.

~~~
ithkuil
I don't know the details of the circuitry needed for bitcoin hashing, but it
can contain steps that are not executed continuously and hence sit idle until
they have some work to do (i.e. until they have some inputs). Reducing the
power consumption of a miner directly translates into making more money.

~~~
firethief
Pipelining would make those idle circuits useful rather than less costly

~~~
15155
And as far as I know, most of these are heavily pipelined already.

------
teraflop
I'm not surprised that software engineers find these concepts difficult to
understand at first -- it's a very different way of thinking, and everyone has
to start somewhere. But I do find it kind of odd that someone would jump
straight into trying to use an HDL without already knowing what the underlying
logic looks like. (My CS degree program included a bit of Verilog programming,
but it only showed up after about half a semester of drawing gate diagrams,
Karnaugh maps and state machines.)

Does this confusion typically happen to engineers who are trying to teach
themselves hardware design, or is it just an indication of a terribly
designed curriculum?

~~~
otakucode
How often do people jump into JavaScript coding without having the faintest
idea of how a CPU or anything else works? When high-level facilities are made
available, there will always be people who go to the high-level facility
without any understanding of the actual consequences of what they are doing.
I'd recommend that software folks who want to get into hardware check out the
book 'Elements of Computing Systems' and the NAND2Tetris project that it's
based on. It's quite easy to follow along, building your own quasi-HDL
components that run in a simulator (with no generative for-loop stuff; you
have a clock line and you use it), then building up from there to put
together a CPU and memory, develop a basic OS, implement a programming
language, and eventually play a game of Tetris! All with nothing but NAND
gates, and the convenience of being able to 'use' tons of gates without
actually having to deal with them physically.
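
To give a flavor of the bottom-up build, here are the standard derivations of
the basic gates from NAND alone (a quick sketch, not code from the book):

    /* Everything from NAND: the classic derivations. */
    #include <stdio.h>
    
    static int nand(int a, int b) { return !(a && b); }
    static int not_(int a)        { return nand(a, a); }
    static int and_(int a, int b) { return not_(nand(a, b)); }
    static int or_(int a, int b)  { return nand(not_(a), not_(b)); }
    static int xor_(int a, int b) { int n = nand(a, b);          /* 4 NANDs */
                                    return nand(nand(a, n), nand(b, n)); }
    
    int main(void) {  /* print the truth tables as a sanity check */
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                printf("a=%d b=%d  and=%d or=%d xor=%d\n",
                       a, b, and_(a, b), or_(a, b), xor_(a, b));
        return 0;
    }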

------
AceJohnny2
TL;DR:

> _The reality is that no digital logic design can work “without a clock”.
> There is always some physical process creating the inputs. These inputs must
> all be valid at some start time – this time forms the first clock “tick” in
> their design. Likewise, the outputs are then required from those inputs some
> time later. The time when all the outputs are valid for a given set of
> inputs forms the next “clock” in a “clockless” design. Perhaps the first
> clock “tick” is when the last switch on their board is set and
> the last clock “tick” is when their eye reads the result. It doesn’t matter:
> there is a clock._

Put another way, combinatorial systems (the AND/OR/etc.[1] logic gates that
form the hardware logic of the chip) have a physical _propagation delay_:
the time it takes for the input signals in a given state to propagate through
the logic and produce a _stable_ output.

Do not use the output signal before it is stable. That way lies glitches and
the death of your design.

Clocks are used to tell your logic: "NOW your inputs are valid".

The deeper your combinatorial logic (the more gates in a given signal path),
the longer the propagation delay. And the maximum propagation delay across
your entire chip[2] determines your minimum clock period (and thus your
maximum clock speed).
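
In first-order static-timing terms (the standard textbook relation; the
subscripts are the usual ones):

    T_clk >= t_clk2q + t_comb,max + t_setup   (per register-to-register path)
    f_max  = 1 / T_clk,min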

There exist clockless designs, but they get exponentially more complicated as
you add more signals and the logic gets deeper. In a way, clocks let you
"compartmentalize" the logic, simplifying the design.

[1] What's the most widespread fundamental gate in the latest fab processes
nowadays? Is it NAND?

[2] or at least clock domain

~~~
PhaseLockk
> What's the most widespread fundamental gate in the latest fab processes
> nowadays? Is it NAND?

Typical standard cell libraries have hundreds of cells, including cells
representing logic such as muxes, full adders, and other frequently occurring
clusters of gates. Logic is mapped to these in the way the tool finds most
optimal. So I don't think it's right to say that there is a fundamental gate
in modern processes.

Unless you are looking at it from the electrical perspective, in which case
the fundamental gate is the inverter.

~~~
AceJohnny2
Thanks! I'm probably confusing it with NAND vs NOR flash memory.

It's been a while since I last talked about this with my HW Eng friends :)

------
alain94040
This is such an important notion.

Another way I try to explain hardware design to people coming from a software
background:

You get one chance to put down in hardware as many functions as you want. You
cannot change any of them later. All you can do later is sequence them in
whatever order you need to accomplish your goal.

If you think of it this way, you realize that the clock is critical (that's
what makes sequencing possible), and re-use of fixed functions introduces you
to hardware sharing, pipelining, etc.

But it's hard to grasp.
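
A toy analogy in C (illustrative only): the function below is frozen, and all
the "program" (i.e. the control logic) can do is sequence it:

    /* The "hardware" is a fixed set of functions chosen up front; the
       only freedom left is the order in which a controller invokes
       them, one clock cycle at a time. */
    #include <stdio.h>
    
    /* Laid down in silicon -- can never change after fabrication. */
    static unsigned adder(unsigned a, unsigned b) { return a + b; }
    
    int main(void) {
        unsigned x = 5, acc = 0;
        /* Sequencing: reuse the one adder over four "cycles" to get 4*x,
           the way a shared functional unit is scheduled in hardware. */
        for (int cycle = 0; cycle < 4; cycle++)
            acc = adder(acc, x);
        printf("4*x = %u\n", acc);  /* prints 20 */
        return 0;
    }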

------
amelius
And here's "Clocks for Hardware Engineers": [1]

[1] [http://lamport.azurewebsites.net/pubs/time-
clocks.pdf](http://lamport.azurewebsites.net/pubs/time-clocks.pdf)

~~~
mrmondo
Not really related to clocks (other than the fact that I was watching one
while waiting for that to load), but that link seemed very slow to load. I
haven’t seen a link to azurewebsites before; I’m assuming that’s some sort of
static file hosting on Microsoft’s Azure platform?

~~~
jamiek88
Loaded fine for me; maybe it's been cached now, on Azure's end.

It is a PDF, though, which can be slower.

------
martin1975
Reading this would actually tremendously help software engineers improve
their concurrent/parallel software design skills as well. I never had a
particular desire to do hardware (my degree is in CS), but some of the best
C/C++ programmers, the ones able to squeeze out every last ounce of
performance, truly understood not just software languages but also computer
architecture - I might even go as far as to say they understood the physics
very well. The LMAX software architecture is a product of this kind of
hardware+software understanding. Awesome article.

~~~
ShroudedNight
Article:
[https://martinfowler.com/articles/lmax.html](https://martinfowler.com/articles/lmax.html)

------
DigitalJack
"The reality is that no digital logic design can work 'without a clock'. "

This is not true.

"HDL based hardware loops are not like this at all. Instead, the HDL synthesis
tool uses the loop description to make several copies of the logic all running
in parallel."

This is not true as a general statement. There are for loops in HDLs that
behave exactly like software loops, and there are generative for loops that
make copies of logic.

Also, the "everything happens at once" claim is not true either. In fact,
without the delay between two events, synchronous digital design would not
work (specifically, flip-flops would not work).

~~~
tonmoy
I guess a more accurate statement would be "no practical digital logic design
can work without a clock (unless you are doing seldom-used, generally
undesirable asynchronous design)".

HDLs do have loops, but those are for testbench purposes only; in a hardware
implementation, non-generate loops would not be implemented!

I think by saying "everything happens at once" the author meant that all your
code executes at once. He is obviously trying to get the one-line-at-a-time
sequential mindset out of people used to software.

~~~
forthfifthsixth
I don't think asynchronous logic is undesirable.

For example, the GA144[1] is a practical computer implemented completely in
asynchronous logic.

Its asynchronous nature is one of its features; benefits include lower power
consumption, faster speed, and lower electromagnetic interference.

[1][http://www.greenarraychips.com/home/documents/greg/PB001-100...](http://www.greenarraychips.com/home/documents/greg/PB001-100503-GA144-1-10.pdf)

~~~
tonmoy
I meant undesirable in the sense that it takes longer to develop and is
harder to debug, but you are absolutely right about the power consumption and
speed.

------
jonnycomputer
I liked the article, but I feel like an argument for why you need a clock was
never really made.

~~~
xelxebar
This is definitely how I felt as well. The discussion is kind of circular,
"You need a clock because doing things without a clock doesn't fit well into a
clocked paradigm."

It's not an argument against clockless design, but an observation on the
limitations of doing everything in a single clock tick of an already clocked
chip.

We can surely build a circuit without any clock, but what challenges do we run
into? How does imposing discrete clock steps help? What exactly _is_ a clock?
I would have liked to see the discussion drop a level or two of abstraction to
EE or something.

~~~
jonnycomputer
Right, so what the author hinted at, but never really said, is that without a
clock it is difficult to prevent spurious intermediate values output by
circuits from having unwanted effects; for example, unbalanced combinational
circuits typically shift between multiple intermediate values as the signals
propagate through. Such hazards are normally avoided, though I can imagine
they might contain information that could confer an adaptive advantage (in
evolutionarily derived circuitry).
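
A toy time-step simulation of the classic hazard (illustrative, not from the
article): y = a AND (NOT a) should always be 0, but give the inverter one
time unit of delay and y glitches high just after a rises:

    /* Toy hazard demo: with a one-step inverter delay, y = a && !a
       is briefly 1 right after the input rises. */
    #include <stdio.h>
    
    int main(void) {
        int a_prev = 0;                /* inverter sees last step's input */
        for (int t = 0; t < 6; t++) {
            int a     = (t >= 2);      /* input a rises at t = 2          */
            int not_a = !a_prev;       /* inverter output lags one step   */
            int y     = a && not_a;    /* AND gate, assumed instantaneous */
            printf("t=%d a=%d not_a=%d y=%d%s\n",
                   t, a, not_a, y, y ? "  <-- glitch" : "");
            a_prev = a;
        }
        return 0;
    }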

------
mzzter
Learning to think in parallel, and understand and design for procedures that
don't run sequentially, would be good practice for concurrent runtimes and
distributed systems too. Not only for HDLs.

------
kbeckmann
The zipcpu blog posts never cease to amaze me; the content is so good. As a
SW developer who plays around in Verilog in my free time, I find the posts
extremely helpful. I just want to tip my hat to the author(s?), thanks!

~~~
dclowd9901
Since we're getting meta on the post, I find the tone to be haughty and
demeaning.

"I’ll spare him the embarrassment of being named, or of linking to his
project. Instead, I’m just going to call him a student. (No, I’m not a
professor.) This 'student'..."

Why was this needed? What purpose did it serve the greater article except for
the author to espouse their own superiority?

This kind of stuff needs to get the hell out of engineering. It turns away
potentially many brilliant people who could join the field but fear rejection
by their peers.

~~~
bendbro
Would you still want it gone if it caused a net gain in proficient engineers,
but a net loss in total engineers?

~~~
edmccard
> Would you still want it gone if it caused a net gain in proficient
> engineers, but a net loss in total engineers?

Do you have any reason to believe that arrogance and proficiency are
correlated?

------
trapperkeeper74
BTDTBTTS. Way back when in uni, we had to design a working CPU with
everything of the era except superscalar execution, MIMD, and
reservation/retire. Pipelined CPUs can get faster clock rates by splitting
the hardware into more (smaller) stages, but at the expense of total latency
(due to the added pipeline registers) AND costlier pipeline stalls on branch
prediction misses (the pipeline has to be emptied of wrong-path micro-ops).
The overall CPU can only be as fast as its slowest stage.

It looks like this: the Si are the (mostly combinational) logic for each
stage, and the Pi are the pipeline registers between stages (nearly all
signals between stages should be buffered by pipeline regs). IO is omitted,
but it's the same overall architecture.

    
    
    Clk --------+---------------+--- ...
                |               |
                \               \
      +-> S0 --> |P0| --> S1 --> |P1| --> ... --+
      |                                         |
      +-----------------------------------------+

------
PeterisP
Figure 5 in that article pretty much summarizes the main point - if you show
that to the original (hypothetical?) student, it should be sufficient to make
them understand the downsides of their design.
------
gravypod
How does one go about starting a project in an HDL? I have always wanted to
design and build a CPU, but I've never figured out how to set up the "build
chain" for VHDL. How do you implement, compile, and test different features?
Is there an IDE?

Understanding the basics is important but I'm held up before the basics even
start mattering.

------
gertef
Conceptually, this is the same idea as concurrent network programming with
futures, yes?

------
blackbear_
Immediately thought it was referring to this
[https://news.ycombinator.com/item?id=15282967](https://news.ycombinator.com/item?id=15282967)

