The Tyranny of the Clock – Ivan Sutherland (2012) [pdf] (worrydream.com)
84 points by mr_tyzic on June 28, 2016 | 29 comments



In the networking world, Fulcrum built some very low-latency switch chips, used in switches and routers, based on asynchronous logic. The Alta switch chip was the last of that generation.

http://www.hotchips.org/wp-content/uploads/hc_archives/hc23/...

Intel acquired Fulcrum and has not released a new product since. One can speculate that they were acquired in part for their experience and tools for designing asynchronous pipelines.

In the DSP world, Octasic makes DSPs that use asynchronous designs:

http://www.octasic.com/technology/opus-dsp-architecture


The Fulcrum/Intel FM10000 was released but it didn't get much love.


>> Imagine what software would be like if subroutines could start and end only at preset time intervals. “My subroutines all start at 3.68 millisecond intervals; how often do yours start?”

Mine start at 50 microsecond intervals. I've worked on stuff with shorter and longer intervals. Sometimes we have lists of tasks that need to run at different rates, so scheduling becomes a real pain. Welcome to the world of real-time embedded software in high-performance systems. The same thing applies: we make sure the worst-case execution time fits within the allowed interval and use a master clock to sync everything up.
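For anyone curious what that scheduling style looks like in code, here is a minimal cyclic-executive sketch in C. The 50 microsecond base tick and the idea of a master clock are taken from the comment above, but wait_for_tick() and the task functions are hypothetical placeholders; a real system would block on a hardware timer rather than nanosleep().

```c
#include <stdint.h>
#include <time.h>

/* Stand-in for a hardware timer tick: sleep for roughly 50 us.
 * A real system would block on a timer peripheral or interrupt instead. */
static void wait_for_tick(void)
{
    struct timespec t = { .tv_sec = 0, .tv_nsec = 50 * 1000 };
    nanosleep(&t, NULL);
}

/* Each task's worst-case execution time must fit inside one 50 us slot. */
static void task_50us(void)  { /* runs every tick      */ }
static void task_200us(void) { /* runs every 4th tick  */ }
static void task_1ms(void)   { /* runs every 20th tick */ }

int main(void)
{
    uint32_t tick = 0;
    for (;;) {
        wait_for_tick();              /* master clock keeps tasks in phase */
        task_50us();                  /* highest-rate task, every tick     */
        if (tick % 4 == 0)  task_200us();
        if (tick % 20 == 0) task_1ms();
        tick++;
    }
}
```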


I've done quite a few of these embedded systems with real-time constraints, and your summary is quite accurate. The good part for me is that once you have things nailed down they (usually, and if not you're really in for a long night) don't shift, and what works will continue to work reliably.

This is in contrast to non-real-time systems, which tend to just freeze for random periods of time (sometimes seconds or even minutes) without any apparent cause. That's something that really puzzles me about today's software+hardware. In theory it should all be faster than ever, but in practice I spend as much or even more time waiting for my computers than I ever did in the past.

Maybe I'm just more impatient but I don't believe that's the reason here.

Real time should be the norm, not the exception, just like encrypted communications should be the norm, not the exception.

Computers should respond without noticeable latency to user input at all times.


Remember moving your cursor through menu options in an NES game? (or using an 8-bit word processor?) Computers should be like that. Consumers shouldn't accept anything else.

Responsive interfaces allow users to develop muscle memory.

I used to type complicated sequences of commands into my Commodore 64 to perform common actions. (Wow, I didn't know I wanted a $HOME with some scripts in it. Now I know!)

When I'd make a typo in one of those sequences, it would commonly be quicker for me to reset the machine and start from the top.

If I performed the same action twice (with a reset in between) and got a different result, I could logically conclude that I had a hardware problem. (Not every HW problem is permanent. Heat and grounding problems can both be fixed, unless some threshold is crossed...)

Anyway, I figure that Wintel denied grandma that kind of computer because they learned from the hobbyists that neophyte users with quality hardware+software will exceed the creators in skill in less time than it takes to design the next gen rig--and a couple genius users will start building their own out of impatience!

There's no profit in quality hardware+software combinations.


Asynchronous logic is significantly more power efficient, so it may be one approach to "save Moore's Law" (for one generation perhaps). But it would probably require some company that really cares about power efficiency, doesn't care about industry best practices, and is willing to risk hundreds of millions in R&D.


Most of the money would be in development, not research. Elegant designs and methods exist, and the designs can be iterated very quickly. The very real problem has been interfacing with the rest of the world. Not at the hardware level (see the switch designs mentioned elsewhere in this thread), but in the interoperability of design software and other tools. Any company will need either their own design software, or to do a massive hack-job to get synchronous design software to work well without that bedrock assumption.


People have been poking at it for over 30 years now.

I'm chalking it up to "it's the future, and always will be ..."


People have been poking at every architecture improvement for 30 years. Moore's Law made it clear that the only winning move was to follow the most popular path.

Now that Moore's Law is on its way out, people can actually try new things, and discover what pays off and what does not.


This could also be a chance to figuratively reboot computing tech, and start anew from different fundamentals: quaternary, biological, photonic...

What may have been hard to implement 30-40 years ago may be easier now with current technology. Some of these could definitely supplement existing binary/boolean silicon in certain domains, if not replace it; for example, using actual brains in AI-as-a-Service for image recognition and so on.


Achronix Semiconductor used it for cutting-edge 1+ GHz FPGAs. Check them out.


The industry might hit that point one day, but for now it seems to fall into the bucket of "modest payoff, very very very high cost".

The upsides are real but other avenues of development may still have higher payoff vs cost (effort).


This reminded me of one of Gustafson's arguments for changing how numerical computations are done: the principles of hardware architecture in use today result in hardware wasting lots of energy and time, mostly in getting numbers from RAM to the CPU and back. It seems more people already realize this, which is good. I hope to see some general-purpose hardware inspired by these ideas of efficient computation.


The inefficiency you are talking about is not due to the fact that there is a synchronous clock (within each individual "block", since there needs to be some async logic going between the different clock domains of DRAM and the processor). The waste in getting numbers from RAM to the register file is primarily due to the hardware-managed cache hierarchy, which we are addressing at REX Computing, along with John Gustafson as an advisor.


Waste? Or overhead in exchange for having a cache?


Obviously tooting my own horn here, but our solution replaces hardware-managed caches with software-managed caches... You get larger, lower-latency "caches" of SRAM in our solution, which also use less power in moving data from RAM to the register file since we control the hierarchy at a very fine granularity.
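For readers who haven't seen the idea, here is a rough sketch, in C, of what software-managed local memory looks like from the programmer's side. This is a generic illustration only, not REX's actual hardware or API; the SCRATCHPAD_WORDS size and the memcpy standing in for a DMA transfer are assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SCRATCHPAD_WORDS 1024              /* made-up size of on-chip SRAM */
static float scratchpad[SCRATCHPAD_WORDS]; /* stands in for local SRAM     */

/* With a hardware cache, you loop over src[] and the cache decides what to
 * keep. With a software-managed scratchpad, the program stages data
 * explicitly, so every transfer from DRAM is one it asked for. */
static float sum_tiled(const float *src, size_t n)
{
    float total = 0.0f;

    for (size_t base = 0; base < n; base += SCRATCHPAD_WORDS) {
        size_t tile = (n - base < SCRATCHPAD_WORDS) ? (n - base)
                                                    : SCRATCHPAD_WORDS;

        /* Explicit "cache fill": on real hardware this would be a DMA. */
        memcpy(scratchpad, src + base, tile * sizeof(float));

        for (size_t i = 0; i < tile; i++)   /* compute out of fast SRAM */
            total += scratchpad[i];
    }
    return total;
}

int main(void)
{
    static float data[3000];
    for (size_t i = 0; i < 3000; i++) data[i] = 1.0f;
    printf("sum = %f\n", sum_tiled(data, 3000));   /* expect 3000.0 */
    return 0;
}
```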


Handshake Technology: http://www.ispd.cc/slides/slides_backup/ispd06/8-2.pdf

Warning: Slightly commercial in nature. But some good information about how it works starting on page 4. Worth reading from there.


A few months ago there was another discussion here of an older article of his on the same topic [1].

Archive.org has some of their old FLEET architecture papers and slide decks: [2]

[1] https://news.ycombinator.com/item?id=11425533

[2] https://web.archive.org/web/20120227072220/http://fleet.cs.b...


Hm, I came up with this idea independently, 5 < years < 10 ago, after reading the first third of Code: The Hidden Language of Computer Hardware and Software.

Neat!

I just figured that you could redesign common ICs so that they had a new wire akin to the "carry" bit. I called it the 'done' wire, and I figured you could just tie it to the CLK of the next IC. Ya know? So 'doneness' would propagate across the surface of the motherboard (or SoC) in different ways depending on the operation it was performing. Rather than the CLK signal, which is broadcast to all points...

(I know that my idea is half baked and my description is worse. I'm glad I found this PDF!)

I knew the big advantage would be power savings. I called the idea 'slow computing', and I envisioned an 8-bit style machine that would run on solar or a hand crank and be able to pause mid calculation until enough power was available... Just like an old capacitor-based flash camera will be able to flash more frequently when you have fresh batteries in it.

You'd just wire the power system up with the logic. Suppose an adder fires a "done" at some other IC. Now, put your power system inline, like MiTM... When it gets the "done", it charges that capacitor (a very small one? :) ) and only when enough power is available does it propagate the "done". ...Maybe the "done" powers the next IC. I dunno.

As I said, half baked. Glad to find out that I'm not the only one that dreamed of 'clockless', though!


The big issue with the done signal you're referring to is: how do you generate it? In other words, how does the circuit "know" that it has finished executing?

There are several options. One is to simply add a delay element to each circuit that is matched to the circuit's delay. Another is to use a circuit-level handshaking protocol, similar to that used in TCP.

It's not an easy thing to tackle and leads to performance loss in the long run relative to a synchronous design.
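To make the handshaking option a bit more concrete, here is a toy single-threaded C model of a four-phase (return-to-zero) request/acknowledge handshake between two stages. The struct and function names are invented for the example; a real circuit would do this with wires and latches rather than function calls.

```c
#include <stdio.h>

/* Toy model of a four-phase handshake between a producer stage and a
 * consumer stage. Signal names (req, ack, data) follow common
 * async-design convention; everything else is made up for illustration. */
struct channel {
    int req;   /* producer asserts: "data is valid"        */
    int ack;   /* consumer asserts: "data has been latched" */
    int data;  /* bundled data accompanying the request     */
};

static void producer_send(struct channel *ch, int value)
{
    ch->data = value;
    ch->req = 1;                 /* phase 1: raise request with valid data */
}

static void consumer_receive(struct channel *ch, int *latched)
{
    if (ch->req && !ch->ack) {
        *latched = ch->data;     /* phase 2: latch data, raise acknowledge */
        ch->ack = 1;
    }
}

static void producer_reset(struct channel *ch)
{
    if (ch->ack) ch->req = 0;    /* phase 3: drop request after the ack */
}

static void consumer_reset(struct channel *ch)
{
    if (!ch->req) ch->ack = 0;   /* phase 4: drop ack, channel is idle  */
}

int main(void)
{
    struct channel ch = {0};
    int latched = 0;

    for (int value = 1; value <= 3; value++) {
        producer_send(&ch, value);
        consumer_receive(&ch, &latched);
        producer_reset(&ch);
        consumer_reset(&ch);
        printf("transferred %d\n", latched);
    }
    return 0;
}
```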


I don't understand. It's clear to me when an adder is 'done'. Hm, so I'm guessing it does get more complicated than that. :)


Yes, you're right; in the case of an adder it's pretty straightforward. It can get complicated in other circuits.


For those interested in asynchronous circuit design, this group is one of the best in the world in the field.

http://www.cs.columbia.edu/async/


Any recommended links or papers to get a sense of the state of the art?


There doesn't seem to be any sign of recent activity on the asynchronous research center site affiliated with the article. Is anyone aware of currently active academic or industrial research groups in this field?


I would personally love to try to design some fancy asynchronous stuff, but I got the impression that current FPGAs would make this difficult.


Do the ARM AMULET research efforts, which came out of Manchester University, fall under this? https://en.wikipedia.org/wiki/AMULET_microprocessor


Just forwarded this off to my EE colleagues.

I'm for approaches that may be superior overall.


I appreciated the read.

Didn't understand it though.




