Hadi Esmaeilzadeh on Dark Silicon (intelligence.org)
75 points by seventeenorbust on Oct 26, 2013 | 50 comments

Super long story short (literally, the first question doesn't get an answer until something like 1000 words later):

What is dark silicon?

"Dark silicon is the fraction of chip that needs to be powered off at all times due to power constraints."

The long and short of it is that, because of a combination of energy use and heat dissipation at the scale of upcoming transistors, upwards of 50% of the transistors on a chip may need to be off. This raises the question: why continue developing smaller transistors if the technology can't be utilized because of heat and power needs?

Even if you can't power every transistor simultaneously, shrinking allows you to layer on a whole bunch of specialized logic blocks. They're dark almost all the time, and when they're lit up they're displacing much more energy intensive general purpose computation.

Things like hashing/encryption functions, hardware encoders and decoders, even full blown FPGAs are all candidates.

You can also make more, slower cores, running at lower voltage, if you've got any embarrassingly parallel problems to work on.

It's not as attractive as just getting faster and more energy efficient across the board like we have been doing, but there's still room to move there.

Specialization is one of the ways forward mentioned in the article. But the ROI in this approach is much lower, and, as the article says, it could not justify the big investments that are required to create a new process.

>it could not justify the big investments that are required to create a new process.

Not yet, perhaps. New processes are effectively guaranteed to be needed at some point though.

Smaller transistors require lower voltage, and heat generated is related to the square of the voltage at a given size. So it's still useful to try to shrink the fab process.
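To put rough numbers on the voltage point, here is a minimal sketch using the textbook CMOS switching-power approximation P = a * C * V^2 * f. The capacitance, voltage, and frequency values are made up for illustration and are not taken from the article, and leakage power is ignored entirely.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Approximate CMOS switching power in watts: P = a * C * V^2 * f.

    All inputs here are hypothetical; static (leakage) power is ignored.
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical block: 1 nF of switched capacitance at 3 GHz.
baseline = dynamic_power(1e-9, 1.0, 3e9)
lowered = dynamic_power(1e-9, 0.8, 3e9)

# Dropping the supply voltage by 20% cuts switching power by about 36%.
ratio = lowered / baseline
print(f"power at 0.8 V relative to 1.0 V: {ratio:.2f}")
```

The catch the interview is describing is that supply voltage has stopped dropping in step with transistor size, so this quadratic saving is no longer available at every shrink.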

However fab process shrink is a very expensive endeavor and the article argues that: "We may stop scaling not because of the physical limitations, but because of the economics."

So while we've known for a while now that we were going to hit the physical limits of silicon at some point, the reality might actually be even worse: because of dark silicon and the economic constraints it imposes, there may not be enough incentive to keep shrinking.

To be fair to silicon, the human brain is very similar. It's never 100% active but at one time or another it all gets used.

To be fair, we have absolutely no idea how the human brain functions, and whether the analogy with silicon is relevant at all... Some have posited that consciousness is a function of a wave-like property of the entire brain, for example. That may be a load of bollocks, but the reality remains that we don't know much about how the brain does its magic.

We don't know how it works in detail but we can see which parts of the brain are doing stuff at any given time, and we've had a lot of progress associating different functions with different areas of the brain. http://en.wikipedia.org/wiki/Functional_magnetic_resonance_i...

No, we can detect changes in blood flow, and we've posited that there's a correlation between this blood flow and brain activity, and found that that is consistent with the way the blood flows alter.

This does not in any way rule out that the parts of the brain that are not showing up in the fMRI are participating in the thinking process at that moment. We have no idea whatsoever of how the thinking process itself unfolds.

That is not a good description of the concept of dark silicon. A better description would be: we cannot power all transistors at the same time because there is a limit on how much power a chip can consume.

Just because you may only be able to power x% of your chip doesn't mean the rest of your chip is useless. Sometimes you need to turn off parts of the chip anyway. Examples include cache misses or particular hardware flows. Other times you can decrease frequency due to program characteristics. For everything else we throttle because we can't power all the transistors at all times.
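As a back-of-the-envelope illustration of that power-limit framing, here is a small sketch. It assumes, purely for simplicity, that power scales linearly with the fraction of the chip that is switched on; the wattages are hypothetical and the function name is mine, not the article's.

```python
def dark_fraction(power_budget_w, full_power_w):
    """Fraction of the chip that must stay off to meet the power budget.

    Simplifying assumption: power is proportional to the active fraction.
    """
    if full_power_w <= power_budget_w:
        return 0.0  # everything can be lit at once
    return 1.0 - power_budget_w / full_power_w


# Hypothetical chip: 200 W with everything on, 100 W budget.
print(dark_fraction(100.0, 200.0))
```

Which half is dark can change from moment to moment, which is the point above: the dark part is not wasted, it just can't all be on at once.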

"I personally like to see the use of biological nervous tissues for general-purpose computing. Furthermore, using physical properties of devices and building hybrid analog-digital general-purpose computers is extremely enticing."

Curious how the author doesn't mention the upcoming memristor [1][2] computing paradigm. Memristor computing will be based upon an entirely different architecture, which will likely revolutionize computing. There is much more to it than I can describe here, but here's at least one company commercializing this tech as we speak [3].

1. http://en.wikipedia.org/wiki/Memristor

2. http://iopscience.iop.org/0957-4484/24/38/383001/pdf/0957-44...

3. http://www.crossbar-inc.com/

Entirely different architectures are very unlikely to revolutionize any industry due to path dependence.

Path dependence is a really important concept to consider when reflecting on technical standards, programming languages and frameworks.

The actual title of the article is: "Hadi Esmaeilzadeh on Dark Silicon".

The first question of the interviewer is where the (misleading) HN title comes from:

Luke Muehlhauser: Could you please explain for our readers what “dark silicon” is, and why it poses a threat to the historical exponential trend in computing performance growth?

The most interesting part was that, to combat this lack of general-purpose performance increase, specialised computing units, FPGAs, and Neural Processing Units (NPUs) have been proposed. Linked paper on NPUs: http://www.cc.gatech.edu/~hadi/doc/paper/2013-toppicks-npu.p... . Linked paper on "an architectural framework from the ISA (Instruction Set Architecture) to the microarchitecture, which conventional processors can use to trade accuracy for efficiency": http://www.cc.gatech.edu/~hadi/doc/paper/2012-asplos-truffle...

TL;DR: Two things will keep chips from getting much better performance:

- energy efficiency at the transistor level will cause hassles; it doesn't scale down well

- gains in performance from multiprocessors are smaller than those from improving the performance of individual cores

News at 11? Sorry for the negativity, but isn't this what has been said for a decade now?

This whole interview seemed to be an exercise in lengthily recapping well-known things, like how the market for computers is different from the market for paper towels. Presumably the intended audience isn't from the tech industry.

Most of the comments here are talking about the (perhaps controversial) analysis of processing capability. IMO, that's not really the most important point in this article for the HN crowd.

It's a long article, so I will sum up what I think is the most important point: we are going to have to think of ways of creating value that no longer leverage exponential deterministic data processing capability. The next venues of profit and growth will be in figuring out how to deal with non-deterministic or open-ended questions where an exact answer is not necessary, or better yet, may not be optimal.

> Can you even imagine running out of Microsoft Windows?

This is a particularly poor example: Microsoft controls the supply of Windows licenses and installation media. We all ran out of Windows 2000 several years back, and I don't remember if the supply of Windows XP still exists.

Any free as in speech software would be a better example.

Like Linux ISOs? Most of the world's bandwidth is used transferring Linux ISOs.

Folks, this is a HUGE problem that will affect our entire industry. There is no replacing an exponential growth function. Sure, we can have a few "one-offs" here and there, but the loss of the exponential growth of computer performance cannot be replaced.

What happens when computers no longer get any faster? People have been so used to the 40 year exponential trend that they can't even comprehend that it will end. We're talking about the next 4-5 years.

For one, we could start programming efficiently and natively again instead of going through layers of interpreted languages and abstractions. Also, shared-nothing architectures are still scaling well for many workloads.

We're already at the point where people don't buy new PCs because their old ones are still fast enough.

Getting to the point where silicon development stops and full-on quantum computing begins has been a long time coming. These D-Wave machines are still a long way from a proper quantum computer.

There is also the option of alternative substrates, such as graphene, carbon nanotubes or doped diamond.

Diamond could operate at higher temperatures and frequencies (giving us better single-threaded performance again). Graphene would be more suited to the current manufacturing process. I don't know much about nanotube processors other than that they are in research too.

I don't get it -- most of the transistors I administer are off at any given moment, way more than 50% of them. If there are heat-density issues, can't they just pack some nice relatively-cool DRAM and flash around each core and improve my bus bandwidth/latency situation?

Controller latency. Hierarchy of all memory and cache has a complexity/memory-size correlation in the controller that directly affects performance. Also, power density in hot-spots is the issue, and fundamentally you want all the fastest switching parts close to each other, which fights any effort to move them apart. There will always be fast-switching, constantly used transistors next to others. Also, at the small scale, the heat has to move linearly to the chip package before it can spread three-dimensionally at all, and it's basically linear at the center of the hot-spot. The heat is going through a straw instead of a block.

Multicore in essence is breaking up the hot spot and spreading it out. Our choices are limited to simpler CPUs, and more of them, to exploit this technique. GPU design is more geared toward this, and unified address space on newer AMD chips as well as potentially Nvidia's Tegra design evolution (project Denver?) both point to smaller, simpler CPUs or something like sub-CPUs (instructions are already translated to micro-opcodes and scheduled differently than they appear in the program text) that only do parts of the work instead of operating on whole threads. CPUs are binary code runtimes implemented in hardware, so this kind of abstraction is like changing the runtime without changing the bytecode fed to it. We might end up in a situation where CPUs work on 100 threads with a blurry definition of what a core even is anymore. It will happen slowly, as microprocessors retain major similarities over the years. Convergent evolution and too much engineering and experience to start anything from scratch. Can you find the FPUs in each generation? http://chip-architect.com/news/AMD_family_pic.jpg I always marveled at how much silicon is necessary for SIMD FP.

I somewhat doubt this is relevant at the chip scale, but we're at 10cm per clock cycle at speed-of-light to put the frequency into perspective.
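The 10 cm figure is easy to check. A quick sketch, assuming a clock around 3 GHz (the comment doesn't state one) and using the vacuum speed of light; real on-chip signals propagate slower than this, so treat it as an upper bound.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum speed of light


def light_distance_per_cycle_cm(clock_hz):
    """Distance light travels in a vacuum during one clock period, in cm."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100


# Clock frequencies below are illustrative, not from the thread.
for ghz in (1, 3, 4):
    distance = light_distance_per_cycle_cm(ghz * 1e9)
    print(f"{ghz} GHz: {distance:.1f} cm per cycle")
```

At 3 GHz this comes out to roughly 10 cm, which matches the number quoted above.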

Traditional DRAM process and cpu process are very different. Building eDRAM into your chip is very expensive, and raises costs significantly.

I guess we need diamond substrate. For real this time.

Among the potential solutions, he doesn't mention reversible computing, which I found weird, because the whole point of reversible computing is drastic reduction in power draw.

Maybe he doesn't think it would ever be fast enough, or maybe he thinks it can only apply at really small scales (i.e. nanotech) and there's no smooth incremental path there from where we are now?

We're a long way from needing reversible computing; silicon isn't going to get us anywhere near the theoretical efficiency the universe allows us in non-reversible computing. Maybe after we transition to photonics or ballistic electrons or DNA computing or something and plumb that technique to the limits of its efficiency we can worry about reversible computing. In the meantime, we really only know how to make good computers out of silicon, and dark silicon is really only a concern in a silicon context.

Are there any conceivable practical ways to implement reversible computing? I think it makes sense not to mention ideas that are super hard to fulfill.

Dark silicon isn't a problem, it's a partial solution to the real problem, power density.

Agreed. To be more specific, it's the end of Dennard scaling. Dark silicon is just a symptom of the problem.

I wish there was a legal body that could order the author to explain his core thesis in one paragraph.

That's called the "dark text problem." ;-)

I thought this would be about shady startups helping out the dark side. :\

I have the solution! Behold:


"Under more conservative (realistic) assumptions, multicore scaling provides a total performance gain of 3.7× (14% per year) over ten years."

So in 10 years Haskell may be as fast as a C/C++ program without parallelism. Not exactly my definition of future-proof.
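For anyone wanting to sanity-check the quoted figure, 3.7× over ten years and 14% per year are the same number under compound growth. A small sketch (function names are mine):

```python
def total_gain(annual_rate, years):
    """Cumulative multiplier after compounding annual_rate for years."""
    return (1 + annual_rate) ** years


def annual_rate(total, years):
    """Constant yearly growth rate that compounds to total over years."""
    return total ** (1 / years) - 1


print(f"14% per year for 10 years: {total_gain(0.14, 10):.2f}x")
print(f"3.7x over 10 years: {annual_rate(3.7, 10):.1%} per year")
```

For comparison, doubling every two years would be about 41% per year, or 32× over the same decade.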

But what if single core performance increases stop completely? In that case, parallelism wins out in the end as you keep adding more and more cores, given a problem amenable to parallelism.
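How much "amenable to parallelism" matters can be sketched with Amdahl's law, which is my framing here rather than something the comment names. The 95% parallel fraction is a made-up example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup on `cores` cores when only
    `parallel_fraction` of the work can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical workload that is 95% parallelizable.
for cores in (2, 4, 16, 64, 1024):
    print(f"{cores:5d} cores: {amdahl_speedup(0.95, cores):.2f}x")
```

With 5% serial work the speedup can never reach 20× no matter how many cores are added, so adding cores only keeps paying off when the serial fraction is very close to zero.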

I believe the problem is at the chip level, not the software level

Well, software requirements and abilities drive chip design, and vice-versa. Many-core designs don't have power density constraints that are quite so restrictive, but they're hard to utilize to their fullest. Haskell can help with that.

I think the other side of the problem though is that if you can still get 100X performance from what we have today, but it costs 100X in power, you will still have a problem where few can really use it. Consider phones as one data point. Even quadrupling the power draw will make the chips unsuitable for use.

What does this have to do with anything?

Well, if you read the article to the end (which, granted, is not an easy task), the author finally, after a lot of meandering, reaches the conclusion that you may solve the problem of dark silicon by stopping the increase in clock speeds while using the increase in transistors to build multiple cores. He then says that this will not result in proportionally higher performance because our software does not run well on multiple cores.

This is where Haskell comes in. Haskell allows you to program in a way that makes parallelization relatively easy.

Well for that matter so does Clojure, and both have had a while to pick up traction. Do you think it'll be easier to solve the primarily-social problem of getting traction for languages like that (and I say this as someone who adores Clojure and has some affection for Haskell) or to solve the primarily-technical problem of figuring out chip technologies where one can metaphorically jack up current software and slide the new chip technology in underneath?

95%+ of the article is about physical/hardware constraints: energy efficiency, architectural efficiency, etc. Haskell isn't going to solve these problems. Let me quote you: "I have the solution!"

It seems like you want to advertise a tool, instead of actually considering the issues.

He's not saying software doesn't run well on multiple cores and we need to learn how to program applications that can utilize multiple cores efficiently. That's a software problem. This is a hardware problem.

His argument is that scaling silicon to multiple cores is facing physical barriers. You're no longer getting more cycles, improved transistor density, and decreased power consumption simultaneously. Exponential scaling of cores will not result in exponential scaling of computational throughput because we're facing energy-related tradeoffs.

I think the answer is Erlang, which makes parallelization easy. No relative qualification required.

There are two important points the comments are somewhat missing:

On a chip there is now lots of redundancy to ensure the chip functions correctly. This gets worse as variation at smaller geometries increases. So designs are over-margined, because no one wants another floating point bug.

Design effort and cost are a huge issue too. Margins at chip companies are shrinking because of design automation license costs, long runtimes and a much more difficult verification process from design to fab to test.

A new programming paradigm in which a certain probability of error can be tolerated should go a long way toward making the underlying hardware much more efficient.
