Chip advances require fresh ideas, speakers say (eetimes.com)
91 points by bcaulfield on Feb 13, 2018 | 41 comments

I think computing technology stagnating will probably be one of the unexpected stories of the next decade. Look at Intel: Cannon Lake (10nm) was supposed to be released in 2016, and now it's been pushed back to mid-2018. Will they even be able to get the yields to ship in volume by then? A weird consequence of this will be the rest of the world catching up in semiconductor fab technology.

Also, we're starting to see weird things happen, with GPU, processor, and memory prices going up as inflationary pressures outrun chip technology advances. We might even start to see commodity computer products hoarded as a store of value.

CPU performance stagnation has already been the (somewhat unexpected) story of the last decade: computers have simply become fast enough for almost all users, so progress has mostly been invested in power saving.

Advances have mostly been made in storage speed (SSDs), battery performance, and display quality (resolution, colours, and lately refresh rate), plus vector processing for machine learning.

>Computers have simply become fast enough for almost all users

I think this is a misconception. Computers have been "fast enough for users" since the mid-eighties. Anything is fast enough if you cannot imagine it working better. What happened is that companies came up with more advanced software products for consumers, and that drove demand for faster consumer hardware.

The reality is that all the cutting-edge computing today requires a modern $600 GPU. The difference is that in the past we would have expected this kind of hardware to "trickle down" to normal users in a couple of years. Today we don't. Instead of marketing better computers, companies market cloud services. It's a really shitty phenomenon. We're seeing the reversal of the PC revolution.

> Anything is fast enough if you cannot imagine it working better

Exactly. Is memory cheap enough that I can load the entire contents of my hard disk into it at a reasonable price? Is Optane (or something similar) cheap enough that syncs from memory back onto something persistent happen in close to real time? Are processors fast/parallel enough that I can start running all the programs I will ever want to run and just switch to them when necessary? Do I have enough cores that even if most of my running processes start glitching out and using 100% CPU, the rest will still run buttery smooth? Add to that that monitor resolutions will keep increasing, and the detail displayed even at lower resolutions will only get more intricate. And things currently at the high end can only get cheaper.

I'd add GPUs and RAM sizes to that list. There are a huge number of computing tasks that fall into either the "throw more FPU at it" or the "throw more IO at it" category. A lot of techniques have been developed to optimize performance around certain bottlenecks, but in the last 10 years we've gotten to a place where FPU cores cost a few cents each and thousands of them can be crammed into a single system, and where hundreds of gigabytes up to several terabytes of RAM are generally affordable and can be equipped in a single system. Going from a system that is hugely IO-bound because data lives on spinning disks to one where a huge multi-gigabyte database can just live in RAM 100% of the time, with all other data stored on SSDs, is a speedup of several orders of magnitude. And going from "you get one or two FPU ops per clock cycle" to "here's thousands of FPUs per clock cycle" has translated into orders-of-magnitude improvements as well.
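To make the "orders of magnitude" claim concrete, here's a back-of-envelope sketch of the time to scan a working set on different media. The throughput figures are illustrative assumptions, not benchmarks:

```python
# Rough time to scan a working set sequentially at an assumed throughput.
# All MB/s numbers below are ballpark assumptions for illustration only.
def scan_seconds(size_gb, throughput_mb_s):
    return size_gb * 1024 / throughput_mb_s

working_set_gb = 100
media = {
    "7200rpm HDD (seek-heavy workload)": 50,   # MB/s, assumed
    "SATA SSD": 500,                           # MB/s, assumed
    "DDR4 RAM": 20_000,                        # MB/s, assumed
}
for name, mb_s in media.items():
    print(f"{name}: {scan_seconds(working_set_gb, mb_s):,.1f} s")
```

Even with these rough numbers, keeping the database in RAM is hundreds of times faster than spinning disks, before counting seek latency at all.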

Additionally, software has gotten better. Nginx is just plain more streamlined than Apache, and simple caching techniques have really increased the amount of bang for your buck you get with hardware these days, at least in the server space.

We are merely at the beginning of the CPU core arms race.

AMD will sell CPUs with at least 12 cores on a single die when 7nm arrives and maybe even up to 16 if we're lucky.

EUV is expected to scale to 1 nm at a minimum. It wouldn't surprise me if high-end consumer desktops have 64 cores, and servers have 2 sockets x 4 dies x 64 cores for a total of 512 cores per server, before the ride finally ends.


People have known that we were reaching the limits of Moore's law. It has also been, in my experience, the common sentiment that the rapid progress of fab technology had been killing potential for architectural innovation.

Moore's "Law"

I wouldn't call it unexpected, and I'd say those are relatively small examples. "The Free Lunch Is Over" [0] was published in 2005 and predicted this exact trend.

[0]: http://www.gotw.ca/publications/concurrency-ddj.htm

I'm intrigued by your second paragraph about prices going up. I've often wondered if the availability of cheap computing is somehow related to the turnover in consumer electronics due to a fast upgrade cycle, but I don't know enough about economics to really draw any connections.

I did look at price trends on pcpartpicker (https://pcpartpicker.com/trends/), but didn't really notice clear upward trends for cpu prices. I saw an uptick in gpu prices that I imagine was due to the recent crypto craze. Memory prices definitely looked like they were going up across the board. Maybe these windows are too narrow to really identify the kinds of trends you are proposing. I would love to hear more on this topic from someone who knows more about economics and the industry than I do.

Edit: autocorrect apparently doesn’t know about gpus

Also, Meltdown/Spectre mitigations are dragging down most of the incremental performance gain in the latest chip generation.

My prognosis, as a bit of an insider popping in and out of Shenzhen:

All guns are pointed at _memory_.

Memory is an uncompetitive industry, a cash cow unseen in history, comparable only to oil. The SEL empire is built not on top of Galaxy Notes, but on a pile of memory chips.

The easiest way to get an order-of-magnitude improvement right away is to put more memory on die, closer to the execution units, and eliminate the I/O bottleneck, but no memory company will sell you the memory secret sauce.

Not only is memory made on proprietary equipment, but decades of research were done entirely behind the closed doors of the Hynix/SEL/Micron triopoly hydra, unlike in the wider semi community, where even Intel's process leaks out a bit through their research papers.

SEL makes a lot of money not only by selling you the well-known rectangular pieces, but also by effectively forcing all top-tier players to buy their fancy interface IP if they want to jump on the bandwagon of the next DDR generation earlier than others: https://www.design-reuse.com/samsung/ddr-phy-c-342/ . This makes them want to keep the memory chip a separate piece even more.

Many companies have tried to break the cabal, or work around them, but with no results. Even Apple's only way around this was to put a whopping 13 megs of SRAM on die.

Swapping the classical von Neumann-style CPU for a GPU or the trendy neural-net streaming processor changes little when it comes to _hardware getting progressively worse_ at running synchronous algorithms because of memory starvation.

You see, the first-gen Google TPU is rumored to have the most severe memory starvation problems, as do embedded GPUs without the steroid-pumped memory buses of gaming-grade hardware.

When the PS3 came out, its outstanding results on typical PC benchmark tasks were wrongly attributed to its 8 DSP cores, even though they were not used in any way. It was all due to it reverting to skinnier memory that was friendlier to synchronous operation. The amazing SPU performance was thanks to that too: DSP-style loads benefited enormously from nearly synchronous memory behaviour.

David Patterson had already figured out that the separation of memory and processor was the real bottleneck back in 1996-97. His IRAM designs for general-purpose computer systems that integrate a processor and DRAM onto a single chip were done in the Berkeley Intelligent RAM (IRAM) project: http://iram.cs.berkeley.edu/

It probably didn't work out for the reasons you mention.

Can you please tell me what is stopping big companies like Intel and AMD from implementing a processor like the IRAM project describes? I'm really curious to learn more about this.

Completely different process technology. When Intel or AMD add memory to their chips, it's static RAM (SRAM). SRAM cells are large, expensive, and take up lots of space. DRAM cells are small and dense (only one transistor and one capacitor).

Next question: what prevents Intel or AMD from combining logic and DRAM processes, or Samsung and others from combining logic into their DRAM chips?

Integrating CMOS logic and DRAM might be impossible to do in a way where the price/speed beats separate chips. Combining two processes increases the price. CPU/GPU makers don't have the latest DRAM knowledge; the reverse is also true. A partnership would be required.

Then there are technological problems: there are yield differences between the processes, and CPUs/GPUs operate at high temperatures, so DRAM technology would need to adapt, or there would need to be a halfway solution.

It's possible that at some point in the future a new technology called STT-MRAM (STT = Spin-Transfer Torque) could replace low-density DRAM and SRAM, and it could be integrated into logic because it can use existing CMOS manufacturing techniques and processes. It will take time.

Your argument mixes up memory as in DRAM, where the market structure is as described, and memory as in CPU cache, which is something entirely different.

If Intel wants to add more cache, they simply paste the template again. What limits on-die caches is the competition for die space: chip yields sink roughly in proportion to die size.

What's the reason they can't make bigger chips? Is it more costly?

Yields of functioning chips drop off quickly as you increase the size, because each transistor has an independent risk of a defect. Individual error rates are extremely small, but the number of components that can fail is large.
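A standard way to see this is the Poisson yield model, where the probability of a defect-free die falls exponentially with area. The defect density below is an assumed illustrative value:

```python
import math

def die_yield(area_mm2, defects_per_mm2=0.001):
    """Poisson yield model: probability a die of given area has zero defects."""
    return math.exp(-defects_per_mm2 * area_mm2)

# Doubling die area more than halves the margin for error.
for area in (100, 200, 400, 800):
    print(f"{area} mm^2: {die_yield(area):.1%}")
```

With these assumed numbers, a 100 mm^2 die yields about 90%, while an 800 mm^2 die is already below 45%, which is why huge monolithic dies get expensive fast.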

You don't need every transistor to work, though. If you can detect when a module has a broken transistor, you can simply disable that module and sell the rest of the chip. Divide the cache into multiple small modules, and it is not a big deal if you have to deactivate one because of a broken transistor. You would probably be deactivating it anyway for market segmentation.
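Continuing the Poisson yield sketch above (same assumed defect density), the benefit of salvaging partially defective dies can be estimated with a binomial model over independent modules:

```python
import math

def module_yield(area_mm2, d=0.001):
    # Poisson model: probability a block of this area is defect-free.
    return math.exp(-d * area_mm2)

def salvage_yield(n_modules, min_good, module_area_mm2, d=0.001):
    """P(at least min_good of n_modules are defect-free), binomial model."""
    p = module_yield(module_area_mm2, d)
    return sum(math.comb(n_modules, k) * p**k * (1 - p)**(n_modules - k)
               for k in range(min_good, n_modules + 1))

# 400 mm^2 of cache as one all-or-nothing block vs. 8 modules of
# 50 mm^2 where only 7 need to work:
print(f"monolithic:     {module_yield(400):.1%}")
print(f"7-of-8 modules: {salvage_yield(8, 7, 50):.1%}")
```

Requiring all 8 modules reduces to the monolithic case, but tolerating one dead module lifts the usable-die rate from roughly two thirds to well over 90% under these assumptions, which is exactly the market-segmentation trick the comment describes.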

Yeah! Here is one example of a company working on that:

> The REX Neo architecture gains its performance and efficiency improvements with a reexamining of the on chip memory system, but retains general programmability with breakthrough software tools.

https://insidehpc.com/2017/02/rex-neo-energy-efficient-new-p... (check out the linked video)

We at Vathys are doing the same for deep learning; check out our Stanford EE380 talk for more: http://web.stanford.edu/class/ee380/Abstracts/171206.html

Can you provide a source on the memory starvation for the TPUs? Also, are we talking generation 1, 2, or both?

It did not seem to be an issue in the paper on the first generation.

https://cloud.google.com/blog/big-data/2017/05/an-in-depth-l... An in-depth look at Google's first Tensor Processing Unit (TPU ...

In the paper on the first-generation TPU, see section 7. They estimate that there would have been impressive gains in speed, both absolute and per watt, if they'd had enough design time to give it more memory bandwidth.


This is what was intensively discussed by early trial users, and later admitted by the Google TPU team themselves at Hot Chips 29.

Even with monstrous HBM2 memory, they still have it.

It is probably hard to predict which matrix set to prefetch when you're dealing with a neural net, so you get cache misses there too.
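The memory-starvation point can be framed with the standard roofline model: achieved throughput is capped by either the compute ceiling or bandwidth times arithmetic intensity. The peak and bandwidth figures below are the commonly cited first-gen TPU numbers (92 TOPS peak, 34 GB/s DDR3); treat them as approximate:

```python
def attainable_tops(peak_tops, mem_bw_gb_s, ops_per_byte):
    """Roofline model: min(compute ceiling, bandwidth * arithmetic intensity)."""
    # mem_bw_gb_s * ops_per_byte gives Gops/s; divide by 1000 for Tops/s.
    return min(peak_tops, mem_bw_gb_s * ops_per_byte / 1000)

peak, bw = 92, 34  # TOPS and GB/s, approximate first-gen TPU figures
for ops_byte in (10, 100, 1000, 3000):
    print(f"{ops_byte:>5} ops/byte -> {attainable_tops(peak, bw, ops_byte):.2f} TOPS")
```

With these numbers the ridge point sits around 2700 ops/byte, so a layer reusing each weight only a few times (e.g. a batch-1 MLP) sits far out on the bandwidth-limited slope, achieving a tiny fraction of peak. That's the starvation in one picture.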

What do you mean by the secret sauce for memory? And who is preventing whom from using it? Surely Intel, AMD, and Nvidia have access to the technology. Also, I'm not an expert, but there are complexities in mixing DRAM and CPU design on the same die.

DRAM processes use slower, higher capacitance, lower leakage RCAT transistors. Not good for logic, but who cares when logic is only ~20% of your die?

The other mammoth problem, however, is scaling the deep-trench capacitors.

DRAM processes do not lend themselves well to high speed. But FPGAs show that low speed with wide buses can still accomplish a lot.

However, if you have a great idea for logic and DRAM on the same die, you are unlikely to be able to build it economically, since you can't get the DRAM IP.

As far as bringing CPUs and memory together, here was one of my favorite attempts that used DRAM processes:


What is SEL?

With the mention of the Galaxy Note I'm assuming Samsung Electronics.

Ah, EL from ELectronics. Search was turning up nothing.


I thought one of the more interesting ideas was the Jeff Dean paper on using the TPU for more traditional CS operations, like an index lookup.

I'd like to see this progress and get more data on the difference in power required using a CPU and traditional algorithms versus a TPU and new algorithms like Jeff outlined.

https://arxiv.org/abs/1712.01208 The Case for Learned Index Structures
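For a flavor of the idea, here is a toy sketch: fit a linear model mapping key to position in a sorted array, then correct the prediction with a local search. This is only the core intuition; the paper's actual recursive model index is far more sophisticated:

```python
# Toy "learned index": predict a key's position in a sorted array with a
# least-squares linear model, then fix up the guess by searching locally.

def fit_linear(keys):
    """Least-squares fit of position i against key value keys[i]."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    var = sum((k - mean_k) ** 2 for k in keys)
    slope = cov / var
    return slope, mean_p - slope * mean_k

def lookup(keys, key, slope, intercept):
    """Return the index of key in keys, or -1 if absent."""
    guess = min(max(int(round(slope * key + intercept)), 0), len(keys) - 1)
    # Widen a bracket around the guess until it contains the key's value
    # range (in the real scheme this is bounded by the model's max error).
    lo = hi = guess
    while not (keys[lo] <= key <= keys[hi]):
        if lo == 0 and hi == len(keys) - 1:
            break
        lo = max(lo - 1, 0)
        hi = min(hi + 1, len(keys) - 1)
    for i in range(lo, hi + 1):
        if keys[i] == key:
            return i
    return -1

keys = sorted(x * x for x in range(1, 101))  # deliberately skewed key distribution
slope, intercept = fit_linear(keys)
assert all(lookup(keys, k, slope, intercept) == i for i, k in enumerate(keys))
```

The trade the paper explores is that a model prediction plus a short bounded scan can beat B-tree traversal when the key distribution is learnable.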

No shit. We're using a computer architecture envisioned in the late 50s. Our main processing units are made by a near-monopoly. The situation is so bad that to do any real computing we need to put another computer (GPU) inside of our computer. If you think about it, it's patently absurd.

As far as grand new ideas: fleet architecture by Ivan Sutherland.

As far as master-of-the-obvious ideas: make a hybrid unit that contains flash, RAM, and an FPGA-based processor. Persistent by default. Connect a lot of them and get rid of disks, caches, and memory bottlenecks. Do something similar to what XMOS does for peripherals (simulate them) to simplify hardware even further.

You could even combine the two ideas above. Make a computer where programming isn't just about instructions, but about re-configuring hardware and information routing within the system.

Maybe it's time to fuse the CPU and memory together?

Fuse an i7 processor with 32 GB of RAM. That should be more than sufficient for all normal consumer needs at this time.

You could still allow for additional memory, but it would function as level-2 memory, which is slower to access.

AMD has been shipping GPUs with HBM (on package RAM) for years. Recently, AMD and Intel have struck a deal to put that GPU+HBM in a CPU package.


When HBM first came out, I mused at how much faster/more efficient a CPU+HBM chip could be. I wonder what's stopping it from happening.

I'm also curious what ever happened to this. I know it's not a new idea [0], but I haven't seen it in practice.

[0]: http://www.theregister.co.uk/2013/11/21/intel_converging_mem...

The chips of the future may be designed nothing like the chips of the past. This will call for more than just basement circuit rehashing. The future of chips will rest on novel solid-state and condensed-matter research.

Friendly reminder that "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" [0] was published in 2005 (~13 years ago).

Thinking aloud here: is the lull in chip design due to the shift in focus from hardware to software after the boom of internet companies circa 2000? Just compare entry-level EE vs. CS job availability or pay, for example. Or maybe it's just that, like markets, R&D operates in cycles. Regardless, it would be interesting to see a comparison of the $$$ invested in R&D on each side over the last n years.

[0]: http://www.gotw.ca/publications/concurrency-ddj.htm

I suggest looking at Mill Computing. They are designing a new CPU that aims for the power of an Intel chip, the security of a mainframe, and the power usage of a Raspberry Pi.

Interesting, but I don't see anything about physical hardware or timelines in the articles. Can someone do a TL;DR summary?
