
Wafer scale integration, the stupid idea that just won't die :-)

Okay, I'm not quite that cynical, but I was avidly following Trilogy Systems (Gene Amdahl started it to make supercomputers using a single wafer). Conceptually awesome, in practice not so much.

The thing that broke down in the '80s was that different parts of the system evolve at different rates. As a result your wafer computer was always going to be sub-optimal at something, whether that was memory access or a new I/O channel standard; changing that part meant all new wafers, and fab companies knew that every time you change the masks, you have to re-qualify everything. Very time consuming.

I thought the AMD "chiplet" solution to making processors that could evolve outside the interconnect was a good engineering solution to this problem.

Dave Ditzel, of Transmeta fame, was pushing at one point a 'stackable' chip. Sort of 'chip on chip' like some cell phone SoCs have for memory, but generalized to allow more stacking than just 2 chips. The problem becomes getting the heat out of such a system as only the bottom chip is in contact with the substrate and the top chip with the case. Conceptually though, another angle where you could replace parts of the design without new masks for the other parts.
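
To put rough numbers on that heat problem, here's a back-of-envelope sketch (the wattages and thermal resistances are illustrative assumptions, not figures from Ditzel's design, and for simplicity it assumes all heat exits through the bottom die):

    # Back-of-envelope: temperature rise of each die in a stack when, for
    # simplicity, all heat must exit through the bottom die to the substrate.
    # All numbers are illustrative assumptions.

    def stack_temperature_rise(num_dies, watts_per_die, r_interface_c_per_w):
        """Temperature rise of each die above the substrate.

        Die 0 sits on the substrate; die (num_dies - 1) is on top. The
        interface under die i must carry the power of die i plus every die
        above it, so the thermal resistances add up in series.
        """
        rises = []
        rise = 0.0
        for i in range(num_dies):
            power_through_interface = watts_per_die * (num_dies - i)
            rise += r_interface_c_per_w * power_through_interface
            rises.append(rise)
        return rises

    # Assumed: 4 stacked dies, 20 W each, 0.2 C/W per die + bond interface.
    for i, dt in enumerate(stack_temperature_rise(4, 20.0, 0.2)):
        print(f"die {i}: ~{dt:.0f} C above the substrate")

Even with fairly generous numbers the top die ends up tens of degrees above the bottom one, which is the stacking problem in a nutshell.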

I really liked the SeaMicro clusters (tried to buy one at the previous company but it was too far off axis to pass the toy vs tool test). Perhaps they will solve the problems of WSI and turn it into something amazing.



Given these chips are purely meant for machine learning, the concern that "your wafer computer was always going to be sub-optimal at something" matters less now than it did in traditional scientific computing setups, especially those of the Trilogy / Transmeta days.

You have silicon in place to deal with physical defects at the hardware level.

You have backprop / machine learning in place to deal with physical deficiencies at the software level.

The programmer operates mostly at the objective / machine learning model level and can tweak the task setup as needed softly influenced by both the hardware architecture (explicit) and potential deficiencies (hopefully implicit).
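
As a toy illustration of that software-level tolerance (PyTorch assumed; the masking scheme and numbers are mine, not from any wafer-scale toolchain), you can permanently zero a chunk of a network's weights to emulate dead regions and watch training route around them:

    # Train a small net while a fixed random mask permanently zeroes 20% of
    # its weights, loosely emulating dead regions of a wafer. Backprop
    # simply routes around the missing capacity.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 2))

    # Fixed "defect" masks over the weight matrices.
    masks = {name: (torch.rand_like(p) > 0.2).float()
             for name, p in model.named_parameters() if p.dim() == 2}

    def apply_defects():
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(512, 32)
    y = (x[:, 0] > 0).long()          # a trivially learnable synthetic task

    for step in range(200):
        apply_defects()                # dead weights stay dead every step
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    apply_defects()
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
    print(f"accuracy with 20% dead weights: {acc:.2f}")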

The most extreme examples I've seen in papers or my own research: parts of a model left accidentally uninitialized (i.e. the weights were random) not impacting performance enough to be seen as an obvious bug (oops), tuning the size and specifics of a machine learning model to optimize hardware performance (i.e. the use of the adaptive softmax[1] and other "trade X for Y for better hardware efficiency and worry about task performance separately"), and even device placement optimization that outperforms human guided approaches[2].
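
For the adaptive softmax example specifically, PyTorch ships an implementation of [1] as nn.AdaptiveLogSoftmaxWithLoss; a minimal sketch, with cluster cutoffs chosen purely for illustration:

    # Frequent classes get the full hidden width, rare classes get smaller
    # projections, trading a little task fidelity for much less compute and
    # memory traffic.
    import torch
    import torch.nn as nn

    hidden_size, vocab_size = 512, 100_000

    # Head covers the 2k most frequent words; two tail clusters cover the
    # rest at reduced width (cutoffs chosen for illustration only).
    adaptive = nn.AdaptiveLogSoftmaxWithLoss(
        in_features=hidden_size,
        n_classes=vocab_size,
        cutoffs=[2_000, 20_000],
        div_value=4.0,
    )

    hidden = torch.randn(32, hidden_size)            # batch of hidden states
    targets = torch.randint(0, vocab_size, (32,))    # target word ids
    out = adaptive(hidden, targets)
    print(out.loss)   # NLL computed without a full 512 x 100k projection per token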

Whilst the last was proven across equipment at a Google datacenter with various intermixed devices (CPUs, GPUs, devices separated by more or less latency, models requiring more or less data transferred across various bandwidth channels, ...) it's immediately obvious how this could extend to optimizing the performance of a given model on a sub-optimal wafer (or large variety of sub-optimal wafers) without human input.

Whilst reinforcement learning traditionally requires many samples, that's perfectly fine when you're running it in an environment with a likely known distribution and can perform many millions of experiments per second testing various approaches.
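
A crude sketch of why that works (this is a hill climb against a made-up cost model, not the RL method of [2]): when a simulator can score a placement in microseconds, even a dumb search quickly learns to route work away from slow or degraded devices:

    # A made-up cost model scores a placement of ops onto heterogeneous (or
    # partially degraded) devices, and a simple hill climb runs tens of
    # thousands of cheap "experiments" against it.
    import random

    random.seed(0)
    op_cost = [random.uniform(1.0, 10.0) for _ in range(40)]   # work per op
    device_speed = [1.0, 1.0, 0.7, 0.4]                        # relative speeds

    def makespan(placement):
        """Simulated step time: the most loaded device bounds the step."""
        load = [0.0] * len(device_speed)
        for op, dev in enumerate(placement):
            load[dev] += op_cost[op] / device_speed[dev]
        return max(load)

    placement = [i % len(device_speed) for i in range(len(op_cost))]
    baseline = best = makespan(placement)
    for _ in range(20_000):                    # cheap simulated experiments
        op = random.randrange(len(op_cost))
        old_dev = placement[op]
        placement[op] = random.randrange(len(device_speed))
        score = makespan(placement)
        if score < best:
            best = score
        else:
            placement[op] = old_dev            # revert moves that don't help

    print(f"naive round-robin: {baseline:.1f}, after search: {best:.1f}")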

For me, working around wafer-level hardware deficiencies with machine learning makes as much or more sense than MapReduce did for working around COTS hardware deficiencies. Stop worrying about absolute quality, which is nearly unattainable anyway, and worry about the likely environment and task you're throwing against it.

[1]: https://arxiv.org/abs/1609.04309

[2]: https://arxiv.org/abs/1706.04972


I was being facetious:

I was working right next to Andy Grove and other legendary CPU engineers at Intel's bldg SC12....

I was young and making comments about "why can't we just do this, or that"

---

it was my golden era.

Running the Developer Relations Group (DRG) game lab with my best friend Morgan.

We bought some of the first 42" plasma displays ever made.... and played Quake tournaments on them...

We had a T3 directly to our lab.

We had the first ever AGP slots, the first test of the Unreal engine...

We had SIX fucking UO accounts logged in side-by-side and ran an EMPIRE in UO.

We would stay/play/work until 4am. It was fantastic.

---

We got the UO admins ghosting us, wondering how we were so good at the game (recall, everyone else had 56K modems at best... we were on a T3 at fucking Intel...)

We used to get yelled at for playing the music too loud....

---

Our job was to determine if the Celeron was going to be a viable project by playing games and figuring out if the SIMD instructions were viable... to ensure there was a capability to make a sub-$1,000 PC. Lots of people at Intel thought it was impossible...

GAMES MADE THAT HAPPEN.

Intel would then pay a gaming/other company a million dollars to say "our game/shit runs best on Intel Celeron Processors" etc... pushing SIMD (hence why they were afraid of Transmeta, and of AMD -- since AMD had already won a lawsuit that required Intel to give AMD CPU designs from the past...)

This was when they were claiming that 14nm was going to be impossible...

What are they at now?


> Dave Ditzel, of Transmeta fame, was pushing at one point a 'stackable' chip. Sort of 'chip on chip' like some cell phone SoCs have for memory, but generalized to allow more stacking than just 2 chips. The problem becomes getting the heat out of such a system as only the bottom chip is in contact with the substrate and the top chip with the case. Conceptually though, another angle where you could replace parts of the design without new masks for the other parts.

I'm imagining alternating between cpus and slabs of metal with heatpipes or some sort of monstrous liquid cooling loop running through them.


Actually diamond is a really good heat conductor. Dave's big idea was that the "thickness" of the chip needed to implement the transistors is actually quite thin (think nanometers thin), and that if you put one of these on top of another one, you could use effects like electron tunneling to create links between the two, and even build bisected transistors (where the gate was on one of the two and the channel was on the other).

So here is an article from 4 years ago on diamond as a substrate: https://www.electronicdesign.com/power/applications-abound-s... which talks about how great it is for spreading heat out. And one thought was that you would make a sandwich of interleaved chip slices and diamond slices, still thin enough to enable communication between the two semiconductor layers without needing solder balls between them.

In that scenario the package would have an outer ring that would clamp onto the diamond sheets to pull heat out of everything to the case.
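
Some rough numbers on why diamond in particular (the conductivities are approximate room-temperature bulk values; the geometry and power are my assumptions, not from Ditzel's proposal): pulling heat laterally through a thin sheet to a clamp ring is only plausible when the sheet conducts like diamond.

    # Lateral heat spreading through a thin sheet to a clamping ring.
    def lateral_resistance(k_w_per_mk, thickness_m, width_m, length_m):
        # 1-D conduction: R = L / (k * A), with A = thickness * width
        return length_m / (k_w_per_mk * thickness_m * width_m)

    thickness, width, length = 100e-6, 10e-3, 10e-3   # 100 um sheet, 10 mm strip
    power = 5.0                                        # watts pulled out laterally

    for name, k in [("silicon", 150.0), ("copper", 400.0), ("diamond", 2000.0)]:
        r = lateral_resistance(k, thickness, width, length)
        print(f"{name:8s}: {r:6.1f} C/W  -> {power * r:5.0f} C drop at {power:.0f} W")

At that thinness even a modest 5 W per strip gives an unworkable temperature drop through silicon or copper; only something in diamond's conductivity class keeps the drop survivable.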

Of course the mass market adopted making chips really thin so that you can make a sleek phone or laptop. Not really conducive to stacks in the package. Perhaps that was the thing that killed it off.


Low-Cost 3D Chip Stacking with ThruChip Wireless by Dave Ditzel

https://www.youtube.com/watch?v=S-hBSddgGY0

Fascinating idea, no idea about the practicalities.


Wow, thanks for finding that!


Haven't heard Transmeta mentioned in a LONG time.

I recall when I was at Intel in 1996 I used to work a few feet from Andy Grove... and I would ask naive questions like

"how come we cant stack mutiple CPUs on top of eachother"

and make naive statements like:

"When google figures out how to sell their services to customers (GCP) we are fucked" (this was made in the 2000's when I was on a hike with Intels then head of tech marketing, not 1996) ((During that hike he was telling me abt a secret project where they were able to make a proc that 48 cores)) (((I didnt believe it and I was like "what the fuck are they going to do with it)))?? -- welp this is how the future happens. and here we are.

and ask financially stupid questions like:

"what are you working on?" response "trying to figure out how to make our ERP financial system handel numbers in the billions of dollars"

I made a bunch of other stupid comments... like "Apple is going to start using Intel procs" and was yelled at by my Apple fanboi: "THAT'S NEVER GOING TO FUCKING HAPPEN"

But im just a moron.

---

But transmeta... there was a palpable fear of them at intel at that time....


A lot of the transmeta crowd, at least on the hardware side, went on to work at a very wide variety of companies. Transmeta didn't work, and probably was never going to work, as a company and a product, but it made a hell of a good place to mature certain hardware and software engineers, like a VC-funded postdoc program. I worked with a number of them at different companies.


I was at Transmeta. It was good for my career!


So what specifically are you doing now??


I was a QA Engineer at Transmeta, got an MBA, worked in tech recruiting for several years, and now I'm a software engineer.


click on their profile


New improvements (Zeno Semi) do talk about 1T SRAM, and a DRAM chiplet solution will have the same memory-size limits as wafer-scale, so maybe the SRAM vs DRAM gap will be close enough?

And as for I/O, maybe it's possible (thermally) to assemble this on an I/O interposer?
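
Rough arithmetic on that SRAM vs DRAM gap (the densities and the memory-area fraction below are ballpark assumptions, not vendor numbers):

    # Effective Mbit/mm^2 figures including array overhead, for illustration.
    wafer_area_mm2 = 46_000      # roughly the usable square on a 300 mm wafer
    mem_fraction = 0.4           # assume 40% of the area is spent on memory

    densities_mbit_per_mm2 = {
        "6T SRAM (logic process)": 8.0,
        "1T SRAM (claimed-style density)": 25.0,
        "commodity DRAM": 200.0,
    }

    for name, d in densities_mbit_per_mm2.items():
        gbytes = wafer_area_mm2 * mem_fraction * d / 8 / 1024
        print(f"{name:32s}: ~{gbytes:4.0f} GB for the same area budget")

Under those assumptions a denser 1T cell narrows the gap a lot but still leaves a sizeable factor to DRAM; whether that's "close enough" depends on the model sizes you care about.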


When I speculated that the brain is partly general-purpose analog, this wafer-scale project was one of the ones I found while digging for research along those lines:

http://web1.kip.uni-heidelberg.de/Veroeffentlichungen/downlo...

Pretty neat. I don't know how practical.


For cooling, maybe a phase-change liquid such as Novec would work; then you would just need each chip to be exposed to the Novec, and your packaging could be much smaller without heatsinks.
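
A quick sanity check on that idea (the power draw and latent heat below are assumed ballpark figures, not datasheet values for any particular Novec fluid):

    # Two-phase immersion cooling of a wafer-scale part.
    power_w = 15_000                 # assumed wafer-scale system power
    latent_heat_j_per_kg = 100_000   # assumed latent heat of vaporization

    boil_off_kg_per_s = power_w / latent_heat_j_per_kg
    print(f"fluid boiled off: ~{boil_off_kg_per_s:.2f} kg/s "
          f"(~{boil_off_kg_per_s * 60:.0f} kg/min for the condenser to handle)")

The chip-side packaging does get simpler, but the condenser still has to recondense roughly that much vapor.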



