
Deep Dive: Nvidia Inference Research Chip Scales to 32 Chiplets - rbanffy
https://www.tomshardware.com/news/nvidia-msm-inference-chip,39780.html
======
montecarl
When I read "inference" here I thought about probabilistic programming and
Bayesian inference, but the article says: "Each processing element contains
eight parallel lanes of 8-way multiply-accumulate (MAC) units, each with a
precision of 8-bit (as is getting common in inference)."
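
For concreteness, here is roughly what one such 8-way MAC lane computes,
sketched in NumPy (the int32 accumulator width is my assumption; the
article doesn't specify it):

    import numpy as np

    # Toy example, not from the article: an 8-element int8 dot product
    # accumulated into a wider int32 register, which is how inference
    # parts typically avoid overflowing the accumulator.
    a = np.array([12, -7, 33, 90, -128, 5, 64, -3], dtype=np.int8)
    w = np.array([2, 19, -8, 4, 7, -90, 1, 22], dtype=np.int8)
    acc = np.sum(a.astype(np.int32) * w.astype(np.int32))
    print(acc)  # the dot product one lane produces; a PE runs 8 such lanes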

8-bit precision is good for neural network weights, but not for Bayesian
inference, where, from my understanding, 32-bit floats might be the
minimum. I wonder if anyone is working on hardware for accelerating
probabilistic programming. Some approaches are harder to accelerate, such
as Monte Carlo sampling (e.g. the No-U-Turn Sampler (NUTS)), while
variational inference may be better suited to running on this type of
custom hardware.
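
A toy sketch of the contrast (my own example, assuming the usual
reparameterization-gradient formulation of VI): each step is batched
multiply-accumulate work over many samples, whereas a NUTS trajectory is
inherently sequential:

    import numpy as np

    # Toy sketch, my own: one reparameterization-gradient step of
    # mean-field Gaussian VI, q(z) = N(mu, exp(log_s)^2), against a
    # stand-in target with log p(z) = -z^2/2.  The work is batched
    # multiply-accumulate over the sample dimension.
    rng = np.random.default_rng(0)
    mu, log_s = 0.5, 0.0                   # variational parameters

    eps = rng.standard_normal(256)         # base noise
    z = mu + np.exp(log_s) * eps           # reparameterized samples from q
    dlogp = -z                             # gradient of target log-density
    grad_mu = np.mean(dlogp)               # MC estimate of dELBO/dmu
    grad_log_s = np.mean(dlogp * eps * np.exp(log_s)) + 1.0  # + entropy
    mu += 0.1 * grad_mu                    # gradient ascent on the ELBO
    log_s += 0.1 * grad_log_s
    print(mu, log_s)                       # drifts toward the optimum (0, 0)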

~~~
nabla9
The word is used in the DL context, where learning and inference can often
be separated.

Normal GPUs can be used for both learning and inference tasks. Dedicated
inference processors have limited numerical accuracy and are suitable only
for inference, but they cost less and are more power efficient.
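
As a toy illustration of that trade-off (my own example, assuming a
symmetric per-tensor quantization scheme):

    import numpy as np

    # Float32 weights from training get quantized to int8 for the
    # inference part, trading a bounded rounding error for cheaper,
    # lower-power arithmetic.
    rng = np.random.default_rng(1)
    w = rng.standard_normal(8).astype(np.float32)   # trained weights
    scale = np.abs(w).max() / 127.0                 # per-tensor scale
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    w_deq = w_q.astype(np.float32) * scale          # what the chip computes with
    print(np.abs(w - w_deq).max())                  # worst-case error <= scale/2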

------
Zenst
Chiplet design has so much going for it from a yield/manufacturing/price
and scaling perspective.

But what would be the next step from there?

Could it be that memory chips as we know them get merged with processing?
In the case of graphics cards that seems a logical progression down the
line: instead of a row of RAM chips and a processor chiplet blob, you have
those chiplets merged with the RAM, forming a mesh of both processing and
memory.

