
Microsoft Incorporates Graphcore AI Chips in Azure Cloud - JoachimS
https://www.eetimes.com/document.asp?doc_id=1335297
======
fxtentacle
The author seems to be completely clueless.

1\. "This is the first time any major cloud service provider has publicly
offered customers the opportunity to run their data on an accelerator from any
of the dozens of AI chip startups"

No. Google acquired an AI chip startup around 2015, and TPUs have been
available on its cloud since February 2018. The TPU is a hardware accelerator
for AI and matrix multiplication.

2\. The TensorFlow values in their diagram are "* estimated", which I guess
makes sense, because TensorFlow & TPU is Graphcore's biggest competitor.
Thanks to XLA, TF2 tends to be a few percent faster than PyTorch on GPU. For
example, see here: [https://wrosinski.github.io/deep-learning-frameworks/](https://wrosinski.github.io/deep-learning-frameworks/)

3\. They compare Graphcore against GPUs, but the real comparison target here
would be other accelerators, e.g. the TPU. Graphcore comes out about as fast
as a V100, meaning it would be roughly 50% slower than a TPU.
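
To make the arithmetic behind that last estimate explicit (a minimal sketch;
the ~2x TPU-over-V100 throughput figure is an assumption taken for
illustration, not a number from the article):

```python
# Relative-slowdown arithmetic: if a TPU delivers roughly 2x the training
# throughput of a V100, and the IPU roughly matches a V100, then the IPU
# processes about half as many samples/sec as the TPU, i.e. ~50% slower.

def slowdown_vs(throughput_a, throughput_b):
    """Fraction by which A is slower than B (throughputs in samples/sec)."""
    return 1.0 - throughput_a / throughput_b

v100 = 1.0          # normalise V100 throughput to 1
tpu = 2.0 * v100    # assumption: TPU ~2x a V100
ipu = 1.0 * v100    # article's benchmarks: IPU roughly matches a V100

print(slowdown_vs(ipu, tpu))  # -> 0.5
```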

~~~
justicezyx
> 3\. They compare Graphcore against GPU, but the real comparison target here
> would be other accelerators, e.g. TPU. Graphcore comes out similarly fast as
> a V100, meaning it would be roughly 50% slower than a TPU.

Hmm, business-wise, the GPU is every AI accelerator startup's biggest enemy.

BTW, we are building software infrastructure for AI chips; ping me if
interested: info@nascentcore.com

~~~
deepnotderp
I'm interested, but you don't seem to have an email address I can ping you at.

~~~
justicezyx
Added an email.

------
orf
Interesting: their benchmarks show TensorFlow being nearly twice as slow to
train on a GPU as PyTorch. That's a surprisingly large difference.

~~~
_coveredInBees
2x slower is unlikely, but PyTorch does tend to be faster than TF in most
recent benchmarks. See this link, posted by another commenter, for one example:

[https://wrosinski.github.io/deep-learning-speed-vol1/](https://wrosinski.github.io/deep-learning-speed-vol1/)

------
Abishek_Muthian
I'm afraid these AI accelerators being available only to the big cloud
companies will further hinder the democratisation of AI/ML.

GPU access is already a big bottleneck for an individual or startup in a poor
economic region. Considering that even entry-level software engineering jobs
now expect candidates to know ML, this imposes a huge disadvantage on a large
population of students from poor countries.

I understand this issue has largely to do with how the semiconductor industry
itself is structured. With only a handful of fabrication plants capable of
mass-producing semiconductors of this nature, and with those plants being an
integral part of geopolitical 'soft power', it's hard for a startup to enter
this space; and when one does, it has no option but to tie up with the cloud
companies.

But unless the hardware bottleneck is addressed, it's not just that 'AI
powered end products' will increase inequality; the ML education/research
ecosystem is already raising it.

~~~
solveit
I hate how people perceive unequal growth as a bad thing even when everyone
benefits on an absolute scale. In particular, it's a completely ridiculous
standard to apply to technology. The fact that Google can train a 12 billion
parameter model does not make my 2 million parameter model any less effective.
And when you're concerned with actually doing useful things to create value,
that's all that matters.

I will not go into how completely detached from reality the rest of the
comment is, except to say that there are approximately zero people who cannot
learn ML because they don't have access to a GPU.

~~~
Abishek_Muthian
> there are approximately zero people who cannot learn ML because they don't
> have access to a GPU

That sounds like an entitled, myopic view.

There are reputed, fully merit-based, government-run engineering colleges in
India where the tuition fee is ~100 USD/year. Many of the students there live
in poverty; the government provides them free laptops, albeit obviously cheap
ones. Almost everyone studying there is placed in top companies around the
world.

At my previous startup I conducted ~80 ML/DS interviews. Fresh graduates from
those colleges perform very well in DSA and other CS aspects, but perform
poorly in ML compared with graduates of expensive private institutions.

When I asked them about it, it came down to a lack of proper access to ML
hardware. Their college labs are not equipped to provide ML training at scale,
and their Internet access is too limited to make use of Google Colaboratory or
similar services.

Yes, they could run CPU-bound training for days, but that isn't practical, and
many don't have a consistent power supply.

Economic disadvantage in education is real, and in my experience it's more
pronounced when it comes to ML.

------
mr_toad
The architecture of these seems quite different from GPUs and Google’s TPUs.
It’s interesting that they went with smaller amounts of on-core memory and
very high bandwidth, rather than a large amount of on-card memory.

------
duaoebg
Training ResNet with 8 IPUs is roughly the same speed as with 8 GPUs. I doubt
the GPU baseline was Tensor Core optimized, so I assume the GPUs can go way
faster.

I'd love to spend some time coming up with Graphcore-optimized models, but at
this stage fused, Tensor Core-optimized ops are much more interesting to me.

~~~
justicezyx
Yep, Nvidia's CEO has claimed that Nvidia is really a software company, citing
the complex software they've built on top of their GPUs.

But I think the AI field as a whole needs to get equally good at optimizing ML
workloads on non-GPU accelerators. Otherwise the industry will be stifled by
the money-seeking shortsightedness of Nvidia.

~~~
sbierwagen
Easy enough to optimize a net against a new architecture:
[https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html](https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html)

~~~
justicezyx
This seems to be about using AutoML to find NN architectures that fit well on
certain inference chips.

The orthogonal issue is: given a trained model, how do you make it run faster
on a given inference chip?

Either way, a blog post on Google AI is a sign that 1% of the work is done;
the remaining 99% needs more investment from the believers.

