
NovuMind is developing a deep learning chip to “do inference efficiently” - baybal2
http://www.eetimes.com/document.asp?doc_id=1332226
======
0xbear
I wish I could bet money on this not happening "by February". Dude is
promising perf that's 25% higher than a Titan X Pascal's, in a power envelope
that's 1/50th of the Titan X's, and he's promising to do it with a few tens
of millions of dollars and a handful of people.
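
Put numbers on it (the Titan X Pascal's public specs are roughly 11 TFLOPS
FP32 at a 250 W TDP):

    titan_tflops, titan_watts = 11.0, 250.0
    claim_tflops = titan_tflops * 1.25      # "25% higher perf"
    claim_watts = titan_watts / 50.0        # "1/50th the power envelope"
    titan_eff = titan_tflops / titan_watts  # ~0.044 TFLOPS/W
    claim_eff = claim_tflops / claim_watts  # ~2.75 TFLOPS/W
    print(claim_eff / titan_eff)            # a 62.5x jump in perf/watt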

Yeah, right. Like that's ever going to happen. Sounds like the usual grandiose
claims to get clueless moneybags to pony up the cash for series B.

~~~
jamiek88
Yeah, extraordinary claims and all that.

Especially as this guy has a credibility problem in the first place because of
the ML cheating scandal from a few years ago that he was in the middle of (1).

15 TFLOPS of even stacked 3x3s in 5 watts? 'Taped out by Feb'? I am sceptical,
to say the least.

(1) https://www.enterprisetech.com/2015/06/12/baidu-fires-deep-images-ren-wu/

~~~
dnautics
It's not impossible. I've seen a company go from zero to tapeout in six
months. However, they did wind up getting stuck due to bugs on the
motherboard. (The chip itself did work.)

------
deepnotderp
Interesting, but inference-only and convolutions-only is kind of a non-starter
for a specialized chip, since the field is simply too dynamic for it to be
really useful to focus on only 3x3 convolutions. I'm guessing, FWIW, that this
guy is implementing the Winograd filtering method or something similar in
hardware, hence his restriction to 3x3 kernels.
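
If it is Winograd, the trick is that F(2x2, 3x3) computes each 2x2 output
tile of a 3x3 convolution with 16 multiplies instead of 36, using fixed
transform matrices. A minimal NumPy sketch of the standard Lavin & Gray
transforms (purely an illustration of the algorithm, not anything known
about NovuMind's actual design):

    import numpy as np

    # Winograd F(2x2, 3x3) transform matrices (Lavin & Gray, 2015).
    B_T = np.array([[1,  0, -1,  0],
                    [0,  1,  1,  0],
                    [0, -1,  1,  0],
                    [0,  1,  0, -1]], dtype=np.float32)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]], dtype=np.float32)
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=np.float32)

    def winograd_tile(d, g):
        # d: 4x4 input tile, g: 3x3 filter -> one 2x2 output tile,
        # with 16 elementwise multiplies instead of the direct 36.
        U = G @ g @ G.T        # filter transform (precomputable)
        V = B_T @ d @ B_T.T    # input tile transform
        return A_T @ (U * V) @ A_T.T

    # Sanity check against a direct sliding-window computation.
    d = np.random.randn(4, 4).astype(np.float32)
    g = np.random.randn(3, 3).astype(np.float32)
    direct = np.array([[(d[i:i+3, j:j+3] * g).sum() for j in range(2)]
                       for i in range(2)])
    assert np.allclose(winograd_tile(d, g), direct, atol=1e-4)

Since the filter transform depends only on the weights, hardware can bake it
in ahead of time, and the fixed 4x4 tiling is exactly the sort of thing that
would lock a chip to 3x3 kernels.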

~~~
Iv
There is a slight risk, but I think it does make sense. There have been papers
on the advantages of 5x5 and 7x7 convolution filters, and the consensus seems
to be that 3x3 is probably the way to go: stacking 3x3 layers reproduces the
receptive field of larger kernels with fewer weights.

If this were out right now, it would have tons of uses. The field moves fast,
but it is credible that in one year the field will still use 3x3, and it is
certain that being able to run today's applications for a few percent of the
energy they use now is going to have business applications.
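
For what it's worth, the usual back-of-the-envelope for the 3x3 consensus
(the channel count here is arbitrary, just for illustration):

    def conv_weights(k, c_in, c_out):
        # weight count of a k x k convolution layer, bias ignored
        return k * k * c_in * c_out

    C = 256  # arbitrary channel count
    stacked = 2 * conv_weights(3, C, C)  # two 3x3 layers: 5x5 receptive field
    single = conv_weights(5, C, C)       # one 5x5 layer: same receptive field
    print(stacked, single)  # 1179648 vs 1638400: ~28% fewer weights, plus an
                            # extra nonlinearity between the two 3x3 layers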

Research moves fast, but deployed usage moves at a different speed. The first
GPUs came out when people were not sure whether we would end up doing
triangle-based rendering, realtime raytracing, or something more funky.

~~~
deepnotderp
That's a fair enough point, but I don't really see why you need to restrict
yourself to only 3x3 when you could support other sizes at a slight penalty.
For that reason, I suspect it's a Winograd convolution ASIC or something
similar.

~~~
baybal2
The penalty is not that slight. As the article notes, Google's TPUs struggle
to keep themselves fed with data, while their big convolution units simply
end up being a waste of die area.
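
A quick roofline sketch of what "struggling to feed themselves" means; the
numbers below are made up purely for illustration, not actual TPU or NovuMind
specs:

    def attainable_tflops(peak_tflops, mem_bw_gbs, flops_per_byte):
        # Roofline model: throughput is capped by either raw compute or by
        # how fast operands can be streamed to the multipliers.
        return min(peak_tflops, mem_bw_gbs * flops_per_byte / 1000.0)

    peak, bw = 15.0, 30.0  # hypothetical: 15 TFLOPS peak, 30 GB/s of feed
    for intensity in (10, 100, 500):  # FLOPs performed per byte moved
        print(intensity, attainable_tflops(peak, bw, intensity))
    # -> 0.3, 3.0, 15.0 TFLOPS: below ~500 FLOPs/byte the big units sit idle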

