"Compared to leading GPUs , , ,the TSP architecture delivers 5×the computational density for deep learning ops. We see a direct speedup in real application performance as we demonstrate a nearly 4×speedup in batch-size-1 throughput and a nearly 4×reduction of inference latency compared to leading TPU, GPU, and Habana Lab’sGOYA chip."
It is challenging to directly compare a GPU with an ASIC-style chip like this. I would like to see more detailed performance comparisons against something like Google's TPU.
Honestly if you look at any of the other AI hardware startups they all advertise much more significant speedups.
That's the big if that most people seem to miss. And I even had people complaining their training was slow on a GPU...
I'd agree with the sibling comment; this depends completely on what you are trying to do, and how fast you need to do it. There's nothing wrong with inferencing on a CPU, and I have no doubt it's much cheaper in certain ways, but it's also slower than what you can do on a GPU or custom ASIC, and there are reasons some people need it to go faster. One example that's in wide deployment would be Nvidia's DLSS for video games. It'd be pretty hard to run that in real time at 4k resolution on a CPU.
We at nascentcore.ai are looking at ways to reduce cloud training cost by making more alternative training ASIC chips available to the public.
Feel free to contact us at email@example.com
Funnily enough, the two services that my team is running on CPU are speech recognition and machine translation at real-time speeds, so that is definitely not true.
Heck, I can run an accurate real-time speech recognition service on my computer and only use like 5% of CPU.
For translation, there's really not that much to say - we run the transformers on CPU and they seem to be pretty quick. We have a little more tolerance for latency here than with speech.
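For a rough sense of what CPU-only transformer translation can look like, here's a minimal sketch assuming a pretrained Marian model from the Hugging Face transformers library and an arbitrary thread count (illustrative only, not necessarily our exact stack):

    # Minimal sketch: CPU-only translation with a pretrained Marian model.
    # Model name, language pair, and thread count are illustrative assumptions.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    torch.set_num_threads(4)  # pin CPU threads so latency stays predictable

    model_name = "Helsinki-NLP/opus-mt-de-en"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).eval()

    def translate(text):
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=128)
        return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

    print(translate("Maschinelle Übersetzung läuft auch auf der CPU."))

For short sentences this kind of setup responds fast enough that latency is dominated by everything around the model, not the model itself.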
Real-time deep speech recognition on CPU is a little trickier. wav2letter++ has the best performance we've found. It's implemented entirely in C++ and streaming inference is quick on CPU. Without a GPU (and even with one, tbh), it is not feasible to do real-time decoding with a transformer LM, so we use n-grams.
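To show why an n-gram LM keeps CPU decoding cheap, here's a hedged sketch of rescoring beam hypotheses with a KenLM model. The ARPA file, weights, and hypothesis format are placeholders, not our production decoder (which fuses LM scores during streaming beam search):

    # Sketch: rescoring acoustic-model beam hypotheses with a KenLM n-gram LM.
    # The ARPA file path, weights, and word penalty are placeholder assumptions.
    import kenlm

    lm = kenlm.Model("4gram.arpa")  # hypothetical pre-built n-gram model

    def rescore(hypotheses, lm_weight=0.8, word_penalty=-0.2):
        # hypotheses: list of (text, acoustic_score) pairs from beam search
        best = None
        for text, am_score in hypotheses:
            lm_score = lm.score(text, bos=True, eos=True)  # log10 probability
            total = am_score + lm_weight * lm_score + word_penalty * len(text.split())
            if best is None or total > best[0]:
                best = (total, text)
        return best[1]

    print(rescore([("the cat sat on the mat", -12.3),
                   ("the cat sat on a mat", -12.1)]))

An n-gram lookup is just a hash/trie query per word, so the LM adds almost nothing to per-frame latency, which is what makes streaming decoding on CPU practical.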
I never said anything about lower accuracy - we are running full size transformer models for translation.
And wav2letter++ inference models are SOTA on the LibriSpeech leaderboards, so try again. This is a completely different architecture from Kaldi and, frankly, conflating the two is wrong.
> have problem with people who make a blanket statement about running model on CPU.
What was my "blanket statement"? I said that the statement "For some task inference CPU can't be real time... speech recognitions and friends, machine translation etc" was false, because those tasks can be done in real time on CPU. The original claim seems to be much more of a blanket statement than my response.
To my eyes, deep learning ASICs are generally only meaningful in two separate scenarios: a high-power, high-scale data center training chip, or a low-power, highly efficient edge inference chip.
The TSP appears to be a throughput-oriented, high-power inference chip. I don't know of any decent-sized market that can support such a chip from a start-up.
They often have a similar architecture in terms of execution workflow, but add NN-flavored instruction units and instructions. That drives down cost and makes them easier to program.
As for drones, they are even more energy-limited than smartphones, since the propulsion system consumes more energy. Inference throughput seems like a secondary problem for drones.