TensorFlow does lots of matrix multiplies. The Hexagon chip can do 8 multiplies each cycle, and runs multiple threads on each core. The benchmark isn't clear, but it's likely that _one_ Hexagon instruction can replace multiple normal ARM instructions for the inner loop.
You can see some more on how the Hexagon DSP works here: http://pages.cs.wisc.edu/~danav/pubs/qcom/hexagon_hotchips20...
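To make the inner-loop point concrete, here's a minimal NumPy sketch (made-up shapes, nothing Hexagon-specific) of the multiply-accumulate loop at the heart of a matrix multiply. A vector/DSP instruction collapses several iterations of this inner loop into a single cycle, where a scalar core needs at least one instruction per multiply:

    import numpy as np

    def matmul_naive(A, B):
        """Naive matrix multiply, to show the multiply-accumulate (MAC) inner loop."""
        n, k = A.shape
        k2, m = B.shape
        assert k == k2
        C = np.zeros((n, m), dtype=np.float32)
        for i in range(n):
            for j in range(m):
                acc = 0.0
                for p in range(k):            # the hot inner loop
                    acc += A[i, p] * B[p, j]  # one multiply-accumulate per iteration
                C[i, j] = acc
        return C

    A = np.random.rand(4, 8).astype(np.float32)
    B = np.random.rand(8, 3).astype(np.float32)
    assert np.allclose(matmul_naive(A, B), A @ B, atol=1e-4)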
The hard part is to implement it efficiently (power, area, speed). So in theory we could have an open-source design and the vendors could still compete with each other by providing the most efficient implementation.
I think there are big gains to be made in lower precision inference too. Lots of people are doing interesting work in that area; check out these guys - https://xnor.ai/
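One common flavour of this, sketched below with made-up shapes (xnor.ai goes much further, down to binary weights): quantize the float32 weights to 8-bit integers, which shrinks the model roughly 4x and lets the hardware do cheaper integer arithmetic, usually with only a small accuracy hit:

    import numpy as np

    def quantize_int8(w):
        """Symmetric linear quantization of float32 weights to int8."""
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)   # stand-in for a layer's weights
    x = np.random.randn(256).astype(np.float32)        # stand-in for an activation vector

    q, scale = quantize_int8(w)
    full_precision = w @ x
    low_precision = dequantize(q, scale) @ x
    print("max abs error:", np.abs(full_precision - low_precision).max())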
Over a given time-slice, the DSP is able to take in and process/use more images of the object, allowing it to be more precise in its predictions.
Can someone help me understand what is going on here?
Are we just doing prediction for a model on a mobile device instead of in the cloud? If so, for what kinds of scenarios is this useful?
Can the pre-trained brain (the one in the phone) flip to training mode? Can you teach it something and upload that new training result to the original?
Or for things it doesn't recognise, do you need to add the images and classification to the training data and create a 'new brain' and download it to the phone?
Is there one super-organism (cloud-based learning) that gives birth to millions of mini-minds, each mini-mind asking its parent to help it with things it doesn't understand? In 20 years' time, what will this say about consciousness? Where would it live? Is this a new way to think about minds, ones that are distributed across many physical devices?
And the precision of the hardware changing thought processes in subtle ways is very interesting. Upgrading a neural net to a new hardware platform would change how it works, how it thinks and makes decisions.
Training involves harder operations, and you need to do a lot more of them. It's far more suited to having a single massive training system and then sending out the trained model just for inference.
Another thing that can be done is to train a large neural net, then figure out which bits you can cut out without sacrificing much accuracy. The newer, smaller net is then faster to run and more likely to actually fit neatly into the RAM on your phone.
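A minimal sketch of one way to do that (simple magnitude pruning with made-up sizes; in practice you'd re-check accuracy and usually fine-tune the net afterwards):

    import numpy as np

    def prune_by_magnitude(w, keep_fraction=0.3):
        """Zero out the smallest-magnitude weights, keeping only keep_fraction of them."""
        flat = np.abs(w).ravel()
        k = max(1, int(len(flat) * keep_fraction))
        threshold = np.sort(flat)[-k]          # k-th largest magnitude
        mask = np.abs(w) >= threshold
        return w * mask, mask

    w = np.random.randn(512, 512).astype(np.float32)   # stand-in for a layer's weights
    pruned, mask = prune_by_magnitude(w, keep_fraction=0.3)
    print("weights kept:", mask.mean())   # ~0.3; the zeros can be skipped or compressed away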
> Can the pre-trained brain (the one in the phone) flip to training mode? Can you teach it something and upload that new training result to the original?
Technically you probably could, but practically the answer is no for the types of nets used in this kind of thing. You'd want to be training the net on millions of images, and even if it were as fast as the inference on the phones that'd still take way too long.
[edit - interestingly this is not only technically possible but pretty much what is often done, just on more powerful machines. You can start with a pre-trained network or model and then "fine tune" it with your own data: http://cs231n.github.io/transfer-learning/]
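For the curious, a rough tf.keras sketch of that kind of fine-tuning (MobileNetV2 as a stand-in base, five made-up classes, random dummy data in place of real labelled images):

    import numpy as np
    import tensorflow as tf

    # Pre-trained feature extractor; its weights stay frozen.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False,
        weights="imagenet", pooling="avg")
    base.trainable = False

    # Only this small classification head gets trained on the new data.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(5, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Dummy stand-in data; in practice this would be your own labelled images.
    images = np.random.rand(8, 224, 224, 3).astype(np.float32)
    labels = np.random.randint(0, 5, size=8)
    model.fit(images, labels, epochs=1, batch_size=4)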
> Or for things it doesn't recognise, do you need to add the images and classification to the training data and create a 'new brain' and download it to the phone?
This is generally the approach, yes. It has other advantages though: the performance can be checked and compared once, then re-used lots of times.
> Is there one super-organism (cloud-based learning) that gives birth to millions of mini-minds, each mini-mind asking its parent to help it with things it doesn't understand? In 20 years' time, what will this say about consciousness? Where would it live? Is this a new way to think about minds, ones that are distributed across many physical devices?
In many ways, sounds similar to delegating work to more junior / less well trained staff.
It can also be useful for basic "AI assistants" that process the data locally, so you get some extra privacy. For instance, you could get better image search on the device, without ever putting the photos in the cloud.
I also don't think any of those AI assistants that Google and Facebook are pushing with their messengers actually need to exist in the cloud. But of course Google and Facebook will continue to prefer doing it in the cloud, because they actually want that data for themselves.
I think Huawei is also pushing for "smart notification management" to save battery life using such AI, although so far Huawei's solution has been pretty dumb. But I can see how this could improve in the future.
There should be at least a few more use cases where this is useful, and I think we'll see more smartphone makers take advantage of this.
Until we are surrounded by recording devices that have autoencoder-based speaker fingerprinting and audio transcription, combined with some NLP, so that if you say "Hello, I'm Tom Walker", it'll remember that and fill it in in the transcriptions. Instead of having vague videos and maybe some confusing sounds that can be deciphered by the police if there's enough reason to put in the effort and personnel, we'll now have direct audio transcriptions of everything we say and do, everywhere, available to a number of companies.
And the worst part of it is: this is useful. For security, for remembering things, for an automated secretary, for ... People will want this, and the features it can bring, so it'll happen, and privacy will be eroded until it's entirely gone.
If you were using this for a real purpose, would you only consider an object identified above a certain confidence? If you did, then the CPU version is surprisingly more performant in some of these examples, despite taking longer to get to the object at all.
I'd be very interested to know if there's any difference in the processing that should be taken into account however.
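For what it's worth, a tiny sketch of that thresholding idea (hypothetical per-frame softmax scores and class names), where only frames whose top probability clears the threshold count as "identified":

    import numpy as np

    def confident_detections(scores, labels, threshold=0.8):
        """Keep only frames whose top class probability clears the threshold."""
        top_prob = scores.max(axis=1)
        top_idx = scores.argmax(axis=1)
        return [(labels[i], p) for i, p in zip(top_idx, top_prob) if p >= threshold]

    labels = ["mug", "keyboard", "phone"]           # hypothetical classes
    scores = np.array([[0.55, 0.30, 0.15],          # hypothetical per-frame softmax outputs
                       [0.91, 0.05, 0.04],
                       [0.40, 0.35, 0.25]])
    print(confident_detections(scores, labels))     # only the 0.91 frame counts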
Recognizing faces? Voice? Handwriting? Captions for photos? Natural language queries (like Google's AI assistant)? Positioning by recognizing landmarks? Simple autonomous driving (say, RC cars)? Flying (quadrotors or RC planes)? Cars?
Or I guess a better question... will this change anything except decrease your need for a good network?
I think nobody's bothered to code that stuff in, because, well, despite trying time and time again to make these hot-shit features for a couple of decades, these features end up unused. Those that pay attention to history see this, and figure "Probably not worth trying, even in this day and age."
More specifically, I had a Palm Pilot, and you had to write using weird letter shapes for it to work.
It was just bloody annoying because you had to wait about 15 seconds for it to figure everything out. It was far faster to just use the keyboard.
No it didn't
Systems Requirements for FaceIt PC
-- Microsoft Windows 95.
-- 90 MHz Intel Pentium compatible or higher.
-- 16 MB RAM/10 MB free hard disc space.
-- VGA or higher video display adapter.
-- CD-ROM drive.
-- Microsoft Video for Windows (VFW) compatible video capture system with resolution 320 x 240, depth: 15 bit RGB, capture rate to memory:
Generally (yes, there are exceptions) Qualcomm produces the best flagship ARM CPUs outside of Apple.
That's my personal email.
If it's a 100MB file, you'd basically have to ship it with the operating system.
Having Siri do local voice and image recognition would be killer. I hate the current latency of the AI agents.