One question is whether it will architecturally be closer to a GPU or an FPGA. The field moves so fast that it might make sense to "future-proof" a bit with a live-reconfigurable FPGA.
From the manufacturer's point of view, phones not being future-proof is a feature, not a bug: that way you'll upgrade to the new shiny item, which keeps the profits rolling in.
I don't think a better AI chip will be a convincing argument to replace a phone only one year later.
With each iteration of phone, they add a shiny new feature that is ONLY available on the newer model.
With Steve Jobs gone, Apple is more and more seen as the emperor without clothes that it is.
I.e. yet another rebrander of Chinese goods with marginal differences from other franchises.
Aren't all the iPhone chips ASICs, with the main one custom-designed by those hardware people they acquired? That seems to be the default expectation for whatever they add next. They sure as hell have the money, too. :)
DSPs are not nearly as good at matrix math as a GPU, and the phone already has one of those. DSPs are typically good at signal processing in a fairly limited domain; they would not be calling it an AI chip if it were just a DSP.
I foresee it as similar to their M series co-processors - the first one was pretty basic, and more sensors and jobs have been given to the newer ones each year.
I think a lot of people in this thread are making incorrect assumptions about FPGA implementations of neural networks.
(1) Forward (inference) passes multiply by constant weights, i.e. fixed shift-and-add. FPGAs are very nearly the optimal architecture for programmable constant shift-and-add (see the sketch after this list).
(2) Individual neurons in a network can be bit-width optimized and Huffman encoded for bit-serial execution. FPGAs are very nearly the optimal architecture for variable bit-width operations in a bit-serial architecture with programmable Huffman decoders. [edited: Huffman encoding, not Hamming]
(3) Running a forward network requires multiple channels of parallel memory with separate but deterministic access patterns. Most FPGA architectures are designed with on-chip RAM specifically to be used this way.
(4) FPGA architectures can be designed to be inherently fault- and defect-tolerant, like GPUs disabling cores, but with finer granularity. Especially if the compilation is done in the cloud, a chip's particular defect/yield profile can be stored and used for placement optimization.
(5) Anything optimized as an ASIC will necessarily be so close to an FPGA that it may as well benefit from the existing programmable-logic ecosystem and be flexibly optimized for a particular trained network. You can't tape out an ASIC for every trained network, but based on my previous points, you most likely can optimize the logic for a specific forward network on an FPGA better than any ASIC designed to run arbitrary networks.
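To make point (1) concrete, here's a rough sketch (my own toy illustration, nothing Apple- or vendor-specific) of how a multiply by a fixed weight reduces to shifts and adds; in FPGA fabric the shifts are just routing and only the adds consume logic:

    # Toy illustration: multiply x by a *constant* weight using only shifts and adds.
    # The weight 13 (0b1101) is an arbitrary example value.
    def const_mul_shift_add(x, weight):
        acc = 0
        bit = 0
        while weight:
            if weight & 1:        # for every set bit of the constant...
                acc += x << bit   # ...add x shifted left by that bit position
            weight >>= 1
            bit += 1
        return acc

    assert const_mul_shift_add(7, 13) == 7 * 13  # 13 = 8 + 4 + 1, so three shifted adds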
In terms of AI, there is little reason to run it on the phone unless it's heavily used or needs to be low latency. Consider: if they add $100 of computing power to a phone that sits unused 99% of the time, they could instead build a server using those same $100 worth of parts that serves 100 phones, saving $90+ per phone including upkeep etc. (rough numbers sketched below).
PS: This is the same logic behind why the Amazon Echo is so cheap; it simply does not need much on-device hardware.
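Back-of-the-envelope version of the numbers above (purely illustrative; upkeep, networking, and peak load are ignored):

    # Illustrative arithmetic only, using the figures from the comment above.
    per_phone_hw = 100.0                      # $ of extra compute added to each phone
    utilization  = 0.01                       # that hardware sits unused 99% of the time
    phones_per_server = int(1 / utilization)  # one server's worth of parts time-shared across 100 phones
    shared_cost_per_phone = per_phone_hw / phones_per_server   # ~$1 of server hardware per phone
    print(per_phone_hw - shared_cost_per_phone)                # ~$99 saved per phone before upkeep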
And based on that they figured privacy was a good thing to aim for. Play to their strengths and differentiate based on that.
Facebook has Caffe2Go. Apple is working on this (and already has bindings optimized to use the ARM vector unit for DNN evaluation).
Running on device, if it can be done with reasonable power, is a win for everyone. Better privacy, better latency, and more robust operation in the face of intermittent connectivity.
E.g. Bitcoin mining went GPU -> FPGA -> ASIC, each requiring more design investment but delivering higher overall performance in hashes/W. But that workload is known exactly.
Maybe Apple is aiming at those? It would certainly achieve a much higher wow factor, which is key here.
The only thing similar I can find is at about 1:22:00 in the keynote here (https://youtu.be/Y2VF8tmLFHw), but all he actually says is "silicon specific accelerators".
So honestly at this point it could mean anything.
"Google has clearly committed to this vision of AI on the phone. At I/O, the company also unveiled a custom-built chip for both training and running neural networks in its data centers. I asked Google CEO Sundar Pichai if the company might build its own mobile chip along the same lines. He said the company had no plans to do so, but he didn’t rule it out either. “If we don’t find the state-of-the-art available on the outside,” he said, “we try to push it there.”"
"Companies such as Intel are already working on this kind of mobile AI processor."
Edit - also:
"There’s already one mobile processor with a machine learning-specific DSP on the market today. The Qualcomm Snapdragon 835 system-on-a-chip sports the Hexagon DSP that supports TensorFlow. DSPs are also used for providing functionality like recognizing the “OK, Google” wake phrase for the Google Assistant, according to Moorhead."
(I think what we are seeing here is the usual thing where Apple's software/product/design people decide the iPhone hardware roadmap, rather than the hardware people.)
I have confidence that attempting a command I've never tried before with Google Assistant will work; with Siri it's pot luck.
It was astonishing how right Assistant can be.
It has gone from "it just works" to a source of annoyance. Although nothing can top my work computer (Windows 7) for irritation.
Siri is a bag of hurt. I bet Apple has a very useful library of recordings of people saying "damn it, that's not what I said."
Just thinking about how terrible Siri is makes me hope and pray that they really aren't working on any self-driving car software.
If it's right, is Apple adopting a more decentralized model, with AI (or more AI) running locally? Could that compete with cloud-based AI's advantages? Obviously it would be better for offline usage, for responsiveness when transmission to the cloud is a significant part of the latency, and for confidentiality.
Also, are they doing training for the local user or for Google's 'general' systems or for both?
This is interesting indeed, although I suppose it was somewhat inevitable.
I'm definitely interested in the architectural details of the chip, but I doubt Apple will open up. Apple has control of the software stack and by extension, what models will run on this chip, so I expect that it will be a little bit more special purpose than general purpose.
So you rent or borrow from a bigger company while you can (Cloud TPUs), or you specialize in doing things that big companies with inflexible purpose-built hardware can't.
Is this a real IC/processor for arbitrary software or an abstraction of an underlying GPU/DSP?
> Exynos 8895 features VPU (Vision Processing Unit) which is designed for machine vision technology. This technology improves the recognition of an item or its movements by analyzing the visual information coming through the camera. Furthermore, it enables advanced features such as corner detection that is frequently used in motion detection, image registration, video tracking and object recognition.
> New Vision Processing Unit (VPU) paired to the Image Signal Processors (ISP) that provides a dedicated processing platform for numerous camera features, freeing up the CPU and GPU and saving power.
I was also hoping that, given Google's high efficiency with the TPU, they would make a version for mobile as well, at least for their Pixel phones. I guess they still might, but the fact that the TPU2 does both training and inference makes me think they won't do that anytime soon.
The biggest reason I like this "mobile AI chips" trend is that they can give you back some privacy, if the data can be analyzed locally without going to the vendors' servers, and I think they will also boost the capabilities of computational photography. No more spying toys for kids, etc.
It's mainly moving memory around, matrix multiplication, convolution, and applying activation functions (sigmoid, tanh, relu, etc.). Very simple, high-level stuff. This has the handy side-effect of making timing very predictable, which makes the latency a lot more deterministic.
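For a concrete picture of that op mix, here's a toy dense layer in NumPy (the sizes are made up); the whole computation is a matrix multiply, a bias add, and an element-wise activation, with access patterns fixed in advance:

    # Toy forward pass: matmul + bias + activation. All shapes are known ahead
    # of time, so the amount of work and memory traffic is fully deterministic.
    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def dense_layer(x, W, b):
        return relu(x @ W + b)

    x  = np.random.rand(1, 256).astype(np.float32)    # one input vector
    W1 = np.random.rand(256, 128).astype(np.float32)  # layer weights (constant at inference time)
    b1 = np.zeros(128, dtype=np.float32)
    print(dense_layer(x, W1, b1).shape)                # (1, 128)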
At Google I/O there was a slide during the keynote saying they're working with MediaTek to produce a mobile TPU.
Except that now I'm pretty baffled, since I saw an article a few months earlier that said Apple is investing massively in AI and already using it in several places in their products.
So what am I supposed to believe now? :/
Apple is investing massively into AI and is using it in their products. However, Google has been working with AI longer and has much more experience. (The article praises Amazon's AI chops as well, I dunno about that one.)
But I can't resist making a parallel between evolving TPUs and how the CPU found in the arm of the T800 changed history (negatively) forever in the Terminator universe.
Also, inference is usually happy with int8s whereas graphics workloads are mostly float32s. So you can save a lot of hardware that way too.
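A toy example of what that int8 point looks like in practice (my own illustration, using simple per-tensor linear quantization; real toolchains do more, but the idea is the same):

    # Quantize float32 weights to int8 with one scale factor, then dequantize.
    # The multipliers in an inference chip only ever have to handle 8-bit values.
    import numpy as np

    w = np.random.randn(4, 4).astype(np.float32)       # trained float32 weights
    scale = np.abs(w).max() / 127.0                    # map the largest magnitude to +/-127
    w_int8 = np.round(w / scale).astype(np.int8)       # what gets stored and multiplied
    w_back = w_int8.astype(np.float32) * scale         # effective weights at inference time
    print(np.abs(w - w_back).max())                    # error is on the order of scale/2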
I don't think it's wild speculation to say Apple is looking for efficiencies they may not have been able to get with GPUs, especially performance-per-watt, since so many of their devices are mobile-focused.
There was an article last year discussing customizations Apple had made to Imagination's GPU that made it more suitable for this purpose.