Apple Is Working on a Dedicated Chip to Power AI on Devices (bloomberg.com)
299 points by coloneltcb on May 26, 2017 | hide | past | web | favorite | 116 comments

This is probably going to be a hyper-parallel fixed-point/integer engine like TPU gen1. Doing fast matrix multiplies over really small integer types is very subpar on CPUs and GPUs. That was the initial reasoning behind TPU gen1: improving runtime performance.
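To make that concrete, here's a toy NumPy sketch of the kind of computation such a fixed-point engine does: an int8 matrix multiply with int32 accumulation and a shift-based requantize. (Purely illustrative; the shapes and requantization scheme here are my assumptions, not any vendor's actual design.)

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.integers(-128, 128, size=(1, 256), dtype=np.int8)
weights = rng.integers(-128, 128, size=(256, 256), dtype=np.int8)

# Accumulate in int32 so the 8-bit products can't overflow.
acc = activations.astype(np.int32) @ weights.astype(np.int32)

# Requantize back to int8; a power-of-two scale is just a right shift.
out = np.clip(acc >> 8, -128, 127).astype(np.int8)
print(out.shape)  # (1, 256)
```

CPUs and GPUs spend most of their silicon on things this workload doesn't need, which is the TPU gen1 argument in a nutshell.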

One question is if it will architecturally be closer to a GPU or an FPGA. The field moves so fast that it might make sense to "future-proof" a bit with a live-reconfigurable FPGA.

I'd bet on this being an ASIC, doing this on an FPGA with any serious size matrix would require a very expensive FPGA, whereas an ASIC would allow more gates in a smaller volume and would consume less power to boot.

From the manufacturer's point of view, phones not being future-proof is a feature, not a bug; that way you'll upgrade to that new shiny item, which will keep the profits rolling in.

> From the manufacturer's point of view, phones not being future-proof is a feature, not a bug; that way you'll upgrade to that new shiny item, which will keep the profits rolling in.

I don't think a better AI chip will be a convincing argument to change a phone one year later.

Not if you put it that way. Apple can simply make new AI features exclusive to newer phones with updated versions of the chips. If they open the chip up to developers, this effect can spill out to the app store. That would then provide the impetus to push consumers to upgrade.

You nailed the Apple plan; they did the exact same thing starting from the introduction of Siri (4S?), then Force Touch, etc.

With each iteration of phone, they add a shiny new feature that is ONLY available on the newer model.

Siri was only available on new iPhones from then on, but it's now available on regular Macintoshes going back like 7-8 years.

I don't see how Force Touch can be added to older hardware...

It can't be. Night Shift, on the other hand...

I think you underestimate the power of marketing.

> I think you underestimate the power of marketing.

With Steve Jobs gone, Apple is more and more seen as the emperor without clothes[0] that it is.

[0] I.e. yet another rebrander of Chinese goods with marginal differences from other franchises

"I'd bet on this being an ASIC"

Aren't all the iPhone chips ASICs, with the main one custom-designed by those hardware people they acquired? Seems to be the default expectation for whatever they add next. They sure as hell have the money, too. :)

I don't think so. I believe Apple is basically going to do what Google and Qualcomm do and use a DSP's.

'A DSP's'? (sic)

DSPs are not nearly as good at matrix math as a GPU and the phone already has one of those. DSPs are typically good at signal processing in a fairly limited domain, they would not be calling it an AI chip if it was just a DSP.

You don't put an FPGA in a device you're going to sell 200M+ of. The cost per unit would be way higher than an ASIC, and you're just going to come out with a better version next year anyway, so why bother?

I foresee it as similar to their M series co-processors - the first one was pretty basic, and more sensors and jobs have been given to the newer ones each year.

I work on FPGA embedded vision.

I think a lot of people in this thread are making incorrect assumptions about FPGA implementations of neural network applications.

(1) Forward networks are constant multiplications, i.e. fixed shift-and-add. FPGAs are very nearly an optimal architecture for programmable constant shift-and-add.
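As a toy illustration of constant shift-and-add (my own example, not the parent's): multiplying by a fixed weight such as 10 needs no multiplier at all, because 10 = 0b1010 decomposes into two shifts and an add, which maps straight onto FPGA LUTs and carry chains.

```python
def times_10(x: int) -> int:
    # 10 = 0b1010 = 8 + 2, so multiply-by-10 is (x << 3) + (x << 1)
    return (x << 3) + (x << 1)

print(times_10(7))  # 70
```

Since the weights of a trained forward network are constants, every multiply in the network can in principle be compiled down this way.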

(2) Individual neurons in a network can be bitwidth-optimized and Huffman-encoded for bit-serial execution. FPGAs are a very nearly optimal architecture for variable bit-width operations in a bit-serial architecture with programmable Huffman decoders [edited: Huffman encoding, not Hamming].

(3) Running a forward network requires multiple channels of parallel memory with separate but deterministic access patterns. Most FPGA architectures are designed with on-chip RAM specifically to be used this way.

(4) FPGA architectures can be designed to be inherently fault- and defect-tolerant, like GPUs disabling cores, but with finer granularity. Especially if the compilation is done in the cloud, the particular defect/yield profile can be stored for placement optimization.

(5) Anything optimized for ASIC design will necessarily be so close to an FPGA that it may as well benefit from the existing programmable logic ecosystem and be flexibly optimized for a particular trained network. You can't just tape out an ASIC for every trained network, but based on my previous points, you most likely can optimize the logic for a specific forward network to run on an FPGA better than on any ASIC designed to run arbitrary networks.

But what about unit cost?

IIRC there's already an FPGA in the iPhone.

There is a tiny one in the iPhone 7. But, that's for flexibility on current tasks not future proofing.

In terms of AI there is little reason to run it on the phone unless it's heavily used or needs to be low latency. Consider: if they add $100 of computing power to a phone that sits unused 99% of the time, they could instead build a server using those same $100 worth of parts that then serves 100 phones, saving $90+ per phone including upkeep etc.

PS: This is the same logic why Amazon Echo is so cheap, it simply does not need much hardware.

Privacy is another reason to keep the computation local.

I believe that's actually the main reason to choose a "local" AI.

Good luck getting any of the big players to acknowledge that. >.<

That's exactly why Apple does things like analyze your photo library locally on the phone - for privacy.

Yes, surely this has no relation to their inability to build reliable cloud products ....

It could be both. Perhaps Apple concluded that 1) they're subpar with cloud services and will have difficulty competing, 2) there's a growing need/demand for more privacy and less 'cloud', and 3) Apple's products are already, on the whole, recommended when it comes to privacy.

And based on that they figured privacy was a good thing to aim for. Play to their strengths and differentiate based on that.

I believe many of them do. Google has TensorFlow Lite: https://techcrunch.com/2017/05/17/googles-tensorflow-lite-br...

Facebook has Caffe2Go. Apple is working on this (and already has bindings optimized to use the ARM vector unit for DNN evaluation).

Running on device, if it can be done with reasonable power, is a win for everyone. Better privacy, better latency, and more robust operation in the face of intermittent connectivity.

You are correct, but we're talking about an entirely different magnitude of fpga for AI purposes than the tiny little one in the iPhone 7.

It'll be an ASIC so more GPU than FPGA. The real reason to upgrade the chip would be to add more transistors rather than any real instruction set upgrade so the FPGA doesn't really get Apple anything other than cost and wasted space.

How do FPGAs compare on space and power? Would differences be enough to matter for mobile?

FPGAs are pretty bad space-wise and power-wise compared to straight up ASICs. Apple could make some blocks highly configurable, but even an FPGA designer wouldn't use FPGA fabric to do multiplication if they cared about performance. FPGAs are a mix of general purpose logic blocks (the fabric) and dedicated blocks like multipliers, dividers, PLLs, memory, serializers and deserializers, etc.

That's what I'm thinking: some sort of configurable FPGA-like fabric around a bulk of TPUv1-style cores, maybe for routing outputs around so you can do some nice pipelining like you might want with CV on video.

I don't think space is an issue, but an ASIC designed exactly for a workload will always beat an FPGA on power. But if you don't know the workload exactly or don't have the money to fab an ASIC then an FPGA will be superior if the workload is a bad fit for CPUs or GPUs. So if you can save (2-10)x power on some unknown ML workload in the future that might be preferable to (10-20)x on some fixed workload with a fixed-point ASIC.

I.e. Bitcoin mining went GPU->FPGA->ASIC, each with more investment required to design but higher overall performance in Hash/W. But that workload is known exactly.

I doubt they'll do an FPGA. Devices are too concerned with battery life to be running that, plus their margins would suffer or it'd be even more expensive.

Space is only an issue because it is directly correlated with unit cost.

Probably worse on power consumption as well as price per MIP.

FPGAs are generally power hogs, which makes sense considering how they work.

Mythic is working on analog/digital neural network chips. Some others are too. Much better power consumption.

Maybe Apple is aiming at those? It would certainly achieve a much higher wow factor, which is key here.

Could this mean on-device ANI? My deal breaker with Amazon, Google, Microsoft and even Siri is their role in normalising the hoovering up of sensitive data.

More work needs to be done on training models with less data, differential privacy, and unsupervised learning, but so long as supervised learning continues to be the main path forward for the current set of "AI" centralizing the data into ginormous data sets will continue to be the norm.
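As a minimal sketch of the differential-privacy idea mentioned above (illustrative only; real deployments such as Apple's use local DP with considerably more machinery), the Laplace mechanism adds calibrated noise to an aggregate so no individual record can be pinned down:

```python
import random

def noisy_count(true_count: float, epsilon: float) -> float:
    # Laplace mechanism for a count query (sensitivity 1): the difference
    # of two Exp(epsilon) draws is Laplace-distributed with scale 1/epsilon.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

random.seed(0)
print(noisy_count(100, 1.0))  # roughly 100, give or take a few
```

Smaller epsilon means more noise and stronger privacy, at the cost of less useful statistics.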

I don't see how unsupervised learning makes this any better? That data you're training on in an unsupervised manner is still collected somewhere, and could contain as much private information as a labeled dataset.

Yep, whether it's supervised or not has no bearing on privacy. What counts is where the data is processed and who has it.

Labels for supervised training tend to come from humans in the loop. I think many would consider another human looking at their photos, searches, etc., to be a loss of privacy, albeit with a small surface area.

I think so- they acquired a company a few years back, percept.io, that did on device learning. I wouldn't be surprised if they were starting to put it in production.

what is ANI?

Artificial Narrow Intelligence or "weak" AI. Like Siri.

Why is Bloomberg not mentioning that Google announced it was working on the same thing? They mentioned vaguely that Amazon and Google both were working on AI, but nothing about the seemingly similar TPU and how Google announced they were going to bring it to phones at I/O just a bit ago. Am I wrong to be thinking that's pretty relevant here?

Because Google's TPU is for servers and not for Mobile.

Pretty sure they announced during I/O that they were working on making a TPU for mobile.

Do you have a link? I don't remember this, and I didn't find anything through googling.

I'm definitely doubting myself now. I remember watching the keynote and thinking how they didn't make a bigger deal out of the on device chip.

The only thing similar I can find is at about 1:22:00 in the keynote here https://youtu.be/Y2VF8tmLFHw but all he actually says is "silicon specific accelerators".

So honestly at this point it could mean anything.


"Google has clearly committed to this vision of AI on the phone. At I/O, the company also unveiled a custom-built chip for both training and running neural networks in its data centers. I asked Google CEO Sundar Pichai if the company might build its own mobile chip along the same lines. He said the company had no plans to do so, but he didn’t rule it out either. “If we don’t find the state-of-the-art available on the outside,” he said, “we try to push it there.”"

"Companies such as Intel are already working on this kind of mobile AI processor."


Edit - also:

"There’s already one mobile processor with a machine learning-specific DSP on the market today. The Qualcomm Snapdragon 835 system-on-a-chip sports the Hexagon DSP that supports TensorFlow. DSPs are also used for providing functionality like recognizing the “OK, Google” wake phrase for the Google Assistant, according to Moorhead."


And TPUs are not for sale, just for rent in the cloud, and even that only recently.

Knowing them this will be pretty good. The A10 is a beast.

Knowing Apple's (software) prowess in AI the end-result will still likely be shit compared to Google.

(I think what we are seeing here is the usual thing where Apple's software/product/design people decide the iPhone hardware roadmap, rather than the hardware people.)

What is your basis for this? Am curious.

Siri vs Google Assistant. Google Assistant is way better at understanding voice and performing tasks. I've used both an iPhone and a Pixel, and I can confirm that Google Assistant is way smarter than Siri at this point.

Don't know why you're being downvoted, this has been my experience too. Siri feels more like a toy with its infuriating jokes and stunted capabilities, Google Assistant feels much more polished.

I have confidence that attempting a command I've never tried before with Google Assistant will work, with Siri it's potluck.

Yeah, it's not even up for debate. Google is way ahead of Apple in AI and just about every other area we could name.

Wired recently did a test of the major AI assistants with different accents, and Google Assistant won every round (they gave the first round to Siri for some reason, but if you watch the video, Assistant clearly won that round too).


It was astonishing how right Assistant can be

OS X has degraded over the last several years in terms of bugginess. Anecdotally.

It has gone from "it just works" to a source of annoyance. Although nothing can top my work computer (Windows 7) for irritation.

Siri is a bag of hurt. I bet Apple has a very useful library of recordings of people saying "damn it, that's not what I said."

Just thinking about how terrible Siri is makes me hope and pray that they really aren't working on any self-driving car software.

My assumption has been that Google, Amazon, and Microsoft run the heavy-duty AI in the cloud when possible, benefiting from huge scale and easier updates. Maybe that assumption is wrong?

If it's right, is Apple adopting a more decentralized model, with AI (or more AI) running locally? Could that compete with cloud-based AI's advantages? Obviously it would be better for offline usage, for responsiveness when transmission to the cloud is a significant part the latency, and for confidentiality.

Google's been working on distributed training as well.

Why? What is the benefit to Google?

Also, are they doing training for the local user or for Google's 'general' systems or for both?

So they can do on-device AI/ML with TensorFlow Lite, through the use of specialized neural network DSPs, as discussed during the keynote at I/O 2017.



This is interesting indeed, although I suppose it was somewhat inevitable.

I'm definitely interested in the architectural details of the chip, but I doubt Apple will open up. Apple has control of the software stack and by extension, what models will run on this chip, so I expect that it will be a little bit more special purpose than general purpose.

I have been worried about this trend: if they don't open it up, things like this introduce a disparity between startups that only have access to GPUs and big companies that make their own proprietary ASICs for their proprietary software, such that startups cannot compete.

It's weird to me that you somehow assume no chip makers will move into the market for mobile-ready AI processors, if this really becomes a thing. Apple certainly won't open up its designs, assuming they exist and ship. There's strongly negative incentive and cultural inclination for them to become a chip vendor.

This disparity has always existed, though. Big companies can throw money at things that start-ups can't.

So you rent or borrow from a bigger company while you can (Cloud TPUs), or your specialize in doing things that big companies with inflexible purpose-built hardware can't.

The flip side is that there is pretty obviously a market for such a product. If it isn't released by google or apple, it will be released by someone else. If it isn't, then that is a pretty good idea for a startup.

Only well-funded startups will make ASICs. And most of them will fail. This is very different from many small startups programming general-purpose computers.

So then maybe the key is a startup that's in the business of raising the chance of success that other players have in this endeavor?

The big companies can make their own. The next smaller companies will gather together and create a company (this is essentially how ARM started) to create one chip that works for all.

Won't it likely be in their interest to make those capabilities open to 3rd parties? My guess: the platforms that don't will suffer in experience.

What about the server side? Google executing Tensorflow on ASICs versus a startup restricted to GPUs or Google Cloud/TPUs?

> The chip, known internally as the Apple Neural Engine,

Is this a real IC/processor for arbitrary software or an abstraction of an underlying GPU/DSP?

Most likely some kind of dedicated deep learning accelerator. This is coming with or without Apple:

> Exynos 8895 features VPU (Vision Processing Unit) which is designed for machine vision technology. This technology improves the recognition of an item or its movements by analyzing the visual information coming through the camera. Furthermore, it enables advanced features such as corner detection that is frequently used in motion detection, image registration, video tracking and object recognition.


> New Vision Processing Unit (VPU) paired to the Image Signal Processors (ISP) that provides a dedicated processing platform for numerous camera features, freeing up the CPU and GPU and saving power.


I was also hoping that, given Google's high efficiency with the TPU, they would make a version for mobile as well, at least for their Pixel phones. I guess they still might, but the fact that the TPU2 does both training and inference makes me think they won't do that anytime soon.

The biggest reasons why I like this "mobile AI chips" trend is that they can give you back some privacy, if the data can be analyzed locally without going to the vendors' servers, and I think they will also boost the capabilities of computational photography. No more spying toys for kids, etc.

What does the instruction set look like on that? Is there another example out of this type of chip?

No idea what instruction set the Apple device uses, but Google just announced alpha access to their Tensor Processing Unit on Google Cloud: https://cloud.google.com/tpu/

Google has their TensorFlow chips (TPUs); I would imagine it is similar.

Is there released documentation on the instruction set of the Google Cloud TPU?

They list the main ones on pages 3-4 of this paper:


It's mainly moving memory around, matrix multiplication, convolution, and applying activation functions (sigmoid, tanh, relu, etc.). Very simple, high-level stuff. This has the handy side-effect of making timing very predictable, which makes the latency a lot more deterministic.
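In toy Python terms, that instruction mix amounts to something like the following (illustrative only; the names and shapes are made up, not the actual TPU ISA):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def forward(x, w, b):
    # Matrix multiply into the accumulators, add bias, then push the
    # result through the activation unit (sigmoid/tanh/relu/...).
    return relu(x @ w + b)

x = np.array([[1.0, -2.0]])
w = np.array([[0.5, -1.0], [1.0, 0.25]])
b = np.array([2.0, 0.5])
print(forward(x, w, b))  # relu([[0.5, -1.0]]) -> [[0.5, 0.0]]
```

Because the operations are this coarse-grained and data-independent, the execution time of a layer is known in advance, which is where the latency predictability comes from.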

> I was also hoping that with Google's high-efficiency for the TPU, they would make a version for mobile as well

At Google IO there was a slide during the keynote that they're working with Mediatek to produce a mobile TPU

I wish they would have called it Apple Neural Technology, so we could start referring to the devices as ANTs and Hives and Colonies as we build out Richard Hendricks's new decentralized internet.

I'm more excited for the dedicated chip that performs real time hot dog detection!

Mine seems to only detect Little Smokies :-(

Neural Engine = General Matrix Multiplier?

It's nice that the article is trying to deliver an intro that explains that Apple clearly has some catching up to do.

Except that now I'm pretty baffled, since I saw an article a few months earlier that said Apple is massively investing in AI and is already using it in several places in its products.

So what am I supposed to believe now? :/

Believe them both.

Apple is investing massively into AI and is using it in their products. However, Google has been working with AI longer and has much more experience. (The article praises Amazon's AI chops as well, I dunno about that one.)

As I understand it, Apple simply hasn't been able to provide the same quantity of training data that other companies use.

I think the general availability of TPUs is an important inflection point on the path to AI popularization and, who knows, some type of singularity event. Definitely a milestone in 21st-century history.

But I can't resist making a parallel between evolving TPUs and how the CPU found in the arm of the T800 changed history (negatively) forever in the Terminator universe.

That's what we need! A bunch of GPUs and computers carrying a car battery, haha. Man, that would be crazy. Pre-trained before it leaves the factory. (Don't know what I'm saying.) But I do imagine a man-sized humanoid robot with a bunch of GPUs and hard drives.

I'm not too familiar with the concept of ML specific chip designs, but isn't most AI done on servers and the results returned to the device? What kind of applications involve local ML code execution?

If you want to protect user privacy it's a very relevant selling point to be able to say that the data never leaves the device.

Apple partially offloading Siri from the cloud to client devices in order to reduce datacenter costs?

Why not use a GPU? A lot of AI stuff is linear algebra: multiply-accumulate, etc.

Just as going from scalar to vector instructions provides a speedup, so does going from vector to matrix instructions. Once your vectors are big, the additional parallelism exposed per extra hardware execution resource isn't huge, but the reduction in register file read port usage is pretty significant.

Also, inference is usually happy with int8s whereas graphics workloads are mostly float32s. So you can save a lot of hardware that way too.
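For a concrete feel for the int8 point, here's a minimal per-tensor weight-quantization sketch (illustrative; real schemes also use zero-points and per-channel scales):

```python
import numpy as np

w = np.array([0.12, -0.5, 0.33, 0.9], dtype=np.float32)

# Map the float range onto [-127, 127] with a single scale factor.
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).astype(np.int8)

# Dequantize to see the error: at most scale/2 per weight.
w_hat = w_q.astype(np.float32) * scale
print(np.max(np.abs(w - w_hat)))
```

Storing and multiplying int8s instead of float32s cuts memory bandwidth by 4x and lets the multiplier arrays be much smaller, which is exactly the hardware saving described above.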

Why are graphics workloads float32? 32-bit "true color" (about 16.7 million colors plus alpha), which is higher color resolution than most eyes can distinguish, is three 8-bit ints plus an 8-bit alpha channel (sometimes).

GPUs are not (only) about representing pixels, they are (mostly) about geometric computation.

Because before you can see a color, you have to compute it. For example, you need to calculate what color would result from the interaction of a light source of a given color / intensity and a surface of a given color / reflectivity / glossiness etc. There's no way to reasonably compute that using just 8-bit ints without getting terrible banding / quantization artifacts.
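A toy illustration of that banding/quantization problem (numbers made up): accumulating many dim light contributions in 8-bit steps rounds each one to zero, while float32 keeps them.

```python
# 100 faint light samples, each worth 0.4/255 of full brightness.
contribs = [0.4 / 255] * 100

as_float = sum(contribs) * 255                         # accumulates to ~40
as_8bit = sum(int(round(c * 255)) for c in contribs)   # each rounds to 0

print(as_float, as_8bit)  # ~40.0 vs 0
```

That's why rendering pipelines compute lighting in float (or at least higher-precision fixed point) and only quantize to 8 bits per channel at the very end.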

I am 100% positive that they considered that before starting to work on this ASIC. Some possible reasons: GPUs are insufficiently specialized, or use too much energy on the subset of work that Apple wants to enable with this chip. GPUs are too large. GPUs are busy doing other things, etc.

They considered it sure - but why did they conclude they should go with an ASIC? That's what grandparent asked and it was a reasonable question. "They considered that" isn't a suitable answer.

I think when you add "why try to reinvent the wheel?" to the end, it is less of a question and more of a statement. Similar to saying, "Why would you do that?" after someone does something silly. You aren't actually asking them why they'd do the thing. You're saying they ought not have.

But they didn't say that?

They did, but they've since edited the comment.

That is exactly what they said.

The rest of the reply seems to be an answer (as far as you can get with Apple)

I don't think it's wild speculation to say Apple is looking for efficiencies they may not have been able to get with GPUs, especially performance per watt, since so many of their devices are mobile-focused.

To be fair to nsxwolf, I did not originally explain. I tend to gradually expand my comments. The first iteration was just lashing out at this trend I see on this site which I highly disapprove of: facile reactions to any work that the commenter does not understand. I really detest this reaction that boils down to, "I once heard about a tensor, so clearly I have a better idea of whether this chip should be invented than the experts working at Apple."

Basically just look up Google's TPUs and the reasoning behind them.

Despite GPUs being fairly "general purpose" these days, there is still a lot of circuitry built for graphics-pipeline-like workloads. If you just want to do linear algebra, you just need a high-bandwidth interface to memory and lots of math units.

Are these kinds of chips used to accelerate the NN training process only?

Good. But Google beat them to it already.

How so? Google developed a server-class TPU for datacenters. Apple is trying to build an on-device, low-powered custom chip.

Google announced it at I/O. They're using DSPs, specialized for neural network processing, on the SoC with TensorFlow Lite on the device.


Apple added a neural networking API that can leverage the GPU or CPU of a given device in the last version of iOS.


There was an article last year discussing customizations Apple had made to Imagination's GPU that made it more suitable for this purpose.


Oh wow, thanks. I feel like Apple is a step behind in pretty much all things AI.

Yeah, Siri has been lame compared to Google's "Hey Google" for a long time.

Very interesting...

