
Google Edge TPU Devices - walterbell
https://aiyprojects.withgoogle.com/edge-tpu
======
walterbell
In Nov 2018, Google engineers who designed the TPU gave a presentation about
the use of open-source Chisel for designing the ASIC,
[https://youtube.com/watch?v=x85342Cny8c](https://youtube.com/watch?v=x85342Cny8c)

From [https://techcrunch.com/2018/07/25/google-is-making-a-fast-
sp...](https://techcrunch.com/2018/07/25/google-is-making-a-fast-specialized-
tpu-chip-for-edge-devices-and-a-suite-of-services-to-support-it/)

 _> Google will have the cloud TPU ... to handle training models for various
machine learning-driven tasks, and then run the inference from that model on a
specialized chip that runs a lighter version of TensorFlow that doesn’t
consume as much power ... dramatically reduce the footprint required in a
device that’s actually capturing the data ... Google will be releasing the
chip on a kind of modular board not so dissimilar to the Raspberry Pi ...
it’ll help entice developers who are already working with TensorFlow as their
primary machine learning framework with the idea of a chip that’ll run those
models even faster and more efficiently._

~~~
fizixer
So my guess is that Chisel is one of the many responses to the two horrors
that are VHDL and Verilog.

Unfortunately, Chisel is built on Scala, and I have no interest in learning
Scala. Though I'm intrigued by the claim of using generators and not
instances, and would be interested in a white paper that explains it in PL-
agnostic terms (PL: programming language).

I also have MyHDL [1], a Python solution to the same problem, on my to-do
list. (Has anyone tried it and found it to be better than VHDL/Verilog?)

[1] [http://www.myhdl.org/](http://www.myhdl.org/)
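
For a taste of what MyHDL looks like, here's a minimal sketch (an 8-bit
free-running counter; the names and widths are just illustrative) that MyHDL
can convert to Verilog or VHDL:

    from myhdl import block, always_seq, Signal, ResetSignal, modbv

    @block
    def counter(clk, reset, count):
        # The hardware is described by a decorated Python function:
        # one clocked process that increments the count register.
        @always_seq(clk.posedge, reset=reset)
        def logic():
            count.next = count + 1  # modbv wraps around automatically
        return logic

    # Elaborate with concrete signals, then emit Verilog (VHDL also works).
    clk = Signal(bool(0))
    reset = ResetSignal(0, active=1, isasync=False)
    count = Signal(modbv(0)[8:])
    counter(clk, reset, count).convert(hdl='Verilog')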

~~~
pedroaraujo
I get the impression that people who talk about the "horrors" that are VHDL
and Verilog for hardware design are software developers who have little to no
knowledge about hardware design processes.

There are reasons why VHDL/Verilog are still in use in the industry and why
high-level synthesis hasn't taken off.

VHDL/Verilog for hardware design is not broken. I won't claim that there isn't
room for improvement (because there is), but there isn't anything
fundamentally broken in them. They are fit for purpose and they fulfill all of
the needs we have.

What could be massively improved is actually the functional verification
languages we use, SystemVerilog for verification is in serious need of an
overhaul.

~~~
dnautics
OK. I'll bite. I only have experience with Verilog, but it's basically
uncomfortable to work with, in the sense that there are absolutely no
developer ergonomics. We're well into the 21st century and you'd think that
our HDLs would have learned from everything the software world has learned.

1) The syntax is very finicky (slightly more so than C, I'd say). Most
software languages (thanks to more experience with parsers and compilers) have
moved on from things like requiring semicolons; Verilog has not.

2) Writing tests is awful. Testbenches are crazy confusing. Much better would
be some sort of unit testing system that does a better job of segregating what
constitutes "testing code" from the "language of the gates". You would have a
hard time doing something like, say, property testing in Verilog (see the
sketch after this list).

3) There isn't a consistent build/import story with Verilog. I once worked
with an engineer who literally used Perl as a Verilog metaprogramming
language. His codebase had a hard-to-find Perl frankenbug which sometimes
inserted about 10k lines of nonsense (which somehow still assembled a correct
netlist!) but caused gate timings to miss severely and bloated the footprint.
It took the other hardware developers a week to track down the error.
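
On the testing point, Python-based testbench frameworks such as cocotb are one
modern answer; a minimal sketch, assuming a hypothetical Verilog `adder`
module with ports `a`, `b` and `sum`:

    import cocotb
    from cocotb.triggers import Timer

    @cocotb.test()
    async def adder_basic_test(dut):
        """Drive the DUT's inputs and check the output, plain unit-test style."""
        dut.a.value = 3
        dut.b.value = 4
        await Timer(2, units="ns")  # let the combinational logic settle
        assert dut.sum.value == 7, f"sum was {dut.sum.value}, expected 7"

Since the test is ordinary Python, property-style testing is just a loop over
randomly generated inputs around the same assertion.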

None of these things have anything to do with the fundamental difference
between software and hardware development.

For Chisel: at least to some degree, you can get developer ergonomics from the
Scala ecosystem, do most of your unit, functional, and integration testing
outside of Verilog, in Chisel, then autogenerate Verilog at the last minute
and do a second round of testing to make sure Chisel did everything right.
It's the same reason people do things like "use Elm to develop the frontend,
compiling down to JavaScript", and it's a perfectly valid strategy.

~~~
pjmlp
I really don't get why some people get so hyped up about typing semicolons.

Maybe we should write in our native tongues without any kind of punctuation.

~~~
dnautics
"I really don't get why some people would want sub-10-second completion of
their unit tests. Why not just wait a 30s to a minute to test everything?"

~~~
yjftsjthsd-h
And yet, you typed the punctuation marks in your comment even though all of us
would understand you without them.

~~~
dnautics
I didn't type out the word "second"

------
lambda
Am I missing something, or are they not only not documenting the chip, they
are also not even releasing the compiler, but requiring you to use a
cloud-based compiler:

> You need to create a quantized TensorFlow Lite model and then compile the
> model for compatibility with the Edge TPU. We will provide a cloud-based
> compiler tool that accepts your .tflite file and returns a version that's
> compatible with the Edge TPU.

This seems like a new low in software freedom, and pretty risky to depend on,
as Google is known to shutter services pretty often and could decide to turn
off their cloud-based compiler whenever they feel like it.
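
For what it's worth, the quantization half of that pipeline runs locally with
the TensorFlow Lite converter; it was only the final Edge-TPU-specific
compilation step that was cloud-only. A minimal sketch using the tf.lite API,
with a toy stand-in model (full-integer quantization, which the Edge TPU
requires):

    import numpy as np
    import tensorflow as tf

    # Toy stand-in network; in practice you'd load your trained model.
    inputs = tf.keras.Input(shape=(8,))
    outputs = tf.keras.layers.Dense(4, activation="relu")(inputs)
    model = tf.keras.Model(inputs, outputs)

    def representative_data_gen():
        # Calibration samples let the converter pick int8 ranges.
        for _ in range(100):
            yield [np.random.rand(1, 8).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    with open("model_quant.tflite", "wb") as f:
        f.write(converter.convert())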

~~~
jacquesm
Chips without a public toolchain are not worth investing your time in. It is
bad enough if your work is tied to specific hardware for which there may at
some point not be a replacement, but to not even have the toolchain under your
control makes it a negative.

~~~
narrator
AI is too powerful a technology to let it out there to the masses. People
might use it for killer drones after all. All users of AI must be tightly
controlled and registered with the authorities!

This is the problem with certain kinds of technology that are bumping up
against the edge of innovation. They're too powerful and if these technologies
get in the hands of the DIY set, governments will lose control so they have to
DRM and regulate everything. Heck, it's a problem with old technology. Many
weapons aren't that complicated technologically, but their production and use
are tightly regulated.

Edit: I'm not saying this is a good thing; I'm just deconstructing their
thought process for tight control over AI tech going forward.

~~~
imtringued
>People might use it for killer drones after all.

For some reason drones are perceived to be completely different from all
weapons that have existed before them. Those killer drones have existed for
half a century; they are called missiles. Also, the reason UAV-based fighter
jets are not viable is that a cruise missile can be launched from 1000 miles
away, and for the cost of a Global Hawk you can send out more than a hundred
of them.

If terrorists have access to explosives, then it doesn't matter how they
deliver them, because most lucrative targets (= lots of people in a small
area) are stationary or predictable. A simple backpack filled with explosives
was more than enough to injure hundreds of people during the Boston Marathon.

~~~
adrianN
I can buy a drone on Amazon for relatively little money. I can't do the same
with a rocket.

~~~
heyjudy
You can cheaply make an unguided, explosive-filled rocket from scrap that can
harm people. Insurgents throughout the world have done so for the past 40
years. That may not be as simple as Add To Cart, but it is well within the
economic means of almost everyone.

------
potatofarmer45
This is a long time coming. I'm normally not a big fan of large companies
building products in the embedded space that could potentially destroy
competition and future innovation, but this is needed.

Nvidia's embedded boards are EXPENSIVE. So expensive it limits the
applications dramatically. They also require people with a different skillset
to set up, which drives up the cost.

We did an analysis for a security project that required visual inference. It
turned out all the extra setup costs with TX boards meant it actually made
more sense to have mini desktops with consumer GTX cards.

I am excited to see the performance of the inference module. If it's decent at
a good price, that opens up so many pi/beagle/arduino applications that were
limited by both cost and form factor of existing options.

~~~
Eridrus
Note that these chips only support TFLite, which is still pretty spartan atm.

~~~
tsbinz
What are you missing? Not involved with the project, just using it, and so far
we've been able to work around the limitations (for computer vision).

------
joshvm
This looks cool.

Currently the only real options for amateur off-the-shelf (accelerated) edge
ML are the Nvidia boards (but small carrier boards for the TX2 cost more than
the module itself) or the Intel NCS, which inexplicably blocks every other USB
port on the host device due to its poorly designed case. There is the Movidius
chip itself, but Intel won't sell you one unless you're a volume customer. The
NCS also does bizarre things: the setup script will clobber an existing
installation of OpenCV with no warning, for example.

There are various optimised machine learning frameworks for ARM, but I'm only
counting hardware-accelerated boards here. I'm also not including the various
Kickstarter or Indiegogo boards, which might as well be vapourware.

There are no good, cheap, embedded boards with USB3 that I can find. There are
a few Chinese boards with USB3, but none of them have anywhere near the
quality of support that the Pi has.

Then there's camera support. The Pi has a CSI port, but it's undocumented and
only works with the Pi camera. The TX2 is pretty good, but you need to dig
through the documentation to figure things out. USB is fine, but CSI is
typically faster and frees up a valuable port.

Finally, another issue is fast storage. It's difficult to capture raw video on
the Pi because you can't store anything faster than about 20MB/s. Almost no
boards support SATA or similar (the TX2 does), so the ability to use USB3
storage would be welcome too.
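
To put that 20MB/s ceiling in perspective, uncompressed video overshoots it by
an order of magnitude; a quick back-of-envelope calculation (assuming 8-bit
RGB frames):

    # Raw video bandwidth vs. the Pi's ~20 MB/s storage ceiling.
    width, height = 1920, 1080
    bytes_per_pixel = 3   # 8-bit RGB
    fps = 30

    mb_per_s = width * height * bytes_per_pixel * fps / 1e6
    print(f"raw 1080p30 needs ~{mb_per_s:.0f} MB/s")  # ~187 MB/s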

If this is offered at a reasonable price point, it could be a really nice tool
for hobbyists. It looks like they're trying to keep GPIO pin compatibility
with the Pi too.

~~~
zozbot123
The Raspberry Pi provides documentation for their GPU architecture, so it
_would_ be possible to provide support for that within open source machine
learning frameworks. It would involve quite a bit of work, though, and the RPi
is not really competitive with modern hardware in performance-per-watt terms,
even when using GPU compute.

~~~
joshvm
There are some well-optimised libraries, for example a port of darknet that
uses NNPACK and some other NEON goodies. You can do about 1fps with Tiny YOLO.
Not sure if it used anything on the GPU though.

~~~
zozbot123
NEON is the CPU SIMD feature; it has nothing to do with the vc4 GPU.

~~~
joshvm
Yes, I know. My point was that CPU-only deep learning is possible on the Pi if
you don't need real-time inference. What I wasn't sure of is whether that
specific port does anything on the GPU at all, or if it's only using NEON
intrinsics.

------
justinjlynn
> You need to create a quantized TensorFlow Lite model and then compile the
> model for compatibility with the Edge TPU. We will provide a cloud-based
> compiler tool that accepts your .tflite file and returns a version that's
> compatible with the Edge TPU.

I seriously hope that's not the _only_ way they're expecting people to compile
models for this particular TPU.

~~~
dejv
As a person with access to their documentation, I can confirm that this is
currently the only way to compile a model for this TPU. Also, the list of
supported network architectures is very, very short.

Well, this is just the beginning; I am sure they are going to expand its
capabilities.

~~~
justinjlynn
I'd rather they enable independent development of models on the hardware
they're selling. This is about as useful as a high-performance electric car
you can only charge at authorised dealerships. Dealerships which have the
unfortunate habit of closing down after a few years or so.

~~~
dejv
I bet there is some technical reason behind allowing just a few architectures,
and I hope they will fix it in future releases. Currently you can't even run
ResNet-type networks on it.

~~~
justinjlynn
Well, one can hope.

------
gtm1260
This has been around for a while but has been stuck at 'Coming Soon' forever.
Does anyone know what the status of this project actually is? I suspect it has
been stalled for some reason or other.

~~~
mondoshawan
One of the Google engineers on the project here. No, we haven't stalled. Keep
an eye out. :-)

~~~
Aduket
Do you have any price estimate? And performance-wise, what is its equivalent
in the market?

~~~
alvar0
$75 for the USB module / $150 for the full SoC board.

------
dwrodri
A related edge computing AI accelerator: [https://www.crowdsupply.com/up/ai-
core-x](https://www.crowdsupply.com/up/ai-core-x)

------
pilooch
What is the use of 100fps vision models other than being the input to a
controller (e.g. driving, flying, etc.)? A Raspberry Pi can manage about 3fps
with standard open-source frameworks, and this is enough for many
applications, e.g. construction-site surveillance. Not criticizing, rather a
genuine interest in understanding the edge ML vision market.

~~~
dejv
I worked on an optical sorting machine: you have a stream of fruit on a very
fast conveyor belt, and a machine vision system scans the passing fruit,
detects each object, and rejects (by firing a stream of air) those that don't
pass: those can be moldy fruit, weird colors, or foreign material like rocks
or leaves. 100 fps might be enough, but the faster you go, the faster your
conveyor belt can be.
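
The frame rate vs. belt speed trade-off is easy to put rough numbers on. A
back-of-envelope sketch with made-up figures: for every object to be fully in
view for at least one frame, the belt can advance at most (field of view minus
object length) per frame:

    # Hypothetical numbers: camera sees 20 cm of belt, fruit is ~5 cm long.
    fov_m = 0.20   # field of view along the belt, meters
    obj_m = 0.05   # object length, meters
    fps = 100      # inference frame rate

    # Between frames the belt advances v / fps; each object is fully
    # inside the FOV for at least one frame as long as v / fps <= fov - obj.
    max_belt_speed = (fov_m - obj_m) * fps
    print(f"max belt speed ~ {max_belt_speed:.1f} m/s")  # 15.0 m/s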

------
Rafuino
How would someone compare the Edge TPU Accelerator with a Movidius Neural
Compute Stick? [https://software.intel.com/en-us/movidius-
ncs](https://software.intel.com/en-us/movidius-ncs)

~~~
m0zg
Movidius is not a TPU. It's more like a GPU, but with SIMD, DSP and even VLIW
capabilities and with a _very_ wide memory bus (and massive throughput). It's
rather impressive actually, but probably serious overkill for what really
needs to be done during inference:
[https://en.wikichip.org/wiki/movidius/microarchitectures/sha...](https://en.wikichip.org/wiki/movidius/microarchitectures/shave_v2.0).
Whereas a TPU is highly specialized for just, you guessed it, processing
tensors, which basically means matrix and vector multiplies. It's a systolic
architecture, so it also (purportedly, since I don't have insider knowledge)
keeps the weights resident for the duration of the computation.
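
For intuition, the weight-stationary dataflow can be sketched in a few lines
of NumPy. This mimics the data movement (weights parked in the array while
activations stream through and partial sums accumulate), not the cycle-level
behaviour:

    import numpy as np

    def systolic_matmul(A, W):
        """A @ W with a weight-stationary dataflow: each (k, n) cell of the
        array holds W[k, n] for the whole computation; rows of A stream
        through, and partial sums accumulate along the array columns."""
        M, K = A.shape
        K2, N = W.shape
        assert K == K2
        out = np.zeros((M, N))
        for m in range(M):              # stream one activation row at a time
            acc = np.zeros(N)           # partial sums flowing through the array
            for k in range(K):          # visit each row of resident weights
                acc += A[m, k] * W[k, :]    # multiply-accumulate in the PEs
            out[m] = acc
        return out

    A = np.random.rand(3, 4)
    W = np.random.rand(4, 5)
    assert np.allclose(systolic_matmul(A, W), A @ W)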

~~~
sanxiyn
We hardly know anything about the Edge TPU. Yes, the TPU uses a systolic
array, but there is really no reason to believe the Edge TPU will use one. The
Edge TPU is not the TPU.

~~~
m0zg
It has "TPU" right its name. If that's not a dead giveaway, I don't know what
is. :-)

------
bpye
I don't know enough about this, but how do these devices compare to the Sipeed
MAIX devices [1] I saw mentioned on HN the other day? They seem to both
support TensorFlow Lite, but that's where my ability to understand their
capabilities ends.

[1] - [https://www.indiegogo.com/projects/sipeed-maix-the-world-
fir...](https://www.indiegogo.com/projects/sipeed-maix-the-world-first-
risc-v-64-ai-module/x/20407509#/)

~~~
joshvm
The risk with crowdfunded chips is support and longevity. The reason the
Raspberry Pi wins every time against technically superior competition is that
it's well supported and there are reasonable supply guarantees.

The same goes for big companies, of course. Intel has a habit of releasing IoT
platforms and then killing them. Let's hope the TPU lasts a bit longer.

As for a comparison, it's impossible to say until Google releases benchmark
information on the Edge TPU, or some kind of datasheet for the SOM.

~~~
ocdtrekkie
Given Google's tendency to kill products and shift priorities rapidly, I think
building a product or service dependent on a supply of their hardware is
probably a pretty risky choice.

I definitely have been shocked how fast Intel maker boards have come and gone
though. It feels like Intel has written them off before anyone's tried to
build a project using one. I have one sitting around here somewhere that's
never so much as been powered on.

~~~
joshvm
It's very hard to beat the traction that the Pi has. I think because it's
explicitly targeted towards people without any embedded experience, there's
been a lot of pressure to make things work and to make the documentation
somewhat organised.

Intel made some nice little boards, but there wasn't much publicity and
actually getting started with them wasn't easy at all because the docs were
buried. They were usually modules designed for integration, not standalone
devices.

With the Pi you can buy a kit, plug in the SD card and boot to desktop in
minutes.

------
AndrewKemendo
Are they going after the Movidius [1] with this?

[1][https://software.intel.com/en-us/movidius-
ncs](https://software.intel.com/en-us/movidius-ncs)

------
petra
So they will sell development boards, without selling ICs.

Seems like a nice way to gather ideas and data about new products.

------
xrisk
How expensive is this likely to be? Feasible for a hobbyist to purchase?

~~~
neuromancer2701
The NXP® i.MX 8MQuad board is available for $150 and has USB3 and PCIe. The
TPU would probably be attached through one of those buses. I would bet around
$250 with the TPU, which is pretty good and puts it at around half the price
of the Jetson TX2 and 1/5 of the Xavier. I wonder if the TPU could be used for
SLAM, not just object identification; now that would be useful.

~~~
voltagex_
Do you know where I can buy the board itself (without the TPU)? I only saw
"SOM" versions, which I'm not equipped to use.

------
polskibus
Is this anything worth considering, if it cannot be used for training?

------
peterwwillis
When I see products developed by Google, I imagine them as Replicants.
Developed by an advanced tech company, full of futuristic technology and
amazing potential, and destined to die in ~4 years.

~~~
jrockway
The competitors don't really keep chips around for longer. Intel isn't
manufacturing Skylake anymore. Nvidia isn't manufacturing Maxwell GPUs
anymore. (Incidentally, Apple did appear to be using their 4-year old A8 SoC
in the first HomePods, released in 2018, though.)

Hardware and software are different things. We are all sad that Google Reader
doesn't exist anymore, but every silicon product has basically been a flash in
the pan. They make it, you buy it, and by the time it's shipped to you, it's
announced as obsolete. That's the pace of that industry. Maybe with Google's
attention span, they should have been a hardware company all along. They will
fit right in.

~~~
peterwwillis
My example was inaccurate, because at least dead Replicants can be replaced
with newer models, whereas dead Google products have no follow-up model and
require completely replacing whatever you built around the product. That's
something seemingly unique to either vaporware start-ups or Google.

