
How to Easily Detect Objects with Deep Learning on Raspberry Pi - sarthakjain
https://medium.com/nanonets/how-to-easily-detect-objects-with-deep-learning-on-raspberrypi-225f29635c74
======
prats226
I am Prathamesh, co-founder of [https://nanonets.com](https://nanonets.com)

We were working on a project to detect objects using deep learning on the
Raspberry Pi, and we have benchmarked various deep learning architectures on
the Pi. With ~100-200 images, you can create a detector of your own with this
method.

In this post, we detect vehicles in Indian traffic using the Pi. We have also
added GitHub links to the code for training the model on your own dataset and
to the script for running inference on the Pi. Hope this helps!
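
For reference, the on-device part boils down to something like the sketch
below. It is TF 1.x-style code (matching the era of the post) and assumes a
model exported with the TensorFlow Object Detection API, so the file name
`frozen_inference_graph.pb` and the tensor names follow that API's
conventions; `traffic.jpg` is a placeholder input.

```python
# Minimal TF 1.x sketch: run a frozen SSD-MobileNet graph on one image.
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the frozen graph exported by the Object Detection API.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    # uint8 tensor of shape [1, height, width, 3]
    image = np.expand_dims(np.array(Image.open('traffic.jpg')), axis=0)
    boxes, scores, classes = sess.run(
        ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
        feed_dict={'image_tensor:0': image})
    for box, score, cls in zip(boxes[0], scores[0], classes[0]):
        if score > 0.5:  # confidence threshold
            print(cls, score, box)  # box is normalized [ymin, xmin, ymax, xmax]
```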

~~~
thedirt0115
Hi, I have a question about the performance benchmarks section. The best
performing model from your benchmarks, ssd_mobilenet_v1, has a prediction
time of 0.72 seconds -- is that the total runtime of the script? I'm
wondering if I could achieve ~1 FPS running in realtime (basically looping as
fast as possible against the camera input), or is there more overhead? (edit
-- made question more specific)

~~~
prats226
Yeah, it's the total runtime of the script. However, you can get up to 3-4
FPS with more optimizations. We are going to try more quantization options
soon with the release of TensorFlow 1.7 and will report our findings (will
post updates on the blog). The Pi camera code also needs to be optimized,
which will further increase FPS. One big advantage of this method is that
just by collecting ~200 images for any use case, you can have a detector
ready in a couple of hours.

One advantage of using an API-based approach is that you can get much higher
FPS without compromising accuracy, and it is also independent of the Pi's CPU
power, heating, etc.
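
To give an idea of the camera-side optimization, here is a rough sketch of a
capture loop using picamera's video port, which is much faster than
still-image capture; `run_inference` is a placeholder for the detector call:

```python
# Rough FPS measurement for a capture + inference loop on the Pi.
import io
import time
import picamera

with picamera.PiCamera(resolution=(300, 300), framerate=30) as camera:
    stream = io.BytesIO()
    start, frames = time.time(), 0
    # use_video_port=True trades image quality for capture speed
    for _ in camera.capture_continuous(stream, format='jpeg',
                                       use_video_port=True):
        stream.seek(0)
        # run_inference(stream.read())  # plug the detector in here
        stream.seek(0)
        stream.truncate()
        frames += 1
        if frames == 100:
            break
    print('FPS: %.2f' % (frames / (time.time() - start)))
```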

~~~
jd20
Curious, have you looked at any other embedded platforms besides the Pi? The
Tegra might be an interesting comparison point; I wonder what kind of FPS the
onboard GPU would buy you.

~~~
prats226
Hey, good suggestion to benchmark against other SoCs as well. I heard the
Raspberry Pi recently added support for an external graphics card. Haven't
tried it yet.

------
Aspos
How to draw an owl, yeah [http://i0.kym-cdn.com/photos/images/original/000/572/078/d6d...](http://i0.kym-cdn.com/photos/images/original/000/572/078/d6d.jpg)

~~~
sarthakjain
Thanks for the feedback. What could we possibly do to make the intermediate
steps easier to follow?

~~~
Aspos
Say I have a bunch of photos of my cat. I want to be able to use a Raspberry
Pi to recognize my cat from the others. How can I create the dataset to feed
to your ML engine? Would love to see an end-to-end how-to, seriously.

~~~
c54
actually, you might try this: [https://azure.microsoft.com/en-us/services/cognitive-service...](https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/)

lets you customize a neural net with a small number of specific images (using
a technique called 'transfer learning')

~~~
prats226
We also use transfer learning a lot. Using transfer learning is but tricky
because sometimes you might upset generalised weights of pretrained network
with bad hyperparameters. We had written a blog before for using transfer
learning as well

[https://medium.com/nanonets/nanonets-how-to-use-deep-
learnin...](https://medium.com/nanonets/nanonets-how-to-use-deep-learning-
when-you-have-limited-data-f68c0b512cab)
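
The usual way to avoid upsetting the pretrained weights is to freeze the base
network and fine-tune only a small head with a low learning rate. A minimal
tf.keras sketch (MobileNet and the two-class head here are just an example,
not our exact setup):

```python
# Transfer-learning sketch: freeze the pretrained base so bad
# hyperparameters can't wreck its generalized weights.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import MobileNet

base = MobileNet(weights='imagenet', include_top=False, pooling='avg')
base.trainable = False  # protect the pretrained weights

model = models.Sequential([
    base,
    layers.Dense(2, activation='softmax'),  # e.g. my-cat vs. other-cats
])
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),  # small LR
              loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=5)
```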

------
bronco21016
Maybe I’m missing something, but does this blog post conclude with a service
to do inference off device? Why explain all of the steps for on-device
inference if you’re offering an API to do cloud inference?

~~~
prats226
Off-device and on-device are alternative ways to do deep learning inference
with the Pi, both with pros and cons. For example, with on-device inference
you will need to run a smaller architecture to get decent FPS, and you will
also be dependent on the hardware. Using a cloud API removes those
restrictions: there will be some latency in the web request, but you can use
a much more accurate model and be independent of the Pi hardware. Just trying
to paint a complete picture. Any suggestions for the blog post are welcome.
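
For comparison, the off-device path is just an HTTP round trip per frame; the
endpoint and credentials below are placeholders, not our exact API:

```python
# Off-device inference sketch: ship a frame to a cloud model over HTTP.
import requests

with open('frame.jpg', 'rb') as f:
    response = requests.post(
        'https://example.com/v1/models/my-detector/predict',  # hypothetical
        files={'file': f},
        auth=('API_KEY', ''))  # placeholder credentials
print(response.json())  # detections are computed server-side
```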

~~~
SahAssar
> Maybe I’m missing something but does this blog post conclude with a service
> to do inference off device?

You didn't answer this question. Your "sorta-answer" suggests "yes", but the
title "How to easily Detect Objects with Deep Learning on Raspberry Pi"
suggests that your answer should be "no".

The title wasn't "How to easily Detect Objects with Deep Learning on Raspberry
Pi with cloud services".

~~~
bronco21016
Exactly. I’m certainly interested in “How to easily Detect Objects with Deep
Learning on Raspberry Pi”. Because I’m interested in that, I am most
definitely not interested in “How to easily Detect Objects with Deep Learning
on Raspberry Pi with cloud services”.

~~~
sarthakjain
Then I hope you found the first 90% of the blog interesting. If there is
something we can improve, I'm happy to hear about it.

------
anonfunction
The two pricing tiers for the hosted API don't seem practical for real usage.

$0 for 1,000 slow API calls

$79 for 10,000 fast API calls

To put that into perspective, 10k API calls is less than 10 minutes of 24
fps video (10,000 frames / 24 fps ≈ 417 seconds, or about 7 minutes). You
should offer a much higher tier or a pay-per-request overage price.

~~~
bigiain
To be fair, there probably aren't a whole lot of real-world use cases that
aren't highly specialised where there's a requirement to run object
detection/identification on every single frame of a 24fps video.

If you want to run hours or days worth of video through an object detector,
you probably want to go out and buy a GPU and a machine of your own to stick
it in...

I'm curious what application you're thinking of where this seems like "real
world usage". (I can imagine applications like vision-controlled drones, but
I'm pretty sure places like ETH Zurich have better solutions (as in "less
generalised and more applicable to drone control") and in-house hardware to
train and run them on.)

~~~
anonfunction
There are plenty of real applications for inferring every frame of video.
Any real-time monitoring application would run on all the frames; even if you
cut it to 1 FPS, with multiple video sources the monthly price doesn't make
sense (at 1 FPS, a single source is already ~2.6 million calls per month).

One application would be nudity detection for a family-friendly site; lots
of video would need to be checked.

The argument that you would want to run your own machine validates my point.
However, the same could have been said for video encoding or any other form
of intense processing, which all now have cloud alternatives.

~~~
bigiain
OK. I don't think this is the solution you're after if your problem includes
"crowd sourced video".

For nudity detection, though, I'd probably at least try something like
"check every 50+rand(100) frames, and only examine more carefully if you get
hits on that sampling". Sure, that's game-able, but subliminal nudity isn't
something I'd expect trolls or griefers to expend too much effort on to
slide past your filters...
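
Something like this sketch, where `is_hit` stands in for the actual per-frame
detector call and the 200-frame dense window is an arbitrary choice:

```python
# Sketch of the sampling idea: check every 50+rand(100) frames,
# and only scan frame-by-frame once a sampled frame gets a hit.
import random

def sample_indices(total_frames, is_hit):
    """Yield the frame indices worth examining."""
    i = 0
    while i < total_frames:
        yield i
        if is_hit(i):
            # a sampled frame tripped the detector: examine the next
            # stretch densely before going back to sparse sampling
            for j in range(i + 1, min(i + 200, total_frames)):
                yield j
            i += 200
        else:
            i += 50 + random.randrange(100)  # "every 50+rand(100) frames"
```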

------
mindhash
I like the direction you are headed. Considering that the use of ASICs is
going to rise, I think you should consider local installs through Docker
(like machinebox.io) or another technique. Federated learning would also be
the next thing to take on.

~~~
fulafel
The RPi is an ASIC-based device, no? A web search yields no other
applications of the BCM2837B0.

~~~
thesmok
No, it's still a general-purpose ARM CPU. A good example of an ASIC would be
a bitcoin mining device.

~~~
fulafel
Quoth WP: "Modern ASICs often include entire microprocessors, memory blocks
including ROM, RAM, EEPROM, flash memory and other large building blocks.
Such an ASIC is often termed a SoC (system-on-chip)."

The ARM core on the SoC is part of the ASIC. As is the VideoCore part.

~~~
prats226
I think what OP meant was ASIC specific to deep learning like TPU's. However
as I see, current frameworks are not matured enough to support GPU's and TPU's
with exact same code. Also there are no standards so every big org is going to
build support for their own ASIC interfaces for the framework they manage. Is
there an open source interface for ASIC's for deep learning?

~~~
mindhash
Yes. I was talking about AI chips (FPGAs and ASICs - NN processors).
[https://github.com/basicmi/Deep-Learning-Processor-List](https://github.com/basicmi/Deep-Learning-Processor-List)
[https://www.nextbigfuture.com/2017/11/at-least-16-companies-...](https://www.nextbigfuture.com/2017/11/at-least-16-companies-developing-deep-learning-chips.html).

Cambricon is going big in China, so it's not just Google and Apple. They
claim to be 6 times faster than a GPU.

I am more interested in the potential of being able to run video processing
and voice models effortlessly on tiny devices, and also to train models
offline or locally.

I think there is good scope for solutions (like vision recognition) that
port well across AI chips.

------
kalal
I am very skeptical about pretraining, which seems to be the key point of
Nanonets. Sure, it will work better than initialization from random weights,
but you will always do better if you collect more data for your problem.
This may be fine for problems which do not need optimal classification and
fast performance, but I am struggling to see any use case for that.

~~~
prats226
A lot of businesses need a custom model, for example when you want to
identify only a specific kind of product among similar-looking ones, or find
only the defective pieces, and you cannot collect tens of thousands of images
from the beginning. Also, pretraining is not only for initialization but also
improves generalization with less data.

------
dboreham
Are we still worried about splitting the infinitive?

~~~
tzs
No. Most current grammarians and most style guides do not say that split
infinitives are prohibited. Furthermore, in the past when some grammarians
said they were not allowed, there were as many or more equally authoritative
grammarians who said there was no such rule, and those who did support such a
rule were never able to offer a good reason for it.

The only reason one _may_ want to avoid them now is that there are still
enough people who were taught from crappy elementary school textbooks that
had this bogus rule, and they will think your grammar is bad if you use split
infinitives (and they remember enough from elementary school to recognize
them).

Just trust your ear. If splitting an infinitive makes a sentence sound
clearer, do it.

If someone gives you crap, cite the Oxford Dictionary people [1].

PS: same goes for ending a sentence with a preposition. Sometimes it is
clearer to do so. In that case, do it! You can cite Oxford for this, too [2].

[1] [https://en.oxforddictionaries.com/grammar/split-infinitives](https://en.oxforddictionaries.com/grammar/split-infinitives)

[2] [https://en.oxforddictionaries.com/grammar/ending-sentences-w...](https://en.oxforddictionaries.com/grammar/ending-sentences-with-prepositions)

~~~
dboreham
One less thing to worry about.

Haha, just kidding: one _fewer_ thing to worry about.

