
Embed AI into Projects with Nvidia’s Jetson Nano - samizdis
https://spectrum.ieee.org/geek-life/hands-on/quickly-embed-ai-into-your-projects-with-nvidias-jetson-nano
======
woofie11
Does anyone know where I can find good benchmarks? Especially how the CPU
compares (I don't expect to be GPU-bound).

A projects page would be neat as well, to see if anyone has done anything
similar to what I want to do (simple processing on 3-4 video streams -- will
IO keep up?).

~~~
joshvm
I've spent the last couple of years working on an edge multicamera system
(drones). Even at a fairly low resolution, e.g. a 2MP RGB cam and a VGA thermal,
you run into problems. Depends somewhat on how much cash you have to spend on
the project!

From experience, IO and encode speed are by far the biggest problem in these
systems. You can do reasonably fast inference using a hardware accelerator on
any platform - Google Coral is very efficient, as is the Intel NCS. For many
ML applications, inference latency will dominate your frame rate (i.e. it
doesn't matter whether you run it off a Pi, as long as you use USB 3). But
again, the bottleneck there is how fast you can feed the accelerator.

Until recently, with more platforms offering USB3, actually saving data was a
challenge. USB 2 just isn't fast enough for high-bandwidth multi-camera video,
and you quickly hit the write buffer on cheap USB storage. Forget SD cards
unless you have low level access. Nowadays you can at least use an external
USB3 SSD or even M.2 on some of these boards. In the past I resorted to tricks
like buffering frames into RAM and then dumping to the card in one go to avoid
doing random writes.
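That buffering trick can be sketched roughly like this (the frame source,
sizes, and filename are all made up for illustration):

```python
import io

# Hypothetical sizes: one VGA frame of 16-bit data, one second's worth
# of frames at 30 fps held in RAM before flushing.
FRAME_SIZE = 640 * 480 * 2
BUFFER_FRAMES = 30

def capture_frame():
    # Stand-in for a real camera grab; returns one frame's worth of bytes.
    return bytes(FRAME_SIZE)

# Accumulate frames in memory...
buf = io.BytesIO()
for _ in range(BUFFER_FRAMES):
    buf.write(capture_frame())

# ...then flush them in one big sequential write, which is far kinder to
# SD cards than one small random write per frame.
with open("frames.raw", "wb") as f:
    f.write(buf.getvalue())
```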

Very few other platforms in this price range can offer encode specs like the
Jetson's. The Pi can do one H.264 stream, which is typical. The RK3399 can
also do one stream at 1080p30 - though I've never managed to get it to run
better than 15 fps. A reasonable solution, actually, is to buy a Pi for each
camera and hardware-synchronize them (or not, if that's not important).

Then there's camera support: if you go CSI, this stuff is usually pipelined
into the ISP. If it's a UVC camera or similar, then usually you're going to be
using GStreamer. If the board claims to support N streams, that should all be
done over CSI entirely on the GPU without any significant CPU load. I'm not
sure if that's what you're after, since you asked for CPU benchmarks, but none
of these platforms can cope with even a couple of HD streams on CPU alone. You
need hardware encoding.
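For reference, a pipeline that keeps capture and H.264 encode entirely on the
hardware blocks might look like the sketch below. The element names
(nvarguscamerasrc, nvv4l2h264enc) are Jetson-specific plugin names I'm
assuming here; other boards ship different ones, so treat this as illustrative:

```python
# Build (but don't run) a GStreamer pipeline string for a CSI camera on a
# Jetson-class board: hardware capture -> hardware H.264 encode -> file,
# with no frame ever touching the CPU path.
def csi_h264_pipeline(width=1280, height=720, fps=30, out="out.mkv"):
    return (
        f"nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM),width={width},height={height},"
        f"framerate={fps}/1 ! "
        f"nvv4l2h264enc ! h264parse ! matroskamux ! "
        f"filesink location={out}"
    )

print(csi_h264_pipeline())
```

You could hand a string like that to `gst-launch-1.0`, or swap the `filesink`
for an `appsink` and feed it to OpenCV's `VideoCapture` with the GStreamer
backend.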

Processing is not the same as saving, though. You may be able to process video
on the CPU alone, but if you want to do something with it later, you'll need
to encode it and transmit or store it.

The downside is that there aren't many CSI cameras available off the shelf,
and they can be pricey, for no real reason other than economies of scale.

~~~
woofie11
I don't have a lot of cash for this project. I also don't want a lot of size
or watts. I am streaming educational materials live. Since it's a livestream,
I don't need to capture or store anything, and output resolution doesn't need
to be better than 720p.

I have around 5 streams coming in, but I'm only using 2-3 of them at a time
(so the rest can be disabled, but it's nice if I don't get USB errors when
switching things around). These run from 720p to 4K, MJPEG (YUYV
overwhelms USB). Right now I'm using an Intel NUC, which works adequately,
but can't do anything else reliably at the same time. If I open up an
application, I get glitches. I'd like something cheap and small to offload
processing to.

All I really /need/ to do is crop, downsize and things like picture-in-
picture. On the other hand, I do some fancier processing as a nice-to-have
(greenscreen, as well as some fancy nineties-level machine vision -- edge
detection and the like). Right now this is in OBS, but I could move it to
Python+OpenCV pretty easily (and probably ought to; I'm bumping into OBS
limitations).
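The must-haves are all cheap array operations once the frames are decoded. A
minimal NumPy sketch (OpenCV frames are just NumPy arrays; `cv2.resize` would
give nicer scaling than the stride trick, and all shapes here are made up):

```python
import numpy as np

main = np.zeros((720, 1280, 3), dtype=np.uint8)      # 720p main feed
inset = np.full((480, 640, 3), 255, dtype=np.uint8)  # second camera

# Crop a centered 640x360 region out of the main feed.
y, x = (720 - 360) // 2, (1280 - 640) // 2
crop = main[y:y + 360, x:x + 640]

# Quick-and-dirty 2x downsize of the inset by striding (nearest-neighbor).
small = inset[::2, ::2]

# Picture-in-picture: paste the downsized inset into the bottom-right corner.
frame = main.copy()
h, w = small.shape[:2]
frame[-h:, -w:] = small

print(frame.shape, crop.shape, small.shape)
```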

~~~
joshvm
Does the NUC support hardware acceleration? Otherwise you could just use a Pi
4 for this and run all your cameras through GStreamer, which supports the ops
you want - I think you can even do PiP. One thing I don't know, though: are
you going to have to decode the streams, overlay, and then encode again?

You could also look at getting a Jetson Nano which is a bit beefier, and you
can offload some of this stuff onto CUDA.

The OpenCV route is easy to try, but you need to be a bit careful on ARM (make
sure you compile it yourself with all the optimisation flags on).

------
Avalaxy
Does anyone know how this compares to the megaAI chip, which seems to be
aimed at a similar use case?
[https://www.crowdsupply.com/luxonis/megaai](https://www.crowdsupply.com/luxonis/megaai)

~~~
nl
[https://medium.com/@aallan/benchmarking-edge-computing-ce3f13942245](https://medium.com/@aallan/benchmarking-edge-computing-ce3f13942245)
has benchmarks comparing a previous version of the Intel Myriad accelerator
with Jetson Nano and Coral accelerator.

TL;DR: the Myriad accelerator was roughly two to four times slower than the
Jetson or the Coral.

------
ngcc_hk
Need a shopping or parts list plus a GitHub repo ... given "not counting the
time I spent waiting for various supporting bits and pieces to arrive in the
mail while isolated from my usual workbench at the Spectrum office!"

~~~
IrishJourno
This was mainly stuff like the pan/tilt head, the power supply, and some
headers and 40-pin sockets.

------
wokwokwok
Isn't this from early last year?

The Jetson Xavier NX is the one released this year.

~~~
ebrewste
That's true. The Nano is $99, whereas the Xavier is $399. That changes the
game for some use cases. I suspect the Xavier still won't see as much press
because of that, though it doesn't change the core message -- GPU-based
compute at the edge.

------
jbverschoor
Too bad the storage is an SD card. It would've been more robust with an SSD,
with block-device emulation for easy transfers.

~~~
hobo_mark
The Nano baseboard has an NVMe slot.

------
aww_dang
The link returns status code 418 for me.

