YOLOv5 on FPGA with Hailo-8 and 4 Pi Cameras (fpgadeveloper.com)
145 points by geerlingguy 3 months ago | hide | past | favorite | 60 comments



After years of wondering, I have to ask.

What are real life actual useful cases for this tech?

I can imagine in manufacturing: detecting defects or layout mismatch - that's one.

Is there any open source project that uses an image recognition library to achieve a useful task? All I've seen from board partners are very simple demos where a labelled box is drawn around an object. Who is actually using that information, how, and for what?

I've also been part of the Kinect craze and made 3 demos (games mostly) using their SDK, and I still have a very hard time defending this tech in the eyes of coworkers who only see it as surveillance tech.


Great question! I work for a computer vision company (Roboflow) and have seen computer vision used for everything from accident prevention on critical infrastructure to identifying defects on vehicle parts to detecting trading cards for use in video game applications.

Drawing bounding boxes is a common end point for demos, but for businesses using computer vision there is an entire world after that: on-device deployment. This can be on devices ranging from an NVIDIA Jetson (a very common choice) to Raspberry Pis to central CUDA GPU servers processing large volumes of data (maybe connected to cameras over RTSP).

Note: there are many models that are faster and perform better than YOLOv5 (e.g. YOLOv8, YOLOv10, PaliGemma). Roboflow Inference, which our ML team maintains, has various guides on deploying models to the edge: https://inference.roboflow.com/#inference-pipeline


Can you go into some examples?


I use object detection to track tardigrades in my custom motorized microscope. It's very useful for making long observations in a field much larger than the scope's field of view.

The system works quite simply: I start with an existing object detector and train it with a small (<100) number of manually labelled images. Then during inference, I move the scope's field of view using motor commands to put the center of the tardigrade at the center of the field of view.

This technology is very useful for doing long-term observations of tardigrades (so, useful for science).
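The centering loop described above can be sketched in a few lines. This is a hypothetical reconstruction, not the commenter's actual code: the function names, the micron-per-pixel scale, and the deadband are all made-up example values.

```python
def center_offset(bbox, frame_w, frame_h):
    """Pixel offset (dx, dy) of the bounding-box center from the frame center."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return cx - frame_w / 2, cy - frame_h / 2

def motor_steps(offset_px, um_per_px=1.5, um_per_step=0.5, deadband_px=10):
    """Convert a pixel offset into stage motor steps; ignore jitter inside the deadband."""
    if abs(offset_px) < deadband_px:
        return 0
    return round(offset_px * um_per_px / um_per_step)

# Each frame: run the detector, then re-center the stage, e.g.
dx, dy = center_offset((100, 100, 200, 200), 640, 480)  # → (-170.0, -90.0)
```

The deadband matters in practice: without it, detection jitter of a pixel or two makes the stage oscillate even when the animal isn't moving.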


Thank you! Is the detection accurate enough, or are the observational conclusions simply not that sensitive to minor errors?

That makes me want to revisit a previous idea of mine: a boiling-soup spillage detector. I once had a Google Meet call open with a pot of soup cooking so I could keep an eye on it, and thought, heck, that seems like a nice exercise for fine-tuning a visual detector.


The detection was accurate enough for me to complete one prototype experiment under controlled conditions: a single tardigrade in an otherwise empty field, and even then, it lost the tardigrade once or twice. Different lighting conditions, and other things in the field like tardigrade eggs, algae, and dirt, all make it more challenging.

To make it truly ready for production science, I'd need to put more work into making the model robust. I'd also like better object tracking, so I could track multiple unique tardigrades.

If you want to see even better examples, take a look at DeepLabCut, https://www.mackenziemathislab.org/deeplabcut especially the video examples.


Frigate uses models like this one for NVR:

https://frigate.video/


This Hailo-8L is extremely useful in mobile robotic systems, particularly when paired with an RPi 5. The main board used in this demo is... not particularly useful, however. Generally speaking, FPGA boards like this miss the forest for the trees. As a systems engineer, where am I supposed to put such a big board? I can also expect that, even if my team incorporated this board into a mobile system, it would become vaporware well before we deployed anything due to low production numbers, and we'd be paying eBay scalpers 2x as much because our distributors would no longer carry it lol.

As for the RPi 5 combination, the power draw is relatively low. The whole thing clocks in at about 14 watts on an RPi 5, which allows us to run this platform off a battery. With 26 TOPS, this setup can contend with the Jetson Xavier NX (21 TOPS, ~$500) and the Jetson Orin Nano (40 TOPS, ~$500) for a cost of around $170. Furthermore, the CPU on the RPi 5 is generally more performant than the Xavier NX's.

Specifically, this is an excellent vision module for real-time object detection from multiple cameras if set up properly, while maintaining access to the prolific Raspberry Pi hardware ecosystem, which is typically cheaper than the Jetson ecosystem.
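Using just the figures quoted in this comment, the value proposition can be put on a common scale. A back-of-the-envelope sketch, not a benchmark; the prices and TOPS numbers are the ones stated above:

```python
# Figures as quoted in the comment (street prices and marketing TOPS).
platforms = {
    "RPi 5 + Hailo-8":  {"tops": 26, "usd": 170},
    "Jetson Xavier NX": {"tops": 21, "usd": 500},
    "Jetson Orin Nano": {"tops": 40, "usd": 500},
}

def tops_per_dollar(p):
    """Crude cost-efficiency metric: marketing TOPS divided by unit price."""
    return p["tops"] / p["usd"]

for name, p in platforms.items():
    print(f"{name}: {tops_per_dollar(p):.3f} TOPS/$")
```

By this crude metric the RPi 5 + Hailo-8 combination comes out well ahead, though marketing TOPS across vendors are not directly comparable.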


I'm working on a project that detects climbing holds and lets you set routes by selecting them. (The usual method is putting a bit of colored tape on each hold, where the color corresponds to a route. This works great but becomes difficult to read once more than four or five routes share a hold.) YOLO made the computer vision part of this pretty smooth sailing.


Embed the holds with an LED and IR sensor à la the Swift concert [0] and you've got the whole package.

0. https://news.ycombinator.com/item?id=40492515


As you guessed, high-speed machine vision stuff is frequently used in manufacturing settings for sorting or various quality control tasks. Imagine picking out bad potatoes on a conveyor belt moving tens of potatoes per second, or identifying particle counts and size distributions in a stream of water to gauge water quality.


Plus, being an NN, it might be able to detect a foreign object, like a rat, with relative ease (compared to classic computer vision).


Behavior outside of the training distribution is undefined and more often than not undesirable. NNs work well on stuff they're trained on.


I've been very persistent over the past few months in developing a system with agriculture as the primary use case. I want to deploy features to classify crop type, height, vegetation stage, and other important metrics to achieve real-time or near-real-time analytics.

Do you have any suggestions on how to proceed? So far, I've procured a Jetson, five cameras, a stand to mount and calibrate the modules, and a camera-array HAT to connect four cameras to the Jetson. I was also checking out VPUs, NPUs, and other hardware, but I'm struggling to identify compatible options. How can I build such a model and test and validate it within three months?


I've used YOLOv5 models in robotics for object detection. While VLMs are great at describing images more generally, having a bounding box of a detection with a confidence score is very useful when paired with depth cameras for locating objects in an open environment. Especially when it can be run on-board and at framerate.
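As a concrete illustration of why the bounding box plus depth pairing is useful: a pinhole-camera deprojection sketch. The intrinsics below are made-up example values (real depth cameras such as a RealSense expose their own calibrated intrinsics and deprojection helpers).

```python
def bbox_center(x1, y1, x2, y2):
    """Center pixel of a detection's bounding box."""
    return (x1 + x2) / 2, (y1 + y2) / 2

def pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Deproject a pixel + depth reading into a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# A detection centered in a 640x480 image, 2 m away, with example intrinsics:
u, v = bbox_center(300, 220, 340, 260)                  # (320.0, 240.0)
point = pixel_to_point(u, v, 2.0, 600, 600, 320, 240)   # (0.0, 0.0, 2.0)
```

Sampling the depth image at the box center (or the median depth inside the box, which is more robust to edge pixels) gives the robot an object position it can actually navigate to.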


Field mice that I thought were moles have destroyed my yard. There are so many tunnels that I can't tell which are most active. A camera AI that could show which parts of the ground changed significantly would be nice.

At a hotel, we had a problem of luggage carts going missing. There are a few ways to deal with that. A generic one that would support other use cases would be to let the camera tell you the last room each cart went into. Likewise, outdoor cameras might tell you which vehicles had a customer walk into the hotel and which might be non-guests.


> I've also been a part of the Kinect craze and made 3 demos (games mostly) using their SDK

I bought a Kinect but never got it working correctly with a Mac. What is the state of the art here? Is there active development anywhere?


To surveil people in the streets.


Yeah, that's what I'm afraid of.


You don't need to be afraid, this is already in use in Singapore and China, for example. In the US not so much with persons as far as I know, but certainly with cars.


Some of the people didn't look too happy about being filmed, and then you went and put them online for the world to see. Classy.


Breaking: people in public don’t want to be seen by the public.

It’s a detailed breakdown of a technically impressive project and your main takeaway is the 5 seconds in the demo where a guy covers his face?

Kudos to the author for making something neat and sharing it.


Breaking: people in public don't want to be recorded and uploaded on the internet for millions to see.

Crazy, right?


Well yeah, it is kind of crazy.

Any scenario in which undue harm comes to someone because they were a passerby in a street video is such a reach that you have to question how grounded in reality it is. It's some deep "I'm the main character" level thinking.


A good point that shouldn't be dismissed so quickly. This kind of tracking has little use beyond surveillance, so one has to wonder what the author thinks they are doing, or why they think it's interesting or useful. That they then go ahead and film a crowd without consent says more about their position on this moral question than it does about any direct harm to the people in the video.


That's how they score gov contracts - the safest and quickest way to get rich. And oh boy do governments love control


Oh, the horror. Holy shit was that Jason walking by? He told me he was in Alaska...I'm going to need to have a talk with him. Thank goodness I ran across this video.


That looked interesting as a self-hostable project until it got to the requirement for a $3200 AMD board. Maybe the price will come down one day...


The UltraScale+ is definitely pricey for non-industrial applications. The FPGA design is probably larger because of the 4x camera pipeline.

Perhaps with a single camera you could port this to fit a Zynq 7000 footprint with something like a Pynq-Z1 or Numato Styx, which are at the $250 hobbyist price point, for example.


No chance does the Zynq 701x series perform capably for this task: it's weak Artix fabric, and there are only ~50k LUTs on the common parts. Even fully pipelined designs, especially those that interface with the hard AXI bus, will struggle to clock above 50MHz.

The Zynq Z7030+ chips would probably be able to get the job done but aren't as common - https://www.lcsc.com/product-detail/Programmable-Logic-Devic.... The Kintex 410T-480T parts are available new for <$160 from reputable vendors, used for <$50 from less-reputable vendors - they do have the performance (and overall IO) for this task.


Depending on what you're trying to do you might not need it. A popular option for object detection people use with their home hosted video surveillance (using Frigate and Home Assistant) is the Coral TPU and Raspberry Pi or some decommissioned thin client.


I did consider the Intel Neural Compute Stick to accelerate with OpenVINO, but they're discontinued, and it turns out I can get away with doing fewer detections (birds, not people) by doing a pre-filter of motion detection (which reduces the number of frames through OpenCV's DNN by 10x).
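A minimal sketch of that kind of motion gate (the threshold values here are illustrative, not the commenter's): only hand a frame to the expensive detector when enough pixels changed since the previous grayscale frame.

```python
import numpy as np

def motion_gate(prev_gray, curr_gray, pixel_thresh=25, area_frac=0.002):
    """True if the fraction of changed pixels justifies running the detector."""
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    changed = int((diff > pixel_thresh).sum())
    return changed > area_frac * curr_gray.size

# Static scene: skip detection. A bird-sized blob appearing: run it.
prev = np.zeros((480, 640), dtype=np.uint8)
curr = prev.copy()
curr[200:230, 300:330] = 255  # a 30x30 moving blob
```

In a real pipeline you would also blur the frames first so sensor noise doesn't trip the gate, but the core idea is just this cheap per-pixel difference in front of the DNN.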


Google Coral seems abandoned by Google. It's not official, but the last news on their page is from May 2022.


I've heard from the folks at Pineboards[1] that there's been some new activity, as manufacturing ramped back up this year... it's still decent hardware but is getting long in the tooth.

Regarding the Hailo featured in the article, a few of us have been messing with it on a Raspberry Pi 5[2], and it offers more performance in a similar power envelope. The major downside is availability. I can buy the Coral on many electronics supplier sites, but Hailo seems to be selling through 'Product inquiry' right now, which is not easy to navigate as an individual!

[1] https://pineboards.io

[2] https://pipci.jeffgeerling.com/cards_m2/hailo-8-ai-module.ht...


Any good write-ups on Hailo and similar?

BTW, is it possible to have 10 of these connected to a single board/CPU?


Maybe not 10 but in principle it’s possible: https://www.lannerinc.com/products/edge-ai-appliance/deep-le...



~400kLUT is very obtainable these days - if you don't need the ARM cores:

https://www.lcsc.com/product-detail/Programmable-Logic-Devic...

(420T/480T are priced similarly new.)


At the moment I am wondering if I could build an accelerator for George Hotz's tinygrad[1] with cheap FPGAs (I do have an Arty A7-35T; this might be too small, I guess?). According to the readme it should be "easy" to add new accelerator hardware. Sadly my knowledge is still a bit limited around the whole Python machine-learning ecosystem, but if I understand correctly you "just" need an OpenCL kernel and need to be able to shove the data back and forth somehow.

I haven't had enough time to dive into it yet and am still working on another project, but this still tickles the back of my head and would be cool even if I could only run MNIST on it.

- [1] https://github.com/tinygrad/tinygrad/


> but if I understand it correctly you "just" need an openCL kernel and need to be able to shove the data back and forth somehow.

To use it with an FPGA accelerator you also have to build all the "hardware" to run said openCL kernel efficiently, manage data transfer, talk to the host, etc for the FPGA. This is very foreign if you're only used to doing software design and still very nontrivial even if you've done FPGA work, though I think there are some open hardware projects around doing this.


Yeah, I think data transfer, especially efficient data transfer, would be my biggest enemy. I did quite a lot of FPGA work in recent years, mainly CPU design, a little pipelining, and some hardware/software co-design. So I should just see it as a toy project: start with said MNIST to get at least something running, no matter whether it's efficient, and then work my way up. For example, I haven't done anything PCIe related yet, but I guess there is enough IP available.


Transceivers are the only option for high-speed connectivity - this means PCIe or 100GbE.


What is the ultimate delivery after all this work? Did it correlate/track the same people across multiple video feeds, for instance?


It's line-speed processing of multiple cameras in hardware - it should consume less power than an equivalent GPU or Jetson.


OpenCV, YOLO, and darknet were some of the best tools back when I was doing research projects with this. While it is way more efficient than, say, feeding frames into a RAG LLM... it is still compute-intensive.

It's hard to handle multiple video streams, and I think I maxed out at around 20 camera feeds before the computer slowed to a crawl. NVMe storage and better internet would definitely push the limit on what is possible, but unfortunately it is impossible to get good internet where I live. Also, cameras are not cheap, and neither is flash storage. But I do know this stuff sees commercial use, and I bet the defense industry uses some better version.


What's a good (production ready) setup?

I'm thinking of an external camera (weather resistant) and hardware. The hardware could be a small computer that connects to the camera (maybe with wifi?), and runs the YOLO model.


That already exists. Lilin is one of several CCTV companies implementing on-camera YOLO. Axis, Hanwha, and i-Pro all have options for you to run your own software/models on camera as well.


Ok, thanks. I'll look into that. I hope that they don't require some specific abstruse format for the models!


Great writeup. Always a treat to see more high quality FPGA project postmortems, even if they aren’t using accessible parts/toolchains.


I am not familiar with this NN accelerator.

Does anyone have a comparison between Hailo and, say, a mid or high-end GPU or a TPU?


According to [1], the manufacturer claims 2% of the TOPS of an RTX 4090, at only 0.8% of the power consumption.

$200 in prototype quantities [2], which is 12% of the price of a 4090 - but perhaps the price drops when you order in bulk?

They claim it compares favorably to an NVIDIA Xavier NX for image classification tasks, providing somewhat more FPS at significantly lower power consumption: 218 fps running YOLOv5m on 640×640 inputs.

They're completely silent about the amount of memory it has, but since you can fit int8 YOLOv5m into about 20 MB, it's certainly an amount measured in megabytes rather than gigabytes.

Their target market is "CCTV camera that tracks cars and people" rather than "Run an LLM" or "train a network from scratch"

[1] https://hailo.ai/products/ai-accelerators/hailo-8-ai-acceler...

[2] https://up-shop.org/hailo-m2-key.html


From what I know it doesn't have memory but streams everything to the chip. So there's no limit on the size of the neural network (unlike the Google Coral).


Some of their marketing mentions no need for external DRAM, for example, from this CNX article[1]:

"One of the key reasons for the performance improvement is that RAM is self-contained without the need for external DRAM like other solutions. This decreases latency a lot and reduces power consumption."

Not sure how much RAM is included on the chip, but I'm also thinking in the tens of MB range, certainly not gigabytes.

[1] https://www.cnx-software.com/2020/10/07/learn-more-about-hai...


PCIe is slower than DRAM, so not having DRAM is not a performance improvement but a hardware limitation that reduces the scope of usability. That doesn't mean it's bad - it's cheaper to produce and serves its own use cases.

Processing a stream of data - yes. Machine learning on a large data set - no.


I think what's implied by their marketing at least is that DRAM is included on the chip itself (unlike other solutions like GPUs with external RAM chips)... but the problem is that limits the amount of available RAM quite severely.


Maybe? If a 20 MB network achieves 218 fps, that'd need about 4.4 GB/s of bandwidth just to stream the network, completely ignoring the images. And they use PCIe Gen3, which is about 1 GB/s per lane, with different products having 1, 2, or 4 lanes.
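The arithmetic behind that estimate (assuming decimal megabytes and ignoring activation/image traffic):

```python
def stream_bandwidth_gb_s(model_bytes, fps):
    """GB/s needed to re-stream the full weight set every frame."""
    return model_bytes * fps / 1e9

bw = stream_bandwidth_gb_s(20e6, 218)   # 4.36 GB/s for a 20 MB model at 218 fps
lanes = bw / 1.0                        # PCIe Gen3 ≈ 1 GB/s usable per lane
# → a bit over 4 lanes' worth, hence "maybe just about" on a 4-lane part
```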

So... maybe just about?


Hailo is one of the newer AI accelerator startups, so it's not surprising that many people haven't heard of them yet. So far, the price/performance/power consumption of the Hailo products seems to fill a rather large gap between the Amba stuff, which is very well suited for 1-4 camera streams in a typical SoC-based device, and the Jetson, which is really kind of overpriced and power-hungry for a lot of video applications (at least IMO).


Very cheap and power efficient if you're willing to run in int8/int4 and have unlimited time and patience for development


It’s a competitor to Google Coral (seems abandoned) and NVIDIA Jetson. I’ve been using it for more than a year and the hardware seems to be one of the best on the market. The software (how to actually do inference on the chip) is subpar though.


This makes me think of monkeys cheerfully building their own cage.

Sorry, but I can’t just ‘cool-project-bro’ this one. Does nobody else have the faintest misgivings about where we’re at right now: human surveillance as just scratching a technical itch?

Apologies if this comes over all grumpy, but wow. Seriously.



