
YOLO: Real-Time Object Detection - headalgorithm
https://pjreddie.com/darknet/yolo/
======
natvert
Interesting that this is trending now. We actually just released an improved
version of YOLOv3 (called G-Darknet) [https://github.com/generalized-
iou/g-darknet](https://github.com/generalized-iou/g-darknet), which uses GIoU
as a loss, described here:
[https://giou.stanford.edu](https://giou.stanford.edu)
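For context, GIoU extends IoU with a penalty based on the smallest enclosing box, so even non-overlapping boxes get a useful gradient. A minimal sketch of the computation for axis-aligned boxes (my own illustration, not code from g-darknet):

```python
def giou(a, b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2).

    GIoU = IoU - |C \\ (A u B)| / |C|, where C is the smallest box
    enclosing both A and B. Ranges over (-1, 1]; 1 means identical boxes.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Intersection area (zero if the boxes don't overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing box C.
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    area_c = (cx2 - cx1) * (cy2 - cy1)

    return iou - (area_c - union) / area_c
```

The training loss is then simply 1 - GIoU, which, unlike 1 - IoU, still pushes disjoint boxes toward each other.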

Also notable in G-Darknet are some tools useful for training (called
darkboard), see [https://github.com/generalized-
iou/g-darknet/tree/master/dar...](https://github.com/generalized-
iou/g-darknet/tree/master/darkboard#setup)

~~~
w-m
Heads up: the boxes are drawn in the wrong places using Firefox 66 on Ubuntu
18.04. [https://imgur.com/a/4d51spv](https://imgur.com/a/4d51spv)

A bit confusing as the drawn boxes don't match the text. Works with Chromium
though.

~~~
natvert
Thx, I'll check that

------
syntaxing
Surprised to see this here since YOLO has been out for a while now. Shameless
plug: I wrote an article on how to use transfer learning on your custom
dataset with the pretrained weights [1]. One downside of YOLO is that it uses
the author's own deep learning library, darknet. I find the TensorFlow port,
darkflow, easier to use, but I haven't seen a v3 port of it yet.

[1] [https://www.powu3.com/ml/yolo/](https://www.powu3.com/ml/yolo/)

~~~
joshvm
There is a pytorch port from Ultralytics
([https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3)).
Nobody seems to have figured out how to achieve the training performance of
darknet though, which is entirely uncommented C. The source is all there, but
the loss function changed between v2 and v3, and it's not documented in the
paper. I think it's been fixed in that pytorch port now though. The only
frustrating thing is that every commit in the repo is called update...

Alternatively... you can train in darknet and then run inference in another
framework of choice.

Also shameless plug: I wrote an annotation tool which is designed to output
darknet formatted labels:
[https://github.com/jveitchmichaelis/deeplabel](https://github.com/jveitchmichaelis/deeplabel)
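For reference, darknet expects one .txt label file per image, each line holding a class id plus a box center and size normalized to [0, 1]. A small sketch of the conversion from pixel coordinates (a hypothetical helper, not code from deeplabel):

```python
def to_darknet(box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into
    darknet's normalized (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return x_c, y_c, w, h

# Each label line is "<class_id> <x_c> <y_c> <w> <h>".
line = "%d %.6f %.6f %.6f %.6f" % (0, *to_darknet((100, 200, 300, 400), 640, 480))
```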

~~~
syntaxing
Yeah, I don't remember where I read it, but it took them a couple of weeks to
train the model from scratch. I tried training my own weights from scratch,
and it was practically impossible using a Tesla K80. But it's pleasantly
surprising how good the transfer learning results are on a custom dataset:
you can get some "state of the art" results when you train for a couple of
hours. It's really impressive how he came up with YOLO and wrote his own deep
learning library from scratch.

Thank you for the links! I'm going to check both out. I want to see if the
PyTorch port works with the new deployment feature from 1.0.

------
geofft
The YOLOv3 paper is pretty delightful:
[https://arxiv.org/pdf/1804.02767.pdf](https://arxiv.org/pdf/1804.02767.pdf)

~~~
turbinerneiter
It's honest, unlike the inflated BS you usually have to spin around your
half-working experiments.

~~~
roryokane
For another example of such honesty, see the paper “HonestNN: An Honest Neural
Network ‘Accelerator’”, from the SIGBOVIK 2019 proceedings:
[http://sigbovik.org/2019/proceedings.pdf#page=107](http://sigbovik.org/2019/proceedings.pdf#page=107).
I love that paper.

------
daenz
YOLO, no! [https://i.imgur.com/R1RZ2N0.png](https://i.imgur.com/R1RZ2N0.png)

Jokes aside, we need better temporal consistency, especially when we start
arming AI. citizen -> citizen -> citizen -> armed insurgent

~~~
yellowapple
With that particular example, a citizen could very well also be an armed
insurgent. Whether that citizen/insurgent is an ally or neutral or enemy is
the distinction worth solving (even if it's significantly harder for an AI).

Of course, that matters far less when Skynet decides that every human is a
hostile armed insurgent...

~~~
nan0
I am assuming that would be solved by having the AI also take in inputs of
where your troops and allies are located. Perhaps with something like the Blue
Force Tracker [0].

One of the first priorities of an operation is not knowing where your enemy
is, but where you are.

0: [https://www.viasat.com/products/blue-force-
tracking-2](https://www.viasat.com/products/blue-force-tracking-2)

------
stared
Having worked with YOLO, I really recommend this intro:
[https://blog.paperspace.com/how-to-implement-a-yolo-
object-d...](https://blog.paperspace.com/how-to-implement-a-yolo-object-
detector-in-pytorch/). And in general, YOLO is performant and at the same
time, it has a simpler architecture than the Fast(er) R-CNN family.

And in general, due to its head, it is WAY more readable in PyTorch than in
TensorFlow; to the point that I use it as an example in my Keras vs. PyTorch
comparison [https://deepsense.ai/keras-or-pytorch/](https://deepsense.ai/keras-or-
pytorch/) (which was on HN at some point).

------
deepsun
It still seems to use only a single frame, without any temporal context. E.g.
a dog is sometimes recognized as a teddy bear for a split second.

Are there any "continuous" models for that? It sounds like simple Bayesian
post-processing would go a long way (e.g. encoding that the probability of
dogs mutating into teddy bears is very low).
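Something like that blip suppression can be sketched with simple exponential smoothing of the per-class probabilities across frames (a toy example under my own assumptions, not tied to any particular tracker):

```python
import numpy as np

def smooth_probs(frame_probs, alpha=0.7):
    """Exponentially smooth per-class probability vectors across frames,
    so a one-frame 'teddy bear' blip on a dog track gets damped."""
    smoothed = []
    state = None
    for p in frame_probs:
        p = np.asarray(p, dtype=float)
        state = p if state is None else alpha * state + (1 - alpha) * p
        smoothed.append(state / state.sum())  # renormalize to a distribution
    return smoothed

# Classes: [dog, teddy bear]; one noisy frame in the middle.
probs = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
labels = [int(np.argmax(p)) for p in smooth_probs(probs)]  # blip suppressed
```

This assumes you already know which detections belong to the same track; in practice you'd first associate boxes across frames (e.g. by IoU) and smooth per track.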

~~~
JasonGenova25
YOLO stands for "You Only Look Once" so I don't think this will ever become
"continuous"

~~~
saynay
AFAIK, the 'Look Once' part is a contrast with earlier systems that ran one
region of the frame at a time through an object detector, resulting in a lot
of reprocessing.

You could still look only once, but have that look include multiple sequential
frames. Or do something like an LSTM of frames.

~~~
JasonGenova25
Good point. I hadn't considered this.

------
barrystaes
[https://pjreddie.com/media/files/papers/YOLOv3.pdf](https://pjreddie.com/media/files/papers/YOLOv3.pdf)

Sounds too good to be true. Also reads like that. :) A gem from this paper:

 _But maybe a better question is: “What are we going to do with these
detectors now that we have them?” A lot of the people doing this research are
at Google and Facebook. I guess at least we know the technology is in good
hands and definitely won’t be used to harvest your personal information and
sell it to.... wait, you’re saying that’s exactly what it will be used for??
Oh. Well the other people heavily funding vision research are the military
and they’ve never done anything horrible like killing lots of people with new
technology oh wait....._ 1

 _1 The author is funded by the Office of Naval Research and Google._

------
pjreddie
I don’t really understand how this is any different from Overfeat or SSD...

~~~
elephantum
YOLO is a combination of a backbone encoder and a detection head. The
backbone (Darknet) is unique to YOLO; that would be the main difference from
SSD, if I'm not mistaken.

~~~
_Wintermute
You might want to check the username of who you're replying to.

------
edshiro
YOLO is a very good and approachable object detection technique. I recently
re-read the paper for the original YOLO [1] from 2015 and loved the apparent
simplicity of this technique.

As a shameless plug, I wrote an intuitive guide to understanding SSD (Single
Shot Detector), another popular object detection technique:
[https://towardsdatascience.com/understanding-ssd-multibox-
re...](https://towardsdatascience.com/understanding-ssd-multibox-real-time-
object-detection-in-deep-learning-495ef744fab)

[1] [https://arxiv.org/abs/1506.02640](https://arxiv.org/abs/1506.02640)

------
maverick384
It seems that the commercialized version of this technology is here:
[https://www.xnor.ai/technology/](https://www.xnor.ai/technology/).

 _Xnor's founding team developed YOLO, a leading open source object detection
model used in real world applications. We use a proprietary, high performance,
binarized version of YOLO in our models for enterprise customers._

Too good to be true? Seems that they're running YOLO on conventional multi-
core CPUs. On ARM even.

------
godelski
This guy gave a talk at my university a few weeks ago. He did some live
demonstrations and I was really impressed. With a video camera he did live
detection in the room and was classifying dozens of objects. Like the screen
was filled with identification boxes. He also did a demo where he used his
cell phone. Not as many classifications, but still about a dozen.

Everyone was pretty impressed. I'm always impressed when I see live demos go
(almost) flawlessly.

------
gusdeboer
It's hilarious that the main video detects a dromedary as three cows at 3:26.

------
olalonde
If I recall correctly, Andrew Ng covers this in his CNN course [0] and
implementing it is one of the exercises.

[0] [https://www.coursera.org/learn/convolutional-neural-networks](https://www.coursera.org/learn/convolutional-neural-networks)

------
abledon
What's the best route to deploy a Python YOLO system as a desktop app? E.g. a
.zip file you extract, install, then run, with everything included
(TensorFlow/Keras libs, etc.), so there's no need for the user to set up an
environment with conda, yadda yadda.

~~~
quietbritishjim
At the risk of incurring HN's wrath: Docker is an option. Another is to use
C/C++ instead of Python and statically link it. Either way, if you want to
use the GPU, you'll be in for a world of pain with the Nvidia stack.

------
amelius
Can we also get the _orientation_ of each detected object?

~~~
indutny
With some changes - yes. I did this in my experimental project:
[https://github.com/indutny/resistenz/blob/master/python/mode...](https://github.com/indutny/resistenz/blob/master/python/model.py#L102-L106)

The idea is to add two extra params to the output of each classifier cell,
then do L2 normalization on them (
[https://github.com/indutny/resistenz/blob/master/python/mode...](https://github.com/indutny/resistenz/blob/master/python/model.py#L268)
) and treat them as a cosine/sine pair.

The loss in this case would be the squared Euclidean distance between the
actual and predicted pairs, which is equal to 2 * (1 - cos(x - y)).
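A toy version of that loss (my own sketch, not code from resistenz):

```python
import numpy as np

def angle_loss(raw_pair, target_angle, eps=1e-8):
    """Orientation loss: take the two extra outputs of a cell,
    L2-normalize them, treat them as (cos, sin), and return the
    squared distance to the target angle's (cos, sin) pair."""
    v = np.asarray(raw_pair, dtype=float)
    v = v / (np.linalg.norm(v) + eps)  # L2 normalization onto the unit circle
    t = np.array([np.cos(target_angle), np.sin(target_angle)])
    return float(np.sum((v - t) ** 2))  # = 2 * (1 - cos(pred - target))
```

Because both pairs lie on the unit circle, the squared distance reduces to 2 * (1 - cos(x - y)): smooth, bounded, and free of the wrap-around discontinuity you'd get from regressing the angle directly.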

------
darepublic
I cannot get YOLO to detect at 30 fps, even on GPU machines. This was true
when I tried Keras YOLO as well as when following the instructions for C
compilation on this page.

~~~
someguy1234567
A 720p webcam with CUDA gets about 90 fps for me.

------
tango12
Awesome stuff!

I understand the benefits (as mentioned); it would be interesting to know
what disadvantages this has compared to classifier-type detection methods.

------
dacox
Great project, but pretty old now.

~~~
joshvm
Yolov3 is about a year old and is still state of the art for all meaningful
purposes. It's fast and works well. You might get "better" results with a
Faster RCNN variant, but it's slow and the difference will likely be
imperceptible. mAP@50, as pjreddie points out, isn't a great metric for
object detection.

~~~
worldexplorer
Interestingly, in our production systems YOLO's object detection was both
much faster and more accurate.

------
sabujp
Those are some awesome humped horses and cows. The police brutality scene was
cool also.

------
mtw
Isn't this from 1 year ago?

------
apoph3nia
Comedy option: every detected object is labeled with the word "noumena"

