
YOLO: Real-Time Object Detection - golanggeek
https://pjreddie.com/darknet/yolo/
======
siliconc0w
Hypothesis: All demos are improved with the addition of dragonforce.

~~~
reificator
Discovery: Blaring music of any kind, including my preferred genres, is
unpleasant when trying to watch a demo.

~~~
jayd16
Doesn't hold up in independent tests.

------
edshiro
Nice! Btw this guy's CV is awesome :) :
[https://pjreddie.com/resume/](https://pjreddie.com/resume/)

~~~
quickthrower2
I wouldn't hire someone who formats their CV like that. Employment is a
serious proposition, and ... Ah just kidding. Nice CV.

~~~
taneq
It's the kind of thing that you can only get away with if you're very, very
good. At that point it becomes a kind of power move, like showing up to a high
end interview in jeans and a t-shirt.

~~~
systemtest
Or if you pretend that you are very good. If you show up to an interview in
jeans and a t-shirt, people will think you are very good.

~~~
taneq
True, it can also be a form of peacocking. But in an industry where results
are easily tested it's much harder to get away with that.

~~~
BooglyWoo
In this case, the guy has been spearheading a very high performance,
innovative methodology for real-time object detection. Companies interested in
that task should be falling over themselves to hire him.

------
bangonkeyboard
TED talk:
[https://www.ted.com/talks/joseph_redmon_how_a_computer_learn...](https://www.ted.com/talks/joseph_redmon_how_a_computer_learns_to_recognize_objects_instantly)

------
deepnotderp
This is cool but outdated; there are better papers now.

~~~
edshiro
What better papers would you recommend reading?

~~~
deepnotderp
Arxiv is thy friend :)

[https://www.google.com/search?q=fast+object+detection+deep+l...](https://www.google.com/search?q=fast+object+detection+deep+learning+arxiv)

and

[https://www.reddit.com/r/MachineLearning/search?q=object+det...](https://www.reddit.com/r/MachineLearning/search?q=object+detection&restrict_sr=on&sort=new&t=all)

There's been an absolute deluge of papers; I can't say I've kept up with them
all. One interesting one in particular was able to learn object detection in
an unsupervised manner, a novel improvement upon Bottou's "Is object detection
for free?" paper.

~~~
maffydub
To save someone googling, the referenced "Is object detection for free" paper
is at
[http://leon.bottou.org/publications/pdf/cvpr-2015.pdf](http://leon.bottou.org/publications/pdf/cvpr-2015.pdf)

I also just came across "Faster R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks", which looks interesting and is at
[https://arxiv.org/pdf/1506.01497.pdf](https://arxiv.org/pdf/1506.01497.pdf)

(I'll be reading these after work tonight.)

~~~
nl
These are both older and not as good as YOLO9000.

~~~
deepnotderp
I know, they are old references.

------
eggie5
Most of the top submissions to the last ImageNet competition are tweaks on
YOLO and Faster RCNN.

But why is this trending now?

~~~
Oras
Could be because more people are getting into DL and image recognition? I am
one of them, so what might be old to some people is new info for beginners.

------
abhisuri97
For the threshold value, how is 0.25 determined to be optimal? Is this value
specific to the datasets YOLO was trained on, or is it something universal?

Also, this is really awesome!

~~~
nl
Generally you learn a threshold value like that via cross-validation.
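A minimal sketch of what that tuning might look like: sweep candidate thresholds and keep the one with the best F1 on a held-out validation split. The setup below (scores plus 0/1 labels marking true vs. false positives) is a hypothetical stand-in, not YOLO's actual evaluation code.

```python
import numpy as np

def f1_at_threshold(scores, labels, threshold):
    """F1 of the detections kept at a given confidence threshold.
    `scores` are predicted confidences; `labels` are 1 for true
    positives and 0 for false positives (hypothetical setup)."""
    kept = scores >= threshold
    tp = np.sum(labels[kept] == 1)
    fp = np.sum(labels[kept] == 0)
    fn = np.sum(labels[~kept] == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, labels, candidates=np.linspace(0.05, 0.95, 19)):
    # Pick the candidate threshold maximizing F1 on the validation split.
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
```

Under this view, 0.25 would just be whatever worked best on the validation data YOLO was tuned against, not a universal constant.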

------
ricardobeat
If you look closely at the video, the quality of detection is not really great
for anything besides `person`. I wonder if weighing objects by the rate they
are being found over N frames, and conforming to a somewhat linear trajectory
could improve results; feeding matches from previous frames could also help
find the same object in subsequent ones, and maybe even work out temporary
occlusion (thinking out loud here - probably way more complex than one would
expect).
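The "weigh objects by the rate they are found over N frames" idea above could be sketched roughly like this; the detector feeding it labels per frame is assumed to exist elsewhere, and this is an illustration of the smoothing idea, not anything YOLO itself does.

```python
from collections import deque, defaultdict

class DetectionSmoother:
    """Keep a label only if it was detected in at least `min_hits`
    of the last `window` frames (a crude temporal filter)."""

    def __init__(self, window=5, min_hits=3):
        self.min_hits = min_hits
        self.history = deque(maxlen=window)  # one set of labels per frame

    def update(self, labels_this_frame):
        self.history.append(set(labels_this_frame))
        counts = defaultdict(int)
        for frame_labels in self.history:
            for label in frame_labels:
                counts[label] += 1
        # Return only labels seen often enough in the recent window.
        return {label for label, n in counts.items() if n >= self.min_hits}
```

Trajectory conformance and occlusion handling would need per-box tracking (e.g. matching boxes between frames), which is considerably more involved than this label-level filter.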

~~~
aeleos
I'm not 100% sure about this, but I think part of the reason it's called "you
only look once" is that it is single-frame object detection. I do know that
there are other types of networks that can take previous frames into account,
but the models themselves are much larger (in terms of VRAM requirements for
loading into memory) and more computationally intensive. This network is
special because it can run with very little power. From what I remember it can
run on a 10W board at 6fps, which is a 10x decrease compared to networks from
only a few years ago.

~~~
Animats
It's clearly single-frame; you can watch recognition succeed and fail from
frame to frame. The "person" recognizer misses some clear faces, so it's not
heavily face-oriented. Since it's single-frame, it's not recognizing
articulated motion.

Recognition seems to be limited to "person", "motorbike", "tie", "cell phone"
(a gun, actually), "umbrella", "truck" (misidentified part of a train), "bench"
(a railing), and "horse" (motorbike with duffel seen from the rear). "Person",
"umbrella", "tie", and "motorbike" seem to work; the others are kind of
random.

The trouble with running recognizers on Hollywood movies is that they have
many conventions of what appears on screen and how big it is on screen. Are
they training on such data?

Good data sets would be side views from a moving vehicle, like Google
StreetView data or just a GoPro pointed sideways while driving around.

------
plaguuuuuu
(a) very cool project

(b) ..interesting CV

------
shahar_m
The tutorial says "Edit src/yolo.c, lines 54 and 55" to train for different
classes, but I don't have any yolo.c file in the src folder... any ideas?

[[https://github.com/pjreddie/darknet/wiki/YOLO:-Real-Time-Obj...](https://github.com/pjreddie/darknet/wiki/YOLO:-Real-Time-Object-Detection)]

------
hedgehog
This is good background, but modern architectures are significantly better:

Learning Transferable Architectures for Scalable Image Recognition

[https://arxiv.org/abs/1707.07012](https://arxiv.org/abs/1707.07012)

~~~
0xbear
This is not "modern", though; the paper is quite old. He has improved it
significantly since then:
[https://arxiv.org/abs/1612.08242](https://arxiv.org/abs/1612.08242). Best
paper honorable mention at CVPR 2017.

~~~
hedgehog
I missed that the site was updated for YOLO9000; it is very impressive. The
Google paper is recent though (July), maybe you're thinking of
[https://arxiv.org/abs/1611.10012](https://arxiv.org/abs/1611.10012)? Anyway,
what I meant to point out was that there has been a lot of work since on
improving the efficiency of the architectures used for feature extraction
(Xception, MobileNet, ShuffleNet, NASNet).

~~~
0xbear
Sure, there's new work all the time. But I think you might be missing the
forest for the trees here. The main efficiency gain here is not in how you
extract and transform features (although those improvements would also work in
YOLO, as far as I can tell). The novel part is in structuring the problem as a
single forward pass regression. Further improvement in this year's paper is in
a better loss function. Other possible efficiency improvements are orthogonal
and complementary to that.
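The "single forward pass regression" framing can be made concrete with a toy decode step: the network emits an S x S grid where each cell regresses B boxes of [x, y, w, h, conf] plus C class scores, and detection is just reading that tensor back out. The exact layout, anchors, and activations differ across the YOLO papers; this is an illustrative sketch with an assumed layout, not the real decoder.

```python
import numpy as np

def decode_yolo_grid(output, threshold=0.25, num_boxes=2):
    """Decode an S x S x (B*5 + C) output tensor into detections.
    Assumed layout per cell: B boxes of [x, y, w, h, conf], then C
    class scores. Returns (class_id, score, cx, cy, w, h) tuples."""
    S, _, depth = output.shape
    C = depth - num_boxes * 5
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            class_scores = cell[num_boxes * 5:]
            cls = int(np.argmax(class_scores))
            for b in range(num_boxes):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_scores[cls]
                if score >= threshold:
                    # x, y are offsets within the cell; convert to
                    # image-relative coordinates in [0, 1].
                    cx = (col + x) / S
                    cy = (row + y) / S
                    detections.append((cls, float(score), cx, cy, float(w), float(h)))
    return detections
```

The point the comment makes is visible here: detection falls out of one tensor read after one forward pass, with no separate region-proposal stage.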

~~~
hedgehog
I agree that the single-shot approach in YOLO was pretty clearly a big step
forward. In the two years since, though, much of the efficiency work has been
in the underlying feature detector architecture, which as you point out
should integrate well with the YOLO9000 training improvements. It's very cool
to see this kind of capability get within range of phone-sized devices.

------
giza182
Impressive. Does anyone actually know of real apps that make use of object
detection? Apart from Google apps.

~~~
nonsince
Not hotdog?

~~~
giza182
haha. Had no clue the guys at Silicon Valley built an actual app for that.
Thanks for sharing.

------
iampims
Impressive speed.

