
PP-YOLO Surpasses YOLOv4 – State-of-the-art object detection techniques - rocauc
https://blog.roboflow.ai/pp-yolo-beats-yolov4-object-detection/
======
CompleteSkeptic
This isn't directly relevant to PP-YOLO, but I'm surprised Roboflow is still
promoting "YOLOv5", despite that model not having an associated paper and not
being made by the authors of the previous YOLOs.[1]

The ML community has been asking the authors of that model to rename their
project[2] because they are basically stealing publicity by making it seem
like the next version of YOLO, despite its performance being worse than that
of YOLOv4.[3]

Roboflow has deflected this in the past by claiming they don't know if
"YOLOv5" is the correct name[4], but by continuing to promote it, they are
directly supporting it. In fact, I wouldn't be surprised if their claim of
not being affiliated with Ultralytics were false or a half-truth, given that
all the top pages about "YOLOv5" were written by Roboflow, including the
first official announcement.[5]

[1]
[https://github.com/AlexeyAB/darknet/issues/5920](https://github.com/AlexeyAB/darknet/issues/5920)

[2]
[https://github.com/ultralytics/yolov5/issues/2](https://github.com/ultralytics/yolov5/issues/2)

[3]
[https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-642812152](https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-642812152)

[4] [https://blog.roboflow.ai/yolov4-versus-yolov5/](https://blog.roboflow.ai/yolov4-versus-yolov5/)

[5] [https://blog.roboflow.ai/yolov5-is-here/](https://blog.roboflow.ai/yolov5-is-here/)

~~~
yeldarb
We’re also the top result when you Google, e.g., “how to train YOLOv4” and
several of the top terms for training EfficientDet. Hopefully we will be a
great source of info on all computer vision models someday. Our mission is to
make these things easier for people to use and understand.

Regardless of what you think about its name, YOLOv5 is a great model for a lot
of use cases. And hundreds of our customers are using it in production and are
very satisfied with its performance. Just as many are using YOLOv4. And
EfficientDet. And MobileNet SSD v2.

They’re tools, not sports teams. It’s kind of weird that they’ve developed
fanbases.

~~~
sk0g
Why are you attacking a "fanbase" mentality when there is none? YOLO stood for
a series of networks and subsequent improvements by PJ Redmon. Derivative work
like PP-YOLO still signals that it's derivative work, but a name like "YOLOv5"
signals that it's an updated/improved version, which it is not.

This weird defence pretty much confirms that Ultralytics and Roboflow are
related though.

~~~
sillysaurusx
Just chiming in: I had similar concerns about Roboflow initially, but to my
surprise @josephofiowa from Roboflow reached out to me to discuss it. They set
aside time to specifically address a lot of the concerns I raised – e.g. that
they seemed to be hyping up a model without doing appropriate benchmarks
(they later did a thorough benchmark:
[https://blog.roboflow.ai/yolov4-versus-yolov5/](https://blog.roboflow.ai/yolov4-versus-yolov5/)).

They didn't need to do this. Part of my conversation was "I get it, you're a
startup, you have to focus on business value rather than research concerns."
But they made the time, and put in the effort, and I feel compelled to at
least mention that that happened.

Also, @pjreddie has said that he's "happy for anyone to keep using the YOLO
name! Just try to avoid version number collisions":
[https://twitter.com/pjreddie/status/1272618558254534657](https://twitter.com/pjreddie/status/1272618558254534657)

Anyway, as a fellow researcher, I just wanted to put in a good word for
Roboflow. Their priorities seem to be in order. I've also learned some
interesting things from their YOLO breakdowns, e.g. that training time on the
newer models is significantly lower.

~~~
rocauc
Thank you very much for the kind words.

------
tmabraham
As a sidenote, can they get Redmon's name correct? In [1] and [2] they call
him Redmond, and in [3] they call him PJ Reddie, which is his username and not
his real name. It's not even that hard to be correct here...

[1] [https://blog.roboflow.ai/pp-yolo-beats-yolov4-object-detection/](https://blog.roboflow.ai/pp-yolo-beats-yolov4-object-detection/)

[2] [https://blog.roboflow.ai/a-thorough-breakdown-of-yolov4/](https://blog.roboflow.ai/a-thorough-breakdown-of-yolov4/)

[3] [https://blog.roboflow.ai/yolov5-is-here/](https://blog.roboflow.ai/yolov5-is-here/)

~~~
rocauc
Thanks for the copyedits - I've updated to "Joseph Redmon."

------
KingOfCoders
Was using "YOLOv5" (hope they merge all the efforts or relabel) yesterday and
was amazed at how easy it was (no input image scaling or manipulation) and how
fast it was with my model (<1h on an RTX 2080). Also at how easy it was to use
in general (runs, ...) and how easy it was to install (Ubuntu 20.04).

To me PyTorch is much more convenient than Darknet.

~~~
mpfundstein
not only to you :-)

------
sillysaurusx
Suppose someone wanted to train a model to identify which decade a photo was
taken in. What would be the current SOTA architecture for that type of task?
(Suppose also that you had a few million labeled examples.)

I like YOLO because it’s a production-grade object detector. It seems harder
to find a production-grade classifier.

One amusing but dumb idea would be to use yolo for this: train the model on
“photo from 1930,” “photo from 1940,” etc, where the bounding boxes cover the
entire photo. But I’m curious what the professional solution might be.

~~~
ericjang
Out of curiosity, what would such a model be used for? I worry that as ML
becomes more popular and powerful, people are jumping to use it on problems
where the answer cannot possibly be identified accurately from the inputs
alone. This model's predictions would be based entirely on historical trends
in what images looked like, rather than something objective like carbon dating.

I am not discouraging someone from building such a model, but it would be
really helpful to know the context for which such a model is being developed.
If it's just a hobby investigation, it would be cool to see how "predictable"
dates are from images. I could even see it being used in forensics to provide
a "first guess" as to when a photo was taken, helping to triage evidence.
However, things become deeply problematic if the result from the image is fed
to people as "ground truth" simply because the model was found to be accurate
on a validation dataset. I certainly wouldn't want this model to be used to
determine whether a suspect is innocent or guilty, or to be used naively by
museums to date photographs.

~~~
theplague42
For example, creating a scrapbook of somebody's life from scanned photographs
that lack any metadata.

------
Imnimo
I think at a certain point, FLOP count will be more important than FPS. Like
once you're running at real time, there aren't a lot of applications that care
about 120 FPS vs 110 FPS. But there are a lot of situations where you care
about the total number of operations (regardless of GPU parallelism) because
you want to run on an edge device or have power constraints.

~~~
CompleteSkeptic
There actually is some work
([https://arxiv.org/abs/2003.13630](https://arxiv.org/abs/2003.13630))
claiming that FLOPs are a poor measure of real-world performance, with some
of the more recent FLOP-efficient models actually running slower than older
models.

~~~
flafla2
Forgive me if I’m being dense, but shouldn’t we expect performance to degrade
if FLOP count per unit time is decreased, assuming performance is defined as
overall runtime (FPS in this case)? It’s a trade-off scenario where runtime
performance is being balanced against other concerns such as power consumption.

------
atty
Slightly tangential, but has anyone had a chance to use PaddlePaddle? I played
around with it for a bit a few months ago, and found it to be generally a
regression in usability compared to PyTorch or TensorFlow 2. I’d be
interested to know what someone more experienced with it thinks.

------
jcims
Do the image collections that these models are trained on have EXIF data? Is
that included in the training?

~~~
yeldarb
Usually, it is not.

There have been some attempts to combine image and text data into a hybrid
model but I’m not sure how widespread it is. Ex:
[http://cbonnett.github.io/Insight.html](http://cbonnett.github.io/Insight.html)

~~~
jcims
I’ve been thinking of encoding some EXIF data (lens focal length, aperture,
gravity vector, focus distance, etc.) directly into the image as a band of
scale bars across the bottom, to inject a reference frame into the image
itself.

Not sure if it will do anything, just curious if it would help.

~~~
nl
You can inject any data (in numeric form if you want) as a second data input
into your neural network (assuming you are doing something custom).

For example we do this to represent the position of specific sub-image parts
we extract in an original image.

> just curious if it would help

Depends what you are trying to do.

For example, normal CNNs aren't rotation invariant[1], so if you know a
gravity vector it can be useful to make your image upright.

(While CNNs aren't rotation invariant, it's common practice to augment
training data by applying some rotation to the same image, so depending on how
the CNN was trained it may be fine.)

[1] [https://stats.stackexchange.com/questions/239076/about-cnn-kernels-and-scale-rotation-invariance](https://stats.stackexchange.com/questions/239076/about-cnn-kernels-and-scale-rotation-invariance),
[https://stackoverflow.com/questions/41069903/why-rotation-invariant-neural-networks-are-not-used-in-winners-of-the-popular-co](https://stackoverflow.com/questions/41069903/why-rotation-invariant-neural-networks-are-not-used-in-winners-of-the-popular-co)
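The "second data input" idea above can be sketched as a tiny two-branch network: a CNN embeds the image, and the numeric side data (EXIF fields, crop position, a gravity vector) is concatenated with the image features before the classification head. A toy sketch, assuming PyTorch; all layer sizes are arbitrary:

```python
# Toy two-input model: image tensor plus a vector of numeric metadata.
import torch
import torch.nn as nn

class ImagePlusMetadataNet(nn.Module):
    def __init__(self, num_meta: int, num_classes: int):
        super().__init__()
        # Minimal image branch: one conv layer, global average pool.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),  # -> (batch, 16)
        )
        # Head sees image features concatenated with the metadata vector.
        self.head = nn.Sequential(
            nn.Linear(16 + num_meta, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, image, meta):
        feats = self.conv(image)
        return self.head(torch.cat([feats, meta], dim=1))

# E.g. 4 metadata values: focal length, aperture, and a 2-D gravity vector.
net = ImagePlusMetadataNet(num_meta=4, num_classes=10)
out = net(torch.randn(2, 3, 64, 64), torch.randn(2, 4))  # shape: (2, 10)
```

Feeding the metadata as a separate input like this is usually cleaner than baking it into the pixels as scale bars, since the network doesn't have to learn to decode it from image space.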

------
epberry
Pretty interesting that the conclusion supports a slightly worse detector with
a better framework.

------
29athrowaway
Baidu should be sanctioned. It is one of the companies responsible for what's
happening to Uighurs and other minorities.

[https://www.youtube.com/watch?v=OQ5LnY21Hgc](https://www.youtube.com/watch?v=OQ5LnY21Hgc)

Computer vision technology, face recognition, object detection, image
segmentation... it's all being weaponized.

AI/ML frameworks should have more restrictive licenses that forbid mass
surveillance.

