Someone asked them not to call it YOLOv5, and their response was just awful. They also blew off a request to publish a blog post/paper detailing the network.
I filed a ticket to get to the bottom of this with the creators of YOLOv4: https://github.com/AlexeyAB/darknet/issues/5920
Beyond that, we're just fans. We're amazed by how quickly the field is moving and we did some benchmarks that we thought other people might find as exciting as we did. I don't want to take a side in the naming controversy. Our core focus is helping developers get data into any model, regardless of its name!
Fourth, YOLOv5 is small. Specifically, a weights file for YOLOv5 is 27 megabytes. Our weights file for YOLOv4 (with Darknet architecture) is 244 megabytes. YOLOv5 is nearly 90 percent smaller than YOLOv4. This means YOLOv5 can be deployed to embedded devices much more easily.
Naming controversy aside, it's nice to have some model that can get close to the same accuracy at 10% of the size.
Naming it v5 was certainly ... bold ... though. If it can't outperform v4 in any scenario, is it really worthy of the name? (On the other hand, if v5 can beat v4 in inference time or accuracy, that should be highlighted somewhere.)
FWIW I doubt anyone who looks into this will think roboflow had anything to do with the current controversies. You just showed off what someone else made, which is both legit and helpful. It's not like you were the ones that named it v5.
On the other hand... visiting https://models.roboflow.ai/ does show YOLOv5 as "current SOTA", with some impressive-sounding results:
SIZE: YOLOv5 is about 88% smaller than YOLOv4 (27 MB vs 244 MB)
SPEED: YOLOv5 is about 180% faster than YOLOv4 (140 FPS vs 50 FPS)
ACCURACY: YOLOv5 is roughly as accurate as YOLOv4 on the same task (0.895 mAP vs 0.892 mAP)
Then it links to https://blog.roboflow.ai/yolov5-is-here/ but there doesn't seem to be any clear chart showing "here's v5 performance vs v4 performance under these conditions: x, y, z"
Out of curiosity, where did the "180% faster" and 0.895 mAP vs 0.892 mAP numbers come from? Is there some way to reproduce those measurements?
The benchmarks at https://github.com/WongKinYiu/CrossStagePartialNetworks/issu... seem to show different results, with v4 coming out ahead in both accuracy and speed at 736x736 res. I'm not sure if they're using a standard benchmarking script though.
Thanks for gathering together what's currently known. The field does move fast.
Crucially, we're tracking "out of the box" performance, e.g., if a developer grabbed model X and used it on a sample task, how could they expect it to perform? Further research and evaluation is recommended!
For size, we measured the sizes of our saved weights files for Darknet YOLOv4 versus the PyTorch YOLOv5 implementation.
For inference speed, we checked "out of the box" speed using a Colab notebook equipped with a Tesla P100. We used the same task for both; see, e.g., the YOLOv5 Colab notebook. For Darknet YOLOv4 inference speed, we translated the Darknet weights using the Ultralytics YOLOv3 repo (as we've seen many do for deployments). (To achieve top YOLOv4 inference speed, one should reconfigure Darknet with OpenCV, CUDA, and cuDNN, and carefully monitor batch size.)
For accuracy, we evaluated mAP on the task above after quick training (100 epochs) of the smallest YOLOv5s model against the full YOLOv4 model (using the recommended 2000*n iterations, where n is the number of classes). Our example is a small custom dataset; the comparison should also be investigated on, e.g., the 90-class COCO benchmark.
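If you want to sanity-check the speed measurement, the timing loop is conceptually just this (a minimal sketch, not our exact notebook code; it assumes any PyTorch detection model and a CUDA GPU like the P100):

  # Minimal FPS-measurement sketch (illustrative, not our exact notebook code).
  # Assumes `model` is any PyTorch detection module and a CUDA GPU is available.
  import time
  import torch

  def measure_fps(model, img_size=640, n_iters=100, warmup=10):
      model = model.cuda().eval()
      dummy = torch.zeros(1, 3, img_size, img_size, device="cuda")
      with torch.no_grad():
          for _ in range(warmup):        # warm-up lets cuDNN pick its kernels
              model(dummy)
          torch.cuda.synchronize()       # CUDA is async; sync before timing
          start = time.time()
          for _ in range(n_iters):
              model(dummy)
          torch.cuda.synchronize()
      return n_iters / (time.time() - start)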
> SIZE: YOLOv5 is about 88% smaller than YOLOv4 (27 MB vs 244 MB)
Is that a benefit of Darknet vs PyTorch, YOLOv4 vs YOLOv5, or did you win the NN lottery?
> SPEED: YOLOv5 is about 180% faster than YOLOv4 (140 FPS vs 50 FPS)
Again, where does this improvement come from?
> ACCURACY: YOLOv5 is roughly as accurate as YOLOv4 on the same task (0.895 mAP vs 0.892 mAP)
A difference of 0.1% accuracy can be huge; for example, the difference between 99.9% and 100% could require an insanely larger neural network. Even well below 99% accuracy, it seems clear to me that network size can still limit accuracy.
For example, if you really don't care so much about accuracy, you can squeeze the network down a lot.
The YOLOv5 repo itself shows performance comparable to YOLOv3: https://github.com/ultralytics/yolov5#pretrained-checkpoints
Another comparison suggests YOLOv5 is slightly WORSE than YOLOv4: https://github.com/WongKinYiu/CrossStagePartialNetworks/issu...
The article still adds value by suggesting how one would run the network and in general the site seems to be about collating different networks.
Perhaps a disclaimer could be good, reading something like: "the speed improvements mentioned in this article are currently being tested". As a publisher, when you print somebody else's words, unless quoted, they are said with your authority. The claims are very big and it doesn't feel like enough testing has been done yet to even verify that they hold true.
Nice, I really respect research coming out of NIH. (Happen to know Travis Hoppe?) Coincidentally, our notebook demo for YOLOv5 is on the blood cell count and detection dataset: https://public.roboflow.ai/object-detection/bccd
We've seen 1000+ different use cases. Some of the most popular are in agriculture (weeds vs crops), industrials / production (quality assurance), and OCR.
Send me an email? joseph at roboflow.ai
Unfortunately I am now unable to edit to reflect this better.
For history, Ultralytics originally forked the core code from some other PyTorch implementation which was inference-only. Their claim to fame is that they were the first to get training to work in PyTorch. This took a while, probably because there is actually very little documentation for YOLOv3 and there was confusion over what the loss function actually ought to be. The darknet repo is totally uncommented C with lots of single-letter variable names. AlexeyAB is a saint.
That said, should it be a totally new name? The changes are indeed relatively minor in terms of architecture; it's still YOLO underneath (in fact, I think the classification/regression head is pretty much unchanged). The v4 release was also quite contentious. Actually, their previous models used to be called yolov3-spp-ultralytics.
Probably I would have gone with efficient-yolo or something similar. That's no worse than fast/faster rcnn.
I disagree on your second point though. Demanding a paper when the author says "we will later" is hardly a blow off. Publishing and writing takes time. The code is open source, the implementation is there. How many times does it happen the other way around? And before we knock Glenn for this, as far as I know, he's running a business, not a research group.
Disclosure: I've contributed (in minor ways) to both this repository and Alexey's darknet fork. I use both regularly for work and I would say I'm familiar enough with both codebases. I mostly ignore the benchmarks because performance on coco is meaningless for performance on custom data. I'm not affiliated with either group, in case it's not clear.
> you'll see that AlexeyAB's fork basically scooped them,
> hence the version bump.
Yeah that sucks, but it does mean they should have done some proper comparison with YOLOv4.
> This took a while, probably because there is actually very
> little documentation for Yolov3 and there was confusion
> over what the loss function actually ought to be. The
> darknet repo is totally uncommented C with lots of single
> letter variable names. AlexeyAB is a Saint.
Maybe I'm alone, but I found it quite readable. You can quite reasonably understand the source in a day.
> The v4 release was also quite contentious.
Kind of, I am personally still evaluating this network fully.
> I disagree on your second point though. Demanding a paper
> when the author says "we will later" is hardly a blow off.
Check out the translation of "you can you up,no can no bb" (see other comments).
> And before we knock Glenn for this, as far as I know, he's
> running a business, not a research group.
I understand, but it seems very unethical to take the name of an open-source framework and network that publishes its improvements in some form, bump the version number, and then claim it's faster without actually doing an apples-to-apples test. It would have seemed appropriate to contact the person who carried the torch after pjreddie stepped down from the project.
But... it was still very much undocumented (and there were details missing from the paper). I think this almost certainly led to some slowdown in porting to other frameworks. And the fact it's written in C has probably limited how much people are willing to contribute to the project.
> Checkout the translation of "you can you up,no can no bb" (see other comments).
That's from an 11 day old github account with no history, not Ultralytics as far as I know.
> Kind of, I am personally still evaluating this network fully.
Contention referring to the community response rather than the performance of the model itself.
Who actually is "WDNMD0-0"? Looks like the account was created to make just that one comment.
Despite that, there was still a lot of controversy over the decision to call it v4.
See that thread for the discussion on v5 and you can make your own judgement.
Although YOLOv4 isn't anything new architecture-wise, it tried all the tricks in the book on the existing YOLO architecture to improve its speed and performance, and its method and experimental results were published as a paper; it provided value to humanity.
YOLOv5 seems to have taken the YOLO name only to increase the startup's name value, without giving much back (it did provide a YOLOv3 PyTorch implementation, but that was before taking the YOLOv5 name). I wonder what pjreddie would think of YOLOv5.
I don't see any response by them at all. Do you mean the comment by WDNMD0-0? I can't see any reason to believe they're connected to the company, have I missed something?
It's hard to interpret benchmarks in a fair way, but it's sort of sounding like YOLOv4 might be superior to YOLOv5, at least for certain resolutions.
Does YOLOv5 outperform YOLOv4 at all? Faster inference time or higher accuracy?
Learned a new phrase today.
This is literal trash-talking slang in Chinese, because this field is full of young, bloated researchers who forget their last name
I've not heard that one before either. Is it a reference to the Dark Tower? ("[he] has forgotten the face of his father") or did Stephen King borrow it from somewhere else?
Edit: obviously I should have googled Dark Tower first lol.
Can you write it in Chinese?
"If you can do it, then you go and do it. If you can’t do it, then don’t criticise others."
Edit: Although as yeldarb explains in a comment here, it's probably a bit more complicated than that.
> it's probably a bit more complicated than that.
Legally speaking I'm not sure anything wrong was really done here.
Morally speaking, it seems quite unethical. AlexeyAB has really been carrying the torch of the Darknet framework and the YOLO neural network for quite some time (with pjreddie effectively handing it over to him).
AlexeyAB has been providing support on pjreddie's abandoned repository and actively working on improvements in a fork. If you look at the contributor graphs, he really has been keeping the project alive (vs. Darknet by pjreddie).
Probably the worst part, in my opinion, is that they have also seemingly bypassed the open source nature of the project. This is quite damning.
I guess, given the info I have now, it boils down to whether there's precedent for the next version of the name being taken by whoever is doing the work on it. If the original author never endorsed AlexeyAB (I don't know one way or another), then perhaps AlexeyAB should have changed the name but referenced or paid homage to YOLO in some way?
Eh, this is all starting to feel a bit too close to youtube drama for my liking.
What do you mean? I thought the DL hypetrain was dying as companies failed to make returns on their investments.
And the tags ended up being hilarious: https://pbs.twimg.com/media/EYXRzDAUwAMjXIG?format=jpg&name=...
(I'm particularly fond of https://i.imgur.com/ZMz2yUc.png)
The data is freely available via API: https://www.tagpls.com/tags/imagenet2012validation.json
It exports the data in yolo format (e.g. it has coordinates in yolo's [0..1] range), so it's straightforward to spit it out to disk and start a yolo training run on it.
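As a hedged sketch of what that could look like (the field names "label", "x", "y", "w", "h" are my assumptions about the JSON schema; adjust to whatever tags.json actually contains):

  import json
  from pathlib import Path

  # {image_id: [box, ...]} with normalized coords -- schema assumed, not verified
  tags = json.load(open("tags.json"))
  classes = sorted({b["label"] for boxes in tags.values() for b in boxes})

  out = Path("labels")
  out.mkdir(exist_ok=True)
  for image_id, boxes in tags.items():
      # YOLO label files: one "class x_center y_center width height" line per box
      lines = [f'{classes.index(b["label"])} {b["x"]} {b["y"]} {b["w"]} {b["h"]}'
               for b in boxes]
      (out / f"{image_id}.txt").write_text("\n".join(lines))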
Gwern recently used tagpls to train an anime hand detector model: https://www.reddit.com/r/AnimeResearch/comments/gmcdkw/help_...
People seem willing to tag things for free, mostly for the novelty of it.
The NSFW tags ended up being shockingly high quality, especially in certain niches: https://twitter.com/theshawwn/status/1270624312769130498
I don't think we could've paid human labelers to create tags that thorough or accurate.
All the tags for all experiments can be grabbed via https://www.tagpls.com/tags.json, so over time we hope the site will become more and more valuable to the ML community.
tagpls went from 50 users to 2,096 in the past three weeks. The database size also went from 200KB a few weeks ago to 1MB a week ago and 2MB today. I don't know why it's becoming popular, but it seems to be.
# fetch raw tag data
$ curl -fsSL https://experiments-573d7.firebaseio.com/results/.json > tags.json
$ du -hs tags.json
# fetch tag metadata (colors, remapping label names, possibly other stuff in the future)
$ curl -fsSL https://experiments-573d7.firebaseio.com/user_meta/.json > tags_meta.json
$ du -hs tags_meta.json
$ jq . tags_meta.json
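# fetch site metadata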
$ curl -fsSL https://experiments-573d7.firebaseio.com/meta/.json | jq .
(I think that's due to a poor architectural decision on my part, which is solvable, and not due to egress bandwidth via the API endpoint. But it's always fun to see a J curve in your bill... It's about $1 a day right now. https://imgur.com/4gUTLO7)
If anyone wants to contribute, I started a patreon a few minutes ago: https://www.patreon.com/shawwn
I guess "be gentle" means "please troll us."
Surprised and happy to hear you're seeing high labeling quality.
We'll re-host with credit on https://public.roboflow.ai. What license is this?
We don't host any images directly – we merely serve a list of URLs (e.g. https://battle.shawwn.com/tfdne.txt). But any data served via the API endpoints is CC-0.
Those are also drawings/anime, not photos. We have an /r/pics experiment (SFW, 99 tags https://www.tagpls.com/exp?n=r-pics) and /r/gonewild (NSFW, 57 tags https://www.tagpls.com/exp?n=r-gonewild), but currently I haven't gathered enough URLs for them to be very useful -- it only scrapes about 100 or so images every half hour. So there is a lack of tags right now on human photos. We also have a pps experiment (NSFW, exactly what you think it is, 306 tags https://www.tagpls.com/exp?n=pps) but I assume that's not quite what you were looking for.
If you have an idea for a dataset, you can create a list of image URLs like https://battle.shawwn.com/r/pics.txt and we can add them to the site. You can request an addition by joining our ML discord (https://discordapp.com/invite/x52Xz3y) and posting in the #tagging channel.
Also, if anyone's curious, here's how I'm measuring the tag count:
$ curl -fsSL https://experiments-573d7.firebaseio.com/results/danbooru2019-e/.json | jq '.' | grep points | wc -l
$ curl -fsSL https://experiments-573d7.firebaseio.com/results/e621-portraits/.json | jq '.' | grep points | wc -l
$ curl -fsSL https://experiments-573d7.firebaseio.com/results/r-gonewild/.json | jq '.' | grep points | wc -l
$ curl -fsSL https://experiments-573d7.firebaseio.com/results/r-pics/.json | jq '.' | grep points | wc -l
$ curl -fsSL https://experiments-573d7.firebaseio.com/results/pps/.json | jq '.' | grep points | wc -l
On a serious note, it's kind of interesting to think about the authenticity/accuracy if it's just filled in, e.g. turning black-and-white pictures back into color: was it actually green or blue?
There is a market somewhere
It looks like an HN user on an EC2 server decided to fetch data from our firebase as quickly as possible, running up a $3,700 bill. Once (or if) that's sorted out, and once we verify tagpls can handle HN's load without charging thousands of dollars, we'll add an "about" page to tagpls and submit it.
Seems like the camera motion is probably already solved with optical flow/photogrammetry stuff, but you might be able to use that to help scale the scene and start filtering your tagging based on geometric likelihood.
The idea of hierarchical reference frames (outlined a bit by Jeff Hawkins here https://www.youtube.com/watch?v=-EVqrDlAqYo&t=3025) seems pretty compelling to me for contextualizing scenes to gain comprehension. Particularly if you build a graph from those reference frames and situate models tuned to the type of object at the root of each frame (vertex). You could use that to help each model learn, too. So if a bike model projects a 'riding' edge towards the 'person' model, there wouldn't likely be much learning, e.g. [Person]-(rides)->[Bike] would likely have been encountered already.
However, if the [Bike] projects the (rides) edge towards the [Capuchin] sitting in the seat, the [Capuchin] model might learn that capuchins can (ride), and furthermore that they can (ride) a [Bike].
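A toy sketch of that gating idea (all names illustrative; this isn't from any published implementation): an edge triggers learning only when it hasn't been seen before.

  from collections import defaultdict

  class SceneGraph:
      def __init__(self):
          # subject -> set of (relation, object) pairs seen so far
          self.edges = defaultdict(set)

      def observe(self, subject, relation, obj):
          novel = (relation, obj) not in self.edges[subject]
          self.edges[subject].add((relation, obj))
          return novel  # True => surprising edge, worth learning from

  g = SceneGraph()
  g.observe("person", "rides", "bike")             # seed: now a familiar pairing
  print(g.observe("person", "rides", "bike"))      # False: nothing new to learn
  print(g.observe("capuchin", "rides", "bike"))    # True: novel, triggers learning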
Eyes are very hard to make and coordinate, yet there are almost no cyclops in nature.
Most flagships can do this though; any multi-camera phone can get some kind of stereo. Google do it with the PDAF pixels for smart bokeh (they have some nice blog posts about it). I don't know if there is a way to do that in an API though (or to obtain the depth map).
Are you folks able to do any multi-spectral stuff? That seems interesting.
I've also done some work on satellite imaging, which is 13-band (Sentinel 2). Lots of people in ecology use the Parrot Sequoia, which is four-band multispectral. There really isn't much published work in ML beyond RGB, which I find interesting - yes, there's RGB-D and LIDAR, but it's mostly for driving applications. Part of the reason I'm so familiar with the yolo codebases is that I've had to modify them a lot to work with non-standard data. There's nothing that stops you from using n-channel images, but you will almost certainly have to hack every off-the-shelf solution to make it work. RGB and 8-bit are almost always hard-coded, and augmentation also often fails with non-RGB data (albumentations is good though). A bigger issue is that there's a massive lack of good labelled datasets for non-RGB imagery.
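The usual hack is swapping out the hard-coded 3-channel first convolution. A minimal PyTorch sketch (model.conv1 is a placeholder attribute; in the YOLO codebases the input layer is buried in config-driven model-building code):

  import torch.nn as nn

  def patch_first_conv(model, in_channels):
      old = model.conv1  # placeholder: locate the real first conv in your model
      model.conv1 = nn.Conv2d(in_channels, old.out_channels,
                              kernel_size=old.kernel_size, stride=old.stride,
                              padding=old.padding, bias=old.bias is not None)
      return model

The new layer starts with random weights; a common trick is to copy the pretrained RGB filters into three of the channels before fine-tuning.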
On the plus side, in a landscape where everyone is fighting over COCO, there is still a lot of low hanging fruit to pick I think.
I've not done any hyperspectral: (a) it's very hard to get labelled data (there's AVIRIS and EO-1/Hyperion, maybe), (b) it's very hard to label, since the images are enormous, and (c) the cameras are stupid expensive.
By the way, even satellite imaging ML applications tend to overwhelmingly use just the RGB channels and not the full extent of the data.
* This is the first YOLO implemented in PyTorch. PyTorch is the fastest ML framework around, so some of YOLOv5's speed improvements may be attributed to the platform it was implemented on rather than actual scientific advances. Previous YOLOs were implemented using Darknet, and EfficientDet is implemented in TensorFlow. It would be necessary to train them all on the same platform for a fair speed comparison.
* EfficientDet was trained on the 90-class COCO challenge (1), while YOLOv5 was trained on 80 classes (2).
re: PyTorch being a confounding factor for speed - we converted YOLOv4 to PyTorch to achieve 50 FPS. Darknet would likely top out around 10 FPS on the same hardware.
EDIT: Alexey, author of YOLOv4, provided benchmarks of YOLOv4 hitting much higher FPS here: https://github.com/AlexeyAB/darknet/issues/5920#issuecomment...
In our initial look, YOLOv5 is 180% faster, 88% smaller, similarly accurate, and easier to use (native to PyTorch rather than Darknet) than YOLOv4.
YOLOv4 -> YOLOv5
Inference time: 20ms -> 7ms (on P100)
Frames per second: 50 -> 140
Size: 244 MB -> 27 MB
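(Presumably the FPS figures are just 1000 divided by the latency in milliseconds: 1000/20 = 50, and 1000/7 ≈ 143, which rounds to the 140 quoted.)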
OT: the stats above should be part of the PyTorch marketing material, indeed impressive
This is not a verb.
He actually stopped working on it because of ethical concerns. I'm inspired that he made this principled choice despite being quite successful in this field.
The author of YOLOv3 quit working on Computer Vision due to ethical concerns. YOLOv4, which built on his work in v3, was released by different authors last month. I'd expect more YOLOvX's from different authors in the future. https://twitter.com/pjreddie/status/1230524770350817280
The intriguing part is that he has also done research in particle physics (as Ultralytics) that has been published in Nature.
I had never seen anything like that.
It would be fair to state also why he chose to discontinue developing YOLO, as it is relevant.
1. How to train YOLOv5: https://blog.roboflow.ai/how-to-train-yolov5-on-a-custom-dat...
2. Comparing various YOLO versions: https://yolov5.com/
The problem is that your conclusions aren't independent of this choice. A different network might be far better in terms of accuracy/speed trade-offs when evaluated at a lower precision. But there is no reason to use 32-bit precision for inference, so this is just a big mistake.
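For what it's worth, dropping to half precision at inference time is nearly a one-liner in PyTorch (a sketch, assuming a CUDA GPU with usable FP16; model and img are placeholders for a loaded network and preprocessed input):

  import torch

  def infer_fp16(model, img):
      model = model.cuda().eval().half()  # cast weights to FP16
      img = img.cuda().half()             # input dtype must match the weights
      with torch.no_grad():
          return model(img)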
(might need to wait a couple of months since this was just released)
Not that the difference matters that much.
Edit: I should add that most of the actual progress is being made by smart people who think it's an interesting problem and are unaware of, or don't care about, the clear outcome of such tech.
There are obviously privacy concerns with this example, it’d ideally be fully on-prem.
That's not really for you to decide, is it? You're absolutely free to have that opinion of course.
>We should not respect those who lend it their time and effort.
Also your choice of course. Facial recognition is essentially a light integration of powerful underlying technologies. Should 'we' ostracize those working on machine learning, computer vision, network and distributed computing, etc?
I'm much more worried about people using your arguments to try to shut down the discussion than about people trying to open the debate, because once the Pandora's box of mass surveillance and widespread face recognition is open, there won't be any way to go back.
When I see Predator drones and FBI Stingray planes above every major US city during protests, I already know we're not going in the "let's talk about this before reaching the point of no return" direction.
Can you provide evidence for this? Not that I doubt it, but if I want to tell other people that story, I must have evidence to be believed :-)
edit: ah, of course, 30 seconds of DuckDuckGo provided the info I needed: https://thehill.com/homenews/house/501445-democrats-press-dh...
Once the tech is out there, it's simply a question of when it will be used for borderline illegal activities, especially in the US where you have these different entities (FBI, CIA, NSA, DEA, etc.) basically acting in their own bubbles and doing whatever they want until it's leaked and/or gets outrageous enough to attract public attention.
I mean, there were unidentified armed forces marching in US streets last week; if people don't see this as the biggest red flag in recent US history, I don't know what they need.
I can't think of other uses and I'd be interested if you can come up with some.
1) assistance in recognizing people (because of low vision, because memory fails, because you have a lot of photos...)
2) ensuring candidate X is actually candidate X and not a person paid to take the exam in candidate X's name
3) door access control (to replace/in addition to access card)
4) having your own X-Ray (like in Amazon Prime): identify an actor/actress/model
5) having your personal robot addressing you by name
Can you imagine the bureaucratic nightmare that would be unleashed upon yourself if "the system" decides you aren't who you say you are because of the way you aged, an injury, surgery or a few new freckles?
This already happens sometimes with birth certificates and identity theft, and it's awful for those who have to experience it. I'd hate to have a black box AI inflicting that upon others for inexplicable reasons.
Compared to using an API in the cloud or purchasing Hikvision cameras.
The current service we use, while accurate, costs 50 cents per verification...
Edit: reading through this thread, if the model isn't super massive, we could offer on-browser verification! 27MB is still a hefty download though.
Arguably this is a front for mass surveillance, or can easily be misused for that, but the ostensible purpose is separate and benign.
Couldn't you argue the same way against just about any kind of IED or booby trap? Yet people tend to ostracize those who make them more than they do people who make ball bearings and nails.