YOLOv4: Optimal Speed and Accuracy of Object Detection (arxiv.org)
234 points by groar 32 days ago | 54 comments

The original author of YOLO stopped working on it[1]. Alexey Bochkovskiy, aka AlexeyAB, created a fork on GitHub, wrote an extensive guide to customizing YOLO's network architecture, added new features, and has answered zillions of questions.

1: https://twitter.com/pjreddie/status/1230524770350817280

The linked tweet:

> I stopped doing CV research because I saw the impact my work was having. I loved the work but the military applications and privacy concerns eventually became impossible to ignore.

The YOLOv3 paper is a blast to read


And its conclusion is gold:

But maybe a better question is: “What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to.... wait, you’re saying that’s exactly what it will be used for?? Oh. Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait..... [1] I have a lot of hope that most of the people using computer vision are just doing happy, good stuff with it, like counting the number of zebras in a national park, or tracking their cat as it wanders around their house. But computer vision is already being put to questionable use and as researchers we have a responsibility to at least consider the harm our work might be doing and think of ways to mitigate it. We owe the world that much. In closing, do not @ me. (Because I finally quit Twitter).

[1] The author is funded by the Office of Naval Research and Google.

Very few grad students get the chance to present their first result at TED and collect 19k citations. Consequently, they usually can't get away with criticizing their sponsors' and peers' ethics after the fact, regardless of what they think of them.

Yup. He is an inspiration.

One of his other comments on Reddit, related to stopping CV research:

"pjreddie · 30 points · 2 months ago · edited 2 months ago

I've never been in a car that was using my tech to avoid killing people but I have had a 3 star general rave about how my work was being deployed in war zones and how army research groups love my software.

Edit: to say that i'm not arguing against CV research as a whole, i'm just saying i don't want to do it anymore because of the impact i saw my work having"

I've seen people try to use YOLO for homebrewing a self-driving car.

There's no need to go as far as assuming evil intent on the part of the software's users; plain recklessness can easily cause people to die.

So the author goes, "wait, this actually causes more evil than good, I will not work on it any longer", and the other guy goes "don't worry, I will keep doing the evil for you!"


It's a tricky thing... where do you draw the line? If someone works on the Linux kernel, and someone else uses the OS to do bad things... should one stop?

Also consider that object detection has a lot of impact in positive ways as well. It doesn't seem so black and white.

I'm going to jump in here as someone who is using object detection, and currently working on getting a Darknet detector in particular up and running, in what I'd like to think is a positive way.

I'm a researcher working on a system for monitoring offshore kelp farms for renewable aquaculture with an autonomous underwater vehicle. I can't use GPS underwater, Doppler velocity logs and sensitive inertial navigation systems are either too expensive and/or export-controlled, and doing manual filtering of visual data is tricky and inconsistent. Adding good object detection for kelp helps make that kind of system much more reliable, which can provide useful metrics to groups working on creating new biofuel sources and on marine ecosystem monitoring. I think that's valuable work, and it's enabled by YOLO.
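As a side note on what a pipeline like this involves: a YOLO-style detector typically emits many overlapping candidate boxes per object, so a standard post-processing step is greedy non-maximum suppression (NMS). A minimal sketch in plain Python (the boxes, scores, and threshold below are illustrative, not from any real kelp dataset):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop any
    remaining box that overlaps it by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two near-duplicate detections of one object, plus a separate one.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```

In practice frameworks like Darknet do this internally, but the same filtering logic is what turns raw per-cell predictions into a usable count of fronds.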

Joe Redmon has paid attention to where and how his work is being used, and I respect him for disengaging with something that he finds not to line up with his values. But it's worth pointing out that there are people (not just me) who are using that work in ways that might be worthwhile.

I'm of the opinion that kelp farms could be a huge benefit to humanity if we could scale them up enough.

Your work seems worthy of an AMA if you ever have the time.

I appreciate that, I think they have a lot of promise as well! I do think things are probably a bit early to be trying to do an AMA (I feel that they're often as much Q&A as they are a pitch about the topic itself), and the tech and the farm structure are both up in the air at the moment.

For example, the robotics end is currently a human-piloted ROV, with data obtained in post-processing on dozens of fronds, rather than an AUV doing real-time inference (hopefully to be solved by YOLO) on thousands. However, I recently heard about some tentative plans for a more in-depth pilot-scale effort in Spring/Summer 2021, by which time we'll hopefully have some more interesting results to talk about!

Is there anywhere I can read more about your project? Seems tangentially related to what I am using it for, actually!

We're still working on the first draft of our results for publication on the subsurface imaging side of things, so there's unfortunately not a paper I can point you to yet, but an overview of the project at a high-level can be seen here [1]. If you're really interested, feel free to reach out to me over email, and I'd be happy to discuss things and learn about what you're doing as well!

[1] https://arpa-e.energy.gov/?q=slick-sheet-project/scalable-aq...

You wouldn't happen to have been part of the group that came over to Santa Barbara a while back, would you? I worked with Erin and Sean to get some of the sidescan data on the test lines out here with a REMUS 100 one afternoon.

I agree - but I imagine even personally it might be a hard thing to hear that something you created is being used to kill people, even with the knowledge that it is doing a lot of good.

I'm curious if this is something that some sort of modified license could help resolve. Do I have the option to license my software so that it can't be used in war?

> I'm curious if this is something that some sort of modified license could help resolve. Do I have the option to license my software so that it can't be used in war?

"The Software shall be used for Good, not Evil" - https://en.wikipedia.org/wiki/Douglas_Crockford#%22Good,_not...

Wouldn't object detection save people over the current method of dropping a huge bomb at a wedding?

Yes, but you need to be prepared to take it to court when someone uses your software in war.

and when the answer is “hey, it’s GPL’d, here’s our source download link, fuck off we’re exactly following the rules”?

Licenses are not a viable solution to this sort of ethical quandary.

That answer should be unsatisfactory to the court, since it wasn't licensed under the GPL. The rules, in this case, say no use in war.

The fact that it's a judgment call doesn't absolve you of responsibility to use your judgment.

Do you commonly defer your moral judgements to a third party?

Or do you sigh only because in this instance you agree with the original author, but you don't actually think it should be a rule that others are constrained by the moral thoughts of their predecessors?

I know of a promising treatment for types of hypothyroidism, but the original discoverer doesn't want to continue work on it because she doesn't agree with animal testing.

If my moral calculus says that the quality of life of millions of humans is more important than the quality of lives of thousands of rats and dogs, am I not allowed to pick up where she left off?

Technology by itself is never "evil". There is no such thing as improving welfare by prolonging our ignorance of the natural world.

There is no such thing as "technology by itself".

Who was that? I can't find the tweet.

The only thing stopping the robot apocalypse is the power problem. We have every technology necessary to enable mass casualties from commodity robots except long battery life. We don't need AI. We need decent enough pattern recognition that can pinpoint a weakness in the human body.

I often use my 'Stabby the Robot' diatribe to convey this to people. Imagine a robot that moves too fast to evade, that doesn't rely on expendable, limited ammunition (i.e. a blade or sharp surface), and can locate the jugular vein of a human. You could make one now out of a commodity drone, but it wouldn't last very long because of the power problem. AI would obviously make it more dangerous, but it isn't necessary.

We've already placed very dangerous tools into the hands of humans. Maybe it helps Joe Redmon sleep at night to stop working on YOLO, but his idle hands are not holding back the future.

> I often use my 'Stabby the Robot' diatribe to convey this to people. Imagine a robot that moves too fast to evade, that doesn't rely on expendable, limited ammunition (i.e. a blade or sharp surface), and can locate the jugular vein of a human.

Oh hey I remember that write-up on qntm[1]!

[1]: https://qntm.org/robots

Never saw that but thanks for the link! It's a much better wording than I'd ever have come up with.

While your robots do sound like they could kill a lot of people, I feel like this is a bit overblown. For a start, for "apocalypse", they would have to be produced in massive numbers, i.e. by a nation state-level actor. State-level actors can already make nukes or bioweapons.

What's so hard about manufacturing a few million drones? With a relatively modest amount of money (relative to, say, developing a nuclear program or chemical weapons program), you could hire a contract manufacturer to spin out a few million drones. They probably wouldn't even ask what the blade/needle was for, although it would be easy to craft a cover story (e.g. "They're for mass inoculation/slaughter of livestock"). Update the SW and congrats, you now have a weapon of mass destruction. You can target people by face, race, location, etc.

There's a reason the US military has been playing with drone clouds for years.

It's not hard for a nation to manufacture or order millions of drones. It's also not hard to obtain enough rounds of ammunition to shoot millions of people. It was decades if not a century ago when the primary obstacle to a nation killing millions of people ceased to be "inherent technical difficulty."

It's not hard to shoot millions of people, but it is hard to raise the army required to do so. It is hard to do so without raising suspicions. It is hard to do so without damaging infrastructure.

A rogue state, or even a non-state actor, can leverage cheap drones (assuming, of course, a solution to the power problem) to obtain a weapon of mass destruction. Drones are fundamentally different from bullets.

My roboticist opinion is that you need extremely developed AI to do what you’re saying. It requires navigational capabilities that don’t exist yet except in very open spaces like fields. Also, lightweight drones which could do these fast maneuvers aren’t very robust; you can literally slap them down with some pain. I get that even with a low success rate it could be a disaster if you have a huge number of them deployed somewhere.

I don't think we've fully solved the navigation problem, but I think we're close and it will certainly be solved before the power problem. A quick google got me this:


and I suspect there are dozens of other similar projects out there.

This work is amazing and I’m a big fan of Tedrake; however, the gap between what you’re showing here and a drone that chases humans and slices their jugular might be bigger than you think. It’s very hard to extrapolate stuff like that.

But don’t we need people doing the research, to solve the problems that need to be solved, but in the right way? Take Covid contact tracing as an example. If Apple and Google hadn’t stepped in and preempted with their own tech, we might have had a problem with privacy-eroding tech being forced through.

Chris: So, I talked to him.

Mitch: You did?

Chris: Yeah, and he used to be the number one stud around here in the 70’s. (whispers) Smarter than you and me put together.

Mitch: So what happened? Did he crack?

Chris: Yes, Mitch. He cracked, severely.

Mitch: Why?

Chris: He loved his work.

Mitch: Well, what’s wrong with that?

Chris: There’s nothing wrong with that, but that’s all he did. He loved solving problems, he loved coming up with the answers. But, he thought that the answers were the answer for everything. Wrong. All Science no Philosophy. So then one day someone tells him that the stuff he’s making was killing people.

Mitch: So what’s your point? Are you saying I’m going to end up in a steam tunnel?

Chris: Yeah.

Mitch: What?

What is this from?

It's also in Ultralytics' repo, which is very frequently updated.


I object to the use of the word "optimal" for a task like object detection; it feels counterproductive to claim that this is the "optimal" way of solving such a broad and complex problem. Great results, but their language needs some tempering.

This is a strictly technical sense of the word. When a detector lies on the Pareto frontier of the accuracy/speed trade-off, it is optimal in terms of accuracy and speed: no other detector is both faster and more accurate. "Pareto optimality" is a formally defined concept used to describe when an allocation is optimal.
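For illustration, the Pareto frontier of a set of detectors can be computed in a few lines. The (fps, AP) pairs below are made up for the example, not taken from the paper:

```python
def pareto_front(points):
    """Return the points not dominated by any other point, where
    q dominates p if q is >= p in both dimensions and q != p
    (which implies strictly better in at least one dimension)."""
    def dominated(p, q):
        return q[0] >= p[0] and q[1] >= p[1] and q != p
    return [p for p in points if not any(dominated(p, q) for q in points)]

# Hypothetical detectors as (frames per second, accuracy/AP) pairs.
detectors = [(65, 43.5), (30, 45.0), (65, 33.0), (20, 41.0)]
print(pareto_front(detectors))  # -> [(65, 43.5), (30, 45.0)]
```

The last two detectors are dominated (something is at least as fast and more accurate), so only the first two are "optimal" in this sense; the claim in the title is that YOLOv4 sits on that frontier, not that it is best in every respect.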

EVERY ‘good’ thing starts out with ‘good’ intentions. And while it is small scale remains good. But as it scales up, it becomes evil. Even google, remember their motto? Now harvesting data like human bodies for its matrix. Even Facebook was a blast when it started, but now it’s a merch store. Even the internet was beautiful when it started, now it’s a sewer. It seems the only thing that can scale well without getting perverted is deep learning. But as long as humans are in that loop, it will fail. Cloud computing is a mistake. Bring back the pc, you know, ‘personal computer’

These companies were always doing those things. Google always collected and connected information on you. Facebook was always using users content/data to experiment with. Cloud computing was always a mistake for most (pay more, get less control, get locked in). Deep learning opens up so many taboos our society is not ready to deal with them. As it scales it will open up more cans of worms.

A video on YOLOv4 - Really informative. https://www.youtube.com/watch?v=_JzOFWx1vZg

I have had a lot of fun working with YOLO v3 for robotics applications, very excited to try these updates. Thanks to the authors for the updates and good documentation. Good object recognition is the backbone of a huge range of future applications, and YOLO has been a good option for a while.

I'm a little skeptical of the Swish implementation after looking at Table 2.

Method | Top 1 | Top 5
No-op  | 78%   | 94%
Swish  | 64.5% | 86%
Mish   | 79%   | 94.5%

Swish is the only method that decreases performance (and by a huge margin), yet a closely related method improves performance. Hmmm...
