
Yoloface-500k: ultra-light real-time face detection model, 500kb - qiuqiu-dog
https://github.com/dog-qiuqiu/MobileNetv2-YOLOV3#500kb%E7%9A%84yolo-face-detection
======
bawana
We are all essentially evil because 1) tomorrow will find some behavior that
is accepted today as bad and we are all doing it 2) creating an AI that can
manipulate people is too tempting for the predator in us to avoid (this is a
REAL turing test - making an AI that can make tools out of humans and
everything else.)

The only way to stay ahead of the 'evil' corrupting influence of new tech is
to prevent it from widespread use controlled by a single entity. So, yolo is
ok as long as you cannot deploy it in a cloud at scale.

So, just as nuclear weapons (the massive concentration of energy release at
tremendously fast rates) are bad, so is a super AI/AGI (the massive
computational ability at nanosecond scale).

No evil was ever perpetrated by institutions of learning - only business
entities and governments who scaled up those discoveries caused evil.

And now for the flame bait: by this argument we should elect Luddites to
govern us, especially ones that are not imaginative or creative.

------
haditab
I believe this is exactly why pjreddie quit computer vision research. It must
kill him to see such projects based off of his work.

~~~
woah
This detects _any face_; it does not identify people. It’s for stuff like
autofocus.

~~~
voqv
Detecting where a face is in a picture is the first step that's necessary
before detecting whose face it is.

~~~
bufferoverflow
And purchasing a knife is the step before stabbing someone. Your argument is
ridiculous.

~~~
hdkrgr
How is it ridiculous? The fact that purchasing a gun is the first step of
shooting people is a good enough reason for most countries to ban the purchase
of guns...

~~~
marcinzm
Yet they don't ban kitchen knives because there's legitimate uses for kitchen
knives. Thus the point that you utterly missed.

~~~
tuesdayrain
I don't think "because there's legitimate uses" is the differentiating factor,
since that implies only kitchen knives have them. Self-defense is a legitimate
use for owning a gun, for example.

~~~
marcinzm
In many countries self-defense is NOT a legitimate use for owning a gun.

~~~
kevin_thibedeau
Many of these countries have exterminated their large predatory animals.

------
Tempest1981
I'm looking for something to run on a Raspberry Pi, to detect humans on a
security camera. The built-in camera software has false triggering, esp. on
windy days.

When looking at these projects, how do I figure out what hardware they're
aimed at? This one mentions NVidia/CUDA.

Is there any sort of hardware abstraction layer that YOLO or R-CNNs can
operate on? Can I use any of this code (or models) for my R-Pi?

~~~
earleybird
Around 2004 I built a Pentium-based 'motion' recorder. It kept a circular
buffer of images that were spooled to the output stream when motion was
detected. Motion was determined by optical flow iirc - the OpenCV call
returned an array of blob center points, sizes, and velocity vectors. If a
blob was large enough and its velocity vector made sense (e.g., horizontal, as
in walking or driving, at an appropriate magnitude), it was considered motion.
That reduced leaf-flutter and branch-waving false positives to effectively
zero. No ML required.

ML is too liberally applied without understanding how or why it has
triggered. I've lost track of the original quote but the spirit of it is:
"It's artificial intelligence while we don't understand it. Once we understand
it, it's computer science"
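
The size-and-velocity filter described above can be sketched in a few lines.
This is a minimal illustration, not the original 2004 code: the `Blob` fields,
the function names, and every threshold are assumptions chosen for the
example, and the blobs are taken as already extracted by whatever optical-flow
step is in use.

```python
import math
from dataclasses import dataclass


@dataclass
class Blob:
    """One moving region: center, size, and velocity (hypothetical layout)."""
    cx: float
    cy: float
    area: float  # pixels
    vx: float    # pixels per frame, horizontal
    vy: float    # pixels per frame, vertical


def is_plausible_motion(blob, min_area=400.0, min_speed=1.0,
                        max_speed=40.0, max_angle_deg=30.0):
    """Return True if the blob is large enough and moves roughly horizontally.

    All thresholds are illustrative assumptions, not tuned values.
    """
    if blob.area < min_area:
        return False  # too small: likely a leaf or sensor noise
    speed = math.hypot(blob.vx, blob.vy)
    if not (min_speed <= speed <= max_speed):
        return False  # too slow (flutter) or implausibly fast (glitch)
    # Angle between the velocity and the horizontal axis, in [0, 90] degrees.
    angle = math.degrees(math.atan2(abs(blob.vy), abs(blob.vx)))
    return angle <= max_angle_deg


def frame_has_motion(blobs):
    """A frame counts as motion if any blob passes the filter."""
    return any(is_plausible_motion(b) for b in blobs)
```

With these assumed thresholds, a small fluttering blob or a large blob moving
mostly vertically (a waving branch) is rejected, while a large blob drifting
sideways at walking pace is accepted.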

~~~
Tempest1981
Interesting, I may try that out of curiosity. Although seeing Mask R-CNN demos
is pretty intriguing, albeit expensive.

I remember a professor saying, "The definition of AI is: something that
doesn't work"

------
DSingularity
I love how every new YOLO project inevitably leads to the discussion of the
ethics. At the very least more people will be wondering if they should also be
taking ethics into consideration wrt their lines of work.

Pjreddie is a giant for this. It is a real contribution.

~~~
rgrieselhuber
This also puts social distancing into perspective.

------
rocauc
Purpose-built, small, and fast models appear to be the inevitable evolution
for computer vision.

Where can the "Easy Set", "Medium Set", and "Hard Set" evaluations referenced
in the "Wider Face Val" table be found?

~~~
qiuqiu-dog
## Wider Face Val

Model|Easy Set|Medium Set|Hard Set
------|--------|----------|--------
libfacedetection v1（caffe）|0.65|0.5|0.233
libfacedetection v2（caffe）|0.714|0.585|0.306
Retinaface-Mobilenet-0.25 (Mxnet)|0.745|0.553|0.232
version-slim-320|0.77|0.671|0.395
version-RFB-320|0.787|0.698|0.438
yoloface-500k-320|__0.728__|__0.682__|__0.431__

~~~
rocauc
Thanks, I see the table. Are the source datasets available for creating
additional benchmarks?

------
rjeli
Wow, 100 MFLOPs. That could run in real time on a $5 DSP.

~~~
chvid
I think it is very cool.

Trying to think of some applications for this. For example, one could create a
mechanism that watches people entering and exiting a shop, giving the shop
owner more quantitative data that he could use to optimize his sales.

Or you could have it watch a soccer game, generating all sorts of data on how
the game went.

All on a relatively cheap piece of hardware.

~~~
mycall
Counting people entering and exiting buses (automatic passenger counters) is
more important than ever now. Being able to broadcast GTFS-Occupancy in real
time when only 50% (or less) of the bus can be filled with passengers is a
real issue transit is facing today.
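
A rough sketch of the counting side, assuming some detector already emits
boarding and alighting events at the door. The class name, the 50% default
cap, and the clamping rule are illustrative assumptions; publishing the result
in a GTFS feed is not shown.

```python
class OccupancyCounter:
    """Running passenger count from door events (hypothetical helper)."""

    def __init__(self, capacity, allowed_fraction=0.5):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if not 0.0 < allowed_fraction <= 1.0:
            raise ValueError("allowed_fraction must be in (0, 1]")
        self.capacity = capacity
        self.allowed_fraction = allowed_fraction
        self.on_board = 0

    @property
    def allowed(self):
        """Passengers permitted under the current cap, rounded down."""
        return int(self.capacity * self.allowed_fraction)

    def board(self, n=1):
        self.on_board += n

    def alight(self, n=1):
        # Detectors miss events, so never let the count go negative.
        self.on_board = max(0, self.on_board - n)

    def load_ratio(self):
        """Fraction of the *allowed* load currently used."""
        return self.on_board / self.allowed if self.allowed else 1.0

    def is_full(self):
        return self.on_board >= self.allowed
```

Clamping at zero is a deliberate choice here: missed boardings would otherwise
leave the count permanently negative after a few stops.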

------
layoutIfNeeded
Wew, 500kb is ultra-light nowadays. I wonder how much space the original
Viola-Jones face detector would take.

~~~
srg0
Check out
[https://github.com/opencv/opencv/tree/master/data/haarcascad...](https://github.com/opencv/opencv/tree/master/data/haarcascades)

Plain-text XML for the frontal face detector is 912 KB. 132 KB gzipped. It
should be smaller in binary.

------
sp332
"Bflops"? I'm guessing this is a measure of the total processing power needed,
in billions of floating-point ops, and not a measure of operations per second?

~~~
rjeli
Yes exactly, FLOPS vs FLOPs/FLOP’s, there’s no unawkward way to write it but
it’s almost always obvious from context.
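
To make the distinction concrete: a per-inference cost in operations only
turns into a frame rate once it is divided into a device's sustained
operations per second. The sketch below assumes the roughly 100 MFLOPs per
frame quoted elsewhere in this thread; the device throughput and utilization
figures are made-up placeholders, not measurements.

```python
def max_frames_per_second(flops_per_frame, device_flops_per_second,
                          utilization=1.0):
    """Upper bound on frame rate from an operation count and a throughput.

    flops_per_frame:          total floating-point ops for one inference
    device_flops_per_second:  peak ops per second the device can sustain
    utilization:              fraction of that peak actually achieved
                              (an assumed fudge factor)
    """
    if flops_per_frame <= 0:
        raise ValueError("flops_per_frame must be positive")
    return device_flops_per_second * utilization / flops_per_frame


# 0.1 BFLOPs per frame on a hypothetical 3 GFLOPS device.
ideal_fps = max_frames_per_second(0.1e9, 3e9)
# Same device, assuming only half of peak is reached in practice.
derated_fps = max_frames_per_second(0.1e9, 3e9, utilization=0.5)
```

This ignores memory bandwidth and pre/post-processing, so it is an optimistic
ceiling rather than a prediction.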

------
ta1234567890
Is it possible to run this "in reverse" so it generates faces instead of
detecting them? If so, how?

~~~
phonebucket
There are much better ways to generate faces than using this, e.g.
[https://github.com/tkarras/progressive_growing_of_gans](https://github.com/tkarras/progressive_growing_of_gans)

~~~
ta1234567890
Thank you for the link. I'm still curious about the possibility of taking a
detection algorithm/network and just running it in reverse. Is that feasible?
Are there people doing it?

------
ddrt
I shudder for the future.

~~~
stu2b50
Yolo does bounding box detection on faces. Classifying those faces is
something else's job.

~~~
jakear
Xolo does feature vector extraction on face regions. Extracting those regions
from larger images is something else’s job.

Zolo does feature vector similarity analysis on facial feature vectors.
Extracting those vectors is something else’s job.

Oh look, now every authoritarian government has free access to never before
seen levels of data harvesting. But nobody has to feel any guilt because they
only contributed a third of the machine. Hooray!

~~~
zo1
The future is coming, and the tech that can enable authoritarian behavior will
come no matter what we do as they're just tools. It's been here and around us
in ever-increasing forms for decades and yet we haven't necessarily devolved
into a big brother state.

What we should be worrying about is _actual_ usage of grand-scale
citizen-control and monitoring projects that are enabled by technology. Think
China, not UK or US.

~~~
longtom
> and yet we haven't necessarily devolved into a big brother state.

Well this is damn close. It just needs an executive apparatus which is where
drones surely come in handy.

[https://en.wikipedia.org/wiki/Global_surveillance_disclosure...](https://en.wikipedia.org/wiki/Global_surveillance_disclosures_\(2013%E2%80%93present\))

~~~
8fingerlouie
And yet, a $0.50 facemask will completely destroy the surveillance.

~~~
eunos
Nah, gait-based subject recognition might come in the future.

~~~
anigbrowl
Gait-based recognition is way overrated, plus it's a lot harder to sell to a
jury.

~~~
jakear
Already in use in places like China. And we know the US govt has secret courts
to authorize wiretaps. We’re on the brink.

------
m0zg
There's also Blazeface:
[https://arxiv.org/abs/1907.05047](https://arxiv.org/abs/1907.05047), which
the authors do not seem to mention.

------
hirundo
Imagine an app like Pokémon VR but instead of virtual pocket monsters it
targets members of <outgroup> whose faces are detected by cadre smart phones,
then tracked, flash mobbed and dealt with, for the crime of making members of
<ingroup> feel unsafe.

Watching leadership supine and carefully uncritical of burning, looting mobs
offers little confidence that they will stand in the way of this. After all
only <outgroup epithet>s have anything to fear.

Please tell me why this is an unlikely scenario.

~~~
jcahill
The submission is an implementation of a core task in computer vision. Your
response is an appeal to vividness unrelated to the submission except in its
recruitment of computer vision to sell the FUD.

If researchers getting good at something is sufficient priming to cause you to
direct your imagination toward hyperbolically negative outcomes, the problem
on your hands is a constitutional resistance to further progress in the
research area.

In that case, challenging readers to produce arguments on the finer details of
the narrative you've painted in support of the technopessimism is bad faith
rhetoric.

~~~
Kiro
I don't understand why you decided to respond to an almost-dead comment when
the top-voted comment thread shows the same technopessimism.

~~~
jcahill
I'm similarly unclear on what sort of response you expect.

There weren't many comments when I replied.

My criticism was tailored to a fairly specific phenomenon: asymmetrically
imaginative doomsaying that appeals to a vivid vignette / sketch of an
adjacent possible future featuring some hyperbolically elaborated extension of
trending tech, like Flash Mob Gone Wrong[1] and Slaughterbots[2].

HN flagging and points are irrelevant to me.

____________________

[1]:
[https://youtube.com/watch?v=RyMdOT8YJgY](https://youtube.com/watch?v=RyMdOT8YJgY)

[2]:
[https://en.wikipedia.org/wiki/Slaughterbots](https://en.wikipedia.org/wiki/Slaughterbots)

