
Face detection in pure PHP (without OpenCV) - nreece
http://svay.com/blog/index/post/2009/06/19/Face-detection-in-pure-PHP-(without-OpenCV)
======
ktharavaad
I am the author of the JavaScript solution which you ported from. The code
itself is a modification of the one in OpenCV (it's not a direct port, since I
did some optimizations to speed things up). However, the .dat file was
generated from one of the cascades trained using the OpenCV Haar training
program. To summarize how it works:

\- The algorithm uses something called Haar filters: simple box filters with a
"white" area and a "black" area. For each region of the image, you take the
sum of the pixels in the white area and subtract the sum of the pixels in the
black area. Thus each filter outputs a single number.

\- First you generate a gazillion of these filters for different regions of
the image and with different shapes, such as two horizontal rectangles or two
vertical ones (one black and one white).

\- For each filter, you take its output and find the threshold that best
differentiates the faces from the non-faces in your training images.

\- Using all the filters and their outputs, you feed everything into a machine
learning algorithm called AdaBoost, which attempts to minimize an exponential
loss function of the classification error. The different filters are then
assembled with different weights.

\- The final structure of the detector is a "detection chain": a degenerate
decision tree (like a chain) with nodes consisting of the aforementioned
filters assembled together. This is how the algorithm achieves its speed, by
rejecting non-faces early in the detection process. Only when an image region
passes all the nodes in the detection chain is it labeled a "hit".

\- After that you scan all regions of the image at all sizes (brute force)
and then assemble the detection results.
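The pipeline described above can be sketched in Python (illustrative only,
not the author's PHP or JavaScript code; the rectangle layout, thresholds,
and stage values below are made up):

```python
# Sketch of a Haar box filter evaluated with an integral image,
# plus a degenerate "chain" of stages that rejects non-faces early.
# All rectangles and thresholds here are illustrative, not trained values.

def integral_image(img):
    """img: 2-D list of grayscale pixels. Returns a summed-area table
    with an extra zero row/column so lookups need no bounds checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in a rectangle, in O(1) via four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Two stacked rectangles: white on top, black below.
    The filter's output is a single number: sum(white) - sum(black)."""
    white = rect_sum(ii, x, y, w, h // 2)
    black = rect_sum(ii, x, y + h // 2, w, h // 2)
    return white - black

def passes_chain(ii, x, y, size, stages):
    """Each stage is a (feature_fn, threshold) pair. A region is a "hit"
    only if it passes every stage; most regions fail early and cheaply."""
    for feature, threshold in stages:
        if feature(ii, x, y, size, size) < threshold:
            return False  # rejected early -> non-faces cost little work
    return True
```

Scanning "all regions at all sizes" then just means calling `passes_chain`
for every window position and scale.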

To be honest, the first time I saw the description of the algorithm, it
seemed a little "magical". The underlying reason these "box filters" work
so well is that the human face is well defined by "boxy" features such as
our eyes, nose, eyebrows, lips, etc. It's a wonderful application of
machine learning to a specific domain.

This is also why this algorithm has MUCH lower detection rates for cascades
trained to detect side-view faces: the "boxy" features that are so well
detected by these filters are simply not as prominent in the side view.

For more, refer to the Viola and Jones paper; of all the versions out there,
I find this one the best:

[http://lear.inrialpes.fr/people/triggs/student/vj/viola-ijcv04.pdf](http://lear.inrialpes.fr/people/triggs/student/vj/viola-ijcv04.pdf)

To be perfectly honest, this detector sucks... Although it's fairly
illumination invariant, it's not rotation invariant, and it sucks for
side-view face detection. I've been trying for the longest time to implement
the histogram-based detector outlined in this paper:
<http://www.cs.cmu.edu/afs/cs.cmu.edu/user/hws/www/CVPR00.ps>

which is also what PittPatt uses for their detector, and IMO it's a much
better detector. However, the lack of training images and time has been
impeding my progress. It'll be open-sourced when I'm done.

~~~
Keyframe
On 9/10 tests I did on this PHP code, it didn't find a face - is this due to
the data or to inaccuracy in the algorithm?

edit: it's more like 10/10, since the one test that kind of recognizes a face
is on a multi-face image, and it kind of frames a torso with a head. I've
tried your JS version, and it works on single-face and multi-face images for
recognizing one face; multi-face recognition, however, seems to be broken -
the blue bar runs to the end and then nothing ever happens.

~~~
ktharavaad
I have not looked at the PHP code, but a few things I can think of off the
top of my head:

1) The detector doesn't like faces that are slightly rotated, either in-plane
or out of plane.

2) The detector is "racist" in the sense that most of the training images
used were of Caucasian people, and therefore it works best on Caucasian faces.

3) It fails badly on kids because they aren't present in training data.

Try normalizing your data a bit for that and see how it works out. If you want
a detector that works better, simply use the one in OpenCV, since it does a
more thorough detection. The performance optimizations I did also cut down
the accuracy of the algorithm (assuming his port has those too).

~~~
Keyframe
Funny, because when I uploaded a test image with multiple faces that I found
on Google and ran single-face detection, it selected a third face in the
image, which is not Caucasian :)

------
abyssknight
Beautiful code, just beautiful. Having implemented adaptive boosting once
before for a college course, I have to say it is a joy to see some vision
algorithms again. I would be curious to know how he generated the .dat file
he's using. Facial detection data sets aren't exactly easy to get.

------
petercooper
A trillion karma points (well, at least fifty) to whoever can summarize how
this works here - since there was no basic summary on the post or the post
that inspired the post ;-)

~~~
abyssknight
This is not necessarily based on his code, but the way I believe Viola-Jones
adaptive boosting works is as follows:

1\. Run the adaptive boosting code, which creates a complex decision function
using known positive and negative data.

The code creates what is called an Integral Image from each piece of data, and
runs it through the algorithm.

Each iteration adds or subtracts a 'feature' from the decision function,
gradually building up a general detector for faces. The algorithm only keeps
the features which work best _together_, not individually, so in the end you
have the best 'team' of features.

Essentially we are training a function so it can decide what is and is not a
face. This is likely what the .dat file output was, a descriptor of the
resulting decision function.

The more data used to train, the better the decision function will be.

2\. Run the decision function against unknown data.

Ta da, an answer. It could be right, it could be wrong. Sometimes these
algorithms find faces in the darnedest places like walls and other textured
surfaces.
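The training step in part 1 can be sketched as toy Python (not the actual
cascade-training pipeline that produced the .dat file; the 1-D 'feature'
values, labels, and round count below are made up):

```python
import math

# Toy AdaBoost: each weak classifier is a threshold on a single feature
# value, labels are +1 (face) / -1 (non-face). The strong "decision
# function" is a weighted vote of the chosen threshold stumps.

def train_adaboost(features, labels, rounds):
    """features: list of numbers; labels: +1/-1. Returns weighted stumps."""
    n = len(features)
    weights = [1.0 / n] * n          # start with uniform example weights
    model = []
    thresholds = sorted(set(features))
    for _ in range(rounds):
        # Pick the threshold/polarity pair with the lowest weighted error.
        best = None
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(features, labels, weights)
                          if polarity * (1 if x >= t else -1) != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity)
        err, t, polarity = best
        err = max(err, 1e-10)        # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, t, polarity))
        # Re-weight: examples this stump got wrong matter more next round.
        for i, (x, y) in enumerate(zip(features, labels)):
            pred = polarity * (1 if x >= t else -1)
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def classify(model, x):
    """Step 2: run the learned decision function against unknown data."""
    score = sum(alpha * polarity * (1 if x >= t else -1)
                for alpha, t, polarity in model)
    return 1 if score >= 0 else -1
```

The real detector does this over thousands of Haar filter outputs rather
than a single toy feature, but the reweight-and-revote loop is the same idea.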

For more info on Viola-Jones adaptive boosting, check out the original paper:

Robust Real-time Object Detection: [http://research.microsoft.com/en-us/um/people/viola/pubs/detect/violajones_ijcv.pdf](http://research.microsoft.com/en-us/um/people/viola/pubs/detect/violajones_ijcv.pdf)

~~~
Luc
Do the 'faces' in walls and other textured surfaces actually look like faces?
That would be interesting to see in action...

~~~
abyssknight
The answer is, sometimes. :) It all depends on what data you trained with, the
variety of light/shape/tilt of the faces, etc. The code can only build on what
it knows, so when it fails the best thing to do is dog-food that case and add
it to the training set. Of course, outliers like that can also cause more
false positives as the decision function has to generalize even more to cover
those cases.

------
edw519
This is an excellent example of why language wars are so silly. Sure, some
languages are better than others for certain things and we all have our
preferences, but there are many wrenches in the hacker's toolbox. I doubt PHP
was ever meant to do this, but so what? Very nice. Thank you.

------
aarongough
Very well timed! I've been working on getting into some machine vision stuff;
so far I've got motion tracking and blob detection working fairly well. The
list of things I want to do includes digital image stabilization and face
detection.

Looks like I might be porting parts of this code into Processing!

------
FiReaNG3L
I wonder why he didn't code it in C with a PHP wrapper - it would have been
much faster.

~~~
noodle
same reason he didn't want to use OpenCV -- he wanted a purely PHP solution.

