
OpenCV Face Detection For Cropping Faces - philipcristiano
http://seatgeek.com/blog/dev/opencv-face-detection-for-cropping-faces
======
vjeux
At Facebook, we are using a similar technique to crop images. The part that is
different is that instead of using the center of all the faces, we find the
viewport that maximizes the number of faces to be displayed.
[http://blog.vjeux.com/2012/image/best-cropping-position.html](http://blog.vjeux.com/2012/image/best-cropping-position.html)
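
To make the idea concrete, here is a minimal one-dimensional sketch of "pick the viewport that shows the most faces". It is not the code from the linked post: the function name `best_crop_offset`, the `(x, width)` face format, and the fall-back to a centered crop are all assumptions made for illustration.

```python
def best_crop_offset(image_width, crop_width, faces):
    """Return (left, visible_count) for a horizontal crop window.

    faces is a list of (x, width) extents in pixels (hypothetical format).
    A face counts as visible only if it fits entirely inside the window.
    """
    max_left = image_width - crop_width
    if max_left < 0:
        raise ValueError("crop is wider than the image")

    def clamp(value):
        return max(0, min(value, max_left))

    def count_visible(left):
        right = left + crop_width
        return sum(1 for x, w in faces if x >= left and x + w <= right)

    # Any window can be slid right until its left edge touches the leftmost
    # face it contains without losing a face, so it is enough to try windows
    # that start at a face's left edge. The centered window comes first so it
    # wins ties and is the result when no face fits at all.
    candidates = [max_left // 2] + [clamp(x) for x, _ in faces]
    best = max(candidates, key=count_visible)
    return best, count_visible(best)


if __name__ == "__main__":
    faces = [(50, 80), (400, 100), (520, 90), (620, 60)]
    print(best_crop_offset(1000, 300, faces))
```

The same approach extends to two dimensions by trying every pairing of a face's left edge with a face's top edge, at the cost of more candidates.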

~~~
philipcristiano
That's going to be the next step for these, as requests come in for images
with a low aspect ratio, which generally just get resized smaller instead of
being cropped around a face.

------
fideloper
Thumbor
([https://github.com/globocom/thumbor](https://github.com/globocom/thumbor))
is a great Python library for cropping that uses OpenCV for face detection
(as well as other algorithms for finding "interesting" parts of photos).

~~~
philipcristiano
Thanks for posting that. I haven't seen that one in a while, but I've built a
few like it. If anyone uses it, I'd recommend not exposing it directly to the
internet: once someone malicious sees the sizes in the URL, they may try
requesting every combination of 1..N x 1..M, which is a bit hard on the
servers.

We used specifically named sizes to avoid that problem although whitelisting
in the edge servers would also work.
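
A minimal sketch of the named-sizes idea, assuming a lookup table on the resizing service. The names and dimensions in `NAMED_SIZES` and the helper `resolve_size` are hypothetical, not the ones we actually used.

```python
# Hypothetical whitelist: only these names are accepted in image URLs,
# so a client cannot request arbitrary width x height combinations.
NAMED_SIZES = {
    "thumb": (100, 100),
    "medium": (400, 300),
    "large": (1024, 768),
}


def resolve_size(name):
    """Map a size name from the URL to (width, height), rejecting anything else."""
    try:
        return NAMED_SIZES[name]
    except KeyError:
        raise ValueError(f"unknown size name: {name!r}") from None


if __name__ == "__main__":
    print(resolve_size("thumb"))
```

Because the set of outputs is finite and known in advance, every variant can also be cached or pre-generated.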

------
geuis
I implemented something like this with OpenCV late last year using a node.js
wrapper. We were looking to do something similar for image cropping. What I
found was that OpenCV was kind of unreliable. A large percentage of the time,
it would mark knees as faces, or even wrinkles in pictures of women's
handbags. It was a side project and we didn't pursue it, so I never found out
whether there was a way to train it over time. If there is, did you do this to
improve your accuracy?

I also used PIL a few years back to do image generation from text. I kept
running into memory fragmentation bugs, weird artifacts creeping into images,
etc. I haven't been able to recommend PIL for any serious work since then. Has
this gotten better?

~~~
philipcristiano
We haven't trained our model at all and just use the included XML file. We
have adjusted the default neighbors threshold, which helps mitigate the knee
factor. In my trials, faces tend to be in the 40+ neighbors range, and the
default is 3.

We use ImageMagick for cropping and resizing; PIL was just easier for the
example. I haven't noticed any issues with PIL, but I also haven't used it for
any serious image processing.
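
For reference, a rough sketch of raising the neighbors threshold with OpenCV's Python bindings. The wrapper names `detect_faces` and `load_frontal_face_cascade` are made up for this example, and it assumes an `opencv-python` install that exposes `cv2.data.haarcascades`; the value 40 is only the figure from my trials, not a recommended setting.

```python
def load_frontal_face_cascade():
    """Load the frontal-face Haar cascade XML that ships with opencv-python."""
    import cv2  # imported lazily so detect_faces stays usable without OpenCV

    path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    return cv2.CascadeClassifier(path)


def detect_faces(cascade, gray_image, min_neighbors=40):
    """Run the cascade with a stricter minNeighbors than OpenCV's default of 3.

    A higher value requires more overlapping candidate detections before a
    region is reported, which trades missed faces for fewer false positives.
    Returns a list of (x, y, w, h) tuples.
    """
    faces = cascade.detectMultiScale(
        gray_image, scaleFactor=1.1, minNeighbors=min_neighbors
    )
    return [tuple(int(v) for v in face) for face in faces]
```

Typical use would be to read an image with `cv2.imread`, convert it to grayscale with `cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)`, and pass the result to `detect_faces` along with the loaded cascade.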

------
hellopat
I built something like this to automate cropping headshots for players in
major sports. I figured it would work flawlessly after I ran a few test images
through, but it turned out that it either failed to detect a face or gave the
wrong coordinates about 20% of the time.

I'm curious to know what the success rate of SeatGeek's process is.

~~~
philipcristiano
Were you using your own test data or a provided XML file?

Sports shots in our case don't get a great hit rate. Adjusting the
`minNeighbors` parameter can help out with that depending on how many false
positives you can accept. Musical artist misses are in the single digit
percentages although shadows and strange backgrounds can give some additional
faces that we don't really care for.

When collecting images we are now searching for those with more direct faces
visible to make the detection easier. After that though we just try to get the
face in the direct center and fall back to hoping the face is in that spot if
we can't detect any.
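
That centering and fall-back logic can be sketched roughly as follows. This is an illustration rather than our production code: `crop_box` is a hypothetical helper, faces are assumed to be `(x, y, w, h)` tuples, the crop is assumed to fit inside the image, and the result is a `(left, upper, right, lower)` box of the kind PIL's `Image.crop` accepts.

```python
def crop_box(image_size, crop_size, faces):
    """Center a crop on the detected faces, or on the image if none were found."""
    img_w, img_h = image_size
    crop_w, crop_h = crop_size

    if faces:
        # Average of the face centers.
        cx = sum(x + w / 2 for x, y, w, h in faces) / len(faces)
        cy = sum(y + h / 2 for x, y, w, h in faces) / len(faces)
    else:
        # No detection: hope the face is in the middle of the image.
        cx, cy = img_w / 2, img_h / 2

    # Clamp so the crop never extends past the image edges.
    left = max(0, min(int(round(cx - crop_w / 2)), img_w - crop_w))
    top = max(0, min(int(round(cy - crop_h / 2)), img_h - crop_h))
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    print(crop_box((800, 600), (200, 200), [(600, 50, 100, 100)]))
```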

At some point I want to try checking for partial face matches as well which
should help in major sports since we tend not to use headshots.

~~~
hellopat
I used a provided XML file. The library I used:
[https://github.com/peterbraden/node-opencv](https://github.com/peterbraden/node-opencv)

I hadn't touched the app in about a month. I just updated the library to the
most recent version, and I'm now getting a 100% hit rate. I guess the library
was a bit buggy.

Also, these headshots are the ones the players take before the season starts,
not action shots. In theory there should be a high hit rate.

