1, The face detection of this program is more impressive than the face recognition.
2, The detector is able to detect faces across a variety of poses and come up with an estimate of the pose of each face. It does a lot more than the best open-source detector out there (OpenCV).
3, If you download the Mac program and open up the package contents, you will see that under the data file directory, it uses a different set of files (features.bin, log_likelihood.bin) for frontal and profile view detection.
4, If you look at the company's CEO's background and publications here:
It would suggest that the detector is histogram-based, as outlined in his paper. The other detector made by CMU is the Rowley-Kanade one, but I don't think that one is fast enough to run in real time.
1. Image to Image is a bit misleading because we are fairly strict about pose requirements in our online demo, so many faces are rejected as "non-suitable" because they aren't frontal.
2. It works much better in video settings (but that type of online demo is way too CPU-intensive) because we can organize each person into continuous tracks, and we only need them to become 'frontal' once to do matching. And the more often they are frontal, the more data points we have.
Here's an example of recognition applied to video: http://www.youtube.com/watch?v=jsjf3IDXef8
Just to try to explain the question better, here's an example. Let's say I have 10 images and I want to find the most similar people among any pair of images. Do I need to run every pair of images (45 full comparisons) or can I pre-process the 10 images into something such that the 45 comparisons can be done in a less expensive way?
Generating 2 templates is many times more expensive than comparing those templates. However, as your dataset grows, the cost of generating templates grows as N, while the number of comparisons you need to do grows as N^2 (N(N-1)/2 pairs, so 45 for 10 images). So eventually, comparisons dominate.
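To make the scaling concrete, here is a rough sketch of the "template once, compare cheaply" idea. It is not the product's actual algorithm: make_template and compare are hypothetical stand-ins (a unit-normalised feature vector and a dot product), and each "image" is assumed to already be a non-zero list of numbers.

```python
from itertools import combinations
import math


def make_template(image):
    """Hypothetical expensive step: turn an image into a compact template.

    Here it just scales a feature vector to unit length (vector must be non-zero).
    """
    norm = math.sqrt(sum(x * x for x in image))
    return [x / norm for x in image]


def compare(t1, t2):
    """Hypothetical cheap step: similarity of two templates (dot product)."""
    return sum(a * b for a, b in zip(t1, t2))


def most_similar_pair(images):
    """Return (best_pair, number_of_templates, number_of_comparisons)."""
    # N template generations, each done exactly once.
    templates = [make_template(img) for img in images]

    # N*(N-1)/2 cheap comparisons over the precomputed templates.
    pairs = list(combinations(range(len(templates)), 2))
    best = max(pairs, key=lambda p: compare(templates[p[0]], templates[p[1]]))
    return best, len(templates), len(pairs)
```

With 10 images this does 10 template generations and 45 comparisons, which matches the numbers in the question: the expensive step is never repeated per pair.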
In retrospect, it's not surprising that they're optimizing some parts in assembler, even with SSE (Streaming SIMD Extensions, where SIMD means Single Instruction, Multiple Data: http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions), which I hadn't heard of, but which is the kind of vector parallelism that gives supercomputers their speed.
It needs a threshold of -3 to detect it, but it gets the right face (I think).
Here's 2 I did:
The second one is pretty impressive, even if it's under the default 0.00 threshold. I mean, RMS's mouth is obscured behind a katana and it still got a -0.14.
You need to work on the PG recognition though. It doesn't seem to want to recognize two photos as being both him. :P
(synopsis: ancient mr. show sketch involving a corporate mascot called pit-pat. warning: swearing)
Good job tho guys. It's pretty impressive.
Did you use your own algorithm for it?