Now we just need someone to design a Samaritan-esque UI for it ;-).
- c4: http://git.io/c4
- info: https://github.com/turbo/c4
- Samaritan reference: http://personofinterest.wikia.com/wiki/Samaritan
But when I clicked through, the title of the article was 'Counting People with Machine Learning', and it's just counting presence, not foot traffic :(
If I had a store, though, I would count someone walking past one way and then the other as two potential visits.
I would imagine that "people in room at time t" and "people in room at time t+1" is quite a good proxy for "number of people present all day"; it's certainly an upper bound.
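A toy illustration of the upper-bound claim (the visitor sets here are made up, not real data): as long as the sampling interval is short enough that every visitor shows up in at least one sample, summing the per-sample headcounts can only over-count, since slow movers appear in several samples.

```python
# Hypothetical occupancy samples: which visitors are in the room at
# times t, t+1, t+2. "bob" lingers and is counted three times.
samples = [{"alice", "bob"}, {"bob"}, {"bob", "carol"}]

unique_visitors = len(set().union(*samples))       # 3 people actually came
summed_headcounts = sum(len(s) for s in samples)   # 5: bob counted thrice

print(unique_visitors, summed_headcounts)  # -> 3 5
```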
Once you've done all that, a top end GPU can handle about 4 FPS (assuming 1080P input).
Then the problem is that YOLO will occasionally miss blindingly obvious objects. That, combined with only having 4 FPS, makes detecting the direction of a person hard: they tend to move across the camera's field of view before you've got enough data to be confident about what just happened. A person walking from left to right looks the same as someone walking off the left of the frame and a different person walking onto the right of the next frame.
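A rough sketch of why that's hard: even with a naive rule over per-frame centroid x-positions (the function name, observation threshold, and numbers below are all made up for illustration), a fast walker at ~4 FPS leaves too few observations to call a direction.

```python
def guess_direction(xs, min_obs=3):
    """Guess travel direction from the x-centroids of one tracked
    detection across consecutive frames (None = YOLO missed that frame).
    At ~4 FPS a fast walker may yield only one or two detections before
    leaving the frame -- indistinguishable from two different people
    appearing on opposite sides in consecutive frames."""
    seen = [x for x in xs if x is not None]
    if len(seen) < min_obs:
        return "ambiguous"
    return "left-to-right" if seen[-1] > seen[0] else "right-to-left"

print(guess_direction([120, None, 610]))      # -> ambiguous (2 detections)
print(guess_direction([120, 340, 580, 790]))  # -> left-to-right
```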
At some point it's easier, cheaper, and more accurate to install an IR laser beam and count the breaks. You'll save about half a kilowatt, too.
Another interesting point is that YOLO is pre-trained on hundreds of object classes. This feels like a waste. I wanted to retrain it with all but the people class removed from the training set. My learned colleague suggested that was a stupid idea because YOLO learns general info about how to separate objects from backgrounds from all the object classes. Not showing it surfboards makes it worse at detecting people. Crazy.
OpenCV's FaceRecognizer class is one of the fastest I found, and it's what I used in my program.
First, it runs an LBP cascade to find "any face", including ones that look like walls. It produces plenty of false positives, but almost never false negatives. Then I take each region of interest and run a Haar cascade for eyes over it. If there's at least one eye in the region of interest, I pass it to the classifier.
The classifier then checks that image region against the known faces. If it's not there, it adds it; if it is, it adds this as another sample to further confirm the face.
I can get 15 FPS at 1280x720 on a ThinkPad T61.
I also implemented a "no more than 50 samples per matched face" rule to keep the size of the face-hash DB down.
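That pipeline could be sketched roughly like this with OpenCV's Python bindings. To be clear about assumptions: the cascade XML paths, the distance threshold of 80, the dict-based sample store, and the retrain-on-new-face strategy are mine, not the original program's; `cv2.face` needs the opencv-contrib-python package, and the import is kept inside `run()` so the pure capping logic works without OpenCV installed.

```python
def add_sample(db, label, roi, cap=50):
    """The 'no more than 50 samples per matched face' rule: cap each
    face's sample list so the DB stays small. Returns the new count."""
    samples = db.setdefault(label, [])
    if len(samples) < cap:
        samples.append(roi)
    return len(samples)

def run(video_source=0):
    import cv2            # needs opencv-contrib-python for cv2.face
    import numpy as np

    # Stage 1: fast LBP cascade proposes "any face" regions (many false
    # positives, almost no false negatives). Stage 2: a Haar eye cascade
    # inside each region rejects the wall-shaped false positives.
    # The XML paths are assumptions -- point them at your local copies.
    face_cascade = cv2.CascadeClassifier("lbpcascade_frontalface.xml")
    eye_cascade = cv2.CascadeClassifier("haarcascade_eye.xml")
    recognizer = cv2.face.LBPHFaceRecognizer_create()  # stage 3 matcher

    db, next_label, trained = {}, 0, False
    cap = cv2.VideoCapture(video_source)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 4):
            roi = cv2.resize(gray[y:y + h, x:x + w], (100, 100))
            if len(eye_cascade.detectMultiScale(roi)) < 1:
                continue  # no eye in the region of interest: discard it
            if trained:
                label, dist = recognizer.predict(roi)  # lower = closer
                if dist < 80.0:                        # assumed threshold
                    add_sample(db, label, roi)         # further proof
                    continue
            add_sample(db, next_label, roi)            # unseen face: add
            next_label += 1
            trained = True
            # Retrain on everything we have (fine for a weekend hack).
            recognizer.train(
                [s for ss in db.values() for s in ss],
                np.array([l for l, ss in db.items() for _ in ss]))
    cap.release()
```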
My old code's currently on GitLab: gitlab.com/crankylinuxuser. It's pretty crappy, as it was a weekend hack. I need to separate the engine from the GUI and make the GUI web-accessible. There are a few more pieces needed to do that, but I was looking at selling it for various purposes.
Correct. But if someone walks past really slowly, would you count that as multiple visits, because they appear in multiple frames?
'<SNIP> is quite a good proxy for "number of people present all day"'
The term to which I objected was 'Foot Traffic'. Whether something is a good proxy for <something other than foot traffic> is irrelevant.
Although I was initially creeped out by Insecam, I was fascinated with the idea that I could peer into so many different corners of the world just by clicking on a couple of links.
If it sounds creepy, it probably is. Go with your gut on this kind of thing. Then again, maybe it's better to have the tech out in the open.
It's very patriotic, just like all that top secret work you used to do with the Eyring Research Institute!