Really awesome, and the test cases look just as good as I could do.
I would warn web designers not to blindly apply this to everything, though. It scans all the pixels of an image, which can take on the order of 100ms per image, especially on mobile devices. A good use case would be a file upload box with a suggestion to crop the image upon upload.
I'm not so sure you'd want to throw away all of that processing power on the client, though. You could run performance tests on different types of devices, figure out which ones handle it well enough (especially if you can do it in the background with a web worker), and send the really poorly performing devices straight to the backend.
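A rough sketch of the worker approach, assuming smartcrop.js's documented promise API (smartcrop.crop(image, options)) and assuming it can consume an ImageData object inside a worker — both worth verifying against the library before relying on this:

    // main.js — hand pixel data to a worker so the UI thread stays responsive
    const img = document.querySelector('#upload-preview'); // hypothetical element
    const canvas = document.createElement('canvas');
    canvas.width = img.naturalWidth;
    canvas.height = img.naturalHeight;
    const ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0);
    const worker = new Worker('crop-worker.js');
    worker.postMessage({
      imageData: ctx.getImageData(0, 0, canvas.width, canvas.height),
      width: 100,
      height: 100,
    });
    worker.onmessage = (e) => suggestCrop(e.data.topCrop); // suggestCrop is your own UI hook

    // crop-worker.js
    importScripts('smartcrop.js');
    onmessage = (e) => {
      smartcrop.crop(e.data.imageData, { width: e.data.width, height: e.data.height })
        .then((result) => postMessage({ topCrop: result.topCrop }));
    };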
Good suggestion, though this could be true for both server-side and local processing. If you want the whole image uploaded anyway: start that, run the crop in parallel, and just transmit the crop box once both are done, with minimal additional impact on the interaction flow.
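For instance (a hedged sketch — the endpoints are made up, and smartcrop.crop's promise API is assumed from the README):

    // Start the upload immediately, compute the crop suggestion concurrently,
    // and transmit only the crop box once both have finished.
    async function uploadWithCrop(file, imageElement) {
      const upload = fetch('/upload', { method: 'POST', body: file });
      const crop = smartcrop.crop(imageElement, { width: 200, height: 200 });
      const [uploadRes, cropRes] = await Promise.all([upload, crop]);
      const { id } = await uploadRes.json();
      // The server applies the box to the stored original.
      await fetch('/images/' + id + '/crop', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(cropRes.topCrop), // {x, y, width, height}
      });
    }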
The tests highlight the fact that face detection might be an important parameter to add to the overall weighting. Test photos where people had noisy shirts ended up getting cropped badly.
Cropping images is a massive problem for social media. Here is a talk from 2013 by Christopher Chedeau, a front-end engineer at Facebook, describing some of the problems with their image layout algorithms.[1]
Initially, they tried to solve the problem by getting users to tag people in the image and then using the locations of the tags as parameters for cropping. If someone is tagged in a photo, Facebook makes sure that person is always inside the cropped version.
Here is the write up from Christopher's blog.[2]
Was Instagram only using square images at one time? That would have been a brilliant way to sidestep this problem.
>Initially, they tried to solve the problem by getting users to tag people in the image and then using the locations of the tags as parameters for cropping. If someone is tagged in a photo, Facebook makes sure that person is always inside the cropped version.
Do they still do this? A lot of the time, people aren't tagged where their face is, but just off to the side or something. I feel like this wouldn't work in a lot of cases.
Now when I upload an image to Facebook it automatically uses facial recognition to tag my friends. In the video from 2013 he mentions they had already solved the problem using heuristics to find faces in images. Before that, circa 2012, they might have been relying on people to tag their photos.
Damn this is cool. It's kinda amazing that it is all done in a few hundred lines of code. I see there is a skinColor method and setting defined as
    skinColor: [0.78, 0.57, 0.44]
I was curious how it worked with darker skin (I admittedly don't understand what the numbers mean without further analysis), and it came out pretty well (it may default to favoring lighter skin, I don't know).
Almost all skin colors, e.g. black and white, have the same chroma; only the brightness changes. I have worked on video projects that took advantage of this for skin-color auto-adjustment.
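To illustrate (my own sketch of the general technique, not necessarily smartcrop.js's exact code): normalize the RGB vector to unit length, which discards brightness, then measure how far the result is from a reference skin direction. Read that way, [0.78, 0.57, 0.44] looks like a direction in RGB space rather than a literal color, which is why darker and lighter skin score similarly.

    // Brightness-invariant skin score: 1 = chroma matches the reference
    // direction exactly, lower = further from a skin tone.
    function skinScore(r, g, b, ref = [0.78, 0.57, 0.44]) {
      const mag = Math.sqrt(r * r + g * g + b * b) || 1; // avoid divide-by-zero on pure black
      const rd = r / mag - ref[0];
      const gd = g / mag - ref[1];
      const bd = b / mag - ref[2];
      return 1 - Math.sqrt(rd * rd + gd * gd + bd * bd);
    }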
This is called "salient region detection", and some current approaches (there are many) include detecting the contrast between each pixel and the global or regional average color or luminosity. Areas of high contrast are likely to be regions considered interesting. Once you have those regions, you need a separate algorithm that maximizes the placement of a rectangle (the crop) to get the greatest coverage of "interestingness" (see the sketch below).
You could also combine this with face detection, so that a picture of someone in a bikini doesn't end up cropping just to their midsection, since going by surface area, the torso could have more high-contrast pixels than the face.
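A naive illustration of both steps (my own sketch, not taken from any particular paper): score each pixel by its distance from the image's mean color, then slide a fixed-size window over the map — a summed-area table makes each window sum O(1) — and keep the highest-scoring placement.

    // Naive global-contrast saliency: each pixel scored by its color
    // distance from the image mean. Real detectors are fancier, but this
    // is the core idea.
    function saliencyMap(imageData) {
      const { data, width, height } = imageData;
      const n = width * height;
      let mr = 0, mg = 0, mb = 0;
      for (let i = 0; i < data.length; i += 4) {
        mr += data[i]; mg += data[i + 1]; mb += data[i + 2];
      }
      mr /= n; mg /= n; mb /= n;
      const map = new Float32Array(n);
      for (let i = 0, p = 0; i < data.length; i += 4, p++) {
        const dr = data[i] - mr, dg = data[i + 1] - mg, db = data[i + 2] - mb;
        map[p] = Math.sqrt(dr * dr + dg * dg + db * db);
      }
      return map;
    }

    // Find the w×h window with the greatest total saliency, using a
    // summed-area table so each candidate window is evaluated in O(1).
    function bestCrop(map, width, height, w, h) {
      const sat = new Float64Array((width + 1) * (height + 1));
      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          sat[(y + 1) * (width + 1) + (x + 1)] =
            map[y * width + x] +
            sat[y * (width + 1) + (x + 1)] +
            sat[(y + 1) * (width + 1) + x] -
            sat[y * (width + 1) + x];
        }
      }
      let best = { x: 0, y: 0, score: -Infinity };
      for (let y = 0; y + h <= height; y++) {
        for (let x = 0; x + w <= width; x++) {
          const score =
            sat[(y + h) * (width + 1) + (x + w)] -
            sat[y * (width + 1) + (x + w)] -
            sat[(y + h) * (width + 1) + x] +
            sat[y * (width + 1) + x];
          if (score > best.score) best = { x, y, score };
        }
      }
      return best; // top-left corner of the best w×h crop
    }

On a real photo you'd probably blur or downsample the map first; raw per-pixel contrast is noisy, which is exactly why the bikini case above needs a face-detection term on top.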
But the difference is that these all run on the backend, while the JS version will probably run on the frontend.
I'm not sure what the right place for this would be, but the frontend has some advantages and quite a few disadvantages:
* It's only as fast as the device running the JS. In the mobile era, that probably means an order of magnitude slower than even the cheapest DO server.
* You can't rely on it, so if you want to ensure you receive, say, square images, you'll need to validate and re-crop on the backend anyway (see the sketch after this list).
* It's very error-prone. On a backend you can pin the version of ImageMagick or whatever other processing you use; on the frontend you rely either on parsing images in JS (very slow) or on the client's rendering (unstable in the sense that you don't know where it works, in what way, or whether that will stay so).
* It's distributed. My smartcropper for Ruby is heavy, memory-gobbling, and slow (partly the fault of bad code; it's long overdue for a refactoring). It's near impossible to run without some async-job system. A cropper on the frontend, by contrast, scales nicely across users, because each user brings their own processing power.
Edit: bullet-point markup, and realised a pro of doing this on the frontend.
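To illustrate the validate-and-re-crop point (a sketch in Node with the sharp library, chosen purely for illustration — the commenter's own stack is Ruby/ImageMagick):

    const sharp = require('sharp');

    // Never trust a client-supplied crop box: check it is square and
    // in-bounds, and fall back to a centered square crop if it isn't.
    async function ensureSquare(buffer, box) {
      const { width, height } = await sharp(buffer).metadata();
      const valid = box &&
        box.width === box.height &&
        box.x >= 0 && box.y >= 0 &&
        box.x + box.width <= width &&
        box.y + box.height <= height;
      const side = Math.min(width, height);
      const safe = valid ? box : {
        x: Math.floor((width - side) / 2),
        y: Math.floor((height - side) / 2),
        width: side,
        height: side,
      };
      return sharp(buffer)
        .extract({ left: safe.x, top: safe.y, width: safe.width, height: safe.height })
        .toBuffer();
    }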
As it happens, I'm working on an image-cropping frontend (using CropperJS [1]). I'm going to integrate this so that the initial crop selection is set from SmartCrop's results.
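Something like this could do it (a hedged sketch — worth checking against both libraries' docs, in particular that CropperJS's setData() and smartcrop's topCrop use the same natural-pixel coordinate space):

    const imageEl = document.querySelector('#crop-target'); // hypothetical element
    // Run SmartCrop first, then seed CropperJS's crop box with the result.
    smartcrop.crop(imageEl, { width: 400, height: 400 }).then(({ topCrop }) => {
      const cropper = new Cropper(imageEl, {
        aspectRatio: 1,
        ready() {
          // setData expects natural image pixels: {x, y, width, height}
          cropper.setData({
            x: topCrop.x,
            y: topCrop.y,
            width: topCrop.width,
            height: topCrop.height,
          });
        },
      });
    });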
I've actually been looking for something related. I need to quickly classify whether an image contains a face and also whether it contains any text. The former seems relatively straightforward, but I haven't found anything for detecting text, only OCR'ing it, which I don't need. Anyone seen anything like this?