Everyone in the ML field understands that ML-assisted upscaling doesn't produce output that accurately represents the original full-resolution image. It produces output that humans perceive to be realistic-looking, or free of traditional upscaling artifacts.
While this is obvious to anyone familiar with the technology, it's difficult to explain to casual observers. The image output looks real. It feels like it could be real. It's free of traditional scaling artifacts that would trigger suspicion. Without additional explanation, it's easy to see why casual observers would assume the hallucinated upscaled version is an accurate representation of the original image.
Historically, police sketches and blurry surveillance images have been obviously low-quality enough that people inherently know they're approximations. The problem with these ML-hallucinated upscaled images is that they look and feel real enough to bypass people's suspicions. We can try to present them as "Here's what the suspect might look like", but when they look like a full-resolution photograph, people will simply assume that's exactly what the suspect looks like.
One obvious problem is creating only one output face when there are many possible faces that match the low-res version. A tool that turns a low-res face into, say, 16 maximally-different upscaled versions doesn't suffer from the same level of false certainty.
You arbitrarily picked 16 possible photorealistic faces out of a total solution space of what? Millions?
Wouldn't the balance of probability be on someone in the general population of humans more closely resembling one of your 16 candidate images than any of your candidates resembling the Ground Truth image?
Doesn't the problem get both better and worse as you scale your N up from 16? That is, it would be better because one of your candidates is more likely to match the Ground Truth, but it would be worse because you've also widened your net for catching false positives?
I saw the blurry picture of Obama and my brain thought "I think that's Obama". When it was 'enhanced' it became a (white) person that I would never recognize as Obama. In fact I did not recognize them at all. The image may have been a more probable person overall, but clearly not a more probable famous person.
Which makes me wonder, can we train a model on only famous people, weighted by their relative famousness?
>You arbitrarily picked 16 possible photorealistic faces out of a total solution space of what? Millions?
Wouldn't the solution be to find 4 different axes and pick faces that represent the different endpoints? With a little psychology to help us identify which features are most informative to the public, we should be able to create a collection of photos that is more likely to result in someone identifying the suspect than either the single-photo option or the full photo-collection option. We would still need to test whether it is better than the single lower-quality photo option.
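A rough sketch of that idea, assuming you already have a pool of candidate latent codes whose decoded faces all match the low-res input when downscaled (the `latents` array, `decode`, and the choice of PCA are my own illustrative assumptions, not anything from the paper):

```python
import numpy as np
from sklearn.decomposition import PCA

def pick_representative_faces(latents, n_axes=4):
    """Project the candidate pool onto its main axes of variation and
    keep the faces at both extremes of each axis (up to 2 * n_axes)."""
    pca = PCA(n_components=n_axes)
    coords = pca.fit_transform(latents)
    picks = set()
    for axis in range(n_axes):
        picks.add(int(np.argmin(coords[:, axis])))  # one end of the axis
        picks.add(int(np.argmax(coords[:, axis])))  # the other end
    return sorted(picks)

# faces_to_show = [decode(latents[i]) for i in pick_representative_faces(latents)]
```

Whether those axes line up with features the public can actually use to recognize someone is exactly the part that would need the psychology testing.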
Exactly, and then it is up to your defense attorney to somehow argue that the AI just made up a face that happens to look like yours, and then explain that to 12 laymen who likely know nothing about how ML or computers work.
The company that develops this tech will want to market themselves as some sort of oracle to the people and LEO. Junk forensic science sticks in courts forever, we should be careful before we assent to more.
Considering that many models are actually trained with L1, L2, or "texture" losses, or a combination thereof, I am not sure I would say "humans perceive to be realistic looking". Then there are of course some GANs doing magic in lieu of coming up with an actual metric that evaluates "human-perceived difference".
A more correct view could be “It produces output that was trained on minimizing differences in the pixel/feature-space”.
My point being: minimizing actual human-perceived differences is seldom (if ever) done directly; instead, training relies on proxies or complex loss-function constructs that make sure no scientist has to actually deal with human observers and their preferences ;-)
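For the curious, the kind of proxy I mean usually looks something like this PyTorch-style sketch; the weights and the frozen-VGG "texture" term are typical choices in the super-resolution literature, not anything specific to this paper:

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen VGG features as a stand-in for "perceptual"/"texture" similarity.
vgg = torchvision.models.vgg16(
    weights=torchvision.models.VGG16_Weights.DEFAULT
).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def upscaling_loss(pred, target, w_pix=1.0, w_feat=0.1):
    pixel_l1 = F.l1_loss(pred, target)            # L1 in pixel space
    feat_l2 = F.mse_loss(vgg(pred), vgg(target))  # L2 in VGG feature space
    return w_pix * pixel_l1 + w_feat * feat_l2

# A GAN setup adds a discriminator term on top; none of these terms directly
# measure what a human observer would judge as "realistic".
```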
I would say the technology has some issues. For example, when you don't include enough black people in your training dataset, probably by simply not thinking about it, it can make the algorithm a bit racist. Example: https://twitter.com/Chicken3gg/status/1274314622447820801
It seems like a stretch to call the algorithm "racist". It's the humans with the bias here. It only seems racist because you are capable of recognizing the image on the left and you know he identifies as a black man. The point of the algorithm is that the photo on the right closely resembles, when downsampled, the photo on the left, and the photo on the right is a generative artifact that has characteristics of human faces. It only seems "racist" if you cherry-pick one example and don't look at all the other faces the algorithm generated.
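The constraint the algorithm actually enforces is roughly this downscale-consistency check (a sketch; `generated`, `lowres_input`, the tolerance, and the use of MSE are placeholders for illustration):

```python
import numpy as np
from skimage.transform import resize

def downscale_consistent(generated, lowres_input, tol=1e-2):
    """True if the generated hi-res face reproduces the low-res input
    when downsampled back to its resolution (images as floats in [0, 1])."""
    downsampled = resize(generated, lowres_input.shape[:2], anti_aliasing=True)
    return float(np.mean((downsampled - lowres_input) ** 2)) < tol
```

Many visually different faces can pass that check for the same low-res input, which is the whole problem.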
It takes one bad application of such technology to ruin a person's life.
We can't have a mindless algorithm take or assist decisions that can affect a person's life without extreme caution.
Not because people who make these decisions are never wrong but because they are held accountable.
These algorithms give false impression of realness and truth while being totally unaccountable.
Just look how Yann LeCun (who wasn't even accused of anything) immediately took to throwing the blame on the data and not the algorithm. His claim is just silly demagoguery, because with ML there's no real distinction between data and algorithm. But the point is that ML is dangerous because it makes it way too easy to pass the buck and do harm without accountability. LeCun just gave a great demonstration of that...
I think it's interesting in that it points out the range of faces that could have produced the pixelated version. You think it's Obama but on a certain objective metric it could just as easily have been anonymous guy. Why couldn't it be used to exonerate someone who supposedly appears in a blurry surveillance video?
Agreed; while the algorithm is novel, it is vulnerable to introducing false assumptions about the original image. For example, if you were to use this as a method for determining suspects from low-quality security camera footage, you may generate suspect images that are biased or completely false.
Sounds like it's time for anti-racist movements to start pressuring the ML community to start introducing and using datasets which aren't biased towards white people.
Though, things like black people being more poorly recognized by facial recognition can be a blessing in disguise, depending on the circumstances.
Let's say the training dataset has the same proportion of black people as their proportion of the population.
That, by itself, would bias the ML, since there are fewer images of them. But it's not an indication of bias on the part of the people programming it.
But this would imply you need huge datasets for every minority, no matter how small a proportion of the population.
It might instead be necessary to teach the model about race as a concept, so it can categorize images, and then process them correctly. But of course that leads to a different can of worms since you are explicitly making a "race aware" ML.
It's not clear to me why this matters for a research paper. And the algorithm isn't racist - perhaps the training set is.
If I want to work on a project like this, and the only appropriate training dataset I have is photographs of white males, am I not allowed to work on generating faces until I've fleshed out the dataset?
It's not like they are offering some service to the general public where they can de-pixelate faces. It's a research paper.
A final note: the only professor with her name on the paper (Cynthia Rudin) is very active in researching the intersection of machine learning and social justice. I'm not so sure she would put her name on a paper that can be flippantly described, in three-sentence comments on internet forums, as having "some issues" wrt race.
It would be deeply wrong to use such an algorithm to add detail to blurry CCTV images in order to incriminate someone. I don't think every police force, security firm, security agency etc will agree with me on this and that concerns me. Police in my country are already using live facial recognition in some areas.
It would be deeply useless. It produces plausible wrong answers. Those are the worst kind of wrong answers, because they are red herrings with statistical near certainty, and yet convincing enough to generate bias.
I think the police would, if they are doing their jobs (a big "if") be completely uninterested in an algorithm that is the direct equivalent of an unreliable eyewitness confabulating a plausible face for a photofit.
With a large enough database of potential faces (see VKontakte), you might be able to use this tool to upscale blurry images and match them to a short list of candidates. Other intelligence could then lead you to the actual person in the blurred image. Scary implications.
It might be less true for pixelated videos. I know some IR camera companies use very small movements of the camera to compute the picture at a higher resolution. And I think the first picture of a black hole also used a kind of this technique.
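That multi-frame trick genuinely adds information, because the small sub-pixel shifts between frames sample the scene at different positions. A naive shift-and-add sketch of the idea (the `frames` list, the 2x factor, and the nearest-pixel accumulation are my own simplifications):

```python
import numpy as np
from skimage.registration import phase_cross_correlation

def shift_and_add_sr(frames, factor=2):
    """Naive multi-frame super-resolution: register each low-res frame
    against the first one, then accumulate onto a finer grid."""
    ref = frames[0]
    h, w = ref.shape
    acc = np.zeros((h * factor, w * factor))
    count = np.zeros_like(acc)
    for frame in frames:
        # Estimated sub-pixel shift of this frame relative to the reference.
        shift, _, _ = phase_cross_correlation(ref, frame, upsample_factor=10)
        dy, dx = shift * factor
        ys = (np.arange(h) * factor + int(round(dy))) % (h * factor)
        xs = (np.arange(w) * factor + int(round(dx))) % (w * factor)
        acc[np.ix_(ys, xs)] += frame
        count[np.ix_(ys, xs)] += 1
    return acc / np.maximum(count, 1)
```

Note this only works because there are multiple independent observations; a single pixelated frame doesn't give you that.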
Upsampling used to be pitched in papers as a way to add plausible information to images so they don't look so degraded.
Recently, however, there's been a lot of work pitching upsampling for "deblurring" faces, which seems like a great way for LEO to run the programme until they hit on a face that most likely looks like a person of interest.
Hell, you can even make it explicitly so that the net takes two inputs, the blurred image and a suspect's image, and generates a plausible upsample that is similar to the suspect.
That sounds like bad-faith science that no judge would accept, but perfectly legitimate technologies like DNA testing have historically been abused this way in the courts.
> For starters, Rudin said, “We kind of proved that you can’t do facial recognition from blurry images because there are so many possibilities. So zoom and enhance, beyond a certain threshold level, cannot possibly exist.”
Instead of blurring faces in photos of protests, Google Street View, etc., blur them and then upscale them, so they don't have the jarring blur effect but are still anonymized.
Right. As long as the number of pixels remains the same, it should be possible to remove Gaussian blur almost completely. The information isn't lost; it is still there, just smeared out over a larger area.
Also, I don't understand why the first Mona Lisa result is a picture that, when pixelated again, wouldn't produce the original pixelated picture. It is as if they create a face inspired by the original, but not one that could ever be the original.
Not really. The images aren't pixelated as an effect - it's just a representation of the actual number of pixels taken from the ground truth.
You can just as well take these pixels and apply any kind of blur filter - you still wouldn't retain any more information. If you go from say 1k x 1k pixels down to 100 x 100 pixels, you end up with 1% of the original information, no matter what you do.
I think they may have been thinking of just blurring rather than actually lowering the amount of pixels. A Gaussian blur doesn't really remove as much information as you might expect. Although it's very sensitive to noise.
It's information theory. No matter what blur technique you use, there's only so much information in the pixels. You can't produce more information by magic, or by technology so advanced that it is indistinguishable from magic (h/t Arthur Clarke).
Well, I'm suggesting that, for example, pixelation where you change a 10x10 block of pixels into a single 10x10 block of one color... is very lossy. A blur with a known algorithm (Gaussian, for example) can be somewhat reversed. It's not as lossy.
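Roughly what I mean, as a sketch (assuming a Gaussian blur with known sigma and negligible noise; real noise makes this much harder):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import data, restoration

# Blur a test image with a known Gaussian, then try to undo it.
image = data.camera() / 255.0
blurred = gaussian_filter(image, sigma=3)

# Build the matching Gaussian point-spread function.
psf = np.zeros((25, 25))
psf[12, 12] = 1.0
psf = gaussian_filter(psf, sigma=3)
psf /= psf.sum()

# Wiener deconvolution recovers much of the detail; block pixelation
# into flat 10x10 cells could not be undone this way.
restored = restoration.wiener(blurred, psf, balance=0.01)
```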
You seem to have misunderstood what the article is about.
It's not about turning a shitty image into a nice, clean one. It's about turning a low-res image into a hi-res version, i.e. you start with 100 x 100 pixels (it doesn't matter how you obtained them - it could be a section of a much bigger image, for example) and try to extrapolate a 1k x 1k pixel version from it.
The pixelation you see in the examples is just a representation of what little information you have to work with. It's NOT in any way, shape or form related to where you got these pixels from in the first place.
Just to give you a different context here: imagine this upscaling being used to "enhance" a single face in an image like this [1] - there's no way "Gaussian blur" or whatever filter you'd like gets you more information out of that.
In the example image I created, I applied a Gaussian blur to the "pixelated" (i.e. enlarged) version of the marked image section.
As you can see, enlarging the same section using super-sampling (i.e. similar to what you proposed) doesn't change the information content, and one version can basically be transformed into the other.
No, I understood. The article gives several different examples of "blurry", and they aren't all the same. Fixing motion blur is one example of something where you aren't just completely guessing and "inpainting".
Can such a model be used to compare a real picture of a suspect with a low res picture and produce an objective measure of similarity (i.e. not only raw pixel similarity but a similarity measure that actually takes into account how actual human faces downscale, possibly resilient to minor rotations?)
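Something along these lines could be a starting point; this is only a sketch of the idea (downscale the suspect photo the same way the low-res image was produced, brute-force a few small rotations, and score the match in the low-res space), with every name and parameter made up for illustration:

```python
import numpy as np
from skimage.transform import resize, rotate

def lowres_similarity(suspect_img, lowres_img, angles=(-5, 0, 5)):
    """Score how plausibly suspect_img could be the source of lowres_img
    (grayscale floats in [0, 1]); higher is better."""
    h, w = lowres_img.shape[:2]
    best = -np.inf
    for angle in angles:
        candidate = rotate(suspect_img, angle, mode="edge")
        candidate = resize(candidate, (h, w), anti_aliasing=True)
        # Normalised cross-correlation in the low-res space.
        a = (candidate - candidate.mean()) / (candidate.std() + 1e-8)
        b = (lowres_img - lowres_img.mean()) / (lowres_img.std() + 1e-8)
        best = max(best, float((a * b).mean()))
    return best
```

Whether the resulting score means anything forensically is another question; per the quote above, there are simply too many faces that map to the same low-res pixels.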
The increasing politicization of the machine learning field makes me grow tired. It makes me want to work on things without political implications. Something like electrical engineering. Does anyone else feel this way?