Their explanation for it:
"The lines of letters have been recovered quite well due to the existence of cross-scale patch recurrence in those image areas. However, the small digits on the left margin of the image could not be recovered, since their patches recurrence occurs only within the same (input) scale. Thus their resulting unified SR constraints reduce to the “classical” SR constraints (imposed on multiple patches within the input image). The resulting resolution of the digits is better than the bicubic interpolation, but suffers from the inherent limits of classical SR [3, 14]."
So it's guessing based on the larger characters. Neat.
Here's their last line:
Here's the actual:
(from what I can read.)
( http://1.bp.blogspot.com/-VgvutrSWaFk/T4VI-2tDH2I/AAAAAAAAAX... )
This could probably help OCR.
Other than this, I think Genuine Fractals and BenVista PhotoZoom give similar results.
Essentially it is looking at the large letters (or pieces of them) to guess how the small letters should look.
And as JTxt notes above, sometimes it chooses the wrong large letter. Even on the third-to-last line it couldn't get one of the letters correct. The output looks like "HKO" while the actual chart has "HKG" -- but since there is no larger "G" for the algorithm to use as an example, it ended up with a different but similar shape. This could probably be improved by the other SR techniques they mention that use libraries of sample images.
[...] there is plenty of patch redundancy within a single image L. Let p be a pixel in L, and P be its surrounding patch (e.g., 5 × 5), then there exist multiple similar patches P1,...Pk in L (inevitably, at sub-pixel shifts). These patches can be treated as if taken from k different low-resolution images of the same high resolution “scene”, thus inducing k times more linear constraints (Eq. (1)) on the high-resolution intensities of pixels within the neighborhood of q ∈ H (see Fig. 3b). For increased numerical stability, each equation induced by a patch Pi is globally scaled by the degree of similarity of Pi to its source patch P. [...]
[...] Assuming sufficient neighbors are found, this process results in a determined set of linear equations on the unknown pixel values in H. Globally scale each equation by its reliability (determined by its patch similarity score), and solve the linear set of equations to obtain H.
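The weighting step they describe, scaling each patch-induced equation by its similarity score before solving, can be sketched in a few lines of numpy (a toy illustration of the idea, not the paper's code; the function name and the example system are mine):

```python
import numpy as np

def solve_weighted(A, b, weights):
    """Scale each equation (row) by the reliability of the patch that
    induced it, then least-squares solve.  A row scaled by s effectively
    gets weight s**2 in the squared-error objective."""
    s = np.asarray(weights, dtype=float)[:, None]
    return np.linalg.lstsq(A * s, b * s.ravel(), rcond=None)[0]

# Three constraints on two unknown high-res pixel values; the third
# comes from a less similar patch, so it gets a lower weight.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.5])
x = solve_weighted(A, b, weights=[1.0, 1.0, 0.2])
```

With many overlapping patches, the real system has one row per low-res pixel constraint, but the solve is the same shape.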
Then they also add patches at different scales by making a few more versions of the image, scaled down by various amounts, and comparing the target patch against patches in those images too. If a good match is found, they can then use the higher-resolution original patch that yielded the match when it was shrunk.
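A rough sketch of that cross-scale search (my own toy code; a naive box-filter downscale stands in for the paper's blur-and-subsample, and the brute-force loop stands in for a real nearest-neighbor search):

```python
import numpy as np

def downscale(img, factor):
    """Naive box-filter downscale by an integer factor."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    return img[:h, :w].reshape(h // factor, factor,
                               w // factor, factor).mean(axis=(1, 3))

def best_cross_scale_match(img, y, x, size=5, factor=2):
    """Search a downscaled copy for the patch centered at (y, x); return
    the SSD and the top-left corner of the matching patch's parent
    region in the original image (the 'high-res example')."""
    small = downscale(img, factor)
    h = size // 2
    ref = img[y-h:y+h+1, x-h:x+h+1]
    best = None
    for yy in range(h, small.shape[0] - h):
        for xx in range(h, small.shape[1] - h):
            cand = small[yy-h:yy+h+1, xx-h:xx+h+1]
            d = np.sum((ref - cand) ** 2)
            if best is None or d < best[0]:
                best = (d, yy, xx)
    d, yy, xx = best
    # The matching low-res patch corresponds to a (size*factor)-sized
    # region of the original image, which supplies the missing detail.
    return d, (yy - h) * factor, (xx - h) * factor

img = np.add.outer(np.arange(12.0), np.arange(12.0))   # toy gradient image
dist, y0, x0 = best_cross_scale_match(img, 5, 5)
```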
The M first shows up on the third line from the bottom, not very clearly, but the algorithm tries to reconstruct it; it then uses that M to replace the M on the second line from the bottom, and finally it guesses that the W (known only as a 4x3-pixel blur on the bottom line) is an M.
It's also interesting that the bottom M is skewed out a little at its top-right corner to match the fuzzy W shape.
>Fig. 6 [the eye chart] compares our unified SR result against ground truth. In Fig. 7 [baby with hat] we compare our method to results from [11,13] and . Note that our results are comparable, even though we do not use any external database of low-res/high-res pairs of patches [11, 13], nor a parametric learned edge model . Other examples with more comparisons to other methods can be found in the paper’s website.
But their bottom line is mostly correct because their process also uses larger versions of the same pattern to replace the smaller ones.
There was no much larger T, so it made a little fuzzball instead. That's why I wrote "?".
Maybe that's what it outputs when it isn't sure?
Most real-world OCR, though, deals with single-scale fonts, and this reconstruction technique may have limited applicability.
Please apply this technique to resolve a historically important question about the assassination of President John F. Kennedy by identifying the license plate number of the car immediately behind the bus:
Better resolution photos should be available since the ones I've seen printed in books are better than these. And there are plenty of photos of Texas license plates from 1963 with which to seed your algorithm.
The reason this is interesting is that Roger Craig, a Deputy Sheriff in Dallas who witnessed the assassination, said that he saw Lee Harvey Oswald run from the Texas School Book Depository and get into a Nash Rambler station wagon driven by another man.
These photos show a Nash Rambler station wagon that was indeed passing the Depository just after the assassination. If motor vehicle records still exist for 1963, then its owner can be identified.
The key point is that the existence of a person picking up Oswald after the assassination would strongly indicate the possibility that he was not acting alone.
I'll readily grant that this isn't likely to settle the issue, but I think it still would be amazing if the super-resolution approach was able to generate new information about a topic that no one expects to ever see new information about.
I'm pretty sure that if you seeded the algorithm with images of many other Texas license plates, and then asked it to sharpen this image, what you'd get would simply be the super-resolution sum of all the other Texas license plates. It would show you something that looked intelligible, but actually had absolutely nothing to do with the original image. That would do more to confuse the issue than to shed light on it.
The other point is that I'm not sure how well this would actually work on blurry images, or whether (as a super-resolution technique) it's very specialised for working on pixelated images.
That said, if it can work on blurry images too, then it would be a fun and interesting exercise to see what other historical images could yield insight (or confusion) with this kind of enhancement.
I do not think so. That sum would not look like any other Texas license plate. Instead, it would locally pick the best-fit parts for whatever data it can find in the image. Their algorithm presumably works at different scales; the better it can mix information across scales, the smaller the risk that those local best fits are inconsistent with each other. For example, the left of a letter might look like a b while the right looks like a q; if the algorithm does not look at scales of about the width of a character, it could easily produce something that combines the two into one. Looking at that Snellen chart example, I think I see something like that in the tops and bottoms of some of the characters (the last line should probably read D K N T W U L J S P X V M R A H C F O Y Z G; see http://guereros.info/img.php?fl=p5r4p4m4d4n2t24606s4t2m416d4...)
I think that, if there is sufficient data in the image, chances are good (but not necessarily 100%) that the result will be a sharp representation of the real license plate. Chances are also good that it would remove that barely visible dead fly from the plate. If there is insufficient information, it would pick a valid license plate that best fits the data. It would be nice if the algorithm also computed some validity estimates (a bit like an alpha mask: 'my confidence in predicting this pixel is x%').
If instead you printed the document out and blurred it with a wet sponge, it would be much harder to reverse, because the real world is much noisier.
Similarly, an analog photo at the limit of its resolution is full of real noise that's hard to remove.
It doesn’t use time and space travel to reach back into the original scene and pull data that wasn’t captured by the camera. Instead, it guesses at what the blown up image should look like based on an examination of other features already in the image, with some expectation that shapes and textures will be similar in different parts of the image and at different scales – will have a sort of fractal nature – and by aiming to make shapes with edges instead of fuzzy boundaries.
Using these algorithms can’t reveal the precise content of a license plate that was too small in the original image to see the letters of.
I did mention that better-quality photos are available. And I did show that there are at least 2 photos of the same license plate -- there are probably more. In the photos printed in books, the license plate looks like the last couple of lines of the Snellen eye chart in the original article -- i.e., it's quite blurry, but the super-resolution method was able to resolve it.
> it guesses at what the blown up image should look like based on an examination of other features already in the image, with some expectation that shapes and textures will be similar in different parts of the image and at different scales
It is true that there are no license plates at different scales in the same image. But are you quite certain that the technique cannot be applied across a range of photographs? That is, couldn't photos of license plates of the same type (and hence with exactly the same font) at different scales taken by the same photographic process on the same day in the same conditions have the same "fractal nature" that could be exploited to improve the resolution of the target photo?
Let me put it another way: Suppose I cut up the Snellen eye chart into 6 equal pieces. Now I have 6 photos. Clearly, 5 of those photos can be used to improve the resolution of the 6th. Is there some principle at work here that says that for super-resolution to work, all data must come from a single original photo?
If you look carefully, it’s actually possible to make meaningful guesses at the letters in the eye chart just as well in the bicubic version as in the “SR” version.
Information cannot be created from nothing. The algorithm sees that the chart seems to repeat a similar image over and over, so it guesses the best fit for the images on the bottom lines. Those guesses might even be completely wrong; for example, is it really a D and not an O?
He is not asking for magic, and it's obviously not "creating information from nothing". No one here is stupid. It's "matching low-density patterns to correlated high-density ones", which you already know if you read the article. Maybe there isn't enough data at all to do it - I don't have access to these images to know better, but that doesn't make the idea any less reasonable.
For instance, if they used this to "enhance" a picture with a piece of paper in someone's pocket, these letters would be enhanced based on other forms somewhere else in the picture -- so some vague letter shapes might become 'Burger King' just because the photo also contains a Burger King logo somewhere :)
For example, if what you're really trying to do is decode text, this algorithm will lose every time to one that knows the prior probabilities of natural human languages. ("ONLY" is much more probable than "QNLY", but this algorithm doesn't know that.)
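The kind of prior that would win here can be sketched with a toy letter-bigram model (the corpus and the smoothing are illustrative stand-ins for a real language model):

```python
from collections import Counter
import math

# Tiny corpus standing in for a real language model.
corpus = "only the lonely know the way only fools rush in"

def bigram_counts(text):
    text = "^" + text.replace(" ", "^") + "^"   # ^ marks word boundaries
    return Counter(zip(text, text[1:]))

def score(word, counts, alpha=1.0):
    """Add-one-smoothed log-probability of the word's letter bigrams:
    a crude stand-in for a real language prior."""
    w = "^" + word + "^"
    total = sum(counts.values())
    vocab = 27 * 27          # 26 letters plus the boundary mark
    return sum(math.log((counts[(a, b)] + alpha) / (total + alpha * vocab))
               for a, b in zip(w, w[1:]))

counts = bigram_counts(corpus)
# Under even this toy prior, "only" scores far higher than "qnly",
# because the bigrams "^q" and "qn" never occur in the corpus.
```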
Yeah, and in fact the crisper look can be misleading. You can't rely on a detail that a method like this infers from the low-resolution image. Look at the green tiles on the kitchen floor in the 6th example. The "Our SR Result" image looks fantastic, but it's obviously false, because the tiles end up misshapen and unevenly spaced. You really wouldn't want to convict someone of murder based on this type of enhancement.
Statistics is not "luck"; it's possibilities.
So it can't be relied upon _totally_, but, as it is automated, it can be relied upon _MORE_ than mere guessing at the fuzzy picture.
Humans are easily fooled. They will look at a fuzzy picture and see possibilities, but they will look at the output of this algorithm and see fact. There are other algorithms that predict the possibility of objects (usually letters) producing an image that are more suited for investigative work.
As a general rule of thumb, the performance shown in example figures of a computer vision paper should be taken as best cases (and in some papers, as outliers), and even the "failures" shown are often not the worst or most typical. Similarly, quantitative results are generally as optimistic as possible, arrived at through "graduate student descent" of the parameter space.
So I think we are still quite far from a super-resolution method that is truly practical OR effective.
Some specific points about this paper:
- This research group has a long history of exploring methods based on exploiting self-similarity in images for various tasks (super-resolution, denoising, segmentation, etc.), and although they have shown remarkable progress on this front, it is generally agreed that using more than just a single image would improve results drastically for just about any task.
- The use of more-than-exhaustive self-similarity search is EXTREMELY expensive, however, and that's why runtime is often not mentioned in these papers. It's not uncommon for processing times to be on the order of DAYS, for a single tiny VGA image. (I don't remember whether they quote a number in this paper or not, but it's certainly not faster than a few hours per image, unless you start using some hacks.)
- As others have commented, this method is not actually extracting new information from the image, but rather "hallucinating" information based on similar parts of the image. There is certainly reasonable justification for its use as a prior, but it's not clear whether it's optimal, or even close to it.
- My gut feeling is that super resolution methods will find much more application in compression-related areas rather than CSI-esque forensics. For example, JPEG typically compresses an image to roughly 30% of its original size. But downsample an image by 2x in each direction and you're already at 25% of the pixels. So if you can downsample images a few times and then use super-resolution to "hallucinate" a high-resolution version, that's probably good enough for the vast majority of common images (e.g., facebook photos), where users aren't very sensitive to quantitative "correctness". And of course with mobile adoption happening much faster than mobile bandwidth is increasing, this becomes an ideal application domain.
that is a wonderful phrase :) going to keep it in mind :)
Implementation in Python: http://mentat.za.net/supreme/
Not really. Dig into concepts called sparsity and compressed sensing, which rest on quite similar ideas (not the same algorithms, but the same underlying principles).
I can't find the authors' original website, only this old wired article from 2010: http://www.wired.com/magazine/2010/02/ff_algorithm/2/
So especially with the ongoing miniaturization of imaging sensors, which lets cameras capture bigger and bigger pictures, there are problems these algorithms could solve, unless SD cards get really cheap really fast, I guess.
Genuine Fractals (now called Perfect Resize)
The question is how fast k-NN is on such high-dimensional data.
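For a sense of scale: a brute-force k-NN over 5x5 patch vectors is easy to vectorize (a toy benchmark of my own; in 25 dimensions k-d trees prune poorly, which is exactly why this question matters):

```python
import numpy as np
import time

rng = np.random.default_rng(0)
patches = rng.standard_normal((20000, 25))   # 5x5 patches as 25-d vectors

def knn_brute(query, data, k=5):
    """Exact brute-force k-NN: O(n*d) per query, but fully vectorized."""
    d2 = np.sum((data - query) ** 2, axis=1)
    idx = np.argpartition(d2, k)[:k]          # k smallest, unordered
    return idx[np.argsort(d2[idx])]           # then sort those k

t0 = time.perf_counter()
nn = knn_brute(patches[0], patches)
print(f"5-NN over {len(patches)} patches took "
      f"{time.perf_counter() - t0:.4f}s")
# In 25 dimensions, tree structures gain little over this; approximate
# methods (LSH, PCA-reduced search) are often the practical answer.
```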
What do you think is gained by starting with the new result?
(In CS terms, this is akin to comparing your algorithm to something using a bubble sort, and ignoring the invention of n log n sorting algorithms)
And, with that, I'm bracing myself to get schooled.
Edit: This reminds me of Iterated Systems' proprietary fractal lossy image compression algorithm they tried to commercialize in the 90s. It could decompress to a larger-scale image, introducing synthetic detail that was often convincing to the eye. Notice how this article talks about different scales.
The point is that they should be comparing to sinc. They're comparing to a known-stupid algorithm to make themselves look better than they are.
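For reference, "comparing to sinc" means comparing against ideal band-limited (Whittaker-Shannon) reconstruction; a minimal 1-D numpy sketch of what that baseline looks like:

```python
import numpy as np

def sinc_upsample_1d(samples, factor):
    """Ideal (Whittaker-Shannon) reconstruction of a 1-D signal,
    evaluated on a grid `factor` times denser than the samples."""
    n = len(samples)
    t = np.arange(n * factor) / factor          # fine grid, in sample units
    k = np.arange(n)                            # original sample positions
    # Each output point is a sinc-weighted sum of all input samples.
    return np.sinc(t[:, None] - k[None, :]) @ samples

fine = sinc_upsample_1d(np.array([1.0, 2.0, 3.0]), 2)
```

On real 2-D images one would use a windowed, separable version (e.g. Lanczos), but this is the principled baseline a new SR method should beat, not bicubic.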
Sadly, this is usually the first and last time we see the technology in question. These techniques do not seem to produce any impact that could increase our quality of life; they just sit on some dusty shelves somewhere.
The low-res texture stuff is widely used in emulators (both mobile and on PCs) - after all, that's exactly where it was meant to be used.
The seam carving (the "least interesting line removal" thing) is now a major feature of Photoshop (called Content-Aware Scale) and has become really well integrated (I hear there are attempts to do it in video as well).
The fact that you don't see this tech in your line of work doesn't mean it's not out there.
It effectively eliminates the least interesting line when performing a resize, allowing you, for example, to resize a skyline by eliminating the uninteresting smaller buildings and gaps.
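The dynamic-programming core of seam carving fits in a short sketch (my own toy grayscale version; the gradient-magnitude energy is a common simple choice, not necessarily what Photoshop uses):

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove one lowest-energy vertical seam (the 'least interesting
    line') from a grayscale image, shrinking its width by 1."""
    img = np.asarray(img, dtype=float)
    # Energy: gradient magnitude; uninteresting regions have low energy.
    energy = np.abs(np.gradient(img, axis=0)) + np.abs(np.gradient(img, axis=1))
    h, w = img.shape
    # DP table: cost of the cheapest seam ending at each pixel.
    cost = energy.copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y-1, :-1]]
        right = np.r_[cost[y-1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y-1]), right)
    # Backtrack the seam from the cheapest bottom pixel upward.
    out = np.empty((h, w - 1))
    x = int(np.argmin(cost[-1]))
    for y in range(h - 1, -1, -1):
        out[y] = np.delete(img[y], x)
        if y:
            lo = max(x - 1, 0)
            x = lo + int(np.argmin(cost[y-1, lo:min(x + 2, w)]))
    return out
```

Repeating the call shrinks the image one seam at a time, which is why the skyline example works: the seams thread through the low-energy sky and gaps rather than through the buildings.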
There may be others but I haven't run across them.