
Page Dewarping - pietrofmaggi
https://mzucker.github.io/2016/08/15/page-dewarping.html
======
kazinator
This strikes close to home for me, because I've done this "by hand" quite a
number of times, using some interactive Gimp filters, with very good results.
I've been able to take a perspective photo of a curved page lying on a desk,
get it almost perfectly straight, and threshold it to clean black and white or
sharp gray scale.

Here is my very quick and dirty manual job of the same example page:

[http://www.kylheku.com/~kaz/dewarp.png](http://www.kylheku.com/~kaz/dewarp.png)

Literally less than five minutes.

First I cropped the image. Then duplicated the layer. Blurred the top layer
(Gaussian, 50 radius). Then flipped to Divide mode and merged the visible
layers. This leveled the lightness quite well, almost completely eliminating
the shadow over the right side of the page and all other lighting differences.
There is a hint of the edge of the shadow still present because it is such a
sharp contrast; but that can be eliminated in an adjustment of the intensity
curves. In such cases it may be helpful to experiment with smaller blur radii,
too.
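
For anyone who would rather script this leveling step than click through it, here is a minimal NumPy sketch of the divide-by-blurred-copy idea. It is an illustration under assumptions, not the exact Gimp pipeline: a box blur stands in for the Gaussian blur, the image is assumed to be grayscale floats in [0, 1], and `box_blur` and `flatten_illumination` are names made up for this example.

```python
import numpy as np

def box_blur(img, radius):
    """Blur a 2-D float image with a (2*radius+1)-wide box, replicating edges."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row/column of zeros, so that every
    # k-by-k window sum becomes four lookups.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    window_sum = (integral[k:, k:] - integral[:-k, k:]
                  - integral[k:, :-k] + integral[:-k, :-k])
    return window_sum / (k * k)

def flatten_illumination(gray, radius=50):
    """Divide the image by a heavily blurred copy of itself.

    The blurred copy approximates the lighting (paper brightness) at each
    pixel, so the quotient is close to 1.0 on paper and well below 1.0 on ink.
    """
    gray = gray.astype(np.float64)
    background = box_blur(gray, radius)
    # Guard against division by zero in fully black regions.
    return np.clip(gray / np.maximum(background, 1e-6), 0.0, 1.0)
```

The radius is a trade-off: it should be large compared to the stroke width of the text so that the blurred copy follows the lighting rather than the letters, but a smaller radius follows sharp shadow edges more closely.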

I then did a perspective transform in the lateral direction, squeezing the
left side top-bottom and expanding the right, resulting in the warp now being
approximately horizontal. (The perspective transform is not just for adding a
perspective effect; it is also useful for _reversing_ perspective!)
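
Reversing perspective amounts to estimating the 3x3 projective matrix (homography) that sends the four corners of the skewed page to the corners of a rectangle. A minimal sketch, assuming NumPy; the function names are hypothetical and the corner coordinates in the usage lines are invented for illustration:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for the 3x3 matrix H that maps four src points onto four dst points.

    Each correspondence (x, y) -> (u, v) gives two linear equations in the
    eight unknown entries of H (the ninth entry is fixed to 1).
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, points):
    """Map an (N, 2) list of points through H, including the perspective divide."""
    homogeneous = np.column_stack([points, np.ones(len(points))]) @ H.T
    return homogeneous[:, :2] / homogeneous[:, 2:3]

# Invented example: the left edge of the page appears shorter than the right.
photo_corners = [(0, 20), (100, 0), (100, 100), (0, 80)]
flat_corners = [(0, 0), (100, 0), (100, 100), (0, 100)]
H = homography_from_points(photo_corners, flat_corners)
```

This only maps points. OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` cover the same ground and also resample the pixels, which this sketch leaves out.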

Finally, I used the Curve Bend (with its horrible interactive interface and
awful preview) to warp in a compensating way. Basically, the idea is to draw
an upper and lower curve which is the opposite of the curve on the page. I
made two attempts, keeping the results of the second.

If the preview of this tool wasn't a ridiculous, inscrutable thumbnail, it
would be possible to do an excellent job in one attempt, probably close to
perfect.

Because the page is evenly light thanks to the divide-by-blurred layer trick,
it will nicely threshold to black and white, or a narrow grayscale range.

~~~
mzucker
Hence the tagline on the blog: "Why do it by hand if you can code it in just
quadruple the time?" :)

~~~
digi_owl
More work up front, less work afterwards.

------
cooper12
I love well-illustrated writeups. Even a reader without mathematics or
programming knowledge can understand what steps the author took. His model
actually seems to better represent the warped paper than a cylinder would.
(though I don't know the actual specifics of the CTM model)

I wish he went into more details on the steps taken after dewarping. You can
tweak the image levels to get good contrast, but surprisingly there aren't any
shadows from underleveling or loss of detail from overleveling. I wonder if
the author ran OCR on the scans after, and speaking of OCR, IIRC Leptonica is
one of the dependencies of Tesseract so it must do some similar
preprocessing.

Edit: reading more carefully, he mentions that he used adaptive thresholding
from OpenCV.
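
To illustrate why adaptive thresholding avoids the shadow and detail-loss problems of a single global level: each pixel is compared against the mean of its own neighbourhood rather than against one cutoff for the whole page. The sketch below is a mean-based variant in plain NumPy, along the lines of `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C`. Which variant and parameters the article used is not stated in this thread, so `block_size` and `offset` are assumed values, and the image is assumed to be grayscale floats in [0, 1].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def adaptive_threshold(gray, block_size=11, offset=0.05):
    """Return True for paper and False for ink.

    A pixel counts as paper if it is brighter than the mean of the
    block_size-by-block_size window around it, minus a small offset.
    block_size must be odd so the window is centred on the pixel.
    """
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    gray = gray.astype(np.float64)
    pad = block_size // 2
    padded = np.pad(gray, pad, mode="edge")
    windows = sliding_window_view(padded, (block_size, block_size))
    local_mean = windows.mean(axis=(2, 3))
    return gray > local_mean - offset
```

On a page that is darker on one side, a global cutoff has to choose between turning the dark side of the paper black and losing faint text on the bright side; the local comparison sidesteps that choice.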

------
niftich
Recently, Dropbox wrote about dewarping prior to OCR in their app:
[https://news.ycombinator.com/item?id=12297944](https://news.ycombinator.com/item?id=12297944)

This code had the same idea, and is open-source!

~~~
marcusjt
No, what this code does is much more sophisticated - Dropbox do not dewarp
(i.e. remove non-linear distortions) they only transform the image to make the
document rectilinear (leaving the distortions/deformations intact), which is
much simpler.

------
troymc
I use Microsoft's "Office Lens" app on my Android phone all the time, like a
"smart camera" which automatically squares off and white-balances each photo
of a page (usually mail or forms filled in by hand). It can't handle warped
pages though, so I hope they add something like this!

~~~
claar
Yes, Office Lens is great for that. I recently switched to CamScanner, which
has an even better algorithm for deskewing pages (take a pic of a receipt at
an odd angle, it automatically transforms it to flat and rectangular). Pro
version was on sale -- I'm not affiliated, just a happy customer, and it's
highly relevant to this article.

~~~
reitanqild
I have it as well, but use Office Lens or Google Drive (create a shortcut to a
folder on home screen of android device and use it to scan into it).

The reason is CamScanner tried to upsell me to some monthly plan.

For all good advice on HN that you should build recurring revenue: It
seriously annoys me when people try to do that by demanding monthly payments
for static features.

(Totally OK with selling license keys for new features etc.)

~~~
reitanqild
Update: CamScanner is now less annoying than what I used to remember. I might
actually be switching back.

------
peterjmag
_it came in handy whenever a student emailed me their homework as a pile of
JPEGs._

Gotta admire this guy's resourcefulness—and patience. If I were a professor,
I'd probably just reject the assignment outright if a student sent me a bunch
of photos from their smartphone in lieu of a PDF or a "proper" scan. :)

~~~
vidarh
Reject it, in the form of a picture of a handwritten note on a print-out of
his jpegs.

------
mzucker
Thanks for the feedback, everyone -- happy to answer questions here or in the
Disqus comments on my blog.

~~~
tomn
This is really interesting, thanks!

If you decide to try to make this faster, check out ceres[0], a non-linear
least squares optimisation framework that does automatic differentiation using
a clever C++ template hack.

I've used it a few times to solve these kinds of problems and found it to be
very good!

[0] [http://ceres-solver.org/](http://ceres-solver.org/)
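
For readers wondering what the template trick buys you: Ceres evaluates the cost function on "jets", which are dual numbers carrying a value together with its derivative, so exact derivatives fall out of ordinary arithmetic. Below is a toy, single-variable Python sketch of that idea. It illustrates forward-mode automatic differentiation in general, is not Ceres's API, and only supports `+` and `*`; `Dual` and `derivative` are names chosen for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A number of the form value + deriv * eps, where eps * eps == 0."""
    value: float
    deriv: float = 0.0

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their derivative part is zero.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (a*b)' = a*b' + a'*b
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __radd__ = __add__
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative part with 1."""
    return f(Dual(x, 1.0)).deriv
```

The function being differentiated is written once, as ordinary arithmetic, and works on both floats and `Dual` values. In C++ the same effect comes from templating the cost function on its scalar type.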

~~~
mzucker
Yep, I'm still waiting to use ceres for something - I didn't end up using it
on my image approximation project
[https://mzucker.github.io/2016/08/01/gabor-2.html](https://mzucker.github.io/2016/08/01/gabor-2.html)
because it doesn't work well with inequality constraints.

~~~
tomn
Ah yes, only supporting constraints on the parameters can be annoying if your
problem requires that.

------
anilgulecha
Here's something that I think has not been done, but could be quite lucrative:
building a high resolution scanner using the phone camera, multiple pictures
and interpolation/noise removal.

Most phone cameras these days have good resolutions, and you could technically
take a 6x4 photo, divvy it to 3x3 grid and take close up photos, and have
smart algorithms interpolate the pixels to form a single image with high res.
I'd even bet you'd get results equal to or better than a flat bed scanner.

For better UX, just open the camera preview and slowly pan over the image.

Has someone tried something like this? With FOSS apps like mosaic, hdr tools
and imagemagick, it should be possible. I'm guessing opencv would be needed
for interpolation and noise removal..

~~~
brownbat
I've used some free apps that turn phones into document scanners. Almost all
banks I use have something like that embedded for check deposits. Maybe
they're just doing deskew rather than dewarping though... loose documents
aren't usually warped like book pages.

The next hard problem would be help with DIY book scanning. Like the camera
could sit over my shoulder and detect when I've turned a page, then
automatically take a picture of the new page. Then OCR kicks in, and
conversion to EPUB, preserving graphics when pages have non-text elements.
Mostly just feel like we can probably do better than the massive contraptions
over at www.diybookscanner.org

------
Syzygies
This problem is fundamentally different with stereo images; there is a hope of
reconstructing the exact 3D geometry of the page before flattening, rather
than inferring from content. An iPhone app that did this would do well.

~~~
radarsat1
I've always thought it should be possible using a short video of the camera
orbiting the page from different angles. (Assuming a static scene of course.)

~~~
manmal
Or now, some frames of a live photo.

------
Ciantic
I wonder if someone has tried to dewarp whole book from a video when flipping
through it. I imagine that could be a handy way to copy the whole book.

~~~
mortenjorck
Page warp becomes but a tiny part of the problem with a page-flip video – now
you have motion blur, varying exposure, resolution, noise, and depth of field
to worry about. Maybe if you had a camera capable of recording extremely high
frame rates at high resolution (probably north of 120fps @ 1440p), a high-
intensity, diffuse light source, and a lens with a tight aperture / narrow
focal length, you’d at least have something to work with.

~~~
voxic11
Doesn't that describe most modern cell phone cameras?

~~~
Jarwain
I haven't heard of a phone camera that can do video at 120fps, they're usually
30, maybe 60 fps. The S7 has some high framerate slow-mo option, but it's
limited to a 720p resolution.

~~~
chrisfosterelli
The iPhone 6S records 1080p at 120 fps and 720p at 240 fps for their Slow-Mo
feature [0].

[0]:
[http://www.apple.com/ca/iphone-6s/specs/](http://www.apple.com/ca/iphone-6s/specs/)

------
jgable
This is great! My wife is a music teacher and often scans sheet music so that
it's more portable. She has been asking me for a while for something exactly
like this. I'll have to tweak it to work on sheet music, since I imagine his
methods to identify lines of text won't work for the music staff out of the
box.

~~~
orivej
ScanTailor [1] also implements automatic dewarping and is overall great for
scan postprocessing.

[1] [http://scantailor.org/](http://scantailor.org/)

~~~
dalke
I used it for about 1000 pages I photographed with a DSLR. The automatic
dewarping rarely worked for me, so I ended up doing it manually.

It likely didn't help that many of the pages were typed carbon copies or hand-
written. For the former I had to put another piece of paper behind the page to
prevent the next page from bleeding through.

That said, I managed to get the job done, though it took a couple of weeks.
Next is to capture the metadata (author, date written, ...). There were a lot
of one page letters in the documents I copied.

Would love to try out another tool that could read from the ScanTailor project
file to get the page segmentation or even the warping I did manually to
improve on the result.

------
amelius
The next step would be to "depixelate" the resulting image. How could this be
done? I guess OCR would not work because of the variation of the fonts (you
don't want the document to end up in a single font; you want to keep the
fonts). Could a deep learning approach work here, even if it has not been
trained on all the specific fonts?

~~~
vidarh
Plenty of engines will do OCR and use the shapes recognised with high
certainty to affect how they detect the rest.

There are many ways of doing this, and you can achieve some results even
without knowing whether your image is text, as long as it has lots of
self-similarity: virtually slide a "grid" over the image, slice it up into
n-by-n squares, run any of a number of nearest-neighbour variants over them,
and then, for each cluster, replace all instances of the squares in the
cluster by the one which minimises the overall error rate vs the others.

This will work reasonably well for very structured images such as text, as
long as enough characters are near correct, and will retain custom fonts etc.
but clean them up quite a bit as long as they either are different enough, or
occur often enough on a page to not get "corrected".
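
A rough sketch of the tile-and-cluster idea described above, assuming NumPy and a grayscale float image whose sides are multiples of the tile size. The clustering here is a deliberately simple stand-in (each tile joins the first cluster whose founding tile is within `max_dist`), not any particular nearest-neighbour method, and the tile size and distance threshold are invented parameters. Each cluster is then replaced by its medoid, the member with the smallest total distance to the others.

```python
import numpy as np

def denoise_by_tile_clusters(img, n=8, max_dist=2.0):
    """Replace every n-by-n tile with the medoid of the cluster it falls into."""
    h, w = img.shape
    if h % n or w % n:
        raise ValueError("image sides must be multiples of n")
    # Cut the image into tiles and flatten each tile into a row vector.
    tiles = (img.astype(np.float64)
             .reshape(h // n, n, w // n, n)
             .swapaxes(1, 2)
             .reshape(-1, n * n))

    # Greedy clustering: join the first cluster whose founder is close enough,
    # otherwise start a new cluster.
    clusters = []
    for i, tile in enumerate(tiles):
        for members in clusters:
            if np.linalg.norm(tile - tiles[members[0]]) <= max_dist:
                members.append(i)
                break
        else:
            clusters.append([i])

    out = tiles.copy()
    for members in clusters:
        group = tiles[members]
        # Total distance from each member to every other member of the cluster.
        total = np.linalg.norm(group[:, None, :] - group[None, :, :], axis=2).sum(axis=1)
        out[members] = group[np.argmin(total)]

    # Reassemble the tiles into an image of the original shape.
    return out.reshape(h // n, w // n, n, n).swapaxes(1, 2).reshape(h, w)
```

A fixed grid only works if the repeated shapes happen to line up with it, which is presumably why the comment talks about sliding the grid; a fuller version would try several offsets or cut tiles around detected glyphs.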

I'm sure there are better ways of doing this too - it's been a decade since I
kept up with OCR research.

------
renlo
How much of an effect does the camera lens have on page warping? Correct me if
I'm wrong, but for shorter focal length lenses I would think it would warp the
page more. If a person accounted for that, could they get a near perfect
result? Or does his algorithm account for that? It seems that one would need
to know where the center of the image would be.

~~~
mzucker
Pretty sure focal length would affect things as you say, but also physical
dimensions matter too. My program assumes fixed focal length and then picks
the page dimensions that work with the assumed focal length -- almost
certainly not the correct ones.
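
One way to see why the recovered dimensions need not be the true ones is the textbook pinhole camera model, sketched below with NumPy. This is generic geometry, not code from the program under discussion, and all the coordinates are invented. A page that is twice as large and twice as far away produces exactly the same image, and for a flat page facing the camera, doubling both the focal length and the distance does too.

```python
import numpy as np

def project(points_xyz, focal_length):
    """Pinhole projection: (X, Y, Z) -> (f * X / Z, f * Y / Z)."""
    points_xyz = np.asarray(points_xyz, dtype=float)
    return focal_length * points_xyz[:, :2] / points_xyz[:, 2:3]

# A few points on a hypothetical curved page (the middle bulges toward the camera).
page = np.array([[-0.1, 0.0, 0.50],
                 [0.0, 0.05, 0.45],
                 [0.1, 0.0, 0.50]])

same_image = np.allclose(project(page, 1.2), project(2 * page, 1.2))
```

Since a single photo cannot tell these cases apart, an optimizer with a fixed assumed focal length is free to settle on whatever page size makes the projection fit.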

------
petters
"You can see these are not exactly small optimization problems. The smallest
one has 89 parameters in the model, and the largest has 600."

Those _are_ small optimization problems. These types of problems are solved in
computer vision for hundreds of thousands of variables. His problem can be
solved in real-time, not tens of seconds.

~~~
mnw21cam
Compared to weather prediction, which is an optimisation problem involving
hundreds of millions of variables, and occupies a hefty supercomputing cluster
for a few hours.

------
paul_milovanov
Consider mentioning Dan Bloomberg as the author of the original work as well
as Leptonica. :)

------
voltagex_
Thank you for the gif near the start - it really helped me to understand what
was going on.

------
saynsedit
Is using a curve whose end points are fixed to zero to model the warping
accurate? I can't see a rationale for why the end points should both be 0.

~~~
mzucker
The main rationale is removing redundant degrees of freedom -- since the page
is allowed to rotate freely, the edges of the page can still move around
plenty.
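
As a concrete illustration of a cubic pinned to zero at both ends, here is a sketch that assumes the curve is parametrized by its slopes at the two ends, called `alpha` and `beta` here. The function names are hypothetical. Fixing f(0) = f(1) = 0 leaves exactly those two numbers free, which is the reduction in degrees of freedom described above.

```python
def page_profile(x, alpha, beta):
    """Cubic f(x) = a*x^3 + b*x^2 + c*x with f(0) = f(1) = 0.

    The coefficients follow from requiring f'(0) = alpha and f'(1) = beta.
    """
    a = alpha + beta
    b = -2.0 * alpha - beta
    c = alpha
    return ((a * x + b) * x + c) * x

def page_slope(x, alpha, beta):
    """Derivative of page_profile with respect to x."""
    a = alpha + beta
    b = -2.0 * alpha - beta
    c = alpha
    return (3.0 * a * x + 2.0 * b) * x + c
```

Any height offset or overall tilt of the real page is then left to the pose parameters (rotation and translation) rather than to the curve itself.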

~~~
saynsedit
I see, so basically assume the two end points are at zero and there is some
rotation accounting for the endpoint offset in real space.

It still doesn't seem fully accurate as I can imagine a non-rotated cubic
curve with endpoints at an offset, but I assume your simplification works well
enough.

------
artursapek
I've always been interested in getting into graphics programming, and stuff
like this only makes me more interested. Really well written post.

------
nullcipher
Wrap this in a service and you have a startup!

------
anowell
This is solid. I'm an engineer at Algorithmia, and this caught our attention
as the sort of project we love to host as a service on our algorithm
marketplace. We've already made note of it for our team to consider adding
(thanks to the generous MIT license), but I wanted to reach out in case you'd
rather add, own, and optionally monetize it on our platform yourself. Either
way, this was a great read with impressive results.

