

Briss Trims PDFs so They Fit Better, Are Easier to Read on Your Ereader - mono
http://lifehacker.com/5744899/briss-trims-pdfs-to-make-them-more-readable-on-your-e+reader

======
larsberg
I know I should "just do it myself," but I keep waiting for something that can
unsplit and unwrap PDFs generated in ACM double-column style with LaTeX word-
breaking and turn it into an epub with graphics for the figures/tables. Trying
to deal with that 9-ish pt. font is a huge pain for my old eyes. I ended up
giving up on reading them on my iPad because keeping a reasonable zoom level
and managing to scan down then over to the next column required the finger
dexterity of a concert pianist (even on GoodReader, which is quite, well,
Good).

~~~
rubidium
Calibre was mentioned in the article as being able to convert PDFs into epub
format. I had my hopes up for a second, so downloaded it and tried it on a
textbook and a smaller scientific publication.

It threw up on both the math equations and figures. It didn't handle the
general formatting of the book too well either.

To my knowledge, a good PDF->epub converter has not yet been built. Any
takers?

~~~
dpapathanasiou
"_To my knowledge, a good PDF->epub converter has not yet been built. Any
takers?_"

Check out eBookBurn.com, which is a site I launched last month
(<http://denis.papathanasiou.org/?p=468>).

It lets you upload pdfs and attempts to parse them into editable text.

The pdf parsing is based on my experiments with pdf-miner
(<http://denis.papathanasiou.org/?p=343>), and while still imperfect (in
general parsing pdfs is a difficult problem), it works fairly well for certain
types of whitepapers.

~~~
roel_v
This is a wicked cool site, but you need to put in screenshots of the input
(how it went in) and the output (what the output looked like in an epub
reader).

What approach do your algorithms use? Do you do recognition of title,
subtitles etc based on differences in fonts, spacing, line length etc.? Or do
you need to enter regexps to recognize those?

Do you recognize paragraphs correctly?

Can you filter out front and back filler like the ToC, and extract only the
'content' pages?

If so, it's 90% of what I'm looking for and I think good enough to pay for :)

I have some notes on how to approach this from when I tried to build it
myself; they include the functionality I consider necessary for an MVP. Let me
know if you're interested...

~~~
dpapathanasiou
I'm working on an FAQ/Help page which will show some of those features in more
detail.

The algorithm I use is a variation of the code described here:
<http://denis.papathanasiou.org/?p=343> except the output is html, not text,
so that I can take into account things like font sizes and paragraph breaks.
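
As a rough illustration of why font sizes help here, the following is a minimal sketch and not eBookBurn's actual code. It assumes the parser has already produced `(text, font_size)` runs for a page; `classify_runs`, `to_html`, and the `heading_ratio` threshold are hypothetical names made up for this example.

```python
from html import escape
from statistics import median


def classify_runs(runs, heading_ratio=1.2):
    """Label each (text, font_size) run as 'heading' or 'body'.

    Assumption: a run is a heading when its font size is at least
    heading_ratio times the median font size on the page.
    """
    body_size = median(size for _, size in runs)
    labeled = []
    for text, size in runs:
        kind = "heading" if size >= heading_ratio * body_size else "body"
        labeled.append((kind, text))
    return labeled


def to_html(labeled):
    """Render labeled runs as minimal HTML: <h2> for headings, <p> for body."""
    tags = {"heading": "h2", "body": "p"}
    return "\n".join(
        f"<{tags[kind]}>{escape(text)}</{tags[kind]}>" for kind, text in labeled
    )


# Hypothetical input: one large-font run followed by two body-size runs.
runs = [("Introduction", 14.0), ("First paragraph.", 9.0), ("Second paragraph.", 9.0)]
print(to_html(classify_runs(runs)))
```

A plain-text dump would lose the size information that this kind of heuristic depends on, which is one reason to keep HTML as the intermediate format.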

If you sign up and try it (it's free for the first 3 days), you'll see that the
parser renders each pdf page as text, and it's up to you to decide which range
of pages you want to use in your book.

Feel free to contact me through the form on that site, and I can reply in more
detail.

------
felixc
A similar tool is my own PDFMunge, previously discussed on HN here:
<http://news.ycombinator.com/item?id=1089068> and in more depth on my site
here: <http://www.felixcrux.com/posts/pdfmunge-improve-pdfs-ebook-readers/>

But this looks quite a bit more polished and user-friendly.

------
afshin
Clever name ... a little crass.

~~~
chopsueyar
I thought Briss was the client app, and Mohel was server side.

------
gaiusparx
PDF is ill-fitted as a format for mobile devices because it is designed for
print and does not reflow text. It's time epub and mobi took over.

~~~
sliverstorm
Really PDF is just ill-suited for distribution of text. The only reasonable
exceptions are when that text is _explicitly_ meant for printing (à la fliers
or posters), or when said text is not computerized, e.g. a scan of written
script that has yet to be OCR'd.

~~~
stcredzero
Is the name of this product a commentary on those who send PDF to mobile
users?

------
fdb
On the iPad, GoodReader can do margin cropping on the fly, and remembers the
margins you've set up for a document so they're reapplied when you open the
document again.

<http://www.goodreader.net/goodreader.html>

~~~
larsberg
I just wish it had some smarts for two-column PDFs (easily 90% of what I
read). I often resize, read down the left column, then shift to the top, which
moves the crop window and confuses GR horribly.

------
kraemate
I've been wanting something like this for ages - particularly to print ebooks
and LaTeX stuff with their huge side-margins. The basic aim is to trim all
margins and print 2 pages side-by-side (landscape).
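
The geometry behind that aim can be sketched as follows. This is only an illustration, not how Briss works internally: it assumes the bounding boxes of the page content are already known as `(x0, y0, x1, y1)` tuples in points, and `content_bbox`, `trim_margins`, and `two_up_scale` are hypothetical helper names.

```python
def content_bbox(boxes):
    """Smallest rectangle (x0, y0, x1, y1) enclosing every content box."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def trim_margins(page, boxes, padding=0.0):
    """Crop rectangle: the content bbox grown by padding, clamped to the page."""
    px0, py0, px1, py1 = page
    cx0, cy0, cx1, cy1 = content_bbox(boxes)
    return (
        max(px0, cx0 - padding),
        max(py0, cy0 - padding),
        min(px1, cx1 + padding),
        min(py1, cy1 + padding),
    )


def two_up_scale(crop, sheet_width, sheet_height):
    """Largest uniform scale at which two cropped pages fit side by side."""
    width = crop[2] - crop[0]
    height = crop[3] - crop[1]
    return min((sheet_width / 2) / width, sheet_height / height)


# Hypothetical US Letter page (612 x 792 pt) with wide side margins.
page = (0, 0, 612, 792)
boxes = [(150, 100, 460, 700), (150, 720, 300, 730)]
crop = trim_margins(page, boxes, padding=10)
print(crop, two_up_scale(crop, sheet_width=792, sheet_height=612))
```

The scale is limited by whichever dimension runs out first, so a tall, narrow crop can still leave empty space at the sides of a landscape sheet.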

While Briss trims the margins just fine, printing the (trimmed) document as
PDF (or PS) restores the margins (tried on Okular/Evince). What gives?

------
chopsueyar
Good name.

------
siculars
Oy vey.

