
If you grab any academic paper (particularly a two-column one), there is a good chance that getting the text out will be hard, and any part of the paper with maths or tables will be unusable. Sorry, I'm away from a computer right now, so I can't make a smaller example.



The paper "GADTs Meet Their Match" (the first I had in my list) seems to work fine, but I don't know what it was generated with.


cairo 1.13.1 is listed as the generator.

http://checkers.eiii.eu/en/pdfcheck/?url=https%3A%2F%2Fwww.m...

The ACM template fails more! http://checkers.eiii.eu/en/pdfcheck/?url=https://www.acm.org..., and it's generated by pdfTeX-1.40.15.


I'll pick on one of my own random papers:

https://www.cs.york.ac.uk/aig/projects/implied/docs/cp03.pdf

Try extracting "Theorem 2" on page 5, or any text really. I just get random noise, whether through a PDF reader or something like pdf2ascii / ps2ascii.

We just made this with standard LaTeX.


Any chance you could post the source code for this? It's using bitmaps for characters instead of proper fonts, which shouldn't happen nowadays. Maybe you should put "\usepackage{lmodern}" at the start? See for example https://tex.stackexchange.com/questions/1291/why-are-bitmap-...

I work with course materials made in LaTeX, and students sometimes need or want to copy and paste from them, so I try to avoid these kinds of problems.


That’s interesting. Did you \usepackage[T1]{fontenc}?
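
For anyone trying this, here is a minimal sketch of a preamble that combines the two suggestions above. It assumes pdfLaTeX and the plain article class, and the body text is made up for illustration. It is not the source of the cp03 paper, so whether it fixes that particular PDF is untested.

```latex
% Minimal sketch, assuming pdfLaTeX and the standard article class.
\documentclass{article}

% T1 font encoding: accented letters become single glyphs,
% which generally helps text extraction and copy/paste.
\usepackage[T1]{fontenc}

% Latin Modern: outline (vector) fonts available in T1 encoding,
% so pdfLaTeX need not fall back to bitmap fonts.
\usepackage{lmodern}

\begin{document}
% Placeholder text (hypothetical, not taken from any real paper).
\textbf{Theorem 2.} A placeholder statement, with an accent: caf\'e.
\end{document}
```

One way to check the result, assuming Poppler's pdffonts tool is installed, is to run it on the output PDF: fonts reported as Type 3 are usually bitmaps, while Type 1 entries are outline fonts that tend to copy and paste cleanly.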


Thanks. The paper is from 2003, so I'm not sure.

This is just an example. From experience, most PDFs at conferences and in journals, generated from LaTeX, are inaccessible to varying degrees.



