
The sad state of PDF-Accessibility of LaTeX Documents (2016) - rmbeard
https://umij.wordpress.com/2016/08/11/the-sad-state-of-pdf-accessibility-of-latex-documents/
======
konjin
>Take your average computer science graduate from the last ten years. Do you
think anyone would be remotely able to understand what is going on there?

Yes, you literally read the literate program of TeX and understand what's
going on: [http://brokestream.com/tex.pdf](http://brokestream.com/tex.pdf)

I had never learned Pascal but I've managed to edit and compile TeX
successfully, and it was easier than trying to understand any of my own
non-literate programs.

>My point being that if we wouldn’t rely on TeX itself and use ANT (or
whatever alternative) which is written in the quite elegant OCaml, then
hacking it would be at least possible for mere mortals. Although I have to
admit, despite being in love with OCaml since my PhD days, it’s also a quite
niche language. But imagine if the whole thing was written in Python, or at
least C.

Imagine if software engineers were actual engineers instead of glorified
script kiddies.

>I wish someone would design a new space shuttle because while it's a neat
project I only understand MKS units and it's too much effort to use a
calculator for converting between them and Imperial units.

~~~
dewey
> Imagine if software engineers were actual engineers instead of glorified
> script kiddies.

Maybe you shouldn't need to be an engineer to write a document with some
formulas inside?

~~~
HarryHirsch
Back in the day we had engineers writing documents from markup. We called them
typesetters. It was a qualified job, you don't _expect_ the task to be
something Joe Random could do with 6 weeks of training.

~~~
throw0101a
> _Back in the day we had engineers writing documents from markup._

Back in the day secretaries were creating electronic documents with troff and
nroff:

> _The first version of Unix was developed on a PDP-7 which was sitting around
> Bell Labs. In 1971 the developers wanted to get a PDP-11 for further work on
> the operating system. In order to justify the cost for this system, they
> proposed that they would implement a document-formatting system for the Bell
> Labs patents department[1]. This first formatting program was a
> reimplementation of McIlroy's roff, written by Joe F. Ossanna._

* [https://en.wikipedia.org/wiki/Troff](https://en.wikipedia.org/wiki/Troff)

~~~
rmbeard
OP, here. It's not that long ago, I've used troff myself. Or maybe it was that
long ago, but it just seems like yesterday.

------
choeger
The thing is: LaTeX might try hard to look like a declarative language for
structured documents, but it is not. It is a set of TeX macros. And TeX is a
type setting system.

There is no good reason to put the accessibility into the type setting.
Instead, use a declarative (e.g., any markup) language, translate that a) to
(LaTeX) and b) to accessibility annotations and then combine the two results.
Problem solved.

Unfortunately you will either lose a lot of expressiveness along the way or
you have to find a _very_ sophisticated markup language.
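
A minimal sketch of that pipeline in Python, under loud assumptions: the `Node` type, the two role names, and both renderer functions are hypothetical and invented for illustration. `H1` and `P` are standard PDF structure types, but nothing here writes an actual tagged PDF. It only shows one declarative source feeding both a LaTeX rendering and a parallel list of accessibility annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One element of a (hypothetical) declarative document."""
    role: str  # "heading" or "para" in this sketch
    text: str


# Assumed mapping from document roles to LaTeX source.
LATEX_TEMPLATES = {"heading": "\\section{{{}}}", "para": "{}"}

# Assumed mapping from document roles to PDF structure types.
STRUCT_TYPES = {"heading": "H1", "para": "P"}


def to_latex(doc):
    """Translation a): render the document as LaTeX source (no escaping done)."""
    return "\n\n".join(LATEX_TEMPLATES[n.role].format(n.text) for n in doc)


def to_tags(doc):
    """Translation b): render the same document as (structure type, text) pairs."""
    return [(STRUCT_TYPES[n.role], n.text) for n in doc]


doc = [Node("heading", "Introduction"), Node("para", "TeX is a typesetting system.")]
print(to_latex(doc))
print(to_tags(doc))
```

The expressiveness problem shows up immediately: any role missing from the two tables raises a `KeyError`, so every construct has to be supported in both translations.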

~~~
romwell
Why not simply augment LaTeX with PDF tags which would be inserted manually,
in the process of typesetting?

Something like:

    \pdftag{blah}

Common packages could then generate these tags, and very few modifications of
TeX source would be needed.

~~~
choeger
Yeah, one could do that. But then again, one could do that with any other
scripting language. A true declarative document would mean a single source of
truth and freedom from those technical matters.

------
hprotagonist
An answer, particularly in the sciences, is to also distribute the source
*.tex files, which being plain text with markup, can be handled just fine by
things like emacspeak, or accessibility tooling for other sensible editors.

This comes up a bit around the blind accessibility issue for mathematics,
which is why I suspect it's bubbling up this week on HN.

~~~
konjin
Standard maths notation is sight first and sight only.

If you want maths for the blind you need to convert to something like
s-expressions, which emacspeak can read perfectly.
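
To illustrate what such a conversion might look like, here is a small Python sketch that prints a formula as an s-expression. The nested-tuple representation and the `to_sexpr` helper are assumptions made for this example, not something emacspeak or any TeX tool provides.

```python
def to_sexpr(expr):
    """Render a nested (operator, operand, ...) tuple as an s-expression string."""
    if isinstance(expr, tuple):
        # Recurse into every part, operator included, and wrap in parentheses.
        return "(" + " ".join(to_sexpr(part) for part in expr) + ")"
    # Leaves (symbols, numbers) are printed as-is.
    return str(expr)


# a^2 + b^2 = c^2, written as a prefix tree
pythagoras = ("=", ("+", ("^", "a", 2), ("^", "b", 2)), ("^", "c", 2))
print(to_sexpr(pythagoras))
```

The result is linear and fully parenthesized, so it can be read left to right without any two-dimensional layout.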

~~~
hprotagonist
There's a bunch of counter-argument here from working practising blind
mathematicians who read and write raw TeX every day, so there's at least a
non-zero audience.

[https://www.nfb.org/latex-what-it-and-why-do-we-need-it-0](https://www.nfb.org/latex-what-it-and-why-do-we-need-it-0)

~~~
konjin
There were a bunch of mathematicians who used Roman numerals for arithmetic
between 0 AD and 1200 AD. That they existed is not a counter-argument to the
fact that Roman numerals are terrible to do arithmetic in. It is an argument that
people can get used to anything, and become proficient enough at it that
changing to a new - and better - system will set them back enough for it to be
not worthwhile doing - for them. The same is true for modern maths notation.

~~~
hprotagonist
I am not blind. Neither am I a mathematician. I'm a sighted biomedical
engineer.

I have heard blind mathematician colleagues very loudly espouse using LaTeX
for everything they do. I'm inclined to believe them!

------
scoresmoke
Even though LaTeX is still not very close to producing perfectly accessible
PDF documents, there is some recent work towards this goal.

\- [https://ctan.org/pkg/tagpdf](https://ctan.org/pkg/tagpdf)

\- [https://ctan.org/pkg/accessibility](https://ctan.org/pkg/accessibility)

I am using the former for some personal documents and found that it improves
text selection and copying on Apple devices. (This could be related to how
PDFKit handles text.)

Edit: formatting.

~~~
JadeNB
accessibility, which its own author says not to use any more (or at least not
until foundational problems are fixed), is mentioned in the article.

------
bfirsh
Converting LaTeX to HTML may be a route to making it accessible. I'm working
on this project: [https://github.com/arxiv-vanity/engrafo](https://github.com/arxiv-vanity/engrafo)

It's 80% of the way there, but with 80% more work it could be a pretty
complete implementation.

It powers this: [https://www.arxiv-vanity.com/](https://www.arxiv-vanity.com/)

------
vehemenz
If you want accessibility, it would be better to convert your content to XML
and run the LaTeX through MathJax first, using accessibility extensions
([https://mathjax.github.io/MathJax-a11y/docs/](https://mathjax.github.io/MathJax-a11y/docs/)).
Then use a third-party converter such as PrinceXML to generate the PDF from
the XML.

------
ffk
One pattern I like to use is to write my documents using markdown which can be
compiled into pdf via latex with a template of my choosing. It is also capable
of compiling to other formats which may be more accessible such as plain text,
html and docx.

[https://pandoc.org/](https://pandoc.org/)
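
A rough sketch of that workflow driven from Python, assuming `pandoc` is on the PATH. The file names and the `pandoc_command` and `convert` helpers are made up for illustration; the `-o`, `-t`, `--template`, and `--pdf-engine` options are real pandoc flags.

```python
import subprocess


def pandoc_command(source, target, to_format=None, template=None, pdf_engine=None):
    """Build the pandoc argument list for one conversion (does not run it)."""
    cmd = ["pandoc", source, "-o", target]
    if to_format is not None:
        cmd += ["-t", to_format]  # explicit writer, e.g. "plain"
    if template is not None:
        cmd.append(f"--template={template}")
    if pdf_engine is not None:
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd


def convert(source, target, **options):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target, **options), check=True)


# One markdown source, several targets (file names are placeholders).
# This only prints the commands; call convert() to actually run them.
for target, fmt in [("paper.pdf", None), ("paper.html", None),
                    ("paper.docx", None), ("paper.txt", "plain")]:
    print(" ".join(pandoc_command("paper.md", target, to_format=fmt)))
```

The plain-text target passes `-t plain` explicitly rather than relying on the `.txt` extension to select the writer.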

[edit to add link to pandoc]

------
minikites
The "accessibility" LaTeX package maintainer is looking for help in this area:
[https://github.com/AndyClifton/accessibility/issues/42](https://github.com/AndyClifton/accessibility/issues/42)

------
amai
The author writes:

"Did I mention that both Word and LibreOffice generate tagged PDFs?"

But then the simple solution is this: Convert your LaTeX to Word or
LibreOffice. Then generate the PDF.

Absurdly the easiest way to convert LaTeX to Word/LibreOffice is by creating a
PDF first ([https://tex.stackexchange.com/questions/111886/how-to-conver...](https://tex.stackexchange.com/questions/111886/how-to-convert-a-scientific-manuscript-from-latex-to-word-using-pandoc)), import that into
Word/LibreOffice and then create your PDF/A from that.

------
bokumo
Ross Moore did a presentation about accessibility and PDF at the 2020 TUG
online conference.

[https://youtu.be/VF9Ubax_HIY](https://youtu.be/VF9Ubax_HIY)

------
amai
[https://tex.stackexchange.com/questions/498987/generate-pdf-...](https://tex.stackexchange.com/questions/498987/generate-pdf-a-1b-with-lualatex)

------
hugh-avherald
Why must the PDF encapsulate all requirements? My understanding of
accessibility requirements is that you must have _a_ version that is amenable
to automatic speech, not that all versions must be.

~~~
ldjb
It doesn't have to. But ideally you only want to have to produce and
distribute a single file.

Also, a lot of people don't consider accessibility when generating PDFs. If
LaTeX produced accessible PDFs, the PDFs would be tagged automatically without
the author even thinking about it. Of course, it might still not be perfect,
but it would be a lot better than the status quo.

------
mci
1\. Needs (2016) in the title.

2\. Even by 2016, pdfTeX had been largely superseded by LuaTeX.

3\. The author bizarrely links to "the mess" of the literate source of TeX the
program as a WEB file rather than as a typeset document.

4\. AIUI, the source code of the TeX engine has nothing/very little to do with
adding tags to PDFs, which is the job of LaTeX packages. Admittedly,
understanding and writing their source code is a rarer skill than reading the
literate source of TeX.

~~~
ketzu
> 2\. Even by 2016, pdfTeX had been largely superseded by LuaTeX.

Is that true? My experience was that pdftex is by far the most used one, but
while thinking about it, I noticed I have zero data to back that up.

~~~
jimhefferon
The overwhelming majority of people writing today use pdflatex. But at this
point lualatex is pretty fast, and is starting to accumulate packages that
leverage it to do good things. It is starting to build momentum.

------
aklemm
I can't believe MathML just died and it's like not even part of the
conversation about the history of math markup.

~~~
lol768
MathML isn't dead! Igalia have been doing some pretty great work on getting it
upstreamed into Chromium, where there has been no MathML implementation (in
contrast to Firefox) for some time.

------
rbobby
Imagine how much further ahead HTML/CSS would have been if the academic crowd
abandoned latex 20 years ago.

Revisit this comment in 5 years.

~~~
Mediterraneo10
It wasn’t because mathematicians were reluctant to change that math in HTML
didn't take off. Rather, it was because browser developers were loath to
implement and maintain the enormous and complex pile of code that is MathML,
and they said "Why should we, if you mathematicians already have LaTeX?" [0]

[0]
[https://en.wikipedia.org/wiki/MathML#Browser_support](https://en.wikipedia.org/wiki/MathML#Browser_support)

~~~
jfk13
Firefox has had MathML support for a long time. Complain to Apple and Google
(and vote with your browsing activity, by using the browser that is less
driven by commercial considerations).

~~~
realityking
Safari has supported MathML since 2011. Though apparently the implementation
is somewhat buggy.

The real issue is the lack of MathML support by Chrome (and until recently,
Edge).

~~~
extra88
Yes, Chrome had some MathML support but removed it (I think as part of their
forking of Blink from WebKit).

------
svnpenn
I had a realization a while back, that in my opinion LaTeX isn't really needed
anymore. Pretty much anything you can do with LaTeX, you can do with HTML.
Want a PDF? Most browsers will print to PDF now, or you can use a library like
this:

[https://github.com/dompdf/dompdf](https://github.com/dompdf/dompdf)

Need a page break? Here you go:

[https://developer.mozilla.org/Web/CSS/break-after](https://developer.mozilla.org/Web/CSS/break-after)

I'm not sure what you would do about TikZ and stuff like this, but I have seen
some pretty wild stuff in CSS, so surely it's possible:

[https://pattle.github.io/simpsons-in-css](https://pattle.github.io/simpsons-in-css)

~~~
jimktrains2
Not even considering the math formatting, HTML is still lacking good
footnotes, bibliographies, glossary generation, index generation, and table of
contents generation. Browsers also render things atrociously compared to a
latex pdf.

~~~
vehemenz
There are third-party tools that do all of this.

A LaTeX-generated PDF does not render correctly at all for a blind user.

~~~
jimktrains2
I'm not saying latex is exemplary in all forms. (I would also like to know if
the big browsers render all web pages as accessible PDFs, though).

I simply meant that a plain html document + the browser leaves much to be
desired for even non-technical documents.

Tbh, I would like to see a more advanced and open html-based ecosystem for
documents. LaTeX has many warts, but also a lot of features.

