
Latex to HTML5 Conversion for Scientific Papers - smarr
https://github.com/smarr/latex-to-html5
======
rmorlok
pdf2htmlEX
([https://github.com/coolwanglu/pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX))
would be another route if you aren't concerned about preserving the semantics
of the markup. Obviously you would have to go LaTeX -> PDF -> HTML5. It maintains
the paginated style and looks good for scientific papers (example:
[http://coolwanglu.github.io/pdf2htmlEX/demo/demo.html](http://coolwanglu.github.io/pdf2htmlEX/demo/demo.html))

~~~
cakes
That's definitely very cool - I wanted to find some more info about the
project for my own interests
([https://news.ycombinator.com/item?id=4528797](https://news.ycombinator.com/item?id=4528797))

------
coherentpony
Cool. That said, I use LaTeX for typesetting equations. Neither of the linked
examples shows output of typeset equations.

~~~
privong
I was wondering about that as well. Getting equations right is critically
important. But, from my limited experience, the MathJax syntax is LaTeX-enough
that equations might be one of the easier parts of doing this?

~~~
bachmeier
> MathJax syntax is LaTeX-enough

It is LaTeX syntax.

~~~
privong
> It is LaTeX syntax.

That's LaTeX-enough for me :)

But in all seriousness, thanks for clarifying/confirming. When I posted I
didn't want to risk overstating.

------
blixt
Nice! There's also this one for getting LaTeX SVGs on the fly:
[http://tex.la/](http://tex.la/)

------
jimhefferon
What does it do beyond what tex4ht does? If there is some way tex4ht could be
improved, perhaps it would be best to contribute to that project?
[https://www.tug.org/applications/tex4ht/mn.html](https://www.tug.org/applications/tex4ht/mn.html)

~~~
smarr
It is built on top of tex4ht. It provides merely a few settings for tex4ht and
post-processing scripts that beautify the generated HTML. You might ask why
post-processing? Well, because it was simpler for me than figuring out how to
get tex4ht to do the desired thing. I just find TeX/LaTeX not pleasant to use
as a programming language, but that's personal taste.

~~~
xorcist
If the post-processing stage is useful to others, perhaps it could be
upstreamed into tex4ht?

(I sometimes think that the user interface of GitHub puts too much emphasis on
cloning and not enough on cooperation. Many useful tools end up in a dozen
forks, all with slightly different features, all equally inactive.)

------
effie
What science needs, I believe, is not another tool for making TeX more usable
for information interchange, but a simple (much simpler than LaTeX, whose
complexity and user experience are terrible), web-oriented standard language
for typing in libre scientific and technical documents that browsers would
support (or it can be translated to valid HTML+CSS seamlessly). Web documents
are cheaper and more accessible to people, we should concentrate on those. I
don't know if there is a usable language and platform of this kind already. Once
we have that, tools for converting web documents to other less important
formats such as paper-printable ones can be created.

~~~
mjn
Web documents are cheaper and more accessible, but there's still a quite large
usage of print documents, so at least I, as a document author, don't want to
commit to a "web-only" toolchain without a good to-print workflow also being
available.

It's possible for HTML+CSS to also provide a good to-print workflow, but I
don't think it's there yet, at least using open-source tools. I have heard
PrinceXML can produce good results, with sufficient control over the print
layout to make HTML+CSS usable as a print-oriented markup language. But
between the cost, and the prospect of becoming dependent on a proprietary tool
with no obvious alternatives, I haven't tried it.

~~~
jamespaden
PrinceXML is definitely the best tool, but there are a couple of similar
commercial alternatives, and wkhtmltopdf has become a decent open source
solution as long as you only want basic docs.

------
michal_h21
You should definitely post some info about your project to the tex4ht mailing
list; I hope some interesting and more informed discussion might happen there.

You may also take a look at make4ht
([https://github.com/michal-h21/make4ht](https://github.com/michal-h21/make4ht)).
It is a build tool for tex4ht, included in TeX distributions, and it also solves
some of the same problems as your script (ligatures, spurious span elements, image
conversion, unicode, etc.). It can execute custom commands on all output
files, so your script could be used with it as well.

------
amelius
The funny thing is, I find HTML5 to be much more predictable when it comes to
formatting :)

But maybe that's just my ignorance of LaTeX showing, although I did manage to
write two theses in it.

------
therealmarv
LaTeX is a very bad format and a dead end for interchange with other formats.

I also like Pandoc and Markdown... if somebody is interested in a workflow for
writing a scientific paper with them, check this project out:
[https://github.com/tompollard/phd_thesis_markdown](https://github.com/tompollard/phd_thesis_markdown)

~~~
j2kun
Strictly speaking, isn't this false by virtue of LaTeX compiling to PDF?

And I have been looking hard for years and haven't found anything to replace
TeX that fulfills even half my needs (as a person who does math typesetting in
large documents almost daily). LaTeX has its issues, but it's the best we've
got.

~~~
peatmoss
Compiling to PDF as a display format, and being amenable to translation to
another markup format that retains the structure of the original markup, are
slightly different tasks.

In general, I'd agree with the parent that LaTeX is something of a dead end in
terms of translation. LaTeX will happily compile to PS/PDF/DVI, but
translation to something like HTML is pretty reliant on using a subset of
LaTeX. You can, after all, write LaTeX code to do computation.

In my experience with Pandoc, there are a good number of packages that simply
don't work when translating from LaTeX. The more specialized or
formatting-specific the package--the further you deviate from the standard
Article class and simple section headers--the less likely that you'll have a
good result.

~~~
j2kun
A format is a format, but point taken.

I've found that Pandoc (and every other conversion utility) doesn't even
produce useful results for LaTeX even if the LaTeX source doesn't use any
complex macros or fringe packages.

The problem seems not to be that LaTeX is too powerful, but rather that the
point of HTML and Markdown is to be as lightweight as possible. LaTeX, on the
other hand, is meant to be a useful tool for humans to use to minimize the
amount of boilerplate needed to write a large technical document (in print or
on the web!).

------
anc84
Nice! I would suggest a sans-serif font though. Aren't they better for
on-screen reading? Hyphenation would be nice too.

~~~
dfan
The main historical reason for using sans-serif fonts on displays was the
displays' lousy resolution. With modern display technology that reason has
mostly disappeared.

~~~
dyladan
At what point does the resolution become good enough for serif fonts, I wonder?
Unfortunately 1366 x 768 is still quite the norm (118 ppi on a 13.3" screen).

~~~
yellowapple
I find serif fonts to be perfectly readable even on 1280x800 displays.

------
skimpycompiler
I find that Pandoc fulfills all of my needs.

~~~
smarr
I tried Pandoc before reverting to tex4ht. Unfortunately, it models a
rather small subset of the things I was interested in, specifically around the
typesetting of citations and listings, as far as I remember. So, tex4ht and
HTML post-processing it was.

------
nmc
Not enough Web knowledge to easily dive into the codebase...

Could anyone be so kind as to compare this with:
[https://github.com/pyramation/LaTeX2HTML5](https://github.com/pyramation/LaTeX2HTML5)

------
j2kun
Is there full support for math mode? (it doesn't appear from the examples that
the author uses math mode, so I'd guess no)

~~~
jimhefferon
tex4ht does support math mode. By default it puts out pictures. But it is
highly configurable. See for example
[https://www.tug.org/applications/tex4ht/mn11.html](https://www.tug.org/applications/tex4ht/mn11.html).

------
geyang
I have found all these utilities unsatisfying. It might make sense to write a
JavaScript library that parses LaTeX.

~~~
mhartl
In general, mapping LaTeX to HTML is an unsolvable problem (and I speak as the
author of one attempt to solve it
([http://github.com/softcover/softcover)](http://github.com/softcover/softcover\))).

~~~
gkop
Link has an extra closing parenthesis, this link works:
[https://github.com/softcover/softcover](https://github.com/softcover/softcover)

~~~
mhartl
Thanks. Looks like it's due to a bug in the Hacker News link parser. I'll be
more mindful of that in the future.

