LaTeX is a good idea with a terrible implementation. The popularity of markdown ...

jessriedel · on Oct 26, 2014

I've said it before and I'll say it again: the single most effective use of ~$1 million for advancing math and physics research (two disciplines for which no non-LaTeX solutions exist) would be to hire some developers for a couple of years and make an enterprise-quality successor to TeX. Keep the math syntax, make it handle infinite pages for the Web, and fix all the awful bits that waste hundreds of thousands of grad-student-man-hours each year.

This isn't fantasy. Zotero is evidence that custom-built academic software funded by charitable foundations can be a tremendously positive service to the academic community.

https://www.zotero.org/about/

Also, everyone should read deong's comment.

https://news.ycombinator.com/item?id=8511509

joemi · on Oct 26, 2014

To be fair, LaTeX has a more complicated method of processing text since it considers a lot of typographic issues that browsers do not, so it has to be slower than a browser when rendering text. That said, I don't know enough to say whether the slowdown is proportionate to that extra work.

deong · on Oct 26, 2014

There are a couple of issues here. First, Markdown and HTML simply punt on the vast majority of the issues that TeX solves. Just as the author rightly comments that TeX is not geared toward online publication, HTML is geared toward only that model. If you want to paginate HTML or Markdown, you do it yourself. Widows and orphans are (obviously) your problem to deal with. Compared to the work that TeX is doing, Markdown is, to a first order approximation, just catting the file. HTML can reflow a document in real-time because it's doing a really poor job of reflowing the document. Even when they work, they're just putting line breaks in whenever the width would otherwise be too wide. TeX is running a dynamic programming algorithm to minimize the "badness" of the line breaks across multiple lines and even paragraphs. And quite a lot of the time, the browser just throws its hands up and says, "fuck it, you can just scroll horizontally to read the rest of this line". You can't do that on paper. So of course it's faster. You might as well be complaining that Preview is faster than Photoshop.
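To make the line-breaking contrast concrete, here is a toy sketch in Python. It is not TeX's actual algorithm, which also handles stretchable and shrinkable glue, penalties, and hyphenation. The cost function (cube of the leftover space, with the last line free) and the function names are assumptions made up for illustration.

```python
def greedy_breaks(words, width):
    """Fill each line as far as it will go, then start a new one."""
    lines, current = [], []
    for w in words:
        if current and len(" ".join(current + [w])) > width:
            lines.append(current)
            current = [w]
        else:
            current.append(w)
    if current:
        lines.append(current)
    return lines


def line_cost(words, i, j, width):
    """Toy 'badness' of a line holding words[i:j]; None if it cannot fit."""
    length = sum(len(w) for w in words[i:j]) + (j - i - 1)
    if length > width:
        # A single overlong word has to go somewhere, so allow it at a cost.
        return (length - width) ** 3 if j - i == 1 else None
    if j == len(words):
        return 0  # the last line may be short for free
    return (width - length) ** 3


def optimal_breaks(words, width):
    """Dynamic programming over break points, minimizing total badness."""
    n = len(words)
    best = [0] + [float("inf")] * n  # best[j]: minimal cost of setting words[:j]
    prev = [0] * (n + 1)             # prev[j]: where the last line of that solution starts
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):
            cost = line_cost(words, i, j, width)
            if cost is None:
                break  # a longer line ending at j cannot fit either
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i
    lines, j = [], n
    while j > 0:  # walk the chosen break points backwards
        lines.append(words[prev[j]:j])
        j = prev[j]
    return lines[::-1]


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))   # [['aaa', 'bb'], ['cc'], ['ddddd']]
print(optimal_breaks(words, 6))  # [['aaa'], ['bb', 'cc'], ['ddddd']]
```

The greedy version leaves "cc" stranded on a nearly empty line. The dynamic programming version accepts a slightly loose first line so that the paragraph as a whole comes out more even, at the price of looking at every feasible break point instead of just the next one.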

HTML and Markdown don't do automatic hyphenation (across multiple languages). They don't do ligatures. They don't do proper text justification (neither does Microsoft Word or LibreOffice for that matter). They don't do cross reference tracking (i.e., having automatically numbered sections, tables, figures, etc. with automatically updated references). They have no logic at all for automated float placement. Font handling is specified by a human instead of relying on algorithmic font selection and substitution when necessary. I could go on for pages of this.
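As a rough illustration of what cross reference tracking involves, here is a small two-pass sketch in Python. The markup is invented for the example (`# Title {#label}` for a labeled section, `@ref{label}` for a reference) and is not the syntax of any real tool. The point is only that numbers must be assigned in one pass before references, including forward references, can be filled in by a second.

```python
import re

HEADING = re.compile(r"^# (.*?)\s*\{#([\w-]+)\}$")  # hypothetical heading syntax
REF = re.compile(r"@ref\{([\w-]+)\}")               # hypothetical reference syntax


def number_and_resolve(lines):
    """Pass 1 numbers the labeled sections; pass 2 substitutes the references."""
    labels = {}
    numbered = []
    for line in lines:
        m = HEADING.match(line)
        if m:
            labels[m.group(2)] = len(labels) + 1
            numbered.append(f"{labels[m.group(2)]}. {m.group(1)}")
        else:
            numbered.append(line)

    def resolve(m):
        # Unknown labels come out as "??", as LaTeX does for undefined references.
        return str(labels.get(m.group(1), "??"))

    return [REF.sub(resolve, line) for line in numbered]


doc = [
    "# Intro {#intro}",
    "See section @ref{methods}.",
    "# Methods {#methods}",
    "As in section @ref{intro} and @ref{nope}.",
]
for line in number_and_resolve(doc):
    print(line)
```

Insert a new labeled section at the top and every number and every reference shifts on its own, which is the part you otherwise maintain by hand.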

I think the idea that web browser vendors are better at this sort of thing than TeX and LaTeX is so wrong I don't know where to start. The author complains that some of his 20-year-old LaTeX articles rely on outdated files to render properly. While this is true, and very occasionally a problem, it's only very recently that you had even the vaguest hope that your HTML document would render the same way on two different computers owned by the same person today! Arguably, the biggest slice of the software industry is now devoted to making things render on browsers. And for Markdown, we quite recently saw that even the simplest text rendered in no fewer than 17 different ways depending on which software (and version) you processed it with. If my goal is to be able to reproduce the output of today 15 or 20 years from now, HTML would be the absolute worst choice I could think of, unless, again, you stick with <b> tags and the like. And the subset of LaTeX you can reliably assume will always work gives you much broader coverage of the space of typesetting issues than the subset of HTML that doesn't change monthly does. Not to mention, I can still more easily go get an old LaTeX implementation to rebuild a document that doesn't compile anymore (but in 15 years, I've never had to). It's quite a lot harder to get Netscape Navigator 3 up and running to attempt to faithfully render a document I wrote in 1997.

Also, web browsers have historically been just about the buggiest, most insecure, and transient pieces of software we've ever written as a field, and TeX is famously maybe the highest quality piece of software ever written. It's more or less fine that the web changes every 18 months. It's a problem for archivists, but the web isn't really intended for that. Academic publications are though, and the impedance mismatch is, in my opinion, brutal.

The interface (by which I mean the programming language) of TeX and LaTeX is indeed pretty dreadful, but this is a really minor issue compared to the rest of it. There are a lot of things I dislike about LaTeX, but I don't see how HTML or Markdown is an improvement. You'd need a completely new thing that supported everything that LaTeX supports, and while you could certainly do so with a nicer language, you couldn't do it with something as clean and simple as Markdown -- there are just too many things you need to be able to tell it you want it to do.

mangecoeur · on Oct 27, 2014

I disagree that browsers (and I do mean modern browsers, I recognize it hasn't always been this way) are somehow solving an easier problem than TeX or doing it in a half-arsed way. On the contrary, they solve the very hard problem of correctly rendering content that might be badly formed or underdefined. I don't think there's anything in TeX that you can't do in HTML5 and CSS, including ligatures, auto numbering, and so on.

As for Markdown, that's just an example of how there is a demand for text-based writing (I could also give reStructuredText, which has a much stricter spec than Markdown). I think Markdown could evolve to fill the LaTeX niche.

For a better implementation, look at pandoc, which very cleanly parses documents to an internal data structure and converts that to a range of outputs. I think that's a much better basis for a document system. At the moment it has to go via LaTeX to produce PDF; in fairness, LaTeX still has the most mature PDF rendering system. I for one would like to see that change. I think we can do better.

deong · on Oct 28, 2014

As far as I know, every system that can go to LaTeX as an export option gives you a basic LaTeX document. I don't know how you tell Pandoc, for instance, "OK, I need three authors in the author block, centered horizontally, with their affiliations below their names. But authors 1 and 2 have the same affiliation, so only include that information once, but center it below both names as a unit."

How do I tell CSS that I want my bibliography to be sorted by author last name, and have the inline citations be of the form (Author, Year), except when I'm using the author's name in the text as a noun, in which case it should be just "Author (Year) showed that blah blah blah"? For that matter, I don't think CSS can even do justification properly (by properly, I mean not treating each line as an independent unit, but shifting text around within an entire paragraph to minimize deviation from the desired inter-word spacing globally). I know someone implemented TeX's algorithm in JavaScript once upon a time, but I'm willing to bet it's not any faster than TeX.
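For reference, in LaTeX with the natbib package those two citation forms are `\citep` and `\citet`. The logic itself is small, and a sketch in Python makes the requirement explicit. The `Ref` class, the function names, and the entries are placeholders invented for this example, not real references or any library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    last_name: str
    year: int
    title: str


def cite_parenthetical(ref):
    """'(Author, Year)' form, for citations outside the sentence grammar."""
    return f"({ref.last_name}, {ref.year})"


def cite_textual(ref):
    """'Author (Year)' form, for when the author is the noun of the sentence."""
    return f"{ref.last_name} ({ref.year})"


def bibliography(refs):
    """Sort by author last name (case-insensitive), then by year."""
    return sorted(refs, key=lambda r: (r.last_name.lower(), r.year))


# Placeholder entries, not real publications.
refs = [
    Ref("Young", 2003, "Placeholder title B"),
    Ref("Abel", 2010, "Placeholder title A"),
]
print(f"This was shown earlier {cite_parenthetical(refs[0])}.")
print(f"{cite_textual(refs[0])} showed that blah blah blah.")
for r in bibliography(refs):
    print(f"{r.last_name} ({r.year}). {r.title}.")
```

None of this is hard to write. The question is where it lives: a stylesheet language has no obvious place for it, so it ends up in a preprocessor or a script either way.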

I have no real argument against the idea that you could build something that does everything LaTeX does just as well. Clearly you can. I am arguing that LaTeX has a huge number of really important things already built in, and people use those things every single day. Two things follow: (a) you have to have all that stuff ready on day one if you want people to use a new thing, and (b) getting from where you are today to that point will necessarily involve taking the nice clean thing that seems so much nicer than LaTeX and making it messier, uglier, and more complex. The only thing that makes Markdown, for instance, nice for people to use is that it only does a handful of common things, so it can make those common things simple and conventional. Bold to bold something. (Amusing and apropos to the topic, HN's version of Markdown appears to not allow me to type star-starBoldstar-star. Not with backslashes or any other way I can find.) If you want to build a LaTeX clone though, you need to decide: what's going to be the simple, easy-for-people convention we use to denote "don't put a line break here, because these two characters are someone's initials" and "stack these equations in a group, centered on the equal signs, and include the individual equations on lines 1, 3, 5, and 8 in the global numbering of equations, but not the others." You're going to have to define a stylesheet of some sort to govern the rendering engine's myriad options (do I indent the first line of a paragraph, or should everything be left-aligned, but with extra vertical space between paragraphs).
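For comparison, this is roughly how LaTeX spells those two requests today, using the amsmath package. It is a minimal sketch: the equations are made up for illustration, and the eight-line example is shortened to three lines with the middle one left unnumbered.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% The tie (~) forbids a line break between the initials and the surname.
As D.~E.~Knuth put it, \ldots

% align stacks the equations and lines them up at the & markers;
% \nonumber keeps a line out of the global equation numbering.
\begin{align}
  (x + 1)^2 - 1 &= x^2 + 2x + 1 - 1 \\
                &= x^2 + 2x \nonumber \\
                &= x(x + 2)
\end{align}

\end{document}
```

It is not pretty, but every piece of it answers a question that a replacement would also have to answer with some syntax of its own.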

CSS is arguably already uglier, messier, and more complex, and while I'm sure it's improving all the time, as of about five years ago, I think the entire internet was almost exclusively composed of porn and articles about how to center something vertically, in roughly equal proportion. Epub is an HTML+CSS based format specifically geared at the kind of thing that you'd need, and just like every other technology we're mentioning, it's terrible unless you're doing left-to-right, top-to-bottom, figure-less, table-less, text where formatting doesn't matter. Just like CSS3, we can say, Epub3 supports more stuff now! Someone let me know when it's safe to buy ebooks with code samples in them instead of getting the paper version.

Derbasti · on Oct 26, 2014

CSS3 actually supports pagination, automatic numbering and referencing, hyphenation, justified text, and footnotes. At least that's what the spec says...

deong · on Oct 27, 2014

I'm happy to be corrected on that point, but then the question becomes: what gives us any confidence that the CSS spec is going to be followed in exactly the same way by multiple browser vendors consistently between now and 2034? Certainly nothing in the history of client side rendering on the web gives me any faith in that proposition.

Also, and this is probably just me, but I find CSS even harder to use for bespoke layouts than TeX. Which gets to the last point I made -- certainly you could replace LaTeX with an equally capable substitute, but it's not clear that the substitute wouldn't necessarily recreate a lot of what people hate about LaTeX. Markdown is almost universally loved precisely because it can't do very much. The more features you add, the more cumbersome the mechanism to select them needs to be, and at some point, you just have LaTeX with angle brackets and tag selectors instead of curly braces.

qznc · on Oct 26, 2014

For ACM style papers you need a two-column layout. On the first page at the bottom of the left column must be a copyright notice. As far as I know, CSS cannot do that.

Derbasti · on Oct 27, 2014

CSS now supports fancy footers and multi-column layouts.

wanderingmarker · on Oct 26, 2014

> HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering.

You're joking, right? HTML+CSS requires heaps of workarounds to achieve the most trivial layouts. The people behind these standards have no understanding of documents and no taste in software: they deem the absence of variables in CSS a feature, and the result is Less, SCSS, and similar preprocessors.

Had the CSS committee at least the sense to copy the boxes-and-glue model from TeX, things might not be so grim. As is, we seem to be stuck with their clumsiness for a long time.

WallWextra · on Oct 26, 2014

The typesetting quality of web browsers doesn't even compare to that of TeX, which uses a dynamic programming algorithm to minimize the "badness" caused by line breaks in various places. This is aside from TeX's ability to typeset math.

Shorel · on Oct 27, 2014

> HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering

I think HTML+browsers is something that has been cobbled together as well. Many times. With the added joy of useful features killed for political or profit-driven reasons.