
(I'm a co-founder at Overleaf.com, which does collaborative 'LaTeX in the browser' in a different sense.)

I like the idea of a 'sane' subset of LaTeX that is easy to publish to the web. There are tools like LaTeXML and TeX4ht that try to convert general LaTeX documents to (X)HTML, but it's a very hard problem.

Some difficulties arise from the fact that TeX is just very hard to parse in general. Even the first stage of parsing TeX is Turing complete [1]. This makes it hard to write tooling e.g. for linting (though tools exist, e.g. chktex) or creating a WYSIWYG editor backed by LaTeX [2]. (edit: or creating a good LaTeX auto-complete [4])
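To make the parsing difficulty concrete: in TeX, the character that starts a control sequence can itself be reassigned from inside the document (via `\catcode`), so a tokenizer cannot run ahead of execution. Below is a toy sketch in Python, not real TeX. The `\setescape` directive and the `tokenize` function are hypothetical stand-ins invented for this illustration.

```python
def tokenize(source):
    """Toy tokenizer whose escape character can change mid-stream.

    Assumption: '\\setescape<char>' is a made-up directive that mimics,
    very loosely, what a catcode change does in real TeX.
    """
    escape = "\\"
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == escape:
            # Read the run of letters after the escape character.
            j = i + 1
            while j < len(source) and source[j].isalpha():
                j += 1
            name = source[i + 1:j]
            i = j
            if name == "setescape" and i < len(source):
                # The next character becomes the new escape character.
                escape = source[i]
                i += 1
            else:
                tokens.append(("control", name))
        else:
            tokens.append(("char", ch))
            i += 1
    return tokens


print(tokenize("\\foo"))
print(tokenize("\\setescape|\\foo|bar"))
```

In the second call the backslash is no longer special after the directive, so `\foo` comes out as four plain characters and `|bar` becomes the control sequence. A linter or auto-completer that tokenizes without tracking such state would get this input wrong.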

Others arise from TeX's extensibility --- there are many thousands of packages that define their own commands and environments for different types of documents and different disciplines. This extensibility is on the one hand one of the main reasons that TeX and LaTeX are still actively used some 40 years after TeX's initial release, but on the other hand a major challenge for conversion to HTML. The LaTeXML project has many custom bindings [3] for these packages, but it's far from complete.

I guess the main question is whether we can find the right subset, and this project looks like a great start.

[1] https://tex.stackexchange.com/questions/4201/is-there-a-bnf-...

[2] https://www.overleaf.com/blog/81 --- my first attempt at rich text on Overleaf, many years ago

[3] http://dlmf.nist.gov/LaTeXML/manual/customization/customizat...

[4] https://www.overleaf.com/blog/523-a-data-driven-approach-to-...



I really like the idea of adding math support to Unicode via combining characters. It's more complicated than anything Unicode currently deals with, but not that much more complicated, and the idea of being able to put math into anything that currently accepts strings is just so enticing. We should treat math as its own language, and render it as we would any other human language with an unusual way of laying out characters.


It's an interesting idea. At what point, though, do we draw the line between what a character set (like Unicode) should handle, and what should be handled by a higher-level layer? I'm thinking that things like boldness, italicisation, and superscript aren't really the job of a character set.


Unicode already has 𝐛𝐨𝐥𝐝, 𝘪𝘵𝘢𝘭𝘪𝘤 and ˢᵘᵖᵉʳˢᶜʳⁱᵖᵗ variants of the Latin alphabet.


Maybe it’s just iOS, but the “bold” characters are serifed and the “superscript” ones aren’t a consistent size relative to each other.


Unicode only defines the code points for characters; it doesn't require anyone to actually make them look good. Since those characters were specifically only included to represent mathematical texts where formatting needs to be preserved, it's unlikely anyone is spending much effort on making them look good as text.

Regarding serifs: there are actually two variants of both bold and italic (and bold italic), one serifed and one not. Wikipedia has a chart of the different options here: https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symb...


The bold and italic characters actually belong to the Mathematical Alphanumeric Symbols [0] block, so they're strictly meant for math notation rather than general formatting. The superscripts mostly come from phonetic blocks such as Spacing Modifier Letters [1] and Phonetic Extensions, which are used for IPA and other phonetic notation. You'll also sometimes find other formatting quirks that are deprecated in Unicode and meant for compatibility purposes.

[0]: https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symb...

[1]: https://en.wikipedia.org/wiki/Spacing_Modifier_Letters
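One way to check where such characters come from is to ask for their official names. This is a minimal sketch using Python's standard `unicodedata` module. The `describe` helper is a name made up for this example, and the sample characters are taken from the "bold", "italic" and "superscript" strings quoted earlier in the thread.

```python
import unicodedata


def describe(text):
    """Return (code point, official character name) pairs for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# One character from each of the styled words quoted above.
for code, name in describe("𝐛𝘪ˢᵘ"):
    print(code, name)
```

The names make the intent visible: the styled letters are named as mathematical symbols and the raised letters as modifier letters, not as formatted variants of ordinary text.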



Yes, and I'm not convinced that it is the right thing to do.


I'd say if the formatting changes the meaning of the language, Unicode should support it. So if you are searching through text, any change to your query string that you would expect to narrow the set of matches should be expressible in Unicode. Unicode should at least support anything that affects the semantic equality of strings.
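As a rough illustration of how string equality interacts with formatting characters today, here is a sketch built on Unicode normalization from Python's standard library. The `matches` helper is hypothetical. It only shows that the choice of normalization form decides whether a styled or superscript character counts as "the same" as its plain counterpart.

```python
import unicodedata


def matches(query, text, form="NFC"):
    """Substring search after normalizing both strings to the given form."""
    q = unicodedata.normalize(form, query)
    t = unicodedata.normalize(form, text)
    return q in t


squared = "x\u00b2"  # x followed by SUPERSCRIPT TWO

# Canonical normalization keeps the superscript distinct from a plain "2".
print(matches("x2", squared, form="NFC"))

# Compatibility normalization folds it away, so "x squared" matches "x2".
print(matches("x2", squared, form="NFKC"))
```

Under NFKC the superscript is treated as a presentation detail and dropped, which is exactly the case where formatting carried meaning and the search can no longer tell the two apart.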


I'm thinking of Unicode as a character set, and that text exists on an abstraction level above characters.


I was thinking that it's analogous to ก็็็็็็็็็็็็็็็็็็็็ where the character dictates how the surrounding characters are rendered.
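For readers unfamiliar with how that works, a small sketch: the string is one base letter followed by many combining marks, and a renderer stacks the marks on the base. The `split_clusters` function below is a simplified assumption for illustration (it attaches any mark-category character to the preceding character) and is not a full implementation of Unicode grapheme cluster rules.

```python
import unicodedata


def split_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: any character whose general category starts with "M"
    (Mn, Mc, Me) is attached to the previous cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# THAI CHARACTER KO KAI followed by three MAITAIKHU marks, then a plain letter.
sample = "\u0e01" + "\u0e47" * 3 + "a"
for cluster in split_clusters(sample):
    print(len(cluster), [f"U+{ord(c):04X}" for c in cluster])
```

The Thai letter and its marks come out as a single four-code-point cluster, which is the sense in which one character's marks dictate how the rendered unit looks.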


Hmmm, certainly Unicode ought to be able to represent mathematics as a script like any other. However, the complexity involved is non-trivial. To make things easier, whatever Unicode might do for math should have a mapping to and from TeX or MathAJAX. In any case, Unicode is rather complex as it is; I'm not sure I look forward to this extra level of complexity :(


Hi, I am the developer of MiniLatex. Overleaf looks like a fantastic tool.


> I like the idea of a 'sane' subset of LaTeX that is easy to publish to the web.

That's probably the only approach that really makes sense:

During the past decade I was surprised to learn that the writing of programs for TeX and Metafont proved to be much more difficult than all the other things I had done (like proving theorems or writing books). The creation of good software demands a significantly higher standard of accuracy than those other things do, and it requires a longer attention span than other intellectual tasks.



