Math on GitHub: Following Up (nschloe.github.io)
123 points by nschloe 50 days ago | 44 comments

Just spotted this on the GitHub blog, posted 33 minutes ago: https://github.blog/changelog/2022-06-28-fenced-block-syntax...

> Users can now delineate mathematical expressions using ```math fenced code block syntax in addition to the already supported delimiters.

I really don't like overloading ```X as a way to render the output of running the code, because now you have no way to display a block of raw mermaid or math code which is the whole point of ``` in the first place. I would be much happier if they introduced a syntax variation to enable this feature more generally instead of overloading the purpose of backticks, something like ```!X (```!math or ```!mermaid) for "run the contents as X and render the output in the document".

KeenWrite[1] uses a "diagram-" prefix for text-based diagrams[2]. What's needed is a standards body for Markdown[3]. GitHub should not be imposing de facto standards.

[1]: https://github.com/DaveJarvis/keenwrite

[2]: https://kroki.io/

[3]: https://talk.commonmark.org/

I was confused about your comment at first - I was sure it was already a feature where you could use ```X to render the contents with a language. E.g. you could go ```java or ```javascript and get syntax highlighting for that language.

What you're saying is since ```mermaid now renders a mermaid diagram, there isn't a way to display a block of raw mermaid code with syntax highlighting.

That's a very valid point.

They could add a ```mermaid-source tag.

I agree my comment was difficult to understand, so I edited it to hopefully make it better. Thank you!

The simple fix for this is that you should use "```math render" or some such, the language is specified as the first word with everything after the space up to the newline currently ignored...

> everything after the space up to the newline currently ignored

Oh I didn't know that, that's perfect!
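The info-string behaviour mentioned above (first word is the language, the rest of the line is ignored) can be sketched roughly as follows. This is only an illustration, not GitHub's implementation: `split_info_string` and `should_render` are hypothetical helpers, and the `render` keyword is the suggestion made in this thread, not an existing GitHub feature.

```python
def split_info_string(fence_line: str) -> tuple[str, str]:
    """Split the opening line of a backtick fence into (language, rest).

    The language is the first word after the backticks; everything after
    the first space is returned separately so a renderer could inspect it.
    """
    info = fence_line.strip().lstrip("`").strip()
    language, _, rest = info.partition(" ")
    return language, rest.strip()


def should_render(fence_line: str) -> bool:
    # Hypothetical rule: render only when the word "render" follows the language.
    _, rest = split_info_string(fence_line)
    return "render" in rest.split()


FENCE = "`" * 3  # three backticks, built this way to keep the example readable

print(split_info_string(FENCE + "math render"))  # ('math', 'render')
print(should_render(FENCE + "math"))             # False
print(should_render(FENCE + "mermaid render"))   # True
```

Under this assumed rule, a plain `math` or `mermaid` block would stay a highlighted code block and only the explicitly marked ones would be rendered.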

The big advantage of using code blocks is that it guarantees that their sanitizer doesn't mess with the contents. This is a pretty big plus.

Also, you can of course still have mermaid code blocks; just don't put `mermaid` after the three backticks. The code block won't have any syntax highlighting, but in the case of `math`, this wouldn't make sense anyway.

I wonder if the initial choice of the $ and $$ syntax was to keep the syntax identical to VSCode's built-in KaTeX rendering for Markdown preview? You'd think that they could just port over the VSCode implementation directly, which already works quite well without most of the inconsistency issues in the website's MathJax counterpart, especially considering that VSCode is their own Electron product too.

This is great news! It's not working for me quite yet, but I'm refreshing my browser every 5 minutes. Next stop: Proper inline math.

This sounds like a solved problem in theory, but someone has to fix the parser.

The CommonMark spec defines left- and right-flanking delimiters, and provides a reference implementation that can get this right for backticks to represent inline code. Doing the same with dollar signs, and treating the content delimited by them (like the content delimited by backticks) as not to be processed further as Markdown, should handle most use cases.

Admittedly, the phrase "an apple costs $1 but an orange is 2$ for some reason" would be incorrectly recognised as math, but only because you're switching how you place the dollar signs halfway through.

Not recognising $[a+b](c+d)$ because the brackets are also link syntax sounds like the parsing operations are done in the wrong order. The $ comes first, so it should win.

I've used jupyter-book to typeset math-in-markdown before now, and it gets this just right, showing that it's a solved problem in practice too unless I'm missing something.
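A rough sketch of the flanking idea applied to dollar signs, for illustration only. The rule encoded here is an assumption modelled loosely on the description above (an opening `$` must be followed by a non-space character, a closing `$` must be preceded by one); it is not the CommonMark reference implementation, and `find_inline_math` is a hypothetical name.

```python
import re

# Assumed rule: opening "$" is followed by a non-space character, closing "$"
# is preceded by a non-space character, and the span contains no "$" or newline.
INLINE_MATH = re.compile(r"\$(?=\S)([^$\n]+?)(?<=\S)\$")


def find_inline_math(text: str) -> list[str]:
    """Return the contents of every span this rule would treat as inline math."""
    return INLINE_MATH.findall(text)


print(find_inline_math("the sum $a+b$ is small"))
# ['a+b']
print(find_inline_math("Bob pays $1 for bananas, Susan pays $2,"))
# []
print(find_inline_math("an apple costs $1 but an orange is 2$ for some reason"))
# ['1 but an orange is 2']
```

The last call reproduces the false positive conceded above: it only triggers because the sentence switches the side on which the dollar sign is placed.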

I agree: If you're choosing dollar signs for your math delimiters, you can't expect them to work as regular dollar signs anymore; just like backticks.

> sounds like the parsing operations are done in the wrong order.


> I've used jupyter-book to typeset math-in-markdown before now,

They're using $-math as well?

The `$...$` and `$$...$$` delimiters are used in TeX. LaTeX uses the superior delimiters `\(...\)` and `\[...\]`.

All the MathJax and KaTeX related markdown for math should use the LaTeX delimiters and avoid the TeX delimiters.

For more info see https://docs.mathjax.org/en/v2.5-latest/tex.html#tex-and-lat...

\(…\) and \[…\] would not be without problems in Markdown either, as backslash is used for escaping, and there are situations where square brackets and parentheses require escaping.

As a simple example, if you want to write something in literal square brackets, you can normally just write […], but if there’s a link target with a matching name or you’re in an environment that may introduce a link target with a matching name (e.g. rustdoc will see [Foo] and try to find an item to link it to), you may choose to write \[…\] instead. You only need to escape one of the square brackets, but if you’ve escaped both, the text would collide with \[…\] used as a display-math delimiter.
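To make the escaping problem concrete, here is a small sketch of what a Markdown backslash-escape pass does to LaTeX-style delimiters. CommonMark allows a backslash to escape any ASCII punctuation character; the function name `apply_backslash_escapes` is made up for this example and the code is not taken from any real parser.

```python
import re

# A backslash followed by ASCII punctuation is an escape in CommonMark.
# The class covers the four ASCII punctuation ranges:
# "!" to "/", ":" to "@", "[" to "`", and "{" to "~".
BACKSLASH_ESCAPE = re.compile(r"\\([!-/:-@\[-`{-~])")


def apply_backslash_escapes(text: str) -> str:
    """Drop the backslash in front of each escaped punctuation character."""
    return BACKSLASH_ESCAPE.sub(r"\1", text)


print(apply_backslash_escapes(r"\[x^2\]"))    # [x^2]
print(apply_backslash_escapes(r"\(a + b\)"))  # (a + b)
print(apply_backslash_escapes(r"\alpha"))     # \alpha
```

So unless the math delimiters are recognised before this pass runs, `\[…\]` and `\(…\)` reach the math renderer as plain brackets and parentheses.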

I understand this is useful for people who don't type math regularly, but as someone who writes a huge amount of math, `$...$` is so much less friction than the LaTeX one `\(...\)`.

Even in LaTeX, I mix and match: `$...$` for inline math and `\[...\]` for display math. It works well for two reasons: inline math is generally short and less prone to mistakes, so `$` saves a lot of time.

A GitHub bug I recently noticed that seems related:

Expected: When a repo's readme is named `README` (without the `.md` suffix), it is rendered as plain text. When a repo's readme is named `README.md`, it is rendered as Markdown.

Actual: When a repo's readme is named `README` (without the `.md` suffix), the presence of `$` causes parts of the file to be rendered as math. For a real-life example, see the readme in https://github.com/idianal/personal-site.

Can someone please point me where I can submit a bug report/issue for this?

There's a second bug related to this. If you click on any file in your repo, then click the browser's back button, instead of parts of your README rendered as math, you get a red box with 'Unable to render expression'. Neat.

They make it hard to find by pointing you to community support, but you can open a support (or bug) ticket here: https://support.github.com/request

In May, GitHub added native math support. Unfortunately, it left lots to be desired. Six weeks later, I'm taking another look. Have the major issues been fixed? (Spoiler: No.)

I have been using gist for my Spivak calculus problems. Often I write them down in a notebook and then transcribe them to the $<latex>$ format. Along with the other Markdown features, it has been enough.

Plain LaTeX has a steep learning curve and is not friendly for mobile typing. Plain text is hard to read and hinders the actual math understanding. Therefore, I use gist Markdown.

A few weeks ago I discovered Franklin.jl ([0], [1]), a static site generator that lets you render math easily, and I chose it because the alternatives either took too long to appear or are slow or take too many steps to set up. It has direct KaTeX support and I've been pleased with the results. There is no need for adding or tweaking things unlike Jekyll or Hugo. And KaTeX is faster than MathJax in general.

[0] https://0x0f0f0f.github.io/blog/newblog/

[1] https://franklinjl.org/

this feature is super garbage: `this costs 5$ but that is 8$, what do you think` is now a math expression with all kinds of broken rendering

i had to go and fix like 20 markdown files

how someone can think this is good syntax is beyond me.

Well, it _is_ (La)TeX syntax. With the exception that there, if you want a dollar sign, you have to type \$. But yeah, the syntax isn't made for Markdown.

I think the problem is the change. You didn't need \$ before, and now people's files are broken.

GitHub could have run their parser on a sample of existing Markdown files and checked that the output doesn't change. They have such a huge corpus, with author date info, that it is easy to test that files authored before the feature don't trigger the feature.

I agree with backticks/code blocks being the solution. They also do that for mermaid, so this would be consistent.

> GitHub’s current choice of syntax goes against the Markdown grain.

I don't see that at all. It's pretty common. Far more common IME than all other choices combined. The problems are caused by parsing the markdown first, then slapping on math support to what's left, and of course that doesn't work.

This really is an easy problem to solve (and has been for ages). Handle the math sections first and make math support opt-in. The Gitlab syntax is great unless you want to do something other than have Gitlab render your markdown files. That's not much of a solution.

> Handle the math sections first and make math support opt-in.

Unless I'm misunderstanding what you mean, this isn't possible in a backwards-compatible manner: It's perfectly reasonable for pre-existing Markdown to have content like "Bob pays $1 for bananas, Susan pays $2," which would mis-render a normal sentence as if it had inline math content.

The blog post author's proposal is the most reasonable one: it's both backwards-compatible (the triple-tick "math" group was not already defined or, if it was, a new unique identifier could be used) and doesn't produce any ambiguities with other parts of the Markdown grammar or non-semantic text. Finally, it avoids parser composition, which is a source of all kinds of nasty differential bugs.

> which would mis-render a normal sentence as if it had inline math content

Not if it's opt-in. If you opt in to math support, you use \$, but that's a choice you make. Also, you can force an opt-in to use dollar signs vs \[ and \(.

> The blog post author's proposal is the most reasonable one

I gave the problem with that. It's not standard syntax, so markdown parsers like Pandoc stop working. And technically, that doesn't even fix anything, because someone could have been using that syntax already. It might not be common, but it isn't a foolproof fix.

> Not if it's opt-in. If you opt in to math support, you use \$, but that's a choice you make. Also, you can force an opt-in to use dollar signs vs \[ and \(.

How would you make it opt-in? With a pragma at the beginning of the markdown file? I don't mean that as a rhetorical question; good UX for these kinds of things is hard.

> I gave the problem with that. It's not standard syntax, so markdown parsers like Pandoc stop working. And technically, that doesn't even fix anything, because someone could have been using that syntax already. It might not be common, but it isn't a foolproof fix.

I think the cat is already well out of the bag with that: GitHub has supported triple-tick with the "geojson" and "topojson" tags for quite some time, choosing to render them as maps[1].

Renderers like Pandoc will handle the above gracefully by degrading to a code block, which is perfectly reasonable until they support it directly. In any case, there's precedent for it.

Edit: And it looks like they do support "```math" anyways[2]. It's just not well documented, and doesn't have an inline counterpart.

[1]: https://docs.github.com/en/get-started/writing-on-github/wor...

[2]: https://docs.github.com/en/get-started/writing-on-github/wor...
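The graceful degradation described above can be pictured as a simple dispatch with a fallback. This is a toy sketch: `render_fenced_block`, `render_math`, and the HTML they emit are all made up for illustration and do not reflect how GitHub or Pandoc are implemented.

```python
import html


def render_math(source: str) -> str:
    # Placeholder for a real math renderer; it just wraps the escaped source.
    return f'<span class="math">{html.escape(source)}</span>'


# Languages this hypothetical renderer treats specially.
SPECIAL_RENDERERS = {"math": render_math}


def render_fenced_block(language: str, source: str) -> str:
    """Use a special renderer if one is registered, else emit a plain code block."""
    renderer = SPECIAL_RENDERERS.get(language)
    if renderer is not None:
        return renderer(source)
    css_class = f' class="language-{html.escape(language)}"' if language else ""
    return f"<pre><code{css_class}>{html.escape(source)}</code></pre>"


print(render_fenced_block("math", "a < b"))
# <span class="math">a &lt; b</span>
print(render_fenced_block("geojson", "{}"))
# <pre><code class="language-geojson">{}</code></pre>
```

A renderer that has never heard of a tag still produces a readable code block, which is the fallback behaviour the comment above relies on.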

The “opt-in” would make it backwards-compatible, since any preexisting files would have math support disabled.

I wonder how pandoc[1] does this. You can convert Markdown to PDF (`pandoc test.md -o test.pdf`), it uses the same syntax as GitHub ($ signs only, no backticks) and it fares a lot better than GitHub in a few of the tests outlined in the article[2]. It's not perfect but clearly something better can be done.

[1]: https://pandoc.org/

[2]: https://nsood.in/hn-latex/test.pdf

Probably pandoc protects whatever is inside $...$ or $$...$$. GitHub doesn't. I would be curious to know how pandoc handles the other failing cases.
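One way to read "protects whatever is inside" is a stash-and-restore pass: pull the math spans out, run the Markdown step on what is left, then put the math back untouched. The sketch below is a guess at the general technique, not pandoc's actual implementation; `toy_markdown` is a stand-in that only knows about links, and all names are hypothetical.

```python
import re

# Display math ($$...$$) first, then inline math with non-space flanking characters.
MATH_SPAN = re.compile(r"\$\$.+?\$\$|\$(?=\S)[^$\n]+?(?<=\S)\$", re.DOTALL)


def toy_markdown(text: str) -> str:
    """Stand-in for a real Markdown renderer: handles only [text](url) links."""
    return re.sub(r"\[(.+?)\]\((.+?)\)", r'<a href="\2">\1</a>', text)


def render_with_protected_math(text: str) -> str:
    stash: list[str] = []

    def hide(match: re.Match) -> str:
        # Replace each math span with a NUL-delimited index into the stash.
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"

    protected = MATH_SPAN.sub(hide, text)
    rendered = toy_markdown(protected)
    # Restore the original math spans unchanged.
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], rendered)


print(toy_markdown("$[a+b](c+d)$"))
# $<a href="c+d">a+b</a>$
print(render_with_protected_math("$[a+b](c+d)$"))
# $[a+b](c+d)$
```

The first call shows the link syntax winning over the math, the failure mode discussed in this thread; the second shows the math surviving because it was set aside before the link rule ran.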

There are many approaches to this problem. E.g., Jupyter notebooks implement one that has matured in the wild over a decade. There's this very flexible markdown-it plugin that implements another https://github.com/goessner/markdown-it-texmath, and my version of it here https://github.com/sagemathinc/cocalc/blob/master/src/packag... which I rewrote in TypeScript with a focus on the same semantics as Jupyter has, but for CoCalc. I've also been working on using unifiedjs to provide more general LaTeX for Markdown (not just formulas) here https://github.com/sagemathinc/cocalc/pull/5982

Parsing math in markdown is easier if you use a plugin to an existing markdown parser, rather than trying to do some hack outside of that (which is what Github probably does, and also what Jupyter does).

My 2c:

Anki got it right with

    [$] ... [/$]

    [$$] ... [/$$]
I often change the mathjax defaults to these in my own documents. Never ever had a clash.

Worst case scenario? If you need to type a literal "[$]", insert an empty span or something.

No need to mess with "backticks vs no backticks" semantics at all.
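For anyone who wants to try the Anki-style delimiters without reconfiguring MathJax, a small preprocessing step could rewrite them into the LaTeX delimiters before rendering. This is only a sketch; `anki_to_latex_delimiters` is a hypothetical helper, and it deliberately ignores edge cases such as nested or unbalanced delimiters.

```python
import re

# [$$] ... [/$$] is display math, [$] ... [/$] is inline math.
DISPLAY = re.compile(r"\[\$\$\](.*?)\[/\$\$\]", re.DOTALL)
INLINE = re.compile(r"\[\$\](.*?)\[/\$\]", re.DOTALL)


def anki_to_latex_delimiters(text: str) -> str:
    """Rewrite Anki-style math delimiters as LaTeX-style ones."""
    # Lambdas avoid backslash handling in re.sub replacement strings.
    text = DISPLAY.sub(lambda m: r"\[" + m.group(1) + r"\]", text)
    return INLINE.sub(lambda m: r"\(" + m.group(1) + r"\)", text)


print(anki_to_latex_delimiters("It costs $5. [$]x^2[/$] and [$$]y = 1[/$$]"))
# It costs $5. \(x^2\) and \[y = 1\]
```

Note that the bare `$5` passes through untouched, which is the "never had a clash" property described above.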

Markdown is just generally awful because it’s not designed to be extensible. And so people make a total hash of things like this when trying to add custom inline syntax (because it’s not possible to do it compatibly), and abuse preformatted code blocks to do something other than show code. (Seriously, if you make ```mermaid … ``` turn it into a diagram, how am I supposed to show syntax-highlighted Mermaid code? Or ```math … ```, same deal. In this regard, I actually prefer the $$ … $$ GitHub have used, for all its problems.) Alternatives like reStructuredText and AsciiDoc are just worlds ahead in sanity. (Markdown’s HTML foundations don’t help, either.)

Markdown is a complete dead end.

I’ve been making a lightweight markup language of my own, and I thought long and hard about this kind of thing, with the goal of making something extremely consistent and easily parseable by human and machine alike. (All popular LMLs are surprisingly hard to parse correctly, so that text editors never have fully correct syntax highlighting unless they use something like LSP-backed highlighting with the real parser.) There’s a common problem with syntax extensions needing semantic understanding before you can actually parse their bodies. I’m using two different syntaxes for the bodies of what I’m calling macros (here shown without arguments to the macros, partly because I’m still not entirely satisfied with any of the syntaxes I’ve tried for them):

In the case of an interpreted macro body, it will be parsed fully (supporting both block and inline formatting) and fed to the macro; in the case of a raw body, it will be fed to the macro uninterpreted, just like with the `…` monospace code syntax. (If you wanted to pass a monospace code element as the body, that’d be @macro-name{`…`}. In the general context, the monospaced code syntax `…` is basically just shorthand for @code`…`, like **bold** can be shorthand for @bold{bold}.) This would lead to the shortest possible syntax for mathematics being @m`…`, which I think is acceptable, and much more syntactically robust. If you needed backticks inside the body, you’d currently have to use @m{…} syntax, backslash-escaping any special syntax, because I haven’t come up with any satisfactory other syntax (allowing delimiter repetition, like @m```…```, doesn’t solve all cases as you can’t use the delimiter at the start or end of the value, a problem that most LMLs that go this way seem to ignore, e.g. I think there are some things that you genuinely can’t express in reStructuredText because of this, and others have awful syntactic hacks like backslash space being special; I’m contemplating @m#`…`# and @m#{…}# with arbitrary but matching number of hashes, like Rust’s raw strings, but it’s still not as neat, so I could end up just leaving it at “use an interpreted body and escape everything”). All up, I think this raw/interpreted body distinction should work pretty well, and is sound.
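The "matching number of hashes" idea for raw bodies is easy to prototype with a backreference. The sketch below is purely illustrative: the syntax is only being contemplated in the comment above, the macro-name pattern is an assumption, and `parse_raw_macros` is a hypothetical helper.

```python
import re

# @name, then zero or more hashes, a backtick-delimited raw body,
# and the same number of hashes again (the \2 backreference).
RAW_MACRO = re.compile(r"@([A-Za-z][\w-]*)(#*)`(.*?)`\2", re.DOTALL)


def parse_raw_macros(text: str) -> list[tuple[str, str]]:
    """Return (macro name, raw body) pairs found in the text."""
    return [(m.group(1), m.group(3)) for m in RAW_MACRO.finditer(text)]


print(parse_raw_macros("@m`x^2`"))
# [('m', 'x^2')]
print(parse_raw_macros("@m#`a`b`#"))
# [('m', 'a`b')]
```

With one hash, the body may contain a backtick as long as it is not immediately followed by a hash; adding more hashes widens what can be expressed, much like Rust's raw strings.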

Markdown is a dead end, and that's why I'm using it. There's only so much syntax you can put into plain text and remain readable more or less as text, as opposed to code.

I, too, would prefer some other language for complex expressions, although that's to keep Markdown simple rather than to obtain power.

I never want to decipher LaTeX in READMEs I read, unless it's a README for a LaTeX library.

I’ve seen all kinds of awful README.md files because people are trying to shoehorn stuff into Markdown that doesn’t fit.

The problem with Markdown is that it isn’t simple; rather, it’s simplistic, and quite a few things that should be simple instead require terrible hacks to work around Markdown and use HTML.

reStructuredText is generally quite a bit better in this regard because it focuses on expressing semantics.

An interesting take, thanks for the input! As a layman (pretty much), I had always frowned a bit upon reST since I never got used to its syntax. That was probably because Markdown was already so popular when I started using it.

One little remark: You can still have highlighted math code blocks in gh's Markdown. The lang here is ```latex. Anyway, I see this might not be satisfactory.

reStructuredText is still useful to look at for inspiration here. It had the concepts of extensible metadata ("field lists"), spans ("interpreted text"), and blocks ("directives"), including things like applying metadata to spans (essentially using footnotes to provide field lists to interpreted text sections, like but better than Markdown's reference style for hyperlinks, which almost no one uses but which was much more common in rST).

I still sometimes wonder whether reStructuredText, had it found better acceptance outside of just the Python community, might have had a better run at being the "default" versus Markdown's quirkier approach.


I’m very deeply familiar with reStructuredText, having bent both it and Sphinx to my will in great detail a decade ago.

A decade ago, I thought that maybe if reStructuredText had had a non-Python implementation (most significantly at the time, PHP and JavaScript) it would have conquered instead of Markdown, given how clearly superior it is.

Now, I’ve decided that the fact that reStructuredText wasn’t ported is actually bound up in the reason why it didn’t conquer: it’s too fancy, too complex. Markdown is an awful kludge that was built on regular expressions, duct tape and HTML and knows it’s unsound. It’s very strongly web-native. reStructuredText is beautiful elegance and perfection of form, but aggressively medium-neutral, painfully so when you only care about the web, and difficult to implement, and unforgiving of content errors.

So I still wish reStructuredText had won, but I can understand more clearly now why it didn’t, and in fact never really stood a chance.

It may be worth looking at org-mode syntax. Yes, org-mode is part of Emacs, but the syntax of the file seems to meet some of your requirements, namely parseable by human and machine alike.

Is it possible to do all kinds of mathematical research, up to the ABC-hypothesis, using GH only for publishing and maybe getting some help from random folks who can also read math?
