
MathML is a failed web standard - Mathnerd314
https://www.peterkrautzberger.org/0186/
======
anjbe
It’s instructive to see how different MathML is from other math markup
languages. Here’s the quadratic formula:

In troff, x = {-b +- sqrt { b sup 2 - 4ac}} over 2a

In TeX, x = {-b \pm \sqrt{b^2-4ac}} \over {2a}

In plain Unicode, 𝑥 = (−𝑏 ± √(𝑏² − 4𝑎𝑐))⁄2𝑎

In MathML,
<mrow><mi>x</mi><mo>=</mo><mfrac><mrow><mo>−</mo><mi>b</mi><mo>±</mo><msqrt><mrow><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mi>a</mi></mrow></mfrac></mrow>

MathML is simply unreasonable to write by hand. Most of the time it’s only
ever used as an interchange format, automatically generated by tools.

Indeed, the only time I ever use it is with mandoc(1), the default manpage
formatter on BSD, Illumos, and some Linuxes, which converts equations to
MathML when converting manpages to HTML.
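
To illustrate the "automatically generated by tools" point, here is a minimal Python sketch that builds the quadratic formula as presentation MathML from nested calls rather than hand-typed tags. The helper names (`tok`, `box`, `quadratic_formula`) are made up for this example; only the standard library's `xml.etree.ElementTree` is assumed.

```python
import xml.etree.ElementTree as ET


def tok(tag, text):
    """Token element such as <mi>x</mi>, <mn>2</mn> or <mo>=</mo>."""
    el = ET.Element(tag)
    el.text = text
    return el


def box(tag, *children):
    """Layout element such as <mrow>, <mfrac>, <msqrt> or <msup>."""
    el = ET.Element(tag)
    el.extend(children)
    return el


def quadratic_formula():
    # b^2 - 4ac, the content under the square root
    discriminant = box(
        "mrow",
        box("msup", tok("mi", "b"), tok("mn", "2")),
        tok("mo", "−"),
        tok("mn", "4"), tok("mi", "a"), tok("mi", "c"),
    )
    numerator = box("mrow", tok("mo", "−"), tok("mi", "b"), tok("mo", "±"),
                    box("msqrt", discriminant))
    denominator = box("mrow", tok("mn", "2"), tok("mi", "a"))
    return box("mrow", tok("mi", "x"), tok("mo", "="),
               box("mfrac", numerator, denominator))


# Serialize the tree; `mathml` is a str holding the markup.
mathml = ET.tostring(quadratic_formula(), encoding="unicode")
```

Nobody would type the resulting string by hand, but producing it from a tree takes a few lines, which is roughly how converters end up using the format.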

~~~
spicyj
Yeah. I think that's inevitable with an XML syntax. XML is good at some
things, but representing math expressions clearly is not one of them.

~~~
koolba
> Yeah. I think that's inevitable with an XML syntax. XML is good at some
> things, but representing math expressions clearly is not one of them.

What is XML good at? (by good I mean better than alternatives like JSON, YAML,
HAML, etc)

The only thing that _might_ qualify is a long term/archival quality document
format like ODF/OOXML. The inherently embeddable nature of XML does seem
like a nice fit but it gets very bloated very fast (deflated wrappers help
though).

~~~
flatline
XML grew out of SGML/HTML, and there's no denying that its age shows. It's
kind of like going into a house built in the 90s and seeing all the little
things that just scream "90s house - that looks so dated!"

Did you know that web browsers have native support for XML? Try fetching an
xml resource and pulling the responseXML value off the XHR - you have another
DOM object right in your hands, that you can treat just like a regular Node.

The reason that XML is better than the alternatives has nothing to do with the
syntax itself, but the tooling around it. Browser support, XQuery, XPath,
XSD/RelaxNG, XSLT - please tell me where I can find the equivalents for any
other markup language. You don't have to use them, either - but if you need
them, they are there in pretty much every framework. XML had the first-to-market
advantage, and was picked up in enterprise systems and made powerful
and ubiquitous. If you need batteries included, XML is right there, the others
are not. There is really nothing wrong with it, the syntax is clunky for some
applications but great for documents, whereas e.g. JSON would be horrible in
that scenario.
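
As a small illustration of the tooling argument, here is a sketch that queries a MathML fragment with XPath-style expressions from Python. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath; full XPath 1.0 or XSLT would need a third-party library such as lxml. The sample markup and variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MATHML = (
    "<math><mrow><mi>x</mi><mo>=</mo><mfrac>"
    "<mrow><mo>-</mo><mi>b</mi><mo>±</mo><msqrt><mrow>"
    "<msup><mi>b</mi><mn>2</mn></msup><mo>-</mo>"
    "<mn>4</mn><mi>a</mi><mi>c</mi></mrow></msqrt></mrow>"
    "<mrow><mn>2</mn><mi>a</mi></mrow>"
    "</mfrac></mrow></math>"
)

root = ET.fromstring(MATHML)

# Every identifier anywhere in the formula, in document order.
identifiers = [el.text for el in root.findall(".//mi")]

# The second <mrow> child of the fraction, i.e. the denominator.
denominator = root.find(".//mfrac/mrow[2]")
denominator_tokens = [el.text for el in denominator]

# Exponents: <mn> elements that sit directly inside an <msup>.
exponents = [el.text for el in root.findall(".//msup/mn")]
```

The same three questions asked of a TeX string would need a custom parser first.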

~~~
xaduha
> Did you know that web browsers have native support for XML?

Not only that, browsers have support for XSLT 1.0, so you can format it and
style it with CSS on the fly. Even mobile browsers support that, as far as I
can tell.

------
CaliforniaKarl
Blargh. It annoys me that MathML was even a thing. We've already had a
perfectly-fine and widely-used markup language for mathematical formulae; it's
called TeX.

Ideally, I'd prefer that HTML5 include a <math> tag, or something similar,
that takes TeX as input and produces formatted output. A side benefit would
mean that every browser, and every system, in the world would have TeX
installed!

At least MathJax can take TeX as input.

~~~
pcwalton
> A side benefit would mean that every browser, and every system, in the world
> would have TeX installed!

TeX doesn't have dynamic reflow, is defined by a macro system that involves
essentially monkey patching the parser as it goes, lacks anything resembling a
DOM, wasn't designed with Unicode or internationalization in mind, is
completely unparallelizable, doesn't interoperate with CSS, has its own
concept of block and inline layout that doesn't allow for text wrapping around
floats without hacks, requires multiple passes to compute various things like
tables of contents, and wasn't designed for interactive-level performance.
Something syntactically _like_ TeX may well be the solution, but jamming TeX
itself into the Web platform wouldn't end well.

~~~
avian
Considering the amount of expertise that went into making TeX and the fact that
(in my experience at least) it far surpasses other solutions, it might simply
be that mathematical notation typesetting doesn't lend itself to dynamic
reflow, internationalization, parallelization, DOM representation,
interoperability with CSS and its concept of blocks and inline layouts,
wrapping around floats, single pass processing and interactive-level
performance.

~~~
hansen
This is about how I feel. Can someone who’s actually competent explain why
(La)TeX output looks so much better than what web browsers do? I’m not talking
about childish colors, sticky headers, and all the other annoying “modern web
features”, just the relevant content: some text and inline images.

~~~
ivan_ah
A few things that make LaTeX docs look good:

    
    
        - quality justified paragraphs making use of inter-word spacing and hyphenation
        - font ligatures (TeX uses quality fonts that have special characters for sequences like fi fi fl ffi ffl ft etc.)
        - good heuristics about where to cut pages (e.g. flexible spacing around heading)

~~~
hansen
But that begs the question why web browsers don’t implement this. Is there a
technical reason, or did some people decide at some point that decent text
rendering is irrelevant and one should focus on other gimmicks?

~~~
TeMPOraL
The Web started as a tool for rendering text and occasional images pretty much
as soon as it comes out of the wire. TeX on the other hand, was meant to be a
typesetting package which _compiles_ textual description of the document into
a beautiful printable form. That compilation takes noticeable time even today;
I imagine it was unacceptably slow for the web back then. Also in TeX you
deal with paper sizes, in browsers you deal with adjustable viewports of
unpredictable size.

I guess that the Web just started simple and evolved from that, and then
nobody bothered to remake it for prettier text.

~~~
Mikhail_Edoshin
I remember an old publishing package, Ventura Publisher; the first time I saw
it was 1990 or 91 maybe and then it was at v3. It was a very versatile
publishing tool (all the perks of a professional typesetting package, tables,
equations, floating illustrations; lots).

What's interesting is that it could separate the content and the styling
information, so you could load the same content into two or more publications
with different styles and get two different typesetting from the same source
(e.g. one column small page for a book and two or three column large page for
an article in a journal). And it was very easy to change the format or paper
size: just change it and it would reflow the content accordingly. It didn't
even take much time, as far as I can remember, and I'm talking about IBM PC
286 here.

So, truly, it's not a technical obstacle; the web could have a much better
typesetting engine working at high speed (and I'd say that it would be much
better to imitate paged media instead of scrolling; scrolling is really
inconvenient for reading). It's just that it grows wildly and in all
directions at once; it tries to be a "semantic" storage, a rendering medium,
and an application engine at the same time, and is not particularly good at
any of this.

------
dginev
I don't know how many of the people lamenting that MathML ever came to exist are
active math-on-the-web developers. I for one am dealing with third-party math
equations on a daily basis and can only dream of the wonder of ubiquitous
MathML support in all major browsers.

The details as to why are here:
[http://prodg.org/blog/mathml_please/2015-09-16/MathML%20on%2...](http://prodg.org/blog/mathml_please/2015-09-16/MathML%20on%20the%20Web%20%E2%80%93%20Please)!

As things stand, math handling on the web is inconvenient at best and a
nightmare at worst.

Some quick points about the comments here:

\- MathML as a spec is akin to SVG, it's meant for the browser/machine first,
and not for direct human consumption. You don't read/write SVG by hand, and
neither should you MathML.

\- Calling MathML a "failed web standard" is fine by me, as long as you
continue the remark with "the Chrome and IE browsers have failed
mathematicians" and "the math-on-the-web developers have failed as a
community".

~~~
analyst74
I am with you.

Some people write their math by hand in LaTeX or other formats, but many use
visual tools and draw math similar to drawing diagrams. Then those tools
generate whatever source code necessary.

In those scenarios, MathML would be much easier to deal with than most other
formats. Also, Presentation MathML support is much easier to implement than
full-scale support of LaTeX or other more complete math solutions including
Content MathML.

~~~
thanatropism
> Some people write their math by hand in LaTeX or other formats, but many use
> visual tools and draw math similar to drawing diagrams.

Which people?

Everyone I know who uses equations in MS Word abandoned the clicky tools as
soon as the pseudolatex entry language became available.

~~~
pflats
>Which people?

Currently, high school students taking the PARCC, SmarterBalanced, and other
web-based standardized tests.

And whoa man is it terrible to watch them struggle with it.

------
PaulTopping
Most of you guys seem to be suffering under a big misconception. MathML was
never intended to be typed by humans. Perhaps a few programmers might need to
do it but not mathematicians, students, engineers, scientists, etc. Saying
things like "I hate typing mathematics in MathML" is much like saying "I hate
typing my word processing documents in RTF". Just say no. Like most XML
languages, MathML is a computer representation meant to be read/written by
software. If you want to enter math notation, use TeX or an interactive math
editor like MathType (my company's product).

------
satbyy
One big advantage of MathML is the ability to copy-paste from webpages
directly into software like Mathematica. I don't think there is currently any
alternative to MathML in that respect. The author argues (rightly) about
styling and presentation of MathML, but doesn't propose an alternative to this
copy-paste problem. Maybe TeX, when you're using MathJax at least?

~~~
elcapitan
Wikipedia seems to be using a mechanism where TeX code is rendered into
images, but on marking and copying into an editor becomes TeX again, wouldn't
that be a sufficient solution for copying?

~~~
TeMPOraL
Isn't it the default behaviour when you put alt-text in the image tag?

~~~
elcapitan
That makes sense, I just mentioned it as an example of how the result is easy
to achieve.

------
therealmarv
Many standards have failed for the web. Anybody remember VRML? Actually I like
it that we have not a broad big big standard which includes MathML which only
~0,1% of people care about in the browser. MathML can stand on its own
(encapsulated with JS like PDFs in Firefox) or you can abandon it entirely.
You are free to choose. It's the best way for everybody.

~~~
pavlov
_Actually I like it that we have not a broad big big standard which includes
MathML which only ~0,1% of people care about in the browser._

Did you read the article? That's exactly what we have right now. The author's
#1 complaint is that MathML is currently a part of HTML5.

He would like it to be removed from the HTML standard so that MathML could
evolve on its own, rather than being constrained by its status as a web
standard that's not actually implemented by browsers.

~~~
therealmarv
Maybe in W3C specification MathML is a part of HTML5 and browsers will not
crash when there is MathML inside (actually they also do not crash with custom
tags) but many libs are not supporting the MathML tags or respecting them or
working with them. Just one quick example... JQuery and open this in Chrome
[http://sdiehl.github.io/jquery-mathml/](http://sdiehl.github.io/jquery-mathml/)
So for the Chrome and Safari guys MathML is already dead and it even
has flaws in Firefox. But what is written in W3C documents is not always the
truth out there or common in the WWW ;) So yes I agree... rip MathML out of
HTML5 specification.

------
jwmerrill
Since we're discussing various math rendering technologies for the web, I'd
like to plug MathQuill [1]. The big advantage MathQuill has over other
solutions is that the math is live editable in the browser in typeset form. We
use it for input on the Desmos graphing calculator [2], and MathQuill is
definitely the best WYSIWYG math editor that I have used anywhere, on the web
or elsewhere.

[1] [http://mathquill.com/](http://mathquill.com/)

[2] [https://www.desmos.com/calculator](https://www.desmos.com/calculator)

~~~
bglazer
Desmos's MathQuill interface is great. By far the easiest to use math input
I've encountered. I haven't tried to input anything really exotic, but it's
extremely intuitive for the simple to intermediate level of complexity.

Thanks for your work on Desmos, it's a pleasure to use.

------
spicyj
"We need to get together with CSSWG/Houdini TF/etc to work out solutions that
help those developers who actually solve the problem of math on the web."

Unfortunately, Houdini seems to have deprioritized font metrics info currently
so this might not come soon. See this issue for more details:

[https://github.com/w3c/css-houdini-drafts/issues/135](https://github.com/w3c/css-houdini-drafts/issues/135)

(However, for math we often use webfonts anyway so it is practical to send
down glyph metrics with the fonts. That's what both MathJax and KaTeX do.)

------
PaulTopping
One of my biggest complaints about Peter K's "MathML is a failed web standard"
is that most people read that without taking note of "web" in the middle.
MathML has been quite successful in publishing and as a computer
representation. As far as whether or not MathML is a failed WEB standard, I
have a lot more to say about that here:
[http://bit.ly/1ZLfCF8](http://bit.ly/1ZLfCF8).

~~~
macintux
Seeing WEB in a comment thread with extensive references to Knuth is amusing.

------
spankalee
I think the way forward here with math and other niche document formats (like
MusicXML) is to keep them out of the HTML standards and let people create web
component libraries that handle rendering.

As a practical matter this will bring implementations to all browsers much
faster than waiting for four large organizations to decide that a format is
important and allocate resources. Even better is the real-world experience
those libraries will gather so that maybe we get something better than what a
committee would have designed in isolation.

For math specifically, TeX really seems like a better way to go for now. I
wrapped KaTeX in a web component so that it's easy to use in HTML:
[https://github.com/justinfagnani/katex-elements](https://github.com/justinfagnani/katex-elements)

You use it like:

    
    
        <katex-inline>c = \pm\sqrt{a^2 + b^2}</katex-inline>
    

I imagine the equivalent for music will be better than MusicXML.

------
intrasight
MathJax looks excellent.

~~~
spicyj
MathJax is very flexible and its output is certainly high-quality. However, it
is difficult to configure and package in my experience, and it's pretty slow.

We used it for a few years at Khan Academy before building KaTeX
([https://khan.github.io/KaTeX/](https://khan.github.io/KaTeX/)) which is
around 50x faster to render in our experience, not even counting download
size. We ran an A/B test for MathJax vs. KaTeX on the Khan Academy site and
people completed measurably more exercises with the latter, even though they
look identical. (KaTeX doesn't support everything MathJax does though – we
still fall back to MathJax for a small fraction of our content.)

~~~
WhitneyLand
Why do you open the KaTeX source, but not the source for your mobile apps?

~~~
spicyj
KaTeX is more likely to be useful out of the box to other developers, whereas
the source of the mobile apps is not going to be a drop-in component that's
useful; it'd have to be used more as a learning opportunity by interested
devs. That's fine, but it's also a different market. In practical terms, the
KA mobile apps also are much more KA-specific and need to be cleaned up before
open-sourcing, and there's a worry that people would take the entire app and
repackage it as their own (this happens all the time to open-sourced mobile
apps, unfortunately). That said, open-sourcing the mobile apps is still
something we want to do, so it may happen someday.

~~~
WhitneyLand
Sorry to hijack this thread and I appreciate your response.

Please keep in mind I don't ask about open sourcing the mobile apps to help
other developers, rather, it's to help KA. There are lots of contributions and
improvements that would be made by the community if given a chance, which
would benefit everyone. Contributions would have to be managed yes, but that's
something that's done all the time for other projects.

~~~
spicyj
We've open-sourced several other mostly-KA-specific parts of the site,
including these:

[https://github.com/Khan/khan-exercises](https://github.com/Khan/khan-exercises)

[https://github.com/Khan/perseus](https://github.com/Khan/perseus)

[https://github.com/Khan/live-editor](https://github.com/Khan/live-editor)

Contributions are few and far between; good contributions are even more so. It
hasn't been worth the time investment, unfortunately.

------
mccourt
"... MathML is effectively preventing mathematics from aligning with today’s
and tomorrow’s web."

I'm probably one of the few that still likes to print things out or buy a
physical copy of a book, but I also like reading/producing math for the
internet and I often wish there were a better standard out there. I really
appreciate how you've laid out several points that demonstrate the situation
clearly.

------
ga6840
I am building a project and doing research on math-aware search (my project is
hosted on
[https://github.com/t-k-/the-day-after-tomorrow](https://github.com/t-k-/the-day-after-tomorrow)).
As for search engines for math, it is a pity that MathML has become a standard
"input" for mainstream research. The most famous conference on math search,
NTCIR, actually publishes its main dataset/corpus in MathML.

Converting MathML back into LaTeX is possible but error-prone for most
moderately complex expressions (I tried it using Haskell pandoc). This means
math-aware search engines have to include a MathML parser. And since the most
popular digital math documents are still mostly written in LaTeX, a math
search engine needs yet another tool (e.g. LaTeXML) to convert LaTeX into much
lengthier MathML. As a researcher in this field, all I see is that MathML
adds a lot of overhead to our lives.

I think LaTeX is still the ideal way to "input" math expressions: it is
human-friendly and the most commonly used math input. A WEB standard should
instead focus on "rendering" LaTeX. I have to point out that I am pretty
comfortable with what MathJax provides, but if there needs to be a WEB standard
for math, I wish that some day the standard way to write a math expression in
HTML were something like
this: <math> x = \frac{-b \pm \sqrt{b^2 - 4ac}} {2a} </math>

Instead of: <math display="block"> <mrow> <mi>x</mi> <mo>=</mo> <mfrac> <mrow>
<mo>−</mo> <mi>b</mi> <mo>±</mo> <msqrt> <mrow> <msup> <mi>b</mi> <mn>2</mn>
</msup> <mo>−</mo> <mn>4</mn> <mi>a</mi> <mi>c</mi> </mrow> </msqrt> </mrow>
<mrow> <mn>2</mn> <mi>a</mi> </mrow> </mfrac> </mrow> </math>
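
To make the "possible but error-prone" conversion concrete, here is a toy Python sketch that turns the presentation MathML above back into TeX. It is not how pandoc or any real converter works; the function name `to_tex`, the symbol table, and the handful of supported elements are assumptions chosen to cover exactly this formula.

```python
import xml.etree.ElementTree as ET

# Unicode operators that need a TeX spelling (deliberately tiny table).
TEX_SYMBOLS = {"±": r"\pm", "−": "-"}

MATHML = (
    '<math display="block"><mrow><mi>x</mi><mo>=</mo><mfrac><mrow>'
    "<mo>−</mo><mi>b</mi><mo>±</mo><msqrt><mrow><msup><mi>b</mi><mn>2</mn>"
    "</msup><mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi></mrow></msqrt></mrow>"
    "<mrow><mn>2</mn><mi>a</mi></mrow></mfrac></mrow></math>"
)


def to_tex(el):
    """Recursively convert a small subset of presentation MathML to TeX."""
    tag = el.tag.split("}")[-1]  # drop an XML namespace prefix if present
    kids = [to_tex(child) for child in el]
    if tag in ("mi", "mn", "mo"):
        text = (el.text or "").strip()
        return TEX_SYMBOLS.get(text, text)
    if tag in ("math", "mrow"):
        return " ".join(kids)
    if tag == "mfrac":
        return r"\frac{%s}{%s}" % (kids[0], kids[1])
    if tag == "msqrt":
        return r"\sqrt{%s}" % " ".join(kids)
    if tag == "msup":
        return "%s^{%s}" % (kids[0], kids[1])
    # Anything else (tables, fences, scripts, ...) is where real converters
    # have to start guessing; this sketch simply refuses.
    raise ValueError(f"unsupported element: {tag}")


tex = to_tex(ET.fromstring(MATHML))
```

For this input `tex` comes out as `x = \frac{- b \pm \sqrt{b^{2} - 4 a c}}{2 a}`, which renders the same as the original but is not the string the author typed. Every element outside the five handled here needs its own rule.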

~~~
dginev
I am the person behind generating the original NTCIR math datasets, and
probably most of the research-produced MathML out there. We've recently
presented that we have more than 350 million formulas from arXiv converted
over to MathML, together with the rest of the papers as HTML5.

As someone who has stared at arXiv TeX/LaTeX for years, I can testify you
don't want to be looking at TeX math in actual latex documents, there is a lot
more that goes in there beyond the toy formula syntax used on the web.

As also someone who has worked on math search engines and math-rich NLP for a
few years, complaining that you have a structured machine-parseable
representation for mathematics and wanting TeX instead sounds naive. On one
hand, the MathML formulas in the datasets already could preserve the source
TeX (the TeX annotations may even be there, I can't remember right now),
should you need it directly. On the other hand, you can use any structured
methods, such as the ones used by content-based search engines such as
MathWebSearch, or handpick any relevant information from the MathML tree to
feed it back into a statistical algorithm, as done for example by the WebMIAS
search engine.

The most fundamental bit to understand if you're doing research on automated
processing of human mathematics is that formulas are two dimensional objects
best represented as trees, be they layout trees describing the presentation,
or operator trees describing the content, or some other hybrid tree that tries
doing both (such as LaTeXML's XMath spec).

~~~
ga6840
1\. In the NTCIR (main) dataset, I see many cases where <m:math> does not
contain an alttext (and thus no TeX). I asked LaTeXML author Bruce Miller
<bruce.miller@nist.gov> about this, and he said LaTeXML will always put the same
TeX string as an alttext attribute on the <m:math>. So I assume you guys are
using some outdated LaTeXML version? I really want to plead with NTCIR to ensure
the original LaTeX annotation is kept in the main dataset, or to provide both
MathML and LaTeX versions of the corpus for researchers to choose freely. This
will allow LaTeX-only math search engines to compare results with other
MathML search engines. You know it is hard to convert all of them back into
LaTeX correctly.
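
For reference, a hedged sketch of how the source TeX can be recovered when it is present. MathML defines an `alttext` attribute on the `math` element and an `annotation` element with an `encoding` attribute; `application/x-tex` is a commonly used encoding value. The function name `source_tex` and the three sample documents are made up for illustration and are not taken from the NTCIR data.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def source_tex(math_el):
    """Return the TeX source stored on a <math> element, or None if absent."""
    alt = math_el.get("alttext")
    if alt:
        return alt
    # Fall back to a TeX annotation anywhere below the <math> element.
    for ann in math_el.iter(f"{{{MATHML_NS}}}annotation"):
        if ann.get("encoding") == "application/x-tex":
            return (ann.text or "").strip() or None
    return None


WITH_ALTTEXT = ET.fromstring(
    f'<m:math xmlns:m="{MATHML_NS}" alttext="a^2+b^2"><m:mi>a</m:mi></m:math>'
)
WITH_ANNOTATION = ET.fromstring(
    f'<m:math xmlns:m="{MATHML_NS}"><m:semantics><m:mi>x</m:mi>'
    '<m:annotation encoding="application/x-tex">x</m:annotation>'
    "</m:semantics></m:math>"
)
WITHOUT_TEX = ET.fromstring(
    f'<m:math xmlns:m="{MATHML_NS}"><m:mi>y</m:mi></m:math>'
)

results = [source_tex(doc) for doc in (WITH_ALTTEXT, WITH_ANNOTATION, WITHOUT_TEX)]
```

The third case is the one complained about here: when neither the attribute nor the annotation is present, the TeX has to be reconstructed from the tree.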

2\. I wish the NTCIR corpus were not so difficult to download (I once wrote a
request for the NTCIR corpus, but no one replied). Please make it publicly
accessible, just like what MIaS does:
[https://mir.fi.muni.cz/mias/](https://mir.fi.muni.cz/mias/)

3\. My search engine
([http://tkhost.github.io/opmes](http://tkhost.github.io/opmes)) is actually
using a structural method, but I still gave up on MathML and parse TeX
directly instead. Why? In TeX I can just omit irrelevant commands like "\color"
and "\mbox", and focus only on a handful of math-related TeX commands, and the
result is great. My search engine can only handle "toy formula
syntax", but maybe it is better than MathWebSearch
([https://zbmath.org/formulae/](https://zbmath.org/formulae/)) and even beat
Tangent
([http://saskatoon.cs.rit.edu/tangent/random](http://saskatoon.cs.rit.edu/tangent/random))
in long query. But in MathML, I have no idea why I need to read its lengthy
spec, and I see no reason to write a MathML parser.
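
A rough Python sketch of what "just omit \color and \mbox" could look like as a preprocessing pass. This is not the parser from the search engine mentioned above; the function names are hypothetical, and it assumes that each command is dropped together with one following braced argument.

```python
def skip_group(tex, i):
    """Return the index just past the balanced {...} group starting at tex[i]."""
    depth = 0
    while i < len(tex):
        if tex[i] == "\\":
            i += 2  # skip the backslash and the next char, e.g. \{ or \}
            continue
        if tex[i] == "{":
            depth += 1
        elif tex[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces")


def strip_commands(tex, commands=("color", "mbox")):
    """Remove the given commands and their first braced argument from tex."""
    out = []
    i = 0
    while i < len(tex):
        # Copy control symbols such as \\ or \{ verbatim so they never match.
        if tex[i] == "\\" and i + 1 < len(tex) and not tex[i + 1].isalpha():
            out.append(tex[i:i + 2])
            i += 2
            continue
        for name in commands:
            cmd = "\\" + name
            end = i + len(cmd)
            # Match \color but not a longer name such as \colorbox.
            if tex.startswith(cmd, i) and not tex[end:end + 1].isalpha():
                j = end
                while j < len(tex) and tex[j].isspace():
                    j += 1
                if j < len(tex) and tex[j] == "{":
                    j = skip_group(tex, j)
                i = j
                break
        else:
            out.append(tex[i])
            i += 1
    return "".join(out)


cleaned = strip_commands(r"{\color{red} x} = \mbox{if } y^2")
```

After the call, `cleaned` no longer contains either command, while the braces and the math around them are left alone. Macros, `\def`, and environments are exactly what a sketch like this ignores.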

The NTCIR-math conference (and its unfriendly website) makes me unwilling to
submit a single paper.

~~~
dginev
1\. Correct, the dataset was generated back in 2013 and will probably be
regenerated for the next NTCIR issue.

2\. There are annoying copyright issues with making the datasets available for
public use. We're working with arXiv to resolve that, it's out of our control
for now. It's a long-lasting frustration of mine that the datasets can't be
simply made public.

3\. You can omit anything you like from the MathML, there is no inferiority to
omitting from TeX. "but maybe it is better than MWS" \- prove it, submit to
NTCIR, and beat everyone. Also, being better than MWS is not an argument that
MWS should be denied the very data it needs to run. At the same time you can
still obtain whatever degradation you need from the presentation MathML.
Failing to recognize any claim to correctness other than your own without any
substantive proof is not a reasonable position and I urge you to reconsider.

"I have no idea why I need to read its lengthy spec, and I see no reason to
write a MathML parser."

You don't need to write a parser, you can use an off-the-shelf parser for
XML/HTML5 and handle the MathML reliably and appropriately. In fact you can
reuse that from any open source search engine for math, MWS included. Writing
a TeX parser on the other hand is something I will always roll my eyes at,
since actual real world TeX is not something you can "parse", or do anything
with reliably, unless you have a full TeX implementation underneath. Which is
1000x harder than using a parser to deal with MathML.
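
A minimal sketch of that workflow in Python, using the standard library's XML parser and then discarding what a search engine may not care about. Which elements count as irrelevant is an assumption made for this example (the `UNWRAP` and `DROP` sets), as is the output shape of nested tuples.

```python
import xml.etree.ElementTree as ET

TOKENS = {"mi", "mn", "mo", "mtext"}
UNWRAP = {"mstyle", "mpadded"}   # keep the children, forget the wrapper
DROP = {"mspace", "mphantom"}    # forget the element and everything inside


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def simplify(el):
    """Return a list of (tag, text) leaves and (tag, [children]) nodes."""
    tag = local(el.tag)
    if tag in DROP:
        return []
    if tag in TOKENS:
        return [(tag, (el.text or "").strip())]
    children = [node for child in el for node in simplify(child)]
    if tag in UNWRAP:
        return children
    return [(tag, children)]


SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mstyle mathcolor="red"><mi>x</mi><mspace width="1em"/>'
    "<mo>+</mo><mn>1</mn></mstyle></math>"
)

tree = simplify(ET.fromstring(SAMPLE))
```

Here the colour wrapper and the spacing element disappear and `tree` holds only the `math` node with its three tokens. Turning that layout tree into whatever operator tree a particular engine wants is a separate step that this sketch does not attempt.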

Finally, whining about NTCIR's UI being imperfect as a reason not to submit is
just childish.

~~~
ga6840
Thank you for informing me on my first two questions, so now I understand
NTCIR's problem.

At the very start I tried to compare my results (MAP, recall, precision) with
the participants in NTCIR, but it took a lot of effort to get the dataset, after
which I found I could not convert MathML back into TeX with much confidence.
Most importantly, my parser-generated tree structure is fine-tuned and very
dependent on TeX input; I cannot just take the MathML tree structure directly,
and I would need much more effort than just importing an existing XML parser.
Because of this, I cannot compare my results with mainstream NTCIR researchers.
I definitely tried very hard, but sadly I gave up. If NTCIR can someday provide
(even if a request is needed) TeX data for the competition, I will be willing
and able to compare my results with NTCIR participants (in order to "prove"
it).

Writing a TeX parser only for math search is not that difficult. I have
written one, and it parses most user-created documents on
math.stackexchange.com. Although I cannot convince you that I get better
results, I can argue that parsing the search-relevant TeX subset is effortless
(if you only care about math-related TeX); I even open-sourced my search
engine's TeX parser. Again, the problem is that it is not that easy to grab an
XML parser and reuse it in my project. I believe a good math-aware search
engine needs a tree structure very different from the one a MathML structure
represents. You get a tree by reusing the MWS parser, so WHAT? That tree is not
the tree I want, and I would need a lot of effort to convert it. The easy way
for me would be to convert MathML back into TeX (since I already build my tree
from TeX), but sadly that turns out to be too complicated to be worth a
shot.

~~~
ga6840
Lastly, there is more than childishness behind my complaining about NTCIR and
refusing to submit a paper: I gave up putting unworthy and duplicated effort
into implementing a MathML parser
_that generates the expression tree I need (this step is the most difficult,
rather than just parsing XML)_ , and instead focused on finding another
conference to publish my efforts. It turns out my paper (a demo) got accepted
at ECIR 2016, so I am glad I did not waste too much time on NTCIR; otherwise I
would have missed ECIR.

------
_pmf_
> MathML is a failed web standard

Well, the line's over there.

------
jjuhyun007
any recommended toolbars available for katex or other equation output tools
that can be simple to integrate into a site?

------
StreamBright
I am not sure if the verbosity of XML brings anything to the table here. Why
not just implement a TeX based extension?

~~~
WhitneyLand
Because requiring an extension defeats the purpose of being able to run across
all browsers.

~~~
StreamBright
Sorry I was thinking about something like MathML but more like TeXML without
an extension that you need to install. Might not be feasible to implement
though.

------
cant_kant
What is wrong with using TeX?

MathML is reinventing the wheel. Only, the MathML wheel is square, flat and
breaks when used.

~~~
wodenokoto
pcwalton has a good response for that above:

[https://news.ycombinator.com/item?id=11445369](https://news.ycombinator.com/item?id=11445369)

