UnicodeMath – A Nearly Plain-Text Encoding of Mathematics (2016) [pdf] (unicode.org)
101 points by jcagalawan 4 months ago | 56 comments

This is actually the method used by MS Word as far as I know, or a reduced dialect of it. It's a mix of convenient and inconvenient, can't make my mind up about it.

All I know is that LaTeX is much, much easier to manage than this jumbled mess...

It's the same one that is implemented in MS Word (and PowerPoint and OneNote). This standard was written by an engineer who is on the Math in Office team and maintains a blog [1]. It's the best WYSIWYG equation editor I've used so far. LaTeX has its place, but it's hard to convince non-coders to use it, and I'm glad this editor exists. I stumbled onto this specification while trying to format a constrained optimization problem, found the equation array format to be fairly straightforward (similar to the bmatrix environment in LaTeX), and thought it was neat that Unicode could represent so much math.

[1] https://blogs.msdn.microsoft.com/murrays/

It should also be noted that Murray Sargent didn't just do whatever he thought of. He had built at least one math layout engine before and consulted with Donald Knuth regarding TeX and some of the choices that were made back then. A lot of the differences between this and TeX have a fairly sensible reason.

I'm able to use both LaTeX and this, but the main points that make this more convenient to me are that it relies less on characters that sit in awkward locations in many keyboard layouts (such as {}) and that the equivalent formula is often much shorter due to somewhat more advanced tokenizing (specific to math input, of course, as the format doesn't have the same constraints as a general-purpose programming and markup language). For example, x^12 works for x to the power of 12 instead of resulting in x^1 2.
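
A minimal Python sketch of the tokenizing difference described above. The function names and both rules are simplifications made up for illustration, not taken from the UnicodeMath specification or from TeX itself: the TeX-like rule raises only the single character after `^`, while the UnicodeMath-like rule raises the whole run of digits.

```python
def split_exponent_tex_like(src: str) -> tuple[str, str, str]:
    """Split 'x^12' the way an unbraced TeX superscript behaves:
    only the single character right after '^' is raised."""
    base, _, rest = src.partition("^")
    return base, rest[:1], rest[1:]


def split_exponent_unicodemath_like(src: str) -> tuple[str, str, str]:
    """Split 'x^12' with a greedy rule: the whole run of digits after '^'
    is raised. This is a simplification of the real grammar."""
    base, _, rest = src.partition("^")
    end = 0
    while end < len(rest) and rest[end].isdigit():
        end += 1
    if end == 0:
        # No digits: fall back to raising a single character, if there is one.
        end = min(1, len(rest))
    return base, rest[:end], rest[end:]


# Each result is (base, exponent, trailing text).
print(split_exponent_tex_like("x^12"))          # ('x', '1', '2')
print(split_exponent_unicodemath_like("x^12"))  # ('x', '12', '')
```

In LaTeX you would write x^{12} to get the second reading, which is where the extra braces come from.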

The author of this paper, Murray Sargent, has a blog at https://blogs.msdn.microsoft.com/murrays/

> All I know is that LaTeX is much, much easier to manage than this jumbled mess...

LaTeX is beautiful and powerful, but mixed with unicode (in the right doses) it becomes astonishingly more powerful. For example, you can keep the block constructors {} _ and ^, but use unicode symbols such as α, ∂, ∫, instead of \alpha, \partial, \int

But you can't type α, ∂, or ∫ in a standard keyboard.

I feel like this is the sweet spot for the editor to do some of the heavy lifting, for example being able to type "\alpha" and have the editor automatically transform it into the symbol α. I write a lot of LaTeX, and I would absolutely love an editor which could just transparently deal with all of the underlying mathematics markup so I would never have to see it again, and just directly edit the damn equations. Serialising and unserialising mathematics markup mentally is basically the antithesis of mathematical notation.
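
A rough Python sketch of the kind of substitution an editor could do behind the scenes. The macro table here is a made-up, deliberately tiny assumption (only the three symbols mentioned in this thread), and `replace_macros` is a hypothetical helper, not part of any real editor.

```python
import re

# Hypothetical, deliberately tiny macro table; a real editor would need far more.
MACROS = {"alpha": "α", "partial": "∂", "int": "∫"}

# A backslash followed by a run of ASCII letters, e.g. \alpha or \frac.
_MACRO_RE = re.compile(r"\\([A-Za-z]+)")


def replace_macros(text: str) -> str:
    """Replace known \\name macros with a Unicode symbol; leave the rest alone."""
    def substitute(match: re.Match) -> str:
        # Unknown macros (e.g. \frac) are returned unchanged.
        return MACROS.get(match.group(1), match.group(0))

    return _MACRO_RE.sub(substitute, text)


print(replace_macros(r"\int \alpha \, \partial x + \frac{1}{2}"))
# ∫ α \, ∂ x + \frac{1}{2}
```

Structural macros such as \frac are left untouched, which matches the suggestion earlier in the thread of keeping the block constructors and only swapping the symbols.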

In case you haven't already heard of it, LyX [1] does exactly that and it's phenomenal. As an added bonus, when you move your cursor out of the equation it goes from a graphical almost-right visualisation to a proper LaTeX-generated absolutely-right preview, and manages other related LaTeX things like heading levels with a live clickable outline.

The big disadvantage of LyX is that it's not a LaTeX editor but a LaTeX generator. So if you want to write your next article or dissertation and can commit to it then it's great, but if you want to import work so far or collaborate with someone not using LyX then it doesn't really work (it has an import function but it's almost unusable).

[1] http://www.lyx.org/

I've actually had good luck round-tripping lyx-latex-lyx repeatedly to collaborate with lyx-phobic co-authors. (As long as they don't mind equations they typed getting auto-formatted, etc.)

The Julia interpreter does this, although you have to press Tab to trigger the transformation.

Emacs already has you covered. Good old C-x RET C-\ TeX will switch your input method to TeX, converting \alpha etc to unicode alpha, etc for most math symbols. Additionally, ^[char] or _[char] will become superscript/subscript unicode chars.

Once you've done that once in the session, you can just hit C-\ to toggle it on and off (C-x RET C-\ [input method] actually lets you select from a wide variety of input methods, and C-\ toggles on and off the most recently selected input method).

There are some editor plugins that do something like this. I know for Vim there is tex-conceal (https://github.com/KeitaNakamura/tex-conceal.vim), which collapses a lot of LaTeX symbols to their Unicode equivalent on lines other than your current line. It's a nice middle-ground, though far from perfect.

I use LaTeX with unicode-math regularly and I use emacs with quail mode to do exactly this.

If you like Atom, there is https://atom.io/packages/latex-completions

Vim digraph feature (ctrl-k) is very convenient for this. Digraphs are two-character mnemonics for symbols. They tend to be easy to remember and they are standardized in RFC 1345 [1].

Greek letters are always a Latin letter followed by an asterisk. Arrows and comparison operators are what one would guess. Mathematical symbols generally make some sense:

    <C-k> a *     α
    <C-k> F *     Φ

    <C-k> = >     ⇒
    <C-k> - >     →
    <C-k> > =     ≥

    <C-k> 0 0     ∞
    <C-k> d P     ∂
    <C-k> I n     ∫
[1]: https://tools.ietf.org/html/rfc1345
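
For anyone who wants the same mnemonics outside Vim, here is a small Python sketch of the table above as a lookup. It contains only the digraphs listed in this comment (RFC 1345 defines many more), and the helper names are made up for illustration.

```python
# Only the digraphs listed in the comment above; RFC 1345 defines many more.
DIGRAPHS = {
    "a*": "α", "F*": "Φ",
    "=>": "⇒", "->": "→", ">=": "≥",
    "00": "∞", "dP": "∂", "In": "∫",
}


def digraph(pair: str) -> str:
    """Return the symbol for a two-character mnemonic, or raise KeyError."""
    return DIGRAPHS[pair]


def is_greek_mnemonic(pair: str) -> bool:
    """The Greek rule from the comment: a Latin letter followed by an asterisk."""
    return (
        len(pair) == 2
        and pair[0].isascii()
        and pair[0].isalpha()
        and pair[1] == "*"
    )


print(digraph("a*"), digraph("=>"), digraph("In"))  # α ⇒ ∫
```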

> But you can't type α, ∂, or ∫ in a standard keyboard.

I have a standard keyboard and I type greek letters easily. ALTGR+a, and so on.

If somebody is not able to configure their keyboard to do this, they are probably not writing LaTeX either.

If memory serves me correctly, there was no way to type those symbols on an Acorn A5000 which is where I did the bulk of my LaTeXing back in the day (1992-95).

(Admittedly, this anecdata is even more useless than usual.)

Unfortunately ANSI (US layout) keyboards make this much more difficult.

In DrRacket, you can press Ctrl+\ to insert a lambda character.

Of course you can. Armed with only a text editor, a craft knife and some tippex, you can have any keyboard layout you like on a standard keyboard.

On a recent mac, you can enable Greek as a second keyboard, and tick the box there to make caps lock switch keyboards.

As long as you don't need caps lock as your escape key...

You can on a Mac. Holding the option key allows typing a whole host of symbols.


This is like asking someone to use Alt codes (https://www.alt-codes.net/) on Windows, it's highly unintuitive and cumbersome.

They aren’t terribly intuitive, but they are much easier than alt-codes. If you’re a touch typist it doesn’t take long to memorize a reasonable number of symbol positions. I still can’t remember any alt-codes, though, when I’m on a Windows machine.

I know the few I use regularly and the rest are easy to get via ctrl CMD Space. I’ve added various symbol sets to the character viewer and it suits my work flow.

Right, and you can do the same on linux with the compose key. But it isn't very intuitive, discoverable or rememberable.

Badly discoverable? Sure. You can vastly improve the remembering part by picking appropriate mnemonics. The Plan 9 keyboard file is a nice start, I think.

It follows a set of rules so you don't have to remember a large number of sequences but can often guess the right one intuitively.

    ASCII digraphs for mathematical operators give the corresponding operator, e.g., <= yields ≤.
    Greek letters are given by an asterisk followed by a corresponding latin letter, e.g., *d yields δ.


There's a program (mklatinkbd) to convert it to a format usable with X11.

> you can do the same on linux with the compose key. But it isn't very intuitive, discoverable or rememberable.

Well, LaTeX itself is not very discoverable either

It’s pretty good IMHO. Greek symbols are all obvious. \Omega and \omega for upper and lowercase omega etc. Super and subscripts with ^{} and _{} also work with \sum and \int for indices and intervals.

Software like Latexit and Mathpix also help a lot.

You can do so relatively easily with Vim digraphs. For example, the alpha symbol can be obtained by typing

Ctrl-k a Shift-8

What I would like is a set of characters and a font that I could use to enter immediately readable math formulae into my text editor, so that I can take quick notes at a conference, or whenever.

I can already do it, to some extent, for instance, this is the summation equation from page 5 in the pdf above:

             ₙ   ⎛n⎞
  (a + b)ⁿ = ∑   ⎜ ⎟ aᵏbⁿ⁻ᵏ
             ᵏ⁼⁰ ⎝k⎠
But, the scale is off (the summation is too small) and the characters are a mix of sub- and super-scripts and modifiers (in other cases, diacriticals), and multiple characters in the case of the long parentheses. I have to go hunting each of those characters down every time I want to use them, because I'm never sure exactly what characters are available. And even when I do, I still have to look them up most of the time, because of course I can't remember the unicode number for Modifier Letter Small k off the top of my head every time I need it (although by now I've learned the ones for super/subscript i and n by heart, mostly)

Worse, there is basically only one free monospaced font that will display all of those characters (DejaVu Sans Mono) and even that doesn't cover all the symbols I might want to use. For instance, I use the "entails" symbol a lot and that renders as a ⊨ in DejaVu Sans Mono, so I have to put it together with what I got: |= or \= for the negative.

Each of the methods discussed in the pdf above (as far as I've read in it, which is not too far) is far from ideal. They're all markups, so they all require at least some mental rendering. Why can't I have a script that lets me write each special symbol as one character, or at worst, a few of them like for the large parens, above?


Edit: And you'll need to set your browser's font to a monospaced font to see the equation above properly :/
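
On having to look up code points every time: Python's standard `unicodedata` module can do the hunting by character name. This is only a sketch; the list of names is my own pick of characters from the equation above, and `describe` is a hypothetical helper.

```python
import unicodedata

# Official Unicode character names for a few symbols from the equation above.
NAMES = [
    "MODIFIER LETTER SMALL K",
    "SUPERSCRIPT LATIN SMALL LETTER N",
    "SUPERSCRIPT LATIN SMALL LETTER I",
    "LATIN SUBSCRIPT SMALL LETTER N",
    "N-ARY SUMMATION",
]


def describe(name: str) -> str:
    """Look a character up by name and format it as 'U+XXXX <char> <name>'."""
    char = unicodedata.lookup(name)
    return f"U+{ord(char):04X} {char} {name}"


for name in NAMES:
    print(describe(name))

# The reverse direction: from a pasted character back to its name.
print(unicodedata.name("\u1d4f"))  # MODIFIER LETTER SMALL K
```

This does not solve the font coverage problem, but it removes the need to remember the numbers.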

> They're all markups, so they all require at least some mental rendering.

Unicode maintains the position that plain text is unsuitable for this sort of thing and markup languages are preferred. That's also why there are no "formatting" modifiers except in cases where the result actually has different semantics that are important to distinguish (cf. RTL and LTR override). Overall I'd say Unicode does a fairly good job keeping too much weirdness out of the standard. Emoji are obviously a much-contended addition, albeit one that already had a history of existence and widespread use in plain text. And while you find all symbols needed for math rendering in Unicode, some of the weirder ones sometimes came from older character encodings and their existence does not mean that full math markup should be part of Unicode.

Eh, I do still wish super/subscript characters in unicode were a bit more thorough. Several times a day I randomly remember the fact that every lowercase letter in the English alphabet except q has a unicode superscript character, and every single time it ruins my day.

My problem is still with the fonts: I can't find one that even has all of (i,j,k,x,y,m,n,p,q,r) as both super- and sub-scripts. I end up either using capitals or just sucking it up and writing X_i^r or something.

>> Unicode maintains the position that plain text is unsuitable for this sort of thing and markup languages are preferred.

Well, I think that's an unreasonable position. The most intuitive way to enter mathematical text with a keyboard is to enter the symbols you want to enter, directly, as characters. I mean, nobody asks me, as a Greek speaker, to enter (hypothetical markup) \{greek_letter_chi} for χ or \{greek_letter_ypsilon} for υ, etc, thank the gods. Why do I have to use markup for the simplest things, like fractions and exponentials?

All symbols for math may be in Unicode, but there is no font supporting all of them. It's not Unicode that's at fault here, of course.

Shameless plug: I am working on something related, a system with nice keybindings which feels sort of all right after 5-10 minutes.

The idea is to combine an efficient onscreen structure with key presses typed by the left hand. So press q,w,e, etcetera to select.

Currently I'm looking at MathML and TeX support. Would anyone who uses math daily be interested in UTF-8 support?

My instinct tells me that this UTF solution seems like a nightmare.



It's possible to input into the OS; leave a comment if you're interested.

If this is merely an input method then MathML is pretty much all you probably need as it can be pasted into or consumed by most applications handling math, including Word. Plus TeX for, well, TeX.

This format is mostly geared towards input as well, in that you have defined automatic replacements as well as a format that can be typed easily and be built up to rendered math during typing. If you have your own completely different input method you wouldn't gain much from this.

I thought the format of the document looked familiar... and then saw "Microsoft Corporation". I've seen a lot of other documents from them with the same formatting, and it always feels a bit weird to me --- something looks a bit off about the (body) font.

Cambria. Personally I don't like it much, although I don't hate it as much as Computer Modern or Times. The main reason here is most likely that Cambria is also Microsoft's primary math typeface and having the body text and math use the same typeface is IMHO preferable.

Constantia pairs fairly well with Cambria as well, though.

> I don't hate it as much as Computer Modern or Times.

What (latex compatible) fonts do you like/recommend then?

In my opinion, it's kind of sad to see how strongly Computer Modern dominates the font choices in latex (and more generally the latex monoculture).

Well, this is personal preference and aesthetics, so I'm not sure others would have to follow. Palatino is nice, though and has a package where it's also the math font (mathpazo). That was as far as I got with fonts for plain old LaTeX.

Otherwise I'm currently using XeLaTeX (if I use LaTeX at all), so I'm not limited to compatible fonts and anything that's OpenType works.

Some previous discussion here: https://news.ycombinator.com/item?id=14687936

(Full disclosure: it's my thread)

dat c++ function on page 36 tho.. omg

It is much more readable than the version using full-length words for variables.

i totally agree, but living in the world of mostly ascii-limited code this looks exotic, weird and refreshing. and cool

> living in the world of mostly ascii-limited code this looks exotic

this is not the world where everybody lives. I live mostly in the world of math books, where all variables are single letters (which is the main difference here, and not the use of unicode). I tend to find code written by non-mathematicians ridiculous and unreadable.

Instead of

      E = m * c^2
they seem to prefer bullshit like

      thing->Energy = thing->Mass() * np.math.pow(lib.constants.speedOfLightInVacuum,2)

Some part of that is bad namespace handling, remove the namespaces and it already looks better:

      thing->Energy = thing->Mass() * pow(speedOfLightInVacuum, 2)
And you can obviously clean it up further by using variables and just write

    auto mass = thing->Mass();
    auto c = speedOfLightInVacuum;

    thing->Energy = mass * pow(c, 2);

    // or thing->Energy = mass * c * c;
    // or thing->Energy = mass * c**2; (for languages that support it)
The issue is that in "E = m c^2", it's obvious from context what everything refers to. In programming languages, that's not necessarily the case. You can have "thing", but also "other_thing" and "array_of_things". And all those things can have many different properties. And those things can be either objects, or pointers to objects.

You're not wrong: it's a definite flaw of many programming styles (and programmers!) that they're "too wordy". But it's also true that mathematics can get by with a bit of ambiguity that humans can handle that computers can't.

> E = m * c^2

The problem then occurs when you work in a multidisciplinary group and 'everybody' wants to use 'p' or 'c' or whatever to mean the super obvious thing it means in their field, and you have to remember what each letter means in 6 different fields, and the 'p' in function f means something completely different from the 'p' in function g.

> the 'p' in function f means something completely different from the 'p' in function g.

And that's OK. The 'x' in mathematics means completely different things depending on the equation. That's what comments are for, to explain the meaning of the local variables in each function.

    // E: energy of the thing
    // m: mass of the thing
    // c: constant representing the speed of light in vacuum
    E = m * c^2
I cannot read multiple-letter variables without unconsciously interpreting them as the product of several single-letter variables. It requires a lot of conscious effort that obscures my understanding of the code.

I suppose you have the same problem when reading this comment too.

no, I do not mistake regular text with mathematical formulas. My problem is that I read code as if it was mathematical formulas.

You might be interested in the Fortress language (RIP), whose input allowed, and was canonically rendered using, mathematical notation.

So, for example, x^y could be written as x superscript y. SUM[i<-1:n, j<-1:p, prime j] a[i] x^j was more elegant as a big sigma with the ranges at the bottom, etc.


One of the best (or worst, depending on your preference) aspects of Fortress syntax was that it allowed juxtaposition as a multiplication operator. Therefore `(n+1)/(n+2)(n+3)` is, as expected, the same as `(n+1)/((n+2)*(n+3))`. The exact "reassociation" procedure for juxtapositions is type-dependent, so that `sin(x)(x)` is `sin(x)` multiplied by `(x)` even though both `sin` and the first `(x)` are followed by other expressions.
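
A quick worked check in Python of why the grouping matters. This does not emulate Fortress in any way; the two function names are made up, and the code simply evaluates both readings of the same expression with exact fractions.

```python
from fractions import Fraction


def juxtaposition_reading(n: int) -> Fraction:
    """(n+1)/(n+2)(n+3) read as in the comment: (n+1)/((n+2)*(n+3))."""
    return Fraction(n + 1, (n + 2) * (n + 3))


def left_to_right_reading(n: int) -> Fraction:
    """The same terms with an explicit '*', grouped left to right as most
    C-like languages would: ((n+1)/(n+2))*(n+3)."""
    return Fraction(n + 1, n + 2) * (n + 3)


n = 1
print(juxtaposition_reading(n))  # 1/6
print(left_to_right_reading(n))  # 8/3
```

For n = 1 the two readings give 2/(3*4) = 1/6 and (2/3)*4 = 8/3, so writing the explicit `*` in an ordinary language silently changes the result.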

From the very little work I've done in Mathematica (which was years ago, and I'm not anywhere close to a "real" mathematician), the mathematical formatting was one of the great perks.

really ? it's much more painful for me

2016? I’m fairly sure I read this (or something very similar) around 2012.
