
Image to LaTeX - cvphelps
https://www.wandb.com/articles/image-to-latex
======
knolan
So this is just like Mathpix?

[https://mathpix.com/](https://mathpix.com/)

I’ve never found typing out LaTeX equations particularly tedious, unless
you’re doing multi-page derivations in which case you’re spending more time
checking for errors.

~~~
3jckd
Yeah, I've been using Mathpix for almost a year now and this seems practically
the same.

Unless they are deliberately avoiding comparison to Mathpix, they are quite
unaware of the competition out there :|

~~~
JustCallMeAl
We were aware of Mathpix and we think it’s an awesome tool! We didn’t mention
it because there aren’t any publicly available performance results for the
model they use, so we weren’t able to compare it to ours in a quantitative way
:)

~~~
vinni2
But couldn't you evaluate Mathpix's performance yourself, using your own
benchmark?

~~~
JustCallMeAl
The evaluation set involves ~20k samples. They only let you do like 100 for
free. Also, I’m not sure there’s an API that would let me do it automatically
at such a large scale.

------
rcfox
The stated use case is to allow scientists to quickly add equations from other
sources into their papers. However, if you look at output 2, there are some
subtle mistakes in the result. It seems like whatever time is saved on
transcribing equations would be lost on proof reading them.

It does seem like this could have value as a LaTeX teaching tool though, for
when you need to figure out how to write the markup to do a certain technique.

~~~
SiempreViernes
Yeah, for my domain I just download the LaTeX source from arXiv and take
equations from there, if there is any point at all to not just transcribing.

------
choeger
Unfortunately, typesetting the equations directly is often the wrong way. It
might sometimes work for lecture notes, but when you develop your own
scientific work, your notation is often still fluid. In such a case it is
sensible to express your mathematics with a domain-specific notation. Then you
can later a) change your notation and b) understand your source code.

As an example, instead of writing \\(G = (V,E)\\) directly you might want to
write \\(\graph{} = (\nodes{},\edges{})\\)
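
A minimal sketch of what that could look like, assuming plain `\newcommand`
definitions. The macro names are the ones from the example above; the
single-letter expansions are only placeholders and can be changed later in one
place:

```latex
\documentclass{article}

% Semantic macros: the source says what each symbol means, and the
% preamble decides how it is typeset. The expansions are placeholders.
\newcommand{\graph}{G}
\newcommand{\nodes}{V}
\newcommand{\edges}{E}

\begin{document}
Let \(\graph{} = (\nodes{}, \edges{})\) be a graph.
\end{document}
```

Switching the notation later (say, to a calligraphic G) would then mean
editing the one `\newcommand{\graph}` line rather than every equation.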

~~~
JadeNB
> In such a case it is sensible to express your mathematics with a domain-
> specific notation.

> As an example, instead of writing \\(G = (V,E)\\) directly you might want to
> write \\(\graph{} = (\nodes{},\edges{})\\)

This can be great for personal use, but it can be very bad for communicating
with others; (a) you won't be able to write a paper with a co-author unless
you agree on your domain-specific abbreviations in advance (and, if you think
that's easy, then I say from experience that it isn't), and (b) most journals
prefer a minimal number of custom commands, and some forbid custom commands
entirely. Even if journals permit custom commands, the more you define, the
more likely you are to conflict in some baroque way with their own style
files.

EDIT: Also, even if you do this only for personal use, it can be very hard to
go _back_ from it (just search for TeX Stack Exchange questions on
macro-expanders; there are plenty, but all of them that I know of assume some
well-formedness constraints that make sense to their authors but will almost
inevitably conflict with one's own ad hoc conventions).
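
To make the "well-formedness constraints" point concrete, here is a
deliberately naive expander, as a sketch only. The function name
`expand_simple_macros` is made up for illustration, and it assumes every macro
is defined as `\newcommand{\name}{body}` with no arguments and no nested
braces, which is exactly the kind of assumption that breaks on real sources:

```python
import re

# Matches only the simplest definition form: \newcommand{\name}{body},
# with no arguments and no braces inside the body (an assumption).
DEF_RE = re.compile(r"\\newcommand\{\\([A-Za-z]+)\}\{([^{}]*)\}")


def expand_simple_macros(source: str) -> str:
    """Expand zero-argument macros in a single pass and drop their definitions."""
    macros = dict(DEF_RE.findall(source))
    body = DEF_RE.sub("", source)
    for name, replacement in macros.items():
        # \name, not followed by another letter, optionally followed by {}.
        use_re = re.compile(r"\\" + name + r"(?![A-Za-z])(?:\{\})?")
        # A function replacement avoids backslash escapes being interpreted.
        body = use_re.sub(lambda _match, text=replacement: text, body)
    return body.strip()


if __name__ == "__main__":
    example = "\n".join([
        r"\newcommand{\graph}{G}",
        r"\newcommand{\nodes}{V}",
        r"\newcommand{\edges}{E}",
        r"\(\graph{} = (\nodes{},\edges{})\)",
    ])
    print(expand_simple_macros(example))
```

Anything beyond this (macros with arguments, `\def`, `\renewcommand`, macros
whose bodies use other macros, definitions pulled in from a style file) is
ignored or only partly handled, so the output cannot be trusted for such
sources.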

~~~
Ar-Curunir
Eh? I use tons of macros in my papers with 5+ coauthors all the time; it's
preferable to random per-person notation that is inconsistent over the paper.

In fact it's the only way to scale a paper across tens of pages and over
multiple people

------
nickthegreek
If this works on handwritten equations it would be great. I used to work at an
online school and getting teachers to be able to write LaTeX was next to
impossible.

~~~
sansnomme
There are quite a few educational apps that do this. E.g. look up
socratic.org's GitHub.

~~~
murkle
Sorry, I don't see it:

[https://github.com/socraticorg](https://github.com/socraticorg)

~~~
nicodjimenez
Socratic is powered by MathpixOCR :)
[https://mathpix.com/ocr](https://mathpix.com/ocr)

------
jdc
Project:
[http://lstm.seas.harvard.edu/latex](http://lstm.seas.harvard.edu/latex)

Source code:
[https://github.com/harvardnlp/im2markup](https://github.com/harvardnlp/im2markup)

------
lol768
Reminds me of Mathpix Snap: [https://mathpix.com/](https://mathpix.com/)

~~~
JustCallMeAl
Yep! In fact we chose to work on this project after seeing (and using)
Mathpix. We thought it solves a very interesting problem using deep learning
:)

------
sansnomme
Another way would be to use the neural network to do segmentation, run the
individual results through DeTeXify and finally reassemble. This would
probably yield higher accuracy (DeTeXify works with handwritten input too).
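
A rough sketch of the reassembly step, assuming the segmentation network has
already produced one bounding box per symbol. `Box`, `reassemble` and the
`classify` callback are hypothetical names; `classify` stands in for whatever
per-symbol recognizer is used and is not DeTeXify's actual interface:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Bounding box of one segmented symbol, in pixels (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def reassemble(boxes: List[Box], classify: Callable[[Box], str]) -> str:
    """Classify each segmented symbol and join the results left to right."""
    ordered = sorted(boxes, key=lambda box: box.x)
    return " ".join(classify(box) for box in ordered)


if __name__ == "__main__":
    # Stub classifier: looks up a label by the box's x position.
    labels = {0: "x", 15: "+", 30: "y"}
    boxes = [Box(30, 0, 10, 10), Box(0, 0, 10, 10), Box(15, 0, 10, 10)]
    print(reassemble(boxes, lambda box: labels[box.x]))
```

This only covers a single left-to-right line. Superscripts, fractions and
other two-dimensional layout would need the vertical positions too, which is
probably where most of the difficulty lies.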

~~~
JustCallMeAl
Interesting idea!

------
gumby
I like this, but I notice that even the examples in the paper are buggy.

But it could be like Google Translate: does a terrible job but makes a handy
starting point you can then fix up to be reasonable.

------
peignoir
This is awesome! I was looking for such a tool. Indeed handwritten seems like
the next logical step.

~~~
golem14
Or scanning older math-y books that have no LaTeX equivalent.

------
skrebbel
This is only a problem that needs solving because LaTeX formula syntax is
pretty horrible.

I'd like to underline that, for all its warts, MS Word has an _excellent_
formula editor (which even supports common LaTeX commands as autocorrect). I
know that it's a radical idea, but if the goal is to let scientists spend less
time writing formulas, maybe we should consider letting them use modern
software?

ps. MS Word can copy formulas into the clipboard as LaTeX expressions, should
your journal require LaTeX papers but you don't want to edit your formulas like
a caveman.

~~~
et2o
It's extremely tedious to type formulas in MS Word... an insane amount of
clicking around the toolbars and then clicking in boxes. In my experience
LaTeX is far easier.

~~~
whatshisface
Further, this tool is designed to turn screenshots taken from papers into
LaTeX code. It's faster than _any_ editor because the user does not have to
enter the equation at all.

