
Symbolic mathematics finally yields to neural networks - Zuider
https://www.quantamagazine.org/symbolic-mathematics-finally-yields-to-neural-networks-20200520/
======
todd8
About 50 years ago I was fascinated to see a program that could do symbolic
integration. Now, programs like Mathematica can do much harder, more complex
integrals.

Professor Patrick Winston pointed out that most of the AI programs we were
going to look at in his class (chess, natural language, 3D vision in block
world, etc.) ended up, like integration, as simple code plus a database of
facts.

Back then, there was general optimism about the future of AI. No one
anticipated how slow progress would be while the capabilities of the hardware
grew a million-fold.

The work referenced in the article is interesting. It appears to be another
small but difficult step in advancing AI, like almost every other advance in
the field, and I admire the work done by those working in it.

The problems of solving simple differential equations and symbolic integration
at the first-year-calculus level are not really advanced math. Humans solve
these problems with a relatively small bag of tricks that transform a symbolic
integral into a simpler form. A program can do the same thing with an even
more detailed database of transforms that can be attempted at each point in
the search tree until a simple solution is reached.
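
To make that concrete, here is a minimal sketch of such a transform-table
search (SymPy is used only for expression plumbing; the tiny rule set is
purely illustrative):

    import sympy as sp

    x = sp.Symbol("x")

    # "Database of facts": forms we can integrate directly.
    TABLE = {x: x**2 / 2, sp.cos(x): sp.sin(x), sp.exp(x): sp.exp(x)}

    def integrate_by_rules(f):
        if f in TABLE:                     # base case: table lookup
            return TABLE[f]
        if f.is_Add:                       # transform: split sums
            parts = [integrate_by_rules(t) for t in f.args]
            if all(p is not None for p in parts):
                return sp.Add(*parts)
        coeff, rest = f.as_coeff_Mul()     # transform: pull out constants
        if coeff != 1:
            inner = integrate_by_rules(rest)
            if inner is not None:
                return coeff * inner
        return None                        # no rule applies: give up

    print(integrate_by_rules(3*sp.cos(x) + x))   # 3*sin(x) + x**2/2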

The article claims that the new program can solve difficult integrals. This is
interesting because hard-to-solve integrals are often associated with real
physical phenomena. See, for example, the triple integrals of W. F. van Peype,
which arose while he was studying magnetism in different materials. These
relatively plain-looking definite integrals stumped some of the world's most
famous mathematicians. See [1] and/or [2] for their interesting history.

[1] Paul J. Nahin, _Inside Interesting Integrals_, Springer, 2015, Section
6.5, The Watson/van Peype Triple Integrals.

[2] I.J. Zucker, 70+ Years of the Watson Integrals,
[http://www.inp.nsk.su/~silagadz/Watson_Integral.pdf](http://www.inp.nsk.su/~silagadz/Watson_Integral.pdf)

~~~
amw-zero
Honest question: are you fearful of the moral implications of AI? Whenever I
hear someone who is fascinated by AI and thinks of it only as an intellectual
pursuit, I’m curious whether they are thinking at all about the consequences
of powerful AI.

From where I sit, the bad outweighs the good.

~~~
fao_
These types of worries always strike me as worrying about keyhole surgery
going wrong before we are capable of making scalpels, anaesthetic, or even
video cameras -- or hell, before we even know what a tendon is or what the
purpose of blood is. Or worrying about the Challenger explosion when we can't
even make gunpowder. Or worrying about the logistics of flying cars in three
dimensions, and about traffic crashes, before we are even able to build a
steam engine to drive a train.

We are _so far_ away from GAI at the moment that I don't for one second
_actually worry_ about the moral implications of General Artificial
Intelligence.

I don't see the point in worrying about something when we know literally
_nothing_ about that thing and barely have a path to making it. (Setting aside
the very probable scenario that we will be able to simulate a brain, but not
at any practical speed -- see the three-body problem and the challenges of
simulating _literally any other_ physical system.) It's very likely that by
the time we are capable of making GAI, there will be half a dozen problems we
_do_ need to worry about that we _cannot foresee_. There will also be half a
dozen limitations that make our current worries _essentially worthless_. It's
the same with all new technology.

It's also interesting that people who tend to worry about GAI never worry
about _current levels of AI_, especially in a military context. They seem
entirely unconcerned about literally _crappy_, half-baked neural networks
being deployed for use in drones. They seem entirely unconcerned about the
lack of the proper dataset balancing and curation needed to ensure that
_current AI models do not have racial bias_ (or, indeed, _other_ types of
bias).

Just last year I saw a Twitter post about a startup that was re-creating
_literal phrenology_, using AI to try to profile whether people were criminals
based on _facial shape_. The typical LessWrong / MIRI folks never seem to be
worried about that; no, they spend their time in fear of Roko's Basilisk and
other currently-impossible scenarios. They literally _purged_ posts, threads,
and comments that made _any mention_ of it, out of utter and complete fear
that in the far-flung future a very bad simulation of them would be tortured
for their current actions (unless their brains are cryogenically frozen, I
guess, though brain structure would very likely degrade over such an immense
timespan anyway) by a _good AI_ that had apparently gone so insane it thought
that torturing low-fidelity simulations of people in the future could affect
the past and cause it to be created faster.

Speculating about the future can be a positive thing, but I don't see how this
is at all useful or healthy.

~~~
amw-zero
You can’t imagine any reason to worry? At all?

Worrying about the future of surgical technology is very different. The end
goal of surgery is to save a life or improve the quality of life, and it
involves restoring a single person back to working order.

The end goal of AI is to _think_. The upper bound on that is horrifying. Once
something can think it can build. Once something can build it can multiply.
The upper bound on AI is replacing the human species.

I’m not saying I’m nervous about this happening next year. I know how terribly
inept we are at true GAI. I’m thinking purely abstractly, and in that light I
think we should be more serious about ground rules for AI.

~~~
fao_
Can you re-read my post more closely and actually critique it? You chose one
of my points, arguably the weakest (partly because it's an analogy --
analogies are mostly for flavour; they don't make a good argument, but they
help you appreciate where I am coming from), and ignored the stronger
criticisms I posted after that.

------
peterlk
This is one of the most exciting things I've seen come out of AI research for
my own interests. The way they represent the mathematical equations is called
an abstract syntax tree (maybe they mentioned this, but I was excited-
skimming), and it is also how computer code is represented. It also happens
that there is _a lot_ more computer code lying around in computer-readable
format than high school and college math problems. With this, you get:
metamorphic code, better optimizers, more abstract programming languages
(maybe), and probably lots of other steps forward.
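
For a concrete sense of that shared representation, Python's built-in ast
module produces the same kind of tree whether you read the input as math or as
code (the paper's own equation encoding differs in its details):

    import ast

    # "3 * x**2 + 1" parses into a tree of operators over operands.
    tree = ast.parse("3 * x**2 + 1", mode="eval")
    print(ast.dump(tree, indent=2))
    # -> BinOp(Add) whose left child is BinOp(Mult) over Constant(3)
    #    and BinOp(Pow), and whose right child is Constant(1)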

If anyone is working on AI+programming languages, please send me a message.
I'd love to work with others on this problem.

~~~
Jugglerofworlds
I'm planning on applying for PhD programs this fall to work in this area.
There are only a few places in the world right now that I know of working on
these types of problems. They are:

* Martin Vechev, ETH Zurich

* Dawn Song, University of California Berkeley

* Eran Yahav, Technion

* Miltiadis Allamanis, Microsoft Research Cambridge

If anyone knows other advisors looking for graduate students in this area,
please let me know. Due to personal circumstances I most likely cannot apply
to ETH Zurich or Technion (I don't speak Hebrew anyway), which leaves me with
only one potential advisor in a program that I really want.

There is also the Python-writing model that OpenAI showed recently at the
Microsoft Build conference, so maybe there is some interest growing at other
places as well.

I was also recently working on a deep learning decompiler but was unable to
get my transformer model to learn well enough to actually decompile x64
assembly. I have the source code for the entire Linux kernel as training data,
so it's not an issue with quantity. If anyone is interested in helping out
with this project, please let me know in a comment.

~~~
p1esk
_I have the source code for the entire Linux kernel as training data, so it's
not an issue with quantity_

The Linux kernel is only ~30M LOC. That's a really small dataset. For
comparison, the Reddit-based dataset for GPT-2 is 100 times larger. Try using
all the C code posted on GitHub.

 _decompile x64 assembly_

You can't "decompile" assembly. Either you decompile machine code, or you
disassemble it into assembly code. The latter is easier than the former, so if
you're trying to decompile executables, then perhaps you should train two
models: one to convert machine code to assembly, and the other to convert
assembly to C. Assembly code produced by an optimizing compiler might differ
significantly from assembly code that closely corresponds to C code.

~~~
tsimionescu
> perhaps you should train two models: one to convert machine code to
> assembly, and the other to convert assembly to C.

Is the step of going from machine code to gcc-produced assembly not trivial?
Is gcc actually producing assembly code that an assembler needs to do more
with than convert to the corresponding opcodes?
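
For instance, the Capstone bindings recover assembly from raw bytes
deterministically (library choice is mine, purely to illustrate):

    # Raw x86-64 bytes -> assembly is a table-driven decode, no
    # learning needed (bytes: push rbp; mov rbp, rsp; pop rbp; ret).
    from capstone import Cs, CS_ARCH_X86, CS_MODE_64

    code = b"\x55\x48\x89\xe5\x5d\xc3"
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    for insn in md.disasm(code, 0x1000):
        print(f"{insn.address:#x}: {insn.mnemonic} {insn.op_str}")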

~~~
p1esk
There are two kinds of assembly: 1. assembly that corresponds to optimized
machine code, and 2. assembly that closely corresponds to the original C code.
As I said, these two assembly versions might look very different depending on
optimizations performed by the compiler. You can reduce the difficulty of
learning the conversion from machine code to assembly at the expense of
increasing the difficulty of learning the conversion from assembly to C code
(and vice versa).

------
_0ffh
To me, the article somewhat misses the point of what's interesting here. Using
ASTs to represent equations, or even whole programs, has plenty of precedents
in ML/AI. I'd have liked to know exactly how they translate these trees into a
representation suitable for an ANN. Fortunately, the paper seems to be easy to
find and access (it's [1], I guess).

[1] [https://arxiv.org/abs/1912.01412](https://arxiv.org/abs/1912.01412)

~~~
orange3xchicken
It looks like they go from tree -> sequence via prefix notation. I'm curious
why Lample decided on this seq2seq approach when it seems that there might be
models which could be more naturally applied to the tree structure directly
[1, 2]

[1] [https://arxiv.org/abs/1902.07282](https://arxiv.org/abs/1902.07282) (an
AST translation system)

[2] [https://arxiv.org/abs/1609.02907](https://arxiv.org/abs/1609.02907)
(GCNN)
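
A minimal sketch of that tree-to-sequence step (the tuple encoding is my own;
the paper's tokenizer differs in its details):

    def to_prefix(node):
        # Flatten an expression tree into prefix (Polish) tokens;
        # the tree is recoverable without parentheses.
        if isinstance(node, tuple):        # (operator, child, ...)
            op, *children = node
            tokens = [op]
            for child in children:
                tokens.extend(to_prefix(child))
            return tokens
        return [str(node)]                 # leaf: variable or constant

    # 3*x**2 + cos(x)
    expr = ("+", ("*", "3", ("pow", "x", "2")), ("cos", "x"))
    print(to_prefix(expr))
    # ['+', '*', '3', 'pow', 'x', '2', 'cos', 'x']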

~~~
shmageggy
For the same reason people use huge transformers for sentences in natural
language (which are also tree-structured): they scale really well. If you have
enough data, huge transformers have huge capacity. If you notice, this paper
is entirely about how to cleverly generate a massive dataset. There is no
novelty in the model -- they just use a standard approach described in two
paragraphs.

------
globular-toast
Symbolic mathematics (or computer algebra, as it is more commonly known) was
one of the original driving forces for AI. Differentiation turned out to be
quite easy once things like ASTs and other data structures were developed to
represent polynomials and other elementary functions. Integration is way more
difficult, and while AI research made great inroads, the problem was
eventually solved by algebraic methods. But the full algorithm (the Risch
algorithm) is about 100 pages long and has never been fully implemented. The
Axiom computer algebra system is the closest, AFAIK, but the premature loss of
Manuel Bronstein set it back a bit (that system is fascinating as well; it's a
literate program in Lisp).
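
To illustrate how little machinery differentiation needs once expressions are
trees, here is a toy recursive differentiator (the tuple encoding is mine;
sums and products only):

    # d/dx over tuple-encoded ASTs: numbers, the symbol "x",
    # ("+", a, b), and ("*", a, b).
    def diff(e):
        if e == "x":
            return 1
        if isinstance(e, (int, float)):
            return 0
        op, a, b = e
        if op == "+":                      # sum rule
            return ("+", diff(a), diff(b))
        if op == "*":                      # product rule
            return ("+", ("*", diff(a), b), ("*", a, diff(b)))
        raise ValueError("unknown operator: " + op)

    print(diff(("*", "x", "x")))
    # ('+', ('*', 1, 'x'), ('*', 'x', 1)), i.e. 2x before simplification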

The AI approaches always had one great problem: if they can't find an integral
you still have no idea whether an integral exists or not. The Risch algorithm,
on the other hand, can tell you for sure if an (elementary) integral doesn't
exist. Axiom is fully capable of saying "no", but can't always tell you what
the integral is if it does exist.

Using an AST to represent expressions isn't novel, by the way. I implemented
such a system as an undergrad computer science student (I also implemented
complete integration of rational functions).

------
augustt
Discussion about the paper a while back:
[https://news.ycombinator.com/item?id=21084748](https://news.ycombinator.com/item?id=21084748).
Conclusion seemed to be that the comparison against Mathematica was unfair
since Mathematica's execution time was capped at 30s.

~~~
hyperbovine
Interesting, since in my experience Mathematica's execution time is also
lower-bounded by 30s.

------
microtherion
Here's a somewhat more technical critique of the approach (and of the
evaluation cases the authors used):
[https://arxiv.org/pdf/1912.05752.pdf](https://arxiv.org/pdf/1912.05752.pdf)

~~~
newpycai
> It is important to emphasize that the construction of LC is entirely
> dependent on the pre-existing symbolic processors developed over the last 50
> years by experts in symbolic mathematics. Moreover, as things now stand,
> extending LC to fill in some of its gaps (e.g. the simplification problems
> described in section 3) would make it even less of a stand-alone system and
> more dependent on conventional symbolic processors. There is no reason
> whatever to suppose that NN-based systems will supersede symbolic
> mathematics systems any time in the foreseeable future.

That's the gem of the review.

------
neatze
From the article: “You need to be confident that it’s going to work all the
time, and not just on some chosen problems,” says mathematician Frédéric
Gibou.

If I understand correctly, mathematical solutions can be verified, while
neural network solutions would be very hard, if not impossible, to verify in
reasonable time.

~~~
jfkebwjsbx
It depends on the problem.

For integration, you can just differentiate the result and check.

For infinitely many other problems... verification is way harder.

~~~
empath75
Is it usually the case that derivatives are easier to compute than integrals?

~~~
danbruc
The general algorithm for calculating integrals [1] is rather complex and I
guess not suitable for humans so that calculating integrals sometimes looks
much more like a black art then calculating derivatives does. On the other
hand one could argue that there are algorithms for doing both and so there is
no real difference.

[1]
[https://en.wikipedia.org/wiki/Risch_algorithm](https://en.wikipedia.org/wiki/Risch_algorithm)

------
libeclipse
> The Facebook researchers compared their method to only a few of
> Mathematica’s functions —“integrate” for integrals and “DSolve” for
> differential equations — but Mathematica users can access hundreds of other
> solving tools.

> [...] it only included equations with one variable, and only those based on
> elementary functions. “It was a thin slice of possible expressions,”

> The neural net wasn’t tested on messier functions often used in physics and
> finance, like error functions or Bessel functions. (The Facebook group said
> it could be, in future versions, with very simple modifications.)

> Other critics have noted that the Facebook group’s neural net doesn’t really
> understand the math; it’s more of an exceptional guesser.

> Still, they agree that the new approach will prove useful.

~~~
uoaei
> exceptional guesser

Considering that neural networks inherently maximize probabilities and fit
statistical descriptions of data, this should come as no surprise. This work
has not dissolved the dichotomy between rules-based and statistical methods,
but rather transmuted the syntax of rules-based expressions into a
representation that can be exploited by statistical machines in a way that
makes "guessing" more fruitful.

There are some examples near the end of the paper showing how the authors take
an initially intractable expression and simplify it with their approach so
that Mathematica can actually perform the integral for them. It seems much
more appropriate to market this method as a preprocessor that reduces massive
expressions to a more chewable size.

~~~
sdenton4
> exceptional guesser

This IMO describes how mathematics itself moves forward... A mathematician is
an extremely well-trained 'guesser' who is also able to sink a lot of time
into formal verification.

The process is essentially: a) find an interesting conjecture that you have a
strong guess is true; b) check for obvious (or less obvious) counterexamples
or conflicting theorems; c) prove the thing is true.

A large part of the art of being a working mathematician is in part (a): you
need to make a really good guess. An ideal conjecture is correct AND provable
AND leads to other interesting results, or says interesting things about
bigger problems.

So what happens when we apply really good versions of current AI to this area?
Picking out an 'interesting' conjecture is still Strong-AI-Complete: it
requires lots of domain knowledge, and an understanding of what this
particular conjecture would 'unlock.' But we could perhaps come up with good
'guessers' which quickly tell us whether a given idea might work out, perhaps
saving a bunch of effort. Perhaps we could even get to the point of generating
a proposed proof which can be fed to an automated proof checking system,
allowing for inspection and modification by the human in the loop.

------
KKKKkkkk1
So what's the earliest known AI solver for calculus textbook problems? I bet
it goes all the way back to the 1950s.

This quote seems to be massively misleading:

 _But it’s clear that the team has answered the decades-old question — can AI
do symbolic math? — in the affirmative. “Their models are well established.
The algorithms are well established. They postulate the problem in a clever
way,” said Wojciech Zaremba, co-founder of the AI research group OpenAI.

“They did succeed in coming up with neural networks that could solve problems
that were beyond the scope of the rule-following machine system,” McClelland
said. “Which is very exciting.”_

~~~
rst
Grace Hopper was experimenting with symbolic differentiation of functions in
her A-2 compiler in the 1950s, and she may well not have been the first. My
reference for this is her paper in the proceedings of a 1957 British
conference on "Mechanisation of Thought Processes" -- if you set up an
account at archive.org, they may let you page through it here:
[https://archive.org/details/mechanisationoft02nati/page/n9/m...](https://archive.org/details/mechanisationoft02nati/page/n9/mode/2up)

~~~
mkl
Not AI, though (or almost every computer program would be "AI").
Differentiation can be done with simple deterministic rule following.

~~~
rst
Earliest reference I'm aware of for integration is James Slagle's MIT Ph.D.
thesis work, from 1961.
[https://dspace.mit.edu/handle/1721.1/11997](https://dspace.mit.edu/handle/1721.1/11997)

------
brianberns
It seems that the neural net just spits out an answer rather than deriving it
step by step like humans do. That's interesting, but it would still get you an
F on a real calculus test.

~~~
_bxg1
I think the idea with this kind of thing is that ML can make pretty-good
guesses really quickly, and then a formalized process can verify them (usually
much more quickly than it could derive them). This hybrid model fits lots of
different kinds of problems.

------
nano_o
You might find the Sledgehammer tool for Isabelle quite interesting. It has
been using machine learning techniques to find proofs automatically since at
least 2013. It uses previous proofs to learn how to select facts to send to
off-the-shelf automated provers in order to discharge a goal. See e.g.
[http://isabelle.in.tum.de/~blanchet/mash2.pdf](http://isabelle.in.tum.de/~blanchet/mash2.pdf)

One issue I have with it, and one that automated proof tools based on ML are
going to have to solve, is that it's quite unpredictable. Even if it finds
stunning proofs from time to time, it is hard to use an unpredictable tool
efficiently.

------
samuel2
How about using [http://us.metamath.org/](http://us.metamath.org/) as a DB
for math theorems/definitions and doing some heavy data mining there?

~~~
raphlinus
There's some work on this already, including Holophrasm and work by OpenAI on
proof shortening. These efforts are linked from the Metamath wiki:

[https://github.com/metamath/set.mm/wiki/Automated-proving](https://github.com/metamath/set.mm/wiki/Automated-proving)

~~~
samuel2
very cool, thanks for sharing

~~~
dwheeler
OpenAI has managed to derive a number of Metamath proofs, some of which are
smaller than the human-created proofs, using machine learning.

Here is a gource visualization of Metamath proofs over time in the set.mm
database:
[https://m.youtube.com/watch?v=LVGSeDjWzUo](https://m.youtube.com/watch?v=LVGSeDjWzUo)

Note that near the end, one of the contributors is OpenAI, which is not a
human contributor.

------
zengid
This looks really interesting! Although I'm not sure this was a first: I know
of another approach tried by Hinton and Sutskever in their paper "Using
matrices to model symbolic relationships" [0]. I don't see a date, but I
remember Hinton mentioning it in a talk from a few years ago.

[0]
[https://www.cs.toronto.edu/~hinton/absps/ilyamre.pdf](https://www.cs.toronto.edu/~hinton/absps/ilyamre.pdf)

------
nullc
Just need to build a tool that predicts the next version of a program from an
earlier version and suggests improvements (ideally bug fixes).

Then you have [http://www.scp-wiki.net/scp-914](http://www.scp-wiki.net/scp-914)
for programs.

------
paulus_magnus2
It would be really interesting to see code simplification via neural network.

While working in enterprise, I see plenty of code that, after some massaging,
collapses into a form that is much simpler for the programmer and equivalent
for the computer.


------
clircle
What would a confidence interval for this type of prediction look like?

~~~
bglazer
A confidence interval is a sort of meaningless notion in this case. You can
just take the derivative of the algorithm's answer and see whether you get
back an expression that is equivalent to the input. If you do, then the
answer is correct.
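
For example, with SymPy (purely illustrative; the integrand and the guessed
antiderivative here are made up):

    import sympy as sp

    x = sp.Symbol("x")
    integrand = x * sp.cos(x)
    candidate = x * sp.sin(x) + sp.cos(x)   # the model's guessed answer

    # Differentiate the guess; if the residual simplifies to zero, the
    # guess is a valid antiderivative of the integrand.
    residual = sp.simplify(sp.diff(candidate, x) - integrand)
    print(residual == 0)   # True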

~~~
clircle
I don't believe a model (neural network or otherwise) is capable of knowing
that its predictions are correct.

------
enricozb
Since ML outputs are usually "fuzzy", how does one check that the output is
actually correct when using a technique like this in the field?

