There may be other subtleties as well. Mathematica works in the complex domain by default, which makes many operations more difficult, but the authors discard expressions which contain complex-valued coefficients as "invalid", which makes me think they're implicitly working in the real domain. Do they restrict Mathematica to the real domain when they invoke it? Perhaps, but they don't say one way or the other. And do they try common tricks like invoking FullSimplify on an expression/equation before attempting to operate on it? I'd like to see more details of their methodology.
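The same domain sensitivity is easy to demonstrate in SymPy (used here as a stand-in, since the paper's experiments use Mathematica): symbols are complex-valued by default, and declaring one real changes what simplifies.

```python
import sympy as sp

# Symbols are complex-valued by default, so sqrt(x**2) cannot be
# collapsed: for complex x it is neither x nor Abs(x).
x = sp.Symbol('x')
assert sp.simplify(sp.sqrt(x**2)) != sp.Abs(x)

# Declaring the symbol real unlocks the simplification.
xr = sp.Symbol('x', real=True)
assert sp.simplify(sp.sqrt(xr**2)) == sp.Abs(xr)
```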
I had the same initial reaction as you, but then I realized that this is still extremely useful. In a ton of examples, only one direction of differentiation/integration is hard while the other direction is easy. You could build a system that attempted to solve it directly, and failing that attempted to guess-and-verify using this approach. My intuition is that such an overall system would be strictly superior to Mathematica's approach as it exists today.
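The guess-and-verify loop is cheap to sketch in SymPy: differentiation is mechanical, so any candidate antiderivative (however it was produced) can be checked symbolically.

```python
import sympy as sp

x = sp.Symbol('x')

def verify_antiderivative(candidate, integrand):
    """Check a guessed antiderivative by differentiating it:
    d/dx candidate - integrand should simplify to zero."""
    return sp.simplify(sp.diff(candidate, x) - integrand) == 0

# Integration is the "hard" direction; differentiation is cheap.
integrand = x * sp.cos(x)
guess = x * sp.sin(x) + sp.cos(x)   # e.g. produced by a neural model
print(verify_antiderivative(guess, integrand))  # True
```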
I suspect what's happening, given the constrained space of problems, is the NN is matching input formats to solution templates. For that space, it would have learned some patterns of what undoes to what. Perhaps there's even something that's captured aspects of the product rule buried in those weights.
I would be curious to see how it does on problems outside the training set format. Or on integrals that require a trick, such as exp(x) * sin(x) or those with nested composition: A/(1+exp(-k*(x-a))). Is it sensitive to input size? Would a smaller but slightly tricky integral like 1/sqrt(x^2 - y^4) trip it?
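For what it's worth, a conventional CAS does handle both of those shapes. A SymPy sketch, checking each answer by differentiating rather than trusting the printed form (I've specialized the logistic to k=1, a=0 to keep it quick):

```python
import sympy as sp

x = sp.Symbol('x')

# The classic "integrate by parts twice" example:
F = sp.integrate(sp.exp(x) * sp.sin(x), x)
assert sp.simplify(sp.diff(F, x) - sp.exp(x) * sp.sin(x)) == 0

# The logistic-style nested composition (with k=1, a=0):
g = 1 / (1 + sp.exp(-x))
G = sp.integrate(g, x)
assert sp.simplify(sp.diff(G, x) - g) == 0
```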
Most important, since the bulk of real world indefinite integrals are not analytically tractable, it is vital that a practical system knows when to give up. If I put in exp(-x^2), it better not return an answer. This is my main worry for methods like this. In a real world setting covering a wider range of mathematical problems, you can't trust something that's not 100% on what it knows how to do.
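SymPy illustrates one way a system can "give up" gracefully here: for exp(-x^2) it answers in terms of the non-elementary special function erf, which a caller can detect and treat as "no elementary antiderivative".

```python
import sympy as sp

x = sp.Symbol('x')

# exp(-x**2) has no elementary antiderivative; SymPy answers in
# terms of the special function erf rather than pretending otherwise.
result = sp.integrate(sp.exp(-x**2), x)
print(result)  # sqrt(pi)*erf(x)/2

# A practical "give up" check: did the answer stay elementary?
assert result.has(sp.erf)
```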
It won't be 100%, but no system (including humans) would be able to deliver a 100% rate of answers on symbolic maths.
Alternatively, for the problems it knows how to do, a CAS should be as close to 100% as can be reached. But you are right: even Mathematica is not always correct, so it is useful to be able to query a solver for its steps.
The benefit there is that bias control is relatively easy in genetic programming, because its adaptations are comparatively easy to wrap your head around.
I'd like to see more research into marrying Cartesian genetic programming with something like transformers, in part because genetic generators are still far beyond transformers in the novelty they can generate. It's relevant for, e.g., generating interesting melodies that conform to some automated fitness evaluator, precisely because they are such extremely efficient iterative search algorithms for these high-dimensional spaces.
Using a multi-hot "piano-roll" encoding, or even just a sequence of (pitch; volume; rise speed (attack, i.e. how fast the pitch rises to nominal); duration) tuples, is not necessarily efficient. In a typical tree-based representation, a handful of simple structural nodes is enough: double (or repeat n times, where n can be incremented/decremented more easily by mutation than the node itself can be replaced outright), double but reverse the copy, double but shift the copy's pitch, and not much more (I'll link the paper later). That small set yields evolution efficient enough to train a neural discriminator from scratch through supervised online operation.
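A minimal sketch of that kind of tree representation (the node names here are mine, not the paper's): leaves are (pitch, duration) events, interior nodes are structural operators, and mutation prefers small parameter nudges over wholesale node replacement.

```python
import random

def render(node):
    """Flatten a tree into a list of (pitch, duration) events."""
    op = node[0]
    if op == 'note':
        return [node[1]]
    child = render(node[1])
    if op == 'repeat':                 # play the child n times
        return child * node[2]
    if op == 'mirror':                 # child followed by its reverse
        return child + child[::-1]
    if op == 'transpose':              # child, then child shifted in pitch
        return child + [(p + node[2], d) for p, d in child]
    raise ValueError(op)

def mutate(node):
    """Nudge a numeric parameter by +-1 instead of replacing the node:
    incrementing n or the pitch shift is a smoother move for evolution."""
    op = node[0]
    if op in ('repeat', 'transpose'):
        return (op, node[1], node[2] + random.choice([-1, 1]))
    return node

melody = ('transpose', ('repeat', ('note', (60, 1)), 2), 5)
print(render(melody))  # [(60, 1), (60, 1), (65, 1), (65, 1)]
```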
The NN suggests a ranking by quality (the worst are culled; just the normal culling step of genetic programming), and the software uses the human whose opinion the NN is meant to learn to correct that ranking, by playing pairs of samples and asking which is better or worse. With some statistics you should be able to get by with very few verification comparisons on top of the quicksort/mergesort approach, to guard against those algorithms behaving badly with a comparison operator that only probabilistically conforms to a partial order and only asymptotically to a total order. Later it might be possible to use the discriminator's scores to put more effort into close calls, or to adapt the statistics to account for the fact that not all comparisons have the same probability split.
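The verification idea might look something like this toy sketch (not a statistically tuned procedure): sort with the noisy comparator, then re-query adjacent pairs, which are the likeliest inversions, and swap on a majority vote.

```python
import functools
import random

def rank_with_verification(items, compare, verify_rounds=2):
    """Sort with a noisy comparator (e.g. a human's pairwise choices),
    then re-check adjacent pairs with a few extra queries each."""
    order = sorted(items, key=functools.cmp_to_key(compare))
    for i in range(len(order) - 1):
        votes = sum(compare(order[i], order[i + 1]) > 0
                    for _ in range(verify_rounds + 1))
        if votes > (verify_rounds + 1) / 2:   # majority says "out of order"
            order[i], order[i + 1] = order[i + 1], order[i]
    return order

# Demo: a comparator that gives the wrong answer 10% of the time.
def noisy_compare(a, b, p_err=0.1):
    true = (a > b) - (a < b)
    return -true if random.random() < p_err else true

print(rank_with_verification([3, 1, 2, 5, 4], noisy_compare))
```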
It could be great to try this for solving problems whose inverse is easy. "Throw" a couple of mathematicians who specialize in the butterfly effects governing solvability/solutions for that class of equations at it, and let them write rules to guide the trial and error of a genetic-programming suggester. The fancy DNN is trained to recognize particularly promising candidates to feed back into the evolution step. One major benefit is massive parallelism, both in-model and across candidates, as well as the (AFAIK unmatched) ability to efficiently find truly novel solutions to problems.
Just remember how good AFL is as long as there is no cryptography or proper hashing involved. And part of that weakness can be worked around by marrying it to an SMT/SAT solver like Z3, to find interesting inputs for adversarial code like hashes, or to fill in problematic fields of a test case to get past such checks.
If anyone is interested in code for the melody generator, hit me up and I'll ask the original author to license it so I can publish my fixed version. It's a bit over 20 years old, so it was unlisted on the web and didn't compile or run on a modern Linux. It's still 32-bit-only, but I guess a bot could replace all the then-64-bit types with the 32-bit ones where that doesn't risk truncating results from foreign functions.
Brad Johanson, Riccardo Poli: GP-Music: An Interactive Genetic Programming System for Music Generation with Automated Fitness Raters
That said, I'm still extremely surprised this works that well. I would be very curious to test this interactively.
More importantly, I'm curious if there are problems that Mathematica knows it can't solve but for which this system silently gives wrong answers.
Another interesting extension to the experiments would be a longer timeout -- 30 seconds seems a bit arbitrary and quite low for a CAS. However, I suspect the reason for that timeout is that Mathematica licenses are insanely expensive. Otherwise the 5,000 (actually, only 500) test problems could each be run for at least a few minutes at pretty trivial cost. Maybe there's a Mathematica employee here who can suggest Wolfram donate some compute (or at least limited licenses) for a small evaluation cluster, especially if the authors decide to do follow-up work.
In any case, this is really interesting work. I think deep learning for symbolic mathematics is going to be a super interesting area to watch for at least the next few years. Good work, anonymous author(s).
To explain: the thing that's super interesting to me about this paper (i.e., "strong result" vs. "best paper contender") is not integration per se. It's the possible applications of the method to problems with much, much, much higher computational complexity than integration. On those problems, validating the correctness of a solution is also intractable. In those cases, a sound function approximation approach would be an absolute game changer for symbolic methods.
(Not that integration isn't interesting as well.)
Still, I fear, the numbers are currently too small to get past the information bottleneck (mere thousands). We'll see.
I had trouble finding the test cases they used. Where'd they list them?
I have used sequence models for several character recognition and generation projects, and sequence RNN/LSTM-style models are surprising in what they can do in terms of modeling language, building representations, synthesizing data like JSON as text, etc.
I find deep learning models like this one are trying to do too much in a single "step" or module.
I am not trying to diminish the value of the work, but those problems are very much what DNNs are trained to solve, are they not?
The fact that they are "symbolic" is just an artifact of the problem: they are no more symbolic than a series of colors, if you ask me.
That is why they can be automated (mostly) and why "tricks" to solve families of them can be developed.
This article is still unpublished and was only submitted for peer review yesterday!
Good idea to use the CAS to generate training data, though. I'm always trying to think up tasks that operate on procedurally generated data (e.g. inverse graphics), because then the labeled data is cheap.
Then what would be your complaint?