Programming with a Differentiable Forth Interpreter

hacker42 · on May 24, 2016

I've forgotten where, but I've recently read a blog post/opinion that we might be encountering the dawn of an applied mathematics winter. Instead of meticulously crafting algorithms for particular problems, we just apply stochastic gradient descent to a computational graph to evolve whatever program solves the problem defined by some training data.

Exciting times.

Animats · on May 25, 2016

I don't fully understand the article, but they're applying it to bubble sort and addition. This is about where things were with Lenat's Automated Mathematician of 40 years ago.[1] Lenat was doing something similar, but on LISP programs.

After a few years, it turned out that this approach only worked on problems for which hill climbing worked really well. You need a well chosen metric for "sorted", and that guides the hill climber to converging on a bubble sort. (But not Quicksort.)
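To make the point about the metric concrete, here is a minimal sketch. It is a big simplification of what the article or AM actually does: it hill-climbs over orderings of a list rather than over programs, and the names `inversions` and `hill_climb_sort` are made up for illustration. The metric for "sorted" is assumed to be the number of out-of-order pairs, and the only move allowed is an adjacent swap.

```python
import random


def inversions(xs):
    """Count pairs (i, j) with i < j and xs[i] > xs[j]; 0 means sorted."""
    return sum(
        1
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[i] > xs[j]
    )


def hill_climb_sort(xs, rng):
    """Hill-climb toward a sorted list using random adjacent swaps.

    A swap is kept only if it lowers the inversion count (assumed metric).
    """
    xs = list(xs)
    score = inversions(xs)
    while score > 0:
        i = rng.randrange(len(xs) - 1)
        xs[i], xs[i + 1] = xs[i + 1], xs[i]
        new_score = inversions(xs)
        if new_score < score:
            score = new_score  # improvement: keep the swap
        else:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]  # no improvement: undo it


    return xs


result = hill_climb_sort([5, 2, 4, 1, 3], random.Random(0))
```

With this metric every accepted move is a bubble-sort-style adjacent exchange, so the climber never gets stuck. Nothing in the metric rewards the long-range partitioning that Quicksort relies on.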

This needs to be demonstrated on a harder problem.

[1] https://en.wikipedia.org/wiki/Automated_Mathematician

aab0 · on May 25, 2016

I would be more optimistic here. AM/Eurisko never showed any replicated results outside of one or two toy domains of simple math & the Traveler's game, and there have always been a lot of questions about how much of that was even AM/Eurisko and how much was Lenat, since he refuses to share source code and his followup project Cyc is notorious for not delivering anything. Deep networks, on the other hand, have delivered astounding results on a huge variety of domains when implemented in different frameworks by different people around the world, often taking quite different deep approaches. A result from AM/Eurisko doesn't mean much. A result from a deep network may be a crack in the dike which is about to explode and solve longstanding challenges like Imagenet.

DonaldFisk · on May 25, 2016

In AM: A Case Study in AI Methodology by G.D. Ritchie and F.K. Hanna (https://www.researchgate.net/publication/220547429_AM_a_case...), a paper critical of Lenat's work on AM, it is, however, stated that "Lenat has attempted (more than most other workers in AI) to render his program available to public assessment, both by making it available for running and by supplying such detailed appendices in his thesis. The whole discussion in this paper could not have commenced if Lenat had not provided this unusual level of documentation."

As for Eurisko, I'm unaware of the source code being made publicly available, and several people appear to have asked for the source, without success. However, I think enough has been published on Eurisko to reproduce it, or something similar, from scratch.

OpenCyc (http://opencyc.org/) was released in 2012, and is available on SourceForge. Cyc was released earlier this year (https://www.lucid.ai/press-releases/mit-tech-review-an-ai-wi...). Lucid (https://www.lucid.ai/) has several case studies of applications of Cyc by Cleveland Clinic, the U.S. Forest Service, a Large Global Bank, and a Global Energy Company.

gambler · on May 25, 2016

You're comparing a single system developed years ago to everything ever produced with neural networks (which refer to a whole family of different architectures).

> A result from AM/Eurisko doesn't mean much. A result from a deep network may be a crack in the dike which is about to explode and solve longstanding challenges like Imagenet.

Sounds like extreme and unsubstantiated bias. Statements like this are why I am highly skeptical of the current neural network hype.

aab0 · on May 25, 2016

> You're comparing a single system developed years ago to everything ever produced with neural networks

I am comparing a single system and all its variants and followups to another family. Oh wait, there aren't any variants and followups to AM/Eurisko except Cyc. Huh. How about that.

> Sounds like extreme and unsubstantiated bias.

ImageNet? AlphaGo? SOTA on language parsing, classification, and prediction tasks? Human-level performance on scores of Atari games? High-quality image synthesis, unsupervised and from textual descriptions? Predictions of visual cortex activations? Program synthesis? If you aren't impressed, you aren't paying attention.

gambler · on May 25, 2016

You've watched far too many Hinton videos.

Cyc belongs to the family of rule-based expert systems. Expert systems were successfully used in medical diagnostics, chemistry, biology and various branches of engineering. Not to mention countless "trivial" applications in planning and logistics for businesses. I could also make a case that Deep Blue was an expert system, and thus add "superhuman performance in chess" to the list.

Saying that results from an expert system don't matter (simply because it's an expert system), while believing equivalent results from an ANN will "explode and solve longstanding challenges" (simply because it's "neural") makes no sense. ANNs are not magic.

joe_the_user · on May 25, 2016

It seems like that kind of transition could happen badly.

I've done pure math to the graduate level and I've done a modest amount of programming. But even with both these backgrounds, I've been bitten hard by efforts to do applied math. It seems that applied math is very much the domain of people who both know algorithms and know how to "squint correctly" at a given problem and see whether X algorithm is appropriate. And that seems to be a very necessary skill. Where the validity of one approach stops and another begins is fairly opaque until one is an expert.

The thing is that neural networks appear to be fairly similar. They're somewhat generic, but tuning again requires high level expertise and tutelage.

The worst case scenario is that deep learning solves a broad but still limited class of problems, expanding somewhat the nebulous area covered by applied math. Then all the expertise and credibility goes there and gradually, the nebulous area covered by deep learning stops expanding and progress actually becomes harder since there's less real knowledge, just general intuition about a single, quite general approach.

giardini · on May 25, 2016

> ...people who both know algorithms and know how to "squint correctly" at a given problem ...

> ... tuning again requires high level expertise and tutelage.

Appears to be more luck (or "serendipity", if one is generous) that is required!

As new algorithms crop up, everyone moves to them to redo older problems. That continues until the next new thing pops up. Pedro Domingos's book, "The Master Algorithm", describes this somewhat:

http://www.amazon.com/Master-Algorithm-Ultimate-Learning-Mac...

current_call · on May 25, 2016

You might enjoy this paper if you haven't already read it. http://arxiv.org/abs/1410.5401

Jump to the experiments section to get an overview of what it can do.

ilyaeck · on May 25, 2016

There is value in and demand for both approaches. For the foreseeable future, we are going to need lots of applied mathematicians to figure out computational models that will eventually put them out of business (same as programmers ;)) Someday, but not anytime soon.

GFK_of_xmaspast · on May 25, 2016

What does that even mean?

current_call · on May 25, 2016

The slope of a function at some point provides information about where a local maximum or minimum lies. A positive slope means the function is rising, so a peak is to the right and a valley is to the left; a negative slope means the opposite. If you want to find the input that maximizes or minimizes the output of your function, you can use the slope to guide you. Calculate the slope, change the input accordingly (step against the slope to minimize, with it to maximize), and repeat. This is gradient descent in one dimension. If your function takes multiple arguments, then instead of a single slope you use a vector of slopes, one with respect to each argument. This vector of slopes is your gradient.
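A rough sketch of that loop in plain Python, in case it helps. The function names, the step size `lr`, the step counts, and the two example functions are all assumptions picked for illustration, and the slopes are written out by hand rather than computed automatically.

```python
def gradient_descent_1d(slope, x0, lr=0.1, steps=100):
    """Minimize a one-argument function given its slope (derivative)."""
    x = x0
    for _ in range(steps):
        x = x - lr * slope(x)  # step against the slope to go downhill
    return x


def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Same idea with several arguments: grad returns a list of slopes."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


# Example 1: f(x) = (x - 3)^2 has slope 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)

# Example 2: f(x, y) = (x - 1)^2 + (y + 2)^2 has its minimum at (1, -2).
xy_min = gradient_descent(
    lambda p: [2 * (p[0] - 1), 2 * (p[1] + 2)],
    x0=[0.0, 0.0],
)
```

Flipping the sign of the update (adding instead of subtracting) turns the same loop into gradient ascent, which climbs toward a peak instead.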

GFK_of_xmaspast · on May 25, 2016

I know what gradient descent is, thanks, I was referring to the rest of that mess.

hacker42 · on May 26, 2016

Do you really expect anyone to sensibly respond to your passive-aggressive tone?