Machine learning is teaching us the secret to teaching (nautil.us)
163 points by dnetesn on Sept 28, 2014 | 24 comments

Although the article makes no mention of them, George Lakoff and Mark Johnson's research on conceptual metaphor[0] seems closely tied to this. I recently read two of their books, Metaphors We Live By (see Peter Norvig's description here[1]) and Philosophy In The Flesh[2], and found their ideas on how humans use metaphors to make sense of things very interesting and intuitive. It left me wondering when and how AI would use these insights.

[0] http://en.wikipedia.org/wiki/Conceptual_metaphor

[1] http://norvig.com/mwlb.html - because it stays focused on language it manages to make a clear and convincing case

[2] http://www.nytimes.com/books/first/l/lakoff-philosophy.html - by comparison a flawed but nonetheless good read. I agree with the critiques laid out in this review: http://lesswrong.com/lw/871/review_of_lakoff_johnson_philoso...

I started reading "Louder Than Words: The New Science of Meaning", which is based on embodiment. Instead of making up grand hypotheses as Lakoff tends to do (not necessarily a bad thing), Bergen ties embodiment, and more specifically simulation, to new empirical research. I think we could probably benefit from cross-pollination between cognitive linguistics/psychology and more technical AI.

Yeah, that's a valid criticism, of Lakoff especially. This blog sums up the problems with his writings quite nicely, without losing sight of the good bits and the contributions of his work:


Thanks for the tip, I'll check out "Louder Than Words"

I found some slides [1] explaining how this works.

A less poetic example of privileged information: if you're training on time-series information, you can include events from the future in the training examples, even though they won't be available while making predictions in production.
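To make the time-series case concrete, here is a minimal sketch of how such training examples might be assembled. The names (`TrainingExample`, `build_training_set`, `production_features`), the window sizes, and the "does the next value go up?" label are all my own assumptions for illustration, not anything from the article or the slides.

```python
from typing import List, NamedTuple


class TrainingExample(NamedTuple):
    x: List[float]       # past values: available in training AND in production
    x_star: List[float]  # later values: privileged, available in training only
    y: int               # label: 1 if the next value is higher than the current one


def build_training_set(series, past=3, future=2):
    """Slice a historical series into (x, x_star, y) triples."""
    examples = []
    # t is the "current" step. We need `past` values ending at t, the value
    # at t + 1 for the label, and `future` values after that as privileged info.
    for t in range(past - 1, len(series) - future - 1):
        x = series[t - past + 1 : t + 1]
        y = 1 if series[t + 1] > series[t] else 0
        x_star = series[t + 2 : t + 2 + future]
        examples.append(TrainingExample(x, x_star, y))
    return examples


def production_features(series, past=3):
    """In production only the most recent past values exist; there is no x_star."""
    return series[-past:]
```

The point is only the shape of the data: every training row carries an extra `x_star` field that the deployed predictor never receives.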

Apparently this helps the machine learning algorithm find the outlying data points when the data isn't linearly separable.

[1] http://web.mit.edu/zoya/www/SVM+.pdf

I found these slides very helpful:


In particular, I was unsure after reading the original article whether the additional information--for example, the poetry--was available to the learner on test inputs. The above slides explicitly state that it is not.

Unfortunately, I don't find that reference very helpful - it's just pages of annotated equations.

What's an example of pseudocode that would actually implement this? Surely you don't load a natural language module in order to parse the pathologist's notes (in the example given in the reference about biopsies)?

(I should also note that the original article is devoid of any technical examples, making it completely opaque to me what it actually entails.)

This might be somewhat better. Here are slides from a course that the researcher taught about this algorithm:


Here's a directory containing the data for the number learning example:


It's worth pointing out that Vladimir Vapnik is the inventor of Support Vector Machines. The short version of what he's done here is he's come up with a way of formulating them that allows him to make use of extra information at training time (that is not available at test time).
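For anyone who wants something closer to code than equations, here is a rough sketch of the objective as I understand the SVM+ formulation: the privileged vector x* is used only to model each training example's slack, while the decision rule itself still depends on x alone. This is not a trainer (a real one needs a QP solver); it only evaluates the objective for given parameters, and the function names and default values of `C` and `gamma` are my assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def svm_plus_objective(w, b, w_star, b_star, examples, C=1.0, gamma=1.0):
    """Evaluate an SVM+-style primal objective (assumed form, see lead-in).

    examples: list of (x, x_star, y) with y in {-1, +1}.
    Returns the objective value, or None if a constraint is violated.
    """
    total_slack = 0.0
    for x, x_star, y in examples:
        # The slack is not a free variable as in a standard SVM:
        # it is predicted from the privileged features.
        slack = dot(w_star, x_star) + b_star
        margin = y * (dot(w, x) + b)
        if slack < 0 or margin < 1 - slack:
            return None  # infeasible for these parameters
        total_slack += slack
    return 0.5 * (dot(w, w) + gamma * dot(w_star, w_star)) + C * total_slack


def predict(w, b, x):
    """Test-time rule: uses only x; w_star, b_star and x_star are not needed."""
    return 1 if dot(w, x) + b >= 0 else -1
```

Note that `predict` never touches the privileged side, which is exactly the "available at training time only" property.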

It really is a very innovative approach IMO.

That is an important point - the extra info is only available at learning time (otherwise you need a physician sitting next to the "cancer-scanning" computer, slowing down the clock speed by doing the analysis themselves). This seems obvious once you say it, but it had not occurred to me before, thanks!

Yes, I mostly don't understand the math either. But apparently with the poetry, they converted it into a vector somehow based on the appearance of keywords. Perhaps someone will find a friendlier example.
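A guess at what "converted it into a vector based on keywords" could look like in its simplest form is a keyword-count vector. The keyword list and the sample sentence below are made up for illustration; I don't know which keywords or which encoding (counts versus presence) the researchers actually used.

```python
import re


def keyword_vector(text, keywords):
    """Count how often each keyword appears in the text (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words.count(k) for k in keywords]


# Hypothetical keyword list and description, not taken from the paper.
KEYWORDS = ["flowing", "sharp", "calm"]
vec = keyword_vector("Calm, flowing strokes; nothing sharp, only calm.", KEYWORDS)
# vec has one entry per keyword, in the same order as KEYWORDS
```

Each description then becomes a fixed-length numeric vector that can serve as the privileged input alongside the image.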

I think an example is this: for a concrete set of data, see the distributions on page 28 of this document (page 40 of the PDF) http://mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/harpur_thes...

These different distributions are difficult to distinguish with statistics. However, if you can see the shape and therefore know the "rule" or the structure of the distribution, it's easy to design/train a network to recognise them.

(note that the content of that PDF is not about this problem per se)

I imagine a keyword decomposition would work very well with a pathologist's report too - essentially, using the biological feature names as "tags" on the image would eventually allow a program to correlate an image of that feature with a name much faster than a general "good v bad" fitness function.

If this is how it works, it's a shame this isn't made clearer in the texts. You could get someone to tag images whimsically but consistently, rather than go the whole trouble of writing poetry.

    “In many cases, humans use their own knowledge about
    actions to recognize those actions in others,” he told
    me. “If a robot knows how to grasp, it has better chance
    of recognizing grasping actions of a person,” he said.
    Metta is going one step further, by teaching iCub to
    follow social cues, like directing its attention to
    an object when someone looks at it. These cues become
    another channel through which to collect privileged
    information about a complex world.
This article kept getting better and better. I thought this was particularly interesting because it leads to a world where the AI software is available to all, but what's of value is the rules or metaphors you've developed for your robots. That would give garage hobbyists a chance of coming up with their own special formulas and possibly be the ones to figure out something novel or world-altering in AI.

This could potentially provide insight into Autism cases? Training an AI to be attentive to social cues, and then training another one to be not as attentive - and see what kind of results come out of each. This could also point us in a direction of what type of cues are the most important to pay attention to.

As I see it, the natural extension of identifying this "secret to teaching," is to try and extrapolate it to "unlearning," or memory loss, dementia and Alzheimer's.

If we can mimic and model the neuronal pathways and firings within humans using AI, then we should also be able to study how those pathways fail and degrade. AI pathways are subject to breaking and failing as well, though the causes are of course different. Still, one would imagine there are some shared structural stresses that result in the "unlearning" and the failure to fire or function.

The potential for AI is immeasurable. If we can teach a robot, surely we can stress its system enough to "unteach" it. Despite being unable to force-feed it junk-food and/or vitamins/minerals, we can replicate environmental stressors, and once having generated the "unlearning" process, examine how to halt the degradation and perhaps reverse the trajectory.

> His advice gave no specific information on what angle the bow should describe, or how to move the fingers across the frets to create vibrato

And it's a good thing he didn't - violins don't have frets /nitpick

I wouldn't fret too much about it.

I'll get my coat...

Yes, perhaps you should bow out.

Aaaand the Reddit ~ HN equivalence is complete.

Actually, it wasn't technically complete until your comment: the self-aware reference to the progression is the final step :)

I stopped reading where it says that SVM is used for big data. SVM can only process about 10000 examples.

Then you ignored all the inaccuracies before that sentence, and all of the insights after it.

Have you ever trained an SVM? Because 10000 examples is a laughably small dataset. It's not a problem for modern implementations.

Linear SVMs can be trained on millions of examples.
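The reason this scales is that a linear SVM can be trained with stochastic subgradient steps that each look at a single example, so the per-step cost does not depend on the dataset size. Below is a minimal Pegasos-style sketch in plain Python (no bias term, toy defaults for `lam` and `steps`); it is an illustration of the idea under those assumptions, not how any particular library implements it.

```python
import random


def train_linear_svm_sgd(examples, lam=0.01, steps=1000, seed=0):
    """Pegasos-style SGD on the hinge loss with L2 regularisation.

    examples: list of (x, y) with x a list of floats and y in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    for t in range(1, steps + 1):
        x, y = rng.choice(examples)   # one example per step
        eta = 1.0 / (lam * t)         # decaying step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        shrink = 1.0 - 1.0 / t        # equals 1 - eta * lam (regulariser step)
        w = [shrink * wi for wi in w]
        if margin < 1:                # hinge loss is active: move toward y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Memory use is one weight vector plus whichever example is currently being visited, so the examples could just as well be streamed from disk.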
