

Peter Norvig on Innovation in Search and AI - sinamdar
http://www.notjustrandom.com/2009/12/09/peter-norvig-on-innovation-in-search-and-artificial-intelligence/

======
ratcliffco
He did a similar talk at Startup School 2007 (I think) where he showed how a
relatively dumb algorithm can build a database of simple facts. It worked very
simply: crawl a TON of web-based text and look for patterns like "XXX such as
AAA, BBB and CCC" -> this way you'll learn that cats, dogs and monkeys are
animals.
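The pattern matching described above can be sketched in a few lines. This is a
hedged illustration, not the actual system from the talk; the regex and the
`extract_facts` helper are my own names for the idea:

```python
import re

# Match "CATEGORY such as A, B and C" style phrases (a classic
# Hearst pattern) and emit (instance, category) pairs.
PATTERN = re.compile(
    r"(\w+)\s+such as\s+((?:\w+,\s*)*\w+(?:\s+and\s+\w+)?)",
    re.IGNORECASE,
)

def extract_facts(text):
    """Return (instance, category) pairs found via the pattern."""
    facts = []
    for match in PATTERN.finditer(text):
        category = match.group(1).lower()
        # Split the list part on commas and a trailing "and".
        instances = re.split(r",\s*|\s+and\s+", match.group(2))
        for inst in instances:
            facts.append((inst.strip().lower(), category))
    return facts

print(extract_facts("We saw animals such as cats, dogs and monkeys."))
```

Run over enough crawled text, even a crude pattern like this accumulates a
surprisingly large fact database.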

------
Smerity
The most important detail noted is that training on more data gets better
results than using a better algorithm. One of the recent pushes in machine
learning and natural language processing has been trying to bootstrap larger
training corpora from smaller initial sets.

As an example, one of my friends did her thesis on using simpler sentences
(ones you're either confident you have parsed correctly or have gold standard
(i.e. correct) training data for) to parse more complex but related sentences
(see [1]). This is useful because even without a huge amount of gold standard
training data, the parser is statistically far more likely to get the
derivation correct for shorter sentences than for longer ones. Those shorter
sentences can then help in parsing the longer ones.
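The bootstrapping loop described above (often called self-training) can be
sketched with a toy tagging model. Everything here is illustrative, not the
thesis system: a most-frequent-tag "parser", a confidence score based on how
many words are known, and a rule to keep only fully confident outputs:

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Build a most-frequent-tag model from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, sentence):
    """Tag a sentence; confidence = fraction of known words."""
    tags = [model.get(w) for w in sentence]
    known = sum(t is not None for t in tags)
    return list(zip(sentence, tags)), known / len(sentence)

gold = [[("cats", "N"), ("sleep", "V")],
        [("dogs", "N"), ("bark", "V")]]
unlabeled = [["cats", "bark"],                  # short, fully known
             ["cats", "and", "dogs", "sleep"]]  # longer, unknown word

model = train(gold)
training = list(gold)
# Shortest sentences first: they are the most likely to be right.
for sentence in sorted(unlabeled, key=len):
    tagged, confidence = tag(model, sentence)
    if confidence == 1.0:  # keep only fully confident analyses
        training.append(tagged)
model = train(training)  # retrain on the enlarged corpus
```

The short sentence gets absorbed into the training set while the longer one
(with the unknown word "and") is held back for a later round.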

That's why Google is so powerful. I spent a summer internship there, and they
have two really powerful things: data, and the tools and techniques to handle
it. In one afternoon a single employee could run through more data than entire
companies use in months.

[1] "Mozart was born in Salzburg in 1756" vs "Wolfgang Amadeus Mozart was born
on the 27th of January 1756 at 9 Getreidegasse in Salzburg" (the latter is a
slightly modified example from Wikipedia)

------
dmix
Be careful listening to this one with headphones at high volume. The sound is
pretty bad.

~~~
peropaal
The sound is okay; it's just clipping in the introduction. Skip to 00:02:30.

------
anon42389475
audio is clipping for me

