

Why Are We Still Waiting for Natural Language Processing? - pesenti
http://chronicle.com/blogs/linguafranca/2013/05/09/natural-language-processing/

======
lake99
Geoffrey Pullum is a respected professor of linguistics. Yet the article
comes off as written by an awfully ignorant person.

Forget pragmatics. There won't be a generic (cross-domain) solution for that
any time soon.

What happened to Powerset? Doesn't really matter. The brains behind it
haven't been killed off. For example, Ronald Kaplan, the top brain behind
Powerset, now works for Nuance Communications. I suppose he will focus on
making sure your voice interactions with computers are smoother.

"How could we have drifted into the second decade of the 21st century with
absolutely no commercial NLP products?"

What the heck does _that_ mean? Our smartphones understand voice commands. We
have decent TTS and speech recognition. The people behind IBM's Watson seem to
be targeting medical knowledge discovery. With a quick scan of Wikipedia, I
came across Distinguo and automated essay evaluation, and Google Translate
already makes my life so much easier.

If Pullum meant that he still can't buy an oven that understands his voice
commands, sure. But saying that we have "absolutely no commercial NLP
products" is brain-damaged.

------
nemothekid
Didn't PowerSet become Bing, for the most part? I remember when Bing
originally launched, they touted its ability to show useful information
instead of just "search results." (The feature seems to have been scrapped
now, but just in case you don't believe me, here is an ad that 'claimed' this
<http://www.youtube.com/watch?v=NHmzzLt8WFA>).

NLP seems to be a problem that the author underestimates. IIRC, real speech
recognition is a Strong AI/AI-complete problem: for a machine to detect the
nuances in regular speech, it has to be at least as smart as a small child.

I don't have a lot of knowledge in AI, so take my statements with a grain of
salt. (Artificial Intelligence: A Modern Approach was a great book, but it
made me realize I'm not really interested in the field.)

If we ignore the pragmatic step, you aren't really doing NLP anymore, just
really fancy keyword matching. The query "Can your site be accessed without
using Internet Explorer?" is broken down by its syntax ("your site",
"accessed", "without", "IE", I'd imagine), and then each term is translated
into either a keyword or an operator (semantics). Then you end up with
Google:Hackernews accessed NOT Internet Explorer (maybe), when what you really
wanted was something closer to "list of supported web browsers" union
"internet explorer".

Without the pragmatic step there isn't any way to translate
"accessed .. IE" to "list of supported web browsers".

Maybe I'm missing something, but the pragmatic step seems to be the bread and
butter of NLP; anything else just seems to be advanced querying. The pragmatic
step is what could answer all of these:

"Can your site be accessed without using Internet Explorer?"

"Does your site support Internet Explorer"

"Is Internet Explorer supported"

"Can yr site b acccsseed wit IE?"

------
Sven7
Simple answer: Google has no competition.

And the competition has no chance, given the scale required just to match, let
alone surpass, "Google quality".

This isn't going to change any time soon, unless they create a search
marketplace by opening up their index.

If one were to ask Larry Page what his number one priority is, it's a good
bet his answer wouldn't be search.

Compare what Google has on offer with this
<http://www.wolframalpha.com/examples>

Why such a big disparity?

~~~
adventured
Sure, after all, who could possibly compete with IBM. Or Microsoft. Or
AltaVista. I mean, AltaVista, backed by DEC, spent a relatively massive
amount of money at the time on what seemed like an index of insurmountable
scale.

Who could ever compete with Facebook? You'd have to spend billions to
replicate what they do, just to get off the ground.

The reason you're wrong is that paradigm shifts are what matter, not chasing
the past. That's the opening that makes it possible to replace Google (not
compete head-on with it). I don't care how well organized Google is; they're
so large now that they will run into the same clumsy elephant problem that
every other mega company in world history has.

The problem isn't that Google has no competition. You should not be trying to
solve the problem Google solved in 1998. That is over; they're riding the
benefits of a solution from 15 years ago, extrapolated forward and polished.

You never compete head-on with the past; you're guaranteed to die that way.
You chase the future and replace the past. Look for the inflection, and bury
Google in the past accordingly.

Need a better example? Windows was feared as a practical god of wrath in
technology circa 1997/98. Who could ever hire enough engineers to write the
tens of millions of lines of code needed to compete with that? It was the
omnipotent titan that could never be challenged. Want the Windows obituary?
It reads as follows: it never was challenged; it was replaced by a future
that looked nothing like Windows 95/XP/Vista/8/etc.

The future begins with a new (obvious in hindsight) solution to a massive
problem. Typically this solution will start out simple in concept but scale
massively. It is almost always extremely cheap to instigate (relative to its
long-term value). See: PageRank.

The future does not begin by going after a leviathan like Google and playing
the game the way they do, and spending money like they do. It never works that
way. Chasing the past is usually the most expensive thing you can ever do.
Just ask Bing.

~~~
Sven7
Barrier to entry makes all the difference.

You are not going to see a 20-year-old Woz or Torvalds build an NLP system of
any serious utility today. They just do not have access to the kind of
infrastructure required to do it. So comparisons to OS evolution, MS, etc.,
are moot.

You need data, clean data, and massive amounts of it to do accurate natural
language processing. Very few orgs have the data required to pull it off.

And that can be seen in the limitations of Siri, WolframAlpha, and IBM's
Watson. The human brain takes a decade or more of processing all kinds of
data to build up a language model good enough for a productive conversation.
So even if these companies come up with better learning algorithms for
building a decent language model, the bottleneck will always be access to
quality data.

Google has the data, but since they are busy defending their advertising
empire, innovation in search, and therefore in NLP, has taken a backseat.

For the paradigm shift in NLP to happen, things like Siri, Watson, and
WolframAlpha would need access to Google-scale, Google-quality data.

Google can spur things along by building a marketplace around search so people
can access the data.

------
pesenti
Actually, NLP has made great progress in the last few years, and there are
now commercial products based on it: Watson, Siri, and Google Now are good
examples.

~~~
hvs
This is amusing, because just today while I was driving I decided to attempt
to use Siri (something I do about once a month, to remind me why I don't use
it). I said, "Nirvana Nevermind." "Sorry, I don't know how to do that." "Play
Nirvana Nevermind." "Sorry, I don't know how to do that." "Search music
Nirvana Nevermind." "Sorry, I can't search for music."

Great stuff, that NLP.

~~~
indubitably
Well, your last example is evidence that there was a successful interpretation
of the sentence.

Things like machine translation, text to speech, and natural language
understanding will only gradually get better.

If you compare how well something like Siri does now (not very well) to
similar “tools” 10 years ago (awful), it’s hard to deny that some progress has
been made.

~~~
rhizome
I'd say it's not too far advanced beyond where IVRs were 5-10 years ago. It's
a hard problem.

------
BjoernKW
Processing natural language is a complex problem. Most syntactic parsing
algorithms have a complexity of O(n^3). Add semantic reasoning and pragmatic
understanding on top of that and you're way beyond what any computer system
available today can process in reasonable time.
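
For the curious, here is a minimal CKY recognizer (the textbook O(n^3)
chart-parsing algorithm) over a toy grammar I made up; the cubic cost is
visible as three nested loops:

    # Minimal CKY recognizer over a toy grammar in Chomsky normal form.
    # The three nested loops (span length, start position, split point)
    # are where the O(n^3) comes from.
    from itertools import product

    GRAMMAR = {                 # CNF rules: (B, C) -> {A | A -> B C}
        ("NP", "VP"): {"S"},
        ("Det", "N"): {"NP"},
        ("V", "NP"): {"VP"},
    }
    LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

    def cky_recognize(words):
        n = len(words)
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = set(LEXICON.get(w, ()))
        for span in range(2, n + 1):               # O(n) span lengths
            for start in range(n - span + 1):      # O(n) start positions
                end = start + span
                for mid in range(start + 1, end):  # O(n) split points
                    for b, c in product(chart[start][mid], chart[mid][end]):
                        chart[start][end] |= GRAMMAR.get((b, c), set())
        return "S" in chart[0][n]

    print(cky_recognize("the dog saw the cat".split()))  # True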

There are approximations using statistical algorithms and heuristics, such as
chunk parsing, and those work quite well. In fact, this might very well be
how human language understanding works, too: human brains most likely don't
completely parse a sentence in order to understand it but rely on statistical
reasoning and heuristics as well.
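
As a toy illustration of the chunking idea (hand-rolled, with POS tags
supplied by hand; a real system would run a statistical tagger first), a
single linear pass can pull out noun-phrase chunks without ever building a
full parse tree:

    # Greedy noun-phrase chunker over POS-tagged words: an optional
    # determiner (DT) plus adjectives (JJ) followed by a noun (NN)
    # form one chunk. One linear pass, no O(n^3) parse.
    def np_chunks(tagged):
        chunks, current = [], []
        for word, tag in tagged:
            if tag in ("DT", "JJ"):
                current.append(word)
            elif tag == "NN":
                chunks.append(" ".join(current + [word]))
                current = []
            else:
                current = []
        return chunks

    sent = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
            ("jumps", "VBZ"), ("over", "IN"),
            ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
    print(np_chunks(sent))  # ['the quick fox', 'the lazy dog']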

Nevertheless, while the study of grammar is almost as old as philosophy,
linguistics (that is, the search for a true understanding of the building
blocks of human language) is a rather young discipline. When we still don't
even have a comprehensive grammar of the English language, including all its
idiosyncrasies and varieties, how can we expect to accurately model languages
in computer programs?

~~~
indubitably
Human language understanding certainly doesn't require a listener to
completely parse a sentence before beginning to interpret it. There is a lot
of evidence from psycholinguistics that humans begin interpreting a given
word before the end of the word has even been pronounced.

~~~
whatshisface
The only reason humans can do this is that our language interpreter is
separate and parallel to our language parser. Unless we are making a
language-processing chip (transistors can be fully parallel), we want to
avoid wasting work.

~~~
Ihmahr
Why "separate and parallel to our language parser"?

~~~
whatshisface
Neurons are a bit like transistors: it's a lot harder for them to rewire than
to perform their already-assigned task. As a result, it is most efficient to
have blobs of neurons configure themselves into task regions and do the same
thing over and over.

If you've already got a hardwired parser and a hardwired interpreter, there's
really no reason not to have them run in parallel. (We have plenty of
calories to run all the neurons we need, so any small chance of a benefit is
worth a try.)

~~~
indubitably
You're pulling all of this out of your ass, right?

------
kdavis
We at the startup www.forty.to are working on just this problem: question
answering. We've made some progress, too.

If you're interested, here's a YouTube demo video:
<http://youtu.be/VDUw4oLU7no>

------
pfortuny
There is this simple thing linguists do not get: real language is much harder
than linguistics.

Real-life phenomena tend to be more complex than the equations "describing
them". Now imagine that instead of equations you have linguistics...

