

Ask HN: Is artificial intelligence/natural language processing a futile pursuit? - fezzl

Hi. I am a sophomore with a start-up that builds sentiment analysis technologies. We currently work on sentiment summarization and classification. Using papers and patents as a starting point, we try to create our own algorithm that we hope will outperform state-of-the-art approaches. We have been working on it for the past 2 months. Call it an irrational mindset, but we're adamant on building something that is not only tedious to copy but also fundamentally hard to duplicate.

We have concerns regarding what we are doing, namely: technology and market risks.

1) Technology risk

We are worried that we would hit a dead-end and not be able to build what we set out to build. AI is a hard technical problem, and Marc Andreessen has described AI as an "equivalent" of rocket science. We are making progress but our chances of technologically outperforming Google or other AI-based companies are inherently not good. We fear that we might have picked the wrong beast to mess with but we're a little too involved right now to switch paths.

2) Market risk

Other than social media monitoring tools, I haven't come across any solution that deals with sentiment analytics. The social media monitoring scene is crowded, so we hope to apply what we build in other unexplored areas. Our vision for now is to tackle the problem of information overload *within* consumer review platforms: think Amazon reviews, Yelp reviews, IMDb reviews, etc. We hope to offer an on-site (within-the-page-itself) dashboard that categorises raw data by mentioned concepts and their frequencies, the sentiments (positive/negative) with regard to said concepts and selected quotes that incorporate said aspects, along with sentiment insights in visualised forms (e.g. pie charts). End-users can interact with the dashboard and use any variable as an anchor to obtain insights (e.g. give me the raw data that mention concepts 1 and 2 in a positive light). This, we hope, would help people consume content more representatively and faster, without having to read through every single review from top-left to bottom-right. Hopefully, this would increase user engagement and shorten sales cycles. Naturally, the fear is that nobody wants our product, much less pay for it. Personally, as an end-user, I would find something like that useful, but I'm obviously biased.

Looking forward to some input for my current situation.
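
To make the dashboard idea concrete, here is a minimal sketch of the aggregation and the "concepts 1 and 2 in a positive light" query. It assumes the hard part is already done, i.e. each review has been broken into (concept, polarity, quote) mentions by some upstream step. The `Mention` type, the sample data and both function names are hypothetical and only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    review_id: int
    concept: str
    polarity: str  # "positive" or "negative"
    quote: str


# Hypothetical, hand-labelled mentions standing in for extractor output.
MENTIONS = [
    Mention(1, "battery", "positive", "Battery lasts two full days."),
    Mention(1, "screen", "positive", "The screen is gorgeous."),
    Mention(2, "battery", "negative", "Battery died within a month."),
    Mention(3, "battery", "positive", "Great battery life."),
    Mention(3, "screen", "negative", "Screen scratches easily."),
]


def concept_summary(mentions):
    """Count positive and negative mentions per concept."""
    summary = {}
    for m in mentions:
        summary.setdefault(m.concept, Counter())[m.polarity] += 1
    return summary


def reviews_matching(mentions, concepts, polarity):
    """Return ids of reviews mentioning every given concept with the given polarity."""
    wanted = set(concepts)
    seen = {}
    for m in mentions:
        if m.polarity == polarity and m.concept in wanted:
            seen.setdefault(m.review_id, set()).add(m.concept)
    return sorted(rid for rid, found in seen.items() if found == wanted)
```

With the sample data, `reviews_matching(MENTIONS, ["battery", "screen"], "positive")` selects only review 1, since review 3 is positive about the battery but negative about the screen. The per-concept counts from `concept_summary` are what a pie chart would be drawn from.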
======
terra_t
I don't think NLP and AI are dead-ends; however, I agree with you that
sentiment analysis is a crowded area.

Personally I find I need sophisticated domain-specific heuristics to evaluate
consumer reviews in particular spaces. For instance, when I buy a lens for my
camera, I'm going to look at reviews, but it's tricky because there are always
good and bad reviews for any lens. Some of the people who get bad reviews had
a camera with a screwed up AF or they never really understood how to use the
lens or what its limitations were. Then you look at say, a Sigma lens for the
Canon platform and you'll see that different people are having wildly
different results and you'd better just forget about it.

Dashboards and stuff like that are a waste of time. What I really want is
something that creates a super-expert opinion that's maybe 3 sentences to a
paragraph long. A bit beyond the state of the art.

----

More generally, I think Doug Lenat had the right idea with Cyc, but he went
about it the wrong way. Had Doug not been able to make a comfortable living
doing work for the government, he would have been forced to produce a
revolutionary product, but he wasn't.

I think that the linked data space around the semantic web is going to explode
and ultimately produce the "commonsense" knowledgebase that it takes to build
real NLP systems.

Most of the people I know in the knowledge-management space are trying to
develop expensive projects for government, pharma, legal discovery and such. I
think, however, they are in the "pay a lot, get a little" businesses that are
going to be disrupted by the next wave. On one hand you've got Google,
Microsoft and a few biggies that are going to develop large-scale but low-
margin products. The other side is going to be a vast, largely low-margin
market of operators who pull from and add to the great linked data pool...
which is going to grow like a Katamari ball until we reach the Singularity,
maybe around 2025 or so.

------
Zak
We got rocket science pretty much right 50 years ago. Don't let the fact that
it sounds hard discourage you. Somebody's going to get this right eventually -
why not you?

There have been a great many successes in the AI field. It's easy for people
to forget though; once something works, nobody calls it AI anymore.

------
syntience
Hi,

I have decades of experience with old style AI and a decade of the new kind
:-). I specialize in language understanding algorithms and near-perfect
sentiment analysis is something I expect we'll be able to do eventually using
the methods I've invented.

The top level bit to worry about is whether you are attempting to re-do
something that is already known not to work. Litmus tests: Are you using
models of language such as grammars that are explicitly programmed in? Is your
system specific to a single language so that switching to another would
require complete re-coding? Do you employ linguists? If you answer yes to
these, then you are in trouble.

I discussed how to get a modern AI education in an early blog entry at
<http://monicasmind.com>

I propose a shift in direction of AI research in the second video at
<http://videos.syntience.com> and explain why that's needed in the first one.
Three more videos discuss details.

I have a theory/motivational site (6 pages or so) at
<http://artificial-intuition.com>

I'm available for high level consultation on these issues. I worked at Google
(I quit 2006) and although I cannot talk about what they do, I certainly will
have an idea about what will be required to outperform them, both short term
and long term.

If anyone wants to support Syntience Inc. in our effort to get true
understanding to computers, please get in touch.

    
    
      - Monica Anderson
      http://syntience.com

~~~
limist
As a non-specialist, I enjoyed your blog and site writing, thanks for sharing.
In particular, your classification of problems and problem domains is
compelling, and another confirmation that reductionist approaches seem to be
hitting diminishing returns in understanding the world.

------
chegra84
The fundamental problem with AI is the lack of parallelization. There are more
connections in the human brain than there are atoms in the universe.

I am sure someone can write an AI program that is comparable with human
intelligence on paper at this moment in time. But, the complexity of the
algorithm would probably be exponential, hence massive parallelization is
necessary.

I think the IBM Blue Brain project has the right approach for AI.
<http://en.wikipedia.org/wiki/Blue_Brain_Project>

~~~
bobdole2695
Uh. Wouldn't those connections need to be made of atoms?

~~~
fezzl
Good point. Though I'm still sure that he's right in that there are many, many
connections in the human brain.

~~~
duke_sam
There are more potential connections in between neurons in the human brain
than there are atoms in the universe.

Or at least that is the way I heard it.

~~~
wlievens
There are more potential connections between my cat and your dog than there
are atoms in the universe. It's meaningless; just exponentiate something big
enough and you get over the atom count :-)
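
The "just exponentiate" point can be checked with a few lines of arithmetic. This is a rough sketch that assumes the commonly quoted order-of-magnitude figure of 10^80 atoms in the observable universe and counts possible wirings as undirected graphs on n labelled nodes, which is 2^(n choose 2). The function names are made up for the example.

```python
import math

# Common order-of-magnitude estimate, assumed here for illustration.
ATOMS_IN_UNIVERSE = 10 ** 80


def possible_wirings(n):
    """Number of distinct undirected graphs on n labelled nodes: 2^(n choose 2)."""
    return 2 ** math.comb(n, 2)


def smallest_n_exceeding(limit):
    """Smallest node count whose number of possible wirings exceeds limit."""
    n = 2
    while possible_wirings(n) <= limit:
        n += 1
    return n
```

Under those assumptions `smallest_n_exceeding(ATOMS_IN_UNIVERSE)` comes out at 24: two dozen nodes already have more possible wirings than the atom estimate, so the comparison says nothing special about brains.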

------
trevelyan
I work on NLP for Chinese text with the Adso project
(<http://popupchinese.com/tools/downloads>). This is a natural language
processing engine that handles segmentation, sense analysis and semantic
regexp for Chinese text.

In my experience, the complexity of most NLP applications works against them in
the sense that they're competing with simpler approaches that are less
computationally intensive. Good technology will let you do things other people
can't do, but if you want to make it a business you'll need to know how to
compete against Google doing 80% of what you can do with simple pattern
matching. Your results will need to be orders of magnitude better before you
have something people will use, let alone for which they'll pay. And be
careful not to get trapped in a field where you'll be competing against
companies paying serious cash for commercial databases to which you do not
have access.

I'm personally skeptical there is much of a market for sentiment analysis
incidentally, but the same tools are pretty useful for search (preprocessing,
etc.). I think you'll find it difficult to get third-party adoption unless
your product drives direct revenue for someone or can very visibly improve
their product. But the problems are important and worth solving!

------
danielh
1) Trying to solve any hard problem bears the risk of failure. But as you
already mentioned, if you succeed, you might have a competitive advantage,
because it can't be easily duplicated. I think the risk of Google entering
your market affects every internet-related venture. But at the same time, the
fear is often unjustified. How often has that happened? Orkut did not kill
Facebook, Buzz did not kill Twitter, etc.

2) I would build the technology and a cool showcase, but I would not directly
target the consumer, but rather someone who can use your technology to make
money.

Your technology might be interesting for advertisers to ensure that ads are
only displayed next to articles with a positive sentiment towards the product.
E.g. to avoid a Toyota ad next to the latest news on stuck gas pedals. Maybe
show a Volvo ad instead. I don't know if something like this is done at
present.

~~~
fezzl
Hi danielh,

Yes, there are already advertising networks that do semantic-targeted
contextual advertising, e.g. peer39.com, collective.com. Not sure how they are
doing though. Either way, it is something that we will look into. Thanks.

------
chasingsparks
Cupitor impossibilium.

I've been working on alternative pricing algorithms for 8 years. Most people
call this a waste of time. Recently, one of my algorithms started showing
great promise. (I'll know within two months.) Do what fascinates you.

------
msbmsb
I think it's quite a leap to go from worrying about the
marketability of your specific idea to asking about the futility of two very
broad fields of research.

I work in NLP, my company has a sentiment analysis product. It's a very small
part of what we do, and it's focused on a particular application. NLP itself
has a very wide range of applications, some theoretical, some practical, some
already in use all day long at very popular websites. Two months is a very
short time for complex problems in NLP, believe me.

Regarding tech risk, I think you're taking your particular product niche and
casting it to something unnecessarily broad. Polarity detection is just one
possible task in NLP, which is itself a sub-domain of AI. It can be done
reasonably well without using any real NLP techniques even (hence the crowded
social media monitoring scene). That paragraph made it sound to me like you
are stressing yourself out unnecessarily about 'solving AI'. It's sentiment
analysis (which can be a challenge to get right), focus on that.
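
As a rough illustration of polarity detection without any real NLP, here is a word-list counter with a one-token negation check. The word lists and the `polarity` function are invented for the example and far too small for real use. It is a sketch of the general lexicon approach, not what any particular product does.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "excellent", "love", "sharp", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "blurry", "slow"}
NEGATORS = {"not", "never", "no"}


def polarity(text):
    """Return 'positive', 'negative', or 'neutral' from lexicon word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        value = (tok in POSITIVE) - (tok in NEGATIVE)
        # Flip the sign when the previous token is a simple negator.
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Something this crude handles "Great lens, very sharp and fast." and "Not good, the autofocus is slow." but falls over on sarcasm, comparisons and domain-specific vocabulary, which is where the challenge of getting it right comes in.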

Regarding market risk, marketing new technology is a challenge for any new
technology, and any given startup will have the same questions that must be
asked and answered. The application of your idea may be marketable or it may
not be, that has little bearing on the viability of AI/NLP. The best you can
do is make sure the company has done due diligence and has a plan, hopefully
with some market research to back it up.

------
JamieEi
I'd start by trying to validate your business model.

1) You need to find out who the paying customers are in this space and what
features they really want. It seems very possible that average consumers would
have no interest in your service but that power users, marketers or some other
segment might. Once you know where the interest is you can make better feature
decisions. For example, you might find that reporting, not AI, is what drives
marketing sales.

2) Fundamental improvements in AI are really hard. Look at the results of the
first Netflix prize. Lots of world class teams worked on that problem and the
winning solution only produced something like a 10% improvement in the
results. If that's the only edge your business has over the competition I
doubt consumers would even notice. On the other hand, AI has matured to the
point where it's pretty easy to produce good results. I'd rather bet my
business on predictable, good results and treat it as a pleasant surprise if
we make a fundamental breakthrough.

Good luck!

------
aufreak3
"We are worried that we would hit a dead-end and not be able to build what we
set out to build."

Listen attentively to your intuition. Your "worry" can stop you faster than
any statistical or third opinion about whether AI is a hard problem. Doesn't
mean you should stop worrying and continue. Spend time imagining your ideal
solution in great detail --- touch, smell and taste it if you can. Judge based
on how you feel about that end point.

I heard someone say "AI is what hasn't been done yet." You don't need to
prove your worthiness to anyone or any community by "solving a hard problem".
Instead, as chasingsparks put it, "do what fascinates you".

------
noelwelsh
Don't worry about the technology risk. People already do this, so it isn't
impossible. They probably don't do it well, but you only have to do better
than random to provide benefit.

Market risk is more of a concern to me. If you're presenting results directly
to users they'll probably care greatly about the quality of your algorithm. If
you're just using aggregated data to drive, for example, marketing campaigns
then the noise will be washed out given enough data.
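
The aggregation point can be illustrated with a small simulation. It is a sketch under strong assumptions: the classifier's errors are symmetric and independent of the true label, and its accuracy is known. Both function names are hypothetical. Under those assumptions the observed positive rate is `a*p + (1-a)*(1-p)` for accuracy `a` and true rate `p`, which can be inverted to recover `p` even when individual labels are often wrong.

```python
import random


def simulate_observed_rate(true_positive_rate, accuracy, n, seed=0):
    """Fraction of n reviews labelled positive by a classifier with the given accuracy."""
    rng = random.Random(seed)
    labelled_positive = 0
    for _ in range(n):
        truly_positive = rng.random() < true_positive_rate
        correct = rng.random() < accuracy
        # A correct label matches the truth; an incorrect one flips it.
        if truly_positive == correct:
            labelled_positive += 1
    return labelled_positive / n


def corrected_rate(observed_rate, accuracy):
    """Invert observed = a*p + (1-a)*(1-p) to recover p (requires a != 0.5)."""
    return (observed_rate - (1 - accuracy)) / (2 * accuracy - 1)
```

For example, with a classifier that is right only 60% of the time and 70% of reviews truly positive, the raw observed rate sits near 0.54, and `corrected_rate` brings the aggregate estimate back close to 0.7 once n is large. A single user reading individual labels gets no such benefit.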

------
pguerin
No, it's not futile! Finding ways to get good information fast with the data
explosion is one of the challenges we face in computer science. Anyway, the
good folks at GATE have been building a great set of tools for NLP for several
years. The tools are similar to what you are doing. You are not crazy and we
need more people like you to advance computer science! <http://gate.ac.uk/>

------
fezzl
I was probably careless when I did my research, but I just found a company
that does the exact type of sentiment analysis on Amazon reviews (as I have
independently envisioned):

<http://techcrunch.com/2008/06/30/pluribo-is-cliffsnotes-for-amazon-reviews/>

Not sure how to react to this but sigh.

------
sdrinf
I'd like to address the market risks:

Did you know that large retail sites increasingly employ people to post
merited positive, and no-merit negative reviews to funnel consumers into a
buying decision? And, by economy of scale, they do this in volume - usually
massively overwhelming genuine on-site reviews. Working out the impact is left
as an exercise for the reader.

------
ffhix
Hmm. You should probably look at <http://adaptivesemantics.com/>. I just
heard their founder present at SXSW, who was introduced as a "machine learning
guru". They do machine learning for sentiment analysis.

------
simon_
Certainly they're not futile to pursue in limited domains.

There's a paper I'm trying to find for you about analyzing affect in news
articles for the purpose of trading stocks - I think they got things working
fairly well.

------
arethuza
You need a catchy application that lets people understand the benefits of this
technique. Why not use it to rank, in real time, the things celebrities on
Twitter are talking about... or something like that.

------
zackattack
This sounds like a really neat product and you sound really smart. I would
love to chat with you because I have some (hopefully) unique ideas about how
to tackle this problem, and would love to share. My email is
zackster@gmåil.com

