
Ambitious Artificial Intelligence Project Operating In Near-Secrecy For 30 Years - velodrome
http://www.businessinsider.com/cycorp-ai-2014-7
======
SilasX
Doug Lenat is, to me, the most confusing case of someone in AI. He programmed
a general problem solver (Eurisko) to use (meta^n)-heuristics that solved a
major strategy game, coming up with a creative plan no human had thought of, and
yielding insights into the field of heuristics and the "Representation Language
Language".

... and then "went dark", "officially" working on this tedious, brittle
attempt to compile common sense into a graph, which yielded nothing in
practical application.

It's like we're in a movie and we're about to get a big reveal that he's
really been using Eurisko to solve major untouchable problems.

~~~
kolev
Since when do people call him "Doug"? :) Eurisko had some amazing
accomplishments, but some questioned their authenticity, saying Douglas and his
students helped it a bit. Anyway, Douglas Lenat is my all-time idol, and I hope
this prolonged "stealth mode" was the intermission before something great. Isn't
Freebase.com his project as well?

~~~
candeira
> Since when do people call him "Doug"?

One datapoint:
[http://archive.wired.com/wired/archive/2.04/cyc-o.html](http://archive.wired.com/wired/archive/2.04/cyc-o.html)

So for more than 20 years.

~~~
kolev
When I was reading about his work, it was the 80s, early 90s. :)

~~~
kolev
I'm not sure why I got downvoted, but, really, back in those times, maybe out
of great respect, he was "Douglas". :)

------
fiatmoney
If you read "Why AM and Eurisko Appear To Work" [1] and the precursor papers,
it becomes clear that they relied heavily on humans annotating what counted as
an "interesting" concept, both for training and for extraction of concepts to
publicize. Not necessarily a bad thing (every supervised algorithm needs
supervision), but it was a little hype-y ("my robot has independently
discovered natural numbers!"). They also ran into major tractability problems
as the heuristics got more involved.

Under the hood it was basically genetic algorithms over a meta-object protocol
to extract well-scoring arrangements from ontologies, which is damn
interesting, but combinatorial complexity bites you every time.
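For a sense of what "genetic algorithms over ontologies" can look like in miniature, here is a toy sketch. This is not Eurisko's actual machinery: the concept names, the hand-assigned "interestingness" scores, and the scoring function are all invented for illustration.

```python
import random

random.seed(0)  # deterministic for the example

# Toy "ontology": concepts with hand-assigned interestingness scores.
CONCEPTS = {
    "prime": 0.9, "even": 0.4, "square": 0.6, "divisor": 0.5,
    "successor": 0.3, "palindrome": 0.7, "factorial": 0.8,
}

def score(arrangement):
    # A "well-scoring arrangement" here is just a subset of concepts whose
    # combined interestingness is high, with a size penalty -- a crude
    # stand-in for Eurisko's far richer heuristics.
    return sum(CONCEPTS[c] for c in arrangement) - 0.2 * len(arrangement) ** 2

def mutate(arrangement):
    # Toggle one randomly chosen concept in or out of the arrangement.
    child = set(arrangement)
    child.symmetric_difference_update({random.choice(list(CONCEPTS))})
    return frozenset(child)

def evolve(generations=50, pop_size=8):
    pop = [frozenset(random.sample(list(CONCEPTS), 2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        survivors = pop[: pop_size // 2]     # keep the top half
        pop = survivors + [mutate(s) for s in survivors]
    return max(pop, key=score)

best = evolve()
print(sorted(best), round(score(best), 2))
```

Even this toy searches a space of 2^7 subsets; with thousands of concepts, and structured arrangements instead of flat subsets, the combinatorial blowup arrives quickly.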

Kenneth Haase published a couple of papers dealing with some of the issues in
more depth.

[1] [http://eksl.isi.edu/files/library/Lenat_Brown-1984-why-AM-and-EURISKO-work.pdf](http://eksl.isi.edu/files/library/Lenat_Brown-1984-why-AM-and-EURISKO-work.pdf)

------
brianstorms
Ha, near secrecy. Sheesh, Lenat is a well-known figure in the AI world and Cyc
is a very well-known project that has been written about for years. I don't
care if the BusinessInsider author is what, 28 years old, he should do some
decent reporting and not use such a stupid headline.

~~~
waterlesscloud
As soon as I saw the title, I knew who and what it would be about. :-)

------
tbenst
I tested OpenCYC three years ago while working for a startup that did semantic
tagging & recommendation of text-based content.

Essentially, we would take something like:

Gov. Rick Perry has said he will no longer wear cowboy boots, which some
believe is part of an attempt to soften his gunslinging image as he considers
another run for president.

And map it to something more machine-readable:

    
    
      "Gov." -> http://dbpedia.org/page/State_(polity) (10% confidence)[1]
      "Rick Perry" -> http://dbpedia.org/page/Rick_Perry (95% confidence)
      "Rick Perry" -> http://sw.opencyc.org/concept/Mx4rM7N6iOeUSpGar4HaqXF3zg (95% confidence)
      "Cowboy boots" -> http://dbpedia.org/page/Cowboy_boot (75% confidence)[2]
      "President" -> http://dbpedia.org/page/President (90% confidence)[3]
    

Let's dive into what you get from OpenCyc vs DBpedia (ontology sourced from
Wikipedia).

DBpedia: Tons of machine-readable information like Party, Alma Mater,
birthday, spouse, etc. Extensive categorical links (dcterms:subject) like
category:United_States_presidential_candidates,_2012.

OpenCyc: Knows he's a politician affiliated with "Republican Party,"
"Democratic Party," "The Republican Party," and "The Democratic Party"

I highlight only one example, but this is all over the place. OpenCyc has
duplicate terms where DBpedia/Wikipedia does not. OpenCyc has far less
information. OpenCyc has more incorrect information.

This is inevitable when you consider the two approaches. Wikipedia has tens of
thousands of people making connections and updating the resource, whereas
OpenCyc relies more on scripts. OpenCyc, and quite possibly Cyc, has already
been made obsolete by Wikipedia.

[1] Note that the confidence was estimated from heuristics I wrote based on
how the ontologies were put together

[2] OpenCyc typically does not have mappings for plurals, while Wikipedia has
a very convenient redirect system for string mappings

[3] OpenCyc matches President with 60 concepts. Too much noise to do anything
with
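As a rough illustration of the mapping step described above, here is a minimal gazetteer-style linker. The lookup table, confidence scores, and matching logic are simplified stand-ins (only the Rick Perry, cowboy boot, and President URIs come from the example; a real system would derive the table and confidences from the DBpedia/OpenCyc dumps, as the footnotes describe).

```python
# Surface string -> list of (URI, confidence) candidates.
GAZETTEER = {
    "rick perry": [
        ("http://dbpedia.org/page/Rick_Perry", 0.95),
        ("http://sw.opencyc.org/concept/Mx4rM7N6iOeUSpGar4HaqXF3zg", 0.95),
    ],
    "cowboy boots": [("http://dbpedia.org/page/Cowboy_boot", 0.75)],
    "president": [("http://dbpedia.org/page/President", 0.90)],
}

def link_entities(text, min_confidence=0.5):
    """Scan text for known surface strings; return (surface, uri, conf) hits."""
    hits = []
    lowered = text.lower()
    for surface, candidates in GAZETTEER.items():
        if surface in lowered:
            for uri, conf in candidates:
                if conf >= min_confidence:
                    hits.append((surface, uri, conf))
    return hits

sentence = ("Gov. Rick Perry has said he will no longer wear cowboy boots "
            "as he considers another run for president.")
for surface, uri, conf in link_entities(sentence):
    print(f"{surface!r} -> {uri} ({conf:.0%})")
```

A production linker would of course need tokenization, disambiguation between candidate URIs, and plural/redirect handling (the Wikipedia redirect system mentioned in footnote [2]).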

~~~
jfdixon
I am not surprised by your results. However, the system is more robust than
your experience would indicate.

OpenCyc is a subset of ResearchCyc, which itself is a subset of (Full)Cyc.
OpenCyc is primarily used for mapping between ontologies. It contains 239k
concepts from ResearchCyc, but only the basic rules for definitional
relationships between them. These relationships include part/whole,
disjointness, etc.

You mention DBpedia as being superior for your purpose, but I would counter
that the two are complementary. There is a mapping between DBpedia and OpenCyc
within the Linking Open Data cloud. In fact, it was one of the first
ontologies contributed to the W3C's LOD initiative[1][2].

The concepts in OpenCyc are _rigorously_ organized from most general (e.g.
Thing) to more specific (e.g. board game). Each concept may have specific
_instances_ (e.g. Yahtzee, Trivial Pursuit, Scrabble, etc.) These primitives
all live within a custom Lisp, where they may be reasoned over. DBpedia's
structure arises naturally from user activity. It is organized primarily by
Wikipedia's category system and includes individual pages.

Unlike Wikipedia, the Cyc project does not aim to contain every instance of a
concept. The relationships between concepts are what matter. Once one knows
that something belongs to a given Cyc concept, one can leverage the system's
knowledge to reason about it.
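A toy sketch of that kind of taxonomic reasoning follows. The concept names echo the comment's examples, but the links between them are simplified guesses for illustration, not actual Cyc content, and the code mimics only the shape of Cyc's `isa`/`genls` predicates.

```python
# "Generalization" links: concept -> more general parent concepts,
# analogous to Cyc's genls relation.
GENLS = {
    "BoardGame": {"Game"},
    "Game": {"Activity"},
    "Activity": {"Thing"},
}
# Instance membership, analogous to Cyc's isa relation.
ISA = {"Yahtzee": "BoardGame", "Scrabble": "BoardGame"}

def all_generalizations(concept):
    """Transitive closure over GENLS: every concept above this one."""
    seen, frontier = set(), {concept}
    while frontier:
        c = frontier.pop()
        for parent in GENLS.get(c, ()):
            if parent not in seen:
                seen.add(parent)
                frontier.add(parent)
    return seen

def concepts_of(instance):
    """Every concept an instance belongs to, via isa + transitive genls."""
    direct = ISA[instance]
    return {direct} | all_generalizations(direct)

print(concepts_of("Yahtzee"))  # BoardGame, Game, Activity, Thing
```

The point mirrors the comment: once Yahtzee is asserted to be a BoardGame, everything known about Game, Activity, and Thing applies to it without any per-instance data.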

OpenCyc's reasoning capability is limited by a lack of assertions (facts and
rules) -- ResearchCyc's is not. ResearchCyc contains over 5 million assertions
not present in OpenCyc. (Things like: water is wet, a dog is a mammal, mammals
have hair, etc.) It also contains Natural Language tools not present in
OpenCyc: parsers, taggers and more. With these tools, one can go from natural
language to a formal logic representation. Or, given a formal representation
generate natural language. These capabilities exist today in real world
applications[3][4].

[1] [http://lod-cloud.net](http://lod-cloud.net)

[2] [http://lod-cloud.net/versions/2007-10-08/lod-cloud.png](http://lod-cloud.net/versions/2007-10-08/lod-cloud.png)

[3] [http://videolectures.net/coinplanetdataschool2011_witbrock_cyc/](http://videolectures.net/coinplanetdataschool2011_witbrock_cyc/)

[4] [http://videolectures.net/coinactivess2010_witbrock_lkc/](http://videolectures.net/coinactivess2010_witbrock_lkc/)

------
kevin_thibedeau
No it hasn't. Cyc has been widely reported on outside of academic circles for
decades. It last got a lot of buzz when Watson was preparing to debut on
Jeopardy.

------
cwhy
Low profile? Really? Or still in the winter of the old AI? Cyc seems more
hardcoded than "taught", despite what the article states.

~~~
gnudon
Agreed. Cyc is far from low profile. It was in Omni magazine back in the day -
and it's a common example of (problematic) GOFAI approaches.

~~~
zenbowman
Yeah, it is hardly low profile. It was a very well funded project that promised
the moon and failed to deliver despite absorbing a massive amount of DoD
money. A lot of AI researchers at the lab I worked at weren't big fans, as
there was a sense that Cyc was a shining example of how money spent on
symbolic AI was just wasted.

------
TrainedMonkey
Examining the Company News section of the Cyc website is way more interesting
than the article: [http://www.cyc.com/media-coverage](http://www.cyc.com/media-coverage)

And here is how that page looked in 2003:
[http://web.archive.org/web/20031204204518/http://www.cyc.com/news.html](http://web.archive.org/web/20031204204518/http://www.cyc.com/news.html)

Quite a few of the publications are missing from the most recent one. It is
also worth noting that the news page link was broken from 2003 until recently.

------
Alupis
30 years in the cooler? I would be very afraid of finally revealing 30 years
of hard work, only to find out it's irrelevant now... or that it went in the
wrong direction.

Does not seem to be a very good way to go about revolutionizing any
technology...

------
pitchups
Funny story: when I first started using Twitter a few years back, I had
tweeted about some AI news. Pretty soon after that, I had a new follower that
was trying to have a conversation with me. It was retweeting stuff randomly
and sending some weird replies to me; it all seemed quite odd. It turned out
that it was @cyc_ai, the Twitter handle of the Cyc AI system, presumably
trying hard to emulate a person, but failing, unfortunately!

------
jnord
When I read this story the first thing that sprung to mind was Ted Nelson's
Project Xanadu, described by Wired magazine as "the longest-running vaporware
story in the history of the computer industry".

As any entrepreneur will tell you, it is never a good idea to have long-
running projects with few public deliverables. Projects like this need new
blood, new ideas and continuous user validation in order to remain relevant.

~~~
jfdixon
Xanadu is an unfair comparison. Cyc is deployed in real world
applications[1][2].

[1] [http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/](http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/)

[2] [http://videolectures.net/coinplanetdataschool2011_witbrock_cyc/](http://videolectures.net/coinplanetdataschool2011_witbrock_cyc/)

------
mark_l_watson
The free-to-use OpenCyc is fairly nice; I have been playing with it
(occasionally) for years. My writeup of using the OWL/RDF dump:
[https://markwatson.com/blog/2013-08/opencyc-owl-stardog.html](https://markwatson.com/blog/2013-08/opencyc-owl-stardog.html)
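A real workflow loads the full OWL dump into a triple store such as Stardog, but as a tiny stdlib-only sketch of what the export's triples look like, one can pull `rdfs:subClassOf` pairs out of a toy RDF/XML fragment. The class URIs below are invented stand-ins, not actual OpenCyc identifiers.

```python
import xml.etree.ElementTree as ET

# A toy RDF/XML fragment shaped like an OWL export.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/Dog">
    <rdfs:subClassOf rdf:resource="http://example.org/Mammal"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/Mammal">
    <rdfs:subClassOf rdf:resource="http://example.org/Animal"/>
  </owl:Class>
</rdf:RDF>"""

# ElementTree stores namespaced tags/attributes as "{uri}localname".
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

def subclass_pairs(xml_text):
    """Extract (subclass, superclass) URI pairs from an OWL/RDF document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for cls in root.iter(OWL + "Class"):
        sub = cls.get(RDF + "about")
        for link in cls.findall(RDFS + "subClassOf"):
            pairs.append((sub, link.get(RDF + "resource")))
    return pairs

for sub, sup in subclass_pairs(RDF_XML):
    print(sub, "subClassOf", sup)
```

For anything beyond this toy scale you would hand the file to a proper RDF library or a SPARQL endpoint rather than walking the XML by hand.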

------
chatmasta
> "It's not done by any means, but it's useful."

Is it? How do we know? You've been working on a product "in stealth" for 30
years, shown practically nobody outside your company, and provided us with
nothing more than a vaporous description of its capabilities (in
Businessinsider, no less). I'm left wondering not only how seriously I should
be taking this claim, but also exactly what the claim even is.

Artificial intelligence is _hard_. I did an internship at Numenta [1], where
Jeff Hawkins is approaching AI from this same biological-first model. He
hypothesized how a small subset of the brain works, constructed a model of it,
and admirably hired dozens of engineers to build it. The guys behind the
Numenta software are some of the best engineers I've met, with combined
centuries of experience, and it's taken them almost _ten years_ to get the
software to its current, extremely primitive state. Right now, the model is
implemented, and you can use the API to apply it to specific applications
(predictive analytics, anomaly detection). But we are a _long_ way off from
the capability of applying it to "general input."

The fact is, AI is not going to impress anyone until it can handle a class of
inputs as general as what a human can handle. And think
about the inputs that humans receive. The five senses are just the beginning.
They're the _concrete_ inputs we receive from the world. In addition to them,
there's a near infinite class of more abstract inputs that build on top of
them. As humans, we interpret social cues, hormonal feedback, the emotions our
body often inexplicably generates, and dozens of other "second-order" inputs
that are derived from our fundamental sensory ones, but seem equally
fundamental to us.

Ok, sure. Maybe you can model the brain, and maybe it can respond to inputs
from its sensory environment. But those inputs are the absolute lowest
building block of the totality of input into our brain. We have an entire
subconscious dedicated to processing those fundamental sensory inputs, and
generating derivative ones for our consciousness to process. The difficulty in
AI, in 2014, is modeling that subconscious. How do we go from basic
environmental input to its infinitely more complex derivatives? How do we
build a subconscious that our current AI can interpret as an input of its
own?

We are a long way off from this capability. Nobody is going to figure it out
on their own, and neither is a company of a few dozen. To me, it seems
incredibly wasteful to spend 30 years working on this product without
revealing any of the journey. After all this time, what do you have to show
for it? Basically nothing. Your AI can apparently complete tasks of similar
complexity to state-of-the-art systems developed in less than a decade.

I want to see models, and I want the community to discuss them, problem solve
with them, and build on them. Enough of this closed source, proprietary,
snail's pace AI development.

Software is constrained by computing limitations. Fundamentally, there can be
no computational model that even approaches the complexity of a brain. At
least not today. So why not forget about trying to build Skynet, and work with
the community to further the field together?

[1] www.numenta.org

~~~
Houshalter
There is so much hostility in AGI research. Everyone always arguing that
everyone else is doing it wrong and that their approach is the One True Path.
Progress is slow. When it happens it's debated if it's actually progress, or
another step towards a dead end.

The only real feedback is failure. One camp goes a long time without producing
an AGI. That's considered "proof" their approach will never work and that the
New Idea will.

I'm just going to make a fictional analogy. It doesn't prove anything, but I
hope it illustrates my frustration:

"Look at these idiots trying to build artificial flight by putting together
large piles of feathers. Can't they see feathers are just pieces of a much
more complicated puzzle?

And over there another group is dropping objects off buildings. Trying to find
things that can stay in the air the longest. Even if they find something, it
will never actually fly, just fall slightly slower.

There are people inflating balloons with gasses. Have you ever seen a bird
made of balloons? It's absurd. Maybe eventually they will get something that
hovers off the ground. But it will never fly gracefully like an eagle. It will
never fool pigeons into thinking it's another pigeon.

One group has managed to make really strong artificial wings. They have gotten
them stronger and faster than even real birds. But they need an entire steam
engine to power them. And they don't even use feathers.

My group is going to build an artificial bird from the ground up, based on
real biology. Teams of mechanics are working on designing tiny joints and
artificial muscles. We have a prototype that can flutter across a room. At
this rate we may only be decades away from true artificial flight."

~~~
mtdewcmu
That's probably where flight research would currently be if a few of those
approaches hadn't achieved stunning success. Those successes would have
silenced any groups that were holding out for artificial birds.

------
yottah
If you look up reviews by ex-employees, you will find they are extremely
negative: lots of workplace bullying, insufficient resources, and a vicious
upper management that doesn't respect its workers.

------
savdi
Human-level artificial intelligence is a distant dream, but virtual
assistants like Google Now
([http://www.google.com/landing/now/](http://www.google.com/landing/now/)) and
Braina
([http://www.brainasoft.com/braina/](http://www.brainasoft.com/braina/)) are
already doing a decent job. Maybe we have to wait 10 more years to achieve
what is called AI-Complete or Strong AI.

------
chazu
This is pretty funny to me, because Cyc is actually a pretty high-profile
project. Cursory searches on the topic of knowledge engineering return links
to material on Cyc, not to mention that older AI textbooks mention Cyc as an
example of a nascent large-scale knowledge engineering project.

------
sirseal
Hmmm....smells like BS.

