Open Cyc (open source common sense)

elblanco · on Nov 17, 2010

The Cyc project, while fascinating, is basically a serious attempt at a general purpose semantic graph. It's been in the works since the early 80's.

And you know what? It should act as a warning to anybody thinking seriously about building large-scale semantic systems like the semantic web.

Cyc is pretty much useless (or at least the utility it provides is not exactly setting the world on fire). Until somebody can figure out how to employ an already very mature semantic graph like Cyc and do something(s) truly useful with it, don't waste anybody's time advocating the semantic web.

More importantly, Cyc is a highly controlled semantic graph, with all of its information carefully curated. Imagine how terrible the results of semantic inference engines will be in the wild west of the web!

gibsonf1 · on Nov 17, 2010

I looked at CYC a few years ago and found the underlying epistemology of the effort flawed in such a way that the result is a mass of contradictions in the project. Their solution to that was to try to zoom in on tiny areas to avoid the contradictions with other areas - but in the factual world, there really aren't these contradictions. The other clear sign of serious trouble was their explanation that adding definitions to the system was taking longer and longer - if anything it should accelerate if their underlying structure was good.

I think the CYC project should be used only as an example of spending a huge amount of time and money on a flawed structure as opposed to a condemnation of semantic approaches.

stcredzero · on Nov 17, 2010

but in the factual world, there really aren't these contradictions.

The internal mental furnishings of most human beings are seemingly full of these, however.

gibsonf1 · on Nov 18, 2010

This is why a factual digital ontology corresponding to the real world would be so valuable as opposed to us fallible human beings.

stcredzero · on Nov 18, 2010

Really? So why do so many human beings get paid tens of thousands of dollars a year to exercise theirs, while relatively few general-purpose "digital ontologies" are worth paying for?

(The highly contextual ones do pretty well in the market, though.)

elblanco · on Nov 17, 2010

I don't disagree re: Cyc.

I am, however, trying to draw an analogy between Cyc and an open Semantic Web. If Cyc has issues like the ones you describe, these issues will only be magnified ad infinitum in a completely open graph where there are effectively no controls.

gibsonf1 · on Nov 18, 2010

It's a great opportunity for a company to nail the ontology in a scalable way, especially if the solution can use industry-standard triple stores like AllegroGraph. Serious epistemology geeks are needed for the effort, however.

jackfoxy · on Nov 17, 2010

I couldn't agree more. I've been following Cyc off and on for 20 years. It has yet to produce anything that interests me. I have yet to see a good project I can get my hands on that gets anything OUT of Cyc. Last time I checked, after decades of input into Cyc, all the papers on the Cyc sites were still about new ways to put information into Cyc. They started an annual contest a few years ago for Cyc projects. I actually hacked together a not very good proposal to attempt to get value-added information out of Cyc. And the winner? An academic proposal on another method of feeding the Cyc knowledge base.

elblanco · on Nov 17, 2010

It's the Achilles' heel of all general purpose semantic graphs: "it'll work if only we add more stuff to the graph!" Except I'm not convinced that the approach will ever work, no matter how large the graph is.

jimcog · on Nov 18, 2010

(non-computer tech here) Critical points taken, but what about the value to the Cleveland Clinic's cardiology dept.? See article by Clinic's IT semantic web project staff at http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClin.... Isn't that the utility of a deductive reasoning engine - taking disparate data overwhelming to humans - and cranking through it as queried in human language terms? I'm just asking.

goodside · on Nov 17, 2010

Before you try to encapsulate common sense in a collection of obvious facts, you might want to start with a simpler task: Collect a database of elementary-school math problems and their associated answers until the database is so large that it coalesces into a fully functioning CPU.

If this seems like a bad idea, it's because it is.

mjw · on Nov 17, 2010

I'm not convinced by Cyc either, but this argument is a bit off. A system like this would need to define axioms and inference rules. But it wouldn't need to collect upfront a database of every possible inference which might be made from them.

(In mathematics you don't technically need very many axioms or inference rules, although you do need a large body of heuristics and hints if you want a proof system which confines itself to proving inferences that are actually useful in some sense of the word, rather than proving a combinatorial explosion of trivial theorems. Dealing with the body of elementary arithmetic problems, however, wouldn't necessarily be particularly intractable -- last time I checked software proof systems can already deal with proving theorems in areas of mathematics like this and a fair bit further.)
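
To illustrate the point above that a few axioms plus inference rules can stand in for a stored database of answers, here is a minimal sketch in Python. It assumes a Peano-style encoding of natural numbers and two rewrite rules for addition; the names `ZERO`, `succ`, `add`, `from_int` and `to_int` are hypothetical, chosen for this example, and have nothing to do with how Cyc or any real proof system is built.

```python
# Natural numbers encoded as nested tuples (an assumption for illustration):
# ZERO is the empty tuple, and succ(n) wraps n in a one-element tuple.
ZERO = ()

def succ(n):
    """Successor of n."""
    return (n,)

def add(a, b):
    """Addition derived from two rules instead of a lookup table."""
    # Rule 1: a + 0 = a
    if b == ZERO:
        return a
    # Rule 2: a + succ(b') = succ(a + b')
    return succ(add(a, b[0]))

def from_int(k):
    """Build the encoded number for a non-negative Python int."""
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    """Count the nesting depth to recover a Python int."""
    k = 0
    while n != ZERO:
        n = n[0]
        k += 1
    return k

print(to_int(add(from_int(2), from_int(3))))  # prints 5
```

No sum is stored anywhere; each answer is derived on demand from the two rules, which is the distinction being drawn between rules and a database of results.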

iwr · on Nov 17, 2010

Maths is a self-consistent and closed system. From that point of view, the nature of axioms is irrelevant. You can have an unlimited number of axiomatic systems, each consistent and each explanatory in its own way. What is a theorem in one system can be treated as an axiom in the next as long as the system itself leads to no contradictions.

Given that human knowledge encompasses more than one axiomatic system, it would be foolish to endow a system designed to replicate human knowledge with an immutable set of axioms.

Please watch this presentation of Richard Feynman on the nature of maths and physics: http://www.feynmanphysicslectures.com/relation-of-mathematic...

jerf · on Nov 17, 2010

Your argument has a fundamental flaw; the information content of a large set of elementary math problems is minimal, the information content of obvious facts is much, much higher. A few hundred bytes in any decent programming language can generate the first on demand in a fraction of a second, if you can do that for the second set you deserve every accolade you will receive. It is not at all obvious that any property of the first will apply to the second.

I think Cyc is a joke, too, but your argument doesn't hold.
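
The claim above that a short program can generate elementary math problems on demand can be sketched roughly as follows. This is only an illustration: the function name `arithmetic_problems`, its parameters, and the choice of three operators are all assumptions made for this example.

```python
import random

def arithmetic_problems(count, seed=0, max_operand=12):
    """Yield (question, answer) pairs for simple arithmetic problems."""
    rng = random.Random(seed)  # seeded so the same problems can be regenerated
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for _ in range(count):
        a = rng.randint(0, max_operand)
        b = rng.randint(0, max_operand)
        symbol = rng.choice(sorted(ops))
        yield f"{a} {symbol} {b} = ?", ops[symbol](a, b)

for question, answer in arithmetic_problems(3):
    print(question, answer)
```

The whole "database" fits in a handful of lines plus a seed, which is what makes its information content small compared with a collection of everyday facts.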

goodside · on Nov 18, 2010

The information content of elementary school math problems is quite high. They contain lots of names of hypothetical multi-ethnic children, statements about adding and taking away apples, amounts of money needed to purchase carpeting for rooms of particular dimensions, etc. Excluding this information because you know in advance it's useless for building a CPU is cheating, since it uses your knowledge of what a functioning adding machine looks like.

thesz · on Nov 17, 2010

You wouldn't believe it, but principal Cyc developer Douglas Lenat started almost exactly with that. Eurisko, a predecessor to Cyc, was a deduction machine that operated with minimal human intervention over minimal sets of number theory axioms. It rediscovered many theorems on its own.

Douglas Lenat: http://en.wikipedia.org/wiki/Douglas_Lenat Eurisko: http://en.wikipedia.org/wiki/Eurisko

giardini · on Nov 17, 2010

Lenat never released the source code to Eurisko. There has always been a question as to how much he did and how much the program did.

Ideas that Lenat promoted with Eurisko and Cyc have made him successful by most criteria. But it would be better had he published a proven finished product, including its innards. My feeling is that, in publishing, one should show the code or it never happened.

some1else · on Nov 17, 2010

Do I understand this correctly? OpenCYC now offers the entire ontology AND assertions for free? As I understand, until recently, OpenCYC only featured a subset of the entire CYC database, just the ontology without assertions. Can someone please explain this in greater detail?

SlipperySlope · on Nov 17, 2010

Easy. The OpenCyc ontology includes all the simple terms (Cyc constants) and composed terms (Cyc non-atomic reified terms) that full Cyc does. OpenCyc includes that subset of full Cyc assertions which match the definitional statement types of OWL (the Web Ontology Language). E.g. the class cyc:Action is a subclass of cyc:Event, which is a subclass of owl:Thing. What is left out of OpenCyc, but included in ResearchCyc (full Cyc), are the ordinary assertions that relate concepts, e.g. cyc:DougLenat cyc:president cyc:Cycorp.

OpenCyc's purpose is to spread the use of the Cyc ontology in the Semantic web and Linked Data.
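
The split described above between definitional statements and ordinary assertions can be sketched as plain subject-predicate-object triples. This is a rough illustration only: the triples are typed in by hand from the examples in this comment rather than loaded from an OpenCyc download, `rdfs:subClassOf` is assumed as the spelling of the "subclass of" predicate, and the `superclasses` helper is hypothetical.

```python
# Assumed predicate name for "is a subclass of".
SUBCLASS_OF = "rdfs:subClassOf"

# Definitional statements: the kind said to be included in OpenCyc.
definitional = [
    ("cyc:Action", SUBCLASS_OF, "cyc:Event"),
    ("cyc:Event", SUBCLASS_OF, "owl:Thing"),
]

# An ordinary assertion relating concepts: the kind said to be in ResearchCyc only.
ordinary = [
    ("cyc:DougLenat", "cyc:president", "cyc:Cycorp"),
]

def superclasses(term, triples):
    """Return every class reachable from `term` by following subclass links."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == SUBCLASS_OF and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found

print(superclasses("cyc:Action", definitional))  # prints ['cyc:Event', 'owl:Thing']
```

With only the definitional triples you can walk the class hierarchy, but a question such as who is president of Cycorp needs the ordinary assertions as well.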

gojomo · on Nov 17, 2010

When they say, "Release 2.0 of OpenCyc includes... English strings (a canonical one and alternatives) corresponding to each concept term, to assist with search and display", do they mean strings like the sentence at their example URL...

http://sw.opencyc.org/2009/04/07/concept/en/Game

"A specialization of DevisedStructuredActivity. Each instance of Game is an abstraction of a game that is played according to a semi-rigid set of rules. Each instance includes both the rules (see GameRulesFn) and a specification of any physical components required for play (instances of GameBoard, Ball, etc.). Neither Events of playing games (instances of PlayingAGame) nor any physical components required for play (e.g. GameBoards) are instances of Game."

Or do they mean something else, and if so, how would I find that something else (what formats/names) in their downloads?

iLikeCyc · on Nov 22, 2010

Both. If you View Source of the web page you cite, you'll see there's really RDF behind it. The long string you quote is a Comment that's associated with each concept. But there are also Label and PrettyString clauses that provide a primary string associated with this concept ("game") and other strings (here, only "games"). See http://sw.opencyc.org/2009/04/07/concept/en/Dog for some better examples of alternative strings.
