Ontology Is Overrated: Categories, Links, and Tags (2005) (shirky.com)
217 points by zdw 29 days ago | 89 comments



I enjoyed this piece, mostly because I disagree so strongly with it :-). Clay takes the topic of Ontology, contextualizes it into an impractical domain, and then calls it overrated. Cars are overrated too, because they can't cross oceans, right?

When done correctly, ontology combined with knowledge graphs can make an impractical problem much more practical. Consider for the moment the problem of 'fake news'. What characterizes 'fake news' is that it has facts or concepts that are not in alignment with "true" (authoritative) facts. And without getting into the debate of what is and isn't true, you can take a document, a knowledge base populated with trusted facts, and a topic ontology and score that document for alignment with the existing data set. Whether you use that for checking Wikipedia articles or flagging fake news, it's a very powerful combination.


> Consider for the moment the problem of 'fake news' ... without getting into the debate of what is and isn't true

This comment reminds me of another Shirky essay [1]. He says:

> [This is] the pattern for descriptions of the Semantic Web. First, take some well-known problem. Next, misconstrue it so that the hard part is made to seem trivial and the trivial part hard. Finally, congratulate yourself for solving the trivial part.

The difficult part is figuring out exactly what is and what isn't true. If we had a perfect map of reality it would be easy to detect fake news. In fact we wouldn't need any news at all or journalists or anyone else except the people who build the ontology.

[1] http://www.shirky.com/writings/herecomeseverybody/semantic_s...


I don't find this dismissal persuasive. If I understand your point, you are arguing that without perfect truth, no partial solution has any value.

If that is indeed your argument, it is remarkably similar to the argument "Perfect security is impossible, so this system is worthless." I think it is demonstrably obvious that there is a related parameter in both cases, which is whether the cost of overcoming the system is greater than the gain from overcoming it.

As an example of how that argument fails in practice, consider textbooks. When I was working on this problem full time, the organization had a relationship with a major publisher of textbooks. Consider a system which can construct an ontology and knowledge graph based on the contents of a textbook. It may not be 'true' in the Aristotelian sense, but it is an effective tool for evaluating whether or not a paper or essay correlates with that ontology and knowledge graph. In such a system there is value in evaluating the extent to which the knowledge in the textbook has been communicated to the person writing the essay.

Being able to automate the analysis has other benefits as well, which extend beyond flagging documents that disagree with a previously established canon.


That is now one of my favorite essays. Thanks for posting.

It does amuse me how often this framing of problems happens, both in the annoying cases of armchair quarterbacks and in the cases of folks who take on problems previously thought of as too difficult.


I know this might be fuel to the fire on here, but I think Tanushree's work at Georgia Tech in her paper "A Parsimonious Language Model of Social Media Credibility Across Disparate Events" [1] is a good stab at the "fake news" problem with a lens of "credibility".

As a disclaimer, we tried to get this model off the ground in the YC summer 2017 batch but were rejected after the phone interview. I did not assist in the research, only in an effort to publicize it.

I whole-heartedly believe this is our best bet at combatting "fake news" online for the moment. Taking the lessons of "The Most Human Human: What Artificial Intelligence Teaches Us About Being Alive" [2], we might tease out credibility instead of fact or fiction to help guide our biased/opinionated labels.

From the abstract:

"In other words, the language used by millions of people on Twitter has considerable information about an event’s credibility. For example, hedge words and positive emotion words are associated with lower credibility"

[1]: https://credcount.com/whitepaper.pdf

[2]: https://www.amazon.com/Most-Human-Artificial-Intelligence-Te...


The resilience of any fake news detection heuristic is impossible to prove until it is consequential enough to incentivise subversion.


The problem is that "Fake News" is a misleading category in itself. The primary issue with "fake news" isn't political, it's advertising. Fake news is made mostly to lure you into clicking or sharing, which in turn increases the advertising revenue. A lot of it is saying outrageous conspiratorial stuff, but in essence it's no different from tabloid news, which is often also "fake news".

On top of that what might be considered fake news by some doesn't have to be by others because it might have utility.

And last but not least, combatting fake news would mean being able to cover all contextual interpretations of a given statement, which we simply don't have access to.

So from my perspective you can't battle fake news in any meaningful ways which won't end up doing more harm than good as "fake news" is not really fake in some objectively defined way.


> what might be considered fake news by some doesn't have to be by others because it might have utility

Do you have an example of this in mind?


I think our best bet at combatting "fake news" is to repeal censorship and copyright, and to formalize belief systems (this may seem like arduous work, and it is, but I believe machine learning will soon be able to formalize texts, science, physics, math to the point where we will have package managers for belief systems).

Ten years ago, I would have said that this would not happen within 20 years, but now I am much less certain, due to a couple of profound breakthroughs: word embeddings (GloVe etc.) with their famous linear analogies, and sense disambiguation.

These have not only been achieved in the emergent "look how amazingly bizarre" sense, but have over the last few years gotten a much more rigorous footing: notably the papers:

[1] Linear Algebraic Structure of Word Senses, with Applications to Polysemy (RECOMMEND reading both first and last version, as some things are more lucid in the first version, and some in the last. The main takeaway is that the word vectors are linear combinations of a low number of sense vectors, and these sense vectors can be recovered by sparse coding)

[2] Towards Understanding Linear Word Analogies (which finally explains the "king" + "woman" - "man" = "queen" phenomenon)
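A toy illustration of that linear analogy, assuming tiny hand-made 2-d vectors (one axis for royalty, one for gender) rather than real learned embeddings. The vectors and the `analogy` helper are hypothetical and only show the arithmetic:

```python
import math

# Hypothetical 2-d "embeddings": (royalty, gender). Real embeddings are
# learned from text and have hundreds of dimensions; these values are made up.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "man", "woman"))
```

With these made-up vectors the nearest remaining word is "queen"; with real embeddings the result depends on the training corpus.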

Now consider physics for example, running text and equations, then consider

* an "overscored n": is this 1) normal vector or 2) an antineutron or 3) ... ?

* a Greek beta symbol: is this 1) v/c as in relativity 2) 1/kT as in thermal or statistical physics or 3) ... ?

* an e symbol: is this the exponent function, or the electron or ... ?

With today's tools it should already be possible to make this: type a LaTeX formula, or perhaps even a picture of a formula, then mouse over an ambiguous symbol to see the sense of the symbol, or alternatively generate a Wikipedia-style list: "where ... is ..."

It should be possible to do this for math as well, and pretty soon software should be able to formalize quasi-formal mathematics books, papers, derivations, proofs ... into MetaMath, and the beautiful part? MetaMath contains the ground truth upon verifying so it can keep training the formalizer while formalizing without supervision! The verifier will become the supervisor!

Of course MetaMath will not tell you if your postulates and axioms are correct, but it will check if your reasoning is sound. The axioms or postulates will be what characterize the belief systems, and any ramifications or inconsistencies will be linked to in the "belief package" managers, so that whenever you encounter a formalized claim you can reply with how the claim does not follow in your own belief system. It will hence become netiquette to share not just vague claims but formalized ones, and not just formalized ones but ones accompanied by a reference to the claim's proof and to the belief systems in which it holds. It will become much easier to identify exactly where people who disagree with you differ in assumptions or axioms, and hence easier to collaborate on identifying inconsistencies in the endless stream of forked belief systems, to avoid being filtered by the reader's or platform's access to gossiped inconsistency proofs...

A corollary is that the semi-crackpot semi-true realpolitik interpretations from the lower regions of society will force the higher regions of society to either formalize explicit injustice and oppression, or else democratically cede the definition of the boundary between just and unjust to the populace, so that we can have a formalized egalitarian society...

Just like the printing press democratized communication, machine learning and formalization will democratize interpretation.


I'm not sure about that. In Shannon's Information Theory, stuff that aligns with the existing knowledge base contains 0 bits of information. Things that contain new information cannot line up with the existing knowledge base; "news" is exactly the word for such things.
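As a small worked example of that point: the self-information of an event with probability p is -log2(p) bits, so a statement the reader already takes as certain carries 0 bits. A sketch, where the probabilities are made-up illustrations:

```python
import math


def surprisal_bits(probability):
    """Shannon self-information, in bits, of an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / probability)


# A claim the knowledge base already holds as certain (p = 1) is not news.
print(surprisal_bits(1.0))       # 0.0
# A coin flip carries one bit; a 1-in-1024 event carries ten.
print(surprisal_bits(0.5))       # 1.0
print(surprisal_bits(1 / 1024))  # 10.0
```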

You might be able to say that fake news doesn't align with true facts that aren't part of the facts being reported as news. That is, you can tell that "the color of the sky unexpectedly changed from green to yellow yesterday" is fake news, but you can't use true facts to tell that "the color of the sky unexpectedly changed from blue to yellow yesterday" is fake news. (I mean, yes, you might still use true facts about atmospheric composition, light scattering, and the spectrum of solar output, but you can't use "the sky is blue".)


I think that is exactly right. So let's say you consider a news story as a 'stream' of bits. You can demodulate those bits using some formula, and there are enough extra bits in the message that you can correct for errors in the channel.

If you get 5 streams of bits and four of them correlate with some epsilon number of new bits, and the fifth has 10x epsilon new bits, one of two things is likely true. Either the odd stream out is corrupted or the stream has changed.

Apply that to news stories where the ontology (O) and knowledge graph (KG) are combined to "error correct" the content of the story. "Error bits" would be defined as relationships that don't match what the ontology says can be true, or facts that differ from those already in the knowledge base. You can flag the odd news story as lacking alignment with the existing KG&O; further, you can quantify that lack of alignment fairly precisely.

So if the news story has 0 bits of information it correlates very strongly with other existing stories. If it has many bits of information it contrasts with other existing stories. At some number of stories on the same topic you can begin to reason about which are more likely 'true' and which are more likely 'false.'
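A minimal sketch of that "error bit" counting, assuming the story has already been reduced to (subject, relation, object) triples, which is itself the hard part. The ontology, entity types, knowledge graph, and the rule that each relation allows one object per subject are all hypothetical:

```python
# Hypothetical ontology: the (subject type, object type) each relation allows.
ONTOLOGY = {
    "has_capital": ("Country", "City"),
    "born_in": ("Person", "City"),
}
ENTITY_TYPES = {
    "France": "Country", "Paris": "City", "Warsaw": "City", "London": "City",
    "Marie Curie": "Person", "Ada Lovelace": "Person",
}
# Hypothetical trusted facts. Assumption: one object per (subject, relation).
KNOWLEDGE_GRAPH = {
    ("France", "has_capital", "Paris"),
    ("Marie Curie", "born_in", "Warsaw"),
}


def classify(triple):
    """Label one extracted triple relative to the ontology and knowledge graph."""
    subject, relation, obj = triple
    types = (ENTITY_TYPES.get(subject), ENTITY_TYPES.get(obj))
    if ONTOLOGY.get(relation) != types:
        return "ontology_violation"  # the relation cannot hold between these types
    if triple in KNOWLEDGE_GRAPH:
        return "known"               # 0 new bits
    if any(s == subject and r == relation for s, r, _ in KNOWLEDGE_GRAPH):
        return "contradiction"       # differs from a trusted fact
    return "new"                     # unverifiable either way


def alignment_report(triples):
    """Count each label and report the share of 'error bits' in the story."""
    counts = {"known": 0, "new": 0, "contradiction": 0, "ontology_violation": 0}
    for triple in triples:
        counts[classify(triple)] += 1
    errors = counts["contradiction"] + counts["ontology_violation"]
    counts["error_rate"] = errors / len(triples) if triples else 0.0
    return counts


story = [
    ("France", "has_capital", "Paris"),
    ("Marie Curie", "born_in", "Paris"),
    ("Paris", "born_in", "France"),
    ("Ada Lovelace", "born_in", "London"),
]
print(alignment_report(story))
```

Note that the "new" bucket is exactly where the objections elsewhere in this thread apply: the score says nothing about whether a genuinely new claim is true.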


Pure contrast isn't enough to hint at falsity. I could tell you a technical detail about what I did today and it would have a 0% overlap with any article on CNN.com. In addition, correlation with your pre-existing beliefs is not sufficient evidence for truth, because today you are wrong about at least a few things. If you look at any given point in history and assume you aren't too special, the first thing you'll realize is that one if not many of the things that are repeated daily in our society are almost certainly false if not outright lies: after all, that claim is true about every other historical period and people group, so why shouldn't it be true about us today?

To suggest a specific example, imagine a person living inside WWII-era Germany correlating a propaganda news article with every other news article they have read, all of which are also propaganda. Correlation-based news epistemology isn't even based on a prevailing social consensus, it's based on the consensus of broadcasters.


That is, whoever controls the corpus of "established true facts" controls the determination of whether something is or is not fake news. Which probably gets us to competing corpuses (corpi?) of "established true facts", run by those who push competing meta-narratives. Which looks rather similar to the current situation...


Exactly so. The operator of the machine gets to control what it is told is 'true' and 'false'. That said, assuming a nominally good actor and an inability to have people process every article, this process provides a means to quantitatively analyze for coherence with your controlled corpus.


"Quantitatively analysing the coherence of a controlled corpus," is not exactly a solution to fake news, although it's probably useful for a few other things.


It certainly helps a lot due to the fact that today you can't be expected to practically sift over all the available information yourself because there's simply too much of it.

If we had a system like this that worked, we could at least leverage it to be able to quantify the coherence of a much larger set of inputs and with many controlled corpora at once.


I agree with this too, and certainly articles about different topics will have high contrast and low correlation. IBM patented[1] some work we did at Blekko to figure out the topic of a document from its components and that is an essential first step.

It also fails when there are few streams to compare, as when a new story first breaks. Or when there is coordination of actors.

My suggestion is not that it is fail-safe or perfect, only that it is a useful application of an ontology, which disputes the original author's thesis that they are overrated.

[1] https://patents.google.com/patent/US10157178B2/


>It also fails when there are few streams to compare, as when a new story first breaks. Or when there is coordination of actors.

All of these things are hallmarks of fake news: it is claimed to be new (hence "news"), is not published very widely (you can't compare an aquariusrising.biz article on Green Space Aliens to a CNN article on the same, because there isn't a CNN article on that), and it's usually coordinated between copycat opportunists. Even worse, none of these things are distinguishable from a grassroots report unless you know the true news a priori. This better not be a mechanism to ban everything that isn't on CNN.com.


Ontologies are problematic because they are almost invariably hierarchical and binary.

If your ontology is probabilistic and multi-dimensional, it's much more likely to represent truth.

I actually think ontologies are a trap for smart people. For people who like to systematize, they have an allure. Ontologies give you the illusion of putting the chaotic world into order. Even phrases like "true facts" - in reality, facts are hard to come by and change over time, and the closer you look at them the more woolly they get - predicates are far too abstract and simplistic, and when you dig in, the fractal messiness overwhelms.


I agree with you, and I would like to point out that if the 16th century Catholic church had used the same process, they would have concluded that Copernicus was 'fake news'.

It's a hard problem.


Can you point me to some examples where it’s been done correctly?

I’ve followed this topic off and on for 20 years, and I’ve yet to see a big success. I’ve only ever seen this cycle:

1) Someone discovers ontology and RDF,

2) They implement some obvious examples,

3) They get hyped and tell everyone,

4) They try to generalize their examples to real-world problems,

5) They get silent and disappear.


Ontology is a reduction of reality and shows the limits of our minds.

Fake news is a great example of how sloppy categorization can end up being. Fake news is mostly about creating clickbaity stories which will make people share and thus increase advertising revenue, yet today most people think about it as some sort of political propaganda.

Reality is more nuanced than our categories allow us to explore and often times they are limiting our ability to think about the world around us.

Clay's point seems to be that the value of information is not in categorizing it but in allowing for new relationships to be made, which is exactly what ontology can often hinder.


Putting aside the difficulty of building that database of true facts, and of parsing unknown facts out of English text, that solution feels like it's inviting an arms race. I can write a bunch of uncontroversial "true" facts into an article containing one or two false facts.


Diluting lies with verifiable but only tangentially related truths is a well-established tactic for fooling humans as well.

Someone with an agenda in the symbolic vs. statistical AI question could take this parallel as an example of how close ontology-based AI approaches are to the way humans think. And then someone with the opposite agenda would point out that the example is all hypothetical.


>And without getting into the debate of what is and isn't true

Nice try, but this is missing the whole point. This is the crux of the matter and the problem with the category of "fake news", and precisely why this ontology is garbage.


I must absolutely disagree with you: in my opinion the problem of fake news stems from compartmentalisation of news sources (lack of their diversity), rather than lack of actual facts in the news sources.

The fact that you can read about the same topic in several (more often opposing) ways is already a reason to red-flag the piece in the first place. This isn’t readily available through any of the current newspapers. Everyone has an opinion.

I would happily pay for an aggregator that simply ejects facts (stated facts, not truths) from the stories. Then combines that with similar other pieces.


> I would happily pay for an aggregator that simply ejects facts

There is already the AP news ticker, or Reuters. Bloomberg has the same thing for financial matters. If you read only that, you will die of boredom.


Well, truth is boring, isn't it?

Would you otherwise prefer to read Russian propaganda speaking of German authorities policing Germans of Russian heritage? Because that's exactly what they are currently doing.

0 facts, amazing opinionated narrative in multi-paragraph form. Pictures on top of that. But that definitely sounds more fun. Moreover - that sort of narrative is way closer to the general sentiment of the working classes who are generally predisposed towards learning about how "the Man" is taking advantage of them in the first place.


It can't be so boring that it leads to complete apathy. Unless you are a computer, there's a limit to the boredom you can take.


Would you rather be bored or manipulated by the foreign propaganda machine?


The AP news ticker has its own bias and narrative, as does Reuters. There is no truth in any text, only narrative. So long as you are reading trying to find truth in a single text you will fall into error.


If they contain only facts, there is no narrative.

The only narrative is the selection of what facts to report. Without that you get lost in the deluge and get equally uninformed.


I've been working on a tag-centric document manager for the last 90 days:

https://getpolarized.io/

It allows you to tag and manage everything you're reading and the primary management metaphor is the 'tag'.

It's a central repository for all your reading. PDFs, web content, and it supports annotation (highlights, comments, etc).

There are a bunch of clear wins here.

Related tags are awesome. You can build a bunch of really nice derived metadata just by managing your tags on the documents.

For example, I've been reading a lot about growth hacking as I'm trying to get Polar to 100k users. (We're officially 10% of the way there today!)

Since I have a bunch of documents named 'growth' Polar recommends startups, venturecapital, and marketing as secondary tags.

Also, since Anki supports tags (and Polar supports Anki sync) these systems are compatible.

Polar tags are also Twitter tags. While we haven't enabled this feature yet your annotations could be exported to Twitter and the systems are compatible.

We're still getting users that want hierarchy though. I'm not sure what I want to do here.

I'm considering building out algorithmic hierarchy.

For example, if you have two documents with the first tagged 'tech, microsoft' and the second tagged 'tech, linux' we can infer that tech is the parent of 'microsoft' and 'linux'

... and it would still support tags.
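One way such an inferred hierarchy could be sketched, under an assumed rule that is not Polar's actual implementation: tag P is treated as broader than tag C when every document tagged C is also tagged P and P appears on strictly more documents. The `infer_broader_tags` name is hypothetical. Tags are compared as sets, so the order in which they were typed does not matter:

```python
from collections import defaultdict


def infer_broader_tags(documents):
    """Map each tag to the tags that cover a strict superset of its documents."""
    docs_by_tag = defaultdict(set)
    for doc_id, tags in enumerate(documents):
        for tag in tags:
            docs_by_tag[tag].add(doc_id)

    broader = {}
    for child, child_docs in docs_by_tag.items():
        # "<" on sets is the proper-subset test.
        parents = sorted(
            parent for parent, parent_docs in docs_by_tag.items()
            if child_docs < parent_docs
        )
        if parents:
            broader[child] = parents
    return broader


documents = [
    {"tech", "microsoft"},
    {"linux", "tech"},  # same result regardless of tag order
]
print(infer_broader_tags(documents))
```

With deeper nesting this returns every broader tag, not only the immediate parent, and a small library can produce accidental "parents", so a real version would likely need a minimum document count.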

The problem I have now is that folder hierarchy is still being requested by our users, but I'm reluctant to implement it yet.


Tags do not fit spatial memory. Tags are utterly worthless in any spatial context.

You're at the mercy of a search engine's implementation and index foibles for finding things through search queries.

A hierarchy maps onto our spatial memory, which we use to figure out where we've been and go back to it.

There's a lot to be said for maintaining spatial metaphors in information space.

Another little detail here is that spatial organisation (in meatspace) is a shared environment. A single 'shared' ontology is useful even if it's nonsense, because it's still a shared frame of reference in which navigational directions can be shared.


If you use modern NLP you can embed tags into vectors and then just use dot product between query and targets to rank them. There's even a fast library for that in Python called 'annoy'. It is capable of relating any query to any tag.
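A brute-force sketch of that ranking, assuming made-up 3-d tag vectors in place of real embeddings. A library such as annoy would replace the exhaustive scan with an approximate nearest-neighbour index; the names below are hypothetical:

```python
# Hypothetical 3-d tag embeddings; a real system would get these from a
# trained model and would have far more dimensions.
TAG_VECTORS = {
    "linux": (0.9, 0.1, 0.0),
    "microsoft": (0.8, 0.3, 0.1),
    "marketing": (0.1, 0.9, 0.2),
    "cooking": (0.0, 0.1, 0.9),
}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_tags(query_vector, top_k=2):
    """Return the top_k tags ordered by dot product with the query vector."""
    ranked = sorted(
        TAG_VECTORS,
        key=lambda tag: dot(TAG_VECTORS[tag], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# Pretend this is the embedding of the query "operating systems".
query = (1.0, 0.2, 0.0)
print(rank_tags(query))
```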


Did you read the last part of my comment?

I clarified that there's a workaround to give hierarchy.


> For example, if you have two documents with the first tagged 'tech, microsoft' and the second tagged 'tech, linux' we can infer that tech is the parent of 'microsoft' and 'linux'

How do you account for the availability effect [0]?

To use your example, suppose I tag one document 'tech, microsoft' and a second 'linux, tech' because of the availability effect: it's been a long time since I tagged documents, so I add the 'tech' tag as an afterthought rather than as the first tag.

Better to keep things simple.

0: https://en.wikipedia.org/wiki/Availability_heuristic


> We're still getting users that want hierarchy though. I'm not sure what I want to do here.

Have you considered hierarchical tags?

As an example:

https://dmitryfrank.com/projects/geekmarks/article

I use it for my bookmarks, and I find it is a suitable balance between having some kind of hierarchy, while allowing for resources to be classified under several categories.


All I see is a giant ad for ED pills.

This is what I see: https://archive.fo/rZchC over time: https://archive.fo/http://www.shirky.com/writings/ontology_o...


Lack of HTTPS most likely, meaning an infected ISP or the like doing a MITM attack.

Interestingly the wayback-machine snapshot is not affected: https://web.archive.org/web/20190123000016/http://www.shirky...

After digging a little, it seems like the redirection is happening server side (302, 0 bytes, i.e. not JS) and it's controlled from a malicious site called google-static(.)com. If anybody wants to investigate this further, it's this one:

    http://google-static.com/search/?niche=unknown&page=http%3A%2F%2Fshirky.com%2Fwritings%2Fontology_overrated.html&h=shirky.com&p=%2Fwritings%2Fontology_overrated.html&ref=https%3A%2F%2Fwww.google.co.uk%2F


Really common malvertising attack, server is probably compromised: https://www.welivesecurity.com/2013/05/07/linuxcdorked-malwa...


There may be malicious software on your computer.


This redirected on both GP's and archive.fo's servers... I can't replicate it, but I think we can rule out malicious software on GP's computer.


That makes sense.


I see the same on Android, and there's no malicious software there.


That is rather odd, on Chrome/Win10 I see the original article, and looking at the page source it seems rather innocuous (no scripts or ad banners etc).

What browser and OS are you using?


I don't think it's malware, at least not on the client. I see the same thing, and I'm using w3m on Linux.


Same here. Definitely not malware, on an enterprise-managed system. :) DNS hijack perhaps?


Me too. Sibling comment says it might be due to malware. Should I be concerned?


I saw that earlier this morning as well. It's fixed now.


Ontology's key strength is that it allows for discovery. Yes it's a form of curation, with all the bias that implies. Sometimes I want a domain expert to curate hierarchical lists for me to explore.

Both Ontology and Tagging have their place. The net is vast and infinite, sometimes it's nice to have a map.


This article is 14 years old, and dates back to a time when the debates about the "semantic web" had a lot more wind in their sails. (I made similar arguments around that time...)


I had a whole semester on the semantic web in 2008. Having grown up with the internet as it was, I thought it was useless nonsense


I enjoyed reading this article, but was perplexed at the statement "Cities are real. They are real, physical facts. Countries are social fictions." City names and boundaries are just as, if not more, socially constructed as those of countries.


This is true of names and borders, but the cities themselves remain where they are even when names and borders change, so they're more persistent than countries.


That might seem logical on the surface, but dig just a little deeper and the flaws become readily apparent.

For example if you're looking for the population of New York between 1850 and 1950, you would see a potentially dramatic shift of the population around the turn of the century, which would be misleading if you didn't already know the caveat about the consolidation of the city in 1898.

However that shouldn't discourage the tagging of data by city. It just means that a city, as metadata, is no more or less functional than country, as the grandparent comment suggests.


Well, yes, few things are entirely immutable. Cities do grow and (rarely) shrink or disappear. But this is rare enough that it's useful to hang other data off of their names.

For example, take the locations used in time zone databases, where an official way to name a location is something like America/Los_Angeles. The assumption is that major cities don't get split across time zones and users know which cities they are nearby that they share a time zone with. Country names aren't used.

A lat/long pair would be more precise, but sometimes precision isn't needed.


> You need a hierarchy to manage a file system.

I find this a strange assertion after several paragraphs which vociferously assert hierarchies are redundant in the virtual space. In fact, OS/400 is a 'single-level' store using objects, so it has even been shown in practice that you do not need filesystems to build a useful OS, if I am not mistaken.

Am I mistaken?


I also found this assertion surprising. I often wish filesystems had good native support for tagging and not just a strict hierarchy. Even hardlinks / symlinks seem like a poorly supported afterthought, especially on Windows, and generally not worth the effort to use them for user-level data organisation. But I'm not sure how a purely tag based would work in practice, so thanks for the pointer to OS/400.

Another example is the Windows Start Menu. The hierarchy is basically useless and a huge pain to use. I gave up on maintaining a sensible hierarchy long ago. Windows 10's search is much better, except that when it fails you're forced to go back to the hierarchy. One thing missing here is that you can't navigate from a search result to the hierarchy.


"when people were offered search and categorization side by side fewer and fewer people used categorization to find things"

Two thoughts pop into my mind which guide me gently to the conclusion we need to be careful here.

  1) every configuration is a set of trade-offs
  2) any parameter or paradigm pushed to extreme produces perversion

On point number 1: categorization has the benefit of discoverability when you lack the information necessary to form the search query in the first place.

On point number 2: search forces you to generate the query and direct your attention towards a given subject. However, some of the most important discoveries you have ever made in your life were when other people held your attention in conversation and kindly directed it in a direction it wasn't going but that arrived at an illuminating destination.

We need all implementations to some degree to get all possible benefits. I suspect this is why suggestion algorithms have flourished, but in doing so they create echo chambers, because they are a caricature of yourself bending your own ear.

If only algorithms could suggest what you needed to hear and not what you wanted to hear.

Some books I have bought at my local bookstore I never would have found on Amazon. It's amazing what being able to direct our own attention and have the world organized in exactly that way has done for us, and equally amazing are the paths we are led down when we let other people direct our attention.


As a solution developer, I find the most bang for my buck is when I produce solutions for people who come to me with a problem rather than a solution.

Imagine if Google made you answer the question "Why?" five times before it showed you results. I wonder if a well trained network would produce such accurate results we would no longer need to see a list of results, and could simply hit the lucky button.

But good luck finding a user who's willing to answer "Why?" five times rather than brute-forcing a search engine into their "echo chamber," as you accurately described it.


Well, there's no disagreement from me that what you describe is how it plays out.

My caution is that we have found the equivalent of cheap, fast, microwavable meals and we are now consuming them almost exclusively and in 15 ~ 20 years it's going to become painfully obvious what deleterious side-effects arise as a result of that. We can't see it now, so we don't feel the need not to do it, and that simply is what it is, but in time it will act as a control signal that nudges us to modify our behavior.

This is both a great and terrifying time to be alive. The 20th century was full of some pretty hefty social experiments, and the thing with those was that they were obvious and in your face. The 21st century versions are much less obvious and reveal themselves very slowly.


This article has made the rounds so many times that surely Ontology must be underrated for now.


I think this highlights exactly the dangers of social media and communities centered around fixated niches.

I've been thinking lately of how to remedy that, and this post came up at just the right time to get the mind juices flowing. At a high level I see a decentralized community that doesn't rely on forums/tribes/subreddits and incentivizes tagging like old school del.icio.us. This enables a higher chance of cross pollination and less groupthink. I could really see it working well with some flavor of ethereum/ipfs/incentives.

Something something, everything is in cycles? Hah.


Eh, people will resist levers of control. Even levers with good intentions.

People seek a payoff. Even if you have some interconnect, that forces patterns of use, why are they crawling through your pipes?


The same is so true for programming, in particular OOP. In my experience, code is much easier to change when a loosely coupled set of interfaces is built as opposed to a hierarchy of classes.


Indeed. This is also why I think OOP is missing something to scale better in terms of flexibility and domain size. The ideal code structure for a particular need is rather arbitrary. OOP "prefers" things in a hierarchy, both in terms of inheritance, and in terms of an object being less powerful than a class (in most languages). You can force or use OOP outside of these, but it's unnatural in my opinion.

For example, why is it not easy to add an "onClick" event handler method to a particular button in a Java GUI? One has to use lambdas instead. Attempts to reorganize the GUI engine to remedy this create side effects. Java's OOP model is simply not powerful enough to do GUIs naturally. Conway's Law seems to apply to domain structures as well. Fit matters.

Something like Lisp allows one to create behavior-containing "structures" that are customized to a domain or need, but can get too confusing. Lisp can easily get too hard to read and follow for most mortals, which is why it never became mainstream despite being around for 60-ish years.

We need something in between traditional OOP and "full meta".


You can do OOP without inheritance. I do it all the time, including in Java. It works extremely well alongside dependency injection. You still get all of the other parts of OOP, like encapsulation and polymorphism.

The only real trick is to use delegation patterns instead of inheritance, i.e. dispatch requests to class members rather than super/subclasses.

It doesn't feel awkward or unnatural.
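
A minimal sketch of the delegation idea, written in Python for brevity rather than Java. The names `Logger`, `ListLogger`, and `OrderService` are hypothetical and chosen only for illustration: the service holds its collaborator as a member, receives it through the constructor, and forwards requests to it instead of inheriting from it.

```python
from typing import Protocol


class Logger(Protocol):
    """The interface the service depends on (hypothetical name)."""

    def log(self, message: str) -> None: ...


class ListLogger:
    """Collects messages in memory; stands in for any injected dependency."""

    def __init__(self) -> None:
        self.messages = []

    def log(self, message: str) -> None:
        self.messages.append(message)


class OrderService:
    """Has a logger rather than being one: calls are forwarded to a member."""

    def __init__(self, logger: Logger) -> None:
        self._logger = logger  # dependency injected through the constructor

    def place_order(self, item: str) -> str:
        self._logger.log(f"order placed: {item}")  # delegation, not super()
        return item.upper()


logger = ListLogger()
service = OrderService(logger)
result = service.place_order("widget")
```

Any object with a matching `log` method can be swapped in, so polymorphism and encapsulation are kept without a class hierarchy.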


It does feel awkward to me. I tend to view it more like RDBMS design and queries where the structures fit the domain and one can query for subsets or dispatching using something as flexible and simple as SQL (relatively speaking).

Then you could easily answer:

A) Show me all the GUI event handling code for all buttons.

B) Show me all the event handling code for Form X.

C) Show me all the event handling code for all buttons within forms having at least one drop-down list.
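
To make those three questions concrete, here is a rough sketch using Python's built-in `sqlite3` module. The schema (`forms`, `widgets`, `handlers`), the column names, and the sample rows are all assumptions invented for illustration, not an existing GUI toolkit's model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forms    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE widgets  (id INTEGER PRIMARY KEY, form_id INTEGER, kind TEXT, name TEXT);
CREATE TABLE handlers (id INTEGER PRIMARY KEY, widget_id INTEGER, event TEXT, code_ref TEXT);

INSERT INTO forms VALUES (1, 'X'), (2, 'Y');
INSERT INTO widgets VALUES
    (1, 1, 'button',   'save'),
    (2, 1, 'dropdown', 'country'),
    (3, 2, 'button',   'cancel'),
    (4, 1, 'textbox',  'name');
INSERT INTO handlers VALUES
    (1, 1, 'click',  'on_save_click'),
    (2, 3, 'click',  'on_cancel_click'),
    (3, 2, 'change', 'on_country_change');
""")


def refs(sql, params=()):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]


# A) Event handling code for all buttons.
all_buttons = refs("""
    SELECT h.code_ref FROM handlers h
    JOIN widgets w ON w.id = h.widget_id
    WHERE w.kind = 'button' ORDER BY h.code_ref""")

# B) Event handling code for Form X.
form_x = refs("""
    SELECT h.code_ref FROM handlers h
    JOIN widgets w ON w.id = h.widget_id
    JOIN forms f ON f.id = w.form_id
    WHERE f.name = ? ORDER BY h.code_ref""", ("X",))

# C) Handlers for buttons in forms that contain at least one drop-down list.
buttons_near_dropdown = refs("""
    SELECT h.code_ref FROM handlers h
    JOIN widgets w ON w.id = h.widget_id
    WHERE w.kind = 'button'
      AND EXISTS (SELECT 1 FROM widgets d
                  WHERE d.form_id = w.form_id AND d.kind = 'dropdown')
    ORDER BY h.code_ref""")
```

Each question becomes one declarative query over the same three tables, with no object graph to walk.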

The same goes for executing code (running business logic), not just code inspection. Then one doesn't have to crawl object graphs as often. Early databases did graph crawling, but then Dr. Codd showed a better way, and graph-DB's mostly withered. I'm waiting for the Dr. Codd of code and behavior dispatching management to come save the day.

I'd like to explore using relational modelling for behavior, not just attributes (data). I used to do such in xBase (dBASE, FoxPro, etc.) because of its dynamic nature and ease of editing tables, including code in tables. I thought that direction was the future, it looked bright to me. Then OOP came along and killed the seeming birth of table-oriented programming.

(It might seem like a security risk to put code in tables, but a file system and a database are not different enough to say that putting code in a database is "bad" while putting it in a file system is "good". I'd like to blur the distinction between a file system and a database, but maybe that's a different topic. Plus, one doesn't necessarily have to put code "in" the tables, just references to it in tables. One just needs a system/convention to track and map each to the other.)
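
One possible convention for the "references in tables" variant, sketched in Python with the built-in `sqlite3` module. Everything here is hypothetical: the `bindings` table, the `REGISTRY` dictionary, and the handler names are invented for illustration. The table stores only a name; a registry maps that name back to real code.

```python
import sqlite3

REGISTRY = {}  # maps a code reference (a plain string) to a callable


def register(func):
    """Record a function under its own name so table rows can refer to it."""
    REGISTRY[func.__name__] = func
    return func


@register
def confirm(widget):
    return f"confirmed via {widget}"


@register
def audit(widget):
    return f"audited {widget}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bindings (widget TEXT, event TEXT, code_ref TEXT)")
conn.executemany(
    "INSERT INTO bindings VALUES (?, ?, ?)",
    [
        ("save_button", "click", "confirm"),
        ("save_button", "click", "audit"),
        ("delete_button", "click", "confirm"),  # same behavior, second widget
    ],
)


def dispatch(widget, event):
    """Look up every handler bound to (widget, event) and run each in row order."""
    rows = conn.execute(
        "SELECT code_ref FROM bindings WHERE widget = ? AND event = ? ORDER BY rowid",
        (widget, event),
    ).fetchall()
    return [REGISTRY[code_ref](widget) for (code_ref,) in rows]


results = dispatch("save_button", "click")
```

Because the table holds names rather than source text, only functions that were explicitly registered can ever be run from it.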


I think most ML-derived languages strike the right balance - functional languages that incorporate enough OOP principles to get things done in the real world.

I increasingly advocate F#, and many people recommend Elixir for these purposes too (though I'm not familiar with it myself).


I've yet to see useful things F# can do in my domain that couldn't be easily done other ways. Most of the examples are in esoteric domains or "lab toys". I'm not saying it's not useful, it's just that it hasn't been presented in a way clearly usable in my domain, which is "typical" business applications. For example, many of the issues related to multi-tasking and parallelism are typically farmed off to the RDBMS and A.C.I.D. transaction handling. It may be a boring domain to many, but it's important for commerce.


> For example, why is it not easy to add an "onClick" event handler method to a particular button in a Java GUI?

Lack of pointers. An event reference is a double pointer: to the method and to the object that is "self" or "this" for the execution of the event handler. Since they refused to have pointers in the language, they devised that convoluted solution.
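
For comparison, here is how the "double pointer" looks in a language that exposes it directly. This is a Python sketch, not Java, and the `Button` and `Editor` classes are hypothetical: a bound method carries both the function and the object it will run against, so it can be dropped into a handler slot as a single value.

```python
class Button:
    def __init__(self, label):
        self.label = label
        self.on_click = None  # slot that accepts any callable

    def click(self):
        """Invoke the handler if one is attached."""
        return self.on_click() if self.on_click else None


class Editor:
    def __init__(self, name):
        self.name = name

    def save(self):
        return f"{self.name} saved"


editor = Editor("notes.txt")
button = Button("Save")
button.on_click = editor.save  # one value holding both "method" and "self"

handler = button.on_click
target_object = handler.__self__    # the object half of the pair
target_function = handler.__func__  # the method half of the pair
outcome = button.click()
```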


When I try to design an "ideal" GUI system on paper, I start with a relational model. Relational is quite adept at dealing with "pointers" in terms of foreign keys. A given snippet of behavior (such as an event-handler method) can be easily associated with multiple things; something OOP has a hard time with. Thus, we don't have to go back to direct address pointers to solve this issue, but rather learn from relational modelling, as I hinted at in a nearby reply.

(One problem with traditional RDBMS and GUI's is that different widgets need a variety of similar and often overlapping attributes {columns}. It's not realistic to have a table dedicated to each widget "type". Thus, I propose using "dynamic relational" instead, in which column existence is optionally situational.)


I think avoiding deep class hierarchies is an accepted best practice nowadays. In academia, people still don't know it (I left in 2017), but it is accepted by everyone I talk to nowadays. I do not use any implementation inheritance anymore, only interface inheritance.


The problems with implementation inheritance are quite a bit deeper than "ontology is overrated", though - even where ontologies are appropriate, it's still bad!

It's "sold" as enabling extensibility and code reuse, but the only feature it adds to good old-fashioned composition is 'open recursion', a.k.a. late binding, which inherently leads to the "fragile base class" problem: the behavior of base-class implementations comes to depend on complex and generally unspecified invariants being preserved in the derived classes. So it's basically never what you actually want!
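
A small sketch of the fragile base class problem described above, in Python, using the well-known "counting collection" example. The class names are hypothetical. The subclass counts correctly only if it knows an unstated detail of the base class, namely that `add_all` is implemented by calling `add`.

```python
class Bag:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        for item in items:
            self.add(item)  # open recursion: resolves to the subclass's add


class CountingBag(Bag):
    """Inheritance version: counts additions, but gets it wrong."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def add(self, item):
        self.count += 1
        super().add(item)

    def add_all(self, items):
        items = list(items)
        self.count += len(items)
        super().add_all(items)  # base calls self.add, so items count twice


class ComposedCountingBag:
    """Composition version: the inner Bag can never call back into us."""

    def __init__(self):
        self._bag = Bag()
        self.count = 0

    def add(self, item):
        self.count += 1
        self._bag.add(item)

    def add_all(self, items):
        items = list(items)
        self.count += len(items)
        self._bag.add_all(items)


inherited = CountingBag()
inherited.add_all(["a", "b", "c"])

composed = ComposedCountingBag()
composed.add_all(["a", "b", "c"])
```

Three items are stored in both cases, but the inherited version reports a count of 6 while the composed one reports 3. Changing `Bag.add_all` to append directly would silently "fix" the subclass, which is exactly the kind of hidden coupling the comment is complaining about.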

Full-blown OOP (including implementation inheritance and polymorphism) doesn't work. It's a terrible idea that only ever became popular because we didn't quite grok its implications.


Interesting, the url picks up as:

"Female Viagra UK, Buy Viagra - Online Pharmacy, Best Offer"

When pasting in a notes app. Couldn't tell if it was an inside joke with regards to the subject matter.


The site is now displaying spam :(


This is a great post.

The problem with the thesis is that just because classical ontology doesn't work for the entire internet, doesn't mean that we can't apply it in many domains.

Formal ontology could be used very well in a variety of areas.

Informal, or less concise, ontology could again work in many areas.

And leave tagging for the rest.


Aren't tagging and "informal ontology" pretty much equivalent?


Well there are surely other informal ways of doing 'ontology'.

The point being - more rigorous methods can definitely be valuable in areas, and I don't think it's a good idea to ignore this fact.

It's a bygone issue because there wasn't a critical mass in place to make it work, but there could be.


Haha, if you don't enjoy this you will enjoy it anyway.


This is one of my favorite articles, which I first read when it was new and still informs how I think about organizing information and the web.


Anyone know what Clay is up to at the moment? Seems to have gone very quiet in the last 5 years.


Reddit is basically a flat ontology for images, if you think about how most people use it.


Cue up the quotes from "Zen and the Art of Motorcycle Maintenance"


[2005] should be added to the title

> This piece is based on two talks I gave in the spring of 2005 -- one at the O'Reilly ETech conference in March, entitled "Ontology Is Overrated", and one at the IMCExpo in April entitled "Folksonomies & Tags: The rise of user-developed classification." The written version is a heavily edited concatenation of those two talks.


Google has this dated as [2014] - June 14 to be exact. Google also shows the exploit-affected site title as mentioned in other comments...

https://www.google.com/search?source=lnt&tbs=cdr:1,cd_min:1/...


2005. Did what he predicted happen? I think not?


Might be good to add a date to the title.



