
Ontology Is Overrated: Categories, Links, and Tags (2005) - zdw
http://www.shirky.com/writings/ontology_overrated.html
======
ChuckMcM
I enjoyed this piece, mostly because I disagree so strongly with it :-). Clay
takes the topic of Ontology, contextualizes it into an impractical domain, and
then calls it overrated. Cars are overrated too, because they can't cross
oceans, right?

When done correctly, ontology combined with knowledge graphs can make an
impractical problem much more practical. Consider for the moment the problem
of 'fake news'. What characterizes 'fake news' is that it has facts or
concepts that are not in alignment with "true" facts (authoritative). And
without getting into the debate of what is and isn't true, you can take a
document, a knowledge base populated with trusted facts, and a topic ontology
and score that document for alignment with the existing data set. Whether you
use that for checking Wikipedia articles or flagging fake news, it's a very
powerful combination.
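
A minimal sketch of what that scoring could look like, assuming the claims have
already been extracted from the document as (subject, predicate, object)
triples and that each (subject, predicate) pair has a single trusted value. The
function names, the toy ontology, and the toy facts are all hypothetical, and a
real system would need far more than a synonym table:

```python
def normalize(term, ontology):
    """Map a surface term to its canonical concept if the ontology knows it."""
    key = term.lower()
    return ontology.get(key, key)


def alignment_score(claims, trusted_facts, ontology):
    """Fraction of checkable claims that agree with the trusted facts.

    A claim is checkable when the knowledge base holds a value for the same
    (subject, predicate) pair. Returns None when nothing is checkable.
    """
    known = {}
    for subj, pred, obj in trusted_facts:
        known[(normalize(subj, ontology), pred)] = normalize(obj, ontology)

    checked = 0
    agreed = 0
    for subj, pred, obj in claims:
        key = (normalize(subj, ontology), pred)
        if key in known:
            checked += 1
            if known[key] == normalize(obj, ontology):
                agreed += 1
    return agreed / checked if checked else None


# Hypothetical topic ontology: surface forms mapped to canonical concepts.
ontology = {"nyc": "new york city", "big apple": "new york city"}

# Hypothetical trusted knowledge base.
trusted_facts = [
    ("New York City", "located_in", "United States"),
    ("Paris", "capital_of", "France"),
]

# Claims assumed to have been extracted from the document under review.
claims = [
    ("NYC", "located_in", "United States"),        # agrees
    ("Paris", "capital_of", "Germany"),            # contradicts
    ("Atlantis", "located_in", "Atlantic Ocean"),  # not checkable
]

score = alignment_score(claims, trusted_facts, ontology)
print(score)  # 0.5
```

The uncheckable claim is left out of the score rather than counted against the
document, which is one design choice among several.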

~~~
cdbattags
I know this might be fuel to the fire on here, but I think Tanushree's work at
Georgia Tech in her paper "A Parsimonious Language Model of Social Media
Credibility Across Disparate Events" [1] is a good stab at the "fake news"
problem with a lens of "credibility".

As a disclaimer, we tried to get this model off the ground in the YC summer
2017 batch but were rejected after the phone interview. I did not assist in
the research, only in the effort to publicize it.

I whole-heartedly believe this is our best bet at combatting "fake news"
online for the moment. Taking the lessons of "The Most Human Human: What
Artificial Intelligence Teaches Us About Being Alive" [2], we might tease out
credibility instead of fact or fiction to help guide our biased/opinionated
labels.

From the abstract:

"In other words, the language used by millions of people on Twitter has
considerable information about an event’s credibility. For example, hedge
words and positive emotion words are associated with lower credibility"

[1]: [https://credcount.com/whitepaper.pdf](https://credcount.com/whitepaper.pdf)

[2]: [https://www.amazon.com/Most-Human-Artificial-Intelligence-Te...](https://www.amazon.com/Most-Human-Artificial-Intelligence-Teaches/dp/0307476707)

~~~
ThomPete
The problem is that "Fake News" is a misleading category in itself. The
primary issue with "fake news" isn't political; it's advertising. Fake news is
made mostly to lure you into clicking or sharing, which in turn increases the
advertising revenue. A lot of it is saying outrageous conspiratorial stuff, but
in essence it's no different from tabloid news, which is often also "fake
news".

On top of that what might be considered fake news by some doesn't have to be
by others because it might have utility.

And last but not least: combatting fake news would mean being able to cover
all contextual interpretations of a given statement, which we simply don't
have access to.

So from my perspective you can't battle fake news in any meaningful way that
won't end up doing more harm than good, as "fake news" is not really fake in
some objectively defined way.

~~~
mac01021
> what might be considered fake news by some doesn't have to be by others
> because it might have utility

Do you have an example of this in mind?

------
burtonator
I've been working on a tag-centric document manager for the last 90 days:

[https://getpolarized.io/](https://getpolarized.io/)

It allows you to tag and manage everything you're reading and the primary
management metaphor is the 'tag'.

It's a central repository for all your reading. PDFs, web content, and it
supports annotation (highlights, comments, etc).

There are a bunch of clear wins here.

Related tags are awesome. You can build a bunch of really nice derived
metadata just by managing your tags on the documents.

For example, I've been reading a lot about growth hacking as I'm trying to get
Polar to 100k users. (We're officially 10% of the way there today!)

Since I have a bunch of documents named 'growth' Polar recommends startups,
venturecapital, and marketing as secondary tags.

Also, since Anki supports tags (and Polar supports Anki sync) these systems
are compatible.

Polar tags are also Twitter tags. While we haven't enabled this feature yet,
your annotations could be exported to Twitter, and the systems are compatible.

We're still getting users that want hierarchy though. I'm not sure what I want
to do here.

I'm considering building out algorithmic hierarchy.

For example, if you have two documents, with the first tagged 'tech, microsoft'
and the second tagged 'tech, linux', we can infer that 'tech' is the parent of
'microsoft' and 'linux'.

... and it would still support tags.
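
A rough sketch of one way that inference could work, not what Polar actually
does: treat tag A as a parent of tag B when every document tagged B is also
tagged A and A covers strictly more documents. The function name and the
document ids are made up for illustration:

```python
from collections import defaultdict


def infer_parents(doc_tags):
    """Return {child_tag: set_of_parent_tags} using a subsumption heuristic."""
    docs_by_tag = defaultdict(set)
    for doc_id, tags in doc_tags.items():
        for tag in tags:
            docs_by_tag[tag].add(doc_id)

    parents = defaultdict(set)
    for child, child_docs in docs_by_tag.items():
        for parent, parent_docs in docs_by_tag.items():
            # A parent must cover all of the child's documents and at least
            # one more (strict subset), otherwise the tags are just synonyms.
            if child != parent and child_docs < parent_docs:
                parents[child].add(parent)
    return dict(parents)


# The two documents from the example above (hypothetical ids).
docs = {
    "doc1": {"tech", "microsoft"},
    "doc2": {"tech", "linux"},
}

hierarchy = infer_parents(docs)
print(hierarchy)
```

With only two documents this yields 'tech' as the parent of both 'microsoft'
and 'linux'. The heuristic is brittle: one 'linux' document that lacks the
'tech' tag would break the link, so a real version would probably want a
tolerance threshold.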

The problem I have now is that folder hierarchy is still being requested by
our users, but I'm reluctant to implement it yet.

~~~
desc
Tags do not fit spatial memory. Tags are utterly worthless in any spatial
context.

You're at the mercy of a search engine's implementation and index foibles for
finding things through search queries.

A hierarchy maps onto our spatial memory, which we use to figure out where
we've been and go back to it.

There's a lot to be said for maintaining spatial metaphors in information
space.

Another little detail here is that spatial organisation (in meatspace) is a
shared environment. A single 'shared' ontology is useful even if it's
nonsense, because it's still a shared frame of reference in which navigational
directions can be shared.

~~~
visarga
If you use modern NLP you can embed tags into vectors and then just use dot
product between query and targets to rank them. There's even a fast library
for that in Python called 'annoy'. It is capable of relating any query to any
tag.
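
To make that concrete, here is a brute-force sketch of the ranking step in
plain Python. It does not use annoy, which would replace the exhaustive scan
with an approximate nearest-neighbour index. The three-dimensional vectors are
invented for illustration; real ones would come from a trained embedding model:

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_tags(query_vec, tag_vecs):
    """Return tag names sorted by dot-product similarity to the query."""
    return sorted(tag_vecs, key=lambda tag: dot(query_vec, tag_vecs[tag]),
                  reverse=True)


# Hypothetical embeddings, hand-written so the example runs on its own.
tag_vectors = {
    "linux": [0.9, 0.1, 0.0],
    "microsoft": [0.8, 0.2, 0.1],
    "marketing": [0.0, 0.1, 0.9],
}

# Stands in for an embedded query such as "operating systems".
query = [1.0, 0.0, 0.0]

ranked = rank_tags(query, tag_vectors)
print(ranked)  # ['linux', 'microsoft', 'marketing']
```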

------
mayneack
All I see is a giant ad for ED pills.

This is what I see: [https://archive.fo/rZchC](https://archive.fo/rZchC) over
time:
[https://archive.fo/http://www.shirky.com/writings/ontology_o...](https://archive.fo/http://www.shirky.com/writings/ontology_overrated.html)

~~~
oftenwrong
There may be malicious software on your computer.

~~~
jszymborski
This redirected on both GP and archive.fo's servers... I can't replicate, but
I think we can rule out malicious software on GP's computer.

~~~
oftenwrong
That makes sense.

------
hnzix
Ontology's key strength is that it allows for discovery. Yes it's a form of
curation, with all the bias that implies. Sometimes I want a domain expert to
curate hierarchical lists for me to explore.

Both Ontology and Tagging have their place. The net is vast and infinite,
sometimes it's nice to have a map.

------
wheels
This article is 14 years old, and dates back to a time when the debates about
the "semantic web" had a lot more wind in their sails. (I made similar
arguments around that time...)

~~~
dekken_
I had a whole semester on the semantic web in 2008. Having grown up with the
internet as it was, I thought it was useless nonsense.

------
thedailymail
I enjoyed reading this article, but was perplexed at the statement "Cities are
real. They are real, physical facts. Countries are social fictions." City
names and boundaries are just as socially constructed as those of countries,
if not more so.

~~~
skybrian
This is true of names and borders, but the cities themselves remain where they
are even when names and borders change, so they're more persistent than
countries.

~~~
cheschire
That might seem logical on the surface, but dig just a little deeper and the
flaws become readily apparent.

For example if you're looking for the population of New York between 1850 and
1950, you would see a potentially dramatic shift of the population around the
turn of the century, which would be misleading if you didn't already know the
caveat about the consolidation of the city in 1898.

However that shouldn't discourage the tagging of data by city. It just means
that a city, as metadata, is no more or less functional than country, as the
grandparent comment suggests.

~~~
skybrian
Well, yes, few things are entirely immutable. Cities do grow and (rarely)
shrink or disappear. But this is rare enough that it's useful to hang other
data off of their names.

For example, take the locations used in time zone databases, where an official
way to name a location is something like America/Los_Angeles. The assumption
is that major cities don't get split across time zones and users know which
cities they are nearby that they share a time zone with. Country names aren't
used.

A lat/long pair would be more precise, but sometimes precision isn't needed.

------
hydrox24
> You need a hierarchy to manage a file system.

I find this a strange assertion after several paragraphs which vociferously
assert hierarchies are redundant in the virtual space. In fact, OS/400 is a
'single-level' store using objects, so it has even been shown in practice that
you do not need filesystems to build a useful OS, if I am not mistaken.

Am I mistaken?

~~~
1wd
I also found this assertion surprising. I often wish filesystems had good
native support for tagging and not just a strict hierarchy. Even hardlinks /
symlinks seem like a poorly supported afterthought, especially on Windows, and
generally not worth the effort to use them for user-level data organisation.
But I'm not sure how a purely tag-based system would work in practice, so
thanks for the pointer to OS/400.

Another example is the Windows Start Menu. The hierarchy is basically useless
and a huge pain to use. I gave up on maintaining a sensible hierarchy
long ago. Windows 10's search is much better, except when it fails you're
forced to go back to the hierarchy. One thing missing here is that you can't
navigate from a search result to the hierarchy.

------
james_s_tayler
"when people were offered search and categorization side by side fewer and
fewer people used categorization to find things"

Two thoughts pop into my mind which guide me gently to the conclusion we need
to be careful here.

      1) every configuration is a set of trade-offs
      2) any parameter or paradigm pushed to extreme produces perversion

On point number 1: categorization has the benefit of discoverability when
you lack the information necessary to form the search query in the first
place.

On point number 2: search forces you to generate the query and direct your
attention towards a given subject. However, some of the most important
discoveries you have ever made in your life were when other people held your
attention in conversation and kindly directed it in a direction it wasn't
going but that arrived at an illuminating destination.

We need all implementations to some degree to get all possible benefits. I
suspect this is why suggestion algorithms have flourished, but in doing so they
create echo chambers, because they are a caricature of yourself bending your
own ear.

If only algorithms could suggest what you needed to hear and not what you
wanted to hear.

Some books I have bought at my local bookstore I never would have found on
Amazon. It's amazing what being able to direct our own attention and have the
world organized in exactly that way has done for us, and the paths we are led
down when we let other people direct our attention are equally amazing.

~~~
cheschire
As a solution developer, I find the most bang for my buck is when I produce
solutions for people who come to me with a problem rather than a solution.

Imagine if Google made you answer the question "Why?" five times before it
showed you results. I wonder if a well-trained network would produce such
accurate results that we would no longer need to see a list of results, and
could simply hit the lucky button.

But good luck finding a user who's willing to answer "Why?" five times rather
than brute-forcing a search engine into their "echo chamber," as you
accurately described it.

~~~
james_s_tayler
Well, there's no disagreement from me that what you describe is how it plays
out.

My caution is that we have found the equivalent of cheap, fast, microwavable
meals and we are now consuming them almost exclusively and in 15 ~ 20 years
it's going to become painfully obvious what deleterious side-effects arise as
a result of that. We can't see it now, so we don't feel the need not to do it,
and that simply is what it is, but in time it will act as a control signal
that nudges us to modify our behavior.

This is both a great and terrifying time to be alive. The 20th century was
full of some pretty hefty social experiments, and the thing with those was
that they were obvious and in your face. The 21st century versions are much
less obvious and reveal themselves very slowly.

------
PaulHoule
This article has made the rounds so many times that surely Ontology must be
underrated for now.

------
dfischer
I think this highlights exactly the dangers of social media and communities
centered around fixated niches.

I've been thinking lately of how to remedy that and this post came up at just
the right time to get the mind juices flowing. At a high level I see a
decentralized community that doesn't rely on forums/tribes/subreddits and
incentivizes tagging like old school del.icio.us. This enables a higher chance
of cross-pollination and less groupthink. I could really see it working well
with some flavor of ethereum/ipfs/incentives.

Something something, everything is in cycles? Hah.

~~~
refusing
Eh, people will resist levers of control. Even levers with good intentions.

People seek a payoff. Even if you have some interconnect, that forces patterns
of use, why are they crawling through your pipes?

------
sharpercoder
The same is so true for programming, in particular OOP. In my experience, code
is much easier to change when a loosely coupled set of interfaces is built as
opposed to a hierarchy of classes.

~~~
tabtab
Indeed. This is also why I think OOP is missing something to scale better in
terms of flexibility and domain size. The ideal code structure for a
particular need is rather arbitrary. OOP "prefers" things in a hierarchy, both
in terms of inheritance, and in terms of an object being less powerful than a
class (in most languages). You can force or use OOP outside of these, but it's
unnatural in my opinion.

For example, why is it not easy to add an "onClick" event handler method to a
particular button in a Java GUI? One has to use lambdas instead. Attempts to
reorganize the GUI engine to remedy this create side effects. Java's OOP model
is simply not powerful enough to do GUIs naturally. Conway's Law seems to apply
to domain structures as well. Fit matters.

Something like Lisp allows one to create behavior-containing "structures" that
are customized to a domain or need, but can get too confusing. Lisp can easily
get too hard to read and follow for most mortals, and is why it never became
mainstream despite being around for 60-ish years.

We need something in between traditional OOP and "full meta".

~~~
twblalock
You can do OOP without inheritance. I do it all the time, including in Java.
It works extremely well alongside dependency injection. You still get all of
the other parts of OOP, like encapsulation and polymorphism.

The only real trick is to use delegation patterns instead of inheritance, i.e.
dispatch requests to class members rather than super/subclasses.

It doesn't feel awkward or unnatural.
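
A small sketch of the delegation idea, written in Python rather than Java to
keep it short. The class names are hypothetical; the point is only that
`Report` holds a collaborator passed in through its constructor and forwards
work to it, instead of inheriting from it:

```python
class ConsoleSink:
    """Writes text to standard output."""

    def write(self, text):
        print(text)


class MemorySink:
    """Collects text in a list, handy for tests."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)


class Report:
    """Publishes through whatever sink it was given (dependency injection)."""

    def __init__(self, sink):
        self._sink = sink  # a class member, not a superclass

    def publish(self, title):
        # Delegate the actual output to the member object.
        self._sink.write(f"REPORT: {title}")


sink = MemorySink()
Report(sink).publish("Q1 numbers")
print(sink.lines)  # ['REPORT: Q1 numbers']
```

Swapping `MemorySink` for `ConsoleSink` changes the behavior without touching
`Report` or adding a subclass, which is the polymorphism-without-inheritance
part.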

~~~
tabtab
It does feel awkward to me. I tend to view it more like RDBMS design and
queries where the structures fit the domain and one can query for subsets or
dispatching using something as flexible and simple as SQL (relatively
speaking).

Then you could easily answer:

A) Show me all the GUI event handling code for all buttons.
B) Show me all the event handling code for Form X.
C) Show me all the event handling code for all buttons within forms having at least one drop-down list.
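
As an illustration of how the three queries above could look if handler
references lived in tables, here is a sketch using Python's built-in `sqlite3`
module. The schema, the table and column names, and the sample rows are all
assumptions made up for this example, and the tables hold references to code
(`code_ref`), not the code itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forms    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE widgets  (id INTEGER PRIMARY KEY, form_id INTEGER,
                       kind TEXT, name TEXT);
CREATE TABLE handlers (id INTEGER PRIMARY KEY, widget_id INTEGER,
                       event TEXT, code_ref TEXT);

INSERT INTO forms VALUES (1, 'X'), (2, 'Y');
INSERT INTO widgets VALUES
    (1, 1, 'button',   'save'),
    (2, 1, 'dropdown', 'country'),
    (3, 2, 'button',   'cancel');
INSERT INTO handlers VALUES
    (1, 1, 'click',  'on_save_click'),
    (2, 2, 'change', 'on_country_change'),
    (3, 3, 'click',  'on_cancel_click');
""")


def refs(sql, params=()):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]


# A) Event handling code for all buttons.
all_button_handlers = refs("""
    SELECT h.code_ref FROM handlers h
    JOIN widgets w ON w.id = h.widget_id
    WHERE w.kind = 'button'
    ORDER BY h.code_ref
""")

# B) Event handling code for Form X.
form_x_handlers = refs("""
    SELECT h.code_ref FROM handlers h
    JOIN widgets w ON w.id = h.widget_id
    JOIN forms f ON f.id = w.form_id
    WHERE f.name = ?
    ORDER BY h.code_ref
""", ("X",))

# C) Event handling code for buttons in forms with at least one drop-down.
buttons_near_dropdown = refs("""
    SELECT h.code_ref FROM handlers h
    JOIN widgets w ON w.id = h.widget_id
    WHERE w.kind = 'button'
      AND EXISTS (SELECT 1 FROM widgets d
                  WHERE d.form_id = w.form_id AND d.kind = 'dropdown')
    ORDER BY h.code_ref
""")

print(all_button_handlers)    # ['on_cancel_click', 'on_save_click']
print(form_x_handlers)        # ['on_country_change', 'on_save_click']
print(buttons_near_dropdown)  # ['on_save_click']
```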

The same goes for executing code (running business logic), not just code
inspection. Then one doesn't have to crawl object graphs as often. Early
databases did graph crawling, but then Dr. Codd showed a better way, and
graph DBs mostly withered. I'm waiting for the Dr. Codd of code and behavior
dispatching management to come save the day.

I'd like to explore using relational modelling for behavior, not just
attributes (data). I used to do such in xBase (dBASE, FoxPro, etc.) because of
its dynamic nature and ease of editing tables, including code _in_ tables. I
thought that direction was the future, it looked bright to me. Then OOP came
along and killed the seeming birth of table-oriented programming.

(It might seem like a security risk to put code in tables, but a file system
and a database are not different enough to say putting code in a database is
"bad" while putting it in a file system is "good". I'd like to blur the
distinction between a file system and a database, but maybe that's a different
topic. Plus, one doesn't necessarily have to put code "in" the tables, just
references to it. One just needs a system/convention to track and map each to
the other.)

------
sbr464
Interesting, the url picks up as:

"Female Viagra UK, Buy Viagra - Online Pharmacy, Best Offer"

When pasting in a notes app. Couldn't tell if it was an inside joke with
regards to the subject matter.

------
sonnyblarney
This is a great post.

The problem with the thesis is that just because classical ontology doesn't
work for the entire internet, doesn't mean that we can't apply it in many
domains.

Formal ontology could be used very well in a variety of areas.

Informal, or less concise, ontology could likewise be used in many areas.

And leave tagging for the rest.

~~~
maxerickson
Aren't tagging and "informal ontology" pretty much equivalent?

~~~
sonnyblarney
Well there are surely other informal ways of doing 'ontology'.

The point being - more rigorous methods can definitely be valuable in areas,
and I don't think it's a good idea to ignore this fact.

It's a bygone issue because there wasn't a critical mass in place to make it
work, but there could be.

------
edoo
Haha, if you don't enjoy this you will enjoy it anyway.

------
munificent
This is one of my favorite articles, which I first read when it was new and
which still informs how I think about organizing information and the web.

------
Angostura
Anyone know what Clay is up to at the moment? Seems to have gone very quiet in
the last 5 years.

------
johnchristopher
The site is now displaying spam :(

------
empath75
Reddit is basically a flat ontology for images, if you think about how most
people use it.

------
platz
Cue up the quotes from "Zen and the Art of Motorcycle Maintenance"

------
citilife
[2005] should be added to the title

> This piece is based on two talks I gave in the spring of 2005 -- one at the
> O'Reilly ETech conference in March, entitled "Ontology Is Overrated", and
> one at the IMCExpo in April entitled "Folksonomies & Tags: The rise of user-
> developed classification." The written version is a heavily edited
> concatenation of those two talks.

~~~
seriocomic
Google has this dated as [2014] - June 14 to be exact. Google also shows the
exploit-affected site title as mentioned in other comments...

[https://www.google.com/search?source=lnt&tbs=cdr:1,cd_min:1/...](https://www.google.com/search?source=lnt&tbs=cdr:1,cd_min:1/1/2005,cd_max:&tbm=&q=inurl:http://www.shirky.com/writings/ontology_overrated.html)

------
jrochkind1
2005. Did what he predicted happen? I think not?

~~~
dannyobrien
Might be good to add a date to the title.

