Hacker News new | past | comments | ask | show | jobs | submit login

I've been a librarian for more than 15 years and I can only speak from personal experience when I say that I am the apex predator of nothing. Every once and a while I will get it in my head to systematize my personal knowledge base with a controlled vocabulary and ontology and I just fall on my face. I really want it for some twisted reason, though.

Turns out LC subject headings -- for all their failures -- are pretty good.




Library of Congress classifications and subject headings (those are two separate things, for those unfamiliar) are not perfect, but they're pretty good, apply to a huge copus, and to my mind most importantly, have evolved over a bit over a century under numerous circumstances, including an absolute explosion of published materials, substantial changes to understanding organisation and classification of knowledge, and an awareness of the social and cultural aspects of these (as well as the institutional bias that's often embodied within them). That is, they have evolved a change management process.

The Classifications are substantively hierarchical, though that's really an outgrowth of the fact that they're used to locate books within physical shelf space, in which a record must occupy an address (physical space), and given that the Library's settled on subject classification as its storage and retrieval basis, this maps what's effectively a folded linear structure (shelf space) onto the multidimensional subject classification. It's not ideal, but it's workable. And many of the quirks of the LoCCS come out of the fact that it addresses both the composition (comprehensive, but still US-centred) and process (shelving, search, and retrieval) of the Library.

The Subject Headings are not hierarchical, though they're structured. In particular, they're relational, with numerous subject headings referring to others. There's some parent-child relations (though the top level hierarchy is broad), numerous retired classifications, and many "use that instead of this" notes.

(I've made ... some progress ... at a structured parsing of the subject headings, though that work's been stranded Because Reasons.)


I've learned to accept that my personal life and knowledge management is going to be a mess. (I'm also a librarian). I just don't want to do more organizing when I get home. I do also feel the temptation to do it 'right' once in a while, but it never sticks. I'd wager a lot of it has to do with the fact that managing an ontology completely on your own just sucks.


> controlled vocabulary

Are you using English? English words can almost mean whatever you want them to. Perhaps design your own language that removes ambiguity. Probably requires a knowledge of philosophy to distinguish between say concrete and abstract, good luck.

Maybe start with correcting the ontology of: https://cuberule.com/ (which takes a geometric approach to defining food types).

Also perhaps decide whether you want to work top-down like a directory tree (or Dewey Decimal?): resulting in standard book classification issues. Or bottom up: resulting in conflicts and discrepancies - https://news.ycombinator.com/item?id=33254025


> Perhaps design your own language that removes ambiguity.

That's what a controlled vocabulary is. It's essentially a set of tags which are clearly defined. So instead of #horses being defined purely by the word "horses," it has an attached definition along the lines of, "The category 'horses' includes equine biology, sports relating to horses, the cultural history of horses, and all other topics involving real horses. Metaphorical horses such as saw horses are not included." Tags like #horse would be redirected to #horses, since there is only one canonical horse tag in the vocabulary.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: