An Open Version of WordNet (en-word.net)
73 points by syats 6 days ago | 20 comments

WordNet is already open; I think the advantage of this one is that it is being actively maintained.

From the WordNet license:

> Permission to use, copy, modify and distribute this software and database and its documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the following copyright notice and statements, including the disclaimer, and that the same appear on ALL copies of the software, database and documentation, including modifications that you make for internal use or for distribution.

WordNet is like a dictionary in that it contains definitions and synonyms of words. It goes beyond a dictionary in that it also records relationships like hypernym (broader) and hyponym (narrower), which can be useful for "understanding" (whatever that means) text. It is a graph that connects different senses (called synsets) to each other, and senses to words. It used to be released under a closed license and was poorly maintained; now there's a fork of it on GitHub that anyone can contribute to.
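
A minimal Python sketch of the structure described above, assuming a made-up five-synset toy graph (the synset IDs, words, and links are hypothetical, not real WordNet data). Words map to one or more synsets, synsets carry hypernym (broader) links, and hyponym (narrower) links are just those edges inverted:

```python
from collections import defaultdict

# Hypothetical toy data (not real WordNet content): each synset (sense)
# lists its member words and its hypernym (broader) synsets.
SYNSETS = {
    "dog.n.01": {"lemmas": ["dog", "domestic_dog"], "hypernyms": ["canine.n.01"]},
    "wolf.n.01": {"lemmas": ["wolf"], "hypernyms": ["canine.n.01"]},
    "canine.n.01": {"lemmas": ["canine", "canid"], "hypernyms": ["carnivore.n.01"]},
    "carnivore.n.01": {"lemmas": ["carnivore"], "hypernyms": []},
    "chase.v.01": {"lemmas": ["chase", "dog", "tail"], "hypernyms": []},
}


def word_index(synsets):
    """Map each word to every synset (sense) it appears in."""
    index = defaultdict(list)
    for synset_id, data in synsets.items():
        for lemma in data["lemmas"]:
            index[lemma].append(synset_id)
    return dict(index)


def hyponym_index(synsets):
    """Invert the hypernym edges: hyponyms are the narrower synsets below a node."""
    below = defaultdict(list)
    for synset_id, data in synsets.items():
        for parent in data["hypernyms"]:
            below[parent].append(synset_id)
    return dict(below)


def hypernym_path(synsets, synset_id):
    """Walk upward via the first hypernym of each synset until a root is reached."""
    path = [synset_id]
    while synsets[path[-1]]["hypernyms"]:
        path.append(synsets[path[-1]]["hypernyms"][0])
    return path


print(word_index(SYNSETS)["dog"])                 # ['dog.n.01', 'chase.v.01']
print(hyponym_index(SYNSETS)["canine.n.01"])      # ['dog.n.01', 'wolf.n.01']
print(hypernym_path(SYNSETS, "dog.n.01"))         # ['dog.n.01', 'canine.n.01', 'carnivore.n.01']
```

Here the word "dog" belongs to both a noun sense and a verb sense, which is the word-to-sense half of the graph. Real WordNet synsets can have more than one hypernym; this sketch only follows the first one.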

WordNet has a "gloss" field, but it's very lacking if you use it as a traditional dictionary. Its value lies in the graph of synsets.

The problem with the gloss and example fields is that they are per synset, not per (word, sense) pair as they would be in a normal dictionary.

This means that if you try to use it as a normal dictionary, the glosses often don't contain the word whose senses you are listing.
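
A small sketch of the problem, assuming a hypothetical two-synset lexicon with invented glosses and examples: because the example sentence belongs to the synset rather than to each word, listing the senses of "automobile" dictionary-style returns an example that only uses "car".

```python
# Hypothetical toy lexicon (invented text, not real WordNet entries):
# the gloss and example are stored once per synset and shared by all its words.
SYNSETS = {
    "car.n.01": {
        "lemmas": ["car", "auto", "automobile"],
        "gloss": "a motor vehicle with four wheels",
        "example": "he needs a car to get to work",
    },
    "chase.v.01": {
        "lemmas": ["chase", "dog", "tail"],
        "gloss": "go after with the intent to catch",
        "example": "the police chased the thief down the alley",
    },
}


def dictionary_entry(word):
    """List a word's senses dictionary-style.

    Each sense is (synset_id, gloss, example, example_mentions_word).
    The mention check is a naive whole-token match, so inflected forms
    such as "chased" do not count as mentions of "chase".
    """
    senses = []
    for synset_id, data in SYNSETS.items():
        if word in data["lemmas"]:
            mentions = word in data["example"].split()
            senses.append((synset_id, data["gloss"], data["example"], mentions))
    return senses


for sense in dictionary_entry("automobile"):
    print(sense)
```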

An animated graph of WordNet links may help explain why it's useful: https://www.wordsapi.com/

What exactly can you do or build with this graph of connected words? I don't see the benefits.

You can build a company called Applied Semantics and then sell your tech to Google so they can develop products called "AdSense" and "AdWords" and make trillions of dollars of revenue. However, first you'll need to invent a time machine that can take you back in time 20 years.

It used to be used in NLP, but never to great success. Word embeddings are a far more powerful way to achieve many of the same goals, with easier computation, easier scaling to other languages, and room for new or personal words that aren't (yet) in the dictionary.

I agree with you, sort of, but WordNet-based systems can be explainable. Word embeddings (and sentence embeddings, etc.) are great, but like deep learning models, they are a black box.

I wouldn't even assume that. "X was chosen because a matrix decomposition suggested it was relevant" isn't great either. WordNet is/was rarely used as just a single hop, those hops are somewhat arbitrary to begin with, and ultimately it never really materialized as a useful tool. If it can't be used for anything, then there's nothing to explain!

Using WordNet used to be a very popular way to perform "knowledge-rich" NLP from the late 90s up to around 2010 (approximate timeline). "Knowledge-rich" meant you could start with some understanding of the language and not rely solely on the data at hand, much like the use case that pretrained word vectors like GloVe serve today (WordNet is probably closer to dependency-based word embeddings [1]). Some interesting uses were query expansion [2], word sense disambiguation [3], word similarity (a popular measure is Wu-Palmer similarity; check out NLTK), and an interesting area called "lexical chains" [4]: groups of related words running through a text, whose "weave" signals its topics.
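
The Wu-Palmer measure mentioned above scores two senses as 2 · depth(LCS) / (depth(a) + depth(b)), where the LCS (least common subsumer) is their deepest shared hypernym. A minimal sketch over a made-up, single-inheritance toy taxonomy, counting the root as depth 1 (an assumed convention; NLTK's `wup_similarity` also handles multiple hypernym paths, and its depth counting may give slightly different numbers):

```python
# Hypothetical toy taxonomy (invented, single inheritance): node -> hypernym.
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "cat": "feline",
    "canine": "carnivore",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": None,  # root
}


def path_to_root(node):
    """Return [node, its hypernym, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth with the root counted as 1 (an assumed convention)."""
    return len(path_to_root(node))


def least_common_subsumer(a, b):
    """Deepest node that is an ancestor of both a and b (or one of them)."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # climbs from b, so in a tree the first hit is the deepest
        if node in ancestors_of_a:
            return node
    return None


def wu_palmer(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b)); 1.0 for identical senses."""
    lcs = least_common_subsumer(a, b)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("dog", "wolf"))  # 0.75
print(wu_palmer("dog", "cat"))   # 0.5
```

With this toy tree, dog and wolf share "canine" (depth 3) and score 6/8, while dog and cat only share "carnivore" (depth 2) and score 4/8.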

When WordNet arrived on the scene, it was a big deal, since there weren't many ways to perform knowledge-rich NLP back then; the common ones were using a dictionary or a thesaurus. There was some effort to tie topic models to WordNet too, like LDAWN [5], and to extend WordNet itself using collocation information gleaned from the glosses: "eXtended WordNet" [6].

You still (occasionally) see it used where you need some kind of rich prior knowledge: for example, the "Hierarchical Probabilistic Neural Network Language Model" by Morin and Bengio [7], or cluster labeling that combines embeddings with WordNet [8]. To quote an example from the latter, 'a word cluster containing words "dog" and "wolf" should not be labeled with either word, but as "canids"'. And you know "canids" is a super-category here by looking up the precise relationships in WordNet.
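
The lookup in that example can be sketched as finding the most specific hypernym shared by every word in the cluster while skipping the cluster's own words. This is only the WordNet step the quote describes, not the method of [8], and the toy taxonomy below is invented:

```python
# Hypothetical toy taxonomy (invented): word -> hypernym, None marks the root.
PARENT = {
    "dog": "canid",
    "wolf": "canid",
    "fox": "canid",
    "cat": "feline",
    "lion": "feline",
    "canid": "carnivore",
    "feline": "carnivore",
    "carnivore": "mammal",
    "mammal": None,
}


def ancestors(word):
    """Return [word, hypernym, ..., root], most specific first."""
    chain = []
    while word is not None:
        chain.append(word)
        word = PARENT[word]
    return chain


def label_cluster(words):
    """Most specific hypernym shared by all words, never one of the words itself."""
    shared = set(ancestors(words[0]))
    for word in words[1:]:
        shared &= set(ancestors(word))
    for candidate in ancestors(words[0]):  # specific -> general
        if candidate in shared and candidate not in words:
            return candidate
    return None


print(label_cluster(["dog", "wolf"]))         # canid
print(label_cluster(["dog", "wolf", "cat"]))  # carnivore
```

Excluding the cluster's own words is what keeps "dog" and "wolf" from labeling themselves; a cluster at the root returns `None`.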

My own Master's research looked at combining WordNet-based lexical chaining with more "ML"-ish techniques like Hidden Markov Models [9]. That's why I know, or rather vaguely remember, some of what was happening back then :-)

I think the primary reason WordNet did not retain its popularity was that it was a good "one-off" solution. It worked well with "correct" English. You want to adapt it to your domain vocabulary? Heuristics. You want to use WordNet in another language? Well, someone needs to build one first. You want to use it to process text in internet lingo? Nope, hybrid models and heuristics. Also, at that time the amount of text available to train on was increasing by leaps and bounds, so the field's move toward ML-heavy techniques made sense.

[1] https://www.aclweb.org/anthology/P14-2050.pdf

[2] https://www.aclweb.org/anthology/P08-1017.pdf

[3] https://pdfs.semanticscholar.org/7f2c/b3e390c5e539ef9089014a...

[4] http://www.cs.columbia.edu/nlp/papers/2003/galley_mckeown_03...

[5] https://wordnet.cs.princeton.edu/papers/jbg-EMNLP07.pdf

[6] https://en.wikipedia.org/wiki/EXtended_WordNet

[7] https://www.iro.umontreal.ca/~lisa/pointeurs/hierarchical-nn...

[8] https://www.aclweb.org/anthology/U18-1008/

[9] https://pdfs.semanticscholar.org/e7ce/34e5acdbb7a91e28fdafa9...

Yes, I'm also interested in use cases for this.

A fun way to access WordNet, as hosted by dict.org:

    nc dict.org 2628
    DEFINE wn hacker
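
For scripting, the same session can be reproduced in Python. This is a sketch assuming the DICT protocol as specified in RFC 2229 (a `220` greeting, a `151` line opening each definition, a lone `.` closing it, dot-stuffed text lines, and a final `250 ok` or `5xx` status). The `define` helper is hypothetical and needs network access, so the parser is kept separate and can be checked against a made-up response:

```python
import socket


def parse_define_response(lines):
    """Extract definition texts from DICT (RFC 2229) DEFINE response lines.

    A "151 ..." status line opens a definition, a line holding a single "."
    closes it, and text lines starting with ".." are un-dot-stuffed.
    """
    definitions, current = [], None
    for line in lines:
        if current is None:
            if line.startswith("151 "):
                current = []
        elif line == ".":
            definitions.append("\n".join(current))
            current = None
        else:
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


def define(word, database="wn", host="dict.org", port=2628, timeout=10.0):
    """Hypothetical helper: the scripted equivalent of `nc dict.org 2628`
    followed by `DEFINE wn <word>`. Requires network access."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("r", encoding="utf-8", newline="") as reader:
        reader.readline()  # 220 greeting banner
        sock.sendall(f'DEFINE {database} "{word}"\r\n'.encode("utf-8"))
        lines, in_text = [], False
        for raw in reader:
            line = raw.rstrip("\r\n")
            lines.append(line)
            if in_text:
                in_text = line != "."  # a lone "." ends the definition text
            elif line.startswith("151 "):
                in_text = True
            elif line[:1] in ("2", "4", "5"):
                break  # final status, e.g. "250 ok" or "552 no match"
        sock.sendall(b"QUIT\r\n")
    return parse_define_response(lines)
```

Calling `define("hacker")` should return each matching entry as a plain string, or an empty list when the server answers `552 no match`.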

I don't think it's open, though.

It's open. You can install your own copy locally.

Arch has an AUR package, dict-wn, that does this. If you don't use Arch, you can still clone the PKGBUILD and see how it fetches, compiles, and installs the data.

    git clone https://aur.archlinux.org/dict-wn

I remember installing the Wordnet database on my Arch Linux install 17 years ago so I would have a dictionary I could use without internet connectivity. Wordnet can be used as a traditional dictionary although it is not very good compared to, e.g., the dictionary that comes with any Mac or iPhone.

I have used WordNet for 20 years, off and on.

Cool that it was forked and that effort was put into the new version.

WordNet has a rich set of libraries in many languages to use the data. I didn’t see anything similar on their github repo.

You can use the existing libraries for Wordnet with the version in WNDB format.
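
The WNDB `data.*` files are plain text with one synset per line, which is why existing libraries can read them. A rough parser, assuming the line layout documented in the wndb(5WN) man page (offset, lexicographer file number, part of speech, two-hex-digit word count, word/lex_id pairs, a three-digit pointer count, the pointers, then `| gloss`). It skips the license header lines, which start with spaces, and ignores verb frames and adjective markers. The sample line and its offsets are invented:

```python
def parse_data_line(line):
    """Parse one synset line from a WNDB data.* file (verb frames ignored)."""
    fields, _, gloss = line.partition(" | ")
    tokens = fields.split()
    offset, lex_filenum, ss_type = tokens[0], tokens[1], tokens[2]
    w_cnt = int(tokens[3], 16)  # word count is two hex digits
    # Words alternate with a one-hex-digit lex_id: word lex_id word lex_id ...
    words = [tokens[4 + 2 * k] for k in range(w_cnt)]
    i = 4 + 2 * w_cnt
    p_cnt = int(tokens[i])  # pointer count is three decimal digits
    i += 1
    pointers = []
    for _ in range(p_cnt):
        # Each pointer: symbol, target offset, target POS, source/target (4 hex digits)
        symbol, target_offset, target_pos, _source_target = tokens[i:i + 4]
        pointers.append((symbol, target_offset, target_pos))
        i += 4
    return {
        "offset": offset,
        "lex_filenum": lex_filenum,
        "pos": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


def iter_synsets(path):
    """Yield parsed synsets, skipping the license header lines (they start with spaces)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip() and not line.startswith(" "):
                yield parse_data_line(line)


# Invented sample line: the offsets and gloss are made up for illustration.
SAMPLE = ("00000001 05 n 02 dog 0 domestic_dog 0 002 "
          "@ 00000002 n 0000 ~ 00000003 n 0000 | a member of the genus Canis  \n")
print(parse_data_line(SAMPLE)["pointers"])  # [('@', '00000002', 'n'), ('~', '00000003', 'n')]
```

In the pointer list, `@` marks a hypernym link and `~` a hyponym link, so the graph from the earlier comments can be rebuilt directly from these files.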

Is there a similar API for word etymologies? I played with this a bit and it doesn't seem to cover that area.

It... doesn't know the word "how"? That was literally the second word I typed in. It seems like this must have some holes in it?

Function words are often considered less interesting from a syntacto-semantic point of view. WordNet focuses on content words (nouns, verbs, adjectives, adverbs).
