
Google "word sense induction" or "word sense disambiguation". Intuitively, distributional information of the same sort used to derive representations for different word types in W2V or LexVec is also useful for distinguishing word senses. Two (noun) senses of lead, two senses of bat, etc. are pretty easy to distinguish on the basis of a bag of words (or syntactic features) around them. Other words are polysemous: they have multiple related senses (across the language, names for materials can be used for containers made of them; an animal name can be used for the corresponding food--but with exceptions). For some high-frequency words it's a crazy gradient combination of polysemy and homonymy: 'home' can refer to 1) a place someone lives, 2) the corresponding physical structure, 3) where something resides (a more 'metaphorical' sense), among other things. Obviously an individual use of a word has a gradient relationship to these senses, and speakers differ regarding what they think the substructure is (polysemous or homonymous, hierarchical or not, etc.). I've been working in my PhD on a technique to figure this out, but people clearly use a lot of information that isn't available in language corpora alone (e.g. intuitive physics).




