King – man + woman is queen; but why? (p.migdal.pl)
263 points by stared on Jan 7, 2017 | 101 comments

For a D&D "alignment chart", set x axis to [illegal -- legal] and y axis to [evil -- good]. Then start typing in words. Some surprises there! https://lamyiowce.github.io/word2viz/

Preliminary findings!

The most illegal thing is "heroin"

The most legal thing is "CEO"

The most good thing is "teacher"

The most evil thing is "lucifer"

Murder is more legal than money

Priests are about as legal and rich as criminals, same for nuns wrt. janitors

'Sad' is rich and legal, 'happy' is poor and illegal. Same delta with 'power' and 'money'.

Also interesting to use "peasant" -> "ruler" as the Y axis, while leaving the X axis as "she" -> "he". It shows a possible gender bias in language, with all of the male words being higher on the peasant-ruler scale than female words.
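For anyone wondering how a word ends up at a given spot on such a plot: one plausible way is to project each word vector onto the direction running from one axis word to the other. The sketch below assumes that approach (it is not necessarily what word2viz does), and the 3-d vectors in `TOY` are invented for illustration; real embeddings have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, made up for this example only.
TOY = {
    "she":     [-1.0,  0.0, 0.1],
    "he":      [ 1.0,  0.0, 0.1],
    "peasant": [ 0.0, -1.0, 0.2],
    "ruler":   [ 0.0,  1.0, 0.2],
    "queen":   [-0.8,  0.7, 0.3],
    "king":    [ 0.9,  0.9, 0.3],
}


def sub(a, b):
    """Component-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(word, low, high, vectors=TOY):
    """Scalar projection of `word` onto the unit axis pointing from `low` to `high`."""
    axis = sub(vectors[high], vectors[low])
    return dot(vectors[word], axis) / math.sqrt(dot(axis, axis))


def plot_coords(word):
    """(x, y) with x on the she -> he axis and y on the peasant -> ruler axis."""
    return (project(word, "she", "he"), project(word, "peasant", "ruler"))
```

With these made-up numbers, `plot_coords("king")` lands to the right of and slightly above `plot_coords("queen")`; with real embeddings the positions depend entirely on the training corpus.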

I found this interesting: https://i.imgur.com/FrqmGMz.png

Apparently being a secretary is more moral than being a priest.

And of course you can actually add in the classes: http://imgur.com/3FzX81i

I wanted to add a feature for sharing plots via URL query strings, but I need to find some time for that.

Interesting (and somewhat disturbing) when combined with the Gods examples.


Also, note the values on the x axis; only "spirit", "true" and "faith" are more good than evil within this dataset, and only slightly so. "Allah" is associated with being illegal?

Apparently the '80s are Chaotic Good

Sentimental is Lawful Good

A Kitsch is Lawful Evil, and

Evokes/Evoking is Chaotic Evil

now I'm the king of the political axis memes http://i.imgur.com/Zh30Lhl.png

I learned about them just a few days before publishing this post (and I link to the second one, in the "technicalities" section).

Though, some statements about analogies should be taken with a grain of salt, see: Tal Linzen, Issues in evaluating semantic spaces using word analogies, https://arxiv.org/abs/1606.07736.

I've never found the "vector space" of word2vec remotely satisfying. In order to form a vector space, you need to also be able to make sense of scalar multiplication, and you need to be closed under arbitrary linear combinations. What is 2*king? What is 3*king - 2*green + 0.5*brutality? You can kind of make sense of this for adjectives, but it really breaks down with nouns.

You may already know this part, but for anyone who doesn't know: vector space models very often use cosine distance to make comparisons, instead of Euclidean distance. In this model, you can visualize your vectors as points on a unit hypersphere, where the distance between two vectors is how far apart they are on the sphere. x+y finds the point between them, x-y finds the point between x and the antipode of y (or, perhaps more intuitively, pushes x away from y). 2*x doesn't have any additional meaning (it's the same as x). But x+2y is the equivalent of finding a point that is proportionally closer to y than to x (I think this is 2 times as close to y as to x, but I didn't do the math). edit: to be clear this paragraph is very hand-wavy on the math, and is just meant to create an accurate-enough visual.
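A minimal sketch of the hypersphere picture, in plain Python with two made-up unit vectors `x` and `y` (not real word vectors). It only checks the qualitative claims: scaling a vector leaves cosine similarity unchanged, x+y sits equally close to both, and x+2y leans towards y. It does not try to settle the "2 times as close" guess.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Rescale a onto the unit hypersphere."""
    n = norm(a)
    return [p / n for p in a]


# Two illustrative unit vectors.
x = [1.0, 0.0]
y = [0.6, 0.8]

two_x = [2.0 * p for p in x]                              # 2*x: same direction as x
mid = normalize([p + q for p, q in zip(x, y)])            # x + y
skewed = normalize([p + 2.0 * q for p, q in zip(x, y)])   # x + 2y
```

Here `cosine_similarity(two_x, y)` matches `cosine_similarity(x, y)`, `mid` is equally similar to `x` and `y`, and `skewed` is more similar to `y` than to `x`.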

My intuition of why things like king + man - woman work is because the points in the vector space model happen to create a well-behaving manifold with smooth meaning changes. It's not very principled, but it does work.

I wrote a series of blog posts with a coworker about doing this with music:



Instead of mixing nouns and adjectives, we do things like mixing songs and artists and radio stations etc. In the second post we show how Nirvana - Kurt Cobain + Female Vocalist works remarkably well. I've studied empirically why this worked, and the best I could come up with is that the high dimensional space we created had a very dense set of points in the region of popular western music that led to a smooth manifold.

Those posts are really interesting.

I address this point. For scalar product, well, these word vectors are defined such that their scalar product approximates the pointwise mutual information between a pair of words (or sometimes some other quantity, depending on the actual algorithm).
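For readers who haven't met pointwise mutual information: a rough sketch of computing it directly from co-occurrence counts, using PMI(w, c) = log(p(w, c) / (p(w) p(c))). The counts in `PAIR_COUNTS` are invented for illustration. An embedding model would aim for dot products that approximate values like these; that training step is not shown here.

```python
import math

# Hypothetical (word, context) co-occurrence counts, made up for this example.
PAIR_COUNTS = {
    ("king", "crown"): 8,
    ("king", "bread"): 2,
    ("baker", "crown"): 1,
    ("baker", "bread"): 9,
}


def pmi(word, context, counts=PAIR_COUNTS):
    """Pointwise mutual information: log(p(w, c) / (p(w) * p(c)))."""
    total = sum(counts.values())
    p_joint = counts[(word, context)] / total
    p_word = sum(n for (w, _), n in counts.items() if w == word) / total
    p_context = sum(n for (_, c), n in counts.items() if c == context) / total
    return math.log(p_joint / (p_word * p_context))
```

With these counts, `pmi("king", "crown")` is positive (the pair co-occurs more often than chance would predict) and `pmi("king", "bread")` is negative.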

For differences - see this section: http://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html#....

I mention also that multiplying a word by a factor (for PMI compression) results in a word of similar meaning, just being more characteristic (bear in mind that for other models it can be related e.g. to word frequency or other properties).

But you are right that there are some problems with linear structure. Some of them were brought to me by Omer Levy (a researcher in this subject). I think an article that summarises it the best (or rather: shows empirically that it does not always work as intended) is:

- Tal Linzen, Issues in evaluating semantic spaces using word analogies, https://arxiv.org/abs/1606.07736

Right, I don't deny that you get something sort of vector-ish and some operations that make intuitive sense some of the time. That's cool enough on its own! There's no need to pretend that we've actually constructed a vector space. Just say that we've arranged words on a hypersphere with an inferred distance metric and call it a day.

(Also, by "scalar product" I meant "scalar multiplication"--the product of a scalar and a vector, not the dot product. It's pretty clear how to make some sense out of the dot product, but it's pretty hard to make consistent sense out of scalar multiplication. Apologies for being unclear.)

There isn't a word at every point in the space. Is that your objection to calling it a vector space? I think of every point in space as representing subtle gradations of concepts, only some of which are instantiated by words.

In any case "vector-ish" is an appropriate word here. As with logistic regression for complex problems:

"Essentially, all models are wrong, but some are useful." - George Box

It seems to behave a little like positions, then. London makes sense. Paris makes sense. London - Paris makes sense. But 2*London does not.

> London - Paris makes sense. But 2*London does not.

I guess you know this, but for others: London-Paris makes sense, but has a different type to London and Paris (it's a vector, not a position) and while 2*London doesn't make sense, 2*(London-Paris) does make sense (it's a vector with the same direction but twice the length).

Such a system, with two distinct types -- positions and vectors, with vectors being the differences between positions -- is called an affine space. You can identify positions with vectors by picking a distinguished origin, but then you don't get the type-safety that forbids ridiculous expressions like 2*London.
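The type-safety point can be sketched in a few lines of Python. The class names `Position` and `Displacement` and the coordinates are my own illustrative choices, not anything from word2vec: subtracting two positions gives a displacement, displacements can be scaled and added, and scaling or adding positions raises a `TypeError`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Displacement:
    """A vector: the difference between two positions."""
    dx: float
    dy: float

    def __mul__(self, k):
        return Displacement(self.dx * k, self.dy * k)

    __rmul__ = __mul__  # allows 2 * d as well as d * 2

    def __add__(self, other):
        if isinstance(other, Displacement):
            return Displacement(self.dx + other.dx, self.dy + other.dy)
        return NotImplemented


@dataclass(frozen=True)
class Position:
    """A point. Deliberately has no scalar multiplication and no Position + Position."""
    x: float
    y: float

    def __sub__(self, other):
        if isinstance(other, Position):
            return Displacement(self.x - other.x, self.y - other.y)
        if isinstance(other, Displacement):
            return Position(self.x - other.dx, self.y - other.dy)
        return NotImplemented

    def __add__(self, other):
        if isinstance(other, Displacement):
            return Position(self.x + other.dx, self.y + other.dy)
        return NotImplemented


# Illustrative coordinates only, not real map data.
london = Position(0.0, 51.5)
paris = Position(2.0, 49.0)
```

`2 * (london - paris)` is a `Displacement`, and `paris + (london - paris)` gets you back to `london`, while `2 * london` and `london + paris` both fail with a `TypeError`.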

I thought it was putting words on the surface of a sphere and just doing math with cosine distance?

That's fine, but then you shouldn't pretend that you're in a vector space. Words have meanings, and mathematical concepts have precise definitions.

Constraining points to a hypersphere surface only reduces dimensionality by one dimension. For example a sphere in 3-space still has a 2-dimensional surface to play with. That's still a vector space right?

No; it's not closed under linear combinations and it doesn't contain an additive identity.

Well it does if you use the right coordinate system! As the article mentions, most people don't normalize correctly and only get passable answers by luck. You can add locations on the earth's surface if you pick a point to be the origin.

Then you really shouldn't look too closely at physics.

What is a second*meter? A volt*volt*gram?

I don't follow the point you're making. A volt plus a meter is just that, a volt plus a meter. A king minus a man plus a woman is a nonsense statement. Neat, but not exactly illuminating.

I'm glad it can be useful. But, I agree it seems to leave a lot lacking.

A king minus a man is a monarch. Add a woman and you get a queen. I don't see why this is so complicated. It is just like adding or removing adjectives from something.

This is nonsensical if you don't already have some existing notion of a relationship. These are vectors, not concepts. There are loads of operations that don't have any intuitive notion, yet are well defined in the context of vectors and most likely yield useless results.

Well isn't that true of all abstractions? They give us tools that we couldn't use before, and some of those tools don't make sense. I mean, in real physics it doesn't make sense to add different dimensions, eg time and distance, but math lets us do that, because they are 'just numbers'. We need to know that dimensions can only be multiplied and divided, and dimensional analysis needs to be done afterwards to check the unit of the result.

However, something like the dot-product here does make sense, since you can use it to determine similarities of vectors.

>This is nonsensical if you don't already have some existing notion of a relationship.

The engine does have an emergent notion of the relationship, which is the whole point.

That doesn't follow in any addition sense. A king minus man can just be a dead king. Or a eunuch. Or, a lonely king.

Similarly, add a woman to a person, and you just have two people. One being a woman.

I get that there is an answer that seems fun... But there is not a deep meaning to the math.

This is like the games where "send + more = money". Fun. But is there really something illuminating?

Stop thinking literally, and instead think of the concept. A king is a male monarch. So remove the male, and you have monarch. Add female and you have female monarch = queen.

Similarly with woman + person, the concept of woman is femininity. It makes sense if you think in terms of concepts.

>That doesn't follow in any addition sense. A king minus man can just be a dead king. Or a eunuch. Or, a lonely king.

Yes, and some of those meanings can be closer to the common understanding of reality or the weight of each notion or its probability than others.

Which is also why we can solve riddles and don't get lost in their infinite similar possibilities.

>A king minus a man plus a woman is a nonsense statement.

It's not supposed to be english, it's a query language.

King - man + woman actually is illuminating. Your 6 year old son might even say that's what a queen is. There is useful math here.

I call shenanigans. Find me a place where that algebra, which lacks standard rules of common algebra, makes sense.

Consider, king + woman = queen + man. Which looks neat, but is not a universal truth. It could be concubine, for example.

So, is queen + man also concubine?

Again. I'm glad this works for some things. But really just shows which words are often used together. It does not show any good rationale for their meanings. Unlike math, where 1 + 1 equals 2. Possibly in different encodings. But not just from convention of often being used together.

The answers you are looking for are Nobel-worthy. I have not formally studied linguistics, but you might wanna start there. Regarding the king equation, I think "king" is actually a vector in this context, as with the other objects. If so, then it is using the vector space formalism.

That said, I definitely agree with you. An English speaker may find any of these reasonable:

1. King - man = expensive clothing

2. King - man = prince

3. King - man = queen

What is "king"? What is "-"? What is "man"?

If a king is a dressed up wealthy man, and you remove the man, you have wealthy clothes? Or, does removing the man mean degrading the king back into a boy? Or, does removing a man mean adding a woman? Wait -- what if a king is more than a dressed up wealthy man? Should we include his home? Do we need to subtract the home? How do you subtract a home? Is the king minus a man a prince if the king was a beggar when he was young? ... death of the universe ...

Like you said, there's a combinatoric explosion here. Maybe this example is akin to trying to model each and every trajectory of all 10^23 particles in a gas. It looks like these scientists are stepping back, and looking at the big picture, instead, trying to find something more akin to PV = NkT

>It could be concubine, for example.

Precisely. Models which use this space do not propose strong equality (==). Rather, they would output a series of probabilities, and choose the most likely. Stating king + woman = queen + man is somewhat disingenuous; what should be said (mathematically) is something like the following: the word lying closest to the vector vec('king') + vec('woman') - vec('man') is 'queen'.
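A toy version of "the word lying closest to vec('king') + vec('woman') - vec('man')", returning a ranked list rather than a single answer. The 2-d vectors are hand-made assumptions (axis 0 roughly "royalty", axis 1 roughly "gender"), the scores are cosine similarities rather than probabilities, and excluding the query words from the candidates is a common convention that I'm assuming here.

```python
import math

# Hypothetical hand-made embeddings, chosen so the example is easy to follow.
TOY = {
    "king":      [1.0,  1.0],
    "queen":     [1.0, -1.0],
    "man":       [0.0,  1.0],
    "woman":     [0.0, -1.0],
    "concubine": [0.4, -1.0],
    "prince":    [0.7,  1.0],
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def analogy(positive, negative, vectors=TOY, top_n=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    excluded = set(positive) | set(negative)  # skip the query words themselves
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w not in excluded]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


ranked = analogy(["king", "woman"], ["man"])
```

With these made-up numbers `ranked` starts with "queen", followed by "concubine" with a lower score, which is the point of the comment: it is a nearest-neighbour query, not an equality.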

To suggest that a NN can't learn something about the meaning of words from a large corpus of text is unsubstantiated, I believe. The statement above suggests they do, I would say. I would not be too surprised if a sufficiently complex NN could 'learn' the concept of gender with a corpus of English text to a decently high degree of accuracy, simply based on vestigial features left from French and Old English.

It makes more sense to sort of factorise each thing into its different components. King is a masculine monarch. Man is a masculine person. Queen and woman are the feminine equivalents.

So king + woman = queen + man is better described as:

Masculine monarch feminine person = feminine monarch masculine person

A bit of reordering of adjectives and it is exactly the same. Even monarch is the wrong word, because you seem to be getting hung up on nouns, when these are all actually a bunch of chained adjectives. Perhaps "regality + nobility + rulery". English is a bad language to describe this, because we tend to noun and verb our adjectives regularly.

>Consider, king + woman = queen + man. Which looks neat, but is not a universal truth. It could be concubine, for example.

That's totally fine, because actual words don't define universal truths either.

Queen could be a band, a transvestite, an actual queen, and several other things besides.

That's the deficiency of AI right now... pattern matching is only one aspect of intelligence that's fairly narrow and only moderately sophisticated.

It's still useful -- you can classify millions of pictures in a meaningful way much faster than before.

I was not claiming against its usefulness. I merely question if it truly has meaning.

That is, x+y has meaning and use in most maths. Here, it seems primarily use.

Well, if I had to give it any meaning I would probably choose the word with the smallest distance from the one that was computed (provided that a sensible scalar product exists here). Closure under linear combinations is not a problem if you separate the "word vector" from the actual word. With the same reasoning as above 0.999999*king would still be king.

The fundamental problem is that linearity is probably not that accurate in representing word relationships, but I don't think this is the goal.

> What is 2x"king"?

War? Murder?



Why is this approach preferred to treating words as symbols connected to other symbols? E.g.

   (king (genl headOfState)
         (sex male))
   (woman (genl person)
          (minimumAge 18)
          (sex female))
What are the advantages of storing words as floating point vectors? I can see how inputting huge amounts of text into a simple algorithm might be less labour intensive than manually building (or using a more complex algorithm to build) a dictionary. However, at least you can use your dictionary for other purposes, and its contents are readily verifiable by non-technical people.
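To make the comparison concrete, here is a rough Python rendering of the notation above as a dictionary of frames, with slot lookup that inherits along `genl` links. The slot names come from the example; the `headOfState` and `person` entries and the helper names `lookup` and `is_a` are assumptions added for illustration.

```python
# Hypothetical frame store. Only "king" and "woman" come from the example above;
# the "headOfState" -> "person" link is an assumption.
FRAMES = {
    "person":      {},
    "headOfState": {"genl": "person"},
    "king":        {"genl": "headOfState", "sex": "male"},
    "woman":       {"genl": "person", "minimumAge": 18, "sex": "female"},
}


def lookup(symbol, slot, frames=FRAMES):
    """Return the value of `slot` for `symbol`, inheriting along genl links."""
    seen = set()
    while symbol is not None and symbol not in seen:
        seen.add(symbol)
        frame = frames.get(symbol, {})
        if slot in frame:
            return frame[slot]
        symbol = frame.get("genl")
    return None


def is_a(symbol, ancestor, frames=FRAMES):
    """True if `ancestor` is reachable from `symbol` by following genl links."""
    seen = set()
    while symbol is not None and symbol not in seen:
        if symbol == ancestor:
            return True
        seen.add(symbol)
        symbol = frames.get(symbol, {}).get("genl")
    return False
```

Every answer here is traceable to an explicit entry (`is_a("king", "person")` is true because of two `genl` links), which is the verifiability argument; the cost is that someone has to write the entries.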

From a survey of Deep Learning published in Nature[1]:

"The issue of representation lies at the heart of the debate between the logic-inspired and the neural-network-inspired paradigms for cognition. In the logic-inspired paradigm, an instance of a symbol is something for which the only property is that it is either identical or non-identical to other symbol instances. It has no internal structure that is relevant to its use; and to reason with symbols, they must be bound to the variables in judiciously chosen rules of inference. By contrast, neural networks just use big activity vectors, big weight matrices and scalar non-linearities to perform the type of fast ‘intuitive’ inference that underpins effortless commonsense reasoning."

Pithy version:

"As of 2015, I pity the fool who prefers Modus Ponens over Gradient Descent." - Tomasz Malisiewicz [2]

Superlong version: https://plato.stanford.edu/entries/logic-ai/

[1] https://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pd...

[2] http://www.computervisionblog.com/2015/04/deep-learning-vs-p...

The first quote is simply not true. Symbols, even in Lisp, have properties (as well as values and possibly function values), i.e. they have deep structure. In general, symbols are explicitly linked to other symbols or algorithms. These links are the symbolic analogue of the vectors in word2vec, except that they are explicit so you can see what they mean, but you have to enter them manually or use a more complex machine learning algorithm.

The author of the computer vision blog post doesn't seem to know much about symbolic AI. Some of the comments point this out.

The various benchmarks are a great opportunity to demonstrate the superiority of the symbolic approach over the neural network based models. I encourage you to try, there's no better way to dispel the doubters.

I don't think they're suitable for the same tasks, so I don't consider the two approaches to be in competition with each other. It's also very difficult if you're one person with limited time up against companies the size of Google.

I'm busy on other things at the moment, but I intend to develop a rule-based system some time soon. I can rule out a neural network straight away because there's no data available, the rules are explicit, well documented, and have to be followed, and the system has to justify its reasoning.

That is not to say I wouldn't consider using a neural network for a perception task.

Is anybody working on using a neural network to build and update a symbolic graph, and vice versa? Or at least using a symbolic graph as an input to a NLP neural network, so the network could learn to rely on the symbolic graph when it is useful?

Yes. ConceptNet [1] and distributional word embeddings go really well together, and can compare word meanings better than either one alone. Here's the preprint of the AAAI 2017 paper [2].

[1] http://www.conceptnet.io/

[2] https://arxiv.org/pdf/1612.03975v1.pdf

This is ridiculously simplistic - take woman, for example. The distinction between woman and girl isn't nearly as simple as age. There's a sorites paradox in trying to find a dividing point of age, but actually woman connotes a bunch of other concepts probabilistically. It suggests independence and maturity vs childishness and cuteness. While girl is often used in the context not of children, but women as objects of courtship - girls on a night out, girlfriend etc. And a poem or song may play with the multiple meanings and ambiguities - where is Britney's Not a Girl in this symbolic analysis?

The only way you can get symbols to work is with weights. Follow this to its logical conclusion and I think you'll end up with a system isomorphic with the vector approach, with dimensions representing something like symbols.

It was a very simplified example. The Cyc database contains more realistic examples. Yes, I'd seriously consider augmenting symbols with numeric data. You could even store the vector obtained from word2vec as a property of the symbol if that's found to be useful in capturing nuance, but you can't store symbols on floating point vectors.

Here's another example.

    The fat cat is sitting on a mat.
Only someone who has spent too much time away from people could possibly think that the meaning of this sentence is that somewhere, there is a fat cat sitting on a mat. It's a direct allusion to books used to teach reading to young children. It's a reference to simple sentences and simple words.

And even within the world of the sentence, it only has a vague meaning. What exactly makes the cat fat? Is it neutered or lazy or overfed? Why is it sitting on a mat - is the mat outside a door, is it waiting to be let in? What kind of a cat is it - could it be a big, dangerous cat? The sentence is laden with signifiers and unknowns. Western children are taught to consider sentences like these in the abstract, but it's not a natural way of thinking, because it's not practical in a life lived connected to the world.

Abstract hypotheticals are the hallmark of more disconnected concerns, and we teach our children this early, in part using silly, deliberately vague sentences like these, and discouraging curious questions that might resolve the ambiguities.

An AI system that's designed to handle abstract sentences like these is not one designed to understand human language, because humans don't reason like this unless they're thinking analytically - and even then, they do so blinkered with biases and errors.

>Only someone who has spent too much time away from people could possibly think that the meaning of this sentence is that somewhere, there is a fat cat sitting on a mat.

or someone who did not grow up using the English language.

I grew up with English and when I read that sentence, to me it means that there is a fat cat sitting on a mat. Not sure what point was trying to be made here.

The edge between a word and a symbol is where the number belongs - and arguably it should be a time series of numbers. In the vector view of the world, the dimensions of the vector are symbols - the number in a particular slot for a word vector is the number on the edge between the word and the symbol. But the "symbol" represented by a dimension may be a compound or quite abstract concept, it may not map trivially to a single word.

>The distinction between woman and girl isn't nearly as simple as age. There's a sorites paradox in trying to find a dividing point of age

Or, you know, you can use a cut-off point, like 18 or so.

If we're trying to assign meaning to words as used by humans, that's not even in the region of correct. It's not even wrong, it's on a different planet entirely.

Predicate calculus is an entirely parallel system that only has a representation in a strict subset of human language. A reverse mapping is hopeless and misguided.

I feel like this line of thought, that you can box up words with really concrete meanings that you can then reason about with logic systems is a kind of trap for people who've spent too much time in an analytic frame of the world. It's one where the smarter you are, the further down the road you can get without realising it's a dead end. At best, such a system could only augment.

>If we're trying to assign meaning to words as used by humans, that's not even in the region of correct. It's not even wrong, it's on a different planet entirely.

It will totally be fine for 99% of applications. Marginal returns.

In short, manual labelling rarely works for natural language processing (and neither does a top-down mathematical approach). See:

Peter Norvig, On Chomsky and the Two Cultures of Statistical Learning, http://norvig.com/chomsky.html

Also, we want something that is automatic. It means easily adjustable to other contexts (and languages), inferring information about neologisms (e.g. semantic meaning of emoji), etc.

Whether it's symbolic or NN based or some other representation, the main thing people are overlooking here is the lack of symbol grounding. That is why open 3D environments based on virtual senses and motor outputs are the most likely way to ultimately move forward in terms of things like NLP. See http://courses.media.mit.edu/2004spring/mas966/Harnad%20symb... or http://www.goertzel.org/papers/PostEmbodiedAI_June7.htm

What are non edge-case examples of the symbolic approach not working?

If my recollection is correct, Peter Norvig changed his views around the time he joined Google. Google have a particular way of doing things which doesn't include symbolic processing.

It should be possible to infer using a symbolic approach, or more simply just provide a new definition.

> What are non edge-case examples of the symbolic approach not working? ... It should be possible to infer using a symbolic approach, or more simply just provide a new definition.

I've met some really really brilliant people who've been banging their head against that particular wall since the 1980s. The MIT AI Lab crew, for example, poured untold brainpower into symbolic inference. There was the whole "expert systems" movement. This all failed miserably, in disgrace, because nobody could ever get it to work, and "AI" became a dirty word. Later, there was Cyc, which was hyped on and off throughout the 90s. http://www.cyc.com/ After that, there were people who tried to reason over RDF tuples, which didn't work either: http://www.shirky.com/writings/herecomeseverybody/semantic_s...

This idea pops up every 5 or 10 years and wastes a generation of brainpower. It never works. And let me be clear: There have been some terrifyingly brilliant people who were convinced that it ought to work, and who spent years of their life on it.

Meanwhile, any joker who can code up Bayes' theorem or singular value decomposition can get some results in a couple of weeks. Probability and statistics get results (as do more advanced numeric techniques). Logical deduction fails. I'm not even sure I could explain why. But I encourage you to think long and deeply on what Norvig has written on this subject. Or just buy Norvig's two AI textbooks (written before and after he discovered the joys of probability, basically), do some exercises, and compare the results you get.

It's clear in hindsight. Natural language is the serialisation of human thought. Humans don't think symbolically in terms of rigorous logical statements. We stretch and play with definitions, sometimes making up a new definition on the spot, and relying on context and our shared experiences for the other person to figure out the meaning.

Statistics isn't sufficient for fully understanding natural language (for that, a computer would have to go out and experience the world in the first person). But it is necessary.

> I've met some really really brilliant people who've been banging their head against that particular wall since the 1980s.

The idea of breaking meaning into an inventory of discrete components like this is at least as old as Hjelmslev, I think even Saussure touches on it.

Part of it seems to me to be that you're breaking down words into words. Even if you write it in uppercase and call it a primitive, you don't have to spend too much time on cognitive linguistics to know that there's really nothing primitive about MAN or MONARCH...

At that point, you do need some sort of grounding, e.g. a human reading the text, or a neural network embedded in a robot. It isn't symbols all the way down.

Well, AFAIK all practical systems for translation use a data-centric approach, rather than any top-down one.

See also: Andrej Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks, http://karpathy.github.io/2015/05/21/rnn-effectiveness/, and try to replicate it with any formal semantics (good luck!).

Additionally, formal systems rarely account for actual language, with some things being technically correct but sounding weird, or things being incorrect yet prevalent (and a root of language evolution). See also: char2char translation (which accommodates e.g. neologisms or typos).

But does the translation use word2vec or anything similar?

The original Altavista Babelfish (which used SYSTRAN) used rule-based machine translation. It has been replaced by Bing Translator, but my recollection of the Babelfish was that it was accurate enough to be usable, and better than Google Translate was when it was first released. Google Translate has improved a lot recently. My only problem with it is that it doesn't understand the meaning of the words it's translating.

Neural networks are perfectly suitable for perception, e.g. image recognition. No argument there.

> My only problem with it is that it doesn't understand the meaning of the words it's translating.

I think that's a fallacy that will haunt AI forever (or, more likely, will be the definitive civil rights struggle ca. March 25, 2035 6:25:45am to March 25, 2035 6:25:48am)

We tend to move the goalpost whenever AI makes advances. Where many people would have considered chess a pretty good measure of at least some aspect of intelligence, it seems like mundane number crunching once you know how it works.

It may be that we really mean consciousness when we say "intelligence", although if we ever find an easy formulation that creates a perfect "illusion" of consciousness, it may end up having some strong effects on people's conception of themselves that I don't necessarily want to witness.

thanks for sharing this... stirring thoughts. Did my best to paraphrase here (https://twitter.com/iamtrask/status/818090203990659072) but I really think we should spend more time on this as a society.

I think a vector of weights in different dimensions constitutes meaning. Words don't have a single meaning; they're a shared implicit web of allusions and hints. We only understand one another to the degree that our allusions are shared - it's why nonnative speakers miss nuance, and it's why poetry is a thing.

If there are symbols, they're in the dimensions of the vector; but words only probabilistically suggest meaning, they don't categorically denote it.

Words with multiple meanings will have multiple vectors each depending on context, and if you train your system without taking that into account the vectors will be averaged and the individual meanings will be lost.
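A small illustration of the averaging effect, borrowing the "Queen could be a band" example from upthread. The two sense vectors are invented, and equal-weight averaging is an assumption made for simplicity; a trained model would weight senses by how often each occurs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical sense vectors for "queen": axis 0 ~ royalty, axis 1 ~ rock music.
QUEEN_MONARCH = [1.0, 0.0]
QUEEN_BAND = [0.0, 1.0]

# A single vector trained on both senses, modelled here as their plain average.
blended = average([QUEEN_MONARCH, QUEEN_BAND])
sim_to_monarch = cosine(blended, QUEEN_MONARCH)
sim_to_band = cosine(blended, QUEEN_BAND)
```

The blended vector points halfway between the two senses (cosine of about 0.71 to each), so it represents neither of them exactly.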

With a symbolic approach, the right thing is to use a different symbol for each meaning, and disambiguate based on context (which you could identify either statistically or by using rules).

I feel that you don't appreciate how plastic language is, and how indirectly meaning is conveyed. New words and meanings become cultural shared knowledge on a weekly if not daily basis, and shades of meanings are added to existing words and phrases simply by being used in a milieu or by a sufficiently famous person. Symbols might be used as hidden variables representing concepts, but to fully represent how vague and allusive language is, you've got no choice but to make everything fractional. I don't see the result being anything other than isomorphic with a vector.

In particular, words are not repositories of meaning. They allude to concepts; new concepts are created and get forgotten on a regular basis. The connection between words and concepts waxes and wanes over time, and even the very timeline of a connection's strength can be used for allusion: using language to represent concepts that can only be coherently mapped by using previously-stronger allusions conveys a sense of being old-fashioned, while the reverse conveys future-thinking. Using allusions that are stronger within a milieu conveys social signalling information about group membership. Etc.

There's no way a human-maintained database is going to capture the subtlety here on anything like a timely basis. There's no universal truth, everyone's map is a little bit different, and the map is changing all the time.

Yet we've evolved to use symbols, which are short sequences of phonemes or glyphs, and to connect these together to form sentences, which are lists of symbols.

People who speak the same language are able to communicate perfectly well, almost all of the time, across continents and centuries. New words are rare compared to existing vocabulary and often soon disappear from use. New concepts can readily be mapped onto existing vocabulary. People are able to learn other languages and improve their own with the help of dictionaries and grammar books. Things like humour, cryptic crosswords, and social signalling are edge cases. And things like deception are unrelated to language understanding at the semantic level.

We're discussing the best way for computers to understand natural language and communicate with people, and in practice that's either going to use unambiguous language or it's going to need human help or verification.

We've evolved to use symbols to communicate (to transfer information), not to think using symbols. We use symbols as a way to transfer information over slow, lossy channels. Inevitably some information is going to get lost along the way, either because the symbols don't represent the meaning fully, or because if they did, they would be too long and cumbersome to use. Speed won over precision.

People who speak the same language are still prone to misunderstandings.

Take the word "soon" in your example. How many unambiguous meanings does it have?

We may be overestimating our success in communication. Heck, I'm not even sure if my understanding of "soon" equals yours. Is it a century, a few decades, a few years? And why is it so different from the meaning when I use it to answer when lunch will be ready?

>in practice that's either going to use unambiguous language or it's going to need human help or verification.

The former doesn't exist in human languages (we'd otherwise have gotten rid of the lawyers long ago), and the latter is infeasible. There is another way.

>My only problem with it is that it doesn't understand the meaning of the words it's translating.

Wouldn't that mean it was an actual full AI?

What you listed is called a logical form (and there can be many). A common logical form is CCG/DRS. Another representation that is a logical form, though I don't see many people admit it, is the RDF triple, used in Freebase etc. You could represent the entire Freebase in your notation.

There is work being done to merge the formal/structural semantics and distributional semantics. I have been working on that for over a year at my startup (not necessarily having successes, mind).

The benefit of using distributional semantics (word embeddings, etc) is GPU processing and libraries. My logical form parsing library uses dynamic programming and chart parsing with a LOT of tree pruning to parse a simple sentence, while distributional semantics merely requires me to multiply 2 matrices/vectors together - something GPUs excel at.
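To make the "multiply 2 matrices/vectors" point concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the names `EMBEDDINGS`, `matvec` and `similarities` are invented for illustration; a real model would have hundreds of dimensions and would hand the same product to a GPU library.

```python
import math

# Toy embeddings; the numbers are made up for illustration only.
EMBEDDINGS = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "paris": [0.0, 0.1, 0.9],
}

def normalize(v):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def similarities(query):
    """Cosine similarity of `query` to every word, as one matrix-vector product."""
    words = list(EMBEDDINGS)
    matrix = [normalize(EMBEDDINGS[w]) for w in words]
    scores = matvec(matrix, normalize(EMBEDDINGS[query]))
    return dict(zip(words, scores))

scores = similarities("cat")
```

Comparing one word against the whole vocabulary is a single matrix-vector product, with no parse tree involved.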

The frames in my previous post could be implemented in a graph database, or RDF triples. It's similar to the representation used in Cyc.

I'm quite happy with a hybrid symbolic/statistical approach, which could be useful for disambiguating words and phrases in context.

I think the trick is to avoid generating the parse tree in the first place.

Today's hardware is pessimized for symbol and list processing. They can't be done on GPUs, and CPUs work better on contiguous data.

I wonder, if you could train on just Disney movie scripts, would

King – man + woman = Princess?

Or maybe: King – man + woman = villain?

(because of characters like Cruella, Maleficent, Ursula, etc.)
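For anyone who wants to play with the question, the arithmetic itself is simple. Below is a sketch with hand-made two-dimensional vectors (first coordinate roughly "royalty", second roughly "gender"); these are assumptions for illustration, not trained values. A model trained on a different corpus would have different vectors, so the nearest word could well come out differently.

```python
import math

# Hand-made toy vectors: [royalty, gender]. Not from any trained model.
VECTORS = {
    "king":     [1.0,  1.0],
    "queen":    [1.0, -1.0],
    "man":      [0.0,  1.0],
    "woman":    [0.0, -1.0],
    "prince":   [0.7,  1.0],
    "princess": [0.7, -1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

result = analogy("king", "man", "woman")
```

With these toy numbers `result` is "queen", but "princess" is a close second, which is why the training corpus matters so much.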

Here's word2vec as REST API (sign in to interact from the browser): https://algorithmia.com/algorithms/nlp/Word2Vec

I didn't know that one. Is it Google's word2vec dataset?

Yep, based on Google News. Model is 1.5GB when zipped.

Can someone explain a bit more about the vector space model for words (and documents)? I first saw that approach in Prof. Erik Demaine's lecture on algorithms [1], and also here. It's fascinating how linear algebra and vector spaces pop up in unexpected places.

[1] https://courses.csail.mit.edu/6.006/spring11/lectures/lec01....

For documents there are various approaches. I would suggest using Latent Dirichlet Allocation, see my links here: https://pinboard.in/search/u:pmigdal?query=lda.
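As a simpler starting point than LDA, one common way to treat documents as vectors is to count words and compare the count vectors by the cosine of the angle between them. This is a minimal sketch of that idea; the function names and the sample sentences are made up for illustration.

```python
import math
from collections import Counter

def word_counts(text):
    """Represent a document as a sparse vector of word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

doc1 = word_counts("the cat sat on the mat")
doc2 = word_counts("the cat sat on the sofa")
doc3 = word_counts("stock prices fell sharply today")

near = cosine_similarity(doc1, doc2)  # documents sharing most words
far = cosine_similarity(doc1, doc3)   # documents sharing no words
```

Each distinct word is one dimension, so documents that share many words point in similar directions, and documents with no words in common are orthogonal.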

And here I thought this was a statement about the strange rules of succession in modern western monarchy. Riddle me this: why are women who marry kings made queens while men who marry queens are made only princes?

Because, like a deck of cards, king trumps queen in the ranking of titles and they want to place the person who married into the family below the one who was born into it.

Close, but I would say that "queen" covers two different jobs: being a female monarch and the older 'wife to the king'. Some queens are monarchs and some are only wives, with no right to rule on their own (i.e. they are not daughter to any previous monarch). Kings are only ever kings by right.

I believe there is such a thing as the king-consort.

Interesting how in the diagram at the top all of the female words are either less than or equal to their male counterparts on the queen axis.

It's interesting but ultimately to really 'under' 'stand' language in a deep way, the systems will need representations based on lower-level (possibly virtual) sensory inputs.

That is one of the main enablers for truly general intelligence because it's based on this common set of inputs over time, i.e. senses. The domain is sense and motor output and this is a truly general domain.

It's also a domain that is connected to the way the concepts map to the real physical world.

So when the advanced agent NN systems are put through their paces in virtual 3d worlds by training on simple words, phrases, commands, etc. involving 'real-world' demonstrations of the concepts then we will see some next-level understanding.

As a Pole you should know better. It isn't true in all cases, e.g. Jadwiga was king, not queen.

I've always found it funny what these machine learning insights mean for how humans think.

You have people who focus on grammar and spelling. But word embeddings collect their insights by taking any sequence of 5 words, taking out the middle word, and jumbling up the result (technically they express it in a way that ignores order: the context is expressed as 1 bit per word, where 1 means the word is in the sentence and 0 means it's not, so the sequence of the words in the input to the network is completely independent of their sequence in the sentence). And they understand that "king is to man as queen is to woman" and lots of other things.
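The encoding described above can be sketched in a few lines. This follows the comment's simplified description (a 5-word window, middle word removed, context as one bit per vocabulary word) and is not the word2vec code itself; `context_windows` and `multi_hot` are names made up for this example.

```python
def context_windows(tokens, size=5):
    """Yield (middle_word, surrounding_words) for every full window of `size` tokens."""
    half = size // 2
    for i in range(half, len(tokens) - half):
        context = tokens[i - half:i] + tokens[i + 1:i + half + 1]
        yield tokens[i], context

def multi_hot(context, vocab):
    """One bit per vocabulary word: 1 if it occurs in the context, 0 if not."""
    present = set(context)
    return [1 if word in present else 0 for word in vocab]

tokens = "the king spoke to the queen".split()
vocab = sorted(set(tokens))  # ['king', 'queen', 'spoke', 'the', 'to']

pairs = list(context_windows(tokens))
target, context = pairs[0]
bits = multi_hot(context, vocab)
shuffled_bits = multi_hot(list(reversed(context)), vocab)
```

Here `bits` and `shuffled_bits` are identical: once the context is reduced to presence bits, the network has no way to see the original word order.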

When going deeper you quickly start to realize a few things: in 90% of sentences the sequence of words does not matter. No, not even if "not" appears before or after the verb (and thus refers to the subject or object of the sentence). Word sequence. Doesn't matter. Which noun you place an adjective next to. It is semantically important. Really important. Every English (or any language I imagine) teacher will hammer the point home again and again. And yet ... it almost never matters, in the sense that getting it wrong will not cause something dumber than a human to misinterpret the resulting sentence. So why do we care? Social reasons (i.e. to fuck other humans, or more generally, to get them to do stuff for us).

It's a weird thing that keeps coming back in machine learning. Humans think their reasoning is high level. Yet algorithms that keep track of maybe 2 or 3 variables per individual can predict the actions of crowds with uncanny accuracy. Tens to hundreds of thousands of people, each believing they're individuals and thinking about what they're doing, take not just the same decision, but with an enormous probability will come to that decision within minutes of each other.

I am reminded of a quote by Churchill. Humans appear smart individually, but it's a trick, an impression, it's a facade, almost an illusion. If they act in a group, said intelligence is utterly gone, and they almost always act in dumb ways in large groups, even when they are acting alone. Intelligence is 95% a parlor trick used in conversation, to make friends, or to mate, like a peacock's feathers, and only 5% or less something we actually use to act. So its purpose, from a species' perspective, is not at all to act intelligent, merely to appear intelligent to others. Second is that everyone, even if they are smart and correctly reason about the world around them, will still act stupid. Without someone to impress, you could have a triple Nobel prize, you won't act it. So intelligence doesn't work in an individual, and it doesn't work in most groups. It only works in groups where the interaction of the group has people impressing each other with what they did, with some sort of reward being given for that.

How can it be semantically important yet not matter? There are in fact languages where word order is very free, but English is not one of them, e.g. "Bob shot Alice" is very different from "Alice shot Bob". Which noun you place an adjective next to doesn't matter? So "The dumb teacher taught the student" means the same thing as "The teacher taught the dumb student"?

No, it means that in the majority of cases the context is clear regardless of the word order, so a neural network can learn the meaning despite jumbling the word order.

"attacks mouse cat"

"attacks cat mouse"

"cat attacks mouse"

"cat mouse attacks"

"mouse cat attacks"

Only "cat mouse attacks" could mean something that's even slightly different.
