AI is kind of a strange beast like that: it's gone through a few very
different phases and it's difficult for one person to understand all of them
equally well. Which of course makes it even harder to avoid reinventing wheels
and repeating mistakes. A bit of history would do us all a world of good.
Btw, I'm getting the feeling most people here will probably hear of Fernando
Pereira for the first time, but he has a very long career in AI and NLP. He was
a prominent symbolicist, with some important contributions to logic
programming (he was one of the co-founders of Quintus, the company that sold
the first commercial Prolog, along with Warren, Byrd and others). Then he
turned to statistical AI and now he's a VP at Google (a.k.a. the den of the
connectionists, if I may be so bold). He's probably one of the few computer
scientists around who understands both symbolic and statistical AI in equal
measures. If anyone is qualified to talk about their relative merits, it's him.
(And if I sound like a bit of a fangirl, that's because I basically am.
Pereira is one of my logic programming heroes and a great teacher to me,
albeit unbeknownst to him :)
And here's the meat of his response:
> Idea! Let's go back to toy problems where we can create the test conditions easily, like the rationalists did back then (even if we don't realize we are imitating them). After all, Atari is not real life, but it still demonstrates remarkable RL progress. Let's make the Ataris of natural language!
> But now the rationalists converted to empiricism (with the extra enthusiasm of the convert) complain bitterly. Not fair, Atari is not real life!
> Of course it is not. But neither is PTB, nor any of the standard empiricist tasks, which try strenuously to imitate wild language
For me the problem with NLP and deep learning, or indeed any empirical method, is that the evaluation metrics we have are imperfect. Take BLEU scores, from Goldberg's post, for instance. Those basically compare generated text to some arbitrary target. Originally, they were proposed as metrics of machine translation quality, so the target was some existing translation and the machine-generated translation was examined for coverage of this human-made translation. But of course, there is no principled way that we know of to choose one translation over another, or even to say whether a translation is good or bad on its own. And that's true for translations by humans also. Give the same text to 10 professional translators and they'll give you 10 different translations. Then give each of their translations to 10 readers and ask for their opinion, and you get back 100 different opinions.
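To make the arbitrariness concrete, here's a minimal sketch of BLEU-style scoring (simplified to bigram-level modified precision plus a brevity penalty, not the full corpus-level formula): a perfectly valid paraphrase scores near zero simply because it shares few n-grams with the single reference.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    # modified n-gram precision: candidate counts clipped by reference counts
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # brevity penalty punishes candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
print(bleu("the cat sat on the mat".split(), ref))   # 1.0: exact match
print(bleu("a feline rested on a rug".split(), ref)) # near zero, despite being a fine paraphrase
```

The second candidate conveys the same meaning, but the metric can only count surface n-gram overlap against one arbitrary target.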
The translation task itself is not even particularly well defined, exactly because there may be any number of valid translations (possibly, infinitely many) of a piece of text in another language. So, with translation, we have an ill-defined task with an arbitrary metric. And that metric of course is lifted from its original task and used to evaluate language generation and so on. Then someone comes along who knows how to train a deep net but has no idea what the purpose of their chosen metric is, or what it does and has no understanding of the task itself- and claims to have solved it because they got good results on that metric.
It's a bit of a methodological mess that's not going to lead to much progress. People can keep piling on these "results" for as long as they like and pretend that they're "solving" this or that problem- but in real-world terms, nothing is really being solved at all.
I am Greek and translations from and to my language are utterly ridiculous, on the level of Bozo the clown doing the translation with his underpants on his head back to front.
Typical example: I put in the Greek word for "swallow", the bird, and ask for the French translation. I get back the word "avaler" - the French word for "to swallow", the verb.
That's my little benchmark there, useful because Google translate has been doing this consistently, for a good few years, before it used neural networks, before it started claiming its setup essentially constitutes an "interlingua" etc etc.
Note that the bird and the verb sound nothing like each other in Greek or French. They sound the same only in English, so GT goes from Greek to French through English, because it doesn't have enough parallel texts to go directly to French. And so it sucks, because it doesn't have enough data. Ask native speakers of other non-English languages with relatively few speakers, perhaps Turkish or Hungarian; I'm pretty sure you'll find they have similar experiences.
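The failure mode can be sketched with toy dictionaries (these tiny mappings are illustrative, not Google Translate's actual pipeline): the English pivot entry collapses the noun and the verb into one string, so the distinction that exists in both Greek and French is lost in transit.

```python
# toy pivot-translation tables: Greek -> English -> French
# "χελιδόνι" is the bird (French "hirondelle"); "καταπίνω" is the verb
el_to_en = {"χελιδόνι": "swallow", "καταπίνω": "swallow"}
en_to_fr = {"swallow": "avaler"}  # English entry conflates noun and verb

def pivot_translate(greek_word):
    # going through English discards the noun/verb distinction
    return en_to_fr[el_to_en[greek_word]]

print(pivot_translate("χελιδόνι"))  # "avaler": the verb, not the bird
```

Once the two Greek words collapse to the same English string, no downstream model can recover which sense was meant.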
So I don't know what metric they use to evaluate their results, but it doesn't seem to be a particularly good metric of translation quality. Maybe they just care more about how many people use their system and optimise for that, rather than going for quality, which is much harder to measure.
Btw- no, I can't be sure of that. Why do you say I can? Do you know what metrics they use?
The main problem for natural language understanding is not parsing, and not even the semantic and pragmatic representations per se; it has always been the understanding itself. This requires an adequate knowledge representation and the drawing of inferences from it, and I don't believe that any substantial advances have been made in that field. Computational ontologies have grown larger and there are more "frameworks" than you can count, but none of them offers much new, and promising approaches like geometric meaning theories are in their infancy. Knowledge representation and, generally speaking, the problem of how to integrate different information sources in useful ways are essentially unsolved problems.
Just my 2 cents. Note that I'm talking about the principal problems, not about specific practical applications for which you can use the statistical sledgehammer to some extent.
Apologies, I'm an outsider to the field, but what exactly are you referring to here ? The whole vector-space semantic embedding that was popularized by works like word2vec ?
Peter Gärdenfors: Conceptual Spaces - the Geometry of Thought. MIT Press 2000 (Paperback 2004).
It is very easy reading. The problems of geometric meaning theory are compositionality and quantification - how to get the expressivity of logical representations in addition to nearness measures, fuzziness and so on. There are some interesting approaches:
Martha Lewis & Jonathan Lawry: Hierarchical conceptual spaces for concept combination. Artificial Intelligence 237 (2016): 204-227.
Diederik Aerts, Liane Gabora, Sandro Sozzo: Concepts and their dynamics: a quantum-theoretic modeling of human thought. Topics in Cognitive Science 5 (4) (2013):737-772. [and other work by Aerts]
Aerts's work fascinates me personally, but it's unfortunately above my level of mathematical maturity. This is a general problem in this literature: maybe some solutions are already there, but they also need to be presented in a way that allows linguists to understand and use the methods. Montague was lucky (well, not personally, of course), because he had scholars who were able to package his dense ideas into more verbose, easier-to-access textbooks.
Another short book worth reading in my opinion, though very programmatic in nature:
Jens Erik Fenstad: Grammar, Geometry, & Brain. CSLI Publications 2009.
All semantics has ever been about is not causing parse errors during the decoding step of the sentence, and the constraints imposed on that.
'Syntax' is usually confined to "low level" concerns, while 'semantics' to those above, but the distinction is arbitrary and artificial.
There is no meaning but usage.
(If it isn't clear, this comment is being snide toward the GP.)
What you said made little to no sense and had no backing; it was a perfect example of layman speculation without any basis. It doesn't even deserve a response.
Thank you for all the work you do moderating this community.
However, that distinction is arbitrary -- there is only the question of whether the sentence is accepted by an agent (eg, person) as a well-formed sentence.
Any full accounting of the class of well-formed sentences must embed the semantic concerns; violating semantics is a syntax error (albeit, not usually a "first order" one). Similarly, even base syntax, such as the subject/verb/object distinction and ordering is carrying semantic information about word usage. The distinction between the two is non-existent: a full accounting of either must embed the other.
So semantics is syntax -- if you write a system of rules that only accepts valid sentences, then the rules will end up carrying the semantic structure of the language in them.
I suppose I left out why this might matter --
In the quest to build an AI that understands semantics (ie, that "understands meaning"), we can bypass attacking that problem directly by training a model (eg, a NN) on the full acceptance task (joined syntax and semantics -- classifying a sentence as proper English or not), and then truncating the network away from the "low level" features (and perhaps, at the other side, away from the final 'yes' or 'no') to extract a network that has (most of) the abstract semantic structure embedded. We could then use these "middle level" features as a sort of Rosetta stone: training low level networks to embed content for them to understand, and high level networks to use their output in decisions, repurposing "understanding" across tasks.
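A minimal sketch of that truncation idea, with a toy two-layer network (the weights here are random stand-ins; in practice they would come from training on the acceptance task): we run the forward pass, discard the output layer, and keep the hidden activations as the "middle level" representation.

```python
import math
import random

random.seed(0)

def forward(x, W1, W2):
    # hidden layer: the "middle level" features we want to extract
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    # output layer: the acceptance decision, discarded after training
    logit = sum(w * h for w, h in zip(W2, hidden))
    return hidden, 1 / (1 + math.exp(-logit))

dim_in, dim_hidden = 4, 3
W1 = [[random.uniform(-1, 1) for _ in range(dim_in)] for _ in range(dim_hidden)]
W2 = [random.uniform(-1, 1) for _ in range(dim_hidden)]

sentence_features = [0.2, -0.5, 0.9, 0.1]  # stand-in for an encoded sentence
embedding, accept_prob = forward(sentence_features, W1, W2)
# `embedding` is the truncated network's output; `accept_prob` is thrown away
```

Whether such hidden activations actually carry "semantic structure" is exactly the open question the thread is debating.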
I would argue that using things like Word2Vec (or the resulting vector space of words) is a similar idea.
2) Me gizmo.
One of the above sentences is valid English, the other one is meaningful. That is the syntax/semantics distinction.
The distinction is a bit fuzzy in places (for instance, inflectional morphology), but does exist.
You contend that the top sentence is valid English; I disagree. It conforms to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use. Not being one that an English speaker would use makes it invalid English -- it's just a case where the first order approximation is wrong.
Similarly, if your syntax rules reject the second sentence, they're wrong -- since it's a sentence that English speakers can parse: the conclusion can only be that your syntax rules don't actually match the language you're trying to model.
I get the distinction that you're trying to point out with syntax/semantics, but you're ignoring my point: that divide is artificial and 'semantics' as you mean it is merely higher order syntax.
You haven't shown there's an inherent meaning to the difference (ie, that you haven't just drawn an arbitrary line in the sand), just that you can find examples that (naively) fall on different sides of it.
It is conceivable that there is some other language (eg. not natural human language) which does not have a syntax/semantics distinction, but that hypothetical language is not what linguistics studies.
You haven't pointed out how semantics is anything but higher order syntax -- merely outlined the way in which higher order syntax interacts with our perception.
I agree that there's a difference between the two sentences -- I disagree that it's because they're different fields of study instead of different edge cases of the same underlying notion of parsing syntax. (I especially disagree that the way forward on teaching machines language involves that distinction.)
I would appreciate you referring me to references on the semantic/syntax divide being "natural", though.
Well, an English speaker did use it.
Btw, who do you consider an "English speaker"? Do I count as an English speaker? My native language is Greek but I speak English as a foreign language. I often say things that a native English speaker wouldn't say- but they convey a meaning that I wish to express. Do these utterances count as things that "an English speaker would use", or not?
I say they do. English speakers can say anything they like. In fact, they do, everyday, and as they do their language changes along with that.
Human language seems to be a lot more flexible than you give it credit for. Semantics being just some sort of higher-order syntax (which, btw, we just haven't found yet) would make for a much more limited language ability than what we currently have. We'd be restricted to only a finite set of forms and could only say a finite number of things. Obviously, that's not the case.
An English speaker used it as an example of a statement that would cause a parse error for most English speakers, and so it did. The speaker said it even caused such a reaction in them. I would argue that they weren't attempting to use English, but quasi-English in an attempt to communicate the boundaries of English to people who can parse English (which inherently has some ability to parse quasi-English).
I don't think it's useful to pretend "English" is a coherent class of parsing rules (either over time or over population) -- there's only a roughly similar set of parsers undergoing continuous memetic evolution, broken up into subsets that are more similar.
At the end of the day, English is as people who can parse some subset of it do -- and it might reach the point where it makes more sense to talk about English languages than an English language.
That being said, your last paragraph confuses me:
It's not obvious to me that we aren't restricted to a finite number of forms in language.
It's not clear to me why you think semantics being higher order syntax requires that it only be capable of finitely many forms.
(The rest of it seems dependent on those two conclusions.)
[diagram omitted: a sentence that reads as a parse error in isolation but parses with the accompanying diagram]
It would be correct to say that the sentence in isolation is a parse error, but with the diagram, it's merely elaborate syntax.
My point isn't that there aren't higher order rules (and approximations) -- just that the division of those rules into a separate area of study is artificial.
It's not contentious to point out that chemistry is just an approximation of physics because the actual higher order rules are too complex to study directly -- but it seems to be contentious to point out the same about semantics and syntax.
Syntax and semantics, or structure and meaning, are completely different things- in fact they are entirely unrelated and their only association is by arbitrary convention (we all agree that certain structures are associated with specific meaning).
This is why you can do translation, for example- where you're essentially taking the semantics out of one kind of syntax and putting it into another.
In NLP it's easy enough to reproduce the structure of a corpus- simple, unsmoothed n-grams will do that well enough already, and with a little more statistical elbow grease you can train a model that reproduces your text very well and even generates new text that looks quite reasonable. Except of course that it rarely makes any sense at all. To generate text that is both syntactically correct and makes sense you need a lot more than that, and we haven't really managed it except for very short stretches (a few words at a time).
I'm saying: in NLP we can deal with structure very well indeed, but meaning is still a long way off. If it was just a matter of "more syntax", we'd have solved all our problems a long time ago.
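For illustration, here's about the simplest version of that: an unsmoothed bigram model (toy corpus and seed are arbitrary). Every adjacent word pair it emits was seen in training, so the local structure is faithful, but nothing constrains the output to make global sense.

```python
import random
from collections import defaultdict

def train_bigrams(tokens):
    # unsmoothed bigram model: record every observed successor of each word
    model = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=10, seed=0):
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        nexts = model.get(out[-1])
        if not nexts:
            break  # dead end: word never seen with a successor
        out.append(rng.choice(nexts))
    return " ".join(out)

corpus = "the dog chased the cat and the cat chased the mouse".split()
model = train_bigrams(corpus)
print(generate(model, "the"))  # locally plausible, globally meaningless word salad
```

Each step is licensed by the corpus, which is exactly why the output looks like the corpus while meaning nothing.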
If semantics were actually distinct and carried by both languages, you could translate without that loss of subtle meaning.
I also completely disagree with your last sentence: semantics could easily be syntax whose rules are hard to compute.
NLP has trouble with long-range (or broad) effects, which are (some of) what I mean by higher order syntax.
Of course you can call the rules of well-formedness "higher-order syntax" in the sense that the computation required to decide it is of a higher order than syntax, but the distinction between syntax and semantics is by no means unnatural. It has been discovered independently several times; some ancient studies of the syntax of Sanskrit have survived to this day.
That's a very strange thing to say. The thing with human language is you can say anything you like, including things that make no sense at all and things that are syntactically incorrect. You can easily find examples of meaningless, syntactically correct sentences, like Jabberwocky ("All mimsy were the Borogoves and the mome raths outgrabe" etc). It's also easy to find examples of sensible sentences with incorrect structure (see twitter.com).
In fact, what counts as "incorrect syntax" keeps changing all the time, but we can still say the same things as we always could (plus probably infinitely many new things besides). If syntax were tied to meaning as tightly as you say, we'd probably have only one or two languages and no dialects. Language would be a static, unchanging thing and we'd need no NLP, or translators, etc.
Why not start working with more structured, agglutinative* languages like Japanese/Korean and the Indic family (Sanskrit especially)?
How about other European languages? Are they better structured empirically? I hear German is very grammatical, and that Hungarian is... erm, odd?
(* Note: I know the occidental tradition likes to split Indic tongues, and the Indo in Indo-European is not considered agglutinative. I don't subscribe to this view. I use agglutinative in the sense of Panini: "particles" sticking to stems/roots/words - phonetic modifications are irrelevant for grammar.)
Just want to point out that "grammatical" probably isn't the word you want here. Every language is grammatical by definition in the sense that there are rules that govern its sound system, word formation system, syntax, etc.
The concept you're getting at, though--that some languages are easier for computer programs and/or speakers of Indo-European languages to understand--is sound.
I don't know too much about computational linguistics but it seems highly analytic languages could be easier to work with, but I'm not sure.
FWIW it seems the structure you're talking about exploiting is at a morphological and syntactic level, which modern language models tend to effectively handle. Semantics are a much harder problem.
I do not think that is correct. Anaphora exists in many languages. Check out the Anaphora article on wikipedia and click on different language versions. There are example sentences for many languages.
There are translations of the Winograd Schemas into a couple of languages. Granted, I found some of the translations a little unnatural in some cases, but they are still understandable and expose the problem.
This is true in particular of anything that pertains to reasoning and knowledge representation. People still are trying to "infer rules" and do logical, rather than probabilistic reasoning. I get why that is. To me though, the kind of real life reasoning that humans do seems heavily probabilistic and contextual, Bayesian almost. And there's next to no notable work going on in that direction.
That is because it's very hard to collect statistics on something that you
can't really quantify- meaning, in this case.
There was a thread on HN a couple of days ago about a blog post where someone
was experimenting with, among other things, training an LSTM network to generate Java programs.
In one example, the LSTM did really well in reproducing the structure of a
Java program, with import declarations, followed by a class implementing an
interface with a few methods with structured comments and throws declarations
and everything- and even a test!
On the other hand, this program was completely useless. From a cursory glance
it would probably not even compile (e.g. it referred to undeclared variables
etc). There was one method named "numericalMean()" that took a single double
and returned an (undeclared) variable "sum". The class had a nonsensical name
- "SinoutionIntegrator". The test was testing something called "Cosise",
presumably a method- but not one defined in the class. In short- a mess.
That might sound a bit harsh, but I think it's a very good example of why
statistical NLP is really bad at doing meaning: because there is nothing, not
a shred, of meaning in examples of the data we use to train statistical models
of language, i.e. text.
Because, you see, the relation between meaning and text (and even spoken
language) is completely arbitrary. Or, to put it in another way, there are
potentially an infinite number of valid mappings between structure and
meaning, of which we, human beings, somehow by convention or some other crazy
mechanism, have agreed to use just one. And even though the various forms
language entities take (inflections etc) are used exactly to convey meaning,
right, the rules of how meaning varies with structure are, again, completely
independent from structure itself.
Now, we have done very well in modelling structure, from examples of it (which
is what text is). But it's completely unreasonable to expect our algorithms to
be able to extract meaning from it also.
And that is why people are still trying to put down the rules of meaning by
hand. Because that's the only way we can think of, currently, to process meaning.
As far as I'm aware there is work underway to take logical constructions and integrate them with probabilistic machine learning to do things like force zero probabilities in impossible input cases. That is, encoding domain knowledge into the model directly in the form of symbolic reasoning.
I mean, even Bayesian nets require some encoding of causality, right? Maybe I'm reading too much "blah symbolic reasoning is worthless" into your comment?
> We propose the Probabilistic Sentential Decision Diagram (PSDD): A complete and canonical representation of probability distributions defined over the models of a given propositional theory. Each parameter of a PSDD can be viewed as the (conditional) probability of making a decision in a corresponding Sentential Decision Diagram (SDD). The SDD itself is a recently proposed complete and canonical representation of propositional theories. We explore a number of interesting properties of PSDDs, including the independencies that underlie them. We show that the PSDD is a tractable representation. We further show how the parameters of a PSDD can be efficiently estimated, in closed form, from complete data. We empirically evaluate the quality of PSDDs learned from data, when we have knowledge, a priori, of the domain logical constraints.
Still working on my understanding but Professor Darwiche gave a lecture on the material in one of my classes. Salient bit:
> The problem we tackle here is that of developing a representation of probability distributions in the presence of massive, logical constraints. That is, given a propositional logic theory which represents domain constraints, our goal is to develop a representation that induces a unique probability distribution over the models of the given theory.
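The core idea can be sketched in miniature (a toy three-variable theory with made-up weights, not the actual PSDD construction): the distribution is normalized only over the models of the propositional theory, so worlds that violate the logical constraints get probability exactly zero rather than merely a small value.

```python
from itertools import product

# toy propositional theory over (a, b, c): the single constraint a -> b
def satisfies(a, b, c):
    return (not a) or b

# made-up per-variable weights, standing in for learned parameters
weight = {"a": 0.7, "b": 0.6, "c": 0.5}

def model_weight(a, b, c):
    w = 1.0
    for name, val in (("a", a), ("b", b), ("c", c)):
        w *= weight[name] if val else 1 - weight[name]
    return w

# enumerate only the models of the theory, then normalize over them
models = [m for m in product([False, True], repeat=3) if satisfies(*m)]
z = sum(model_weight(*m) for m in models)
dist = {m: model_weight(*m) / z for m in models}

# impossible worlds (a true, b false) simply do not appear in the support
assert all(satisfies(*m) for m in dist)
```

Brute-force enumeration obviously doesn't scale; the point of representations like the PSDD is to get this behaviour tractably for large theories.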
What were the main "practical" approaches for natural language understanding back then?
Does it have something to do with the game-playing AI from OpenAI? And if so, how is that even related to NLP?