A computational linguistic farce in three acts (earningmyturns.org)
101 points by lx on June 11, 2017 | 52 comments



So, I understand this blog post is about something else completely (the internet argument started by Yoav Goldberg on Medium, reportedly) but for me the really interesting part is the historical information in it. I wish Fernando Pereira could find the time to expound a bit on all those parenthetical notes in his blog post, perhaps even write a short book on the history of AI.

AI is kind of a strange beast like that: it's gone through a few very different phases and it's difficult for one person to understand all of them equally well. Which of course makes it even harder to avoid reinventing wheels and repeating mistakes. A bit of history would do us all a world of good.

Btw, I'm getting the feeling most people here will probably be hearing of Fernando Pereira for the first time, but he has had a very long career in AI and NLP. He was a prominent symbolicist, with some important contributions to logic programming (he was one of the co-founders of Quintus, the company that sold the first commercial Prolog, along with Warren, Byrd and others). Then he turned to statistical AI and now he's a VP at Google (a.k.a. the den of the connectionists, if I may be so bold). He's probably one of the few computer scientists around who understands both symbolic and statistical AI in equal measure. If anyone is qualified to talk about their relative merits, it's him.

(and if I sound like a bit of a fangirl- that is because I basically am. Pereira is one of my logic programming heroes and a great teacher to me, albeit unbeknownst to him :)


This relates to the big Twitter uproar over this blog post:

https://medium.com/@yoav.goldberg/an-adversarial-review-of-a...

And here's the meat of his response:

> Idea! Let's go back to toy problems where we can create the test conditions easily, like the rationalists did back then (even if we don't realize we are imitating them). After all, Atari is not real life, but it still demonstrates remarkable RL progress. Let's make the Ataris of natural language!

> But now the rationalists converted to empiricism (with the extra enthusiasm of the convert) complain bitterly. Not fair, Atari is not real life!

> Of course it is not. But neither is PTB, nor any of the standard empiricist tasks, which try strenuously to imitate wild language


My reading is that Pereira doesn't think that deep learning has quite conquered language, and in this he's in complete disagreement with both Goldberg's and LeCun's side (who both champion deep learning for NLP and claim that it has led to great advances in the field).

For me the problem with NLP and deep learning, or indeed any empirical method, is that the evaluation metrics we have are imperfect. Take BLEU scores, from Goldberg's post, for instance. Those basically compare generated text to some arbitrary target. Originally, they were proposed as metrics of machine translation quality, so the target was some existing translation and the machine-generated translation was examined for coverage of this human-made translation. But of course, there is no principled way that we know of to choose one translation over another- or even to say whether a translation is good or bad on its own. And that's true for translations by humans also. You give the same text to 10 professional translators, they'll give you 10 different translations. Then you give each of their translations to 10 readers and ask them for their opinion, and you get back 100 different opinions.
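For concreteness, here's a minimal sketch of the modified n-gram precision at the core of BLEU (a toy version for a single n and a single reference; the real metric geometric-averages n=1..4 and adds a brevity penalty):

    from collections import Counter

    def ngrams(tokens, n):
        return [tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)]

    def modified_precision(candidate, reference, n):
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a matching word can't inflate the score.
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        return clipped / max(sum(cand.values()), 1)

    cand = "the cat sat on on the mat".split()
    ref = "the cat is on the mat".split()
    print(modified_precision(cand, ref, 1))  # ~0.71
    print(modified_precision(cand, ref, 2))  # 0.5: bigram precision is lower

Score one perfectly good human translation against another this way and it can come out embarrassingly low- which is exactly the arbitrariness I'm complaining about.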

The translation task itself is not even particularly well defined, exactly because there may be any number of valid translations (possibly infinitely many) of a piece of text in another language. So, with translation, we have an ill-defined task with an arbitrary metric. And that metric of course is lifted from its original task and used to evaluate language generation and so on. Then someone comes along who knows how to train a deep net but has no idea what the purpose of their chosen metric is or what it does, and has no understanding of the task itself- and claims to have solved it because they got good results on that metric.

It's a bit of a methodological mess that's not going to lead to much progress. People can keep piling on these "results" for as long as they like and pretend that they're "solving" this or that problem- but in real-world terms, nothing is really being solved at all.


Bit of an aside: Apparently ChrF – character-level n-gram F-score – is the new hotness in evaluating MT systems http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412
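Roughly, it's an F-score over character n-grams of the candidate and reference. A toy single-order sketch (the paper averages over several n-gram sizes and treats the precision/recall weight beta as a parameter):

    from collections import Counter

    def char_ngrams(s, n):
        s = s.replace(" ", "")
        return Counter(s[i:i+n] for i in range(len(s) - n + 1))

    def chrf(candidate, reference, n=3, beta=2.0):
        c, r = char_ngrams(candidate, n), char_ngrams(reference, n)
        overlap = sum((c & r).values())  # Counter '&' takes minimum counts
        prec = overlap / max(sum(c.values()), 1)
        rec = overlap / max(sum(r.values()), 1)
        if prec + rec == 0:
            return 0.0
        return (1 + beta**2) * prec * rec / (beta**2 * prec + rec)

    print(chrf("the cat sat on the mat", "the cat is on the mat"))

Working at the character level makes it gentler on morphological near-misses than word-level BLEU.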


OK, but this still has the same problem as BLEU- it relies on comparisons to human scores, which are entirely subjective. I'm not saying they're not the best we've got, but it's a big problem for machine translation that the only way to evaluate results is, essentially, eyeballing.


Google Translate is now based on a neural network, and you can be sure they have solid metrics. By analogy, Google search has a large panel of humans whose subjective feedback is used to test the quality of search algorithm variations.


This is something that needs to be repeated until everyone internalises it: for language pairs other than the "easy" ones Google translate sucks.

I am Greek and translations from and to my language are utterly ridiculous, on the level of Bozo the clown doing the translation with his underpants on his head back to front.

Typical example: I put in the Greek word for "swallow", the bird, and ask for the French translation. I get back the word "avaler" - the French word for "to swallow", the verb.

That's my little benchmark there, useful because Google translate has been doing this consistently, for a good few years, before it used neural networks, before it started claiming its setup essentially constitutes an "interlingua" etc etc.

Note that the bird and the verb sound nothing like each other in Greek, or French. They sound the same only in English, so GT goes from Greek to French through English, because it doesn't have enough parallel texts to go directly to French. And so it sucks, because it doesn't have enough data. You can ask native speakers of other languages that are not English, or that have fewish speakers- perhaps Turkish or Hungarian etc. I'm pretty sure you'll find out they have similar experiences.
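The failure mode is mechanical: pivoting through English collapses any sense distinction that English doesn't spell out. A toy sketch (the dictionaries are made up for illustration):

    # Toy pivot translation: Greek -> English -> French.
    # Greek distinguishes the bird from the verb; English spells both "swallow".
    el_to_en = {"χελιδόνι": "swallow",   # the bird
                "καταπίνω": "swallow"}   # the verb
    en_to_fr = {"swallow": "avaler"}     # one sense wins in the English->French table

    def pivot(word):
        return en_to_fr[el_to_en[word]]

    print(pivot("χελιδόνι"))  # "avaler", the verb- not "hirondelle", the bird

Once the two Greek words map to the same English string, nothing downstream can recover the difference.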

So I don't know what metric they use to evaluate their results, but it doesn't seem to be a particularly good metric of translation quality. Maybe they just care more about how many people use their system and try to optimise for that, rather than going for quality, which is much harder to measure.


I'm Polish. I Google-translate even from Slavic languages that are very close to Polish (Ukrainian, Slovak - they're like 50% understandable without translation) into English, not into Polish, because X -> Polish Google translation sucks.


>> you can be sure they have solid metrics.

Btw- no, I can't be sure of that. Why do you say I can? Do you know what metrics they use?


But deep learning networks are being used in production every day, aren't they?


Yes


Veering somewhat off-topic, but this has also sparked a rather frank debate on r/machinelearning about some of the things discussed in the review, in particular arXiv flag-planting:

https://www.reddit.com/r/MachineLearning/comments/6gke6a/d_r...


I kind of disagree with some of the premises in the article. I've seen an HPSG for German in the late 90s that was able to correctly parse, from a syntactic perspective, almost any sentence I could throw at it.

The main problem for natural language understanding is not parsing, and not even the semantic and pragmatic representations per se; it has always been the understanding. This requires an adequate knowledge representation and the drawing of inferences from it, and I don't believe that any substantial advances have been made in that field. Computational ontologies have grown larger and there are more "frameworks" than you can count, but none of them offer much new, and promising approaches like geometric meaning theories are in their infancy. Knowledge representation and, generally speaking, the problem of how to integrate different information sources in useful ways are essentially unsolved problems.

Just my 2 cents. Note that I'm talking about the principal problems, not about specific practical applications for which you can use the statistical sledgehammer to some extent.


Recently Coecke commented on Gärdenfors' geometric meaning theory in the context of his categorical semantics, which I'm finding interesting: arXiv:1608.01402. What I would welcome is a computational link relating that semantics to older semantic-network based ideas. For instance, in arXiv:1706.00526, description logic based knowledge representation is cast in string-diagrammatic, categorical terms, and that at least puts the meaning realm on the same mathy footing.


> geometric meaning theories

Apologies, I'm an outsider to the field, but what exactly are you referring to here? The whole vector-space semantic embedding that was popularized by works like word2vec?


Geometric meaning theories... this sounds like intriguing stuff, could you please point to a decent primer text about it?


References for you and the other poster who asked:

Peter Gärdenfors: Conceptual Spaces - the Geometry of Thought. MIT Press 2000 (Paperback 2004).

It is very easy reading. The problems of geometric meaning theory are compositionality and quantification - how to get the expressivity of logical representations in addition to nearness measures, fuzziness and so on. There are some interesting approaches:

Martha Lewis & Jonathan Lawry: Hierarchical conceptual spaces for concept combination. Artificial Intelligence 237 (2016): 204-227.

Diederik Aerts, Liane Gabora, Sandro Sozzo: Concepts and their dynamics: a quantum-theoretic modeling of human thought. Topics in Cognitive Science 5 (4) (2013):737-772. [and other work by Aerts]

Aerts' work fascinates me personally, but it's unfortunately above my level of mathematical maturity. This is a general problem in this literature: maybe some solutions are already there, but they also need to be sold in a way that allows linguists to understand and use the methods. Montague was lucky (well, not personally, of course), because he had scholars who were able to package his dense ideas in more verbose and easier to access textbooks.

Another short book worth reading in my opinion, though very programmatic in nature:

Jens Erik Fenstad: Grammar, Geometry, & Brain. CSLI Publications 2009.


Semantics is syntax.

All semantics has ever been about is not causing parse errors during the decoding step of the sentence, and the constraints imposed on that.

'Syntax' is usually confined to "low level" concerns, while 'semantics' to those above, but the distinction is arbitrary and artificial.

There is no meaning but usage.


What do you mean?


It doesn't matter what he meant, all that matters is how he said it; your question is meaningless, for after all there is no meaning but usage.

(If it isn't clear, this comment is snide to GP.)


It's also a strawman version of what I said, to the point of being wildly inaccurate.


Oh, you used the religiously verified word of "strawman"!

What you said made little to no sense and had no backing- a perfect example of layman speculation without any basis. It doesn't even deserve a response.


Your comments have repeatedly been violating the HN guidelines by being uncivil and/or unsubstantive and generally nasty. We ban accounts that do this, so please stop doing this, and instead post civilly and substantively (or not at all).

https://news.ycombinator.com/newsguidelines.html

https://news.ycombinator.com/newswelcome.html


You're completely right and I apologize. I'll rein it in.

Thank you for all the work you do moderating this community.


That we artificially decompose the process of accepting a sentence as, eg, proper English into two phases: syntactic correctness and semantic correctness.

However, that distinction is arbitrary -- there is only the question of whether the sentence is accepted by an agent (eg, a person) as a well-formed sentence.

Any full accounting of the class of well-formed sentences must embed the semantic concerns; violating semantics is a syntax error (albeit not usually a "first order" one). Similarly, even base syntax, such as the subject/verb/object distinction and ordering, carries semantic information about word usage. The distinction between the two is non-existent: a full accounting of either must embed the other.

So semantics is syntax -- if you write a system of rules that only accepts valid sentences, then the rules will end up carrying the semantic structure of the language in them.

Ed:

I suppose I left out why this might matter --

In the quest to build an AI that understands semantics (ie, that "understands meaning"), we can bypass attacking that problem directly by training it (eg, a NN) on the full acceptance task (joined syntax and semantics -- classifying a sentence as proper English or not), and then truncating the network away from "low level" features (and perhaps at the other side, focused on 'yes' or 'no') to extract a network that has (most of) the (abstract) semantic structure embedded.

We could then utilize these "middle level" features as a sort of rosetta stone: train low level networks to embed content for them to understand, and high level networks to utilize their output on decisions, repurposing "understanding" across tasks.

I would argue that using things like Word2Vec (or the resulting vector space of words) is a similar idea.
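The geometry does carry a surprising amount of relational structure. A minimal sketch of the mechanics, with made-up 4-d vectors (real embeddings come out of training, e.g. word2vec):

    import numpy as np

    # Toy "embeddings"; the numbers are invented for illustration.
    vec = {
        "king":  np.array([0.9, 0.8, 0.1, 0.3]),
        "queen": np.array([0.9, 0.1, 0.8, 0.3]),
        "man":   np.array([0.5, 0.9, 0.0, 0.1]),
        "woman": np.array([0.5, 0.1, 0.9, 0.1]),
    }

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # The famous analogy: king - man + woman lands nearest to queen.
    target = vec["king"] - vec["man"] + vec["woman"]
    print(max(vec, key=lambda w: cos(vec[w], target)))  # queen

Whether nearness in such a space amounts to "meaning" is, of course, exactly what's being argued about in this thread.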


1) Colorless green ideas sleep furiously.

2) Me gizmo.

One of the above sentences is valid English, the other one is meaningful. That is the syntax/semantics distinction.

The distinction is a bit fuzzy in places (for instance, inflectional morphology), but does exist.


I actually disagree with your assessment.

You contend that the top sentence is valid English; I disagree with that. It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use. Not being one that an English speaker would use makes it invalid English -- it's just a case where the first order approximation is wrong.

Similarly, if your syntax rules reject the second sentence, they're wrong -- since it's a sentence that English speakers can parse: the conclusion can only be that your syntax rules don't actually match the language you're trying to model.

I get the distinction that you're trying to point out with syntax/semantics, but you're ignoring my point: that divide is artificial and 'semantics' as you mean it is merely higher order syntax.

You haven't shown there's an inherent meaning to the difference (ie, that you haven't just drawn an arbitrary line in the sand), just that you can find examples that (naively) fall on different sides of it.


Language is not some inherent property of the Universe; it is an evolved behavior in humans. We can study how humans perform language; and when we do, we find the syntax/semantics distinction to be naturally occurring in humans. For instance, in my example, a native English speaker will find the second sentence "awkward" in a way that they do not for the first sentence. Similarly, a native English speaker will extract a clear meaning from the second sentence in a way that they would not from the first.

It is conceivable that there is some other language (eg. not natural human language) which does not have a syntax/semantics distinction, but that hypothetical language is not what linguistics studies.


...which is an effect of the first satisfying first order approximations while failing higher order rules, while the latter is merely an unusual sentence and so requires more effort to parse because it falls off the "fast path". (It also arguably fails to encode embedded cultural messages present in word choice -- a second consideration for why it feels "awkward": it's valid English, but not my tribe's English.)

You haven't pointed out how semantics is anything but higher order syntax -- merely outlined the way in which higher order syntax interacts with our perception.

I agree that there's a difference between the two sentences -- I disagree that it's because they're different fields of study instead of different edge cases of the same underlying notion of parsing syntax. (I especially disagree that the way forward on teaching machines language involves that distinction.)

I would appreciate you referring me to references on the semantic/syntax divide being "natural", though.


>> It subscribes to the "first order" rules (or an overly simplistic model), but isn't a sentence that an English speaker would use.

Well, an English speaker did use it.

Btw, who do you consider an "English speaker"? Do I count as an English speaker? My native language is Greek but I speak English as a foreign language. I often say things that a native English speaker wouldn't say- but they convey a meaning that I wish to express. Do these utterances count as things that "an English speaker would use", or not?

I say they do. English speakers can say anything they like. In fact, they do, every day, and as they do their language changes along with that.

Human language seems to be a lot more flexible than you give it credit for. Semantics being just some sort of higher-order syntax (which btw we just haven't found yet) would make for a much more limited language ability than what we currently have. We'd be restricted to only a finite set of forms and we could only say a finite number of things. Obviously, that's not the case.


There was an implied "in isolation" on that sentence -- what's proper when other clauses and sentences are included isn't necessarily proper on its own.

An English speaker used it as an example of a statement that would cause a parse error for most English speakers, and so it did. The speaker said it even caused such a reaction in them. I would argue that they weren't attempting to use English, but quasi-English in an attempt to communicate the boundaries of English to people who can parse English (which inherently has some ability to parse quasi-English).

I don't think it's useful to pretend "English" is a coherent class of parsing rules (either over time or over population) -- there's only a roughly similar set of parsers undergoing continuous memetic evolution, broken up into subsets that are more similar.

At the end of the day, English is as people who can parse some subset of it do -- and it might reach the point where it makes more sense to talk about English languages than an English language.

That being said, your last paragraph confuses me:

It's not obvious to me that we aren't restricted to a finite number of forms in language.

It's not clear to me why you think semantics being higher order syntax requires that it only be capable of finitely many forms.

(The rest of it seems dependent on those two conclusions.)


My favorite example where you need semantics to get the syntax is "See the a are of I."

   +---+---+---+---+
   | x |   |   |   | I
   +---+---+---+---+
   |   |   |   |   | K
   +---+---+---+---+
     a   b   c   d
The map shows 8 ares; the a are of I is marked with a cross. (An "are" is a unit of area, and "a" and "I" are grid coordinates -- you can only parse the sentence once you know that.)


You guys seem confused: I'm not claiming that what you're calling semantics doesn't exist; I'm saying that it's merely convoluted syntactical rules, and calling it a different name is misleading.

It would be correct to say that the sentence in isolation is a parse error, but with the diagram, it's merely elaborate syntax.

My point isn't that there aren't higher order rules (and approximations) -- just that the division of those rules into a separate area of study is artificial.

It's not contentious to point out that chemistry is just an approximation of physics because the actual higher order rules are too complex to study directly -- but it seems to be contentious to point out the same about semantics and syntax.


>> You guys seem confused: I'm not claiming that what you're calling semantics doesn't exist; I'm saying that it's merely convoluted syntactical rules, and calling it a different name is misleading.

Syntax and semantics, or structure and meaning, are completely different things- in fact they are entirely unrelated and their only association is by arbitrary convention (we all agree that certain structures are associated with specific meaning).

This is why you can do translation, for example- where you're essentially taking the semantics out of one kind of syntax and putting it into another.

In NLP it's easy enough to reproduce the structure of a corpus- simple, unsmoothed n-grams will do that well enough already, and with a little more statistical elbow grease you can train a model that reproduces your text very well and even generates new text that looks quite reasonable. Except of course that it rarely makes any sense at all. To generate text that is both syntactically correct and makes sense you need a lot more than that, and we haven't really managed to do that except for very short durations (a few words at a time).
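To see what "structure without meaning" looks like, a minimal unsmoothed bigram sketch (toy corpus, invented for illustration):

    import random
    from collections import defaultdict

    # Unsmoothed bigram "language model": locally fluent, globally meaningless.
    corpus = ("the cat sat on the mat . the dog sat on the cat . "
              "the mat was flat .").split()
    nxt = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        nxt[a].append(b)

    w, out = "the", []
    for _ in range(12):
        out.append(w)
        w = random.choice(nxt[w])
    print(" ".join(out))  # e.g. "the cat sat on the dog sat on the mat was flat"

Every two-word window is attested English; the whole rarely says anything.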

I'm saying: in NLP we can deal with structure very well indeed, but meaning is still a long way off. If it was just a matter of "more syntax", we'd have solved all our problems a long time ago.


Translation is hard precisely because semantics is carried by syntax -- when the syntax is radically different, you can only approximate the higher order structures.

If semantics were actually distinct and carried by both languages, you could translate without that loss of subtle meaning.

I also completely disagree with your last sentence: semantics could easily be syntax that has rules which are hard to compute.

NLP has trouble with long-range (or broad) effects, which are (some of) what I mean by higher order syntax.


Syntax is usually used to describe those rules of a language that are easy to compute; so easy that you don't have to understand the meaning (semantics) to do it. E.g. you can point to a missing semicolon in a C program without understanding what the program does.

Of course you can call the rules of well-formedness "higher-order syntax" in the sense that the computation required to decide it is of a higher order than syntax, but the distinction between syntax and semantics is by no means unnatural. It has been discovered independently several times; some ancient studies of the syntax of Sanskrit have survived to this day.
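The missing-semicolon point transposes directly: a parser can certify the form of a program without touching its meaning. A small sketch (Python instead of C, for brevity):

    import ast

    # ast.parse checks well-formedness only; it accepts this function
    # even though what it computes is nonsense.
    src = "def area(radius):\n    return radius + 'circle'\n"
    ast.parse(src)  # fine: syntactically valid Python
    exec(src)
    try:
        area(2.0)
    except TypeError as e:
        print("parses fine, means nothing:", e)

The syntax checker needs no model of numbers or strings; deciding that the program is nonsense does.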


>> violating semantics is a syntax error

That's a very strange thing to say. The thing with human language is you can say anything you like, including things that make no sense at all and things that are syntactically incorrect. You can easily find examples of meaningless, syntactically correct sentences, like Jabberwocky ("All mimsy were the Borogoves and the mome raths outgrabe" etc). It's also easy to find examples of sensible sentences with incorrect structure (see twitter.com).

In fact, what is "incorrect syntax" keeps changing all the time, but we can still say the same things as we always could (plus probably infinitely many new things besides). If syntax was tied to meaning as tightly as you say, we'd probably have only one or two languages and no dialects. Language would be a static, unchanging thing and we'd need no NLP, or translators, etc.


I have to wonder if English is really the best language for NLP research. Things like the Winograd schemas which have attracted a lot of attention simply aren't possibilities in other languages.

Why not start working with more structured, agglutinative* languages like Japanese/Korean and the Indic family (Sanskrit esp.)?

How about other European languages? Are they better structured, empirically? I hear German is very grammatical, and that Hungarian is... erm, odd?

(* Note: I know the occidental tradition likes to split the Indic tongues, and the Indo in Indo-European is not considered agglutinative. I don't subscribe to this view. I use agglutinative in the sense of Panini: "particles" sticking to stems/roots/words - phonetic modifications are irrelevant for grammar.)


> I hear German is very grammatical, and that Hungarian is... erm, odd?

Just want to point out that "grammatical" probably isn't the word you want here. Every language is grammatical by definition in the sense that there are rules that govern its sound system, word formation system, syntax, etc.

The concept you're getting at, though--that some languages are easier for computer programs and/or speakers of Indo-European languages to understand--is sound.


Do you think "analytic" would be a good term here? I heard Mandarin is a very analytic language, maybe that could be a good choice.


"Regular" would be the classic linguistics term, would it not? Although computer science limits the term to the use of regular languages in the Chomsky hierarchy sense (that is, more specifically to regular expressions and the languages they describe), I am under the impression linguistics as a whole treats regularity as a multivariate spectrum. Some languages have more regularity in terms of grammar productions or morphology than English.


i meant analytic in this sense of the word https://en.wikipedia.org/wiki/Analytic_language

I don't know too much about computational linguistics but it seems highly analytic languages could be easier to work with, but I'm not sure.


That points to Isolating [1] and I think highly isolating may be the more useful distinction to this specific example. (Modern English is rather analytic, having dropped most, but not all, inflections in the Middle English era. Mandarin Chinese is much more isolating than Modern English.)

[1] https://en.wikipedia.org/wiki/Isolating_language


One reason is that the amount of training data is many, many orders of magnitude smaller.

FWIW it seems the structure you're talking about exploiting is at a morphological and syntactic level, which modern language models tend to effectively handle. Semantics are a much harder problem.


> Things like the Winograd schemas which have attracted a lot of attention simply aren't possibilities in other languages.

I do not think that is correct. Anaphora exists in many languages. Check out the Anaphora article on wikipedia and click on different language versions. There are example sentences for many languages.

https://en.wikipedia.org/wiki/Anaphora_(linguistics)

There are translations of the Winograd Schemas into a couple of languages. Granted, I found some of the translations a little unnatural, but they are still understandable and expose the problem.

http://www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/...

http://arakilab.media.eng.hokudai.ac.jp/~kabura/collection_j...

http://www.llf.cnrs.fr/winograd-fr


The whole field of NLP and computational linguistics reminds me of that joke where a drunk is looking for his keys under a street lamp instead of where he actually lost them.

This is true in particular of anything that pertains to reasoning and knowledge representation. People still are trying to "infer rules" and do logical, rather than probabilistic reasoning. I get why that is. To me though, the kind of real life reasoning that humans do seems heavily probabilistic and contextual, Bayesian almost. And there's next to no notable work going on in that direction.
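For what it's worth, the update rule itself is the trivial part; the hard problems are the representation and where the numbers come from. A tiny sketch with invented numbers:

    # Bayes update: P(hypothesis | evidence) from a prior and two likelihoods.
    p_h = 0.01                    # prior: hypothesis is true
    p_e_h, p_e_not_h = 0.9, 0.05  # P(evidence | h) and P(evidence | not h)
    posterior = p_e_h * p_h / (p_e_h * p_h + p_e_not_h * (1 - p_h))
    print(posterior)              # ~0.154: strong evidence, still far from certain

Getting those likelihoods out of text and context is the part nobody knows how to do at scale.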


>> People still are trying to "infer rules" and do logical, rather than probabilistic reasoning. I get why that is.

That is because it's very hard to collect statistics on something that you can't really quantify- meaning, in this case.

There was a thread on HN a couple of days ago about a blog post where someone was experimenting with, among other things, training an LSTM network to generate Java programs [1]. In one example, the LSTM did really well in reproducing the structure of a Java program, with import declarations, followed by a class implementing an interface with a few methods with structured comments and throws declarations and everything- and even a test!

On the other hand, this program was completely useless. From a cursory glance it would probably not even compile (e.g. it referred to undeclared variables etc). There was one method named "numericalMean()" that took a single double and returned an (undeclared) variable "sum". The class had a nonsensical name - "SinoutionIntegrator". The test was testing something called "Cosise", presumably a method- but not one defined in the class. In short- a mess.

That might sound a bit harsh, but I think it's a very good example of why statistical NLP is really bad at doing meaning: because there is nothing, not a shred, of meaning in examples of the data we use to train statistical models of language, i.e. text.

Because, you see, the relation between meaning and text (and even spoken language) is completely arbitrary. Or, to put it another way, there are potentially an infinite number of valid mappings between structure and meaning, of which we, human beings, by convention or some other crazy mechanism, have agreed to use just one. And even though the various forms language entities take (inflections etc) are used exactly to convey meaning, the rules of how meaning varies with structure are, again, completely independent from structure itself.

Now, we have done very well in modelling structure, from examples of it (which is what text is). But it's completely unreasonable to expect our algorithms to be able to extract meaning from it also.

And that is why people are still trying to put down the rules of meaning by hand. Because that's the only way we can think of, currently, to process meaning automatically.

________

[1] https://news.ycombinator.com/item?id=14526305


I don't think these two things are mutually exclusive.

As far as I'm aware there is work underway to take logical constructions and integrate them with probabilistic machine learning, to do things like force zero probabilities in impossible input cases. That is, encoding domain knowledge into the model directly, in the form of symbolic reasoning.
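A minimal sketch of that zero-probability trick (masking a softmax; the classes and numbers are made up):

    import numpy as np

    # Wire a logical constraint into a probabilistic model by masking
    # impossible classes before normalizing, so they get exactly zero mass.
    logits = np.array([2.0, 0.5, 1.0, -1.0])
    allowed = np.array([True, False, True, True])  # class 1 ruled out by domain logic

    masked = np.where(allowed, logits, -np.inf)
    probs = np.exp(masked - masked[allowed].max())
    probs /= probs.sum()
    print(probs)  # class 1 has probability exactly 0.0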

I mean, even Bayesian nets require some encoding of causality, right? Maybe I'm reading too much "blah symbolic reasoning is worthless" into your comment?


It's not worthless, per se, it's just not a precursor to AGI in any shape or form, no matter how much the researchers pretend otherwise.


Worth reading maybe?

http://reasoning.cs.ucla.edu/fetch.php?id=136&type=pdf

Abstract:

> We propose the Probabilistic Sentential Decision Diagram (PSDD): A complete and canonical representation of probability distributions defined over the models of a given propositional theory. Each parameter of a PSDD can be viewed as the (conditional) probability of making a decision in a corresponding Sentential Decision Diagram (SDD). The SDD itself is a recently proposed complete and canonical representation of propositional theories. We explore a number of interesting properties of PSDDs, including the independencies that underlie them. We show that the PSDD is a tractable representation. We further show how the parameters of a PSDD can be efficiently estimated, in closed form, from complete data. We empirically evaluate the quality of PSDDs learned from data, when we have knowledge, a priori, of the domain logical constraints.

Still working on my understanding but Professor Darwiche gave a lecture on the material in one of my classes. Salient bit:

> The problem we tackle here is that of developing a representation of probability distributions in the presence of massive, logical constraints. That is, given a propositional logic theory which represents domain constraints, our goal is to develop a representation that induces a unique probability distribution over the models of the given theory.


When he talks about the "computational models of language" that ruled in the 80s, is he referring perhaps to stuff like Montague semantics? https://plato.stanford.edu/entries/montague-semantics/ Or is Montague semantics merely a descriptive framework without practical applications?

What were the main "practical" approaches for natural language understanding back then?


What is the Atari referred to here?

Does it have something to do with the game-playing AI from OpenAI? And if so, how is that even related to NLP?



