1) More accurate: compared to hyperbole like "Bridging the Gap between Human and Machine Translation", the title states the domain right there: news.
2) A more impressive result: it was obtained on an independently set-up evaluation framework, whereas Google's result used their own framework.
Compare further the papers:
These researchers appear to have been much clearer about what they're actually claiming, and also used more standard evaluation tools (Appraise) and methodology rather than something haphazardly hacked together.
The issue with that evaluation was that machines were much better at not making some trivial mistakes humans don't care about (e.g. transcribing "umm", "err", etc.), but were more likely to get the meaning wrong. Kudos to MS for doing the error analysis and publishing that info, but I found the reporting of it misleading.
It is certain that, on occasion, the system will make mistakes such as stating the opposite of what was actually said, or attributing an action to the wrong person, and so on. Perhaps on average it's as good as a person, but it will make mistakes that disqualify it from being used without a bucket of salt.
Apparently the translation of the Japanese response to the Allies' ultimatum calling for their surrender might have been faulty.
The important thing is that we build systems that account for these problems at the process level, not that we build components that are perfect. The questions that matter more are: how frequently do these mistakes happen? What's the impact? Can we eliminate these errors with multiple layers of processes designed to catch them?
Searle's argument is confusing, but how the program in the Chinese Room is implemented doesn't matter. His argument is solely against strong AI. He claims that the Chinese Room (or a suitably programmed computer) cannot be conscious of understanding Chinese in the same way that people can. He doesn't deny that a suitably programmed computer could, in principle, behave as if it understood Chinese, even if it wasn't conscious of anything at all.
However, the machine translation program mentioned in the article behaves as if it understands Chinese only within the limited context of the translation. It wouldn't be able to answer wider questions about things mentioned in an article it had just translated.
Previously it was thought that machine translation systems would have to understand the text they were translating the way a person does in order to produce a useful translation, but that has now been shown not to be true. At first it seems a surprising result, but less so when you think of translation as pattern recognition, and of how a person might go about translating text on a highly technical subject they don't understand.
You can however make a fairly solid argument that a CNN alone (as used in image/object recognition) is fundamentally incapable of dealing with images (but maybe not language): on the assumption that it can be faithfully described as a boolean satisfiability problem, then by virtue of complexity theory it can only be solved in constant time with a sufficiently massive lookup table (which there wouldn't be enough atoms in the universe to store). Microsoft are actually dealing with this in their system by repeatedly applying the network and revising the text.
Regardless though, accurate NLP is going to come down to managing to codify how humans deal with objects, concepts and actions, because that's what the languages encode; GOFAI wasn't really too far off (and the original effort was doomed from the start by the state of hardware and linguistics). Consider how distinguishing objects as masculine/feminine/neuter and animate/inanimate(/human) is universal (but doesn't necessarily affect the grammar), and that the latter is based purely on how complex/incomprehensible the behaviour of something is (unlike grammatical gender, which seems to be fairly arbitrary). Of course that's arguable, but you can see animacy appear in English word choices (unrelated to anthropomorphic metaphors) and in how "animate" objects tend to be referred to as having intent. You could try to figure all this out the wrong way around using statistical brute force and copious amounts of text, but that's pretty roundabout, isn't it?
(Also, the assumption that an object's animacy is determined by its predictability offers a pretty concise explanation of why the idea of human consciousness being produced by simple(r) interacting systems so often fails to compute so spectacularly, why most programmers appear to be immune to that, and also why the illusion that image-recognition CNNs perform their intended function is so strong (regardless of how useful they are, the failures make it blatant that they're only looking at texture and low-level features, and are extremely sensitive to noise, which is the opposite of what anyone intended))
The amount of attention this argument has received has made me wonder whether the "rigor" used by philosophy departments is mostly just a way to obfuscate bad arguments.
You don't need some fancy philosophy and complex thought experiments to see what he means. Just look at how people learn math. You can do calculations by memorizing algebraic rules, but that's not the same as understanding why those rules exist and what they mean. Even though you will calculate answers correctly in both cases, we all know there is a qualitative difference between them.
Back to Searle. His argument is that everything computers do is analogous to rote memorization and that the transition to understanding requires something computers don't have.
Whether or not you buy his argument, two things are clear. First, there is a difference between just producing results and understanding the process. We have all experienced this difference. It's all theoretical as long as you stick to simple tests (like multiple-choice exams), but becomes relevant when you suddenly expand the context (like requiring the student to prove some theorem instead of doing a calculation). Second, we also know that for humans this difference isn't just quantitative. Memorizing more algebraic rules and training in their application will not automatically result in students gaining understanding of mathematical principles.
> It's all theoretical as long as you stick to simple tests (like multiple-choice exams), but becomes relevant when you suddenly expand the context (like requiring the student to prove some theorem instead of doing a calculation).
Not even expanding the context changes the situation. The proof of a theorem can be memorized without any understanding just as easily as algebraic rules.
> Second, we also know that for humans this difference isn't just quantitative. Memorizing more algebraic rules and training in their application will not automatically result in students gaining understanding of mathematical principles.
This is correct, and the same principle applies not just to humans but to computers too (which was the point of Searle's argument). No amount of computation is going to make a computer aware of, or able to understand, the meaning of the symbols. Ultimately, "meaning" is our perceptual awareness of existence, but that is a long proof for another day.
Sure, but in practice students who rely purely on memorization can't answer questions that go beyond what's directly covered in textbooks.
Sometimes, it manages to translate whole articles without errors.
I eat soup -> COMO sopa
They eat soup -> COMEN sopa
Which boys does he say he believes eat soup?
¿Qué chicos dice que cree que COME sopa? [should be COMEN]
Which boys does she know like cheese?
¿A qué chicos sabe que LE gusta el queso? [Should be LES]
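To make the error class concrete, here's a toy Python check of number agreement for the examples above; the mini-lexicon and the helper function are invented purely for illustration and have nothing to do with how the MT system actually works.

    # Toy number-agreement check; the mini-lexicon is made up for illustration.
    NUMBER_OF = {"come": "sg", "comen": "pl", "le": "sg", "les": "pl"}

    def agreement_errors(expected_number, translated_words):
        """Return the words whose grammatical number disagrees with the
        wh-phrase they should agree with."""
        return [w for w in translated_words
                if NUMBER_OF.get(w.lower()) not in (None, expected_number)]

    # Both questions have a plural wh-phrase ("boys"), but the output used
    # singular forms:
    print(agreement_errors("pl", "Qué chicos dice que cree que come sopa".split()))
    # -> ['come']
    print(agreement_errors("pl", "A qué chicos sabe que le gusta el queso".split()))
    # -> ['le']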
Is anybody in MT or text comprehension/generation really working on systems that construct a model/"understanding" of the bigger narrative in a longer-running text? Even just to be able to do correct anaphora resolution across sentence and paragraph boundaries; intuitively, word sense disambiguation also seems easier if you've got some sort of abstract context over more than just a sentence.
Either way, I was mightily impressed, to the point where my wife had to roll her eyes and say 'yeah yeah I understand it now' to get me to drop it. (I'm just easily excited I guess.)
For instance, if you ask Google to translate the Portuguese "báculo" into French it gives you "personnel". That's nonsense as far as I can tell; a báculo is a bishop's crosier. So what's going on here? Well, if you translate it from PT to EN it gives you "staff", and suddenly it starts making sense: while "staff" means "a long, straight, thick wooden rod or stick, especially one used to assist in walking" (which fits báculo), it can also mean "the employees of a business", which is an accurate definition of the French "personnel". And I believe that's how you end up with the nonsensical PT -> FR translation.
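For what it's worth, the mechanism is easy to reproduce with a toy pivot. Here is a small Python sketch of translating PT -> EN -> FR through a polysemous English word; the dictionaries and sense choices are made up for the example and are obviously nothing like Google's actual data.

    # Toy illustration of the pivot problem: PT -> EN -> FR through a
    # polysemous English word. Dictionaries are invented for the example.
    pt_to_en = {"báculo": "staff"}     # "staff" as in a walking stick / crosier
    en_to_fr = {"staff": "personnel"}  # but the most common sense of "staff"
                                       # is "employees", hence "personnel"

    def pivot_translate(word_pt):
        """Translate PT -> FR by pivoting through English, with no record of
        which sense of the English word was meant."""
        return en_to_fr[pt_to_en[word_pt]]

    print(pivot_translate("báculo"))   # -> "personnel", nonsense for a crosier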
Similarly, Google used to be confused by the tu/vous (informal/formal) distinction that exists in many languages but not in English. At some point the Portuguese "tu és" would be translated into French as the formal "vous êtes" instead of the informal "tu es". This appears to have been fixed, however; I can't reproduce it at the moment.
Conjugations don't fare so well, however: for instance, the French imperfect "je chantais" is translated into the Portuguese preterite "eu cantei", even though "eu cantava" would make more sense, I think. Obviously with such short phrases I can't really be too harsh on Google's grammar; they're probably not optimizing for that case.
Parsing an entire paragraph for context is expensive...
If you're interested, probably the DiscoMT workshops are a good starting point for some things people have tried.
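For illustration, one of the simplest ideas from that line of work is just concatenating a few previous source sentences as extra context before translating each one. A minimal sketch, assuming a hypothetical sentence-level translate() function and a made-up "<brk>" separator token:

    # Minimal sketch of document-level context via concatenation. `translate`
    # is a stand-in for a real MT system (here it just echoes its input), and
    # "<brk>" is an assumed sentence-break token.

    def translate(text):
        return text  # placeholder for an actual MT call

    def translate_document(sentences, context_size=2):
        """Translate each sentence with the previous few sentences prepended,
        so a model could in principle resolve pronouns and word senses that
        depend on earlier text."""
        out = []
        for i, sentence in enumerate(sentences):
            context = sentences[max(0, i - context_size):i]
            source = " <brk> ".join(context + [sentence])
            translated = translate(source)
            # Keep only the segment after the last break token.
            out.append(translated.split("<brk>")[-1].strip())
        return out

    doc = ["The engineer fixed the turbine.", "It had failed twice.",
           "She wrote a report."]
    print(translate_document(doc))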
And it wasn't until I looked at the byline at the end that I realized: yes, it is that Hofstadter (Gödel, Escher, Bach).
- dual learning
- deliberation networks
- joint training
- agreement regularization
I haven't read the paper to see how these are combined, but it makes intuitive sense that using multiple training methods can lead to better performance, that is to say, to a more effective search of the network's weight space.
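For intuition, here's a toy sketch of how several such training signals could be folded into a single objective, in the spirit of dual learning (round-trip consistency between a forward model X->Y and a backward model Y->X) plus ordinary supervised training in both directions. The linear "models", the random data and the lambda weight are all stand-ins; this is not the paper's actual setup.

    # Toy combination of training signals: supervised losses in both
    # directions plus a dual-learning round-trip consistency term.
    # Everything here (linear models, random "sentence" vectors, lambda)
    # is invented for illustration.
    import torch
    import torch.nn as nn

    dim = 16
    forward_model = nn.Linear(dim, dim)   # stand-in for the X -> Y translator
    backward_model = nn.Linear(dim, dim)  # stand-in for the Y -> X translator
    opt = torch.optim.Adam(
        list(forward_model.parameters()) + list(backward_model.parameters()),
        lr=1e-3)

    lambda_dual = 0.5  # assumed weight for the round-trip term

    for step in range(100):
        # Pretend (x, y) is a batch of parallel sentence pairs as vectors.
        x = torch.randn(8, dim)
        y = torch.randn(8, dim)

        # Supervised losses on the parallel data, one per direction.
        loss_fwd = nn.functional.mse_loss(forward_model(x), y)
        loss_bwd = nn.functional.mse_loss(backward_model(y), x)

        # Dual-learning term: translating forward and then back should
        # reconstruct the original input.
        loss_roundtrip = nn.functional.mse_loss(
            backward_model(forward_model(x)), x)

        loss = loss_fwd + loss_bwd + lambda_dual * loss_roundtrip
        opt.zero_grad()
        loss.backward()
        opt.step()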
The number of papers I've read that have grammar so poor they're barely understandable...
You have to score the extent of wrongness too.