On the foolishness of “natural language programming” (utexas.edu)
132 points by LiveTheDream on Aug 25, 2014 | 152 comments

The hardest part about software always has been, and probably always will be, getting humans to first know and then express exactly what they want. Natural human language allows for vague, abstract, and ambiguous expression (not to mention the euphemisms, exaggerations, half-truths and outright lies).

Most people who have never written a computer program have probably never even been through the experience of having to express exactly what they want someone or something else to do for them, in a specific and unambiguous manner. It really is a different way of thinking.

Indeed. Not only that, but the power of a precise language is not to be underestimated. I noticed that I had (and probably still have) a lot of erroneous concepts in my head, and the only way to find the mistakes is by expressing them in a rigorous, formal language that forces you to think about every detail, no matter how 'obvious' it appeared at first.

I agree with this but want to stress something else.

A really great formal language actually frees you from "every detail, no matter how obvious". That's the heart of abstraction.

Good mathematics is heavily abstracted because it forms a superhighway between mathematical concepts that allows you to lift and transport intuition from one context to another without forsaking formality.

To some degree these abstractions must always be built from nitty gritty details. In mathematics it's a matter of didactics to find the most efficient bootstrapping process. In computer science it's a matter of library and language design.

Abstraction works only with strong context. A few symbols in an equation are a miracle of conciseness, but meaningful only after the symbols, operators and formal system are defined.

I never claimed otherwise. I think in fact that there are two forms of context—didactic and mandatory.

Didactic context is personal, the context required for or used by a particular person in order to conceptualize and build confidence with a set of concepts (and their associated syntax, though it's close to meaningless so long as it doesn't get in your way too much).

Mandatory context is more like a compact notion of why and how some mechanism is applied. For instance, you might bootstrap homology using point-set topology (educational context) and then enhance it using algebraic topology. Eventually, perhaps, the mandatory context is that homology is a measure of non-exactness of chain complexes and this concept can be lifted from its base context and presented wherever exactness is an interesting measure.

Point being—language hardly exists in a vacuum. Good language invokes powerful concepts in an efficient manner and allows you to construct arguments using them. Good arguments can link seemingly disparate concepts with ease.

Exactly, a formal specification language (including any programming language) requires you to make all your implicit, underlying assumptions explicit. Nothing can be taken for granted and there is no "you know what I mean." But once you really do know exactly what you mean, it's usually not so hard to specify it. Figuring out what you want is the hardest part.

Well, sometimes we get things like C++11's `auto`, which do let you trade formality for brevity. It's helpful sometimes, but can cause misunderstandings, just like natural language pronouns.

Why should you have to specify assumptions that don't matter?

I.e. it can be very productive to do the reasonable thing, until it's proven unreasonable.

Seems like software developers and lawyers share some common pains.

I always marvel how our legal systems manage not to collapse given the imprecise language they use. So while they have similar problems it seems to me that they are just ignoring them for the moment and let somebody else figure out what the precise meaning is if the necessity arises.

I have found that legal language is often very (overly) precise. The confusion comes from the fact that words may be used in a different manner than in common discourse, but that does not mean that they do not have a precise meaning in a legal context.

Yes, in programming, the goal is to communicate clearly to an indifferent interpreter. In legal worlds, the goal is to express the intention while excluding the maximum number of outsiders.

Why else does every coding language provide plain-language comments while legal documents do not?

In technology, the machine is a stand-in for a human relationship - the cell phone app keeps track of restaurant ratings so people don't have to keep asking one another. The code is a part of the system of distributing the power in the form of information.

In the legal world, the intently obscure language is the means of hoarding and securing the power.

W/r/t to legal systems collapsing, they're protected by guns and myths. The bailiffs and police believe Columbus was a hero and they kill to uphold his dream. The incentive to power is enough to entice enough new scholars each year to sustain the illusion.

So to respond to the poster above, the pains of the programmer and legal system are no more similar than the programmer and the chef. Each attempts to communicate a procedure, but the legal system has an entirely different and less noble set of requirements.

> W/r/t to legal systems collapsing, they're protected by guns and myths.

This is true enough in many parts of the world, but a careful reading of the history of English Common Law reveals a fascinatingly iterative legal process that formed a bulwark against monarchal oppression for hundreds of years.

For a good introduction, check out this book:


Interesting, but I doubt that the legal system does much useful iteration anymore, considering that it can be 5 years (and many dollars) from novel conflict to Supreme Court ruling. Unless the adaptation rate scales exponentially with the number of rulings, that's too slow.

If you look at areas of current evolution like patent law, you'll see that iteration is still taking place. Each case builds on the body of the precedent built before it. Right now the concept of 'software patents' is not legally defined, current precedent is aimed at figuring out just what that means. Math isn't patentable, that much is established. Is software math? Questions that seem simple on the surface become very complicated when applied to the real world. That's what the iteration is aimed at doing, solving tough questions of language.

Constitutional law is under constant iteration, the Supreme Court often makes decisions about which cases to take based on whether it thinks it will be able to advance the state of that area of law.

If you subscribe to blogs like Popehat, you can get a feel for this iteration/evolution process and how it works in modern times.

Through a lot of blood, deaths, public executions, and religious declarations, schisms and wars (civil and otherwise.)

The history of English common law is far more akin to natural evolution than any intelligent design.

Which is interesting, because the vast majority of people (including and perhaps especially those who reject intelligent design in biology) think that a relatively stable and successful society and economy can only exist when organized by a centrally and deliberately planned body called government. That's essentially the "intelligent design" belief for human societies.

If you believe that intelligence is the product of natural selection, and also observe that nature has very strongly selected for human social organization under a government, it's far from unreasonable to use the advantage of the intelligence we've evolved to augment the advantages of the governments we've evolved.

Natural selection often creates a relationship between predator and prey that looks red in tooth and claw -- bears and salmon, wolves and elk, etc.. No one will argue that the wolf represents an advantage to any individual elk, but to the elk genotype, the argument can be made that the wolf confers an advantage.

It's the same with governments and people -- history proves that relationship red in tooth and claw as well. The surviving elk outrun the wolves, and the surviving people stay one step ahead of predatory governments.

When Stalin signed the non-aggression pact with Hitler, he began a purge of his military. Anyone who had spoken against Hitler beforehand was purged -- shot or sent to Siberia. Then, when Hitler broke the pact and invaded Russia, all those officers who had spoken in Hitler's favor were purged. The survivors were those few who didn't have an opinion, or who didn't dare express it.

The Cultural Revolution in China purged all those bourgeois elitists who had a college degree, who had acquired skills in science and technology, or who had any significant academic achievement. Now everything has changed and individual Chinese are allowed -- nay, encouraged -- to educate themselves for success, acquire wealth, and grow the economy -- exactly the opposite of the Cultural Revolution outlook. It is a very wise person who avoided any problems during the Cultural Revolution and who avoids any problems now.

My point? Elk who survive do so by avoiding wolves. People who survive do so by avoiding governments. It's true that wolves improve the genetic stock of elk, just as governments improve the genetic stock of people, and by the same method -- by tearing the weak and sick to pieces.

I never argued that anything is good or bad because nature has selected strongly for it. That's essentially an appeal to nature. Nature strongly selected for using animals as the main means of land transportation for a long time, but hopefully no one would argue that therefore no one should have worked at discovering a better alternative.

Even if government is the best way to organize society at the moment, that doesn't mean it's unreasonable to look for better alternatives. But again, none of this has to do with the analogy to intelligent design I was making.

"Relatively stable and successful" is, of course, relative. When judged against the hypothetical outcomes of roads not traveled, they could look like ongoing atrocities.

> The history of English common law is far more akin to natural evolution than any intelligent design.

So are many of the software projects I've been called on to maintain.

That's actually an arguably better way to think about code.

    "The computer is a machine, but a codebase is an organism."

I liked your comment, and share a lot of your cynicism about the legal system.

To be fair, though, I think you see technology and code in too positive a light, because code is also often a means for accruing power.

Your own example of a cell phone app that keeps track of restaurant ratings could serve to illustrate this quite well: such services have a tendency to become highly centralized with at most a handful of alternatives. [0]

Whoever controls the rating service then has considerable power, because they get to subtly influence the way that results are displayed, which can directly influence restaurants' bottom line.

This type of thing is even more obvious with services like Google and Facebook. Just something to keep in mind before patting oneself too much on the back...

[0] This is due to network effects: people use the service with the highest pervasiveness (basically, the most ratings and comments), and the service with the most users tends to get the most ratings and comments.

> I always marvel how our legal systems manage not to collapse given the imprecise language they use.

That's only surprising if you consider the language of laws to be of paramount importance. But generally, that's not the case. Most legal outcomes just come down to the discretion of police, prosecutors, judges, and juries, in descending order of prominence.

A lot of people, and probably especially programmers and engineers, think of the legal system as a formal system. It's quite common, when discussing some court case, or perhaps some unscrupulous act from a government official, to hear people say "but isn't that illegal?" The surprise implied by that question only makes sense if you assume either that a legal document has actual power to prevent an act (which we know it does not), or that the people who decide legal outcomes are somehow bound by the meaning of a legal document rather than the other way around.

Every time I read legal language that comes across as highly imprecise or ambiguous, my default assumption is that someone powerful wanted it written that way, to allow them some sort of loophole.

Do you assume every security flaw is a secret backdoor too? :)

Only if they're deliberately introduced. Very little accidental ambiguity makes it into legal contracts (unlike legislation and agency regulation!), but deliberate ambiguity is a very valid strategy, and often is designed to break the spirit of an agreement by creating loopholes in the letter. Always, always, always assume that if the same language could be interpreted two different ways, the guy on the other side of the table intends to argue the one that hurts you the most.

This is true.

I have developed software for generating legal documents. Believe me, legal documents have a much higher degree of ambiguity than most business processes.

Contracts are written in code. They just reuse English words.

This is the chief reason I'd support the "Anyone can learn to code" programs suddenly en vogue. We need our business owners - ideally customers - to be able to consider what exactly they want and don't want. We never need a line of code from them, but as long as they have the ability to consider their domain accurately, a good developer should be able to extract it from them.

This is why exercises such as writing the steps to make a PB&J sandwich are instructive. They show how ambiguous natural language really is.

And one consequence of that, which is often overlooked when evaluating any AI application, is that humans can be really "bad" (or at least inconsistent and indecisive) at these types of problems. People miscommunicate and mishear each other all the time. We disagree about whether things are grammatical, the definitions of words, the implications of someone's tone, whether two people look alike, what a hastily handwritten note says, etc. When evaluating an AI application, people always assume that, even if the problems themselves are hard, the responses are obvious and trivial to verify. But that's not the case even for everyday human interaction.

The AI needs to be able to ask for clarification.

OTOH, If you can communicate to me in a certain way with euphemisms and idioms, then all the information that I get can be gotten from that communication.

It's not an impossible task, just a difficult one.

This is not true. The information you get depends upon your unpacking and translation of those idioms and euphemisms, and is therefore dependent upon the totality of your life experiences up to that point. One benefit of formal languages is that a shared culture beyond the symbols used is unnecessary.

Not surprisingly, when you try to go the other direction -- add disambiguating features to natural language, for cases where you don't want the other party to have to make a contextual guess -- people try to remove the precision.

Case in point: the debasement of the term "literally".

It's possible* to build software which understands natural human language and allows for vague, abstract expressions. Sometimes communication fails, which is why different "commands" should have different risk levels and different degrees of formality. For example, if I order pizza, I can be pretty vague, but when I order clothes online, I need to specify the size, color etc. more formally.

* It's possible since you can always simulate a human brain with software, but there are of course more practical artificial general intelligence systems like the AIXI-mc.
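The risk-tier idea above can be sketched in a few lines. Everything here is invented for illustration -- the command names, required fields, and the use of fuzzy string matching as a stand-in for real language understanding:

```python
# Toy sketch: low-risk commands tolerate vague, fuzzy input; high-risk
# commands refuse to guess and demand every required field explicitly.
import difflib

COMMANDS = {
    "order_pizza":   {"risk": "low",  "required": []},
    "order_clothes": {"risk": "high", "required": ["size", "color"]},
}

def interpret(utterance, details=None):
    """Fuzzy-match the utterance to a command, then enforce its formality level."""
    details = details or {}
    # Fuzzy matching stands in for natural-language understanding here.
    match = difflib.get_close_matches(
        utterance.replace(" ", "_"), list(COMMANDS), n=1, cutoff=0.5)
    if not match:
        return "Sorry, I didn't understand that."
    name = match[0]
    missing = [f for f in COMMANDS[name]["required"] if f not in details]
    if missing:  # high-risk commands refuse to fill in blanks
        return f"{name}: please specify {', '.join(missing)} explicitly."
    return f"{name}: ok."

print(interpret("ordr pizza"))    # vague, typo-ridden input is fine for low risk
print(interpret("order clothes")) # high risk: the system demands formality
print(interpret("order clothes", {"size": "M", "color": "navy"}))
```

The point of the sketch: formality isn't all-or-nothing; it can be dialed per command according to the cost of a misunderstanding.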

Graduate of analytic philosophy here (concentrating in philosophy of language), and my takeaway is a sloppy: "It's not a bug, it's a feature."

Sorry, but this is a completely shocking article to me. First in its immediate dismissal of any formalism inherent in natural language, but also in the ease with which he dismisses the proposition without any real consideration.

If we learned anything from Chomsky, it's that the underlying grammar we are born with is both instinctual and follows formal rules. To say otherwise is literally, demonstrably false. Irregular verbs, for example, aren't learned in the traditional sense; one must actually unlearn the formal rules. Any child that tells you she "swimmed" all afternoon is using a more formal version of English than you do.

The idea of a natural language programming language is flawed, but not by formalism. It's flawed by the evolutionary nature of natural languages. That is, the very people that he states "are no longer able to use their native tongue effectively," are probably using a new dialect that shares a common ancestor with his more "traditional" usage.

Many people in this thread are talking about the inability of plebes to express what they actually want. This is a fair point, but not a problem with language specifically. Communication tools would need to be employed by a computer in the same way as humans use them. E.g., a simple ambiguity checker could work wonders here, as it does between humans when someone you are talking to simply says, "What did you mean when you said you 'realized you forgot your phone at the train station'? Did you forget at the train station, or realize at the train station?".
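The clarifying-question idea can be made concrete with a toy attachment-ambiguity checker. The verb list and the detection heuristic are hand-rolled stand-ins for real parsing, purely for illustration:

```python
# Toy ambiguity checker: if more than one verb could govern a trailing
# "at <place>" phrase, ask the speaker which attachment was meant,
# just as a human listener would.
VERBS = {"realized", "forgot", "left", "saw", "said"}

def clarify(sentence):
    head, sep, place = sentence.partition(" at ")
    if not sep:
        return None  # no "at" phrase, nothing to disambiguate
    candidates = [w.strip(".,") for w in head.split() if w.strip(".,") in VERBS]
    if len(candidates) < 2:
        return None  # only one plausible attachment site
    options = " or ".join(f"'{v} at {place}'" for v in candidates)
    return f"Did you mean {options}?"

print(clarify("I realized I forgot my phone at the train station"))
```

A real system would need actual parsing, but the interaction pattern -- detect competing readings, ask, proceed -- is the same one humans use.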

What IS a problem, however, is Quine's indeterminacy of translation. That could pose serious hurdles that may be insurmountable; however, we still have effective communication between humans, so it's easy to see how this may be only a theoretical problem rather than a formal one.

This subject should be under the purview of analytic philosophy and linguistics, not mathematics or computer science.


It seems an appalling number of people agree with Dijkstra here - "we not only don't need natural language, we don't even want it."

I'm sorry, but if I can express to a human being a set of directions to fill out a form in a minute or two and expressing that to a computer takes much longer and is more error-prone, that is an inefficiency in software development which it is extremely desirable to address.

There is nothing magical about human brains that would make them theoretically impossible to express in software in such a way that we can give a program natural language directions, and the fact that so many people want to dismiss this endeavor out of hand is ridiculous to me.

This should be our holy grail, something to strive toward, not something to ignore.

In fact I would suggest that it will most likely be the only way out of the mess of such a wide variety of software standards (ever have fun moving your things to a new system and having to re-learn many things just because you changed PC or phone operating systems?), whereas natural language is a standard we already have, and it works fine.

This way you essentially wouldn't need to learn a new set of incantations - just tell the damn thing what you want the damn thing to do, dammit.

The thing is, natural language suffers from many of the same problems. I'd say it's even worse, because it requires a common cultural background between the interlocutors.

Ever moved to a new job and had fun figuring out the local terminology, and learning the in-jokes (and figuring out when you were the butt of the joke but didn't quite get it), and when your boss told you to "just do your damn job and quit pestering me", did you immediately know he was just having a bad day and it wasn't your fault at all?

A mistake in understanding these "natural language incantations" will lead to you doing the wrong thing, or feeling embarrassed or depressed. Not so different from a system crash, when you think about it.

Thankfully machines never feel embarrassed or confused about the "damn thing" you want them to do. They only do the things we tell them, in a painfully literal way.

After all, when people want to be really precise, they use mathematical notation :)

Of course you are technically correct, however I retreat back to my main point: we do this every day and it works just fine.

Certainly not perfectly of course, but go and ask people who use some sort of CRUD system that used to be manual what they like and what they dislike about the new system.

Likely very few of them will say it is easier to use than just asking Sharon in the next cubicle to verify something.

They will have some likes, sure - but they will be things like the almost infinitely increased speed or the fact that they can access the system 24/7 while Sharon needs to sleep and take a break to eat the occasional Ding-Dong, but the ease of use that comes with dealing with another human being is a sacrifice that they make for these benefits.

(or to put it the reverse way, imagine the same office and a new employee named Eliza comes in and behaves exactly like a computer, "only doing the things you tell her in a painfully literal way." How quick would you want to give her the boot?)

> After all, when people want to be really precise, they use mathematical notation :)

Again, this is absolutely true. The problem is when you're dealing with a simple CRUD app for your insurance house or just copying files over to your iPod, you're not interested in being really precise - you're interested in the shortest path to get a relatively simple thing done.

If that path is blocked by the fact that you don't know the particular menu item, keyboard shortcut, or command switch for something that you can express in English without even thinking about it, then I regard that as a huge opportunity for technology in general and our industry in particular.

I fully agree conciseness is valuable! If a formal system doesn't have a direct route to the action you want, it probably can be improved. I don't think shorthand and formalism are opposites.

I also agree that a lot of people tend to shy away from formalism. I think that's what Dijkstra was lamenting.

By the way, what is a CRUD? John Doe doesn't understand the word. What do you mean, Create-Read-Update-Delete? He understands some of those words, but I'm unsure they mean what he thinks they mean. Are you sure that, without some training on the formalisms of the system (which Sharon obviously had!), you want John Doe to delete something from the system? He might try to unplug the harddrive, maybe that's what he thinks "deleting" means.

Yes, it's easier to teach John Doe to use a limited UI instead of, say, teaching him SQL. But he'll be able to do less complex stuff with just the UI. (And the fact SQL has some English-sounding keywords is helpful, but SQL is an extremely formal system with few parallels to natural language).
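The point about SQL's deceptive English surface can be shown directly. This sketch uses Python's sqlite3 with a made-up table; the data and column names are purely illustrative:

```python
# SQL reads like English but behaves like a formal system: every clause
# must be spelled out exactly, and omitting one silently changes meaning.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reports (title TEXT, year INTEGER)")
db.executemany("INSERT INTO reports VALUES (?, ?)",
               [("fruit flies", 2013), ("bananas", 2014)])

# "Delete the old reports" -- sounds like English, but the WHERE clause
# must state precisely what "old" means.
db.execute("DELETE FROM reports WHERE year < 2014")

# Forget the WHERE clause and 'DELETE FROM reports' wipes the whole table:
# the system does exactly what you said, not what you meant.
remaining = db.execute("SELECT title FROM reports").fetchall()
print(remaining)
```

The English-sounding keywords lower the reading barrier, but writing correct SQL still requires the formal mindset Sharon was trained in.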

> you want John Doe to delete something from the system? He might try to unplug the harddrive, maybe that's what he thinks "deleting" means.

Highly unlikely that someone familiar with the system (even in an informal way) would do this, for the same reason that you have no trouble understanding me when I say "fruit flies like a banana". Is there technically a chance that I mean "all pieces of fruit fly through the air in the same manner that a banana flies through the air"? Sure. But it's so ridiculously low that you simply ignore it and are willing to accept the infinitesimal risk of misunderstanding.

Should all programming or user interface work be done this way? Of course not. I'm just saying it would be a very effective level of abstraction for a great many use cases, in the same way that I don't need formal mathematical notation to write a Python statement to print "hello, world".

Of course, the formalism has to be there once you get down far enough, in the same way that showing Sharon how to do an account credit in the CRUD system means that you are altering neural structures with electric and chemical signals.

However, the person who trains Sharon doesn't need to know exactly what neurons to stimulate in Sharon's brain with exactly what voltage in order to teach her that system - the relatively lofty level of abstraction provided by English works just fine.

I agree with some of what you say. The fruit flies sentence, for example. We humans are decent at disambiguating those, I'll grant you that.

Allow me to add some random thoughts:

- Python's print "hello, world" IS a formal notation. It's just that this particular notation and this particular task are so simple that we can delude ourselves into thinking it's English. But when you move to actual Python scripts, the only ones who believe "it sounds like English" are programmers :) I wouldn't trust my mom to write a Python script, after all.

- Let's go back to our CRUD/office situation example, and allow me to make it a bit more realistic (but still funny):

"Sharon, please print the report."

"Which report?".

"The one I asked you about yesterday."

"Uh, you asked about two reports yesterday. Do you mean the one about fruit flies or about bananas? Or do you want both?"


"Yes."

"Sorry, yes what? I asked you multiple questions!"

"Yes, both reports. I forgot about the other one, but I want it too."


"Ok, even though I have a terrible headache, I printed your reports. Here they are."

"Oops, sorry Sharon. I didn't mean you had to print them now. Tomorrow would have been fine. Also, please don't get mad, but I didn't want them on my desk. They are actually for Jane on the fifth floor... Didn't I mention that? Also, why did you print them using the expensive printer?"


My point is that just doing CRUDs with English is probably fine, but as the complexity of the task approaches that of a general purpose programming language, the level of precision you must use with your language approaches that of a formal system. Which is what programming languages are...

You've hit on an important point here. When we instruct our machines in natural language, they are surely going to have to be able to ask us questions to resolve ambiguities and fill in details we haven't specified.

There is actually a theory of how two entities reach agreement through conversation: http://en.wikipedia.org/wiki/Conversation_theory

I don't think EWD's point is that we don't need natural language at all; just that it's not a good way to program a computer. Which I do believe to be true.

We can explain to a human how to fill out a form because the human has probably seen a lot of forms before. Humans make a lot of assumptions that often end up being wrong: but enough of them are correct that they're still useful.

We trust computers to be unimpeachably accurate because we as humans are not. If computers need to make the types of assumptions that humans do, then they lose a good deal of their accuracy (and their usefulness).

Human language is also visually difficult to read. The biggest improvement that symbolic languages (specifically, modern programming languages) make is the use of spacing and symbols to break apart complex processes into sub-sections, loops, etc.

Furthermore, I disagree with your statement that "natural language is a standard we already have and works fine." Language is not static, nor is it standard. Sure, we may have "standard" grammar rules, but even those can vary from region to region and many people don't follow the rules on a day to day basis. It's not a static target, so developing something that could interpret natural language means developing an artificial intelligence capable of taking nuance, context and the like into account.

EWD was simply claiming that such a system applied to general purpose computing would be so complicated as to be wildly impractical.

>> This way you essentially wouldn't need to learn a new set of incantations - just tell the damn thing what you want the damn thing to do, dammit.

I'm going to guess that you've never given a set of requirements to a programmer before. Programmers ARE the human interface to computers, as they are often writing software to specifications created by someone else. Many of the problems arise in the ambiguity of the human-human communication. Another part comes from the lack of specificity combined with different ideas about how to handle unspecified cases. Your idea of what is obvious is not the only one. Some people lack domain knowledge that is assumed in requirements and leads to poor choices where specifications are not complete. In the end, natural language assumes a broad swath of "common sense" that computers do not have yet.

I'll respond to your points, but was the ad-hominem really necessary?

> In the end, natural language assumes a broad swath of "common sense" that computers do not have yet.

Absolutely. I'm not at all saying that we'd have had this last Tuesday if we'd just take our heads out of our asses - I'm saying it's something we should strive toward and not ignore.

> Many of the problems arise in the ambiguity of the human-human communication.

Human-Human communication works, and works well - once again, we do it every single day, all the time.

Do we encounter problems with ambiguity? Sure. But they are by far the exception and not the norm. After all, forms get filled out, Driver's licenses get renewed, complicated Starbucks orders get filled - these common use cases work.

By contrast, have you ever had this fun experience with a terminal program?

  > quit
  Unknown command: "quit"
  > exit
  Unknown command: "exit"
  > shutdown
  Unknown command: "shutdown"

or my favorite:

  > quit
  Unknown command "quit." If you want to close the program, type "exit."

Simple English statements like these work extremely well in human-human communication and hardly work at all in human-computer communication. I'd just like us to get from A to B, that's all.

Again, this is a problem of imprecise communication on the part of the speaker. It is a human problem, not natural language one.

> There is nothing magical about human brains that would make them theoretically impossible to express in software in such a way that we can give a program natural language direction

I think we are still a long way from understanding what really happens inside our brains. About telling a computer what to do: They don't 'do' anything. They run programs. (loosely after Weizenbaum)

> I think we are still a long way from understanding what really happens inside our brains.

Agreed 100% - I'm in no way saying that it's easy or will be done in our lifetime or the next 10 lifetimes - just that it's not impossible because we're not magic and we do it every damn day.

> About telling a computer what to do: They don't 'do' anything. They run programs. (loosely after Weizenbaum)

What I want is (to reference an example in another comment) to go to a command line and type "Copy the report to the share" and (if I'm on a linux box) have my computer translate that to "cp /path/to/report.pdf /path/to/share/" without my ever having to know what it's doing behind the scenes.

Whatever categorical bucket someone wants to put that in doesn't matter to me whatsoever - all I'm saying is that's what I want to see, and that's what I think we should work toward.
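The "Copy the report to the share" idea can be sketched as a trivial keyword template that maps a loose English request onto an exact shell command. The template table is invented for illustration; the paths are the ones from the example above:

```python
# Minimal sketch: map a loose English request onto an exact shell command
# via a keyword table. A request we can't map gets a clarifying question
# (here represented by returning None), never a guess.
KNOWN = {
    "report": "/path/to/report.pdf",
    "share":  "/path/to/share/",
}

def to_command(request):
    words = [w.strip(".,").lower() for w in request.split()]
    if "copy" in words:
        things = [KNOWN[w] for w in words if w in KNOWN]
        if len(things) == 2:
            src, dst = things
            return f"cp {src} {dst}"
    return None  # ambiguous or unknown: ask, don't guess

print(to_command("Copy the report to the share"))
```

A real system would need far more than a keyword table, of course; the sketch only shows the shape of the translation layer being asked for.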

What'd help get there is an object store instead of a file system. We continue to hang files on the ceremonial file tree like xmas ornaments and call it an OS feature. But nearly zero real apps are happy with that. Everybody implements an object store inside a file, and keeps all their crap organized in there (email folders; docx files; project databases and on and on).

When will we get an OS that lets me persist my objects, uniquely identify them with a uuid plus arbitrary attributes (print date??? give me a break), migrate and cache them anywhere and sign them for authenticity? That would be a real OS feature.

Sure all that can be cobbled together on one machine with different libraries. But to be an OS feature, I need servers that understand and respect all that. Object browsers that let me create a relation to view pertinent objects. Security managers that limit access to apps with digital authority etc. All on the network.
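For what it's worth, the local half of such a store is small; the hard part is exactly the network-wide integration described above. A rough sketch, with all names and the attribute scheme invented for illustration (a content hash stands in for real signing):

```python
import hashlib
import uuid

class ObjectStore:
    """Toy object store: opaque blobs keyed by UUID, with arbitrary
    attributes instead of a position in a ceremonial file tree."""

    def __init__(self):
        self._blobs = {}   # oid -> bytes
        self._attrs = {}   # oid -> dict of arbitrary attributes

    def put(self, data: bytes, **attrs) -> str:
        oid = str(uuid.uuid4())
        self._blobs[oid] = data
        # Content hash as a cheap stand-in for authenticity signing.
        attrs["sha256"] = hashlib.sha256(data).hexdigest()
        self._attrs[oid] = attrs
        return oid

    def get(self, oid: str) -> bytes:
        return self._blobs[oid]

    def find(self, **query):
        """Relational-style lookup by attributes; no directories needed."""
        return [oid for oid, a in self._attrs.items()
                if all(a.get(k) == v for k, v in query.items())]

# The legacy file-system API is then just one relation on top:
store = ObjectStore()
oid = store.put(b"report body", parentDir="/reports", filename="q3.pdf")
assert store.find(filename="q3.pdf") == [oid]
```

The final lines illustrate the point made below: the (parentDir, filename) relation is just one attribute query among many, so the old file-system API falls out for free.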

The biggest showstopper here is: how do you email your objects to some client after you are done with them? Do you use a different representation? If so, why not just use the network representation all the time?

It's not a new idea; nobody has ever been able to make that kind of storage work.

Strange claim. The network representation is always different from the local one; I don't know what that could mean.

As for making it work, there's no obstacle. Implementation is straightforward. And since any current file system API is trivially implementable on top of it (create a relation using parentDir, filename, {dates}), there should be few integration issues.

The logical representation of data files is the same on any mainstream OS, wherever they come from (network, file system, device I/O, pipe, etc).

  Error: line 2: 'damnit': no denotation
Just kidding. I agree: pace Prof. Dr. Dijkstra, automatic programming is a grand challenge problem and deserves more attention.

If someone believed half the comments here, you'd think humans have not the slightest clue what anyone else is talking about.

If you don't hear at least a sentence a day where you haven't the slightest clue what was said, then I suspect you don't encounter many humans.

You're exaggerating, but that's OK. If I don't understand, I simply ask them to clarify.

This could be a personal shortcoming, but I am frequently left clueless about what was meant. For low-volume interactive situations, asking for clarification is fine as long as the domain is simple.

Gathering requirements is generally done in natural language, but it's a very slow error-prone process. Even after sign-off on requirements it's pretty standard for them to be wrong in critical ways. Frankly this is the part of many software development projects that dooms them to failure.

Ignoring the difficulty of actually getting precise natural language, you'll still get to the point where no one can understand the language. If you don't believe me, go read some Kant.

I'd apologize for the rambling comment with grammatical and spelling mistakes, but they further my point :)

This is key. We don't generally build computer systems that are capable of asking for clarification, but we should.

An error rate of 1/1,000 or so, with almost all of them being recoverable errors? I'll take that. :)

The problem is a bridge between two disciplines. Saying that it should only be built from one side is no better than saying it should only be built from the other.

Indeterminacy of translation is indeed part of the problem. But you're not saying anything new by bringing it up - it's just a jargon term from linguistics to describe a problem a programmer might illustrate with the (buffalo)+ sentence or "(time|fruit) flies like (an arrow|a banana)" example. They already know the core of the problem without needing the whole weight of a linguistics education.

The obvious problem of indeterminacy of translation is why, when computer scientists talk about natural language programming, they do not normally mean natural language processing - even more so in Dijkstra's time, when computers were slower and our NLP algorithms worse.

The core that Dijkstra is getting at here is symbolic reasoning. He's pointing out that natural language is a poor fit for symbolic reasoning, that there's been a history of movement from rhetorical reasoning to symbolic reasoning in mathematics - in fact that mathematics stagnated where rhetorical reasoning persisted.

Even if we solved the translation indeterminacy problem, we would need Strong AI to convert such a high level description into something concrete enough for a computer to do. We propel computing machines using levers made out of abstractions - the higher the tower of abstraction, the longer the lever, and the greater the power. But the problem of programming is not in pushing the lever, it's in building the lever. In a word, it's engineering, not philosophy - it's about how, and not what.

Firstly, of course it's in the realm of all four disciplines. It would require a task so gargantuan as to be on par with J.L. Austin's dream of a formal dictionary, but it is doable. I just think the math and computer science students are often (as seen in this thread) not as well acquainted with natural language issues as their counterparts, but I'm still not really seeing your point.

Firstly, the indeterminacy of translation is a much, much deeper problem than the buffalo sentence and the time/fruit flies sentences.

The buffalo sentence is interesting in that it is grammatically perfect, and would be easy for anyone to understand following formal rules, but so bizarre that it's confusing at first.

The time/arrow sentence is also not an issue for humans, as it's simply the result of two homonyms that happen to be verbs in one reading. They are clever but not deeply ambiguous, and I seriously doubt they would pose a serious problem.

Second, natural language IS symbolic reasoning. I don't understand how you can justify a claim that it isn't. The term rhetorical reasoning presumes that symbolic reasoning is already happening.

Four plus four equals eight.

4 + 4 = 8.

Let there be a function f taking one argument, returning a result that is the argument multiplied by two.

f x = x * 2

Do you see the difference between rhetoric and symbol? The point is, perhaps, more literal than you suspected?

The numbers '4' and '8' and the words 'four' and 'eight' in your example are already abstract symbols. In fact most words in natural languages are symbols/pointers to different things or processes in the human experience.

Of course you can argue that the numeral 4 is more abstract than the word, and that the variable x is more abstract than the 'concrete' number 4... even so, you can build arbitrarily complex structures with just one level of abstraction/indirection.

In this sense, I agree with scoofy: '(usage of) natural language IS symbolic reasoning' indeed.

The issue is in the form.

It's trivial to define a mapping from the symbolic to the rhetorical. The reverse isn't trivial, but even if it were trivial, we'd still prefer the symbolic.

It's easier to read.

> Second, natural language IS symbolic reasoning.

If that were generally true, we wouldn't have bothered to invent mathematical symbolism and syntax. Natural language has any number of pitfalls -- "are you going to sleep or watch TV?" The "or" in that sentence differs in meaning from the formal logical "or", which would allow doing both at once.
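The gap between the everyday "or" and the logical one is easy to make concrete (a small aside in Python):

```python
# Natural-language "or" in "are you going to sleep or watch TV?"
# is usually exclusive: one or the other, not both.
# The logical "or" is inclusive: true if either (or both) holds.
sleep, watch_tv = True, True

inclusive = sleep or watch_tv   # True: doing both still satisfies it
exclusive = sleep != watch_tv   # False: doing both violates the everyday reading

print(inclusive, exclusive)
# True False
```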

If I want to express the difference between time and space in relativity theory, I might say:

t' = t √(1 − v^2/c^2)

With that equation I have said a lot, referring to the Pythagorean Theorem, orthogonal dimensions, and the constancy of the speed of light. To translate the above into natural language would require many more symbols, as well as the acceptance of much more ambiguity of meaning.

It's uncontroversial to say that, as we get closer to describing nature accurately, we use more equations and fewer words, and not because of an irrational preference for equations.

"If we learned anything from Chomsky, it's that the underlying grammar we are born with is both instinctual, and follows formal rules. To say differently is literally, demonstrably false."

The rejection of this idea is not only not "demonstrably false", it's actually pretty commonplace among linguists. It would be more accurate to say that Chomsky's theory of a universal grammar is demonstrably false.

"the evolutionary nature of natural languages"

One of Chomsky's more widely criticized ideas (and a pretty bizarre one), is that the language instinct could not have arisen by evolution through natural selection.

If I talk about Chomsky's generative grammar, I do not mean his literal first works, in the same way that I wouldn't literally mean Darwin's original theory of evolution when citing Darwin. I mean the theory, generalized as "Skinner was wrong, Chomsky was right." This is not up for debate; it's easily provable. I could cite dozens of linguists; Pinker immediately comes to mind offhand.

By "the evolutionary nature of natural languages," I do not mean natural language as a product of natural selection. Instead, I mean the constant flux of natural language dialects: the loss of some words, phrases, etc. over time with the adoption of new ones. "Reprise" has been replaced with "remix," "how do you do" becomes "howdy" over time. This isn't exactly controversial.

A philosopher saying that something's not up for debate? I must be dreaming.

Skinner and Chomsky both came up with some ideas about language acquisition that were useful and stimulated subsequent research. They are both quite wrong, in the sense that neither theory withstands the scrutiny of research over the past 50 years in anthropology, psychology, and [computational] linguistics. Pinker is one of the more prominent critics of Chomsky, who also points out the value of some of his ideas.

In stark contrast, Darwin's theory remains largely intact, with some modifications and additional insights. It's been magnificently confirmed by discoveries in genetics, of which Darwin could have known nothing. Darwin was right and created a successful theory that will live forever, even as it's added to and enhanced. We have nothing like that for a theory of language acquisition, neither from Chomsky, Skinner, nor anyone else, yet. On the whole Chomsky was not more right than Skinner in any ultimately significant way.

Sorry for misapprehending what you were getting at with evolution of language. I think the idea that human language changes over time is not unfamiliar.

I certainly am getting a lot more pushback on Chomsky vs. Skinner than I ever imagined.

Could we generally agree that language is instinctual and follows formal or formalizable rules? If so, then I'd say my point still stands.

> Could we generally agree that language is instinctual and follows formal or formalizable rules

No, not without some evidence.


> First in it's immediate dismissal of any formalism inherent in natural language

That's not what Dijkstra's article is saying.

> the ease at which he dismisses the proposition without any real consideration

Dijkstra provides several paragraphs of clarification. IMHO, these do provide useful context and show the reader why he has the opinion that he does (of course, we may just disagree here -- but that's okay). What do you feel is missing?

> This subject should be under the purview of analytic philosophy and linguistics, not mathematics or computer science.


> To say differently is literally, demonstrably false.

Why? Can you demonstrate that, or point to a source that does?

> Many people in this thread are talking about the inability of plebes to express what they actually want. This is a fair point, but not a problem with language specifically.

Why not?

> Graduate of analytic philosophy

Possible appeal to authority.

> This is not up for debate

Why not?

> it's easily provable.

Then please prove it, or point to something else that does so.

> I could cite dozens of linguists, Pinker immediately comes to mind offhand.

Then please do so, and please be more specific than just giving us "Pinker". The more specific you can be, the more useful the citation is. I am not familiar with his work, and presumably many others here aren't either.

Having actually done research in this field, I feel confident in saying that you are wrong about the state of consensus. Here's a decent overview of the ongoing debate (assuming you are aware of what Skinner and Chomsky's theories involve): http://www.simplypsychology.org/language.html

Very interesting stuff. I will say, though, as I responded just now: if language is instinctual and has formal or formalizable rules, I think my major points above would still stand.

As someone who also has a degree in philosophy (specifically, philosophy of language as it relates to math), I can say that analytic philosophy, mathematics, and computer science share a lot of common ground. Even the best natural language processing algorithms can only really come up with a probability of the intent of the sentence; this is often not enough for a programming language (which by necessity requires precision to be repeatable).

Nouns themselves can take a number of different meanings (specifically proper nouns and names) depending on context. Communicative languages often don't distinguish between equality and identity, a critical distinction in computer science (e.g., I can truthfully say "I've eaten at the same restaurant 3 days in a row" if I ate at 3 different McDonald's locations on consecutive days). These assumptions about identity and equality are not the same from language to language (or even generation to generation, as you discussed). Context is also an issue; we often discuss things in ambiguous contexts which require clarifying questions from a human. We may have effective communication between humans, but misunderstandings are common. IMO, language is an effective communication tool almost specifically because it is imprecise: we fill in the gaps with our own experiences and it mostly works out.

Overall, I don't think EWD was saying that natural language programming would be impossible; just that the effort required to program in it would likely be more than learning a symbolic programming language. The computer would need to ask so many clarifying questions to reach the level of specificity required for computer science that it would be a very arduous task. Rather than making computer programming more accessible, natural language programming would make it substantially more difficult.

Here, again, I disagree with the weight of your intended counterexample. I simply disagree with your issues with equality and identity (formally: the Frege-Russell distinction). Your issue here is not equality and identity, but identity and subsumption.

Equality and identity are essentially the same with "being" verbs. Your sentence is an error of clarity; you'd fool a human in the same way as you'd fool a computer. I believe the ambiguity you are going for is that of "123 Main St is McDonald's" (identity) vs. "123 Main St is a McDonald's" (subsumption, also called class inclusion). Here, however, we have the "a", which clarifies that the statement means subsumption. Why? Because it would fool humans otherwise, and so we created a rule for clarification.

I certainly think a natural language programming language would need to ask clarifying questions via a parser before running a program, but I honestly don't think this is a serious problem.

It would really only be a problem with intentionally ambiguous words and phrases, such as the verb "to hook up", which is intentionally fuzzy enough to allow the speaker to mean one thing while implying something else.

It's been over a decade since I graduated, so forgive me if I don't remember the terminology :) The Frege-Russell distinction is often described with the Julius Caesar example: if I take Julius Caesar and make a perfect, instantaneous atomic copy of him 2 feet away, how do I refer to them? Because it is not really in dispute that they are two separate entities. Are they both Julius Caesar, the man? Or is one Julius Caesar and the other a copy? If you sent them both into a room together and asked them to come out again, how would you determine which one is the copy? This is an admittedly contrived example, but it's relevant because in a computer we can and often do copy objects, change them, and then care to distinguish between identity and equality.

Certainly we could come up with linguistic rules to differentiate them, but for the sake of convenience (or maybe because each of us possesses an incomplete knowledge of linguistic rules) we may refer to them as being the same. But this problem becomes much less difficult in a symbolic language, where many languages have defined an identity operator (usually ===) and an equality operator (usually ==).
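Python makes the same distinction with `==` (equality) and `is` (identity), and the atomic-copy thought experiment maps onto it directly:

```python
caesar = {"name": "Julius Caesar", "atoms": [1, 2, 3]}
copy = {"name": "Julius Caesar", "atoms": [1, 2, 3]}  # perfect atomic copy

assert caesar == copy        # equal: indistinguishable structure
assert caesar is not copy    # not identical: two separate entities

# Natural language happily calls both "Julius Caesar";
# the symbolic operators force the distinction.
```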

You're right that we have language rules; but many of these rules are arbitrary and mutable in their own right (and do change over time as your previous comment pointed out). The problem starts to come in that language rules are a lot more complex and subject to multiple different (and potentially correct) interpretations.

I side with EWD on this one: the interpretation of natural language is going to differ from person to person and as such, building a system that can perfectly discern intent is going to require so much context as to be impractical. It's just easier to program a computer using a symbolic language, while building in well understood formal rules to the symbolic languages that cover the vast majority of cases.

It's not that it would be impossible to develop a general purpose compiler capable of interpreting natural language to machine code; merely impractical. Symbolic languages are much better adapted to these tasks; just as natural languages are much better adapted to conveying uncertainty.

Well, the issue you point to here is the flux of language, which is a real problem. There is no "English"; we have no academy like they do for French (which is bullshit anyway), we never will, and it would be impossible anyway. There is just a series of languages, each wholly understood only by its speaker, and perhaps by people in the speaker's immediate culture.

Now, this is sort of a problem, but it's not as though we are walking around confused all day. We make it work, we formalize, we come up with a prescriptivist framework, even if we live in a descriptivist world.

We can do this with any of the "being"-verb "problems" brought up. Our Julius Caesar issue is fixed by a simple convention (yes, we have conventions in English as we do in formal languages). Getting people to follow the rules is the only problem, and it will eventually fail, but that doesn't mean we won't have a few hundred years of it working, possibly more.

The issue of practicality is an entirely different question. Currently it's an absurd proposition; that isn't to say, however, that it isn't doable, especially with greater computational power in the future. I think saying it's impractical is a cop-out. The amount of energy that goes into learning languages is immense. If we were able to learn one syntax, nearly identical to English, that could convert instructions into readable code, we'd enter a renaissance of programming akin to art after the photograph was invented, or the current renaissance of music created by the synthesizer and sampler. Skill would evaporate, ideas would reign. People would complain that proper rules aren't being followed, like they always do, but the number of programs produced would expand so far that the cream would rise to the top and the world would be a better place.

I guess if I had to boil the problem down to a thought, it would be this: the effort required to build a system capable of interpreting the nuance of language would be as much or more than building a system capable of generating and implementing its own ideas. Once you have a sufficiently powerful AI, you don't even need to tell it to do anything; it should just do it based off sensor data.

If, for security reasons, you wanted to shackle an AI from making and acting on its own choices, you would need to shackle its ability to interpret language as well, because they are the same thing. You have to make choices about implied intent when interpreting language, and those choices are themselves acts of judgment. You can't just restrict a machine to making choices on linguistic interpretations if those interpretations then lead directly to action (as in the case of natural language programming).

We can create (and have created) sets of natural language interpreters for specific situations: Siri is a good example of that. But by and large these are hacks that flag specific situations (such as creating a reminder or opening an app), pick out the relevant phrases, and plug the data into fields.
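In crude form, those hacks boil down to something like this (a deliberately simplified sketch; the intent names and patterns are invented, and real assistants are far fancier, but the shape is the same):

```python
import re

# Flag a known situation, pick out the relevant phrases,
# plug the captured data into fields.
INTENTS = [
    ("set_reminder",
     re.compile(r"remind me to (?P<task>.+) at (?P<time>\d+(?::\d+)?\s*[ap]m)", re.I)),
    ("open_app",
     re.compile(r"open (?P<app>\w+)", re.I)),
]

def parse(utterance: str):
    """Return (intent_name, field_dict), or punt to a fallback."""
    for name, pattern in INTENTS:
        m = pattern.search(utterance)
        if m:
            return name, m.groupdict()
    return "fallback", {"query": utterance}  # e.g. hand off to web search

print(parse("Remind me to call Joe at 5pm"))
# ('set_reminder', {'task': 'call Joe', 'time': '5pm'})
```

Everything outside the enumerated situations falls into the fallback bucket, which is exactly why these systems feel magical in their niches and useless one step outside them.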

The ideas that I, as a programmer in the traditional sense, want to communicate to computers are not the ideas I want to communicate to humans.

I might ask my friend to move the report he's working on to a shared network location so I can load it into my computer and read it: "Hey Joe, can you move the report to the share?"

Joe might ask the computer to do the same thing: "cp /home/joe/reports/cool_report.pdf /network/share/reports/cool_report.pdf"

The actual ideas that are communicated are very similar, but not the same. English is good for communicating one idea while bash/GNU is good for communicating the other.

Just because English has some established formalism doesn't mean it's good at communicating the ideas we want to communicate to computers.

BTW, I don't care which field you put the issue under; it's the same issue and anyone who cares about it might contribute to the discussion.

You very solidly supported the argument you are against.

"Hey Joe, can you move the report to the share?"

is a great way to communicate something you want done. It doesn't matter if it's to a computer, a person, or a dog. If it can't operate on those terms, then it's not sophisticated enough to actually meet the needs of the user. One day computers will get there; they haven't done so yet, not because it's a 'bad way to talk to a computer' but because computers have not yet become that sophisticated.

I've been working with a natural language conversational agent for the last few months, creating a simulated personality from a novel as a promotion for the novel. Natural language opens up a huge host of unreasonable expectations from users, but more interesting is the utter gibberish people enter thinking they are conversing. When I say utter gibberish, that is exactly what people enter: sentence fragments with no subject, no verb, and no correct spellings outside of "a"; even "the" is most often spelled "teh". I suspect "natural language" makes people relax, but even when we hook in ASR (automated speech recognition) to get away from the gibberish spellings, the "sentences" people enter do not make sense. To some degree, I believe people are testing the limits of the natural language system and the knowledge base backing it, but too many of the "conversations" reveal an expectation of unreal super-knowledge, like "you should know what I'm hinting at, even though I can't spell or describe it."

Sounds a lot like SMS conversations.

It might as well be a different language in a lot of ways.

It's sad to see him buying into this silliness:

> Remark. As a result of the educational trend away from intellectual discipline, the last decades have shown in the Western world a sharp decline of people's mastery of their own language: many people that by the standards of a previous generation should know better, are no longer able to use their native tongue effectively, even for purposes for which it is pretty adequate.

When people gain a certain authority by being very smart in a narrow field, they often use it to talk about the problems of society in general.

Because, though they are experts in their small field, they are no more informed or unbiased than your friend at the pub, and they end up sounding dumb.

In this and some of his other writings, he would do well to think and gather data about social issues, or else talk mostly about algorithms.

All sorts of people talk about all sorts of things.

When someone who is famous talks about all sorts of things, it's fashionable to diss and mock them for "speaking outside their authority", yet everyone else is free to ramble about whatever they like.

You're doing an inverse argument from authority here: "whatever Dijkstra says about [not-programming] is wrong and should not be listened to, because of who the speaker is".

Yeah, but he seems to be predicting the decline and fall of Western civilisation because of unspecific use of language by young people these days. As there is no single definition of Western civilisation, this is ironic as well as dumb.

The man in the pub may have a good point. If you read Dijkstra's other essays where he wanders off subject, you'll notice he rarely does. He sounds like an angry old man, lost in dreams of a better past that never was.

As of ~2010, literacy rates in the US put roughly 15% of adults at full literacy (it's a bit tangled to try to reduce it to a single number; the documents didn't give one, and the data files are in an odd format), according to US literacy groups and their surveys. It's also worth noting that what the councils on literacy count as "literate" is a pretty low bar.

While I can't speak to the total trend (measurements of literacy have changed, as has the prevalence of testing), it's fairly clear in a qualitative fashion that overall reading capabilities have declined over the last 100 years. Examine pulps (cheap entertainment books) from the late 1800s, along with children's books of the time... significantly more complex paragraphs and a much larger vocabulary.

I'm going to demand a cite for the claim that only about 15% of the U.S. is literate.

I'll also note that there's a distinction to be kept in mind between literacy (mastery of the artificial skill of using the written word) and fluency speaking one's native tongue (a skill all mentally healthy human beings pick up naturally).

Quite reasonable! It's a SHOCKING claim.

First, http://nces.ed.gov/naal/ is the organization I was doing my reading with. Their surveys span the last 20 years. There's a 200-page PDF describing the surveys.

Second, "literacy" is, as you point out, a nuanced term. NAAL has broken it out into 3 categories with 4 rankings possible from Below Basic to Proficient. Based on my reading of the survey questions, I drew the line of "literate" as "proficient".

So, http://nces.ed.gov/naal/kf_demographics.asp is a summary page, and " Percentage of adults in each prose, document, and quantitative literacy level: 1992 and 2003" is the graph.

You can see that the percentages are not collected into a single number. I made the ENTIRELY GROSS assumption that proficiency in one area probably leaks into the other areas. I would like to calculate the actual "Total proficiency" score based on the data though.

At any rate, the percentages are abominable.

Dijkstra is right, and this has been a standard problem in philosophy and logic. Only when one considers programming to be some new endeavor divorced from its logical roots does this problem seem new and puzzling. For more, check Russell's Theory of Definite Descriptions.

I'm not sure I follow. The problem of definite descriptions has more to do with reference in the external world. I'm not sure I see how it would be a problem within the limited scope of a formal framework in a programming language.

When you're dealing with variables, "morning star" "evening star" problems are essentially irrelevant because you are defining things, rather than merely naming them.

You're still naming things, they're just things in a computer that you have to carefully specify so that they behave like their real-world referents within your model (also in a computer.)

'Morning'/'Evening' star won't matter until it does - and if it were something that wouldn't ever matter, you probably wouldn't have input it.

But the framework is completely different. In nature, we must have an inductive framework: we observe phenomena and name them. With computers, we are doing something very different. The framework is deductive; thus we do not name, we define.

With naming, we can be wrong, because of the inherently negative knowledge of our framework. That is, in science we can only prove with certainty that things aren't the case, not that they are. This is Karl Popper's thesis.

With defining, we cannot be wrong about things; we have the ability to have perfect knowledge. Since the things we are referring to are defined to be such, we simply cannot be wrong about their identity and structure.

Dijkstra is wrong on this one. Siri and Cortana have already achieved this for trivial classes of programs, and they will only improve.

People comparing natural language programming to EULAs are missing the point entirely: natural language input guides a search process for formal programs, it isn't literally "the program" itself.

I think Dijkstra would not be bothered by this. His arguments tend to be that even with such a search process, it's the very province of formal language to give someone the power to guide that search.

In other words, you'd need formal language to be able to specify what you truly want, then would "translate" it to natural language to execute the search, the processor would search for the proper formal language expression, and then you would verify.

Or you could just skip all the intermediate steps.

It's the developers of agents like Siri and Cortana, not the users, who need formal language.

Anyway I don't think we're disagreeing with each other: there is always going to be a place for formal languages, they won't ever go away and be "replaced" with natural language interfaces, it's just that Jane Random will be more than happy to speak some gibberish to her computer and let the machine figure things out.

Oh, but we are: I don't think Jane Random will ever reach that far in that direction unless her desires become vastly simplified. So in a Wall-E-style future, yes, but in one where people still do things creatively there will always be a need for formality.

So they are not programming. The ones who build the backend do the programming, and put a natural-input interface on it as a frontend.

Yes, but that's a bit of a semantic quibble. If they're making their computer do nontrivial things they're essentially programming.

If the computer changes state it's because of the execution of a program, and if no programmer actually wrote that specific program then the act of creation can be considered "programming".

By that definition, a user drawing a picture is programming too! Especially if the picture is exported to PostScript.

Yep. I take it you don't enjoy this definition?

One alternative I've seen is what happens with artificial intelligence: if we can do it, it's not AI, just "algorithms" or whatever. In reality the definition of "artificial intelligence" is broad enough to cover the things we consider mundane today, like movie recommendations and search engine results.

If MS Paint was souped up with things like repetition and conditionals, would you be so opposed to calling that "programming"?

Natural language is ambiguous: one phrase often has multiple meanings. That’s the result of many thousands of years of evolution, and it’s actually very efficient. If we always had to be explicit, our phrases and sentences would be much, much longer — and boring. When A says something to B, A makes assumptions about B’s context, common sense, beliefs, and knowledge. The verbal message itself contains just the minimum information needed, on top of this pre-existing information, for B to get it.

Natural Language can be seen as a very efficient compression algorithm. The phrase the speaker chooses is the shortest message, given the context of the receiver.

Programming a computer with Natural Language is incredibly difficult, because Natural Language alone, without the context it is built upon, really lacks much of the information the computer needs to operate the program.

All I can picture coming from this is reams of EULAs.

Legal documents contain the most precise language we humans can create that can still be considered to be "natural", and they're still all but unreadable without the proper degree and legal context.

Not precisely what I would call a "win" for the ease of programming.

This is so true. Even the lawyers have trouble with the language in these documents. I've often had to work with large legal documents and it's not unusual to find serious mistakes in them. Once I was doing a deal that involved some nested partnerships and I realized that the lawyer (one of the best in town) had made an arithmetic mistake that would have mistakenly allocated an excess $70,000 to me over the course of the project! The attempt to express numerical relationships in English is just too easy to screw up.

Another time a contract had a weird compounded interest increase in percentage ownership that, the way it was written, kept increasing the percentage every year so that by the time the contract was over the other party would have something like twenty million dollars more equity than I did!!! I didn't trust the lawyer that wrote it (again one of the top lawyers in town) so I walked away from that deal, ignoring the claims that it was just a mistake.

This guy made an attempt - http://en.wikipedia.org/wiki/Ithkuil

Reading 'round trip' translations via Ithkuil is interesting.

Our father in heaven

hallowed be your name

May the [metaphorical] environment which fosters/sustains your rule eventually [metaphorically] permeate us

As for your aspirations, may they be made real

on Earth, not just in heaven

please be one who enables us to eat and drink our critical sustenance

And forgive us in regard to our moral transgressions in the way we grant and receive forgiveness amongst each other

may we successfully avoid having desires [that are] against our better judgement)

and be one who enables us to successfully avoid ideas associated with the Devil).

Without poetry, things are more clearly weird.

This really brings out the difficulty in intentionally conserving ambiguity: translating "temptation" and "evil" as "desires that are against our better judgement" and "ideas associated with the Devil" is engaging in some heavy editorializing over the original texts, which are terse and can credibly cover a wide range of meanings.

Consider that programming doesn't have to be linear - while legal documents need to be printed on a piece of paper. Granted, they have limited non-linearity (references to prior pages, pointers, etc.) but it generally has to be readable on physical paper.

For natural language programming to work, computers simply need to be capable of making prudent guesses - so long as we show the information clearly, we can always fix those guesses. We can allow for limited instances of constructed languages, rewiring connections between words, etc.

Legal documents are just as referential as programs are - their function calls are simply defined in a natural way, i.e. "Except for S6.6, S8.2, and S10.2, the test samples are stabilized at test room temperature prior to testing.", or "The following terms of service and end user license agreement (“EULA”) constitute an agreement between you and Rovio Entertainment Ltd, and its affiliates (“Rovio”). ".

You mention that such references are limited in nature, but laws are rarely as limited in their references as EULAs are. Plus, there's a whole layer of context which exists outside the written document - the exact legal definition of words.

I just don't think that a "prudent" guess will be enough, especially as you start distributing such programs to machines or situations where the context is different.

Dijkstra clearly points out that it would be impossible to program in a natural language. The thing is, there are constructed languages such as Lojban[0]. Lojban, while semantically ambiguous, is grammatically unambiguous. This means a computer can parse anything said in Lojban, but a computer cannot perfectly understand what is meant. While I'm no expert, I'm pretty sure it would not be very hard to create a computer interface using Lojban. I'm curious whether anyone can explain why this is or isn't feasible.

[0] http://www.lojban.org/tiki/Lojban
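The grammatical/semantic distinction drawn above can be sketched in miniature: a toy language where each token has exactly one grammatical role (so parsing is deterministic) while a word can still carry several senses. This is not actual Lojban machinery; the tables and sense lists below are invented for illustration.

```python
# One token -> one grammatical role, so every sentence has exactly one parse.
ROLES = {
    "mi": "subject",
    "prami": "predicate",
    "do": "object",
}

# Semantics can still be one-to-many even when the grammar is unambiguous.
SENSES = {
    "prami": ["loves", "is fond of", "cherishes"],
}

def parse(sentence):
    """Return the unique parse: each word tagged with its single role."""
    return [(word, ROLES[word]) for word in sentence.split()]

tree = parse("mi prami do")   # exactly one parse tree
meanings = SENSES["prami"]    # but several candidate senses remain
```

The computer gets a guaranteed unambiguous parse tree, yet still has to pick among `meanings`, which is exactly the gap the comment describes.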

That wouldn't help significantly, because the hard part of getting a computer to understand instructions in English isn't understanding English, it's understanding the universe.

Suppose you want a robot you can give instructions like "clean the kitchen". Programming the robot to understand this as performing the action 'clean' on the object 'kitchen' is something we can already do. The problem is that the robot doesn't know how to perform that task. It doesn't even know what what state of affairs constitutes the desirable end result of a clean kitchen (as opposed to e.g. an empty kitchen because it threw out all your food and cutlery along with the trash). That knowledge is the meat of the problem, and it's just as hard in any language.

For some reason, Dijkstra quoting A. E. Housman brightens my day. And then there's this:

"Therefore, although changing to communication between machine and man conducted in the latter's native tongue would greatly increase the machine's burden, we have to challenge the assumption that this would simplify man's life."

I've noticed the limitation of natural language frequently. Sometimes, it's just the inability of natural language to deal with levels of abstraction precisely. Sometimes, it's the large assumed domain context that accompanies the use of natural language when describing procedures.

I had clients once where both of these limitations prevented us from being able to produce a product for them. They were incredibly successful Wall Street types. They wanted a software system written that would automate some of what they did. They were traders, but not interested in high velocity trading. After weeks of meetings about high level goals, we had a meeting to finally get down to specifics about the procedures and functionality that they wanted. They simply couldn't describe in precise language what they did! They had been doing it for years and years very successfully, but we couldn't help them because they couldn't describe what they did. It was very odd.

They worked with sophisticated mathematical models and strange rules of thumb, the morning news, and the perceived level of activity on the exchange floor. Some of them used fancy interactive graphs while others relied on a simple printout of a spreadsheet full of numbers.

The universe they worked in was very complex and involved decision making that, apparently, was hard to describe in words. Imagine a world-class boxer trying to put into words the algorithm that he used to win a match. It was a bit like that.

There is a joke that perfectly illustrates this entire article:

A programmer's wife asks him, "Please go to the store and buy a carton of milk, and if they have eggs, buy a dozen."

The programmer returns home, and his wife is very angry with him. "Why did you buy twelve cartons of milk?!"
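The literal parse the joke hinges on can be written out as a tiny Python sketch (the function and its arguments are hypothetical, purely for illustration):

```python
def buy_milk(store_has_eggs):
    """The programmer's literal reading of:
    'buy a carton of milk, and if they have eggs, buy a dozen.'
    'A dozen' has no explicit object, so it binds to the only
    purchasable mentioned so far: milk."""
    quantity = 1              # "buy a carton of milk"
    if store_has_eggs:        # "if they have eggs"
        quantity = 12         # "buy a dozen" -- a dozen *milk*
    return ("milk", quantity)
```

The wife's intended program has a second purchase, `("eggs", 12)`, inside the conditional; nothing in the sentence's surface form forces one reading over the other.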

English SE question detailing the linguistic mechanisms: http://english.stackexchange.com/questions/40234/bring-6-egg...


>> At some point I hope to have computer systems I can program by voice in English, as in "House? Could you wake me up at 7?"

> Yeah, well, I fear the answer will be yes (it could), but it won't do so since you haven't asked it to wake you up, only if it could.

From an exchange on python-list, https://mail.python.org/pipermail/python-list/2003-October/1...

Wow, the examples in the SE discussion are awesome. I was aware of some of these linguistic issues, but the examples really underscore how subtle pragmatics can be. (The examples give completely syntactically parallel sentences that are pragmatically resolved in opposite ways.)

Wouldn't this imply that the "buy" instruction is stateful?

No. It's just the implied object of the second "buy" which is ambiguous.

But then the programmer would come home empty handed, without even buying the milk.

This reminds me of when I first learned to program. By far the hardest part was learning how to think clearly and discretely. It changed the way I looked at the world completely. Really, we should want people to overcome this challenge, not avoid it.

I saw the title and thought, "Oh good, someone else who thinks like Dijkstra!" Oh well. :-)

I feel like the halting problem and Gödel's theorem imply programmers will never be out of a job, and the best we can do is build out more and more solutions for specific use-cases, like Wordpress and Shopify. I don't think there will ever be a general-purpose AI that can write computer programs. At least not in my lifetime or my children's.

And btw, has anyone else felt that the Halting Problem and Gödel's theorem are two sides of the same coin? Is there any formal connection between them? I feel like they are not "independent" (in the sense that Euclid's 5th postulate is independent).

>And btw, has anyone else felt that the Halting Problem and Gödel's theorem are two sides of the same coin? Is there any formal connection between them? I feel like they are not "independent" (in the sense that Euclid's 5th postulate is independent).

The theorem of the unsolvability of the halting problem is used in modern proofs of Gödel's incompleteness theorem(s) [1]. Hofstadter writes about this relation in Gödel, Escher, Bach as well.

[1] http://www.amazon.com/Lectures-Logic-Set-Theory-Mathematical...

Haha, the first thing I thought was that this was Dijkstra.

UTexas + "Foolishness" => Dijkstra.

This reminds me of Inform 7. The source code for the language is pretty much plain English. I like this because programs are read more often than they are written, and it's usually very easy to read a program and see what's going on. But the syntax is still pretty limited, and it only understands certain ways of expressing things. So you still have to learn the syntax and very specific semantics, even though the language looks like English.

If you use the same language to solve a problem as you use to describe the problem and to define success, then your ability to accurately solve the problem will be constrained by the level of precision that is available in the language being used.

Given this, natural language programming can hope to achieve reasonable levels of precision in very few languages. English, for example, has over a million words, while French and Italian are both under 100K. That 10x discrepancy in words means that there is necessarily far more imprecision in romance languages than English. Grammatical constructs can add precision and clarity but cannot make up for the inherent gap in vocabulary precision.

Someone with more knowledge can comment on the feasibility of precise NLP in non-English, non-romance languages.

If, however, acceptance testing is defined through the precision of a low-level programming language, but the problem is defined in a less-precise natural language, there will always be a precision mismatch between the languages used to define the problem. A solution may be implemented that addresses the natural language issue but cannot meet the constraints of the programming language.

Sometimes I wish that the term 'language' had never been used to describe the blobs of text we use to create formal specifications for programs. This invites all sorts of comparisons/analogies/metaphors to spoken and written languages that may not necessarily be meaningful or constructive.

I think of source code as being more analogous to architectural blueprints or formal logic than someone's verbal description of an object. It's just a coincidence that source code is governed by structures and concepts that exist in human languages.

Usually when people say they want to be able to create software with natural language, what they want are better tools that allow them to more efficiently create something, with less time spent worrying about syntax or technical minutiae. This outcome doesn't require natural language, it just requires better tools.

I kind of disagree with the article and would like to be able to use natural language. The fact that computers are currently too dumb to understand English is an issue, but not the one the author seems to be going on about. One of the reasons I like Python is that it reads rather like English, making code easy to understand, and one of the main reasons Lisp never really took off is that reading it is so unlike natural language that it's hard to understand. The comparison with maths seems false to me - many mathematical concepts express very badly in English and elegantly in mathematical symbolism - complex numbers, path integrals and all that. There seems little with that kind of complexity in computer science; it's all add this, write this to the database, draw this on the screen, etc.

I think you missed Dijkstra's point. Why don't we use "natural language" to reason about mathematics, physics, logic, or even the structure of language itself?

Possibly. We do in fact use natural language to reason about mathematics and physics - if you look at any text book it's about 90% English words and 10% equations. But I'll give you the equations are important and powerful in math and physics but I'd argue less so in computer programming.

@tim333 Dijkstra was arguing that mathematics was actually held back until mathematicians decided to embrace a formal system to describe it.

Actually, we do all of these things.

Why do proofs have commentary, and why do we have physics text books?

But people are too dumb to understand English as well. In fact, I may have missed your point entirely! (which, oddly, would also favor my position). Why do you want computer programs to improve at something people are notoriously bad at doing?

Python is nothing like English. It is an extremely precise formalism, and thank god for that.

This article is an Appeal to Novelty fallacy. He never gives a reason why explicit language is better - just that it occurred later in civilization.

Beyond this, natural language programming doesn't preclude explicitness. You could still have portions of limited explicit language or even portions of actual code if that is insufficient.

Natural language programming gets us two advantages:

1. Easy entry (presumably what the author cares about)

2. A language which is composed of an infinite number of DSLs - but without the pains of limited language scope and in which DSLs are actually easy to write. This actually fixes the problem the author was talking about - using a language for a purpose it clearly wasn't meant to serve.

And if you think natural language programming is going to get rid of symbolism in language, I've got a U+1F309 to sell you.

The problem is of course the vagueness in natural languages caused by the complexity of its grammar and the unstated contextual nature of it all.

"The black truck driver ran through the red light."

Is the truck black or is the truck driver black?

Is the driver driving, or is he running on foot?

Is the red light a traffic indicator, or is it a just a beam of red light?

Teaching a computer to figure out the intended interpretation here is a monumental task, and teaching people to understand how their use of language is actually very easy to misunderstand is even harder.

Indeed, but figuring this out is tricky.

In Western countries you'd assume it's a black driver. In a country where almost everyone is black you'd assume a black truck. So the code could have bugs depending on the location/culture (and thus context).

Maybe we need some contextual analysis capabilities and new class of errors based on overly ambiguous code.

(Along with a user-defined value that lets the computer take its best guess)

The counterpoint would be Inform 7 [1], a natural language-based programming language for interactive fiction.

[1] http://inform7.com/

Inform is a formal language with precise disambiguation mechanisms that natural language doesn't have.

The goal is not simply to replace curly braces and other symbols with English words; unlike what COBOL would have us think, that road doesn't lead to natural language at all.

I like that Dijkstra used the word foolish.

As someone who has interacted with Google, Microsoft and Apple's natural language systems I am glad this was ignored. Happy someone stayed hungry.

In another comment, I compared the process of making computers do our will to manipulating a lever, the lever being made out of abstractions.

What Google, MS and Apple's NLP systems are doing is mapping your words to a selection of pre-prepared levers. It's simplistic in the extreme compared to the real problem of being able to create new levers.

IMO, in dismissing Dijkstra, you completely missed his point.

Don't see any levers in the article lol, I have no clue what you're saying, sorry.

I definitely think the aspiration for natural language programming has only made things better, and it's something we all look forward to.

Anything open to ambiguity can be resolved through dialogue.

That was a good article, and I agree with his take about the value of using natural language to program computers. However, the last, and unnecessary, "remark" undermines the entire essay by revealing a profound misunderstanding of the nature of human language, thereby placing himself firmly into the irrelevant crank category in the minds of many potential readers.

What this misses is that it can be useful to structure design as a conversation, rather than a monologue.

We often make this mistake with computer systems. I.e. the need to specify everything up front. Imagine a workflow where the process of design is a conversation with the system. To some extent, REPLs capture this, but they're still driven wholly by the programmers.
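One way to picture design-as-dialogue: when a request is ambiguous, the system asks instead of guessing. A minimal sketch, with every name and the interpretation table invented for illustration:

```python
# Hypothetical table mapping an ambiguous request to its candidate readings.
INTERPRETATIONS = {
    "buy a dozen": ["12 eggs", "12 cartons of milk"],
}

def resolve(request, clarifier):
    """If the request has several readings, hand them to `clarifier`
    (standing in for a question put to the human) rather than guessing."""
    candidates = INTERPRETATIONS.get(request, [request])
    if len(candidates) == 1:
        return candidates[0]          # unambiguous: no dialogue needed
    return clarifier(candidates)      # ambiguous: ask, don't assume

# Simulate a human who picks the first offered reading.
choice = resolve("buy a dozen", clarifier=lambda options: options[0])
```

A REPL gives you this loop implicitly (run, observe, refine); the idea here is moving the clarification step to before execution instead of after.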

The same could be said of "natural language contracts" and "natural language laws and regulations". Lawyers get hounded for "legalese", but it's just a form of code - based on, but deviating from, natural language - that tries to overcome the inherent inefficiencies of natural language.

This is my favorite part of software engineering: the fact that there are so few instructions you can actually give the machine, and that you have to map your abstract ideas onto those instructions.

Meanwhile we have things like Wolfram Alpha and Google search trying to process natural language in the place of more code-like input.

see also natural language maths and natural language law.

though, to be fair, eliminating useless jargon is helpful, but you've got to keep the useful jargon.
