

Can We Translate English To English? - robinhouston
http://rjlipton.wordpress.com/2010/11/12/can-we-translate-english-to-english/

======
mcantor
The linguist in me recoiled in disgust at this idea, instantly envisioning a
nightmare universe where laconic tweens fingered anti-prose into their
touchscreen phones ("lol i tnk ur rong cuz he wuz @ skool ystrday"), to be Googled
into corresponding legible words ("I think you have the wrong idea; I saw him
at school yesterday.") as their actual ability to communicate slowly
atrophied.

The developer in me thought, "Oh man. That would be effing sweet."

~~~
scott_s
Why is that idea disgusting? When people know what they say will get
translated, they tend to think in terms of the end result. This would
_promote_ the use of proper English, because that's how it would be read and
written. When I code in a programming language, I am constantly thinking about
what the language constructs _mean_, not what they literally say. The
disjointed net-speak would just become a form of shorthand
(<http://en.wikipedia.org/wiki/Shorthand>).
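A word-by-word lookup is the simplest form of the shorthand expansion described above. The sketch below assumes a small, hypothetical `SHORTHAND` table and a hypothetical `expand` function. It is not how Google or any real translator works. For mcantor's example it produces "I think you're wrong because he was at school yesterday.", which is legible but is not the reading mcantor gave ("I think you have the wrong idea; I saw him at school yesterday."). A lookup table can expand tokens, but it cannot recover what the writer meant.

```python
# Hypothetical shorthand table; "" marks filler tokens to drop entirely.
SHORTHAND = {
    "lol": "",
    "i": "I",
    "tnk": "think",
    "ur": "you're",  # ambiguous in practice: could also mean "your"
    "rong": "wrong",
    "cuz": "because",
    "wuz": "was",
    "@": "at",
    "skool": "school",
    "ystrday": "yesterday",
}


def expand(message: str) -> str:
    """Expand shorthand token by token; unknown tokens pass through unchanged."""
    words = [SHORTHAND.get(tok.lower(), tok) for tok in message.split()]
    text = " ".join(w for w in words if w)  # drop tokens mapped to ""
    if text:
        text = text[0].upper() + text[1:]  # capitalize the first letter
        if not text.endswith((".", "!", "?")):
            text += "."  # crude finishing step: add a final period
    return text
```

For example, `expand("lol i tnk ur rong cuz he wuz @ skool ystrday")` returns `"I think you're wrong because he was at school yesterday."`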

~~~
mcantor
Perhaps it depends on the person? My "disgust," or worry, in my original post,
came from the vision of someone internally or unconsciously deciding, "The
computer will make my communication effective for me, so I needn't bother
learning to communicate effectively myself." Without an empirical
understanding of how to communicate, a person would also lack the trappings of
that learning process, and, I think, would be forever marred as a result.

When you code, mustn't you be aware of what the code constructs literally say,
because that is how the compiler will interpret them?

~~~
scott_s
When I say literally, I mean that a for loop is constructed of the English
word "for." I don't think of it like that. I think, "Okay, I need to iterate
over the elements of this sequence, so I'll use a for loop." The code is
literally just text, but I must understand its _semantics_ to know what will
happen.

Similarly, if someone were using what you proposed, they would have to keep in
mind that the other person would see the proper English, not the net-speak. So
they would - I think - end up thinking in proper English first, then _coding_
the proper English using net-speak. They must think in terms of the end
result, in terms of what the person they're communicating with will read. For
that reason, I disagree with your assessment that people would be able to
think "I don't need to bother learning to communicate effectively."

~~~
mcantor
Ah, I see. You're saying that the shorthand exists only for brevity of input;
when a tween types "lol ur rite," they first think of what they _mean_
("That's funny! You're right!"), and only _then_ do they translate it into the
corresponding shorthand so they can type it faster. That's a very good point.
I'll have to ruminate on that for a while.

_Edit_: It's worth noting that something about it still feels horribly wrong
to me, but I can't seem to find the right words to express it. So I will
concede... for now! :-)

~~~
scott_s
Yes, that's exactly what I'm saying. Perhaps your reservation comes from the
fact that there are currently no translations, so you associate net-speak with
what it is now: stilted communication optimized for brevity of input time and
length.

~~~
mcantor
I think my reservation may come more from what I'm saying in this comment:

<http://news.ycombinator.com/item?id=1898053>

Even if a computer could translate netspeak to "Proper English," it would
still be _computer-translated_ "Proper English." The idea of allowing a
computer to choose which words to use, or where punctuation should go, or
anything like that, is unsettling to me. The translation model works for
programming languages because compilers have no emotions, but any message that
starts out in shorthand with the intention of being read by a human is
intrinsically hamstrung.

In fact, this can be true for programming languages, too! C is like shorthand;
the compiler is what turns our netspeak (C) into "Proper English"
(assembly/machine code). It usually works, but sometimes you just have to dip
into assembly to _really_ express what you mean.

------
dinedal
If there were a program that translated legalese into plain English as
accurately as a lawyer, it would change the legal profession forever.

------
mcantor
Honestly, the more I think about this, the more I think that we'll have an
English-to-English translating algorithm as soon as we have a "classical-
masterpiece"-writing algorithm or a "painting-so-beautiful-it-makes-you-
cry"-creating algorithm; that is, probably never. I'm no expert writer, but I
know a few, and it seems to me that excellent writing style is more than
reading Strunk & White cover-to-cover while reciting mnemonics for choosing
between "who" and "whom." It's also knowing when to break or bend the rules,
_how_ to break or bend the rules, and _why_ to break or bend the rules. That's
the kind of spontaneous creativity that I don't think computers are even close
to emulating, much less learning.

~~~
hxa7241
As Wittgenstein said, "to imagine a language means to imagine a form of life" -- it is not just a system
of encoding.

But something like Photoshop filters for text style might be possible. You can
imagine the cheesy use of, say, gothic-romanticising, like overdone HDR tone
mapping. But there are useful image manipulations too, when well deployed, and
there must be similar potential for text, maybe more so given its more
structured form...

------
drats
"This is one of the best written and slickest proposals I have ever read. Do
NOT fund this work under any circumstances."- NSF grant reviewer

That's eminently quotable.

edit: "rant reviewer" => "grant reviewer"

edit2: proposal => proposals (even though it's proposal in the link, putting
proposal[sic] just looks odd).

------
cabalamat
One thing some people do is take an article on the net, use Google translate
to translate it into some other language and then back to English, and put it
as an article on their website to attract money from Google ads. This counts
as better, for a suitably restricted definition of "better" -- it makes
approximate sense and is different enough from the original that Google
doesn't spot it.

------
trotsky
_The idea was that perhaps they could help improve the quality of this or any
other written piece._

I'd pay $10 without thinking twice for a Chrome extension that simply replaced
misused instances of there, their and they're with the correct version based on
context. Hmm, I wonder how hard that would be.
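How hard would it be? A deliberately naive heuristic looks only at the word that follows the target. The sketch below assumes two small, hypothetical cue-word sets and a hypothetical `fix_there_their` function, and it leaves anything it has no rule for unchanged. Even this toy version shows the difficulty. It wrongly turns "Are there going to be more?" into "Are they're going to be more?", so a practical tool would need much more context than one neighbouring word.

```python
# Hypothetical cue words: the word right after the target hints at the right form.
THERE_CUES = {"is", "are", "was", "were"}    # "there is", "there were", ...
THEYRE_CUES = {"going", "coming", "doing"}   # "they're going", ...
TARGETS = {"there", "their", "they're"}


def fix_there_their(sentence: str) -> str:
    """Pick there/their/they're using only the following word as context."""
    words = sentence.split(" ")
    for i, word in enumerate(words[:-1]):
        if word.lower() not in TARGETS:
            continue
        nxt = words[i + 1].lower().strip(".,!?")
        if nxt in THERE_CUES:
            replacement = "there"
        elif nxt in THEYRE_CUES:
            replacement = "they're"
        else:
            continue  # not enough evidence: leave the word as written
        if word[0].isupper():  # keep sentence-initial capitalization
            replacement = replacement.capitalize()
        words[i] = replacement
    return " ".join(words)
```

`fix_there_their("Their is a problem.")` gives `"There is a problem."`, and `"They took their car."` is left alone because "car" matches no rule.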

------
nodata
A company called WhiteSmoke has an application that will take non-native
English and fix it: <http://www.whitesmoke.com/products.html>

------
buymorechuck
I've often thought the same, and set out to build songwriting software using
ML principles. I ended up with iRhyme, an iPhone app that analyzed a huge
corpus of existing song lyrics to build a songwriting-oriented rhyming
dictionary. That worked out well, but improving general prose seems difficult.
I think domain-specific applications could be good stepping stones.
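The comment doesn't say how iRhyme worked, so what follows is only a guess at the general idea. The sketch counts words in a lyrics corpus, groups them by a rhyme key, and ranks each word's candidate rhymes by how common they are in song lyrics. The rhyme key is a crude, spelling-based stand-in: the last vowel cluster plus any trailing consonants. It misfires on silent "e" (it puts "dance" in the same group as "we"), so a serious version would key on pronunciations from a phonetic dictionary instead. `rhyme_key` and `build_rhyme_dictionary` are hypothetical names.

```python
import re
from collections import Counter, defaultdict

# Last run of vowels plus any trailing consonants, e.g. "night" -> "ight".
# A crude spelling-based stand-in for a real (phonetic) rhyme.
TAIL = re.compile(r"[aeiouy]+[^aeiouy]*$")


def rhyme_key(word: str) -> str:
    match = TAIL.search(word)
    return match.group(0) if match else word


def build_rhyme_dictionary(lyrics: list[str]) -> dict[str, list[str]]:
    """Map each corpus word to words sharing its rhyme key, most frequent first."""
    counts = Counter(
        word for line in lyrics for word in re.findall(r"[a-z']+", line.lower())
    )
    groups = defaultdict(list)
    for word in counts:
        groups[rhyme_key(word)].append(word)
    return {
        word: sorted(
            (w for w in groups[rhyme_key(word)] if w != word),
            key=lambda w: (-counts[w], w),  # common in lyrics first, then A-Z
        )
        for word in counts
    }
```

With the toy corpus `["Tonight we dance in the light", "Hold me tight tonight", "Every night"]`, the entry for "light" is `["tonight", "night", "tight"]`. "tonight" ranks first because it appears twice.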

------
Prisen
Isn't this idea very similar to the grammar suggestions you can get from Word?
The quality of those suggests it is a very hard problem.

------
mcantor
Obligatory: <http://www.translationparty.com/>

------
lionhearted
Some classic works are very difficult to read in their native language. I've
had Japanese friends say that Genji Motoganari - probably the most famous work
of Japanese literature of all time - is basically unreadable in Japanese, and
they didn't "get it" until they read it in English.

I feel the same about some older English works. For a long time, I've thought
it'd be cool to have one talented author translate an older English book into
modern Japanese, and a second author who hadn't read the original to translate
it back to English. I bet it'd be more readable.

~~~
xiaoma
I read part of Genji _Monogatari_ during my Japanese studies as an undergrad.
It's nearly 1,000 years old and it's absolutely _not_ the same language that's
spoken in modern Japan. How much of anything written that long ago could a
normal native English speaker understand?

It's just not a reasonable thing to expect from the general populace.

------
klbarry
I think it's possible, but you would need pieces to draw on: Google uses
U.N. translations to train its algorithms, but how many documents do we have
translating legalese into good English, or poor writing into great writing?
Probably not more than a couple hundred.

~~~
danohuiginn
We have documents that have gone through an editing process, with changes
tracked from draft through to published article.

I'm thinking particularly here of the process inside, say, a news agency. Many
of the changes made by sub-editors are fairly mechanical modifications, to
conform to a standard style. They might well be sufficiently repetitive to
train a 'translator'.
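Here is a minimal sketch of mining those mechanical sub-editor changes. It assumes you have pairs of (draft, published) text. It uses Python's standard `difflib.SequenceMatcher` to find word-level changes, then counts the changes that recur across many pairs. The frequent ones are candidate house-style rules. `word_edits`, `mine_rules` and the `min_count` threshold are illustrative names, and the sketch only surfaces candidate rules; it does not train a full 'translator'.

```python
from collections import Counter
from difflib import SequenceMatcher


def word_edits(draft: str, published: str) -> list[tuple[str, str]]:
    """Return the (before, after) word spans that differ between draft and published text."""
    a, b = draft.split(), published.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            edits.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


def mine_rules(pairs, min_count=2):
    """Count edits that recur across many (draft, published) pairs.

    Edits seen at least `min_count` times are candidate house-style rules.
    """
    counter = Counter()
    for draft, published in pairs:
        counter.update(word_edits(draft, published))
    return [(edit, n) for edit, n in counter.most_common() if n >= min_count]
```

Suppose two pairs change "per cent" to "percent" and a third deletes a "that". `mine_rules` reports only `(("per cent", "percent"), 2)`, because the deletion happens just once.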

You could also take a corpus of 'good' and a corpus of 'bad' writing, and
compare the features in each: sentence length, constructions used. Origin of
words would be an interesting one ('never use a Latinate word where an
Anglo-Saxon one will do').
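The feature comparison can be sketched in a few lines. The sketch assumes two non-empty lists of texts. It stands in for word origin with a crude, hypothetical proxy: a handful of common Latin/French-derived suffixes. Real etymology would need a dictionary lookup, so treat `latinate_ratio` as illustrative only. `features` and `compare` are hypothetical names.

```python
import re
from statistics import mean

# Crude, hypothetical proxy for Latinate vocabulary: a few common
# Latin/French-derived suffixes. Real word origins need a dictionary.
LATINATE_SUFFIXES = ("tion", "sion", "ment", "ity", "ance", "ence")


def features(text: str) -> dict[str, float]:
    """Average words per sentence and share of words with a Latinate-looking suffix."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "latinate_ratio": 0.0}
    latinate = sum(w.lower().endswith(LATINATE_SUFFIXES) for w in words)
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "latinate_ratio": latinate / len(words),
    }


def compare(good: list[str], bad: list[str]) -> dict[str, tuple[float, float]]:
    """Mean of each feature for the 'good' and 'bad' corpora (both must be non-empty)."""
    good_f = [features(t) for t in good]
    bad_f = [features(t) for t in bad]
    return {
        key: (mean(f[key] for f in good_f), mean(f[key] for f in bad_f))
        for key in good_f[0]
    }
```

Comparing `["We ate. We left."]` with `["The implementation of the regulation requires consideration."]` gives average sentence lengths of 2.0 and 7.0. The second text has three "-tion" words out of seven.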

Unfortunately, most of the difference between good and bad writing is in the
ordering of ideas; you can't really deal with that on statistical grounds.

