
I'd think that the advantage of machine translation is on corpora that are not known up front (i.e. user-supplied text) or corpora that are exceptionally large.

If you have a smallish, well-known text, I don't think you will get much insight from machine translation. There are certainly plenty of uses for computer text analysis and mining in biblical studies, but I doubt translation is one of them. And for obscure idioms or hapax legomena (words that occur only once in the corpus), machine translation can't help you, because by definition there are no other occurrences to rely on.
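
Finding the hapax legomena themselves, as opposed to translating them, is the sort of text mining a computer does handle easily. A minimal sketch in Python, assuming a deliberately naive tokenizer (lowercase, split on non-word characters); the function name and the sample sentence are only illustrative, and a real Hebrew or Greek corpus would need proper normalization and lemmatization first:

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return the words that occur exactly once in text, in order of first appearance."""
    # Naive tokenizer: lowercase everything and keep runs of word characters.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # Counter keeps insertion order, so the result follows first appearance.
    return [word for word in counts if counts[word] == 1]

sample = "In the beginning was the Word, and the Word was with God"
print(hapax_legomena(sample))
```

On the sample sentence this keeps "in", "beginning", "and", "with" and "god", and drops "the", "was" and "word", which repeat.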




With a sufficient level of precision, there's room for machine analysis to "reveal" things we are ignoring out of custom. A lot of text analysis done by people is full of biases and deference to authority.

E.g. I remember getting into an argument with a teacher at school over the interpretation of a poem. "His" interpretation, which was really that of some authority who'd written a book, was blatantly contradicted by the text, unless you assumed that the author had suddenly forgotten all his basic grammar, despite the evidence everywhere else that he was always very precise in this respect.

Of course, in some instances of this kind, it will be incredibly hard to overcome the retort that any "revelation" is just a bug.

In a more general sense, people are typically exceedingly bad at parsing text, judging by how often online debates devolve into bickering caused largely by misunderstanding the other party's argument, often to the extent of arguing against people they actually agree with. Tools that help clarify the parsing for people might be interesting in that respect too.


Well, I wouldn't look for idioms, but it would be interesting to throw information such as "Strong's Concordance" into the mix. I've yet to really think of an application for this library, but it would be fun to play around with nonetheless. I would be analyzing the Hebrew / Greek / Syriac texts, looking for omitted or missing verses, etc. If anything, it would make for interesting study.
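
As a rough idea of what the "omitted or missing verses" check could look like, here is a minimal sketch in Python. It assumes each version has already been loaded into a dict keyed by verse reference; the function name, the dict layout, and the placeholder data are all hypothetical and have nothing to do with the library's actual API:

```python
def missing_verses(base, other):
    """Return the verse references found in base but absent from other, in base order."""
    return [ref for ref in base if ref not in other]

# Placeholder data only: references map to dummy strings, not real verse text.
version_a = {"1:1": "text of verse 1", "1:2": "text of verse 2", "1:3": "text of verse 3"}
version_b = {"1:1": "text of verse 1", "1:3": "text of verse 3"}

print(missing_verses(version_a, version_b))
```

This only flags references that are absent entirely. Verses that are present but worded differently would need an actual text comparison on top.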


You might be interested in Andrew Bannister's research on computer analysis of the Quran. He wrote a book on it [1], and there's also a paper that gives a high-level overview [2].

[1] http://www.amazon.com/Oral-Formulaic-Study-Quran-Andrew-Bann...

[2] http://www.academia.edu/9490706/Retelling_the_Tale_A_Compute...



