
Turkish Grammar by Predicate Logic - fatiherikli
https://pypi.org/project/kefir/
======
maeln
Turkish is an amazing language. I have been learning it since my wife is
Turkish, and it has been a pleasure. Everything is regular and well defined;
even the way a word is constructed is defined by rules. Compared to my very
irregular native language (French), it is a real pleasure.

I suppose this makes it a much simpler language to do NLP on than a lot of
other languages.
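
As a small illustration of that regularity, here is a minimal Python sketch of
one such rule: two-way vowel harmony for the plural suffix (-lar after a back
vowel, -ler after a front vowel). The name `pluralize` is made up for this
example and is not taken from kefir; the sketch assumes lowercase input and
ignores loanword exceptions such as saat → saatler.

```python
# Two-way vowel harmony for the plural suffix.
# Assumption: input is already lowercase Turkish text.
# Loanword exceptions (e.g. "saat" -> "saatler") are NOT handled here.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def pluralize(word: str) -> str:
    """Append -lar or -ler depending on the last vowel of the word."""
    for ch in reversed(word):
        if ch in BACK_VOWELS:
            return word + "lar"
        if ch in FRONT_VOWELS:
            return word + "ler"
    raise ValueError(f"no vowel found in {word!r}")


for w in ["marul", "çiçek", "göz", "uçak"]:
    print(w, "->", pluralize(w))
```

The last vowel alone decides the suffix, which is why a single backward scan
is enough for this particular rule.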

~~~
tehlike
The good thing about the Turkish language, besides its grammatical structure,
is the way you pronounce it. I am a very fluent English speaker (and have been
living in the US for 8+ years now) and occasionally I am amazed at how some
things are pronounced compared to the roots of the words (though when you get
the hang of it, you eventually develop an intuition as to why).

With Turkish, it's very trivial.

~~~
egeozcan
Well there are also a lot of exceptions in Turkish. Like "e". "Mert" and
"Meyhane" have different "e"s. For many words, you just have to know how to
pronounce them.

~~~
delidumrul
Actually, this is not an exception in Turkish. It is because "meyhane" is a
foreign word (adapted from Persian). In any case, no one will expect someone
to pronounce these 'e's distinguishably.

~~~
patates
First of all, nobody will understand if you pronounce it "meːɐ̯t" (as in
"mehr" in German + "t").

Mert is actually also a foreign word. Well, most (I'd guess 90%) of the words
starting with "me" are of Arabic or Persian origin.

But the difference is there in Turkish words as well: geri, gerzek (which is,
interestingly, a shortening of "geri zeka") and gevrek.

There are many "e"s in Turkish, and the differences among them are very
sparsely documented.

Don't even get me started on the "ğ" (which is normally supposed to just
lengthen the preceding vowel but has taken on many other roles).

------
mda
Not the same technique, but generating words is possible with the Zemberek
library as well; see the bottom example in the morphology section:

[https://github.com/ahmetaa/zemberek-nlp/tree/master/morpholo...](https://github.com/ahmetaa/zemberek-nlp/tree/master/morphology)

------
PeterisP
GF
([https://www.grammaticalframework.org/](https://www.grammaticalframework.org/))
might be relevant to this, it seems to have a similar perspective but is an
established multilingual project, though with limited support for Turkish
(yet).

------
unhammer
If you find this interesting, there are a _lot_ of resources for the Turkic
languages in the Apertium platform:
[http://wiki.apertium.org/wiki/Turkic_languages](http://wiki.apertium.org/wiki/Turkic_languages)

These include monolingual lexicons and bilingual dictionaries that can be
compiled to finite state transducers, as well as disambiguators/parsers (in
vislcg3 Constraint Grammar format) and annotated corpora. For some pairs in
the language group, there are also full machine translation systems and
spellcheckers (for LibreOffice, Firefox and Microsoft Office).

Much of the data is already available in Debian/Ubuntu/Fedora/OpenSUSE, if you
don't need the newest git:

    
    
        $ sudo apt install apertium-tur
        $ apertium -l |grep tur
          tur-disam
          tur-gener
          tur-lexc
          tur-morph
          tur-tagger
          tur-twol
        $ echo 'aynı maruldaydık' | apertium tur-disam
        "<aynı>"
        	"aynı" det dem SELECT:144
        ;	"aynı" adj SELECT:144
        ;	"aynı" adj subst nom SELECT:144
        ;	"aynı" adj SELECT:144
        ;		"i" cop p3 sg
        ;	"aynı" adj SELECT:144
        ;		"i" cop aor p3 sg
        ;	"aynı" adj subst nom SELECT:144
        ;		"i" cop p3 sg
        ;	"aynı" adj subst nom SELECT:144
        ;		"i" cop aor p3 sg
        "<maruldaydık>"
        	"marul" n loc
        		"i" cop ifi p1 pl
        "<.>"
        	"." sent
        
        
    

Or use the stuff from beta.apertium.org for a quick test; click "analyse" to
go from form to analysis:
[http://beta.apertium.org/index.html?choice=tur&qA=ayn%C4%B1%...](http://beta.apertium.org/index.html?choice=tur&qA=ayn%C4%B1%20maruldayd%C4%B1k#analyzation)

Click "generate" to go from analysis to form:
[http://beta.apertium.org/index.html?choice=tur&qG=%5Eayn%C4%...](http://beta.apertium.org/index.html?choice=tur&qG=%5Eayn%C4%B1%3Cadj%3E%3Csubst%3E%3Cnom%3E%24%20%5Emarul%3Cn%3E%3Cloc%3E%2Bi%3Ccop%3E%3Cifi%3E%3Cp1%3E%3Cpl%3E%24#generation)
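
The `qG` parameter in that link is just a percent-encoded analysis string. A
rough Python sketch of decoding it and splitting it into lemmas and tags
follows. `parse_units` is a hypothetical helper, not part of Apertium, and it
ignores the escaping and ambiguity markers that the real stream format also
allows.

```python
import re
from urllib.parse import unquote

# The qG parameter from the "generate" link, still percent-encoded.
encoded = (
    "%5Eayn%C4%B1%3Cadj%3E%3Csubst%3E%3Cnom%3E%24%20"
    "%5Emarul%3Cn%3E%3Cloc%3E%2Bi%3Ccop%3E%3Cifi%3E%3Cp1%3E%3Cpl%3E%24"
)


def parse_units(stream: str):
    """Split '^...$' units into lists of (lemma, [tags]) parts.

    Simplified: no handling of backslash escapes or '/' alternatives.
    """
    units = []
    for body in re.findall(r"\^(.*?)\$", stream):
        parts = []
        for part in body.split("+"):  # '+' joins sub-readings, e.g. noun + copula
            lemma = part.split("<", 1)[0]
            tags = re.findall(r"<([^>]+)>", part)
            parts.append((lemma, tags))
        units.append(parts)
    return units


decoded = unquote(encoded)
print(decoded)
for unit in parse_units(decoded):
    print(unit)
```

The second unit comes out as two parts, the noun "marul" in the locative plus
the copula "i", which matches the nested reading in the `tur-disam` output
above.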

There's no "predicate logic" yet, but the Constraint Grammars already used for
disambiguation in Apertium are very well suited for syntax and dependency
parsing (there are some quite advanced ones for the Saami languages that could
be used as a basis).

------
nanis
I am also going to point out a subtle mistake committed by this specific
library. The examples include this:

    
    
        hanımelinin çiçeği (flower of a plant called hanımeli)
    

That's subtly wrong. The flower is called _hanımeli çiçeği_ just like
_portakal çiçeği_ etc.

I fear that the inability of computers to deal with Turkish will end up
morphing Turkish into something computers can deal with instead of improving
computers.

Turks have done great damage to Turkish since the junta. Just a simple
example: Almost everyone says "Deniz Sokak" instead of the correct "Deniz
Sokağı" while somehow getting things like "Anafartalar Caddesi", "Çanakkale
Apartmanı" etc right.

Adding computers to the mix is not going to improve things.

------
krmbzds
Fantastic work! I hope to see it become feature complete with the remaining
grammatical cases implemented. It would be great to have an active and mature
Turkish NLP library. Much needed work! ( _Preparing a pull request ATM._ )

~~~
mda
As a note, Zemberek is still actively maintained and mature imo. But the more
the merrier of course.

------
mavdi
Fascinating stuff. I think I should study the code to understand the language
better. I speak both Turkish and Azeri and most of the time I'm unable to
explain how the grammar works.

------
nanis
As I pointed out before[1], Turkish and computers don't really get along. The
beauty of Turkish lies in its irregularity, not the rules you learn. That
irregularity is amplified by the shrinking vocabulary in use owing to the
disappearance of many Arabic and Persian words which people used to use as
recently as the 30s, 40s, and the 50s.

Understanding a Turkish speaker requires a grasp of the entire context at all
times. " _Matematikten çakmak_ " can mean to be good at math or to flunk math.

Another example which I noticed because it involved my name is " _Uçak düşsün
ama sadece Sinan ölsün!_ "[2] which Google Translate translates as " _The
plane is your dream, but only Sinan!_ "[3].

In fact, it means " _I wish the plane crashes and only Sinan dies_ ". Of
course, " _düş_ " as a noun is " _dream_ " but "düşsün" is derived from the
verb " _düşmek_ " which means "to fall" but when it is coupled with a plane
means " _crash_ ". Interestingly, there is not a good general purpose
translation of "plane crash" to Turkish. Sure, you can get away with " _uçak
kazası_ " in most cases, but not all crashes are accidents. In fact, something
as simple as " _five perished in the crash_ " is perilous to translate. Most
will translate that as " _kazada beş kişi hayatını kaybetti_ " even in cases
where the crash was not an accident. To deal with similar things, people end
up over-using the word " _olay_ " (event).

Now, ponder the difference between " _Sinan bir düş_ " and " _Sinan bir düş_
".

Turkish and machines don't mix.

Now, let's briefly consider the claim that Turkish pronunciation is easy
compared to English. It is not. I am not even going to get into regional
differences, but leave this one example which became even harder after the
military junta abolished the circumflex. List the meanings of the sentence "
_Karı severim_ ". Which one is pronounced differently? Note that at least one
of those meanings can get you in trouble in polite company.

Edumacated Turkish native speakers always had a tendency to resort to English
or French words even when completely unnecessary and I've been observing in
the press and social media many examples of automated translation errors
making it into daily use.

PS: Some related rants[4,5].

[1]:
[https://news.ycombinator.com/item?id=17737333](https://news.ycombinator.com/item?id=17737333)

[2]: [http://www.hurriyet.com.tr/kelebek/magazin/ucak-dussun-ama-s...](http://www.hurriyet.com.tr/kelebek/magazin/ucak-dussun-ama-sadece-sinan-olsun-40725251)

[3]:
[https://translate.google.com/#auto/en/U%C3%A7ak%20d%C3%BC%C5...](https://translate.google.com/#auto/en/U%C3%A7ak%20d%C3%BC%C5%9Fs%C3%BCn%20ama%20sadece%20Sinan%20%C3%B6ls%C3%BCn)!

[4]: [https://www.nu42.com/2013/04/translation-of-programming-term...](https://www.nu42.com/2013/04/translation-of-programming-terms-and.html)

[5]: [https://www.nu42.com/2014/08/replacing-hash-keys-with-values...](https://www.nu42.com/2014/08/replacing-hash-keys-with-values-does.html)

~~~
delidumrul
Many mistranslations may be a result of the lack of work/research on the
Turkish language. I don't agree that Turkish does not get along well with
computers. There is simply no mature work on this yet. If a language is
understandable, analyzable, executable, expressible by humans, computers most
probably can process it (this is a claim, not a fact).

~~~
nanis
> _If a language is understandable, analyzable, executable, expressible by
> humans, computers most probably can process it (this is a claim, not a
> fact)._

In English, a sentence[1] is " _a set of words that is complete in itself,
typically containing a subject and predicate, conveying a statement, question,
exclamation, or command, ..._ ".

In Turkish, sometimes you need a paragraph, sometimes you need a page, and
sometimes you need the whole story to provide " _a set of words that is
complete in itself_ ".

Try these for size[2,3].

I especially love the translations of the word " _tane_ ". Or " _say_ "
here[4].

I am going to stop before I keel over. Sure, you can try to codify a rule for
each and every " _set of words that is complete in itself_ ", but I doubt that
set is finite.

The way things are going, I fully expect to see people being required to use
Turkish in a way computers can understand before computers can deal with
Turkish.

[1]:
[https://www.google.com/search?q=define%3Asentence](https://www.google.com/search?q=define%3Asentence)

[2]:
[https://translate.google.com/#tr/en/%C3%A7akmak%20%C3%A7akma...](https://translate.google.com/#tr/en/%C3%A7akmak%20%C3%A7akmak%20g%C3%B6zleri)

[3]:
[https://translate.google.com/#tr/en/%C3%A7ak%20bir%20tane](https://translate.google.com/#tr/en/%C3%A7ak%20bir%20tane)

[4]:
[https://translate.google.com/#tr/en/say%20ba%C5%9Ftan](https://translate.google.com/#tr/en/say%20ba%C5%9Ftan)

~~~
egiboy
English is not immune to ambiguities and thus not superior to Turkish in this
regard:

[https://translate.google.com/#en/tr/fruit%20flies%20like%20a...](https://translate.google.com/#en/tr/fruit%20flies%20like%20a%20banana)

~~~
delidumrul
One approach to testing the effectiveness of a translator is translating a
passage from L1 to L2 and then back again. If you repeat this many times and
the final version of the passage in L1 still carries the meaning of the
original passage, then it is a stable translator. They do this test for
English-German at the Linguee translator. Their claim is that it is more
successful than Google Translate in these terms. This suggests that Google
Translate does not give strong results for English-German translation.
Considering that German is a much closer language to English than Turkish is,
I don't expect much from Google Translate. Thus, I don't see it as proof of
any argument.
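
The round-trip test described above can be sketched in a few lines of Python.
Everything here is an assumption for illustration: `round_trip` and
`is_stable` are made-up names, the two lookup tables are toy stand-ins for a
real translator (they mimic the "Dolar düşsün" behaviour reported elsewhere in
this thread), and exact string equality is only a crude proxy for "gives the
same meaning", which really needs human judgement.

```python
from typing import Callable

Translator = Callable[[str], str]


def round_trip(text: str, forward: Translator, backward: Translator,
               rounds: int = 3) -> list[str]:
    """Return the L1 text after each L1 -> L2 -> L1 round, original first."""
    history = [text]
    for _ in range(rounds):
        text = backward(forward(text))
        history.append(text)
    return history


def is_stable(history: list[str]) -> bool:
    """Crude criterion: the final L1 text equals the original string."""
    return history[-1] == history[0]


# Toy stand-ins for a real translator (hypothetical data, not an API).
TR_EN = {"dolar düşsün": "dollar dream"}
EN_TR = {"dollar dream": "dolar rüyası"}


def tr_to_en(s: str) -> str:
    return TR_EN.get(s, s)  # unknown input passes through unchanged


def en_to_tr(s: str) -> str:
    return EN_TR.get(s, s)


history = round_trip("dolar düşsün", tr_to_en, en_to_tr)
print(history)
print(is_stable(history))
```

Keeping the whole history rather than only the final string makes it easy to
see at which round the meaning drifted.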

~~~
nanis
On Google translate, Dolar düşsün → Dollar dream → Dolar rüyası.

