I used this method too and found several mistakes. Some were pretty gross, where a shift in tone went undetected because the wrong words were chosen. Idioms and fixed phrases are the hardest, I guess. Usually I can get an explanation if I ask four to ten follow-up questions, but it can be really hard to get it right. I hope you are not missing mistakes, and that it is just the way I use it that makes it such a burden for me.
I have the worst time with transcripts and email conversations.
The accuracy of the results might depend on the language you’re reading, the LLM you’re using, the nature of the text, and the amount of context you give to the LLM. When I’ve tested the best models with more or less standard texts, such as excerpts from novels or news articles, in English, Japanese, and Russian, the results have been extremely good. The latest versions of ChatGPT, Claude, and Gemini are able to explain the meanings of words quite well, and they also get the grammar correct. (I say this as a long-time language teacher and lexicographer. I have written and edited many textbooks and dictionaries for learners of English and Japanese; LLMs come close to my ability and maybe exceed it sometimes.)
They are not always so good, however, with more granular aspects of language, particularly the way words are written or pronounced; the trouble the models have counting the letters in the word “strawberry” is well known. I’ve also seen them struggle with the meanings of words and sentences in isolation, as a lack of context can confuse them (as it can confuse people).
In the case of emails or transcripts, the text may also contain mistakes or non-standard language that can trip up the LLMs.
In any case, at least for major languages and non-critical applications, I think LLMs are a great way to understand what is written in another language.
As a learning tool it works OK as long as you stay vigilant. If you lose sight of the original text, though, or become over-reliant on the LLM’s output, you will make mistakes. I agree that as long as you provide plenty of context the mistakes will be few, but in real human communication the context can change fast and span multiple media.