Paralleltext: Learn languages by reading (paralleltext.io)
312 points by monort 4 months ago | 88 comments

This is really cool. As an upper-intermediate Spanish student/speaker, I spend about 1/3 of my language study time reading Spanish books, transcripts, blogs, etc., so this seems like a nice tool.

One thing I'd recommend for anyone hoping to make reading a part of their studies is to get a Kindle with a touch screen (Paperwhite or otherwise). From there, you can install an English <-> $TARGET_LANG dictionary and just touch the words you don't know for instant translations. I think a good target is text where you already know about 90% or more of the words. Otherwise, things get pretty cumbersome (translating too much stuff makes you lose context). You can also dump the words from your Kindle into an Anki deck or other flashcard program for independent study. This has really helped boost my vocabulary.

Lastly, for anyone else learning Spanish, I'd recommend News in Slow Spanish for the lower intermediate level, and then Radio Ambulante (with the transcripts) for the upper intermediate/advanced level. You can do a "send to Kindle" from your browser on any Radio Ambulante transcript on their website, and it renders perfectly for offline reading.

Do you have any sources in particular for a beginner in Spanish? Which blogs do you recommend? Do you have anything on technology? Which news?

I know http://lite.cnn.io/en, which has all the articles in Spanish as well. Really useful, but sometimes a bit too cumbersome to read at my level.



Start with that article. Follow the advice about building core vocab to get a baseline set of words under your belt. I highly recommend Anki, but you can obviously use whatever tool fits your needs, although spaced repetition is something I'd definitely look for.

For the early stages, I'd recommend Pimsleur. It's roughly 30 audio lessons per unit. They're about 30 minutes each. They'll familiarize you with basic travel phrases and help you to start getting a feel for the mechanics of the language.

You can take Pimsleur all the way through the 4-5 units that are provided, but it's pretty narrow (travel/business Spanish), and it's VERY formal, so not a great long-term resource.

I'd supplement Pimsleur with something like Destinos (an online telenovela for learning Spanish) to get some variety into your practice.

Your real goal is getting to the point where you can consume level-appropriate media to make learning less of a chore and more fun.

If you're serious about making progress, find language partners and a teacher (or two) on italki.com. I can't emphasize enough how helpful this has been.

Look for podcasts that are geared towards Spanish learning. Notes in Spanish is pretty good, but it's Castilian, so I avoided it for that reason. Españolistos isn't terrible if you can handle the husband's terrible Spanish accent (he's a gringo from Texas).

Both of those would be in the intermediate phase. The intermediate phase is a LONG journey to get towards advanced, so that's where most folks fall off. You can make some tremendous early gains just by getting some basic vocab and travel phrases, and that kind of learning is pretty intoxicating; however, really learning a language is just a matter of discipline (practicing every day), and grinding it out.

Duolingo has a good introductory podcast for folks that are earlier on in their learning. I recommend duolingo to learn basic vocab, but don't expect any miracles to happen regarding "fluency" with ANY app.

Clozemaster is good once you're further along for drilling the mechanics of sentence structure into your brain via thousands of "reps" (yes, sometimes it really just feels like physical exercise for your brain).

Also, get yourself a decent book that explains grammar, and a book with all the conjugations of the main verbs. Spanishdict.com is an amazing resource for contextual sentences and conjugations, but books are nice for lying in bed and reviewing stuff.

"Easy Spanish - Step by Step" is a pretty good introduction to the basic language mechanics. I read it from cover to cover (and did the exercises), and it was pretty helpful.

Barron's 501 Spanish Verbs seems like it's the standard verb conjugation reference. SO much of learning the language is just knowing how to conjugate properly. The explanations of the tenses at the beginning of the Barron's book have been indispensable in my understanding of how each tense works.

I wish you luck. Feel free to email me if you want any additional pointers (tinymountain [at] gmail).

http://lite.cnn.io/es times out for me.

The previous link ended in ‘en’ not ‘es’.

Edit: sorry for not checking the site. I now see the site links to the es ending url and that times out for me as well.

I've found the French and Spanish short stories for beginners with downloadable audio to be a really useful tool. They're pretty mundane stories about everyday life aimed at adult learners.

Really cool concept, love the split view. The voiceover feature is nice, but I think an inline dictionary would be even more useful - maybe something along the lines of macOS' built-in lookup (https://i.imgur.com/qGTQ80a.png)

> I think an inline dictionary would be even more useful

Yes this. It would make a huge difference for me at my level. I'm finding that I lose my focus on specific words. I can gather the general meaning from the sentence and/or paragraph but...

Translating the entire paragraph is too much of a crutch and forces me back into thinking in English. If I can get past the single word(s) with a click, I can keep my thoughts entirely in the second language. I suppose an extension could probably fix this...

Readlang is the extension you're looking for: https://readlang.com/

I've given it a quick try, and it seems to work fine with this site

Totally agree, we'll work on this feature next.

Ironically enough, macOS's dictionary is missing the correct (in this context) translation of réfléchir in this example :)

I made something like that for myself some time ago: https://aprelo.wearekiss.com

Neat! But not yet for beginners – my choices for eng→rus are Tolstoy and Tolstoy :-S

Also, the "I speak" list needs to be checkboxes, not radio (and remember the choices).

This is what I've been hoping to try to read eventually https://web.archive.org/web/20060629211922/http://www.shnare...

The book is the basis of the film Stalker, which is a must-see

Lingq.com is a mature platform to learn languages by reading and listening. It's the only app I'm not abandoning after a few days, because I can read stuff that's really interesting to me (I can import articles and entire ebooks). This video explains how it works: https://www.youtube.com/watch?v=QeqFrO1sbTM

'(works best with Google Chrome)'

Wtf. I thought we were beyond that now.

A website will always work best with the browser it was created with.

Feel free to email them and offer them money to improve their free service.


Seems like the paragraph matching is still pretty buggy.

For example I chose French as the language to get better at and selected "Tour du monde en 80 jours" as the text. The very first paragraph of the French text is

> En l'année 1872, la maison portant le numéro 7 de Saville-row, Burlington Gardens -- maison dans laquelle Sheridan mourut en 1814 --, était habitée par Phileas Fogg, esq., l'un des membres les plus singuliers et les plus remarqués du Reform-Club de Londres, bien qu'il semblât prendre à tâche de ne rien faire qui pût attirer l'attention.

and the English translation is

> Mr. Phileas Fogg lived, in 1872, at No. 7, Saville Row, Burlington Gardens, the house in which Sheridan died in 1814. He was one of the most noticeable members of the Reform Club, though he seemed always to avoid attracting attention; an enigmatical personage, about whom little was known, except that he was a polished man of the world.

Notice how there's an extra sentence at the end about him having an enigmatic personage not present in the French version. Indeed that's in the next paragraph. And the matching goes basically out of whack. The second paragraph of the English is

> People said that he resembled Byron--at least that his head was Byronic; but he was a bearded, tranquil Byron, who might live on a thousand years without growing old.

But the third paragraph of the French is

> On disait qu'il ressemblait à Byron -- par la tête, car il était irréprochable quant aux pieds --, mais un Byron à moustaches et à favoris, un Byron impassible, qui aurait vécu mille ans sans vieillir. Anglais, à coup sûr, Phileas Fogg n'était peut-être pas Londonner.

They don't say where they took the translations from or who created them. Given your examples, I assume the whole thing is based on Gutenberg or something similar, not on machine translations. If humans made these translations, deviations like the ones you describe are to be expected. Literary translations are often some kind of re-creation or interpretation of the original work.

Yes, the problem is most translations aren’t literal one-to-one translations. Literal translations are usually hard to read so some translators use “dynamic equivalency” while others paraphrase heavily. Unfortunately this can make machine linguistic matching difficult and unreliable. For instance, there are two translations of the Nordic classic Kristin Lavransdatter which both read differently and there are people who will argue passionately about which translation is best.

Statistical machine translation uses curated parallel texts for training, but they tend to match with multiple corpora so the translation is some sort of average I believe. I wonder if matching with just one translation might produce less reliable results?

Yeah, we took those translations from a pre-matched corpus of books. We did, however, later develop a system for automatic matching of translations, but unfortunately didn't get to use it.

>Notice how there's an extra sentence at the end about him having an enigmatic personage not present in the French version.

There's no real "paragraph matching". They just took old texts translated in 2 languages and assumed they match paragraph for paragraph.

We do have a tool that does real paragraph matching based on the Gale–Church alignment algorithm and offline dictionaries. On top of that we have an additional manual process to make sure that the alignment is correct. However many of the books were pre matched and we didn't align them using our tool.
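For readers unfamiliar with the idea, here is a minimal sketch of length-based alignment in the spirit of Gale–Church. It is not the site's tool: it ignores the offline dictionaries and the manual review mentioned above, the function names (`match_cost`, `align`) are made up for illustration, and the priors and variance are illustrative values close to those commonly quoted from the 1993 paper. It only shows how a dynamic program over paragraph lengths can decide that two paragraphs on one side belong to one paragraph on the other.

```python
import math

# Illustrative priors for each alignment type (source paragraphs, target paragraphs).
# These are assumptions close to commonly quoted Gale-Church values.
PRIORS = {(1, 1): 0.89, (1, 0): 0.00495, (0, 1): 0.00495,
          (2, 1): 0.0445, (1, 2): 0.0445, (2, 2): 0.011}
C = 1.0    # assumed expected target characters per source character
S2 = 6.8   # assumed variance of that ratio


def _norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def match_cost(len_src, len_tgt, prior):
    """Negative log-probability that two character lengths belong together."""
    mean = (len_src + len_tgt / C) / 2.0
    if mean == 0:
        return -math.log(prior)
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    p_delta = max(2.0 * (1.0 - _norm_cdf(abs(delta))), 1e-12)
    return -math.log(p_delta) - math.log(prior)


def align(src, tgt):
    """Align two lists of paragraphs; return a list of (src_indices, tgt_indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_src = sum(len(p) for p in src[i:ni])
                len_tgt = sum(len(p) for p in tgt[j:nj])
                c = cost[i][j] + match_cost(len_src, len_tgt, prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the alignment.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs


# Synthetic example: two short source paragraphs merged into one target paragraph.
src = ["a" * 100, "b" * 60, "c" * 150]
tgt = ["x" * 160, "y" * 150]
alignment = align(src, tgt)
```

With these synthetic lengths, the first two source paragraphs should be grouped against the first target paragraph, which is the kind of drift reported for the Verne text. Real texts are noisier, which is presumably why a dictionary pass and manual checking are layered on top.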

It would work better on poetry, where skilled translators are themselves poets and do seek to match up the verses, at least in the parallel texts I own.

Same for the first paragraph of the French/English Robinson Crusoe


Really interesting idea but a bit disappointing first experience.

I found the same issue with the same book for English/Spanish.

Connecting Paralleltext to OPUS [0] might be nice. OPUS is a collection of translated texts from the web.

[0] http://opus.nlpl.eu/

Sample: http://opus.nlpl.eu/EUbookshop/de-en_sample.html

Interesting. There's a longstanding dead-tree series with the same idea, published by Penguin: https://www.penguinrandomhouse.com/series/BMH/penguin-parall... (Possibly other publishers, too, but this is the one I know of.)

You get the same collection of short stories in two languages (one always being English) on facing pages. One important difference these books have with that website is that (I believe) in all cases, it's the English that's the translation. This is, I think, best for a native English speaker who's learning the other language. The non-English is not simply correct, but actually idiomatic. Any deficiencies in the English are easy to gloss over/ignore.

I started out restricting myself to original works with English translations, but it turned out the most important quality was that it be fun to read and at the right level for me. This was easier to find with translations of English books for children/teens that I knew were fun, like Harry Potter and Roald Dahl.

Someone further along can shift their mix back.

So if my language (Ukrainian in my case) is not in the "I speak" list, there's nothing I can do except for closing the browser tab. It would be much better if I could contribute something instead, or at least indicate interest in my language if the language is missing from the list.

Oh the poor people who try to learn German by reading Kafka. I'm not sure literature in general is suitable for learning a language that you don't already have a tight grip on. Maybe children's books, but even then it's too far removed from spoken language for some languages.

I've found the concept of extensive reading with a 98% threshold (that is, you must understand 98% of the words on a page) to be pretty effective in picking up foreign languages. Less than that and I give up. You're right, starting to learn German through Kafka is going to lead to bad times. I tried that with Mandarin and a translation of an Orhan Pamuk novel and gave up some twenty pages in after a brutal slog.
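If you keep a list of words you already know (say, an export from a flashcard deck), the threshold can be checked roughly before committing to a book. A minimal sketch, with hypothetical function names and a deliberately naive tokenizer: it counts every inflected form as a separate word and does no lemmatization, so it will tend to underestimate coverage.

```python
import re


def known_word_coverage(text, known_words):
    """Return the fraction of word tokens in `text` that appear in `known_words`."""
    # Runs of Unicode letters only; digits, underscores and punctuation are skipped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = {w.lower() for w in known_words}
    hits = sum(1 for t in tokens if t in known)
    return hits / len(tokens)


def is_comfortable(text, known_words, threshold=0.98):
    """True if coverage meets the threshold (0.98 here, as suggested above)."""
    return known_word_coverage(text, known_words) >= threshold


sample = "El gato come pescado"
coverage = known_word_coverage(sample, {"el", "gato", "come"})
```

For the sample sentence, three of four tokens are known, so the coverage is 0.75 and the page would fall well short of the 98% bar.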

Having some simpler and easier selections à la Lingua Latina (https://lingualatina.dk/wp/) would definitely help. If you haven't heard of it, that is a book for learning Latin written entirely in Latin. It starts off with very simple grammar and vocabulary and adds more each chapter. You can find digital copies pretty easily.

I like the concept, as I am actively learning Dutch right now, but I find it hard to copy a word without accidentally clicking on the sentence and playing it.

I want to copy words to keep my own word list and to translate the word.

I looked at Robinson Crusoe, and the Dutch looks really old. The spelling is incorrect (according to modern standards), and you'll talk a bit like a Dutch Shakespeare if you learn it this way.

Well as someone else commented, the source material is likely from Project Gutenberg, so a long out of copyright translation is perhaps expected.

On the other hand, Cervantes and Shakespeare were contemporaries, so I think it enhances reading Don Quixote for the translation to use early modern English - e.g. "Idle reader: thou mayest believe me without any oath".

Yeah...binding double-click to select the word only would be a good improvement.

Don't fiddle with text selection, at all.

The use case here might be exceptional, in contrast to contextual ad overlays.

This is very neat. I’d love to see Hebrew and Mandarin added, though (each is likely to be challenging in its own way given the way sentences are matched).

Agree on Chinese. It makes sense to add Japanese and Korean as well.

I tried making a beta version of Pingtype (Chinese/English parallel) for Japanese, but haven't figured out word spacing yet. Could you help?

The most popular tokenizer for Japanese is probably MeCab. It can also give you readings, but in cases where the reading depends on the meaning of the sentence, it's about as likely to get it wrong as to get right. My current approach is to also use Kuromoji and compare the output so that I can at least notice which parts might be inaccurate.
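The comparison step can be sketched without tying it to either library. The snippet below assumes you already have two token lists for the same sentence (from MeCab, Kuromoji, or anything else) and flags the character spans where the word boundaries disagree. The function names are hypothetical and the two token lists in the example are hand-written for illustration, not real output from either tokenizer. The same idea would extend to comparing readings.

```python
def boundaries(tokens):
    """Return the set of character offsets at which a token ends."""
    offsets, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        offsets.add(pos)
    return offsets


def disagreements(tokens_a, tokens_b):
    """Return (start, end, text) spans where two segmentations differ."""
    text = "".join(tokens_a)
    if text != "".join(tokens_b):
        raise ValueError("tokenizations cover different text")
    bounds_a, bounds_b = boundaries(tokens_a), boundaries(tokens_b)
    shared = sorted(bounds_a & bounds_b)
    either = bounds_a | bounds_b
    spans, start = [], 0
    for end in shared:
        # A chunk between two shared boundaries is suspicious if either
        # tokenizer placed an extra boundary strictly inside it.
        if any(start < b < end for b in either):
            spans.append((start, end, text[start:end]))
        start = end
    return spans


# Hand-written segmentations of the same sentence, for illustration only.
tokens_a = ["東京", "都", "に", "住む"]
tokens_b = ["東京都", "に", "住む"]
suspect = disagreements(tokens_a, tokens_b)
```

Here the only flagged span is the first three characters, which one segmentation splits and the other keeps whole. Spans like that are the ones worth checking by hand.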

I used to run Google Translate on news sites and hover over sentences to see the translations, but this UX is better.

Are the voices backwards for anyone else? English sentences are read (unintelligibly) in a French voice and vice versa. http://paralleltext.io/read/?b=f727d42f-7649-49b0-b583-99bb9...

Seems like you found a bug; it works as expected for the other language combinations. We'll fix it a.s.a.p. Thanks!

On Firefox (locale is English): I speak Whatever, I want to read Whatever else. Choose any book. Voice is English. The only exception is if I choose I want to read Spanish, but this is because I have installed a particular Spanish voice in my system, and that's the one that's used.

On Vivaldi (browser's locale is Spanish): Anything is read with a Spanish voice, unless I choose I want to read English, then an English voice is used.

Same with this: http://paralleltext.io/read/?b=02b82ad6-29ed-47dd-98af-4c4ff...

Unintelligible German spoken by an English voice. My German-speaking friend thought it was trying to speak Chinese!

The German text here is also read by an English voice - http://paralleltext.io/read/?b=de-en-the-picture-of-dorian-g...

Also, in the German text the chapter markers ("Erstes Kapitel", "Zweites Kapitel") have been pulled into the paragraph following them, but they're missing completely in the English text.

The Dutch Version of Tom Sawyer is read by an English voice.

Yeah, I had the same issue. If my device locale is English, the voice reads it like an apathetic 10th grader who refuses to try to accentuate properly while reading French aloud in class. On the bright side, it was a trip down memory lane to high school.

It's not even backwards. The voice doesn't match the languages chosen. On a machine with a Danish locale it will read the French text with a Danish voice. (I chose "learn French/know English")

There's definitely something strange with that text.

Maybe it's individual metadata for Sherlock Holmes - when I tried Spanish to English on a different book it worked OK.

I did something similar for Chinese! Mandarin (Traditional & Simplified), Taiwanese, Cantonese, and Hakka.


Sorry that the user interface is so bad, please just try clicking some items in the heading row to see what's there.

I'm reminded of a book on topology I read which suggested learning how to read mathematical text in French with a particular book by -- I think it was Bourbaki, "using the English translation as a pony". I've never tried it, but supposedly you could get pretty far this way.

I like the idea, but the (very few) available books require a pretty advanced level even in my own language. I would use this for sure if the available books were at more of a children's reading level, so I could learn words and sentences that I can actually use and practice in daily life.

Because of legal reasons we can only have books from the public domain. If you know some that are free, just point me to them and I'll match them for you. I need them to be in .epub format for both languages.

I took a quick look at English / Swedish, and noticed that the Swedish version of Three Men in a Boat is badly encoded: the (vital) Swedish letter å has been consistently replaced with a. On the other hand, ä and ö seem to be present as expected.

So (based on a sample size of one), I'm afraid it doesn't look like the translations are necessarily reliable.
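A cheap sanity check for this kind of damage, assuming the text is long enough that every letter of the alphabet should show up at least once, is to look for expected letters that never occur. This is only a heuristic sketch with a made-up function name. A short passage can legitimately lack a letter, and it says nothing about where the letters went missing.

```python
import unicodedata
from collections import Counter


def missing_letters(text, expected="åäö"):
    """Return the expected letters that never occur in `text`."""
    # Normalize so that 'a' + combining ring is counted as the single letter 'å'.
    normalized = unicodedata.normalize("NFC", text).lower()
    counts = Counter(normalized)
    return [ch for ch in expected if counts[ch] == 0]


# Made-up sample with 'å' flattened to 'a' but 'ä' and 'ö' intact.
damaged = "manga äventyr pa ön"
report = missing_letters(damaged)
```

On the made-up sample, only å is reported as missing, which matches the pattern described above: ä and ö survive while å has been flattened to a.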

I've previously struggled with the encoding of books on Project Gutenberg. In theory each text file has a header that specifies the encoding and other data about the book like author and title. In practice the header format varies unpredictably (likely it was written manually and not meant for machine consumption) and even if you can parse the encoding value, you might have one that says ASCII but is actually some Windows code page, or UTF-8, or something completely different. And that encoding is also used in other parts of the header, so it can happen that you fail to parse the header because you don't know which encoding to use. One book actually used different encodings in different parts.

After collecting a messy tangle of special cases over several days, I threw in the towel and just used BeautifulSoup's UnicodeDammit to brute-force a working encoding.
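For comparison, a much cruder standard-library fallback looks like the sketch below: try a short list of candidate encodings in order and keep the first one that decodes without error. This is not how UnicodeDammit works internally. The candidate list and the function name are assumptions for illustration, and a decode that succeeds is not necessarily the right one.

```python
def decode_with_fallback(raw, candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Decode bytes with the first candidate encoding that succeeds strictly.

    Returns a (text, encoding_name) tuple.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the input")


# The same letter stored two different ways.
utf8_text, utf8_enc = decode_with_fallback("å".encode("utf-8"))
legacy_text, legacy_enc = decode_with_fallback("å".encode("latin-1"))
```

Strict UTF-8 goes first because it rejects most non-UTF-8 input, while `latin-1` goes last because it accepts any byte sequence and so only serves as a last resort. Both example calls recover the letter å, one via UTF-8 and one via the Windows code page.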

Maybe the Swedish book didn't indicate a vs å in the first place, though.

Not sure this one came from Gutenberg; at least, I can't seem to find a Swedish version of it there. But anyhow, either the source is worthless, or some conversion process somewhere was bad. A vs Å is not a minor or optional issue in Swedish; å, ä and ö are separate letters of the alphabet in their own right, and "omitting accents" isn't a thing. It's just wrong.

(E.g. see the Swedish alphabet at http://omniglot.com/writing/swedish.htm)

> I threw in the towel and just used BeautifulSoup's UnicodeDammit to brute-force a working encoding.

For a limited definition of "working", at times! "A working encoding" that shows the wrong letters isn't "working" in a very useful sense.

I had the same thought. Some text is hard to read as well. For example:

-- Harris sade, att han drabbades av sa utomordentligt starka yrselanfall emellanat, att han knappt visste vad han gjorde; och da sade George att han led av starka yrselanfall, och knappt visste vad han gjorde.

This made very little sense to me, until I googled it and found another source where the second sentence italicized some words in the second half, making it at least make some sense. (https://sv.wikisource.org/wiki/Tre_m%C3%A4n_i_en_b%C3%A5t._K...)

Nice idea. But it would be better if there was an option to select lines instead of paragraphs. A beginner only has the processing power for a few words, and paragraphs can sometimes be very long.

I firmly believe it's important to learn a spoken language before learning the written language, assuming your goal is to become fluent.

Learning through translation is also an anti-pattern. Best to learn the basics of the target language then bootstrap your understanding of the more complicated bits by breaking them down to the basics.

I can understand the time benefit to learning through reading and translation, but I can't help but think it's a path to mediocre reading skills and poor speaking skills at best.

Sometimes you might only learn reading/writing and have no need to speak that language, so, I think it can be OK to only learn written language.

(And some kinds of languages are not speech anyways)

For audio practice: There is also a parallel text for YouTube I found helpful: https://www.fluentu.com

Also I found https://www.lingq.com pretty solid for learning vocab while reading.

Split view, Spanish/English, the Franz Kafka story where the son turns into a bug, page 32: the text ends, or the English text is longer than usual on the page:


This is a pretty cool way to learn. I learned English by memorizing Bambi as a kid.

My only complaint is that the Spanish AI lady seems to have suffered a stroke, see below:


I liked Readlang's approach where you click words or expressions that you don't understand and you get the translation. It's also added to your personal dictionary so you can create cards for spaced repetition training.

Cool idea! I love anything that helps to learn/practice foreign languages. I only wish it would work better on mobile Safari (swipe is clunky, works only sometimes, and everything I try to do activates the TTS).

I tried out the English->Hungarian and it's pretty hilarious, because the Hungarian version sounds so German that even I can't understand what it says without the text. :D

Does the text-to-speech feature not work on desktops? I'm using Chromium on Linux, and I don't hear anything when I click on sentences. TTS works on Android though.

This is cool, but please improve your text to speech. Check out things like http://mary.dfki.de

I'd completely disable the audio -- for Spanish, at least, it's horrible. I like the concept though and think being able to switch to the language is nice.

Great idea!

But as a native Spanish speaker... there's no Spanish as a source language, yet many texts with Spanish as the language to learn?

Shouldn't the same texts be available the other way around?

I'd love to have something like this but with texts available in varied areas, eg: programming (such as Haskell learning).

Well done! Would love to have audio for pronunciation though.

Maybe the audio can be initiated with a quick highlight on the text or sentence?

This is really nice! It would be really handy if you can shortlist the words you want to revisit by a click.

Great idea, but unfortunately the English->Polish book is wrong on most paragraphs.

Any plan to include Semitic languages like Arabic and Hebrew? That would be great.

Swipe on paragraph to see translation doesn’t work for me on Safari :(

Nice idea, but it seems like some of the translations are directly from Google Translate, and thus in parts very bad and misleading.

Neat idea. I would add the ability to build your own dictionary.

Good idea, thanks!

I'd love it if you had something in Turkish.

Great job!

Similar idea to beelingual app.

This is super cool! Congrats!
