
Paralleltext: Learn languages by reading - monort
http://paralleltext.io/
======
tmountain
This is really cool. As an upper-intermediate Spanish student/speaker, I spend
about 1/3 of my language study time reading Spanish books, transcripts, blogs,
etc, so this seems like a nice tool. One thing I'd recommend for anyone hoping
to make reading a part of their studies would be to get a kindle with a touch
screen (paperwhite or otherwise). From there, you can install an English <->
$TARGET_LANG dictionary, and just touch the words that you don't know for
instant translations. I think a good target is text where you already know
about 90% or more of the words. Otherwise, things get pretty cumbersome
(translating too much stuff makes you lose context). You can also dump the
words from your Kindle into an Anki deck or other flashcard program for
independent study. This has really helped boost my vocabulary. Lastly, for
anyone else learning Spanish, I'd recommend News in Slow Spanish for lower
intermediate level, and then I'd suggest Radio Ambulante (with the
transcripts) for upper intermediate/advanced level. You can do a "send to
kindle" from your browser on any Radio Ambulante transcript on their website,
and it renders perfectly for offline reading.
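
For the Kindle-to-Anki step, here is a minimal sketch. It assumes the Kindle keeps its lookups in a SQLite file called `vocab.db` with `WORDS` and `LOOKUPS` tables; the table and column names below are assumptions about that file, so check them against your own device first. The function name `export_kindle_lookups` is made up for this example. It writes a tab-separated file, which Anki can import as plain text.

```python
import csv
import sqlite3


def export_kindle_lookups(db_path, out_path, lang=None):
    """Write one tab-separated row (word, stem, example sentence) per lookup.

    Assumed schema: WORDS(id, word, stem, lang) and
    LOOKUPS(word_key, usage, timestamp), joined on LOOKUPS.word_key = WORDS.id.
    """
    query = (
        "SELECT w.word, w.stem, l.usage "
        "FROM LOOKUPS AS l JOIN WORDS AS w ON l.word_key = w.id "
    )
    params = ()
    if lang is not None:
        # Optionally keep only lookups for one language code, e.g. "es".
        query += "WHERE w.lang = ? "
        params = (lang,)
    query += "ORDER BY l.timestamp"

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query, params).fetchall()
    finally:
        conn.close()

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return len(rows)
```

Usage would look something like `export_kindle_lookups("vocab.db", "spanish.tsv", lang="es")`, followed by a text import in Anki with the three columns mapped to whatever fields your note type has.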

~~~
adminu
Do you have any sources in particular for a beginner in Spanish? Which blogs
do you recommend? Do you have anything on technology? Which news?

I know [http://lite.cnn.io/en](http://lite.cnn.io/en) which has all the
articles in Spanish as well. Really useful but sometimes a bit too cumbersome
to read for my level.

~~~
wild_preference
[http://lite.cnn.io/es](http://lite.cnn.io/es) times out for me.

~~~
johnsonjo
The previous link ended in ‘en’ not ‘es’.

Edit: sorry for not checking the site. I now see the site links to the es
ending url and that times out for me as well.

------
supernes
Really cool concept, love the split view. The voiceover feature is nice, but I
think an inline dictionary would be even more useful - maybe something along
the lines of macOS' built-in lookup
([https://i.imgur.com/qGTQ80a.png](https://i.imgur.com/qGTQ80a.png))

~~~
O1111OOO
> I think an inline dictionary would be even more useful

Yes this. It would make a huge difference for me at my level. I'm finding that
I lose my focus on specific words. I can gather the general meaning from the
sentence and/or paragraph but...

Translating the entire paragraph is too much of a crutch and forces me back
into thinking in English. If I can get past the single word(s) with a click, I
can keep my thoughts entirely in the second language. I suppose an extension
could probably fix this...

~~~
danohuiginn
Readlang is the extension you're looking for:
[https://readlang.com/](https://readlang.com/)

I've given it a quick try, and it seems to work fine with this site

------
unhammer
Neat! But not yet for beginners – my choices for eng→rus are Tolstoy and
Tolstoy :-S

Also, the "I speak" list needs to be checkboxes, not radio (and remember the
choices).

~~~
Jim_Heckler
This is what I've been hoping to try to read eventually
[https://web.archive.org/web/20060629211922/http://www.shnare...](https://web.archive.org/web/20060629211922/http://www.shnaresys.com/roadside/picnic/parallel.htm)

The book is the basis of the film Stalker, which is a must-see

------
pps
Lingq.com is a mature platform to learn languages by reading and listening.
The only app I'm not abandoning after a few days, because I can read stuff
that's really interesting to me (because I can import articles and entire
ebooks). This video explains how it works:
[https://www.youtube.com/watch?v=QeqFrO1sbTM](https://www.youtube.com/watch?v=QeqFrO1sbTM)

------
TheGrassyKnoll
'(works best with Google Chrome)'

Wtf. I thought we were beyond that now.

~~~
wild_preference
Feel free to email them and offer them money to improve their free service.

~~~
NPMaxwell
:)

------
kccqzy
Seems like the paragraph matching is still pretty buggy.

For example I chose French as the language to get better at and selected "Tour
du monde en 80 jours" as the text. The very first paragraph of the French text
is

> En l'année 1872, la maison portant le numéro 7 de Saville-row, Burlington
> Gardens -- maison dans laquelle Sheridan mourut en 1814 --, était habitée
> par Phileas Fogg, esq., l'un des membres les plus singuliers et les plus
> remarqués du Reform-Club de Londres, bien qu'il semblât prendre à tâche de
> ne rien faire qui pût attirer l'attention.

and the English translation is

> Mr. Phileas Fogg lived, in 1872, at No. 7, Saville Row, Burlington Gardens,
> the house in which Sheridan died in 1814. He was one of the most noticeable
> members of the Reform Club, though he seemed always to avoid attracting
> attention; an enigmatical personage, about whom little was known, except
> that he was a polished man of the world.

Notice the extra clause at the end, about him being an enigmatical personage,
which is not present in the French paragraph. In the French it belongs to the
next paragraph, and from there the matching goes out of whack. The second
paragraph of the English is

> People said that he resembled Byron--at least that his head was Byronic; but
> he was a bearded, tranquil Byron, who might live on a thousand years without
> growing old.

But the third paragraph of the French is

> On disait qu'il ressemblait à Byron -- par la tête, car il était
> irréprochable quant aux pieds --, mais un Byron à moustaches et à favoris,
> un Byron impassible, qui aurait vécu mille ans sans vieillir. Anglais, à
> coup sûr, Phileas Fogg n'était peut-être pas Londonner.
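
One way to cope with paragraph breaks that differ between the two editions is a length-based alignment that is allowed to merge neighbours. The sketch below is not the site's method (nothing in this thread says how the site matches paragraphs); it is a simplified dynamic program in the spirit of length-based aligners, and the name `align_by_length` and both penalty values are assumptions picked for illustration. It compares raw character counts, so it only makes sense when the two languages produce text of roughly similar length.

```python
def align_by_length(src, tgt, merge_penalty=10, skip_penalty=100):
    """Align two lists of paragraphs using only their character lengths.

    Returns a list of (src_indices, tgt_indices) groups. Allowed groups are
    1-1, 1-2, 2-1, 1-0 and 0-1.
    """
    # (paragraphs consumed from src, from tgt, fixed penalty for the move)
    moves = [(1, 1, 0), (1, 2, merge_penalty), (2, 1, merge_penalty),
             (1, 0, skip_penalty), (0, 1, skip_penalty)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    # Row-major order guarantees cost[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(p) for p in src[i:ni])
                len_t = sum(len(p) for p in tgt[j:nj])
                c = cost[i][j] + abs(len_s - len_t) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the chosen groups.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    groups.reverse()
    return groups
```

With this kind of alignment, a source paragraph that was split in two by the translator can come out as a single 1-2 group instead of shifting every later pair by one. It still cannot fix a sentence that moved across a paragraph boundary, as in the example above; that would need alignment at sentence level.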

~~~
stewbrew
They don't say where they took the translations from or who created them.
Given your examples, I assume the whole thing is based on Gutenberg or
something similar, not on machine translations. If it was humans who made
these translations, deviations like the ones you describe are to be expected.
Literary translations are often more of a re-creation or interpretation of
the original work.

~~~
wenc
Yes, the problem is most translations aren’t literal one-to-one translations.
Literal translations are usually hard to read so some translators use “dynamic
equivalency” while others paraphrase heavily. Unfortunately this can make
machine linguistic matching difficult and unreliable. For instance, there are
two translations of the Nordic classic Kristin Lavransdatter which both read
differently and there are people who will argue passionately about which
translation is best.

Statistical machine translation uses curated parallel texts for training, but
they tend to match with multiple corpora so the translation is some sort of
average I believe. I wonder if matching with just one translation might
produce less reliable results?

------
wumms
Connecting Paralleltext to OPUS [0] might be nice. OPUS is a collection of
translated texts from the web.

[0] [http://opus.nlpl.eu/](http://opus.nlpl.eu/)

Sample: [http://opus.nlpl.eu/EUbookshop/de-en_sample.html](http://opus.nlpl.eu/EUbookshop/de-en_sample.html)

------
wool_gather
Interesting. There's a longstanding dead-tree series with the same idea,
published by Penguin:
[https://www.penguinrandomhouse.com/series/BMH/penguin-parall...](https://www.penguinrandomhouse.com/series/BMH/penguin-parallel-text)
(Possibly other publishers, too, but this is the one I know of.)

You get the same collection of short stories in two languages (one always
being English) on facing pages. One important difference these books have with
that website is that (I believe) in all cases, it's the _English_ that's the
translation. This is, I think, best for a native English speaker who's
learning the other language. The non-English is not simply correct, but
actually _idiomatic_. Any deficiencies in the English are easy to gloss
over/ignore.

~~~
abecedarius
I started out restricting myself to original works with English translations,
but it turned out the most important quality was that it be fun to read and at
the right level for me. This was easier to find with translations of English
books for children/teens that I knew were fun, like Harry Potter and Roald
Dahl.

Someone further along can shift their mix back.

------
kozak
So if my language (Ukrainian in my case) is not in the "I speak" list, there's
nothing I can do except for closing the browser tab. It would be much better
if I could contribute something instead, or at least indicate interest in my
language if the language is missing from the list.

------
bunkerbewohner
Oh the poor people who try to learn German by reading Kafka. I'm not sure
literature in general is suitable for learning a language that you don't
already have a tight grip on. Maybe children's books, but even then it's too
far removed from spoken language for some languages.

~~~
sarabande
I've found the concept of extensive reading with a 98% threshold (that is, you
must understand 98% of words on a page) to be pretty effective in picking
up foreign languages. Less than that and I give up. You're right, starting to
learn German through Kafka is going to lead to bad times. I tried that with
Mandarin and a translation of an Orhan Pamuk novel and gave up some twenty
pages in after a brutal slog.
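
The threshold above is easy to estimate for a given text if you have a list of words you already know. A minimal sketch, where `known_word_coverage` is a name made up for this example and "word" just means a run of letters (no stemming, so "perro" and "perros" count as different words):

```python
import re


def known_word_coverage(text, known_words):
    """Return the fraction of word tokens in `text` that are in `known_words`."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits or underscores).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = {w.lower() for w in known_words}
    return sum(t in known for t in tokens) / len(tokens)
```

A page would pass the 98% rule when `known_word_coverage(page, my_words) >= 0.98`. Because there is no stemming, the number will come out lower than your real comprehension in heavily inflected languages.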

------
umarniz
Like the concept as I am actively learning Dutch right now but I find it hard
to copy a word without accidentally clicking on the sentence to play it out.

I want to copy words to keep my own word list and to translate the word.

~~~
mettamage
I looked at Robinson Crusoe, and the Dutch looks really old. The spelling is
incorrect (by modern standards); you'll talk a bit like a Dutch Shakespeare
if you learn it this way.

~~~
petecox
Well as someone else commented, the source material is likely from Project
Gutenberg, so a long out of copyright translation is perhaps expected.

On the other hand, Cervantes and Shakespeare were contemporaries, so I think
it enhances reading Don Quixote for the translation to use early modern
English - e.g. "Idle reader: _thou mayest_ believe me without any oath".

------
rcarmo
This is very neat. I’d love to see Hebrew and Mandarin added, though (each is
likely to be challenging in its own way given the way sentences are matched).

~~~
xvilka
Agree on Chinese; it also makes sense to add Japanese and Korean.

~~~
peterburkimsher
I tried making a beta version of Pingtype (Chinese/English parallel) for
Japanese, but haven't figured out word spacing yet. Could you help?

~~~
yorwba
The most popular tokenizer for Japanese is probably MeCab. It can also give
you readings, but in cases where the reading depends on the meaning of the
sentence, it's about as likely to get it wrong as to get right. My current
approach is to also use Kuromoji and compare the output so that I can at least
notice which parts might be inaccurate.
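
The comparison step can be sketched without depending on either tokenizer. The code below takes two token lists for the same sentence (however they were produced) and returns the stretches of text where the segmentations disagree. The function names are made up for this example, and the token lists in the usage note are invented for illustration, not actual MeCab or Kuromoji output.

```python
def boundaries(tokens):
    """Return the set of character offsets at which a token ends."""
    offsets, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        offsets.add(pos)
    return offsets


def disagreements(tokens_a, tokens_b):
    """Return the substrings where two segmentations of one text differ."""
    text = "".join(tokens_a)
    if text != "".join(tokens_b):
        raise ValueError("both token lists must spell out the same text")
    ends_a, ends_b = boundaries(tokens_a), boundaries(tokens_b)
    shared = sorted(ends_a & ends_b)   # cut points both tokenizers agree on
    only_one = ends_a ^ ends_b         # cut points only one of them made
    spans, start = [], 0
    for end in shared:
        # A region between two agreed cut points is suspect if one
        # tokenizer split it further and the other did not.
        if any(start < b < end for b in only_one):
            spans.append(text[start:end])
        start = end
    return spans
```

For example, `disagreements(["東京", "都", "に", "住む"], ["東京都", "に", "住む"])` flags `"東京都"` and nothing else. Agreement between two tokenizers does not prove the segmentation is right, but the flagged spans are a reasonable place to look first.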

------
xaybey
Are the voices backwards for anyone else? English sentences are read
(unintelligibly) in a French voice and vice versa.
[http://paralleltext.io/read/?b=f727d42f-7649-49b0-b583-99bb9...](http://paralleltext.io/read/?b=f727d42f-7649-49b0-b583-99bb9463c907)

~~~
calinf
Seems like you found a bug, works as expected for the other language
combinations, we'll fix it a.s.a.p. Thanks!

~~~
gnud
The German text here is also read by an English voice -
[http://paralleltext.io/read/?b=de-en-the-picture-of-dorian-g...](http://paralleltext.io/read/?b=de-en-the-picture-of-dorian-gray)

~~~
gnud
Also, in the German text the chapter markers ("Erstes Kapitel", "Zweites
Kapitel") have been pulled into the paragraph following them, but they're
missing completely in the English text.

------
peterburkimsher
I did something similar for Chinese! Mandarin (Traditional & Simplified),
Taiwanese, Cantonese, and Hakka.

[https://pingtype.github.io](https://pingtype.github.io)

Sorry that the user interface is so bad, please just try clicking some items
in the heading row to see what's there.

------
bitwize
I'm reminded of a book on topology I read which suggested learning how to read
mathematical text in French with a particular book by -- I think it was
Bourbaki, "using the English translation as a pony". I've never tried it, but
supposedly you could get pretty far this way.

------
bespoken
I like the idea. The only issue is that the (very few) available books
require a pretty advanced level, even in my own language. I would use this
for sure if the available books were closer to a children's reading level, so
I could learn words and sentences that I can actually use and practice in
daily life.

~~~
calinf
Because of legal reasons we can only have books from the public domain. If you
know some that are free, just point me to them and I'll match them for you. I
need them to be in .epub format for both languages.

------
jfk13
I took a quick look at English / Swedish, and noticed that the Swedish version
of Three Men in a Boat is badly encoded: the (vital) Swedish letter å has been
consistently replaced with a. On the other hand, ä and ö seem to be present as
expected.

So (based on a sample size of one), I'm afraid it doesn't look like the
translations are necessarily reliable.

~~~
yorwba
I've previously struggled with the encoding of books on Project Gutenberg. In
theory each text file has a header that specifies the encoding and other data
about the book like author and title. In practice the header format varies
unpredictably (likely it was written manually and not meant for machine
consumption) and even if you can parse the encoding value, you might have one
that says ASCII but is actually some Windows code page, or UTF-8, or something
completely different. And that encoding is also used in other parts of the
header, so it can happen that you fail to parse the header because you don't
know which encoding to use. One book actually used different encodings in
different parts.

After collecting a messy tangle of special cases over several days, I threw in
the towel and just used BeautifulSoup's UnicodeDammit to brute-force a working
encoding.

Maybe the Swedish book didn't indicate a vs å in the first place, though.
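
For comparison, the brute-force idea can be sketched with the standard library alone: try a short list of candidate encodings in order and keep the first one that decodes without error. This is not what UnicodeDammit does internally; `decode_with_fallback` and the default candidate list are assumptions chosen for this example.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode bytes with the first candidate encoding that does not fail.

    Returns (text, encoding_name). UTF-8 goes first because non-UTF-8 data
    rarely happens to be valid UTF-8; latin-1 goes last because it accepts
    every byte sequence and therefore never fails.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")
```

"Decoded without error" only means the bytes were valid in that encoding, not that the resulting letters are the ones the author typed, so the output still needs a sanity check (for Swedish, whether å shows up at all would be a quick one).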

~~~
jfk13
Not sure this one came from Gutenberg; at least, I can't seem to find a
Swedish version of it there. But anyhow, either the source is worthless, or
some conversion process somewhere was bad. A vs Å is not a minor or optional
issue in Swedish; å, ä and ö are separate letters of the alphabet in their own
right, and "omitting accents" isn't a thing. It's just wrong.

(E.g. see the Swedish alphabet at
[http://omniglot.com/writing/swedish.htm](http://omniglot.com/writing/swedish.htm))

> I threw in the towel and just used BeautifulSoup's UnicodeDammit to brute-
> force a working encoding.

For a limited definition of "working", at times! "A working encoding" that
shows the wrong letters isn't "working" in a very useful sense.

------
superasn
Nice idea. But it would be better if there were an option to select single
lines instead of paragraphs. A beginner only has the processing power for a
few words, and paragraphs can sometimes be very long.

------
zumu
I firmly believe it's important to learn a spoken language before learning the
written language, assuming your goal is to become fluent.

Learning through translation is also an anti-pattern. Best to learn the basics
of the target language then bootstrap your understanding of the more
complicated bits by breaking them down to the basics.

I can understand the time benefit to learning through reading and translation,
but I can't help but think it's a path to mediocre reading skills and poor
speaking skills at best.

~~~
zzo38computer
Sometimes you might only learn reading/writing and have no need to speak that
language, so, I think it can be OK to only learn written language.

(And some kinds of languages are not speech anyways)

------
therealdrag0
For audio practice: There is also a parallel text for YouTube I found helpful:
[https://www.fluentu.com](https://www.fluentu.com)

Also I found [https://www.lingq.com](https://www.lingq.com) pretty solid for
learning vocab while reading.

------
DoctorOetker
Split view, Spanish English, Franz Kafka son turns into bug story, page 32:
text ends, or english text is longer than usual on the page:

[http://paralleltext.io/read/?b=3b323333-ad58-4dba-9629-72851...](http://paralleltext.io/read/?b=3b323333-ad58-4dba-9629-72851a006fbf#31)

------
pwaai
This is a pretty cool way to learn. I learned English by memorizing Bambi as
a kid.

My only complaint is that the Spanish AI lady seems to have suffered a stroke,
see below:

[http://paralleltext.io/read/?b=cac6e2c6-19fb-4c60-9fef-c4c39...](http://paralleltext.io/read/?b=cac6e2c6-19fb-4c60-9fef-c4c396bb99bb)

------
andrelaszlo
I liked Readlang's approach where you click words or expressions that you
don't understand and you get the translation. It's also added to your personal
dictionary so you can create cards for spaced repetition training.

------
htk
Cool idea! I love anything that helps to learn/practice foreign languages. I
only wish it would work better on mobile safari (swipe is clunky, works only
sometimes, and everything I try to do activates the TTS).

------
StrykerKKD
I tried out the English->Hungarian and it's pretty hilarious, because the
Hungarian version sounds so German that even I can't understand what it says
without the text. :D

------
lake99
Does the text-to-speech feature not work on desktops? I'm using Chromium on
Linux, and I don't hear anything when I click on sentences. TTS works on
Android though.

------
izabera
This is cool, but please improve your text to speech. check out things like
[http://mary.dfki.de](http://mary.dfki.de)

------
sarabande
I'd completely disable the audio -- for Spanish, at least, it's horrible. I
like the concept though and think being able to switch to the language is
nice.

------
sofuerman
Great idea!

But as a native Spanish speaker... there's no Spanish as a source language,
yet many texts with Spanish as the language to learn?

Shouldn't the same texts be available the other way around?

------
sridca
I'd love to have something like this but with texts available in varied areas,
eg: programming (such as Haskell learning).

------
max_
Well done! Would love to have audio for pronunciation though.

Maybe the audio can be initiated with a quick highlight on the text or
sentence?

------
nasir
This is really nice! It would be really handy if you can shortlist the words
you want to revisit by a click.

------
discobean
Great idea, but the english->polish book is wrong on most paragraphs
unfortunately

------
nafizh
Any plan to include Semitic languages like Arabic and Hebrew? That would be
great.

------
nojvek
Swipe on paragraph to see translation doesn’t work for me on safari :(

------
Kkoala
Nice idea, but it seems like some of the translations are directly from Google
Translate, and thus in parts very bad and misleading.

------
machiaweliczny
Neat idea. I would add ability to build own dictionary.

~~~
calinf
Good idea, thanks!

------
zwaps
I'd love if you'd have something in Turkish.

Great job!

------
jcul
Similar idea to beelingual app.

------
colobas
This is super cool! Congrats!

