
Do the English Use More Words Than the French? - deleterofworlds
http://filipwolanski.com/french-english-vocabulary/
======
murbard2
I'm a native French speaker and fluent in English (and of course, as luck
would have it, I'm going to stick grammar mistakes in this post)

Some have pointed out that French is almost always longer. It's a bit more
complicated than that. French uses a wider vocabulary than English, and uses
many different words to convey different connotations. Words, as a result,
tend to be longer, because they carry more information.

English tends to be much more modular and flexible. Nouns can be made into
adjectives, adverbs and verbs rather easily, and "prepositions" drastically
alter the meaning of verbs.

The end result is that English can be much shorter than French _when trying to
be concise_. A short UI message will always be much shorter in English than in
French. However, when conveying nuanced ideas, I believe they will be much
closer in length, with perhaps a small advantage for French.

~~~
Nursie
>> French uses a wider vocabulary than English

This seems an extraordinary claim, given that English is generally considered
to have the largest number of words of any modern language (mostly because we
just steal them).

~~~
pash
Word-use is roughly Zipf-distributed [0] in most all modern languages, so
there's not much difference in the number of commonly used words (as long as
you ignore basic differences like the existence of articles and the richness
of word inflection).

An analysis of English and French texts on Project Gutenberg shows that 93
words account for half of English usage, versus 89 for French [1]. English has
fewer words in the middle of the distribution (696 versus 795 at 70%) and in
the near tail (6,428 versus 9,050 at 90%, and 14,736 versus 21,231 at 95%),
but more in the extreme tail.

0\.
[https://en.m.wikipedia.org/wiki/Zipf%27s_law](https://en.m.wikipedia.org/wiki/Zipf%27s_law)

1\. [http://1.1o1.in/en/webtools/semantic-depth](http://1.1o1.in/en/webtools/semantic-depth)
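The "N words account for X% of usage" figures above can be reproduced in principle by sorting words by frequency and counting how many are needed to reach a given share of all tokens. The following is a minimal sketch, not the method used by the linked tool; the function name `words_needed_for_coverage` and the naive tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter


def words_needed_for_coverage(text, fraction):
    """Return how many distinct words, taken most-frequent first,
    are needed to account for `fraction` of all word tokens in `text`."""
    # Naive tokenizer (an assumption): lowercase runs of letters,
    # optionally joined by apostrophes. Inflected forms count as separate words.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for needed, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= fraction * total:
            return needed
    return len(counts)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for share in (0.5, 0.7, 0.9):
        print(share, words_needed_for_coverage(sample, share))
```

Results depend heavily on tokenization choices such as whether articles are counted and whether inflected forms are merged, which is the caveat already raised in the comment.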

~~~
murbard2
> 14,736 versus 21,231 at 95%

So about 50% more words for French. The very extreme tail is less interesting
because it captures the size of the dictionary, and words very rarely used.

I think the exponent of the tail would be the most relevant metric, but I
can't open those PDFs. Can someone plot the inverse CDF and make a log-log
plot?
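One rough way to get at the tail exponent without a plot is to build the rank-frequency curve and fit a straight line to it in log-log space. The sketch below uses an ordinary least-squares fit, which is a crude estimator and only an illustration; the helper names `rank_frequency` and `loglog_slope` are hypothetical, and the input is assumed to be an already tokenized list of words.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, count) pairs, most frequent word first."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(points):
    """Ordinary least-squares slope of log(count) against log(rank).

    For a Zipf-like distribution the slope is negative, and its
    absolute value is an estimate of the exponent.
    """
    if len(points) < 2:
        raise ValueError("need at least two (rank, count) points")
    xs = [math.log(rank) for rank, _ in points]
    ys = [math.log(count) for _, count in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    tokens = "a a a a b b c d".split()
    points = rank_frequency(tokens)
    print(points)
    print(loglog_slope(points))
```

To compare only the tails of two corpora, one could pass a slice such as `points[1000:]` instead of the full list; where to cut is a judgment call that this sketch does not settle.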

------
S4M
I notice that four of the seven authors with the smallest vocabulary are
writing theater (Moliere, Shakespeare, Racine, Corneille). I think it's not a
coincidence, because writing in rhymes must be a hard constraint in the choice
of words one can use.

~~~
pash
A keen observation, and Wilde really makes it four and a half.

But Shakespeare wrote mostly blank verse, and rhyming is not as much of a
constraint on vocabulary in French due to the high commonality of word
endings. If it's true that works for the stage tend to use fewer words, it
might be because dialogue dominates; I imagine narrative, which consists more
of description, naturally draws on a bigger vocabulary than even very stylized
dialogue.

~~~
pothibo
How is French easier to rhyme than English? Do you have empirical evidence?

Since I'm a French Canadian who is lucky enough to speak the two languages, I
find it difficult to believe.

Remember that rap (rhyme and poetry) was popularized by poor English-speaking
people.

~~~
pash
For a discussion of the ease of rhyming in French versus English, see, e.g.,
this book [0]. Rhymes are widely considered to be easier to form in French;
partly as a result, however, classical French poetry developed several
additional stylistic constraints related to rhyming, such as the distinction
between "feminine" and "masculine" rhymes of words with and without a final
devoiced "e".

I wonder what you would accept as "empirical evidence" of the relative ease of
rhyming in different languages. The demand strikes me as typifying one of the
obnoxious impulses of the Hacker News community (another of them on display
elsewhere in this thread) to dress up a vapid criticism as scientific
skepticism.

You could, I suppose, try to do a combinatorial analysis of the ease of
rhyming by writing a program to find rhyming pairs in a phonetic dictionary;
just make sure your blog post about the results properly acknowledges the
shortcomings of your analysis in its title.

0\.
[https://books.google.com/books?id=ByVoAgAAQBAJ&lpg=PA60&ots=...](https://books.google.com/books?id=ByVoAgAAQBAJ&lpg=PA60&ots=5MIVrQ-Dtw&dq=ease%20of%20rhyming%20in%20french%20and%20english&pg=PA60#v=onepage&q&f=false)
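The combinatorial analysis suggested above could be sketched roughly as follows: group words by the sounds from their last vowel onward, then count the pairs inside each group. Everything here is an assumption made for illustration: the tiny `TOY_LEXICON`, its made-up one-letter phoneme symbols, and the simplified rhyme definition, which ignores stress. A real comparison would need a proper pronunciation dictionary for each language.

```python
from collections import defaultdict

# Hypothetical toy data: word -> phoneme list, in an invented notation.
TOY_LEXICON = {
    "cat": ["k", "a", "t"],
    "hat": ["h", "a", "t"],
    "bat": ["b", "a", "t"],
    "dog": ["d", "o", "g"],
    "log": ["l", "o", "g"],
    "sun": ["s", "u", "n"],
}
VOWELS = {"a", "e", "i", "o", "u"}  # vowel symbols of the toy notation


def rhyme_key(phonemes):
    """Phonemes from the last vowel to the end of the word (simplified rhyme)."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i] in VOWELS:
            return tuple(phonemes[i:])
    return tuple(phonemes)


def count_rhyming_pairs(lexicon):
    """Number of unordered word pairs that share a rhyme key."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[rhyme_key(phonemes)].append(word)
    return sum(len(g) * (len(g) - 1) // 2 for g in groups.values())


def rhyme_density(lexicon):
    """Share of all possible word pairs that rhyme, between 0 and 1."""
    n = len(lexicon)
    total_pairs = n * (n - 1) // 2
    return count_rhyming_pairs(lexicon) / total_pairs if total_pairs else 0.0


if __name__ == "__main__":
    print(count_rhyming_pairs(TOY_LEXICON), rhyme_density(TOY_LEXICON))
```

Comparing `rhyme_density` across two languages would still leave open how to weight words by frequency and how to treat inflectional endings, so the number alone would not settle the question.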

~~~
pothibo
So basically you make a claim that French is easier to rhyme than English,
then I ask you if you have any evidence and you retorted with a condescending
answer, adding a link to a book you took 5 seconds to google and that you didn't
read.

You didn't read it, because I just read it and while the first sentence would
make it appear to back your point, it doesn't. First, because he doesn't
compare French to English but merely explains how the main accent is put on the
suffix of words and that, in turn, makes it easy in French to build rhymes.

And if you continue to the next page, he explains how French rhymes are not on
the same level as their English counterparts "because most English rhymes
involve the root rather than the inflectional ending or suffix [...]"

Bottom line, you are wrong, and you try to pass as knowledgeable. You are a
fraud.

------
beloch
The findings of this study are unrelated to the title of this post.

First, the authors and books were not chosen randomly and the sample size is
tiny, so the results are meaningless, although I do respect the effort that
was put into this. The author is up-front and honest about this in the
beginning, but unfortunately proceeds to draw unwarranted generalizations
based on his study.

Second, even if a similar, but much larger, study were conducted on randomly
selected authors and books, the results would still conflate the vocabulary
size in common usage with the vocabulary size of single authors. Hence, what
the article is attempting to study is not the same thing as the title of this
post.

~~~
pash
I think the most unwarranted generalization is yours. The author clearly did
not mean to do what you criticize him for failing to do.

But I suppose we could ask the mods to change the submission title. How about
this?—"One blogger's analysis of the size of the vocabularies employed by some
of his favorite French- and English-language writers, neither claiming nor
achieving statistically significant insights into the question of whether
French- (on the one hand) or English-speaking peoples (on the other) use more
words in common usage."

~~~
dsp1234
Or maybe:

"Do the English Use More Words Than the French? I don't know, and what I did
to test it doesn't tell me anything concrete"

~~~
pash
Or: "Once again, the intent of this blog post is clear from its opening
paragraphs (some readers' abiding dissatisfaction with its title
notwithstanding), aforementioned intent being not to analyze patterns of
general usage of the French and English languages in a statistically rigorous
way but to offer an entertaining and well presented disquisition on some
really quite insignificant but nonetheless amusing points related to the
vocabularies of selected French and English writers of the author's own
choosing (aforementioned dissatisfied readers nevertheless succeeding in
cluttering up the comments on Hacker News with their whinging)"?

------
omonra
Wouldn't analysis of top 3 contemporary newspapers / magazines in each
language yield a more accurate result?

I.e., if Dickens or Melville used certain words in their writing - if modern
speakers don't know them, we can't really use them to gauge how many words
English speakers use _today_.

I, for one, simply can't read Shakespeare - most of his vocabulary (anybody
knows what a 'bodkin' is?) sounds foreign to me. And I scored in the 99th
percentile on the verbal SAT.

~~~
pcrh
Most of Shakespeare's language is accessible to English speakers in England,
Ireland, Australia, etc. There might be a few words here or there that are not
widely understood ("bodkin" is not one of these; its meaning is generally
known in the UK).

However, American English, especially as spoken daily in the more
ethnically diverse areas of the US, has a much reduced vocabulary due to the
need to accommodate non-native speakers.

For an example of where Shakespeare's language becomes confusing, consider

"Romeo, Romeo, wherefore art thou Romeo."

Modern speakers of English most often interpret "wherefore" to mean "where",
when it means "why".

~~~
omonra
Seriously? What % of the English do you think will understand this (random
start of first Shakespeare play that popped into my head):

If music be the food of love, play on;

Give me excess of it, that, surfeiting,

The appetite may sicken, and so die.

That strain again! it had a dying fall:

O, it came o'er my ear like the sweet sound,

That breathes upon a bank of violets,

Stealing and giving odour! Enough; no more:

'Tis not so sweet now as it was before.

O spirit of love! how quick and fresh art thou,

That, notwithstanding thy capacity

Receiveth as the sea, nought enters there,

Of what validity and pitch soe'er,

But falls into abatement and low price,

Even in a minute: so full of shapes is fancy

That it alone is high fantastical.

~~~
pcrh
Which _words_ in that passage are not easily understandable? The most
"obscure" one that I can identify is "notwithstanding" (Google gives
48,000,000 results for "notwithstanding", so it is not exactly an obscure
word).

A failure to understand that passage is a failure of comprehension, not of
vocabulary.

~~~
omonra
You're talking about something else - individual words (at least in this
passage) are certainly known to everybody.

But what % of English speakers do you think can understand the passage?

~~~
pcrh
To clarify: what is it that _you_ do not understand about the posted passage?

~~~
omonra
Honestly - I don't understand anything. Imagine you had an English text with
half the words substituted with Dutch ones (that look similar but make no
sense).

So you might think you understand 2-3 words in a row only to come across the
fourth word that seems key for the line but means nothing to you in the given
context.

~~~
pcrh
What makes it harder to understand is not the vocabulary, but the grammar, as
well as Shakespeare's sometimes idiosyncratic use of word meaning. For
example, "strain" here means a portion of music, which while not its most
common meaning, is certainly within the abilities of most educated native
speakers of English to understand, at least from context.

~~~
omonra
That's exactly what I mean - words that we otherwise know are used with
completely different meaning, which renders the text unintelligible.

I disagree with you completely that 'most educated native speakers' can
understand this. By the same token one can say that most native English
speakers can solve differential equations :)

The definition of strain you mention is #30 in the dictionary -
[http://dictionary.reference.com/browse/strain](http://dictionary.reference.com/browse/strain).
Can you cite its use in that meaning in a contemporary newspaper or magazine?

~~~
pcrh
As to usage of "strain" in music, a quick search led me to this article from
the US dated 16 Feb 2015 [1]: "When the trumpets blare and the snares rattle
during the opening strain of John Phillip Souza’s The Washington Post
March..."

I am not proposing that Shakespeare is the easiest thing to read, but merely
suggesting that it is not as inaccessible as it might first appear.

[1] [http://www.miamiherald.com/news/local/community/miami-dade/h...](http://www.miamiherald.com/news/local/community/miami-dade/homestead/article10468706.html)

------
euphemize
If anyone is interested in anecdotal data [0][1] regarding sentence length for
equivalent meaning, I have to use both French and English every day,
alternating between them (spoken and written) all the time. English is almost
always shorter than French, which is more verbose. I am used to "thinking" in
whatever language I'm using to talk/write and have noticed many times that I'm
a bit slower in French, even if it's technically my native language. It might
also be that French seems to have more alternatives to say the same thing in
different ways, but that could just be me.

[0] C'est presque toujours plus long en français qu'en anglais.

[1] It's almost always shorter in English than in French.

~~~
pavlov
I think French spelling plays a part here, though. If I were to write your
example phrase in French written as pronounced using Finnish language spelling
rules, it would look like this:

"Se preskö tuzuur ply lon on franse kon angle."

This "phonetic compression" made the printed sentence about 25% shorter.
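The "about 25%" figure can be checked by counting characters in the two spellings. This is a small sketch of that arithmetic; the helper names `char_length` and `reduction` are made up for illustration, and lengths are counted in Unicode code points after NFC normalization so that "ç" and "ö" each count once.

```python
import unicodedata


def char_length(s):
    """Length in code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", s))


def reduction(original, respelled):
    """Fraction by which `respelled` is shorter than `original`."""
    return 1 - char_length(respelled) / char_length(original)


french = "C'est presque toujours plus long en français qu'en anglais."
respelled = "Se preskö tuzuur ply lon on franse kon angle."

if __name__ == "__main__":
    # 59 characters versus 45 characters, spaces and punctuation included.
    print(char_length(french), char_length(respelled))
    print(f"{reduction(french, respelled):.1%}")
```

Counting spaces and punctuation, the respelled sentence comes out a little under a quarter shorter, which is in line with the estimate in the comment.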

------
oelmekki
As a French speaker who always builds EN and FR multi-locale websites, I'm quite
surprised by this question :) The French text is almost always longer than
the English one. Not sure if it's about word count, though.

Also, something very common for French people when learning English is to try
to translate their French sentence exactly. It always seems over-complicated
in English, and quite pedantic. With time, we learn to stop trying to use the
same words and try to express it directly in English. We realize at this point
that English sentences are always way simpler. But here again, vocabulary
complexity says nothing about word count.

------
pcrh
That's a surprising result.

My previous view was that English vocabulary is based on Germanic as well as
Latin roots, and so there are often two terms with the same meaning, one of
which is subtly different in usage from the other. In contrast, French
vocabulary is mostly Latin in origin.

~~~
DaveMebs
Just because more words exist does not mean the authors make use of these
words. I think the author would agree that English has more words - he cites
the dictionary sizes skewing heavily in favor of English.

With that said, I am also surprised. I would like to see a similar analysis
using more modern works, as well as a discussion around how texts are
selected. It seems like the selection of authors and their novels will
significantly impact the results.

FWIW - Both English and French stem from the same original language (Proto-
Indo-European). You are correct that English is a Germanic language (this does
not mean it stems from German), and French is a Romance language - with
Romance being used in the sense of "came from Rome," not "sounds sexy." If you
are interested in more details here, I highly recommend "The History of
English" podcast:
[http://historyofenglishpodcast.com/](http://historyofenglishpodcast.com/) I'm
still working my way through it, but it does a great job explaining the
history of the English language, how it evolved, how it relates to other
languages, etc.

English contains words from many other languages. It draws not just on Latin,
but on many more recent languages (such as French) as well.

~~~
blumkvist
Wow! That podcast seems awesome. Thanks a lot. This will be my bed time
listening for a while.

------
yellowapple
I'm just going to assume Betteridge's law of headlines is in effect. If the
answer to this was "yes", it would be phrased in a declarative rather than
interrogative manner.

------
tomcam
A model of presentation of data. Its clarity belies the variety of techniques
used. Perfect use of word clouds. Thank you.

~~~
danso
I have to strongly, _strongly_ disagree. There is no "perfect use" of word
clouds in the context of proper data visualization. The arrangement and
colorization of the words are essentially the result of randomization (and the
sizing of words based on frequency is of dubious value, given how difficult it
is for the human eye to evaluate the size comparison)...this randomness is
anathema to the proper presentation of data... the variability in
interpretation is vast, and depends on each viewer's ability to see colors and
read vertically.

~~~
pcrh
Generally, I find word clouds to be pretty useless anyway. They visually
highlight the top few words (4 or 5 at most), but beyond that are obscure and
uninformative.

------
xacaxulu
Oui.

