Do the English Use More Words Than the French? (filipwolanski.com)
34 points by deleterofworlds on Feb 20, 2015 | 56 comments



I'm a native French speaker and fluent in English (and of course, as luck would have it, I'm going to stick grammar mistakes in this post)

Some have pointed out that French is almost always longer. It's a bit more complicated than that. French uses a wider vocabulary than English, and uses many different words to convey different connotations. Words, as a result, tend to be longer, because they carry more information.

English tends to be much more modular and flexible. Nouns can be made into adjectives, adverbs and verbs rather easily, and "prepositions" drastically alter the meaning of verbs.

The end result is that English can be much shorter than French when trying to be concise. A short UI message will always be much shorter in English than in French. However, when conveying nuanced ideas, I believe they will be much closer in length, with perhaps a small advantage for French.


I'm also a native French speaker, but I'm bilingual (British mother).

French definitely neither has nor uses a wider vocabulary than English, if only because English keeps growing while modern French stopped evolving a while ago. The only kind of French which does evolve is spoken French, which bears little resemblance to written French nowadays.

Now, for some reason, there's a widely propagated myth in France that English in particular is a *very poor* language: English is supposed to have a very limited vocabulary and no way to express subtle ideas accurately. I have no idea where this myth comes from (maybe the old French-English rivalry), but I guess it is the source of your belief.


What do you make of the statistics quoted below, which show that, at the 95th percentile, French uses 50% more words than English?

Compare http://www.eupedia.com/europe/missing_words_english.shtml and http://www.eupedia.com/europe/missing_words_french.shtml. I see many errors in the second list; do you see many in the first?

I do not consider English a poor language at all. I find it extremely flexible and expressive, whereas French just comes with a very large standard library.


>> French uses a wider vocabulary than English

This seems an extraordinary claim, given that English is generally considered to have the largest number of words of any modern language (mostly because we just steal them).


Word-use is roughly Zipf-distributed [0] in most all modern languages, so there's not much difference in the number of commonly used words (as long as you ignore basic differences like the existence of articles and the richness of word inflection).

An analysis of English and French texts on Project Gutenberg shows that 93 words account for half of English usage, versus 89 for French [1]. English has fewer words in the middle of the distribution (696 versus 795 at 70%) and in the near tail (6,428 versus 9,050 at 90%, and 14,736 versus 21,231 at 95%), but more in the extreme tail.

0. https://en.m.wikipedia.org/wiki/Zipf%27s_law

1. http://1.1o1.in/en/webtools/semantic-depth
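For anyone who wants to check figures like these on their own corpus, here is a minimal sketch of the measurement as I understand it: sort words by frequency and count how many are needed to cover a given share of all tokens. The function name `words_needed` and the crude regex tokenizer are my own assumptions, not whatever the linked tool actually does.

```python
import re
from collections import Counter


def words_needed(text, share):
    """Count how many distinct words, taken from most to least frequent,
    are needed to cover `share` (between 0 and 1) of all tokens in `text`."""
    # Assumption: a token is a run of letters, optionally joined by apostrophes.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= share * total:
            return rank
    return len(counts)
```

Calling `words_needed(corpus, 0.5)` on an English corpus and on a French one would give the pair of numbers being compared at the 50% mark. Results depend heavily on tokenization and on whether inflected forms are counted separately.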


> 14,736 versus 21,231 at 95%

So about 50% more words for French. The very extreme tail is less interesting because it captures the size of the dictionary, and words very rarely used.

I think the exponent of the tail would be the most relevant metric, but I can't open those PDFs. Can someone plot the inverse CDF and make a log-log plot?
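Short of an actual plot, a rough stand-in is the slope of log(frequency) against log(rank), which is the straight line you would eyeball on a log-log rank-frequency chart. The sketch below is only that: the name `zipf_slope` and the `min_rank` cutoff are my own choices, and a least-squares fit on log-log data is a crude estimator of a tail exponent, so treat the number as indicative at best.

```python
import math
from collections import Counter


def zipf_slope(tokens, min_rank=1):
    """Least-squares slope of log(frequency) versus log(rank).

    Ranks below `min_rank` are ignored so the fit can focus on the tail.
    A pure Zipf distribution gives a slope close to -1.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    points = [
        (math.log(rank), math.log(freq))
        for rank, freq in enumerate(freqs, start=1)
        if rank >= min_rank
    ]
    if len(points) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Comparing `zipf_slope(english_tokens, min_rank=1000)` with the same call on French tokens would show whether one tail falls off faster than the other, assuming both corpora are tokenized the same way.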


>50th percentile of word use at 93 words for English and 89 for French [1].

If I understand the meaning of that phrase correctly, it seems to be an extraordinarily useless measure.


I believe that's the difference between "uses a wider vocabulary" and "has a wider vocabulary".

I would think that literary French generally uses a wider vocabulary than English though, at least because repeating words in French is to be avoided at all costs, while it's more acceptable in English.


After a point, having more words in a dictionary does not necessarily result in the average person using more words in their communication. I would think that education/environment would have a bigger effect.

That said, I would be interested if the grandparent could provide more info supporting their statement, as I don't know the truth one way or the other and would like to know more.


This is mostly based on my own observations, I don't have a very specific study on the matter.

Here's a data point though. Many French students learn in English class, as an anecdote, that "shallow" has no equivalent in French. If there were many other words like this, there would be little point in relating this anecdote. "Shallow" is one of the very few words in English with no French equivalent. French words with no English equivalent? I encounter them all the time.

There are different words for the mouth and leg of an animal versus a human (gueule et patte vs bouche et jambe). There's a common word for the lumps of flour that form in dough (grumeau). There's a word for a sandwich with a slice of bread on only one side (tartine). There's a single word for brown sugar (cassonade). There's a different word for hair on your head and hair on your body (cheveux, poils), etc.

Here's a nice list http://www.eupedia.com/europe/missing_words_english.shtml

English words with no French equivalent? Shallow and Serendipity come to mind. I'm googling but I can't find many others at the moment, apart from very technical terms.


English is a bit more subtle. It has inherited a lot of French vocabulary from the Normans, as well as retaining Anglo-Saxon roots. So it often has more words for things other languages have fewer words for, with English having more shades of meaning.

Take happy for example: joyous, joyful, gay, merry, gleeful, etc. I counted some 50 words in the thesaurus under happy, but many of them only translate to joyeux or gai according to my dictionary checks (although there are also plenty for French, just not as many).

English tends to have a lot of words that are close synonyms but with different evocations. It's possible to converse in English in language almost completely of French origin, or speak plainly in the common tongue with older words. The former sounds flowery; the latter tends to remind one of the Bible.



Many English dictionaries have more than 300k entries. The reason is that many of the words are close synonyms inherited from different sources. That's what I was trying to get at in my carefully constructed sentence.

IMO the synonyms you link to aren't as close as those at http://www.thesaurus.com/browse/happy - I believe there are further English translations of many of those on your page, but they take things in a different direction. Many of them relate to luck, for example.

I experimented with synonymo.fr a little further. It looks like many distinct English words have a single French word translation, and that French word has different meanings in different contexts. Consequently, looking up the synonyms for that word, you end up seeing synonyms for all the different contextual meanings for that word.

For example, take the word 'sky'. A quick scan seems to suggest French doesn't have distinctly separate words for sky vs heaven, translating both as ciel. Synonymo.fr suggests lots of synonyms, but they are mostly synonyms for heaven, not sky. It's like this with most French words I experimented with; same word, lots of different meanings, where English has separate words without the same degree of "meaning spread". That is, the English words are more precise and less ambiguous. And there are usually more of them, without wandering into different concepts.


> When conveying nuanced ideas, I believe they will be much closer in length, with perhaps a small advantage for French.

I've looked in bookstores at expert translations of French novels to English, and English novels to French, and in either case, the English is shorter. This seems contrary to your claim.


1) It could be the translator's choice to drop a connotation rather than go through a cumbersome periphrasis to express it faithfully.

2) English words tend to be much shorter, which is reflected in the book's length.


I notice that four of the seven authors with the smallest vocabulary are writing theater (Moliere, Shakespeare, Racine, Corneille). I think it's not a coincidence, because writing in rhymes must be a hard constraint in the choice of words one can use.


A keen observation, and Wilde really makes it four and a half.

But Shakespeare wrote mostly blank verse, and rhyming is not as much of a constraint on vocabulary in French due to the high commonality of word endings. If it's true that works for the stage tend to use fewer words, it might be because dialogue dominates; I imagine narrative, which consists more of description, naturally draws on a bigger vocabulary than even very stylized dialogue.


How is French easier to rhyme than English? Do you have empirical evidence?

Since I'm a French Canadian who is lucky enough to speak the two languages, I find it difficult to believe.

Remember that rap (rhyme and poetry) was popularized by poor English-speaking people.


For a discussion of the ease of rhyming in French versus English, see, e.g., this book [0]. Rhymes are widely considered to be easier to form in French; partly as a result, however, classical French poetry developed several additional stylistic constraints related to rhyming, such as the distinction between "feminine" and "masculine" rhymes of words with and without a final devoiced "e".

I wonder what you would accept as "empirical evidence" of the relative ease of rhyming in different languages. The demand strikes me as typifying one of the obnoxious impulses of the Hacker News community (another of them on display elsewhere in this thread) to dress up a vapid criticism as scientific skepticism.

You could, I suppose, try to do a combinatorial analysis of the ease of rhyming by writing a program to find rhyming pairs in a phonetic dictionary; just make sure your blog post about the results properly acknowledges the shortcomings of your analysis in its title.

0. https://books.google.com/books?id=ByVoAgAAQBAJ&lpg=PA60&ots=...
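As a starting point for that kind of combinatorial analysis, here is a toy sketch. Everything in it is an assumption on my part: the `TOY` dictionary is made-up data with ARPAbet-style symbols, and "rhyme" is reduced to sharing the last `tail` phonemes, which ignores stress and is far cruder than what a poet would accept.

```python
from collections import defaultdict
from itertools import combinations


def rhyming_pairs(pronunciations, tail=2):
    """Return sorted pairs of words whose last `tail` phonemes match.

    `pronunciations` maps each word to a list of phoneme symbols.
    """
    groups = defaultdict(list)
    for word, phonemes in pronunciations.items():
        if len(phonemes) >= tail:
            groups[tuple(phonemes[-tail:])].append(word)
    pairs = []
    for words in groups.values():
        pairs.extend(combinations(sorted(words), 2))
    return sorted(pairs)


# Hypothetical toy data, not taken from any real phonetic dictionary.
TOY = {
    "cat": ["K", "AE", "T"],
    "hat": ["HH", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "dog": ["D", "AO", "G"],
    "log": ["L", "AO", "G"],
    "sea": ["S", "IY"],
}
```

Dividing `len(rhyming_pairs(d))` by the number of possible word pairs for a French and an English dictionary of similar size would give one (very rough) number to compare. It would still say nothing about whether the resulting rhymes are any good.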


So basically you make a claim that French is easier to rhyme than English, then I ask you if you have any evidence, and you retorted with a condescending answer, adding a link to a book you took 5 seconds to google and didn't read.

You didn't read it, because I just read it, and while the first sentence would make it appear to back your point, it doesn't. First, because he doesn't compare French to English but merely explains how the main accent is put on the suffix of words and how that, in turn, makes it easy to build rhymes in French.

And if you continue to the next page, he explains how French rhymes are not on the same level as their English counterparts "because most English rhymes involve the root rather than the inflectional ending or suffix [...]"

Bottom line, you are wrong, and you try to pass as knowledgeable. You are a fraud.


I'd be interested in seeing philosophical tomes analyzed. They seem to contain an outsized number of low-frequency words.


Perhaps the reason is not the necessity to rhyme, but to be popular; being prolix might be a deterrent to popularity.

Even if so, I find the conclusion unexpected, as Shakespeare not only had a wide popularity in his day, but also coined many new words into the English language.


Rhyming is part of the problem. Playwrights' work has to be remembered by the actors by heart.

Also, the ones you mentioned lived in the 16th century, when languages were not very sophisticated.


I'm not sure what you mean by "sophisticated". Languages in the 16th century were just as complex as they always have been.

The vocabularies might be bigger or smaller compared to today. Certainly no one was saying "bae" back then (for instance), but no one calls each other "thou" today, either.

EDIT: except for the older generations of Geordies, apparently.


What I mean is that in the 16th century there were far fewer words than today, or, let's say, than in the 19th century.


The findings of this study are unrelated to the title of this post.

First, the authors and books were not chosen randomly and the sample size is tiny, so the results are meaningless, although I do respect the effort that was put into this. The author is up-front and honest about this in the beginning, but unfortunately proceeds to draw unwarranted generalizations based on his study.

Second, even if a similar, but much larger, study were conducted on randomly selected authors and books, the results would still conflate the vocabulary size in common usage with the vocabulary size of single authors. Hence, what the article is attempting to study is not the same thing as the title of this post.


Sampling is a complex topic of its own, but sample size has far less to do with research validity than sample selection.

In the case of literature, random sampling is arguably far less valid than selected sampling of, say, top authors.

The interactions here are complex. There is a finite set of published authors, they're selected by editors, and works are bought by the public. So there's both nonrandom and popular selection. Works themselves typically contain many words either from the popular argot or not (e.g., Joyce).

And if you're going to sample from large sets of works, say, newspaper and magazine articles, you'll run into issues such as style guides which often dictate terms that must (or must not) be used.

It's a difficult field to research.


I think the most unwarranted generalization is yours. The author clearly did not mean to do what you criticize him for failing to do.

But I suppose we could ask the mods to change the submission title. How about this?—"One blogger's analysis of the size of the vocabularies employed by some of his favorite French- and English-language writers, neither claiming nor achieving statistically significant insights into the question of whether French- (on the one hand) or English-speaking peoples (on the other) use more words in common usage."


Or maybe:

"Do the English Use More Words Than the French? I don't know, and what I did to test it doesn't tell me anything concrete"


Or: "Once again, the intent of this blog post is clear from its opening paragraphs (some readers' abiding dissatisfaction with its title notwithstanding), aforementioned intent being not to analyze patterns of general usage of the French and English languages in a statistically rigorous way but to offer an entertaining and well presented disquisition on some really quite insignificant but nonetheless amusing points related to the vocabularies of selected French and English writers of the author's own choosing (aforementioned dissatisfied readers nevertheless succeeding in cluttering up the comments on Hacker News with their whinging)"?


Wouldn't analysis of top 3 contemporary newspapers / magazines in each language yield a more accurate result?

I.e., if Dickens or Melville used certain words in their writing and modern speakers don't know them, we can't really use them to gauge how many words English speakers use today.

I, for one, simply can't read Shakespeare: most of his vocabulary (does anybody know what a 'bodkin' is?) sounds foreign to me. And I scored in the 99th percentile on the verbal SAT.


Most of Shakespeare's language is accessible to English speakers in England, Ireland, Australia, etc. There might be a few words here or there that are not widely understood ("bodkin" is not one of these; its meaning is generally known in the UK).

However, American English, especially as it is spoken daily in the more ethnically diverse areas of the US, has a much reduced vocabulary due to the need to accommodate non-native speakers.

For an example of where Shakespeare's language becomes confusing, consider

"Romeo, Romeo, wherefore art thou Romeo."

Modern speakers of English most often interpret "wherefore" to mean "where", when it means "why".


> There might be a few words here or there that are not widely understood ("bodkin" is not one of these; its meaning is generally known in the UK)

This is wrong; most people in the UK don't know what a bodkin is. Many people would think it an item of clothing or something to do with alcohol.

About 16% of working age adults in the UK are functionally illiterate http://www.literacytrust.org.uk/adult_literacy/illiterate_ad... (have less literacy than is expected of an 11 year old child). http://www.theguardian.com/uk/2006/jan/24/books.politics

The average reading age in the UK is that of a 9 year old child. http://s-i-w.com/library/faqs/readability/what-average-readi... (although he doesn't give a source for this and it seems a bit low, even to me)


Seriously? What % of the English do you think will understand this (random start of first Shakespeare play that popped into my head):

If music be the food of love, play on;

Give me excess of it, that, surfeiting,

The appetite may sicken, and so die.

That strain again! it had a dying fall:

O, it came o'er my ear like the sweet sound,

That breathes upon a bank of violets,

Stealing and giving odour! Enough; no more:

'Tis not so sweet now as it was before.

O spirit of love! how quick and fresh art thou,

That, notwithstanding thy capacity

Receiveth as the sea, nought enters there,

Of what validity and pitch soe'er,

But falls into abatement and low price,

Even in a minute: so full of shapes is fancy

That it alone is high fantastical.


Which words in that passage are not easily understandable? The most "obscure" one that I can identify is "notwithstanding" (Google gives 48,000,000 results for "notwithstanding", so it is not exactly an obscure word).

A failure to understand that passage is a failure of comprehension, not of vocabulary.


You're talking about something else - individual words (at least in this passage) are certainly known to everybody.

But what % of English speakers do you think can understand the passage?


Understanding the English-speaking world's most famous poet and playwright might be beyond the abilities of some people, that is for sure. It doesn't mean, though, that the language used is deliberately obscure.

I would estimate that most high school graduates in the UK could correctly parse the passage posted, given enough time.

Those with education specifically in English would parse it at first pass.


To clarify: what is it that you do not understand about the posted passage?


Honestly - I don't understand anything. Imagine you had an English text with half the words substituted with Dutch ones (that look similar but make no sense).

So you might think you understand 2-3 words in a row only to come across the fourth word that seems key for the line but means nothing to you in the given context.


What makes it harder to understand is not the vocabulary, but the grammar, as well as Shakespeare's sometimes idiosyncratic use of word meaning. For example, "strain" here means a portion of music, which while not its most common meaning, is certainly within the abilities of most educated native speakers of English to understand, at least from context.


That's exactly what I mean: words that we otherwise know are used with completely different meanings, which renders the text unintelligible.

I disagree with you completely that 'most educated native speakers' can understand this. By the same token one can say that most native English speakers can solve differential equations :)

The definition of strain you mention is #30 in the dictionary - http://dictionary.reference.com/browse/strain. Can you cite its use in that meaning in a contemporary newspaper or magazine?


As to usage of "strain" in music, a quick search led me to this article from the US dated 16 Feb 2015 [1]: "When the trumpets blare and the snares rattle during the opening strain of John Phillip Souza’s The Washington Post March..."

I'm not proposing that Shakespeare is the easiest thing to read, but merely suggesting that it is not as inaccessible as it might first appear.

[1]http://www.miamiherald.com/news/local/community/miami-dade/h...


By the way, I found the "Spark Notes" for "If Music be the food of love". It's a dreadful lump when the poetry is gone, but here it is:

"If it’s true that music makes people more in love, keep playing.

Give me too much of it,

so I’ll get sick of it and stop loving.

Play that part again! It sounded sad.

Oh, it sounded like a sweet breeze blowing

gently over a bank of violets,

taking their scent with it. That’s enough. Stop.

It doesn’t sound as sweet as it did before.

Oh, love is so restless!

That, despite its ability to encompass

as much as the sea, is impenetrable,

to anything, cannot be bought,

yet becomes cheap.

Even if fleeting, though, love is so varied and rich

that it is the highest pinnacle of experience."


If anyone is interested in anecdotal data [0][1] regarding sentence length for equivalent meaning, I have to use both French and English every day, alternating between them (spoken and written) all the time. English is almost always shorter than French, which is more verbose. I am used to "thinking" in whatever language I'm using to talk/write and have noticed many times that I'm a bit slower in French, even if it's technically my native language. It might also be that French seems to have more alternatives to say the same thing in different ways, but that could just be me.

[0] C'est presque toujours plus long en français qu'en anglais. [1] It's almost always shorter in English than in French.


I think French spelling plays a part here, though. If I were to write your example phrase in French written as pronounced using Finnish language spelling rules, it would look like this:

"Se preskö tuzuur ply lon on franse kon angle."

This "phonetic compression" made the printed sentence about 25% shorter.
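The figure is easy to check by counting characters. A minimal sketch, where the helper name `relative_shortening` is my own and length is simply the number of characters including spaces and punctuation:

```python
def relative_shortening(original, respelled):
    """Fraction by which `respelled` is shorter than `original`, by character count."""
    return 1 - len(respelled) / len(original)


french = "C'est presque toujours plus long en français qu'en anglais."
finnish_style = "Se preskö tuzuur ply lon on franse kon angle."

saving = relative_shortening(french, finnish_style)
print(f"{saving:.0%} shorter")
```

On these two strings the saving comes out a little under a quarter, which matches the rough 25% above. Counting characters is only a proxy for printed length, since letter widths differ.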


I have the same experience in Portuguese (my native language). Romance languages tend to be more verbose (also anecdotal personal evidence). But, in a way, my feeling is that modern English has substantially more words than Portuguese – which, I suppose, makes sense. English is under a much greater creative pressure due to its wider audience (English as a global language).

Using your example sentence (armchair laboratory):

[0] É quase sempre mais curto em inglês do que em português. [1] It's almost always shorter in English than in Portuguese.

11 vs. 9 words.


Being a French person who always builds EN and FR multi-locale websites, I'm quite surprised by this question :) The French text is almost always longer than the English one. Not sure if it's about word count, though.

Also, something very common for French people when learning English is to try to translate their French sentence exactly. It always seems over-complicated in English, and quite pedantic. With time, we learn to stop trying to use the same words and try to express it directly in English. We realize at this point that English sentences are always much simpler. But here again, vocabulary complexity says nothing about word count.


That's a surprising result.

My previous view was that English vocabulary is based on Germanic as well as Latin roots, and so there are often two terms with the same meaning, one of which is subtly different in usage from the other. In contrast, French vocabulary is mostly Latin in origin.


Just because more words exist does not mean the authors make use of them. I think the author would agree that English has more words: he cites the dictionary sizes skewing heavily in favor of English.

With that said, I am also surprised. I would like to see a similar analysis using more modern works, as well as a discussion around how texts are selected. It seems like the selection of authors and their novels will significantly impact the results.

FWIW - Both English and French stem from the same original language (Proto-Indo-European). You are correct that English is a Germanic language (this does not mean it stems from German), and French is a Romance language, with Romance being used in the sense of "came from Rome," not "sounds sexy." If you are interested in more details here, I highly recommend "The History of English" podcast: http://historyofenglishpodcast.com/ I'm still working my way through it, but it does a great job explaining the history of the English language, how it evolved, how it relates to other languages, etc.

English contains words from many other languages. It is not just Latin words, but many more recent languages (such as French) as well.


Wow! That podcast seems awesome. Thanks a lot. This will be my bed time listening for a while.


I'm just going to assume Betteridge's law of headlines is in effect. If the answer to this was "yes", it would be phrased in a declarative rather than interrogative manner.


A model of presentation of data. Its clarity belies the variety of techniques used. Perfect use of word clouds. Thank you.


I have to strongly, strongly disagree. There is no "perfect use" of word clouds in the context of proper data visualization. The arrangement and colorization of the words are essentially the product of randomization (and the sizing of words based on frequency is of dubious value, given how difficult it is for the human eye to compare sizes)... this randomness is anathema to the proper presentation of data... the variability in interpretation is vast, and depends on each viewer's ability to see colors and read vertically.


Generally, I find word clouds to be pretty useless anyway. They visually highlight the top few words (4 or 5 at most), but beyond that are obscure and uninformative.


What about monochrome, single word-orientation word clouds? Font size is one axis, and I guess the only remaining noise is word placement.


Oui.



