I'm a native French speaker and fluent in English (and of course, as luck would have it, I'm going to stick grammar mistakes in this post)
Some have pointed out that French is almost always longer. It's a bit more complicated than that. French uses a wider vocabulary than English, and uses many different words to convey different connotations. Words, as a result, tend to be longer, because they carry more information.
English tends to be much more modular and flexible. Nouns can be made into adjectives, adverbs and verbs rather easily, and "prepositions" drastically alter the meaning of verbs.
The end result is that English can be much shorter than French when trying to be concise. A short UI message will always be much shorter in English than in French. However, when conveying nuanced ideas, I believe they will be much closer in length, with perhaps a small advantage for French.
I'm also a native French speaker, but I'm bilingual (British mother).
French definitely neither has nor uses a wider vocabulary than English, if only because English keeps growing while modern French stopped evolving a while ago. The only kind of French which does evolve is spoken French, which bears little resemblance to written French nowadays.
Now, for some reason, there's a widely propagated myth in France that English in particular is a very poor language: English is supposed to have a very limited vocabulary and no way to express subtle ideas accurately. I have no idea where this myth comes from (maybe the old French-English rivalry), but I guess it is the source of your belief.
I do not consider English a poor language at all. I find it extremely flexible and expressive, whereas French just comes with a very large standard library.
This seems an extraordinary claim, given that English is generally considered to have the largest number of words of any modern language (mostly because we just steal them).
Word-use is roughly Zipf-distributed [0] in most all modern languages, so there's not much difference in the number of commonly used words (as long as you ignore basic differences like the existence of articles and the richness of word inflection).
An analysis of English and French texts on Project Gutenberg shows that 93 words account for half of English usage, versus 89 for French [1]. English has fewer words in the middle of the distribution (696 versus 795 at 70%) and in the near tail (6,428 versus 9,050 at 90%, and 14,736 versus 21,231 at 95%), but more in the extreme tail.
So about 40 to 45% more words for French in the near tail (9,050/6,428 ≈ 1.41 and 21,231/14,736 ≈ 1.44). The very extreme tail is less interesting because it captures the size of the dictionary, and words very rarely used.
I think the exponent of the tail would be the most relevant metric, but I can't open those PDFs. Can someone plot the inverse CDF and make a log-log plot?
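For anyone who wants to redo this kind of count on their own texts, here is a minimal sketch, not the method used in the linked analysis. The function names and the tokenizer are my own assumptions: the tokenizer splits on anything that is not a letter, so "c'est" becomes two tokens, and that choice alone can shift the French numbers. It returns how many of the most frequent words are needed to reach a given share of all tokens, plus the log-log rank-frequency points you would plot to eyeball the tail.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a run of Unicode letters; apostrophes split words.
    return re.findall(r"[^\W\d_]+", text.lower())


def words_for_coverage(text, fractions=(0.5, 0.7, 0.9, 0.95)):
    """Map each fraction to the number of top-ranked words covering that share of tokens."""
    counts = sorted(Counter(tokenize(text)).values(), reverse=True)
    total = sum(counts)
    targets = sorted(fractions)
    result = {}
    running = 0
    next_target = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        # One rank can satisfy several targets at once in a small text.
        while next_target < len(targets) and running >= targets[next_target] * total:
            result[targets[next_target]] = rank
            next_target += 1
    return result


def log_log_rank_frequency(text):
    """Return (log10 rank, log10 count) pairs, most frequent word first."""
    counts = sorted(Counter(tokenize(text)).values(), reverse=True)
    return [(math.log10(rank), math.log10(count))
            for rank, count in enumerate(counts, start=1)]


if __name__ == "__main__":
    sample = "a a a a b b c d"  # toy input, not real corpus data
    print(words_for_coverage(sample, fractions=(0.5, 0.75, 1.0)))
```

On a real corpus you would feed in the full text of each book and compare the numbers at the same fractions across the two languages; the slope of the log-log points in the tail is a rough stand-in for the exponent mentioned above.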
I believe that's the difference between "uses a wider vocabulary" and "has a wider vocabulary".
I would think that literary French generally uses a wider vocabulary than English though, at least because repeating words in French is to be avoided at all costs, while it's more acceptable in English.
After a point, having more words in a dictionary does not necessarily result in the average person using more words in their communication. I would think that education/environment would have a bigger effect.
That said, I would be interested if the grandparent could provide more info supporting their statement, as I don't know the truth one way or the other and would like to know more.
This is mostly based on my own observations, I don't have a very specific study on the matter.
Here's a data point though. Many French students learn in English class, as an anecdote, that "shallow" has no equivalent in French. If there were many other words like this, there would be little point in relating this anecdote. "Shallow" is one of the very few words in English with no French equivalent. French words with no English equivalent? I encounter them all the time.
There are different words for the mouth and leg of an animal versus a human (gueule and patte vs bouche and jambe). There's a common word for the lumps of flour that form in dough (grumeau). There's a word for a sandwich with a slice of bread on only one side (tartine). There's a single word for brown sugar (cassonade). There's a different word for hair on your head and hair on your body (cheveux, poils), etc.
English words with no French equivalent? Shallow and Serendipity come to mind. I'm googling but I can't find many others at the moment, apart from very technical terms.
English is a bit more subtle. It has inherited a lot of French vocabulary from the Normans, as well as retaining Anglo-Saxon roots. So it often has more words for things other languages have fewer words for, with English having more shades of meaning.
Take happy for example: joyous, joyful, gay, merry, gleeful, etc. I counted some 50 words in the thesaurus under happy, but many of them only translate to joyeux or gai according to my dictionary checks (although there are also plenty for French, just not as many).
English tends to have a lot of words that are close synonyms but with different evocations. It's possible to converse in English using language almost completely of French origin, or to speak plainly in the common tongue with older words. The former sounds flowery; the latter tends to remind one of the Bible.
Many English dictionaries have more than 300k entries. The reason is that many of the words are close synonyms inherited from different sources. That's what I was trying to get at in my carefully constructed sentence.
IMO the synonyms you link to aren't as close as those at http://www.thesaurus.com/browse/happy - I believe there are further English translations of many of those on your page, but they take things in a different direction. Many of them relate to luck, for example.
I experimented with synonymo.fr a little further. It looks like many distinct English words have a single French word translation, and that French word has different meanings in different contexts. Consequently, looking up the synonyms for that word, you end up seeing synonyms for all the different contextual meanings for that word.
For example, take the word 'sky'. A quick scan seems to suggest French doesn't have distinctly separate words for sky vs heaven, translating both as ciel. Synonymo.fr suggests lots of synonyms, but they are mostly synonyms for heaven, not sky. It's like this with most French words I experimented with; same word, lots of different meanings, where English has separate words without the same degree of "meaning spread". That is, the English words are more precise and less ambiguous. And there are usually more of them, without wandering into different concepts.
> When conveying nuanced ideas, I believe they will be much closer in length, with perhaps a small advantage for French.
I've looked in bookstores at expert translations of French novels to English, and English novels to French, and in either case, the English is shorter. This seems contrary to your claim.
I notice that four of the seven authors with the smallest vocabulary are writing theater (Moliere, Shakespeare, Racine, Corneille). I think it's not a coincidence, because writing in rhymes must be a hard constraint in the choice of words one can use.
A keen observation, and Wilde really makes it four and a half.
But Shakespeare wrote mostly blank verse, and rhyming is not as much of a constraint on vocabulary in French due to the high commonality of word endings. If it's true that works for the stage tend to use fewer words, it might be because dialogue dominates; I imagine narrative, which consists more of description, naturally draws on a bigger vocabulary than even very stylized dialogue.
For a discussion of the ease of rhyming in French versus English, see, e.g., this book [0]. Rhymes are widely considered to be easier to form in French; partly as a result, however, classical French poetry developed several additional stylistic constraints related to rhyming, such as the distinction between "feminine" and "masculine" rhymes of words with and without a final devoiced "e".
I wonder what you would accept as "empirical evidence" of the relative ease of rhyming in different languages. The demand strikes me as typifying one of the obnoxious impulses of the Hacker News community (another of them on display elsewhere in this thread) to dress up a vapid criticism as scientific skepticism.
You could, I suppose, try to do a combinatorial analysis of the ease of rhyming by writing a program to find rhyming pairs in a phonetic dictionary; just make sure your blog post about the results properly acknowledges the shortcomings of your analysis in its title.
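A rough sketch of what such a program might look like, under heavy assumptions: the dictionary below is a handful of hand-typed entries in CMUdict-style ARPAbet (a trailing 1 marks primary stress), and the rhyme rule used here (same phones from the last primary-stressed vowel to the end of the word) is only one possible definition of an English rhyme. The names `TOY_PRONUNCIATIONS`, `rhyme_key` and `rhyming_pairs` are made up for illustration, and a French dictionary would need its own rule.

```python
from collections import defaultdict
from itertools import combinations

# Hand-typed toy entries in CMUdict-style notation; illustrative only, not a real dataset.
TOY_PRONUNCIATIONS = {
    "cat": ["K", "AE1", "T"],
    "hat": ["HH", "AE1", "T"],
    "moon": ["M", "UW1", "N"],
    "june": ["JH", "UW1", "N"],
    "table": ["T", "EY1", "B", "AH0", "L"],
    "cable": ["K", "EY1", "B", "AH0", "L"],
    "dog": ["D", "AO1", "G"],
}


def rhyme_key(phones):
    """Phones from the last primary-stressed vowel to the end of the word."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return tuple(phones[i:])
    return tuple(phones)  # no primary stress marked: fall back to the whole word


def rhyming_pairs(pronunciations):
    """Group words by rhyme key and list every unordered pair within a group."""
    groups = defaultdict(list)
    for word, phones in pronunciations.items():
        groups[rhyme_key(phones)].append(word)
    pairs = []
    for words in groups.values():
        pairs.extend(combinations(sorted(words), 2))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in rhyming_pairs(TOY_PRONUNCIATIONS):
        print(pair)
```

Counting pairs per dictionary entry in each language would give one crude number for "ease of rhyming", with all the shortcomings mentioned above: it ignores word frequency, inflected forms, and whether a rhyme on a shared suffix should count at all.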
So basically you claim that French is easier to rhyme than English, then I ask whether you have any evidence, and you retort with a condescending answer, adding a link to a book you took 5 seconds to google and didn't read.
You didn't read it, because I just did, and while the first sentence appears to back your point, the book doesn't. First, he doesn't compare French to English; he merely explains how the main accent falls on the suffix of words and how that, in turn, makes it easy to build rhymes in French.
And if you continue to the next page, he explains that French rhymes are not on the same level as their English counterparts "because most English rhymes involve the root rather than the inflectional ending or suffix [...]".
Bottom line, you are wrong, and you try to pass as knowledgeable. You are a fraud.
Perhaps the reason is not the necessity to rhyme, but to be popular; being prolix might be a deterrent to popularity.
Even if so, I find the conclusion unexpected, as Shakespeare not only had a wide popularity in his day, but also coined many new words into the English language.
I'm not sure what you mean by "sophisticated". Languages in the 16th century were just as complex as they always have been.
The vocabularies might be bigger or smaller compared to today. Certainly no one was saying "bae" back then (for instance), but no one calls each other "thou" today, either.
EDIT: except for the older generations of Geordies, apparently.
The findings of this study are unrelated to the title of this post.
First, the authors and books were not chosen randomly and the sample size is tiny, so the results are meaningless, although I do respect the effort that was put into this. The author is up-front and honest about this in the beginning, but unfortunately proceeds to draw unwarranted generalizations based on his study.
Second, even if a similar, but much larger, study were conducted on randomly selected authors and books, the results would still conflate the vocabulary size in common usage with the vocabulary size of single authors. Hence, what the article is attempting to study is not the same thing as the title of this post.
Sampling is a complex topic of its own, but sample size has far less to do with research validity than sample selection.
In the case of literature, random sampling is arguably far less valid than selected sampling of, say, top authors.
The interactions here are complex. There is a finite set of published authors, they're selected by editors, and their works are bought by the public. So there's both nonrandom and popular selection. Works themselves typically contain many words either from the popular argot or not (e.g., Joyce).
And if you're going to sample from large sets of works, say, newspaper and magazine articles, you'll run into issues such as style guides which often dictate terms that must (or must not) be used.
I think the most unwarranted generalization is yours. The author clearly did not mean to do what you criticize him for failing to do.
But I suppose we could ask the mods to change the submission title. How about this?—"One blogger's analysis of the size of the vocabularies employed by some of his favorite French- and English-language writers, neither claiming nor achieving statistically significant insights into the question of whether French- (on the one hand) or English-speaking peoples (on the other) use more words in common usage."
Or: "Once again, the intent of this blog post is clear from its opening paragraphs (some readers' abiding dissatisfaction with its title notwithstanding), aforementioned intent being not to analyze patterns of general usage of the French and English languages in a statistically rigorous way but to offer an entertaining and well presented disquisition on some really quite insignificant but nonetheless amusing points related to the vocabularies of selected French and English writers of the author's own choosing (aforementioned dissatisfied readers nevertheless succeeding in cluttering up the comments on Hacker News with their whinging)"?
Wouldn't analysis of top 3 contemporary newspapers / magazines in each language yield a more accurate result?
I.e., if Dickens or Melville used certain words in their writing but modern speakers don't know them, we can't really use them to gauge how many words English speakers use today.
I, for one, simply can't read Shakespeare - most of his vocabulary (does anybody know what a 'bodkin' is?) sounds foreign to me. And I scored in the 99th percentile on the verbal SAT.
Most of Shakespeare's language is accessible to English speakers in England, Ireland, Australia, etc. There might be a few words here or there that are not widely understood ("bodkin" is not one of these; its meaning is generally known in the UK).
However, American English, especially as it is spoken daily in the more ethnically diverse areas of the US, has a much reduced vocabulary due to the need to accommodate non-native speakers.
For an example of where Shakespeare's language becomes confusing, consider
"Romeo, Romeo, wherefore art thou Romeo."
Modern speakers of English most often interpret "wherefore" to mean "where", when it means "why".
Which words in that passage are not easily understandable? The most "obscure" one that I can identify is "notwithstanding" (Google gives 48,000,000 results for "notwithstanding", so it is not exactly an obscure word).
A failure to understand that passage is a failure of comprehension, not of vocabulary.
Understanding the English-speaking world's most famous poet and playwright might be beyond the abilities of some people, that is for sure. It doesn't mean, though, that the language used is deliberately obscure.
I would estimate that most high school graduates in the UK could correctly parse the passage posted, given enough time.
Those with education specifically in English would parse it at first pass.
Honestly - I don't understand anything. Imagine you had an English text with half the words substituted with Dutch ones (that look similar but make no sense).
So you might think you understand 2-3 words in a row only to come across the fourth word that seems key for the line but means nothing to you in the given context.
What makes it harder to understand is not the vocabulary, but the grammar, as well as Shakespeare's sometimes idiosyncratic use of word meaning. For example, "strain" here means a portion of music, which while not its most common meaning, is certainly within the abilities of most educated native speakers of English to understand, at least from context.
That's exactly what I mean - words that we otherwise know are used with completely different meaning, which renders the text unintelligible.
I disagree with you completely that 'most educated native speakers' can understand this. By the same token one can say that most native English speakers can solve differential equations :)
The definition of strain you mention is #30 in the dictionary - http://dictionary.reference.com/browse/strain. Can you cite its use in that meaning in a contemporary newspaper or magazine?
As to usage of "strain" in music, a quick search led me to this article from the US dated 16 Feb 2015 [1] "When the trumpets blare and the snares rattle during the opening strain of John Phillip Souza’s The Washington Post March..."
I am not proposing that Shakespeare is the easiest thing to read, but merely suggesting that it is not as inaccessible as it might first appear.
If anyone is interested in anecdotal data [0][1] regarding sentence length for equivalent meaning, I have to use both French and English every day, alternating between them (spoken and written) all the time. English is almost always shorter than French, which is more verbose. I am used to "thinking" in whatever language I'm using to talk/write and have noticed many times that I'm a bit slower in French, even if it's technically my native language. It might also be that French seems to have more alternatives to say the same thing in different ways, but that could just be me.
[0] C'est presque toujours plus long en français qu'en anglais.
[1] It's almost always shorter in English than in French.
I think French spelling plays a part here, though. If I were to write your example phrase in French written as pronounced using Finnish language spelling rules, it would look like this:
"Se preskö tuzuur ply lon on franse kon angle."
This "phonetic compression" made the printed sentence about 25% shorter.
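The figure is easy to check by counting characters in the two spellings quoted above. A minimal sketch, with a made-up helper name (`relative_shortening`); it counts every character including spaces and punctuation, and normalizes to NFC first so that "ç" and "ö" each count as one character regardless of how they were typed.

```python
import unicodedata


def relative_shortening(original, respelled):
    """Fraction by which `respelled` is shorter than `original`, in characters."""
    original_len = len(unicodedata.normalize("NFC", original))
    respelled_len = len(unicodedata.normalize("NFC", respelled))
    return 1 - respelled_len / original_len


french = "C'est presque toujours plus long en français qu'en anglais."
finnish_style = "Se preskö tuzuur ply lon on franse kon angle."

if __name__ == "__main__":
    print(f"{relative_shortening(french, finnish_style):.1%}")
```

The result lands close to the quoted 25%, which suggests that a good part of the length gap in print comes from French orthography rather than from the number of words or sounds.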
I have the same experience in Portuguese (my native language). Romance languages tend to be more verbose (also anecdotal personal evidence). But, in a way, my feeling is that modern English has substantially more words than Portuguese – which, I suppose, makes sense. English is under a much greater creative pressure due to its wider audience (English as a global language).
Using your example sentence (armchair laboratory):
[0] É quase sempre mais curto em inglês do que em português.
[1] It's almost always shorter in English than in Portuguese.
Being a French person who always builds EN and FR multi-locale websites, I'm quite surprised by this question :) The French text is almost always longer than the English one. Not sure if it's about word count, though.
Also, something very common for French people learning English is to try to translate their French sentences exactly. The result always seems over-complicated in English, and quite pedantic. With time, we learn to stop trying to use the same words and to express the idea directly in English. We realize at this point that English sentences are always much simpler. But here again, vocabulary complexity says nothing about word count.
My previous view was that English vocabulary is based on Germanic as well as Latin roots, and so there are often two terms with the same meaning, one subtly different in usage from the other. In contrast, French vocabulary is mostly Latin in origin.
Just because more words exist does not mean the authors make use of these words. I think the author would agree that English has more words - he cites the dictionary sizes skewing heavily in favor of English.
With that said, I am also surprised. I would like to see a similar analysis using more modern works, as well as a discussion around how texts are selected. It seems like the selection of authors and their novels will significantly impact the results.
FWIW - Both English and French stem from the same original language (Proto-Indo-European). You are correct that English is a Germanic language (this does not mean it stems from German), and that French is a Romance language - with Romance being used in the sense of "came from Rome," not "sounds sexy." If you are interested in more details here, I highly recommend "The History of English" podcast: http://historyofenglishpodcast.com/ I'm still working my way through it, but it does a great job explaining the history of the English language, how it evolved, how it relates to other languages, etc.
English contains words from many other languages. It is not just Latin words, but many more recent languages (such as French) as well.
I'm just going to assume Betteridge's law of headlines is in effect. If the answer to this was "yes", it would be phrased in a declarative rather than interrogative manner.
I have to strongly, strongly disagree. There is no "perfect use" of word clouds in the context of proper data visualization. The arrangement and colorization of the words are determined almost entirely by randomization (and the sizing of words based on frequency is of dubious value, given how difficult it is for the human eye to compare sizes)...this randomness is anathema to the proper presentation of data... the variability in interpretation is vast, and depends on each viewer's ability to see colors and read vertically.
Generally, I find word clouds to be pretty useless anyway. They visually highlight the top few words (4 or 5 at most), but beyond that are obscure and uninformative.