Estimate your English vocabulary size (testyourvocab.com)
318 points by mike_esspe on July 17, 2011 | 311 comments

I'd be really interested in the percentiles of the non-native speakers. With an alarmingly low 10,700 words, there is not even a percentile for me... And I know that my fluency in English is at least above the median around here (edit: here = where I live).

This also shows how extremely time-consuming it is to learn a second language. I started in school at 10 years old, am moderately well educated (some college, dropped out), and use English on a daily basis. I also watch most movies in English (very unusual for people in a German-speaking country), read some English novels, and most of the non-fiction books I have are in English too. My Internet use is nearly English only.

Still, I probably have the vocabulary of an average 12-year-old native speaker. After 17 years of learning and using the language, and at least 10 years of that using it _daily_.


I got a score of 26,000. I'm Finnish, and actually learned French as a second language in school and English as my third.

I've never lived in an English-speaking country, but English is so prevalent in Finland these days that I wouldn't be surprised if it gained some kind of official status within the next 50 years. Whether in formal meetings or informal bar encounters, people voluntarily switch to English if there's even one non-Finn present. In my field, this happens nearly every day.

I even use English to communicate with Swedes, even though Swedish is the second official language of Finland and I studied it for 6 years... There's no point in limping through the conversation with my childish Swedish, when it's 99% guaranteed that Swedes speak English.

Holy wowsers. I grew up in England, and I only scored 14,800 on the test! Although English isn't my native language, I did speak it from a very early age.

I was completely honest and scored 16,000; English is my one and only language. Depending on your field/background you could understand more words than this test suggests. Looking up the words that I did not understand, and weighing their relevance against the sheer volume of test words (or lack thereof), makes this test inaccurate in my mind.

Another Finn here... I got 25,200. I guess I need to read a couple books. :P

My Swedish is atrocious. I can't recall a single time that I'd have had use for it outside school. In contrast, English is useful every day, though mostly in written form.

And a third Finn here, but my score was a mere 13,000. At least I have the excuse of still being a student :)

Last year I met and associated with a Finn while traveling in France. He spoke extraordinarily good English and could have passed for a native of the USA with very little accent correction (it was more the timing of his speech that gave him away). I wouldn't have thought him to be exceptional until I discovered that I was the first native English speaker he had ever met and he had never even left his country before then.

I wish I had been given language instruction of that caliber when I was in school.

I don't know if you can blame it completely on school. They also watch their films subbed, not dubbed.

This vocabulary test got me thinking about the last unknown English word I'd encountered, and that inspired a little post on my blog. Have a look if you're interested -- there's a bit of a coding angle, even:


And I am a native English speaker and you got a higher score than me. I never prided myself on my language skills but damn, that's a shot right in the ego!

Some studies I have seen showed that the average newspaper has a 6000-8000 word vocabulary.

Add another 2000 to cover technical/domain-specific/slang words, and you should have no reason not to be able to communicate freely on a day-to-day basis.

I got 26K (English is my 3rd language) and most of the 'strange' words are from reading too much science fiction as a kid.

+1 on showing percentiles of non-native speakers, though.

The Oxford Advanced Learner's dictionary has a 'defining vocabulary' of about 3500 words. They claim most definitions are limited to the use of those words and proper names. So, that 6000-8000 looks believable.

Even writers of high-brow literature use a much smaller vocabulary than the one they can read. Probably more than the newspaper, but not likely 35,000 words.

For non-native speakers, I would guess that there are quickly diminishing returns past about 10,000 words. Much more important is how well you're able to use those first 10,000 (or maybe even 5,000). It's one thing to recognize a word when reading it, and another to be able to use it in a natural way, choosing words with the right shade of meaning, and avoiding awkward constructions or unintended connotations.

Googling says that Shakespeare used 31534 words, of which 14376 appear only once, 4343 twice: http://biomet.oxfordjournals.org/content/63/3/435.abstract

Not a modern corpus, but reasonable.

Strong's Exhaustive Concordance of the Bible (that would be the King James version) lists 8674 Hebrew root words for the Old Testament, and 5624 Greek root words for the New Testament.

Wikipedia suggests that full literacy in Chinese requires knowledge of about 4000 characters.

Note that in Chinese, character is not the same as word.

Most modern Mandarin words are bisyllabic, and represented by the combination of two characters, and their meaning does not necessarily derive in a straightforward way from the meanings of their constituent characters (for example, the word for "thing" is composed of the characters for "East" and "West" in conjunction)

Of those unique words, how many did Shakespeare invent?

Making up new words is an easy way to boost vocabulary.

Definitely. I would have liked the test to be "would you be comfortable using this word in a sentence? (if you had to)" Even as a native speaker there are times when I know the definition of a word, but opt for a simpler word because I don't know all of the implications of using it.

And even if you were, you have to make sure that your listeners understand you, too.

Right, unless you want to ruin your point entirely by stopping to define words. Communication relies on shared vocabularies, not just one person's vocabulary.

There are plenty of words I know that I generally avoid in conversation for practical reasons. Entertainment reasons, though... I caught my wife teaching our not-yet-two-years-old daughter to use the word "etiolated" when referring to a green bean that was wilted and yellowish.

I am Dutch, and my estimated vocabulary is 15,600 words. It's really humbling, but on the other hand it's surprising that one can participate in scientific discussion with such a small vocabulary. But I bet the distribution is not very similar to that of an American kid with a vocabulary of that size ;).

It's easy to notice the difference in vocabulary in practice. Native speakers' speech is much more varied.

Also, it's maybe good for native speakers to realize that someone who may seem a bit rude or dumb on the Internet may be a non-native speaker who has difficulty expressing him/herself.

This is actually not surprising, since word frequencies more or less follow a power law probability distribution. With a vocabulary of even 1,000 words you are very well covered. Anyone who engages in a discussion with words that are not very common is doing the very opposite of trying to communicate.
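To put a rough number on the power-law point, here is a minimal sketch. It assumes an idealized Zipf law (the frequency of the word at rank r is proportional to 1/r) over a hypothetical lexicon of 30,000 distinct words; both the exponent and the lexicon size are my assumptions, not measurements, and `zipf_coverage` is a name made up for this example.

```shell
# zipf_coverage K N: share of running text covered by the K most frequent
# words, assuming an idealized Zipf law (frequency of rank r proportional
# to 1/r) over a lexicon of N distinct words. The exponent of 1 and the
# lexicon size are illustrative assumptions, not measured values.
zipf_coverage() {
    awk -v k="$1" -v n="$2" 'BEGIN {
        top = 0; total = 0
        for (r = 1; r <= n; r++) {
            w = 1 / r                # unnormalized Zipf weight of rank r
            total += w
            if (r <= k) top += w     # weight held by the K most common words
        }
        printf "%.3f\n", top / total
    }'
}

zipf_coverage 1000 30000     # coverage of the 1,000 most common words
zipf_coverage 10000 30000    # coverage of the 10,000 most common words
```

Under these assumptions the first 1,000 words cover roughly two thirds of running text, and each further thousand adds less than the one before. Real corpora deviate from the idealized curve, so treat the output as an illustration of diminishing returns rather than as an estimate for English.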

I choose words on the basis of how precisely they convey my desired meaning - which may exist at multiple levels, e.g. using esoteric words when talking about esoteric words. It seems to me somewhat anti-elitist of you to decry those who enjoy taking such care.

(I scored 35,300; of the words I recognized only about six were ones I don't either read or use with some regularity. I read a fair amount of literature though.)

The vocabulary you need for scientific discussions can't be measured in these kinds of tests, because it is really very specific. So your English vocabulary may still be a bit bigger.

The words that are measured here are the words that you'd use in "general" conversation, I guess.

> surprising that one can participate in scientific discussion with such a small vocabulary

Not really surprising, since most English scientific vocabulary comes from Greek or Latin roots and is more or less the same in other European languages.

Don't worry, this article (http://howlearnspanish.com/2010/08/how-many-words-do-you-nee...) says that for most people, 3,000 words will get you to 94% of oral communication and 10,000 words constitute the active vocabulary of native speakers with higher education.

The difference between 10,000 and 30,000 is knowing the words "uxoricide" and "tricorn". Before this test, I'd never seen the first one, and the only time I've seen the second one is in reference to pirate hats.

Frankly, I'd rather have a 10,000-word vocabulary in multiple languages than my current situation.


"To change your software so the user experience becomes overwhelmingly worse. e.g. Some critics say Microsoft committed uxorcide with the Office Ribbon"

That's for Spanish, though, which generally uses a somewhat smaller day-to-day vocabulary since it doesn't have the same weird Romance/Germanic thing that English does.

Most languages end up having about the same information bandwidth, but make different tradeoffs between the number of syllables per second and the entropy per syllable. Spanish and Japanese, for instance, tend to be spoken rapidly and have a more regular set of sounds, but languages like English or Chinese tend to be spoken more slowly and have more different sounds in each syllable. I'd imagine that languages spoken at a lower syllable rate tend to use larger vocabularies, but that's just a guess.

"tricorn" is an everyday word in Spanish. The reason behind is we have a kind of police force that still wear "tricornios". And in my opinion they are still ridiculous.

Well, in many cases it helps to know multiple languages. Case in point: uxoricide. There were quite a few other words I knew from other languages in the quiz (English is not my mother tongue).

That, and the number of words in the quiz that I know from RPGs, really surprised me. How often do you use a bludgeon? Wouldn't you use a bat instead?

Finally, there's a huge difference between active vocabulary --the words you actually use in speech or writing-- and the passive vocabulary that is tested here. In my humble opinion, passive vocabulary should be tested in the context of a sentence.

I always thought of "bludgeon" as a verb. First I've heard it being used as a noun (and now my vocab has expanded!).

Talkative people will bludgeon you with their thoughts. It's useful when you've already used other words like "harass" and "bore" and need a synonym.

I knew “uxoricide”, but only from reading about Battlestar Galactica online.

My background is roughly similar to yours (German, 9 1/2 years of English in school, been living in the US for 10 months) and I got 10,900 words.

Especially seeing the second and third page surprised me as I can't recall ever hearing most of these words. My girlfriend noted that many of them are older terms, but still I expected to have picked up some through movies or books.

On the other hand: studying and working in the US with just 11k words works quite well.

I'm a French native and began English when I was 13 (German is my first "school language"), although I had used the basics of English before (to play some games, use DOS). The test estimates my vocabulary size to be 21,100. There is _no way_ that I know that much words. I think it's biased a lot for foreigners, particularly French ones (French gives you a lot of formal English words "for free"). And even more so when you have taken 3 years of Latin and 1 of Greek.

> that much words

that many

I only got an estimate of 11,500 words, yet I can still correct you :). My native language is (Swiss-)German; I spent a couple of years in Montreal, where you don't really have many chances to improve your English (at least not the vocabulary), and otherwise I've only used the language in written form.

I'm in a similar situation: 9,780 words.

I knew the general meaning of many words I didn't check, but I couldn't exactly define them. I know that I will understand the meaning from the context when I see them in some sentence.

You hit the nail on the head. There is a huge difference between knowing the general meaning of a word and being able to define it exactly. Simple examples: is a cow a deer? Why (not)? Is a banana a berry? Is a blueberry a berry? A strawberry?

I used the former because I thought it was what people would do (as it is easier; people will not want to spend 30 or more seconds on each word), and got 26600. That is below average, but close to it. I am not a native speaker, but would have expected to score higher in the general population.

I am fairly sure that this test does not reach the general population, though (by the way, it is nice to see that the test adapts to one's level. Try answering almost none of the words on page 1. I got 'my' score down to 28 by confessing to know only one of the words)

The claim made on the site that the median score of the whole population is 27k words is not substantiated anywhere else. E.g. http://iteslj.org/Articles/Cervatiuc-VocabularyAcquisition.h..., which references 'proper' studies (but I haven't tracked the references), says

"Based on previous research, Nation and Waring (1997) estimate that the receptive vocabulary size of a university-educated native English speaker is around 20,000 base words, while Goulden, Nation, and Read's (1990) intervention indicates that the receptive vocabulary size range of college-educated native English speakers is 13,200 - 20,700 base words (Goulden, Nation, & Read, 1990), with an average of 17,200 base words."

I'm a non-native speaker and got 18 700 words, which I found rather disappointing when I read that that was, according to the site, roughly equivalent to (or even below) the average 15-year-old native speaker. Thinking a bit about it, however, I'm quite sure that that is nonsense - I am regularly asked by native speakers to review their English texts and scholarly articles and am relatively often complimented on my broad vocabulary. When people review mine and I push them on why they suggested certain grammatical or stylistic changes, almost invariably it turns out that they are influenced by personal style preferences or local customs (as in, local to the area they grew up in). Now I'm not God's gift to the world and I'm sure that there is much to be improved in my English, but still I'm quite sure I can outperform the 5th percentile of the general population on English vocabulary knowledge. (I mean - that's people with an IQ of 80 or less ffs, again not to say that I'm a genius but I find the proposition that I would score worse than most native speakers that qualify under many definitions as mentally retarded to be preposterous).

I'm highly skeptical of the site's claim that the 50th percentile knows 27k words. It seems to be from their own test takers, and I didn't find any references to other researchers' results, of which there are several.

Speaking of animals, how are names of plants and animals counted? Is that part of a vocabulary, or is that special?

I'm also not a native English speaker, and one area where I know there's a huge difference between my Swedish vocabulary and my English is when it comes to names of plants and birds and spices and animals and trees and fish and rocks and flowers and vegetables and fruits. I know maybe thousands of names of such things in Swedish, but in English I know far fewer. That's a few thousand words I lack and probably will never learn because it's so specialized.

(Btw, bananas are berries, but cows are not deers. :) )

> (Btw, bananas are berries, but cows are not deers. :) )

Not sure if you're joking or not, but to be clear to non-native speakers: bananas are definitely not berries. Berries are smaller and rounder. For example: blueberries, raspberries, blackberries. I guess strawberries, too, but they're outliers.

Also the plural of deer is deer.

Bananas ARE berries, so are watermelons and tomatoes. But strawberries and raspberries are not berries.

Berries have seeds on the inside; a strawberry's seeds are on the outside, and raspberries are little clusters that are called something else. It's a famous quiz question in the UK :)

It depends, of course, if you're talking as a botanist or as a cook. It's one of those words with multiple levels of truth.

Same with tomato - it's a fruit botanically, but in culinary terms it's a vegetable and definitely not a fruit. As long as you get your context right, you won't have problems communicating :)

Wikipedia suggests bananas are berries, at least under some definitions of 'berry':


Actually, you can refer to a female deer as a "cow" (though "doe" is more common).

Site creator here, again -- my research (small sample size) having Brazilians take the test is that a couple years of learning will get you around 1,500-3,000 words, several years around 4,000-6,000, and a good student with, say, 8 years of classes might get around 10,000.

Beyond this, it's pretty much necessary to live abroad for an extended period of time, or be exceptionally good (driven) at languages and watch TONS of TV, use tons of online chat, etc.

English is my second language and I got ~13,500 words. I was surprised that this was only half the average. I read a lot of scientific papers in my field and almost never come across a word I don't know. Writing technical papers well also doesn't require much vocabulary (it's more about clarity of expression). I believe that to get beyond that number you simply have to read lots of books. I notice lots of words I've never heard of when reading, say, Bill Bryson, but I very rarely bother to look them up because they usually make enough sense from the context.

> probably have the vocabulary of an average 12 year old native speaker

I very clearly remember my first time in Italy, when I saw a family at a restaurant and realized their 5- or 6-year-old spoke much better Italian than I did. Kind of humbling.

But the average 3-year-old Chinese speaks infinitely better Chinese than I do (I can't even pronounce 'ni hao' correctly I'm sure, despite my colleagues' kind praise of how I say it very understandably); I don't see how that's humbling. Language knowledge is imo rather useless; language is a very inefficient idea transfer mechanism, and the amount of energy we collectively waste on translations and learning languages is staggering. I see languages as an unfortunate side-effect of humans' poor natural communication traits, and I hope that we can do away with most of them asap.

I don't think I understand. How would we communicate without language?

That would be rather hard, even if we had neural interfaces - I hope to do away with most languages so that we have only a few (ideally one) left, and hopefully those wouldn't be too dissimilar either.

I'll admit though that that point was rather tangential to the post I was replying to.

Just so that anyone lower feels good: I only got 8500. I probably need to read about non-tech topics, and I've never lived in an English-speaking country.

Be happy, my friend: I scored a shameful 5700. I see it is very low compared to many others reporting here; still, my English is more than ok when compared to most of those around me.

I believe language knowledge is very influenced by the surroundings, the place where you live, the people you deal with.

Aware of the weakness, a few days ago I posted a call for suggestions on Google+; now the vocabulary test has moved me to ask the same of HN: http://news.ycombinator.com/item?id=2772950

PS: I'm from Bari, Italy.

25,500 ~ Dutch.

I'd hoped to score at least median, disappointing. My exposure to English is pretty much limited to TV, 19th century literature, day-to-day conversations, and tech articles.

I'm a native speaker. I read a lot. Fiction, nonfiction, literature, etc. I've got a master's degree in CS. I'm a published author at multiple respected conferences. I've been learning Spanish as a second language for ~2 years.

I scored about 33,000. I don't think the distributions are very accurate. I can't speak to where you should be, but I think you have an excellent score. I hope this point of comparison helps.

I also got 33K, ~80th percentile. But, I got 98, 99th percentile on the Verbal section of my GREs (which has a lot of vocabulary, though it isn't only vocabulary). I find it hard to believe there's that big of a gap, though I suppose a lot of foreign applicants could drag down the GRE average, pushing me up. But, it's also possible that people were liberal with their claims to understand words.

The biggest difference between natives and non-natives is the amount of "passive vocabulary". Sure, we non-native English speakers can express ourselves well enough, we can participate in discussions, even on a high scientific level, as long as we're familiar with the problem domain. But because we've (I'm generalising, of course) read fewer English books, been exposed to the lingo for a lot less than natives, we're limited in our passive vocabulary. Having been exposed to a language for every waking hour of your lifetime doesn't just enable you to speak it better, it also teaches thousands of words you never use, but nevertheless know the meaning of.

On a side note: this makes me feel better about my 25k score ;-)

For me it's the opposite. My passive vocabulary is rather big, but I can't think of all those words when I am speaking.

11,900, Norwegian. Worked in a multinational company where I spoke/wrote English daily for several years, and read books in English regularly. I guess this explains why I have to use the dictionary in the Kindle a lot, and probably should use it more.

I think I have assimilated enough US culture/vocab during a 10 yr stay. Of course studying English since the first grade probably helped a lot too. Non-Native (India) but scored 35,800 :)

I'm German - ignoring words I know but couldn't immediately place properly (ragamuffin for example) I got 19000.

My English comes mostly from watching movies and TV series long after I graduated, reading novels, articles, handbooks and later by chatting in English.

Due to movies and TV series, I adopted a US-American flavor of English.

Also keep in mind that this test didn't ask for domain-specific vocabulary - if they had included a section of computer-related vocabulary, we probably would have all scored perfectly. :)

24yo German with 19k here also. I'm not so sure that vocabulary size beyond a certain level is really that significant.

I got the feeling that I could improve my score the most by thoroughly reading an 18th-century novel.

My English is good. I can write emails, communicate with clients, write articles and blog posts. However, I noticed that I write only about IT-related things. If you learned English for a purpose, then you are going to have a rather smaller vocabulary, but it'll be dedicated to the purpose you learned it for.

I got 5,490 words. Maybe my vocabulary is larger, but it's not broad: it targets specific fields.

While the exercise is interesting in that it is humbling for non-native speakers, it does seem dubious. I repeated the test 3 times, and got widely different results (10400, then 16700, then 18300). Since they are all quite outside the reported quantiles for native speakers, high variance may be expected for non-native speakers, but that means the numbers are not very useful as is.

I can't confirm your observation: my first attempt got me 11,500 words, the second 11,800. Of course I didn't check the words that I learned from the first run (I used a dictionary to verify whether my understanding was correct when in doubt, and to find out what some of the words mean that I didn't know).

I got 30800, and I am a native Dutch speaker. Though I have to say that I consider English my co-native language. Most of the media I consume is in English, and in my work environment (IT) English is the default language. But most of the words in this test I knew from reading books, as it is one of the few places where using a very wide vocabulary is not frowned upon.

Same here. Also a native Dutch speaker, but by far most of my daily communication happens in English. I scored 25600 on the test.

My use of English is limited to daily-use for the most part. Apart from IT terminology, I rarely encounter field-specific words. Nor do I read literature.

Spanish here. Got an estimated vocabulary size of 17,700 words. I lived and worked in the U.K. for a couple of years. I think I might be in a better position than some of you when it comes to words derived from Latin, as they are almost always the same in Spanish. Leaving those out I am sure I would get similar results.

French here, and I concur. Quite a few terms are directly borrowed from French or have very clear Latin roots, making them easy to understand even without having actually encountered them in English.

5,160 words here. I was expecting a worse result for a man who never really learned English in school.

Similar experience, 17,400 words.

German, speaking English on a daily basis, had 8 years of English in school and 9 years of Latin, which probably helped me a bit. (Though I hope my teachers don't read this)

Does anyone know a similar test for grammatical proficiency?

> I'd be really interested in the percentiles of the non-native speakers.

I'm Russian by origin and have been living in Australia for the last 10 years. I think I read (and understand) a lot in English, but I still only got 13,000.

I'm a native Spanish speaker, and I got 26,800. But then again more than half of the culture I consume is in English, and I spend an awful lot of time in forums and such.

German, 26 years old, 8 years of English in school. Working for a US company, lived in the US for 3 months, American GF, lots of pirated American TV shows:

15,500 words

I'm French; my estimate is 39,400.

I've been reading English since I was 13 or so - quite possibly the bulk of my reading has been in English (a quirk of mine).

Native Finnish speaker, 31 100. Most of the books I've read in the last 20 years have been in English.

29,800, Greek. Judging by the replies of the other non-native speakers, I guess that's an okay score.

I am in a similar situation and my estimated vocabulary size is 10100 words. Doh! :D

I am at 10,200, having studied English for 8 years and using it on a daily basis in my social and professional life. My level of English is above almost all my pears where I come from.

Pears, just like apples, usually don't speak at all, let alone in foreign languages.

Sorry for the pun, but since this is a thread about vocabulary, and it's Sunday, when most of the regulars are away instead of procrastinating at work anyway, I hope I may be forgiven.

I am a non-native speaker, too; I got 10800. I guess when reading books, I should quit the habit I've developed lately of skipping over those words I don't know :-)

I got 37,300. They claim this is not quite 95th percentile, which I am a tad skeptical accurately represents my vocabulary-size percentile relative to the general population. Perhaps this survey is being forwarded around unusually literate people at the top end, or more than 5% of responders are cheating. Where are the fake words to catch cheaters? I Googled a lot of what I didn't recognize, and everything I checked was real.

I just counted, and you've used at least 12685 dictionary words in HP:MOR [1]. I was expecting it to be more to be honest, but it didn't seem right not to post just because it didn't match my expectations.

It certainly seems unlikely that someone who can produce an excellent 500,000 word work of fiction (aside: thanks very much, by the way), in addition to reams of technical writing, has a vocabulary not in the 95th percentile of the population. OTOH, HP:MOR has fewer words in it than I expected, and even the upper bound of 14795 seems low. Maybe the working is wrong; it's shown below.

    # Unique words that also appear in the system dictionary (lower bound)
    $ cat Harry\ Potter\ and\ the\ Methods\ of\ Rationality\ 1-72.txt |
    tr -cs 'a-zA-Z' '[\n*]' | tr '[:upper:]' '[:lower:]' |
    sort --uniq | cat - /usr/share/dict/words | sort |
    awk '{count[$1]++; if (count[$1]==2) print}' | wc -l

    # All unique words, whether in the dictionary or not (upper bound)
    $ cat Harry\ Potter\ and\ the\ Methods\ of\ Rationality\ 1-72.txt |
    tr -cs 'a-zA-Z' '[\n*]' | tr '[:upper:]' '[:lower:]' |
    sort --uniq | wc -l

[1] http://www.fanfiction.net/s/5782108/1/Harry_Potter_and_the_M... — Eliezer's amazing Harry Potter fanfiction in case you're missing out.

There's a huge difference between writing something targeted at a selected audience, in a given time period, limited range of topics the characters will discuss, etc. and listing out all of the vocabulary words you know personally across all domains of knowledge. Even though HP:MoR Harry has a broad vocabulary, for example, and may use some words not all readers will know, there are still tons of topics (with their own specialized vocabulary...) that will never come up in the written storyline -- even if Harry would know them well.

Harry also presumably does some practical limiting of his vocabulary in conversation, because only shared vocabulary is useful if you're trying to actually communicate and don't want to stop to give definitions all the time.

It might be more interesting to compare the unique word count of HP:MoR against some of the "real" Harry Potter books, if you can get your hands on the text.

Thanks to the internet†, I can reveal a surprise: with the same methodology, the number of dictionary words in the seven real Harry Potter books concatenated is 19,245 and the total number of unique words is 21,441.

Total word count is 1,122,131 which is longer than HP:MoR by a factor of three. Plotting mean unique word count for the whole, halves and quarters of MoR gives a fit of uniques=168*length^0.3357, which makes sense given Zipf's law. That formula predicts about 18,050 words for a work of the same length as the original HP.
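The arithmetic in that fit is easy to check from the command line. A minimal sketch, taking the coefficients 168 and 0.3357 from the fit above as given; since they are rounded, the result may land slightly off the quoted figure. `predicted_uniques` is a name made up for this example.

```shell
# predicted_uniques LENGTH: evaluate the power-law fit quoted above,
# uniques = 168 * length^0.3357, for a text of LENGTH total words.
# The coefficients are copied from the comment and are rounded.
predicted_uniques() {
    awk -v len="$1" 'BEGIN { printf "%.0f\n", 168 * len ^ 0.3357 }'
}

predicted_uniques 1122131    # total word count of the seven original books
```

Passing any other word count gives the fit's prediction for a text of that length, which makes it easy to compare other corpora against the same curve.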

(Edit to add obvious test in the other direction.) The first 386,829 words of the original HP contain 12,255 unique words. The last 386,829 words contain 13,635 uniques. So, it's comparable but perhaps slightly more varied (MoR had 12,685).

In light of those figures, is it possible Eliezer's vocabulary is less good than he thinks (Dunning-Kruger)? Especially as the Harry Potter books were written for children and presumably edited as such.

On the other hand, the fact that Eliezer seems to have used fewer words in his writing than you'd expect if his vocab was excellent doesn't mean that his known vocab is poor — he might just not use all the words he knows in writing.

Additionally, given the success of J K Rowling as an author, you might expect her vocabulary to be excellent, so it is conceivable that he's good and she's better.

† I have all the Harry Potter books on a shelf at home. Is torrenting the pdfs at work so I can word count them infringing copyright? I could have done it manually, it just would have taken longer.

I thought Eliezer's Lesswrong sequences might give different results. Applying your tests to those (from http://jb55.com/lesswrong/), I get 257,646 total words, 11,666 unique dictionary words, and 12,721 unique words (I'm surprised there aren't more unique words, given that the quantum physics sequence is in there). 168*257,646^.3357 = 11,010, so the sequences seem to be at about HP level.

Excellent work, by the way; thanks for the analysis.

Does this include word derivations?

Site creator here -- you're right, survey participants are incredibly literate. I suppose that's Internet users in general, disregarding YouTube commenters :), or else the particular people who have spread the test, or are interested in taking it. Average verbal SAT score on the site is 700 (out of 800), far above the population's average of around 500.

Right; most people with "normal" or worse vocabularies are emphatically not going to see a "test your vocabulary" link and say "Hey, instead of doing something fun, let's see how poor my vocabulary really is! Then I'll tell all my friends!"

And when they see some egghead friend on Facebook has posted their vocabulary score and is challenging them to respond... they'll roll their eyes, and move on to their Farmville updates.

Don't get me wrong -- I love these things, and it came back with 37K for me -- but there's no way I'm posting that score, or even the link, to Facebook. I know how to maintain friendships, and saying "look how smart I am; I'm probably smarter than you" does not figure into it.

I can't imagine that the percentiles are reflective of the general population...I got 27,000 which it claims is the median score, and from practical experience my vocabulary is quite a bit larger than most people's. If this is measuring, say, the HN crowd, then perhaps it's accurate.

I found the same thing. I suspect a combination of the early respondents being quite a bit above average and the possibility that some people are checking words they don't actually know, or simply believe they know a correct definition for.

I think it would be a good idea to weight the survey with some test questions that ask if you know a definition to some of the less common words and then ask you to pick a correct definition from a list of 5 with 4 incorrect answers. At least this way they can approximate how much someone may exaggerate their knowledge.
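
A rough sketch of how that check might be scored, in Python. Everything here is hypothetical (the function names, and especially the crude step of scaling the whole score by one fraction). The only assumption is that someone who doesn't know a word still picks the right definition 1 time in 5 on a 5-option question.

```python
def known_fraction(correct, asked, options=5):
    """Estimate the share of checked words the respondent really knows.

    If a fraction p is truly known and the rest are guessed at random,
    the expected share correct is p + (1 - p) / options. Solving for p
    gives the estimate below, clipped to the range [0, 1].
    """
    if asked <= 0:
        raise ValueError("need at least one verification question")
    guess_rate = 1.0 / options
    observed = correct / asked
    estimate = (observed - guess_rate) / (1.0 - guess_rate)
    return min(1.0, max(0.0, estimate))


def adjusted_score(raw_score, correct, asked, options=5):
    """Scale a self-reported vocabulary score by the verified fraction."""
    return raw_score * known_fraction(correct, asked, options)


# Someone who ticked words worth 30,000 but got 6 of 10 checks right.
print(round(adjusted_score(30_000, correct=6, asked=10)))
```

Since only the less common words would be checked, a real version would presumably apply the correction to the hard end of the list only, not to the whole score.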

However as someone who answered as honestly as I could (without spending the time to verify my definition of each word) it is cool to know what my personal vocabulary is.

Also this article does claim 24k-30k is the average for native English speakers: http://www.independent.co.uk/news/world/americas/english-lan...

I only got ~22k and I managed to score in the top 15% on the GRE verbal portion not too long ago. So either people who take the GRE are on average below median or the results aren't quite accurate.

Likewise. I answered honestly and got a little over 22k. I've always tested extremely well on verbal portions of standardized tests and feel that my vocabulary is well above normal but this would put me at about the 20th percentile.

They could also test you on some of the less-recognized words you selected to help identify cheating.

OR they could actually make it a REAL test, e.g. multiple choice

I scored far lower than I would have expected, which although it hurts my ego a bit, I can easily dismiss because of the nature of this test.

How is multiple choice a REAL test?

In all the exams I had in my life, from 1st grade to BS in CS, I only had one multiple choice test.

Let's think about it logically. If you were to actually test the users instead of asking them 'which of these do you know?', the ONLY option is multiple choice since analysing text-field input from the user to determine if their definition is correct is at best extremely difficult, and more than likely impossible ... right ?

I agree that the quality of the result is highly dependent on the honesty of the person tested, but with multiple choice, wouldn't you be able to deduct the meaning of the word from choices you are given?

E.g.: One of the words I encountered in the test was 'terpsichorean'. While I knew that Terpsichore is one of the Muses, I did not know which one and left the box unticked. Had there been multiple choices, I might have guessed the correct solution.

I ticked 'terpsichorean', because it made me think "dreamy travelogue writing of a scenic beach with either terpsichorean sky or sea, it means X looks a pale shade of blue-green".

Google tells me it means dancing so I was way way off (maybe conflating turquoise and cerulean?).

But I have no way of knowing how many words that I feel comfortable defining are actually nowhere near correct, so to be any kind of accurate, they need to do some verification of correctness. All 'honesty' means is 'don't deliberately cheat' not 'don't be dumb'.

I had "discomfit" as one of my words that I wasn't clear of the definition of, I was pretty close when I looked it up but couldn’t have guaranteed it. It's probably easily confused with discomfort ... which made me think that this needs to be a little more tested. Commonly misread words could easily inflate scores.

However, I think a multiple choice test could also inflate scores unless the definitions were very cunningly constructed.

I scored 75-80th percentile (32,800) which surprised me. It seems quite a lot of words, for one. For another I consider my vocab' to be very good and I don't think I'm being bigheaded in that. Ergo I expected to be ranked higher.

On the second page there was an entire column of words of which I recognised only three sufficiently to provide a guaranteed accurate definition. One of that column was terpsichorean, another tatterdemalion.

Whilst looking up tatterdemalion I found little use of it after the 1930s except as a proper noun (a Marvel Comics character for example). What I did find however is that Google Books is useless for finding dates. One citation from an author Sir Edward Bulwer Lytton is given a date of 1999. That's a reprint date, the author died in the 19th century.

I happened to know that one due to my music degree, but I ticked almost none of the other words of similar difficulty.

I did not tick the word, but I know Terpsichore is a muse. Do the other eight also have English adjectives?

Actually, the SAT also employs multiple choice, and quite successfully. You see, multiple choice can not only give you hints, but can also be used to lead you astray. In the end, it balances out.

True, but it's still much better than just relying on people submitting accurate results without even testing them ... at least with multiple choice everyone is being tested to approximately the same metric, and you can get a more accurate percentile

BTW, the word is 'deduce' ;)

haha, got me :-)

In my defense, they are both derived from 'deducere', to lead away, and 'were not distinguished in sense until the mid 17th cent' according to my system’s dictionary

they could add some non-existent words! some people just check all checks i think! (and i was honest and got 8,560)

Eliezer, I think you appreciate some of the ideas behind the FAQ I'll repost here with adaptation to the current situation:


As I commented previously when we had a poll on the ages of HNers, the data can't be relied on to make such an inference. That's because the data are not from a random sample of the relevant population. One professor of statistics, who is a co-author of a highly regarded AP statistics textbook, has tried to popularize the phrase that "voluntary response data are worthless" to go along with the phrase "correlation does not imply causation." Other statistics teachers are gradually picking up this phrase.

-----Original Message----- From: Paul Velleman [SMTPfv2@cornell.edu] Sent: Wednesday, January 14, 1998 5:10 PM To: apstat-l@etc.bc.ca; Kim Robinson Cc: mmbalach@mtu.edu Subject: Re: qualtiative study

Sorry Kim, but it just aint so. Voluntary response data are worthless. One excellent example is the books by Shere Hite. She collected many responses from biased lists with voluntary response and drew conclusions that are roundly contradicted by all responsible studies. She claimed to be doing only qualitative work, but what she got was just plain garbage. Another famous example is the Literary Digest "poll". All you learn from voluntary response is what is said by those who choose to respond. Unless the respondents are a substantially large fraction of the population, they are very likely to be a biased -- possibly a very biased -- subset. Anecdotes tell you nothing at all about the state of the world. They can't be "used only as a description" because they describe nothing but themselves.


For more on the distinction between statistics and mathematics, see




I think Professor Velleman promotes "Voluntary response data are worthless" as a slogan for the same reason an earlier generation of statisticians taught their students the slogan "correlation does not imply causation." That's because common human cognitive errors run strongly in one direction on each issue, so the slogan has to take the cognitive error head-on. Of course, a distinct pattern in voluntary responses tells us SOMETHING (maybe about what kind of people come forward to respond), just as a correlation tells us SOMETHING (maybe about a lurking variable correlated with both things we observe), but it doesn't tell us enough to warrant a firm conclusion about facts of the world. The Literary Digest poll is a spectacular historical example of a voluntary response poll with a HUGE sample size and high response rate that didn't give a correct picture of reality at all.

When I have brought up this issue before, some other HNers have replied that there are some statistical tools for correcting for response-bias effects, IF one can obtain a simple random sample of the population of interest and evaluate what kinds of people respond. But we can't do that here on HN, nor can we for the online vocabulary estimation.

Another reply I frequently see when I bring up this issue is that the public relies on voluntary response data all the time to make conclusions about reality. To that I refer careful readers to what Professor Velleman is quoted as saying above (the general public often believes statements that are baloney) and to what Google's director of research, Peter Norvig, says about research conducted with better data, that even good data (and Norvig would not generally characterize voluntary response data as good data) can lead to wrong conclusions if there isn't careful thinking behind a study design. Again, human beings have strong predilections to believe certain kinds of wrong data and wrong conclusions. We are not neutral evaluators of data and conclusions, but have predispositions (cognitive illusions) that lead to making mistakes without careful training and thought. Here, the conclusion "those other guys are cheating and that dragged down my vocabulary percentile score" is an example of a conclusion resulting from human predispositions.

Another frequently seen reply is that sometimes a "convenience sample" (this is a common term among statisticians for a sample that can't be counted on to be a random sample) of a population offers just that, convenience, and should not be rejected on that basis alone. But the most thoughtful version of that frequent reply I recently saw did correctly point out that if we know from the get-go that the sample was not done statistically correctly, then even if we are confident (enough) that HN participants are young or that their vocabularies are large, we wouldn't want to extrapolate from that to conclude that the users of any technology site are young, or that respondents to online surveys as a whole have large vocabularies.

On my part, I wildly guess that most HNers are younger than I am in part because this kind of poll recurs often on HN. I similarly guess that participants in online surveys of vocabulary size are likely to have larger vocabularies than average people in the general public because most people I meet find discussions of word meanings boring. But neither guess gives me a good quantitative basis for estimating how much users here differ from the general population.

I'm questioning the way respondents are classified.

I chose 'Canada' as my region since I'm from Montreal. My first language is English and I'm fluent in French. I did the first half of elementary school in French. Firstly, non-Quebec anglophones tend to have better grammar and larger vocabularies than anglophone Quebecers. Secondly, it doesn't take into account that English can be a 3rd language. Most immigrants to Quebec are required by law to attend French language elementary and high schools (there are exceptions). Immigrant children whose first language isn't English or French (the majority) take on two new languages, English being their 3rd after French. English tends to be the social language for many.

Montreal has a strong tech industry employing bilingual/multilingual people, many of whom read HN and possibly took part in the survey. My gut feeling is that English speaking Quebecers are skewing the stats. More granular control over region would be useful and would give some insight into this reality.

Note: I traveled through China and Southeast Asia last year and found the quality of English to be much better than I expected. Considering Indochina was ruled by the French, I didn't find a single person who could speak French. To possibly classify any country as "non-English-speaking" is kind of silly. Every country is "other-English-speaking", but then again it's a subjective classification, isn't it? Doesn't China have the largest English speaking population now...

> Doesn't China have the largest English speaking population now...

Doesn't matter. Look at the density instead.

By contrast, I scored 23,300, putting me on par with the average result for a 15 year old, if this test is to be believed. I was brutally honest about not selecting words I only recognised and might be able to infer meaning from in context, but couldn't articulate a clear definition for (of which there were a surprising number).

And yet I look at the vocabulary used in the comment posts of those claiming high scores, and wonder how they ever scored so highly (accepting that comments are not necessarily reflective of one's general writing or vocabulary). I believe that as this test is so open to cheating, using responses to it as the corpus for determining median scores renders the entire exercise completely meaningless.

None of the words were domain-specific; they were all general vocabulary. You're a polymath, but a lot of your knowledge is focused into technical areas that weren't represented on the test.

Illiterates are vastly underrepresented.

For those of you concerned with your score, you are devoting an undue amount of your time to discussing the results of what is, let's be honest with ourselves, the literati version of an "Are U A Vampire Or A Werewolf?" quiz.

P.S., 36,700 .. I took this before it got a lot of general circulation, and my standings have improved considerably. I suspect this makes me more worthy of oxygen.

19,600. I'm willing to accept this, although I'm not going to lie -- I'm very upset at myself. I'm used to scoring 99th percentile in every standardized test; it's kind of a shock to realize that I'm nowhere near the median of even my age group, let alone the general populace (I'm 20).

That said, I'm currently reading A Dance with Dragons and there are tons of words in this series (A Song of Ice and Fire) that I'm not familiar with. Most of the ones I missed are words I recognize from this series, although since I'm not 100% sure of them, I left them unchecked.

I don't think this is very accurate. I scored 95+% on both SAT and GRE verbal, and I only got a 27,400. I imagine there are quite a few test takers who are not completely honest with themselves.

Don't be upset, I got 19,700.

Using big words make a good communicator not. The point of using a language is to communicate. Using esoteric words only a fraction of the population knows defeats this purpose.

If anything, the same amount of practice with a smaller vocabulary will make you a better communicator, IMHO.

I must admit, as an American male a little older than you, my score was also so low that I'm too embarrassed to even mention it. All those years of cheating on vocabulary tests (merely by memorizing the words 5 minutes prior to taking the test) in high school did not help. It's one of those things I look back on in life and regret. Does anyone have any suggestions on ways to catch up?

edit: forgot to mention, while I am a native English speaker, both my parents tend to speak Polish most of the time, while I always reply in English. I have to wonder how much of an effect this had on me.

Read a lot. In fact, buy a Kindle and read a lot. It has a built-in dictionary which is quite decent, and you'll actually look up words that you would otherwise skim past with a mediocre context-based understanding.


I scored a little over 40k on this, and did well on other verbal tests for the general population when I was still taking tests.

I attribute much of my facility to 1) reading fantasy and 2) looking up words I don't know. Since I loathe interrupting the flow of a story, I read with a pencil and make a list of words to batch-learn later.


I filled pages of word lists while reading Cryptonomicon. It was amazing.

22,300 as an American 22 year-old. I write and read slightly more than most people too (though nothing too formal or high-academia).

I also thought I had an unfair advantage early on because I knew a lot of words from playing Blizzard games all my life. Sorely disappointed :(.

20.7K myself and I'm 24 from the UK. I think the test is a little harsh or doesn't really mean what people would think. On the second group of words, half of those words aren't in popular use any more -- it's almost like a lesson in history.

I was a bit disappointed with 23,700, but it appears to be good for a non-native speaker. My wife, a native English speaker, virtually lapped me at 42,600 words. :)

Perhaps the results of this test don't predict academic success very well. I graduated at the top of my program (of about 30 graduates) last year and scored only 20.5k.

Maybe I fell behind my peers by only reading comp sci, math, business, and communications related stuff for 4 years. Guess I've got some catching up to do.

I'm 22 and came in at 19.5k. I guess I need to use the dictionary in my Kindle a lot more.

I'm 23, I took this thing half awake and came in at 24k. I'm not an avid reader. The test seems to pull random words from a dictionary and not words used in a general mannor. Not only that but it messed all the delicious skill-trade grammar I know and love.


Would you have checked the word "manner"?

Or "manor"?

or "messed"?

This makes me feel a lot better about my 15500 as someone from Holland who has been studying in England for 4 months.

GRRM tends to use a bunch of unusual words pretty frequently: "dandle", "garron", "palfrey", "hauberk", "solar".

I thought in aDwD that "leal" was just yet another Kindle OCR mess-up for "loyal" until I found out it actually really is a word - an old English word for "loyal"! Goes to show... something, I guess! :)

Also, leal means loyal in Spanish, that's how I knew when I was reading that book :)

That's why I really like reading with the Kindle app. The dictionary is only a tap away. I'm now in the habit of looking up every word I don't know.

I strongly doubt that their methodology has any validity at all for non-native speakers. Extrapolating from a selection of 60-80 words presupposes a relatively normal developmental history; otherwise one would not be able to draw the primary inference at work here, namely that somebody who knows a definition for "mawkish" knows definitions for all words of similar difficulty and frequency.

Atypical language acquisition (e.g., as a second language, or through a non-standard channel like technology or fantasy literature) disrupts this extrapolation step. For instance, a German programmer that knows the word "polymorphic" via OOP is less likely to know similarly frequent and difficult but programming-unrelated words than British or American peers. So adding, say, 100 to the total would be utterly unfounded. Same thing for a science-fiction nerd: Acquaintance with obscure words from one domain doesn't extrapolate to other domains.

Unless they somehow control for domain specificity and atypical acquisition, let's not get too frustrated. (Disclaimer: Not a native speaker -- result around median.)

Hi, creator of the site here -- I worked for several years teaching English to foreigners, and my own (informal) statistical research has found that native and non-native language acquisition, while certainly somewhat different, is not tremendously different. I've run this test on Americans and Brazilians at all different levels, and the size of the progression from known to unknown words is rather consistent.

That being said, cognates between languages can give an artificially inflated score. I intentionally avoided any words with the same roots in English and Portuguese (my second language), which should hopefully also be true for most Romance languages. This way, you shouldn't be able to "guess" meanings you've never actually learned. However, it wouldn't surprise me if German speakers, for example, were able to "guess" an additional number of words correctly.

Also, the 60-80 words are only what you are tested on -- the word selection on pages 2-4 changes depending on your answers on the first page, so it attempts to narrow down your vocab knowledge "at the margin".

There were around 10 french words in the list I got. Maybe more, maybe less, I didn't count exactly.

Non-native speaker here. I found out that in almost all cases I knew a synonym of the words I didn't know. The problem is that the synonyms are the ones that are used very frequently, so I am rarely exposed to the other forms. While I know high literature in German, I must admit I have only read non-fiction in English so far. This is really a shame and I am going to change this.

This test made me aware of that fact, regardless of its scientific accuracy.

I back you up. Non-native English speaker here. My result: I know 10,600 words. I really doubt it; I have an English dictionary that contains about 4,000 words and I don't think I need another dictionary. Even adding my computing terminology, 10K words is still way beyond me.

I'm not sure it's valid for native speakers, either.

Your basic vocabulary will be the same as most other speakers because we have structured learning in our schools.

But the less-well-known words are generally learned in context... So if you haven't happened across them in a book, you probably don't know them... But there are probably many others that you do, instead. This test could easily hit a bunch you don't know and totally miss all the ones you do.

Their statistics are probably inflated due to the linguistics subreddit, where this test originated:


"Don't check boxes for words you know you've seen before, but whose meaning you aren't exactly sure of."

This is a bewildering instruction to me, since I've learned most of my vocabulary through reading, and rarely look up the definition of a word, instead learning its meaning through repeated exposures to its use in context.

Take for example "garron" -- like many of us I've been reading GRRM lately, so I've seen the word used some 78 times in the past few months, and I'm sure I've encountered the word a few dozen times before. I know it's a slightly undesirable horse of some kind. Likely this means it's a gelding or a small pony-ish horse. Do I need to have looked up and remembered the three specific submeanings of the word, or that it's a specific breed of horse from Galloway to be able to say I am "exactly sure of" the word?

I don't think that's how language works, but it's how this test seems to want it to work. My score of 35,300 is suspect on multiple levels.

Something often missing from vocabulary tests -- and arguably more impressive than raw, crystallized vocab -- is the ability to deduce the meaning of new words on the fly, whether from context or absent any context (in which case, the meaning is ascertained from roots, prefixes, suffixes, etc.). I'd love to see a test of this skill. For instance, I doubt many people encounter the word "reck" in their daily lives. But I'd give props to someone whose brain would quickly draw the connection to the more frequently seen "reckless," thereby deducing that the word had something to do with caution or concern.

Similarly, your point has some validity. Most of the time, we don't acquire language through reading the dictionary; we acquire it primarily through context in the course of reading or conversation. This is why, when pressed to define words, we'll often reach for a string of synonyms, or else provide usage in sample sentences. I bet few people here, let alone anywhere, could render dictionary-acceptable definitions of 99% of the words they know.

(On the flipside, this is also why we forget most of the words we crammed in preparation for the SAT back in the day; we learned them completely out of context and in an artificial way).

One of my favorite things about reading on a Kindle is the ease with which I can look up words -- "garron" being one of the most recent, in fact.

lots of people here are saying they scored lower than what they expected, and that maybe other people cheated. that could be it, but it could also be that hacker news folks tend to be overconfident. this would match the stereotype of this group being mainly male nerd entrepreneurs, who could score worse on things like this but perceive themselves to score much higher (a feeling not a fact backed by studies that i can remember). who knows; just voicing this thought since no one has mentioned it yet.

I got just under the median on this test, but I scored around the 99th percentile in the SAT verbal, and I feel it's reasonable to say the two test approximately the same things. It seems unlikely that I've slipped so far in just two years :)

I never really considered Hacker News to be full of overconfident people. If anything, to me being part of HN is a humbling experience. It reminds me that there are so many people out there that are smarter or more experienced than me, and I've seen other people say similar things.

Hacker News is _very_ full of over-confident people, believe me.

During a discussion of 'whether open source contributions were being overly important to job seekers' a while back, a surprising number of HN commenters automatically put themselves in the role of employer, peering dubiously over their glasses at me. Many of these commenters were hilariously under-qualified to be taking on that kind of role with respect to anyone.

I don't think SAT verbal is nearly as intensely focused on obscure vocab. I didn't do it, but did do a GRE verbal back in 1994 to get into grad school. For what it's worth, my score on this and my GRE verbal were quite consistent.

> Hacker News is _very_ full of over-confident people, believe me.


"Many of these commenters were hilariously under-qualified to be taking on that kind of role with respect to anyone."

You inferred this based on user profiles?

Correct. Most of the folks in question had filled in enough to be able to be straightforwardly findable in terms of linkedin, blogs, etc. It wasn't rocket science.

I'm pretty much in the same boat. This scored me below median, but I scored in the 99th percentile on the SAT verbal as well (although that was a decade ago for me).

I think one contributing factor is the structure of the questions. If you ask me if I KNOW at least one definition for the word mawkish, for example, I'll choose no, I don't know a definition of it. I do however have a good enough feel for the word that I can almost guarantee I can get an SAT-style analogy with it correct, or if you gave me a multiple choice selection of definitions I can probably pick out the right one. I don't consider either of those skills equivalent to actually knowing a definition of a word.

Yeah, on the more difficult words, it becomes a test of your ability to tell yourself you know what they mean.

Same with me (SAT)... It says I have below the median vocabulary. They should have at least had some sort of base standard to compare the voluntary data with so that they could measure their sample drift.

i scored just over 90th percentile and i promise i didn't lie. it seems to me that a lot of the words were very old-fashioned - they were the kind of words i learnt from context while reading books by people like dickens. they're not words i would use in normal conversation, or when writing.

anyway, what i'm saying is that i suspect sat verbal is testing something quite different - isn't it much more aimed at ability to use the language rather than whether you can recognise obscure words?

[what surprised me the most was how graded it was - on both tests there was a pretty clear cut-off point where the words became unknown. i didn't expect it to be that ordered.]

I doubt that many people are cheating, but the "percentiles" are basically meaningless because the survey obviously self-selects for 1) people who are on the Internet, 2) people who are on Internet sites where this survey will get reposted, 3) people who suspect they have pretty good vocabularies (who is going to take a non-mandatory 'test' that doesn't even earn them any Facebook Credits to inform them they aren't so bright?), etc.

End result: substantial inflation of scores relative to what you'd see if you gave the same survey to the general population (and assuming everyone is relatively honest).

That's possible, even likely. Another explanation would be that tests like this tend to be biased towards people with very evenly distributed interests, preferably with a literary, classical bias or background. Someone who knows 10,000 words from one or two specialized disciplines will always score worse than someone who diligently read through all the high school text books and topped it off with some gutenberg.org.

(a feeling not a fact backed by studies that i can remember)

Maybe you're thinking of the Dunning-Kruger effect? (http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect)

Just as a test, ticking all the boxes scores 45,000 words, which seems to indicate that they haven't seeded the quiz with fake words to weed out cheaters. Pity, since there was an opportunity to unbias it in at least one dimension. (I also tried deselecting just one of a few of the really tough words: each one caused the score to drop.)
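
A minimal sketch of how seeded fake words could be used, in Python. This is not how the site works (the creator confirms there is no cheating detection); the function name and the numbers are made up for illustration. The idea is to treat ticked fake words as a false-alarm rate and discount the share of real words ticked accordingly, the way yes/no vocabulary tests are sometimes corrected for guessing.

```python
def corrected_hit_rate(real_ticked, real_shown, fake_ticked, fake_shown):
    """Discount the share of real words ticked by the share of fake words ticked.

    hit = real words ticked / real words shown
    false_alarm = fake words ticked / fake words shown
    corrected = (hit - false_alarm) / (1 - false_alarm), floored at 0.
    Ticking every box gives false_alarm = 1 and a corrected rate of 0.
    """
    if real_shown <= 0 or fake_shown <= 0:
        raise ValueError("need at least one real and one fake word")
    hit = real_ticked / real_shown
    false_alarm = fake_ticked / fake_shown
    if false_alarm >= 1.0:
        return 0.0
    return max(0.0, (hit - false_alarm) / (1.0 - false_alarm))


# Honest respondent: 30 of 60 real words ticked, no fakes ticked.
print(corrected_hit_rate(30, 60, 0, 10))
# Tick-everything respondent: all real and all fake words ticked.
print(corrected_hit_rate(60, 60, 10, 10))
```

With this correction the tick-everything strategy drops to zero, while an honest respondent who ticks no fake words keeps their raw rate.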

Site creator here -- you're right, there's no cheating detection. I ask people not to fill out the survey results if they're not being entirely truthful, but of course they are free to disregard the instructions.

But I created the site less out of interest in absolute vocabulary size numbers (these vary widely depending upon methodology), and more out of interest in relative changes among age groups, SAT scores, etc. And hopefully, cheating would not be correlated to any of those...

But at the end of the day, this is not a controlled, scientific survey. It is a voluntary quiz, though I am doing my best to control for other factors.

I'm really, really hoping that you're logging referrer data along with scores - if you're not already, you might consider starting. Even if this is not a source of proper scientific data, it would be highly interesting to see how, for instance, the Reddit referrals do vs. HN vs. Facebook, etc. We can probably all make some fairly accurate guesses about where the high and low performers would come from, but I'd be real curious to see the exact breakdowns.

Obvious point : The only people interested in finding out their scores will be the kind of people who think their vocab is something worth competing on. There's no way this is a fair sample across all English-speakers.

I wouldn't say the ONLY people. This could be the next 'IQ test', where it seems to be nearly the opposite: the less intelligent ones are the ones obsessed with them.

Going by magazines, common forwards and so on it seems the average man loves a good self-quiz.

But to remove some of the bias, we'd need approximately equal numbers of above and below average people to be going to the trouble of answering the quiz.

And in the back of my mind, I thought that the bulk of quiz magazines were sold to women, no? (Though I'm probably being too pedantic...)

It could be that only very below-average and very above-average people are interested in self-quizzes, too, so equal numbers alone won't be enough.

They ought to include some fake but plausible words to correct for cheaters. (Perhaps they do?)
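As a minimal sketch of how such a correction could work, assuming the quiz mixed in made-up words (which, per the site creator in this thread, it does not): treat ticks on fake words as a false-alarm rate and discount the real-word hit rate with the standard correction-for-guessing formula. The function name and arguments here are hypothetical, for illustration only.

```python
def corrected_known_fraction(real_ticked, real_total, fake_ticked, fake_total):
    """Estimate the fraction of real words genuinely known.

    hit_rate is the share of real words ticked; false_alarm_rate is the
    share of fake words ticked. The correction (h - f) / (1 - f) assumes
    over-claiming happens at the same rate on real and fake words.
    """
    hit_rate = real_ticked / real_total
    false_alarm_rate = fake_ticked / fake_total
    if false_alarm_rate >= 1.0:
        # Every fake word was ticked, so the answers carry no information.
        return 0.0
    corrected = (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)
    return max(0.0, corrected)


# Someone who ticks every box, fake words included, is corrected down to zero.
print(corrected_known_fraction(100, 100, 20, 20))
# Someone who ticks no fake words keeps their raw hit rate.
print(corrected_known_fraction(60, 100, 0, 20))
```

The correction only helps against indiscriminate over-claiming; someone who deliberately ticks only real words they don't know would still go undetected.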

Some of those words had a definite brillig quality.

And yet not a slythy tove in sight, gyring or otherwise.

The thing that always surprises me with this sort of test is how many words I half-know: I recognise the word, I have a (correct) general sense of what it means, but either I couldn't articulate an accurate definition or the definition I would give is an uncommon usage and I hadn't come across the more common meaning(s).

Following the rules strictly (so I didn't tick the half-known words), I came in slightly below average for my age, assuming the trend continues beyond the 32 years that their table currently gives.

For comparison, if I also included the words I half-knew, I gained nearly 4,000 new words and went up about 20%.

Yet, are rather cromulent

ObScore: 36000, 32yo male US-english native speaker. There were a few more I'd seen but wasn't really sure enough to define. I maxed out the SAT verbal section (and got 1 wrong on math, which was enough to drop to 780/800) back in the 1990s.

Don't care too much for percentile and age stats at the end.. clearly doesn't represent general pop.

What IS interesting though, from a language learner's perspective, is the vocab size estimation. A metric a lot of us use as a rough benchmark of vocab needed for fluency in a foreign language is 10,000 words. Comparing this with what an educated adult native speaker knows in their own language (using my own truthful score of 24k) is pretty interesting.

Would love to have something like this to quickly gauge my vocab in other languages!

I agree: the test seems to correctly estimate vocabulary size, but is incorrect in calculating percentile.

I got 12900 (English is my second language) and it seems to be about accurate. I also feel that I'm fluent in English, and it confirms that 10000 words is sufficient for fluency.

The test is also missing professional lingo: where are such words as SQL, lisp, ai, startup, PG, HN and others that we all know so well?

The psychology of these things is interesting to me. My reflexive reaction was, of course, "I have to know!" and then my immediate counter-reaction was "This is just intellectual phallometry and is ultimately of no consequence to me."

Of course, I very quickly rationalized away the counter-reaction and took the test anyway, and then considered sharing it with my friends. What drives this?

Phallometry is nice when you have the big phallus.

Unless it's being used by a dystopian regime to decide who poses the greatest threat to the oligarchs.

Maui, clearly.

People who lead revolutions tend not to be very humble, from what I can glean from history. So what you're suggesting might work.

Yes, it's amazing how many people are "casually" posting their scores. Dick measuring indeed. (98,750)

I scored 38500 - seemed to be a test that would be helped by reading a lot of older fantasy literature, where 'terpsichorean' and 'turpitude' (to give a couple 'terp' examples that spring to mind) are the sort of words that authors like Jack Vance liked to wheel out in order to create a mood.

I'm not sure that the people pointing to the failure to correlate with the SAT are adding much; I don't think the SAT really goes all-out on the more flowery bits of archaic vocabulary in the way that this test did.

My 3rd grade son got 10200, and enjoyed discussing the words he didn't get. I think every 3rd grader should know "mawkish". :-)

I only scored 24k which seems low based on the statistics at the end. I also only selected words that I absolutely knew the definition of, even though some I think I knew based on the root.

Memorizing trivia words is just something that has never interested me. Instead I keep a thesaurus and dictionary handy at all times :)


37,100 I'm ashamed I didn't do better. I'm considerably older than the average HN reader. I did degrees in 4 different subjects (mind you, I was classed officially as retarded at my high school - in the same classes as the arsonists).

So no-one should feel the score is that important. I'm a very mediocre programmer. I'd much rather halve my vocab score to double my maths ability.

The population is self selecting; I wouldn't trust their percentiles.

Not just self selecting, but the scores are basically self-reported. It didn't actually test anything.

I got 28,800 http://testyourvocab.com/?r=38317 So apparently I should be 31 instead of almost 21.

I had a phase around 7th through 10th grade where I thought learning lots of vocabulary would make me smarter, especially words others didn't know well. (And so I'd use them in English essays for Extra Points since your grades are often determined by how little sense you make, because if the reader doesn't understand it obviously it's too smart for them!) I also had a general grammar nazi-ism.

Anyway, I think this exchange kind of tipped me over the edge to stop caring. (Of course that's led to forgetting a lot.)

William Faulkner, on Ernest Hemingway: "He has never been known to use a word that might send a reader to the dictionary."

Hemingway: "Poor Faulkner. Does he really think big emotions come from big words? He thinks I don't know the ten-dollar words. I know them all right. But there are older and simpler and better words, and those are the ones I use."

Of course, having some background in French and Latin probably helps for inferring a few words.

Ah, cool, I know 80 English words. There must be something utterly wrong with how this test works in Opera Mini: clicking on continue on page two brought me to page one, and going back and clicking continue again gave me just a subset of the choices from page one... (At least I guess it is 80; the number was displayed right over the middle of the word "words" in the result captcha.)

So you have used up nearly 70% of your vocabulary to write this comment? I have to admit, your proficiency in English grammar is extraordinary given this.

Oh, no, he just learnt barely enough words to make that comment and a few similar ones. Quite frugal, really.

Just from getting some friends to do this it seems to me the median score overall and the median score for each age are a bit inflated. Just my thoughts, but I think people aren't being 100% truthful. Although I may just have a poor vocabulary http://testyourvocab.com/?r=36208

"You will never become proficient in a foreign language by studying vocabulary lists. Rather, you must hear and speak (or read and write) the language to gain proficiency. The same is true for learning computer languages."

Coincidentally, I just happened to come across this quote in Peter Norvig's "Paradigms of Artificial Intelligence Programming".

I was actually a bit surprised, not by the specific number of words (that seems reasonably fair given some statistics posted above on this thread), but on the percentage ranking. I'm a doctoral student whose vocabulary score was in the 96th percentile on the GRE and I knew every word when I took the WAIS-IV. Here, my score of 28,300 is just a bit above average. Either people are lying (possible) or this curve is clearly not a normal curve representative of the population (most likely). I'm a pretty avid reader, even though most of the older authors like Dickens bore me (thanks ADD!), but the person that came up with those words is possibly the most voracious reader I have ever met.

I also just realized that taking the test primed me to write in a way more intelligent-sounding manner than I usually do. Not an LOL in sight!

I realise that the ego-stroking scoring is the driver behind this site's popularity, but I would also like to see definitions of words that I missed. It's pretty daunting to spend hours copying and pasting words into Google (well, ahem, maybe it is for some of us!)

I felt the same. Also there were some words I thought I knew what they meant, but my definition would only be partially correct.

Also some definitions you can guess at - do these count as part of your vocabulary or not? E.g. Clerisy, can be easily deduced to be the class of clerics, but not sure if I'd be able to say if it was a real word or not.

I am a non-native speaker (NNS) and got 15.700.

I think this test is not very telling for NNSs as it doesn't consider specialist vocabulary, which many of us have a lot of, because of how we /really/ learned the language.

When I left school my English was very average. When I started communicating by email with people from all over the world, but mostly the US, it improved a lot in 1-2 years. When I first went to a congress in the US in my mid-20s I was blown away by my aptitude to communicate in that language.

But these were all people from my field. What I'm saying is that the distribution of words pertaining certain subjects in my vocabulary is severely skewed by the field I work in -- visual effects (and IT). I believe this goes for many NNSs.

>the distribution of words pertaining certain subjects in my vocabulary is severely skewed

You clearly have a very good vocabulary IMO. However, if I may, I think it should be "pertaining to certain subjects". It can be used without the "to" but sounds a bit conceited to this native speaker's ear.

For example see http://www.thefreelibrary.com/_/search/Search.aspx?By=0&.... All the literature examples written out use "pertaining to" in some form: "pertaining thereto", "pertaining only to", "to them pertaining", et cetera.

Very interesting. The details on how it works are here: http://testyourvocab.com/details.php

(My result:) http://testyourvocab.com/?r=35795

Perhaps the most interesting part of how they're doing it is that they have an algorithm that doesn't rely on the results of other test takers, except for the eventual statistics.

Also worth seeing: http://testyourvocab.com/hard.php

Learning to program did help. I could check "shard" and "bloat", which are apparently quite rare in general context compared to words I know.

How about you? Were there words you could check because you encountered them often in a programming context?

"The Second Edition of the 20-volume Oxford English Dictionary contains full entries for 171,476 words in current use"

It's interesting and humbling to contemplate that I know only about 20% of the words in my native language.

Site creator here -- what a surprise to wake up this morning, see my inbox full of messages about this site, and discover that everyone was coming from Hacker News, which I visit every day!

Thanks for all the participation, and comments -- I didn't submit this myself, so thanks, mike_esspe.

I've responded to a few points down below; there's a lot more info on the details of the test at:

http://testyourvocab.com/faq.php http://testyourvocab.com/details.php

Hi crazygringo,

I was intrigued by your estimation procedure and was thinking of a way of playing around with it:

(1) Create a random bit vector of size 45,000. (2) Pick 40 positions in the vector at random and use those positions to estimate the proportion of ones in the original bit vector (so far this is easy). (3) Select an additional 120 positions and use the more sophisticated procedure to refine the estimate from part (2).

I was wondering if you have code or pseudo-code for how you implemented part(3) specifically how you choose the 120 words for extra testing. It seems there are a lot of different natural ways you can do it. Did you write an academic paper about this?
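A minimal sketch of steps (1) and (2) above, with a stand-in for step (3). The thread does not describe how the site actually picks its 120 follow-up words, so the second stage here simply pools 120 more uniformly random positions; that is an assumption for illustration, not the site's procedure. The function name, `p_known`, and `seed` are likewise hypothetical.

```python
import random


def simulate(vocab_size=45_000, p_known=0.6, first=40, extra=120, seed=0):
    """Estimate the share of ones in a random bit vector in two stages."""
    rng = random.Random(seed)

    # (1) Random bit vector: 1 means "word known", 0 means "word unknown".
    bits = [1 if rng.random() < p_known else 0 for _ in range(vocab_size)]

    # Draw all test positions at once, without replacement.
    positions = rng.sample(range(vocab_size), first + extra)

    # (2) Coarse estimate from the first 40 positions.
    est_first = sum(bits[i] for i in positions[:first]) / first

    # (3) Stand-in refinement (assumption): pool all 160 positions.
    est_pooled = sum(bits[i] for i in positions) / (first + extra)

    true_share = sum(bits) / vocab_size
    return true_share, est_first, est_pooled


true_share, est_first, est_pooled = simulate()
print(f"true share:      {true_share:.3f}")
print(f"40-word guess:   {est_first:.3f} -> {round(est_first * 45_000)} words")
print(f"160-word guess:  {est_pooled:.3f} -> {round(est_pooled * 45_000)} words")
```

A purely random bit vector has no frequency structure, so uniform sampling is the best one can do here. A real vocabulary is different: common words are far more likely to be known than rare ones, which is presumably what a more sophisticated second stage would exploit.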


13700 German, 22 years old

I had 9 years of English in school. Because of my hobbies I read a lot of stuff in English. I also watched many TV shows and spent about 6 month living in Australia.

Yet I feel insecure even typing this. Knowing lots of words is one thing. But what makes it hard are all the subtleties you have to take care of when building sentences. I also think that I get grammar wrong most of the time.

Another thing is that my sentences are almost always way too long.

As someone else pointed out before: I don't want my former English teacher to read this, either.

Think I have the lowest score here. 16,400 words. English is not my native language but I speak English daily and I wouldn't say my English is bad. Pretty disappointed with the score and also surprised the median is way way higher than I expected.

Edit: And, also to add, I followed 2 criteria for whether I know the word or not.

1. What's the absolute definition?

2. And can I find the equivalent or meaning of it in my native language? (which is Tamil, an Indian language, if anyone cares.)

I think a lot of the words are ones that you will only ever encounter in reading fiction (and flowery/older fiction at that). Also, keep in mind that this test doesn't include field-specific technical jargon.

21,400. My native tongue is Tamil too. In the second list I was surprised at some of the words, which I've never ever seen before.

The lowest on this site has less than 14k words (guess how I reached this conclusion).

I scored only 20400, but it makes me ask myself: Perhaps I was being too honest? There were certainly words I'd seen before, and could make educated guesses as to the general meanings of, but I chose not to check those off.

I'm Canadian born and raised, with English as my first language. Honestly, I'm surprised to be told I'm that far below the median and average.

"Is English your first (native) language, or a second (non-native) language?" English is actually my 3rd (non-native) language :)

It seems to be fairly accurate. Maybe it isn't. My score: http://testyourvocab.com/?r=35822

Edit: I just had a look at the median word count for adults who took the survey. It's around 27,000. I wonder whether that's true or not.. it seems to me that I'm lacking.

Methinks these survey takers doth be liars


Putting aside all the comparing: many non-native speakers here say they read and watch many things in English. I do that too, and I'm quite positive that 95% of all the reading and listening I do each day is in English.

Now with a low score of 17500 I wonder, if it isn't enough to completely endulge oneself in the language, what is?

Of course, watching the Simpsons all day won't teach me some of the rarely used words. But there must be some stepping stones. I still haven't read Wuthering Heights because I don't want to have a dictionary lying around just to understand the story. And looking up something, reading on and forgetting it at the end of the day is quite common for me.

Also, I'm sure that 15-year-old Americans haven't read that many novels; still, their vocabulary is supposed to be larger than that of most of the well-read non-native speakers around here.


Sorry, couldn't resist. For a minute I thought I'd missed out on an alternate spelling. Then I realised that was unpossible ;0)>

Too bad, I'm not from Ohio. But this proves my point. Improve spelling? Increase vocabulary? It seems that swimming in the language isn't enough.

39,000. I use more of the words on page 2 than I should; I'm probably unbearably obnoxious to be around...

I was really surprised by the result of this. I was flipping through a friend's learner's Chinese-English dictionary that claimed to contain over 150,000 words, and in a couple of minutes of thumbing through it I didn't find any I didn't know. But on this test, I didn't even get 50,000. Then according to the info at the bottom, the median was far less than that.

Honestly I think the evaluation method is terrible. My collection of sci-fi/fantasy books alone probably contains over 100,000 headwords. A single biology textbook might have as many as they claim the median person knows. Avid WoW players would similarly destroy the curve (if the test included the kinds of words they'd know instead of archaic religious words).

What your friend's dictionary contains a hundred and fifty thousand of are almost certainly not the same thing that the OED contains only a hundred and seventy thousand of. Take a look at a random OED page sometime and see how many words you don't know.

I only checked words that I can use in a sentence: left one blank on the first set, a handful blank on the second set.

42,500 (http://testyourvocab.com/?r=37216)

Apparently the OED has 7 times more words I don't know. That's offal...

"You can lead a horticulture, but you can't make her think." -- Dorothy Parker

That was one of Dave Sim's favourite lines. I think he thinks he made it up himself, though.

I'm a non-native speaker and I fare pretty well with reading stuff, but I'm a bit shocked at my result (< 19k).

The thing I find a bit funny is that of all the words I didn't check I've seen almost all of them in books and articles. When I see them in a sentence and in context I do understand them fine but I can't give a definition for them.

I wonder if this is common when reading another language? It might be a better idea to look up the words in a dictionary when seeing them but I just can't be bothered, after seeing them in context a few times I can usually get a feel for their meanings. There are a few exceptions to be sure, adjectives are particularly bad at this.

Got 21,500 or so. Even as a non-native speaker I was slightly disappointed with this result. Many of the words were just ridiculously obscure and esoteric, and I haven't even seen some of them anywhere in the literature I read.

37,700 http://testyourvocab.com/?r=36826 without cheating and not counting ones that I only thought I could puzzle out.

Maybe working on that English Minor is panning out...

I didn't see any science-related words when I took the test. Not sure if this is because of the process by which they made the word lists, or because there truly are not many common science-related words.

At http://testyourvocab.com/details.php#which_sample they explain how they chose the words the test is based on:


Too limited. Words that are specifically American or British (in meaning or spelling), or slang, or scientific/medical, or anything labeled archaic, or anything else that isn't part of broad, general English. Also, no animals or ingredients, which depend too much on where you live.

>or anything else that isn't part of broad, general English

So scientific words aren't general English? This suggests that the authors don't think anyone ever talks about science?

I'm guessing they selected words from a large corpus of written English based on their relative frequency.

It isn't surprising that there isn't any technical vocabulary. Most technical vocabulary falls into one of the following categories: a) Acronyms b) Overloading of existing non-technical words c) Names and other proper nouns d) Phrases longer than one word

There's actually an argument for excluding highly specific vocabulary (some corpuses explicitly exclude textbooks for this reason) because knowledge of them doesn't correlate as well with overall vocabulary.

22,700 http://testyourvocab.com/?r=38336

What difference does it make? The site doesn't say what it means in everyday life. I'm guessing if you exclude high-achieving SAT vocab nerds, it finds the difference between people who care about the meaning of each word and people who will guess through context because they have no patience for a dictionary. Or people who don't read fancy texts, like The Scarlet Letter for example; after failing to read that, I stopped reading books.

29,500, non-native English speaker (but studied in the US). Retook the test and omitted all the words I could not define with total confidence on the spot; the original score was 31,400. The test is peculiar in that the distribution appears to be uneven. Subjectively there is a sharp break between words that one would know from Shakespeare, Tolkien and Dunsany, and words no one would ever know unless they studied the OED. For statistical significance they would need more words.

I just got "5,340" as a result. This is a hint for me that I spend too much time trying to improve my listening skills and too little trying to learn words outside the domain I mostly use English for (computers, programming, technology, ...).

I wonder if there is some good web site that helps you learn new words. An iPhone application would also work for me, but I need one that can also play me the sound of the word. I searched a bit in the past without good results.

7830 here. (Swiss-German native speaker) Like you I don't have enough exposure to other domains outside of programming and technology.

I'm not sure if vocabulary size matters once you reach around 25,000 words. The words I didn't know were in part because I've never had any need to know them; if I had run into any of them while reading anything written in the past 80 years, I'd be angry at the author for showing off.

When I was young, I thought that if I wanted to be a writer I should have a huge vocabulary... but now, when choosing words/synonyms I dismiss most options because they're much too obscure.

> I'm not sure if vocabulary size matters once you reach around 25,000 words.

This is what I was thinking. I scored 34K, and rarely encounter a word that I don't understand in regular speech or reading. I also know several thousand jargon words, none of which were on that test. I know what I need to know. Memorizing another 16K words to reach Shakespeare's magical 50K (and feel good about myself) would be a waste of precious mental resources.

As someone else pointed out here, Shakespeare's vocab was just over 30k (which was the number I was familiar with).

Along with many of the flaws already addressed, the most relevant for me were the "absolutely sure of" criterion and taking the words out of context. There were a good number of words that I was pretty sure I knew; had I read them in context, I would have been correct about their meaning and never given it a second thought.

I'm certain that if I took this same test using a base of 'novel in fulltext' vs. 'list of all unique words in the novel', my recog would be FAR better on the novel.

12,900: http://testyourvocab.com/?r=39955

I am not a native speaker and I was happy to get more than 10,000, really.

Most interesting to me were the words I recognized, but weren't sure of their meaning (e.g. malapropism, which turned out to mean ludicrous misuse of a word).

BTW: a nice thing about online dictionaries is they have sound files for pronunciation. e.g. http://www.thefreedictionary.com/malapropism With a plugin, you can highlight and right-click to open in another tab.

Speaking English does not expand your vocabulary. You must read to encounter the larger portion of the dictionary. Does this test select for computer-users who don't read?


If you found this test engaging, then it might be time to give this a go again... http://freerice.com/

Pretty interesting, but I do hope they do not try to extract any meaningful statistics based on this. I guarantee you 95% of the people are cheating.

I scored far lower than I thought I would, and am genuinely surprised given that I tend to write a lot and have always thought I had a decent vocabulary.

I would love to be able to compare my score to what it would have been before moving to a non-English country and learning/speaking a new language. I definitely feel that a large part of my memory is now dedicated to Japanese and not English...

No matter how many I did (I'm fine, thanks), I would love to see a similar system that estimates one's vocabulary using already existing articles.

To put aside the ego matters, I'm curious if there are any interesting correlations for writings published in magazines. For instance, between the estimated vocabulary size and the average price for ads (I bet that there is a huge correlation.)

I got 35,900, and haven't lived in an English-speaking country in 10 years. I often get frustrated at myself when I feel like I'm losing my vocab. There were a couple of words on the list that I'm sure I once knew, but couldn't conjure up the meaning on the spot, so I skipped them. I was never a sci-fi/fantasy fan, but I do like to read literature for fun.

I got ~12K, but I highly doubt the accuracy... I suspect that I know ~1K words, maybe 2K. I remember hearing that people usually use around 100 or 200 distinct words/day.

According to http://math.ucdenver.edu/~wbriggs/qr/shakespeare.html, Shakespeare used ~32K words in all of his works.

I got 17,200. I'm 22 with a Bachelor's degree and I was pretty surprised with how low I scored. Anyone in a similar category?

Same exact score. 22 years old with a Bachelor's degree. It doesn't concern me that I don't come across rare English words in my daily reading enough to know the right column. I also suspect that the average quiz-taker voted yes to words they were contextually familiar with, but couldn't give you a straight definition.

18,100 words, 24 years old, non-native speaker. I've never read a fiction book in English, only technical ones.

14,000; though English isn't my vernacular. Reading more English fiction seems to be a recurring suggestion, guess I should!

to read your post i looked up the word 'vernacular' in the dictionary

I got to be in the 1st percentile :-). But well, considering that I have never read a book in my life I guess it figures.

I moved to the US from Russia when I was 10, and my vocabulary is a pathetic 18,000 or so. Which is funny, because I consider myself pretty well-read. I usually just infer the meaning of new words through their context without looking them up so I don't feel comfortable saying I know the definition of those words.

43,300. I strongly suspect learning Latin and Greek at school helped more than a little with this particular test.

At http://testyourvocab.com/details.php#which_sample they hint at omitting a lot of Latin words:


No cognates or false-friends with Portuguese. This probably knocks out at least half the dictionary, since Romance languages have plenty in common with English. False friends need to be avoided as well, since a Brazilian beginner will see "pretend" and assume he knows it means pretender, which actually means "intend." Interestingly, the no-Portuguese rule leaves the test with a strongly pronounced short Anglo-Saxon flavor.

Even so, many of the most obscure words are potentially guessable by someone with a good knowledge of Latin and Greek. Funambulist, opsimath, hypnopompic. I don't think a classical education will be much help with cantles and williwaws, though.
