I'd be really interested in the percentiles of the non-native speakers. With an alarmingly low 10.700 words, there is not even a percentile for me... And I know that my fluency in English is at least above the median around here (edit: here = where I live).
This also shows how extremely time-consuming it is to learn a second language. I started in school at 10 years old, am moderately well educated (some college, dropped out), and use English on a daily basis. I also watch most movies in English (very rare for people in a German-speaking country), read some English novels, and most of the non-fiction books I have are in English. My Internet use is nearly English-only.
Still, I probably have the vocabulary of an average 12 year old native speaker. After 17 years of learning and using the language, and at least 10 years of that using it _daily_.
I got a score of 26,000. I'm Finnish, and actually learned French as a second language in school and English as third.
I've never lived in an English-speaking country, but English is so prevalent in Finland these days that I wouldn't be surprised if it gained some kind of official status within the next 50 years. Whether in formal meetings or informal bar encounters, people voluntarily switch to English if there's even one non-Finn present. In my field, this happens nearly every day.
I even use English to communicate with Swedes, even though Swedish is the second official language of Finland and I studied it for 6 years... There's no point in limping through the conversation with my childish Swedish, when it's 99% guaranteed that Swedes speak English.
I was completely honest and scored 16,000; English is my one and only language. Depending on your field/background you could understand more words than this test suggests. Looking up the words that I did not understand, and coupling them with their relevance and the sheer volume of test words (or lack thereof), makes this test inaccurate in my mind.
Last year I met and associated with a Finn while traveling in France. He spoke extraordinarily good English and could have passed for a native of the USA with very little accent correction (it was more the timing of his speech that gave him away). I wouldn't have thought him to be exceptional until I discovered that I was the first native English speaker he had ever met and he had never even left his country before then.
I wish I had been given language instruction of that caliber when I was in school.
This vocabulary test got me thinking about the last unknown English word I'd encountered, and that inspired a little post on my blog. Have a look if you're interested -- there's a bit of a coding angle, even:
The Oxford Advanced Learner's dictionary has a 'defining vocabulary' of about 3500 words. They claim most definitions are limited to the use of those words and proper names. So, that 6000-8000 looks believable.
Even writers of high-brow literature use a much smaller vocabulary than the one they can read. Probably more than the newspaper, but not likely 35,000 words.
For non-native speakers, I would guess that there are quickly diminishing returns past about 10,000 words. Much more important is how well you're able to use those first 10,000 (or maybe even 5,000). It's one thing to recognize a word when reading it, and another to be able to use it in a natural way, choosing words with the right shade of meaning, and avoiding awkward constructions or unintended connotations.
Note that in Chinese, character is not the same as word.
Most modern Mandarin words are bisyllabic, and represented by the combination of two characters, and their meaning does not necessarily derive in a straightforward way from the meanings of their constituent characters (for example, the word for "thing" is composed of the characters for "East" and "West" in conjunction)
Definitely. I would have liked the test to be "would you be comfortable using this word in a sentence? (if you had to)" Even as a native speaker there are times when I know the definition of a word, but opt for a simpler word because I don't know all of the implications of using it.
Right, unless you want to ruin your point entirely by stopping to define words. Communication relies on shared vocabularies, not just one person's vocabulary.
There are plenty of words I know that I generally avoid in conversation for practical reasons. Entertainment reasons, though... I caught my wife teaching our not-yet-two-years-old daughter to use the word "etiolated" when referring to a green bean that was wilted and yellowish.
I am Dutch, and the estimated vocabulary is 15,600 words. It's really humbling, but on the other hand it's surprising that one can participate in scientific discussion with such a small vocabulary. But I bet the distribution is not very similar to that of an American kid with a vocabulary of that size ;).
It's easy to notice the difference in vocabulary in practice. Native speakers' speech is much more varied.
Also, it's maybe good for native speakers to realize that someone who may seem a bit rude or dumb on the Internet may be a non-native speaker who has difficulty expressing him/herself.
This is actually not surprising since word frequencies more or less follow a power-law probability distribution. With a vocabulary of even 1,000 words you are very well covered. Anyone who engages in a discussion with words that are not very common is doing the very opposite of trying to communicate.
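A rough sketch of that coverage claim, assuming an idealized Zipf distribution (frequency proportional to 1/rank) over a hypothetical lexicon of 100,000 words. Both the exponent and the lexicon size are assumptions picked for illustration, and real text deviates from this toy model, so the printed percentages are only indicative.

```python
def zipf_coverage(known_words: int, lexicon_size: int = 100_000,
                  exponent: float = 1.0) -> float:
    """Fraction of running text covered by the `known_words` most frequent words,
    assuming word frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / rank**exponent for rank in range(1, lexicon_size + 1)]
    return sum(weights[:known_words]) / sum(weights)


# Coverage for a few vocabulary sizes mentioned in this thread.
for n in (1_000, 10_000, 30_000):
    print(f"{n:>6} words -> {zipf_coverage(n):.0%} of tokens")
```

Under these assumptions the first thousand words already cover most of what you read, and each further thousand adds less and less, which is the diminishing-returns effect described above.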
I choose words on the basis of how precisely they convey my desired meaning - which may exist at multiple levels, eg using esoteric words when talking about esoteric words. It seems to me somewhat anti-elitist of you to decry those who enjoy taking such care.
(I scored 35,300; of the words I recognized only about six were ones I don't either read or use with some regularity. I read a fair amount of literature though.)
The difference between 10,000 and 30,000 is knowing the words "uxoricide" and "tricorn". Before this test, I'd never seen the first one and the only time I've seen the second one is in reference to pirate hats.
Frankly, I'd rather have a 10,000-word vocabulary in multiple languages than my current situation.
That's for Spanish, though, which generally uses a somewhat smaller day-to-day vocabulary since it doesn't have the same weird Romance/Germanic thing that English does.
Most languages end up having about the same information bandwidth, but make different tradeoffs between the number of syllables per second and the entropy per syllable. Spanish and Japanese, for instance, tend to be spoken rapidly and have a more regular set of sounds, but languages like English or Chinese tend to be spoken more slowly and have more different sounds in each syllable. I'd imagine that low frequency languages tend to have large vocabularies that they use, but that's just a guess.
Well, in many cases it helps to know multiple languages. Case in point: uxoricide. There were quite a few other words I knew from other languages in the quiz (English is not my mother tongue).
That, and the number of words in the quiz that I know from RPGs, really surprised me. How often do you use a bludgeon? Wouldn't you use a bat instead?
Finally, there's a huge difference between active vocabulary (the words you actually use in speech or writing) and the passive vocabulary that is tested here. In my humble opinion, passive vocabulary should be tested in the context of a sentence.
My background is roughly similar to yours (German, 9 1/2 years of English in school, been living in the US for 10 months) and I got 10.900 words.
Especially seeing the second and third page surprised me as I can't recall ever hearing most of these words. My girlfriend noted that many of them are older terms, but still I expected to have picked up some through movies or books.
On the other hand: studying and working in the US with just 11k words works quite well.
I'm a French native, began English when I was 13 (German is my first "school language") although I used the basics of English before (to play some games, use DOS). The test estimates my vocabulary size to be 21,100. There is _no way_ that I know that many words. I think it's heavily biased in favor of foreigners, particularly French ones (French gives you a lot of formal English words "for free"). And even more so when you have taken 3 years of Latin and 1 of Greek.
I only got an estimate of 11500 words, yet I can still correct you :). My native language is (Swiss-)German; I spent a couple of years in Montreal, where you don't really have many chances to improve your English (at least not the vocabulary), and otherwise only used the language in written form.
You hit the nail on the head. There is a huge difference between knowing the general meaning of a word and being able to define it exactly. Simple examples: is a cow a deer? Why (not)? Is a banana a berry? Is a blueberry a berry? A strawberry?
I used the former because I thought it was what people would do (as it is easier to do; people will not want to spend 30 or more seconds on each word), and got 26600. That is below average, but close to it. I am not a native speaker, but would have expected to score higher in the general population.
I am fairly sure that this test does not reach the general population, though (by the way, it is nice to see that the test adapts to one's level. Try answering almost none of the words on page 1. I got 'my' score down to 28 by confessing to know only one of the words)
"Based on previous research, Nation and Waring (1997) estimate that the receptive vocabulary size of a university-educated native English speaker is around 20,000 base words, while Goulden, Nation, and Read's (1990) intervention indicates that the receptive vocabulary size range of college-educated native English speakers is 13,200 - 20,700 base words (Goulden, Nation, & Read, 1990), with an average of 17,200 base words."
I'm a non-native speaker and got 18,700 words, which I found rather disappointing when I read that that was, according to the site, roughly equivalent to (or even below) the average 15 year old native speaker. Thinking a bit about it however I'm quite sure that that is nonsense - I am regularly asked by native speakers to review their English texts and scholarly articles and am relatively often complimented on my broad vocabulary. When people review mine and I push them on why they suggested certain grammatical or stylistic changes, almost invariably it turns out that they are influenced by personal style preferences or local customs (as in, local to the area they grew up in). Now I'm not God's gift to the world and I'm sure that there is much to be improved on my English, but still I'm quite sure I can outperform the 5th percentile of the general population on English vocabulary knowledge. (I mean - that's people with an iq of 80 or less ffs, again not to say that I'm a genius but I find the proposition that I would score worse than most native speakers that qualify under many definitions as mentally retarded to be preposterous).
I'm highly skeptical of the site's claim that the 50th percentile knows 27k words. It seems to be from their own test takers, and I didn't find any references to other researchers' results, of which there are several.
Speaking of animals, how are names of plants and animals counted? Is that part of a vocabulary, or is that special?
I'm also not a native English speaker, and one area where I know there's a huge difference between my Swedish vocabulary and my English is when it comes to names of plants and birds and spices and animals and trees and fish and rocks and flowers and vegetables and fruits. I know maybe thousands of names of such things in Swedish, but in English I know far fewer names. That's a few thousand words I lack and probably will never learn because it's so specialized.
(Btw, bananas are berries, but cows are not deers. :) )
> (Btw, bananas are berries, but cows are not deers. :) )
Not sure if you're joking or not, but to be clear to non-native speakers: bananas are definitely not berries. Berries are smaller and rounder. For example: blueberries, raspberries, blackberries. I guess strawberries, too, but they're outliers.
Site creator here, again -- my research (small sample size) having Brazilians take the test is that a couple years of learning will get you around 1,500-3,000 words, several years around 4,000-6,000, and a good student with, say, 8 years of classes might get around 10,000.
Beyond this, it's pretty much necessary to live abroad for an extended period of time, or be exceptionally good (driven) at languages and watch TONS of TV, use tons of online chat, etc.
English is my second language and I got ~13,500 words. I was surprised that this was only half the average. I read a lot of scientific papers in my field and almost never come across a word I don't know. Writing technical papers well also doesn't require much vocabulary (it's more about clarity of expression). I believe to get beyond that number you simply have to read lots of books. I notice lots of words I've never heard of when reading, say, Bill Bryson but I very rarely bother to look them up because they usually make enough sense from the context.
But the average 3-year old Chinese speaks infinitely better Chinese than I do (I can't even pronounce 'ni hao' correctly I'm sure, despite my colleagues' kind praise of how I say it very understandably); I don't see how that's humbling. Language knowledge is imo rather useless; languages are a very inefficient idea transfer mechanism, and the amount of energy we collectively waste on translations and learning languages is staggering. I see languages as an unfortunate side-effect of humans' poor natural communication traits, and I hope that we can do away with most of them asap.
That would be rather hard, even if we had neural interfaces - I hope to do away with most languages so that we have only a few (ideally one) left, and hopefully those wouldn't be too dissimilar either.
I'll admit though that that point was rather tangential to the post I was replying to.
I'm a native speaker. I read a lot. Fiction, nonfiction, literature, etc. I've got a master's degree in CS. I'm a published author at multiple respected conferences. I've been learning Spanish as a second language for ~2 years.
I scored about 33,000. I don't think the distributions are very accurate. I can't speak to where you should be, but I think you have an excellent score. I hope this point of comparison helps.
I also got 33K, ~80th percentile. But, I got 98, 99th percentile on the Verbal section of my GREs (which has a lot of vocabulary, though it isn't only vocabulary). I find it hard to believe there's that big of a gap, though I suppose a lot of foreign applicants could drag down the GRE average, pushing me up. But, it's also possible that people were liberal with their claims to understand words.
The biggest difference between natives and non-natives is the amount of "passive vocabulary". Sure, we non-native English speakers can express ourselves well enough, we can participate in discussions, even on a high scientific level, as long as we're familiar with the problem domain. But because we've (I'm generalising, of course) read fewer English books, been exposed to the lingo for a lot less than natives, we're limited in our passive vocabulary. Having been exposed to a language for every waking hour of your lifetime doesn't just enable you to speak it better, it also teaches thousands of words you never use, but nevertheless know the meaning of.
On a side note: this makes me feel better about my 25k score ;-)
11.900, Norwegian. Worked in a multinational company where I spoke/wrote English daily for several years, and read books in English regularly. I guess this explains why I have to use the dictionary in the Kindle a lot, and probably should use it more.
My English is good. I can write emails, communicate with clients, write articles and blog posts. However, I noticed that I write only about IT-related things. If you learned English for a purpose, then you are going to have a rather smaller vocabulary, but it'll be dedicated to the purpose you learned it for.
I got 5,490 words. Maybe my vocabulary is larger, but it's not broad: it targets specific fields.
While the exercise is interesting in that it is humbling for non-native speakers, it does seem dubious. I repeated the test 3 times, and got widely different results (10400, then 16700, then 18300). Since they are all quite outside the reported quantiles for native speakers, high variance may be expected for non-native speakers, but that means the numbers are not very useful as is.
I can't confirm your observation: my first attempt got me 11,500 words, the second 11,800. Of course I didn't check the words that I learned from the first run (I used a dictionary to verify whether my understanding was correct when in doubt, and to find out what some of the words mean that I didn't know).
I got 30800, and I am a native Dutch speaker. Though I have to say that I consider English my co-native language. Most of the media I consume is in English, and in my work environment (IT) English is the default language.
But most of the words in this test I knew from reading books, as it is one of the few places where using a very wide vocabulary is not frowned upon.
Spanish here. Got an estimated vocabulary size of 17,700 words. I lived and worked in the U.K. for a couple of years. I think I might be in a better position than some of you when it comes to words derived from Latin, as they are almost always the same in Spanish. Leaving those out, I am sure I would get similar results.
I got 37,300. They claim this is not quite 95th percentile, which I am a tad skeptical accurately represents my vocabulary-size percentile relative to the general population. Perhaps this survey is being forwarded around unusually literate people at the top end, or more than 5% of responders are cheating. Where are the fake words to catch cheaters? I Googled a lot of what I didn't recognize, and everything I checked was real.
I just counted, and you've used at least 12685 dictionary words in HP:MOR. I was expecting it to be more to be honest, but it didn't seem right not to post just because it didn't match my expectations.
It certainly seems unlikely that someone who can produce an excellent 500,000 word work of fiction (aside: thanks very much by the way,) in addition to reams of technical writing, has a vocabulary not in the 95th percentile of the population. OTOH, HP:MOR has fewer words in it than I expected, and even the upper bound of 14795 seems low. Maybe the working is wrong; it's shown below.
There's a huge difference between writing something targeted at a selected audience, in a given time period, limited range of topics the characters will discuss, etc. and listing out all of the vocabulary words you know personally across all domains of knowledge. Even though HP:MoR Harry has a broad vocabulary, for example, and may use some words not all readers will know, there are still tons of topics (with their own specialized vocabulary...) that will never come up in the written storyline -- even if Harry would know them well.
Harry also presumably does some practical limiting of his vocabulary in conversation, because only shared vocabulary is useful if you're trying to actually communicate and don't want to stop to give definitions all the time.
It might be more interesting to compare the unique word count of HP:MoR against some of the "real" Harry Potter books, if you can get your hands on the text.
Thanks to the internet†, I can reveal a surprise: with the same methodology, the dictionary words in the seven real Harry Potter books concatenated is 19,245 and the total unique words is 21,441.
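For anyone who wants to try a count like this themselves, here is a minimal sketch. The exact methodology behind the figures above isn't spelled out, so everything here is an assumption: text is lowercased, words are runs of letters (with internal apostrophes), and "dictionary words" are whatever appears in a word list you supply. The function name and the toy data are made up for illustration.

```python
import re


def vocabulary_counts(text: str, dictionary: set[str]) -> tuple[int, int, int]:
    """Return (total words, unique words, unique words found in `dictionary`)."""
    # Lowercase, then treat runs of letters (allowing internal apostrophes) as words.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    unique = set(words)
    return len(words), len(unique), len(unique & dictionary)


# Toy input; a real run would read the book text and a full word list from files.
sample = "The owl saw the cat. The cat saw Hermione's owl!"
toy_dictionary = {"the", "owl", "saw", "cat"}
print(vocabulary_counts(sample, toy_dictionary))
```

Note that this counts inflected forms ("cat", "cats") as separate words, so it is not directly comparable to a vocabulary test that counts base words.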
Total word count is 1,122,131 which is longer than HP:MoR by a factor of three. Plotting mean unique word count for the whole, halves and quarters of MoR gives a fit of uniques=168*length^0.3357, which makes sense given Zipf's law. That formula predicts about 18,050 words for a work of the same length as the original HP.
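The fitted formula is easy to check numerically. A minimal sketch using only the constants quoted above (the function name is made up for illustration); a power-law relation of this shape between text length and unique words is usually called Heaps' law.

```python
def predicted_uniques(total_words: int, k: float = 168.0,
                      beta: float = 0.3357) -> float:
    """Unique-word count predicted by the quoted fit: uniques = k * length**beta."""
    return k * total_words**beta


original_hp_length = 1_122_131  # total word count of the seven books, as quoted
print(round(predicted_uniques(original_hp_length)))
```

The result lands close to the 18,050 quoted above, which is noticeably below the 21,441 unique words actually counted in the original books.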
(Edit to add obvious test in the other direction.) The first 386,829 words of the original HP contain 12,255 unique words. The last 386,829 words contain 13,635 uniques. So, it's comparable but perhaps slightly more varied (MoR had 12,685).
In light of those figures, is it possible Eliezer's vocabulary is less good than he thinks (Dunning-Kruger)? Especially as the Harry Potter books were written for children and presumably edited as such.
On the other hand, the fact that Eliezer seems to have used fewer words in his writing than you'd expect if his vocab was excellent doesn't mean that his known vocab is poor — he might just not use all the words he knows in writing.
Additionally, given the success of J K Rowling as an author, you might expect her vocabulary to be excellent, so it is conceivable that he's good and she's better.
† I have all the Harry Potter books on a shelf at home. Is torrenting the pdfs at work so I can word count them infringing copyright? I could have done it manually, it just would have taken longer.
I thought Eliezer's Lesswrong sequences might give different results. Applying your tests to those (from http://jb55.com/lesswrong/), I get 257,646 total words, 11,666 unique dictionary words, and 12,721 unique words (I'm surprised there aren't more unique words, given that the quantum physics sequence is in there).
168*257,646^.3357 = 11,010, so the sequences seem to be at about HP level.
Excellent work, by the way; thanks for the analysis.
Site creator here -- you're right, survey participants are incredibly literate. I suppose that's Internet users in general, disregarding YouTube commenters :), or else the particular people who have spread the test, or are interested in taking it. Average verbal SAT score on the site is 700 (out of 800), far above the population's average of around 500.
Right; most people with "normal" or worse vocabularies are emphatically not going to see a "test your vocabulary" link and say "Hey, instead of doing something fun, let's see how poor my vocabulary really is! Then I'll tell all my friends!"
And when they see some egghead friend on Facebook has posted their vocabulary score and is challenging them to respond... they'll roll their eyes, and move on to their Farmville updates.
Don't get me wrong -- I love these things, and it came back with 37K for me -- but there's no way I'm posting that score, or even the link, to Facebook. I know how to maintain friendships, and saying "look how smart I am; I'm probably smarter than you" does not figure into it.
I can't imagine that the percentiles are reflective of the general population...I got 27,000 which it claims is the median score, and from practical experience my vocabulary is quite a bit larger than most people's. If this is measuring, say, the HN crowd, then perhaps it's accurate.
I found the same thing. I suspect a combination of the early respondents being quite a bit above average and the possibility that some people are checking words they don't actually know, or simply believe they know a correct definition for.
I think it would be a good idea to weight the survey with some test questions that ask if you know a definition to some of the less common words and then ask you to pick a correct definition from a list of 5 with 4 incorrect answers. At least this way they can approximate how much someone may exaggerate their knowledge.
However as someone who answered as honestly as I could (without spending the time to verify my definition of each word) it is cool to know what my personal vocabulary is.
Likewise. I answered honestly and got a little over 22k. I've always tested extremely well on verbal portions of standardized tests and feel that my vocabulary is well above normal but this would put me at about the 20th percentile.
Let's think about it logically. If you were to actually test the users instead of asking them 'which of these do you know?', the ONLY option is multiple choice since analysing text-field input from the user to determine if their definition is correct is at best extremely difficult, and more than likely impossible ... right ?
I agree that the quality of the result is highly dependent on the honesty of the person tested, but with multiple choice, wouldn't you be able to deduce the meaning of the word from the choices you are given?
E.g.: One of the words I encountered in the test was 'terpsichorean'. While I knew that Terpsichore is one of the Muses, I did not know which one and left the box unticked. Had there been multiple choices, I might have guessed the correct solution.
True, but it's still much better than just relying on people submitting accurate results without even testing them ... at least with multiple choice everyone is being tested to approximately the same metric, and you can get a more accurate percentile
I ticked 'terpsichorean', because it made me think "dreamy travelogue writing of a scenic beach with either terpsichorean sky or sea, it means X looks a pale shade of blue-green".
Google tells me it means dancing so I was way way off (maybe conflating turquoise and cerulean?).
But I have no way of knowing how many words that I feel comfortable defining are actually nowhere near correct, so to be any kind of accurate, they need to do some verification of correctness. All 'honesty' means is 'don't deliberately cheat' not 'don't be dumb'.
I had "discomfit" as one of the words whose definition I wasn't clear on; I was pretty close when I looked it up but couldn't have guaranteed it. It's probably easily confused with discomfort ... which made me think that this needs to be a little more tested. Commonly misread words could easily inflate scores.
However, I think a multiple choice test could also inflate scores unless the definitions were very cunningly constructed.
I scored 75-80th percentile (32,800) which surprised me. It seems quite a lot of words, for one. For another I consider my vocab' to be very good and I don't think I'm being bigheaded in that. Ergo I expected to be ranked higher.
On the second page there was an entire column of words of which I recognised only three sufficiently to provide a guaranteed accurate definition. One of that column was terpsichorean, another tatterdemalion.
Whilst looking up tatterdemalion I found little use of it after the 1930s except as a proper noun (a Marvel Comics character for example). What I did find however is that Google Books is useless for finding dates. One citation from an author Sir Edward Bulwer Lytton is given a date of 1999. That's a reprint date, the author died in the 19th century.
Eliezer, I think you appreciate some of the ideas behind the FAQ I'll repost here with adaptation to the current situation:
VOLUNTARY RESPONSE POLLS
As I commented previously when we had a poll on the ages of HNers, the data can't be relied on to make such an inference. That's because the data are not from a random sample of the relevant population. One professor of statistics, who is a co-author of a highly regarded AP statistics textbook, has tried to popularize the phrase that "voluntary response data are worthless" to go along with the phrase "correlation does not imply causation." Other statistics teachers are gradually picking up this phrase.
-----Original Message-----
From: Paul Velleman [SMTPfv2@cornell.edu]
Sent: Wednesday, January 14, 1998 5:10 PM
To: firstname.lastname@example.org; Kim Robinson
Cc: email@example.com
Subject: Re: qualtiative study
Sorry Kim, but it just aint so. Voluntary response data are worthless. One excellent example is the books by Shere Hite. She collected many responses from biased lists with voluntary response and drew conclusions that are roundly contradicted by all responsible studies. She claimed to be doing only qualitative work, but what she got was just plain garbage. Another famous example is the Literary Digest "poll". All you learn from voluntary response is what is said by those who choose to respond. Unless the respondents are a substantially large fraction of the population, they are very likely to be a biased -- possibly a very biased -- subset. Anecdotes tell you nothing at all about the state of the world. They can't be "used only as a description" because they describe nothing but themselves.
I think Professor Velleman promotes "Voluntary response data are worthless" as a slogan for the same reason an earlier generation of statisticians taught their students the slogan "correlation does not imply causation." That's because common human cognitive errors run strongly in one direction on each issue, so the slogan has to take the cognitive error head-on. Of course, a distinct pattern in voluntary responses tells us SOMETHING (maybe about what kind of people come forward to respond), just as a correlation tells us SOMETHING (maybe about a lurking variable correlated with both things we observe), but it doesn't tell us enough to warrant a firm conclusion about facts of the world. The Literary Digest poll is a spectacular historical example of a voluntary response poll with a HUGE sample size and high response rate that didn't give a correct picture of reality at all.
When I have brought up this issue before, some other HNers have replied that there are some statistical tools for correcting for response-bias effects, IF one can obtain a simple random sample of the population of interest and evaluate what kinds of people respond. But we can't do that here on HN, nor can we for the online vocabulary estimation.
Another reply I frequently see when I bring up this issue is that the public relies on voluntary response data all the time to make conclusions about reality. To that I refer careful readers to what Professor Velleman is quoted as saying above (the general public often believes statements that are baloney) and to what Google's director of research, Peter Norvig, says about research conducted with better data: that even good data (and Norvig would not generally characterize voluntary response data as good data) can lead to wrong conclusions if there isn't careful thinking behind a study design. Again, human beings have strong predilections to believe certain kinds of wrong data and wrong conclusions. We are not neutral evaluators of data and conclusions, but have predispositions (cognitive illusions) that lead to making mistakes without careful training and thought. Here, the conclusion "those other guys are cheating and that dragged down my vocabulary percentile score" is an example of a conclusion resulting from human predispositions.
Another frequently seen reply is that sometimes a "convenience sample" (this is a common term among statisticians for a sample that can't be counted on to be a random sample) of a population offers just that, convenience, and should not be rejected on that basis alone. But the most thoughtful version of that frequent reply I recently saw did correctly point out that if we know from the get-go that the sample was not done statistically correctly, then even if we are confident (enough) that HN participants are young or that their vocabularies are large, we wouldn't want to extrapolate from that to conclude that the users of any technology site are young, or that respondents to online surveys as a whole have large vocabularies.
On my part, I wildly guess that most HNers are younger than I am in part because this kind of poll recurs often on HN. I similarly guess that participants in online surveys of vocabulary size are likely to have larger vocabularies than average people in the general public because most people I meet find discussions of word meanings boring. But neither guess gives me a good quantitative basis for estimating how much users here differ from the general population.
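To make the self-selection point concrete, here is a small simulation sketch. Every number in it is invented for illustration (the population mean and spread, and the rule linking vocabulary size to the chance of responding); it is not a model of the actual site, only a demonstration that percentiles computed from volunteers can sit well above those of the population they came from.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: vocabulary sizes roughly normal around 20,000 words.
population = [random.gauss(20_000, 6_000) for _ in range(100_000)]


def chooses_to_respond(vocab: float) -> bool:
    """Assumed self-selection rule: larger vocabularies respond more often."""
    probability = min(1.0, max(0.0, (vocab - 10_000) / 40_000))
    return random.random() < probability


respondents = [v for v in population if chooses_to_respond(v)]

print(f"population median: {statistics.median(population):,.0f}")
print(f"respondent median: {statistics.median(respondents):,.0f}")
print(f"respondents: {len(respondents):,} of {len(population):,}")
```

Even with tens of thousands of respondents, the respondent median overshoots the population median, so a large sample does nothing to repair the bias.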
I'm questioning the way respondents are classified.
I chose 'Canada' as my region since I'm from Montreal. My first language is English and I'm fluent in French. I did the first half of elementary school in French. Firstly, non-Quebec anglophones tend to have better grammar and larger vocabularies than anglophone Quebecers. Secondly, the classification doesn't take into account that English can be a 3rd language. Most immigrants to Quebec are required by law to attend French language elementary and high schools (there are exceptions). Immigrant children whose first language isn't English or French (the majority) take on two new languages, English being their 3rd after French. English tends to be the social language for many.
Montreal has a strong tech industry employing bilingual/multilingual people, many of whom read HN and possibly took part in the survey. My gut feeling is that English speaking Quebecers are skewing the stats. More granular control over region would be useful and would give some insight into this reality.
Note: I traveled through China and south east Asia last year and found the quality of English to be much better than I expected. Even though Indochina was ruled by the French, I didn't find a single person who could speak French. To possibly classify any country as "non-English-speaking" is kind of silly. Every country is "other-English-speaking" but then again it's a subjective classification, isn't it? Doesn't China have the largest English speaking population now...
By contrast, I scored 23,300 putting me on par with the average result for a 15 year old, if this test is to be believed. I was brutally honest about not selecting words I only recognised and might be able to infer meaning from in context, but couldn't articulate a clear definition for (of which there were a surprising number).
And yet I look at the vocabulary used in the comment posts of those claiming high scores, and wonder how they ever scored so highly (accepting that comments are not necessarily reflective of one's general writing or vocabulary). I believe that, as this test is so open to cheating, using responses to it as the corpus for determining median scores renders the entire exercise completely meaningless.
For those of you concerned with your score, you are devoting an undue amount of your time to discussing the results of what is, let's be honest with ourselves, the literati version of an "Are U A Vampire Or A Werewolf?" quiz.
P.S., 36,700 .. I took this before it got a lot of general circulation, and my standings have improved considerably. I suspect this makes me more worthy of oxygen.
I strongly doubt that their methodology has any validity at all for non-native speakers. Extrapolating from a selection of 60-80 words presupposes a relatively normal developmental history; otherwise one would not be able to draw the primary inference at work here, namely that somebody who knows a definition for "mawkish" knows definitions for all words of similar difficulty and frequency.
Atypical language acquisition (e.g., as a second language, or through a non-standard channel like technology or fantasy literature) disrupts this extrapolation step. For instance, a German programmer that knows the word "polymorphic" via OOP is less likely to know similarly frequent and difficult but programming-unrelated words than British or American peers. So adding, say, 100 to the total would be utterly unfounded. Same thing for a science-fiction nerd: Acquaintance with obscure words from one domain doesn't extrapolate to other domains.
Unless they somehow control for domain specificity and atypical acquisition, let's not get too frustrated. (Disclaimer: Not a native speaker -- result around median.)
Hi, creator of the site here -- I worked for several years teaching English to foreigners, and my own (informal) statistical research has found that native and non-native language acquisition, while certainly somewhat different, is not tremendously different. I've run this test on Americans and Brazilians at all different levels, and the size of the progression from known to unknown words is rather consistent.
That being said, cognates between languages can give an artificially inflated score. I intentionally avoided any words with the same roots in English and Portuguese (my second language), which should hopefully also be true for most Romance languages. This way, you shouldn't be able to "guess" meanings you've never actually learned. However, it wouldn't surprise me if German speakers, for example, were able to "guess" an additional number of words correctly.
Also, the 60-80 words are only what you are tested on -- the word selection on pages 2-4 changes depending on your answers on the first page, so it attempts to narrow down your vocab knowledge "at the margin".
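For anyone curious what narrowing down "at the margin" could look like, here is a minimal Python sketch. It is not the site's actual code, which isn't described in this thread. It assumes a hypothetical list of 45,000 words ranked from most to least common, a test-taker with a perfectly clean cutoff (every word up to some rank known, none after it), and 40 words per page. The names `bracket` and `two_stage_estimate` are made up for illustration.

```python
def bracket(ranks, answer, lo, hi):
    """Tighten (lo, hi): lo = highest rank answered 'known', hi = lowest answered 'unknown'."""
    for r in ranks:
        if answer(r):
            lo = max(lo, r)
        else:
            hi = min(hi, r)
    return lo, hi


def two_stage_estimate(answer, total=45_000, per_page=40):
    """Return (coarse, refined) vocabulary estimates from two pages of yes/no answers."""
    # Page 1: ranks spread evenly over the whole (assumed) word list.
    step = max(1, total // per_page)
    lo, hi = bracket(range(step, total + 1, step), answer, 0, total + 1)
    coarse = (lo + hi) // 2

    # Page 2: chosen from the page 1 answers, spread evenly inside the gap
    # between the last known rank and the first unknown rank.
    step2 = max(1, (hi - lo) // per_page)
    lo, hi = bracket(range(lo + step2, hi, step2), answer, lo, hi)
    refined = (lo + hi) // 2
    return coarse, refined


# Assumed test-taker who knows exactly the 23,300 most common words.
coarse, refined = two_stage_estimate(lambda rank: rank <= 23_300)
print(coarse, refined)
```

Under these assumptions the first page can only pin the cutoff down to within one page-1 step (1,125 ranks), while the second page shrinks that to one page-2 step (28 ranks). Real answers are noisier than a clean cutoff, so a real procedure would need to tolerate known words above the margin and unknown words below it.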
Non-native speaker here. I found out that in almost all cases I knew a synonym of the words I didn't know. The problem is that the synonyms I know are the ones that are used very frequently, so I am rarely exposed to the other forms. While I know high literature in German, I must admit I have only read non-fiction in English so far. This is really a shame and I am going to change this.
This test made me aware of that fact, regardless of its scientific accuracy.
I back you up.
Non-native English speaker here. My result: I know 10,600 words. I really doubt it. I have an English dictionary that contains about 4,000 words and I don't think I need another dictionary. Even counting my computing terminology, 10K words is still way beyond me.
I'm not sure it's valid for native speakers, either.
Your basic vocabulary will be the same as most other speakers because we have structured learning in our schools.
But the less-well-known words are generally learned in context... So if you haven't happened across them in a book, you probably don't know them... But there are probably many others that you do, instead. This test could easily hit a bunch you don't know and totally miss all the ones you do.
"Don't check boxes for words you know you've seen before, but whose meaning you aren't exactly sure of."
This is a bewildering instruction to me, since I've learned most of my vocabulary through reading, and rarely look up the definition of a word, instead learning its meaning through repeated exposures to its use in context.
Take for example "garron" -- like many of us I've been reading GRRM lately, so I've seen the word used some 78 times in the past few months, and I'm sure I've encountered the word a few dozen times before. I know it's a slightly undesirable horse of some kind. Likely this means it's a gelding or a small pony-ish horse. Do I need to have looked up and remembered the three specific submeanings of the word, or that it's a specific breed of horse from Galloway to be able to say I am "exactly sure of" the word?
I don't think that's how language works, but it's how this test seems to want it to work. My score of 35,300 is suspect on multiple levels.
Something often missing from vocabulary tests -- and arguably more impressive than raw, crystallized vocab -- is the ability to deduce the meaning of new words on the fly, whether from context or absent any context (in which case, the meaning is ascertained from roots, prefixes, suffixes, etc.). I'd love to see a test of this skill.
For instance, I doubt many people encounter the word "reck" in their daily lives. But I'd give props to someone whose brain would quickly draw the connection to the more frequently seen "reckless," thereby deducing that the word had something to do with caution or concern.
Similarly, your point has some validity. Most of the time, we don't acquire language through reading the dictionary; we acquire it primarily through context in the course of reading or conversation. This is why, when pressed to define words, we'll often reach for a string of synonyms, or else provide usage in sample sentences. I bet few people here, let alone anywhere, could render dictionary-acceptable definitions of 99% of the words they know.
(On the flipside, this is also why we forget most of the words we crammed in preparation for the SAT back in the day; we learned them completely out of context and in an artificial way).
lots of people here are saying they scored lower than what they expected, and that maybe other people cheated. that could be it, but it could also be that hacker news folks tend to be overconfident. this would match the stereotype of this group being mainly male nerd entrepreneurs, which could score worse on things like this but perceive themselves to score much higher (a feeling not a fact backed by studies that i can remember). who knows; just voicing this thought since no one has mentioned it yet.
I got just under the median on this test, but I scored around the 99th percentile in the SAT verbal, and I feel it's reasonable to say the two test approximately the same things. It seems unlikely that I've slipped so far in just two years :)
I never really considered Hacker News to be full of overconfident people. If anything, to me being part of HN is a humbling experience. It reminds me that there are so many people out there that are smarter or more experienced than me, and I've seen other people say similar things.
Hacker News is _very_ full of over-confident people, believe me.
During a discussion of 'whether open source contributions were being overly important to job seekers' a while back, a surprising number of HN commenters automatically put themselves in the role of employer, peering dubiously over their glasses at me. Many of these commenters were hilariously under-qualified to be taking on that kind of role with respect to anyone.
I don't think SAT verbal is nearly as intensely focused on obscure vocab. I didn't do it, but did do a GRE verbal back in 1994 to get into grad school. For what it's worth, my score on this and my GRE verbal were quite consistent.
I'm pretty much in the same boat. This scored me below median, but I scored in the 99th percentile on the SAT verbal as well (although that was a decade ago for me).
I think one contributing factor is the structure of the questions. If you ask me if I KNOW at least one definition for the word mawkish, for example, I'll choose no, I don't know a definition of it. I do however have a good enough feel for the word that I can almost guarantee I can get an SAT-style analogy with it correct, or if you gave me a multiple choice selection of definitions I can probably pick out the right one. I don't consider either of those skills equivalent to actually knowing a definition of a word.
Same with me (SAT)... It says I have below the median vocabulary. They should have at least had some sort of base standard to compare the voluntary data with so that they could measure their sample drift.
i scored just over 90th percentile and i promise i didn't lie. it seems to me that a lot of the words were very old-fashioned - they were the kind of words i learnt from context while reading books by people like dickens. they're not words i would use in normal conversation, or when writing.
anyway, what i'm saying is that i suspect sat verbal is testing something quite different - isn't it much more aimed at ability to use the language rather than whether you can recognise obscure words?
[what surprised me the most was how graded it was - on both tests there was a pretty clear cut-off point where the words became unknown. i didn't expect it to be that ordered.]
That's possible, even likely. Another explanation would be that tests like this tend to be biased towards people with very evenly distributed interests, preferably with a literary, classical bias or background. Someone who knows 10,000 words from one or two specialized disciplines will always score worse than someone who diligently read through all the high school text books and topped it off with some gutenberg.org.
I doubt that many people are cheating, but the "percentiles" are basically meaningless because the survey obviously self-selects for 1) people who are on the Internet, 2) people who are on Internet sites where this survey will get reposted, 3) people who suspect they have pretty good vocabularies (who is going to take a non-mandatory 'test' that doesn't even earn them any Facebook Credits to inform them they aren't so bright), etc.
End result: substantial inflation of scores relative to what you'd see if you gave the same survey to the general population (and assuming everyone is relatively honest).
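The inflation argument is easy to see in a toy simulation. The Python sketch below uses entirely made-up numbers: a population whose vocabulary sizes are roughly bell-shaped around 20,000, and an assumed rule that the chance of volunteering for the quiz grows with vocabulary size. Nobody cheats in this model; the only effect is who shows up.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up population: vocabulary sizes roughly bell-shaped around 20,000.
population = [max(0.0, rng.gauss(20_000, 6_000)) for _ in range(50_000)]


def takes_quiz(vocab):
    """Assumed self-selection: chance of taking the quiz grows with vocabulary."""
    return rng.random() < min(1.0, vocab / 40_000)


volunteers = [v for v in population if takes_quiz(v)]

print("population median:", round(statistics.median(population)))
print("volunteer median: ", round(statistics.median(volunteers)))
```

With everyone answering honestly, the volunteer median still comes out above the population median, so a percentile computed against volunteers understates where a given score would fall in the general population. How large the gap is depends entirely on the assumed participation rule, which is a guess here.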
Just as a test, ticking all the boxes scores 45,000 words. Which seems to indicate that they haven't seeded the quiz with fake words to weed out cheaters. Pity, since there was an opportunity to unbias it in at least one dimension. (I also tried deselecting just 1 of a few of the really tough words: each one caused the score to lower.)
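Seeding with fake words could be as simple as the Python sketch below. This is a hypothetical illustration, not something the site does: the real words are ones mentioned in this thread, the decoys (`blentish`, `quorvel`, `plorvent`) are invented here as stand-ins for non-words, and the function names and zero-tolerance threshold are assumptions.

```python
REAL_WORDS = {"mawkish", "garron", "reck", "leal", "clerisy", "turpitude"}
DECOYS = {"blentish", "quorvel", "plorvent"}  # invented non-words for illustration


def score_response(checked):
    """Return (real words ticked, decoys ticked) for one quiz response."""
    checked = set(checked)
    return len(checked & REAL_WORDS), len(checked & DECOYS)


def looks_unreliable(checked, max_decoys=0):
    """Flag a response that claims to know more decoys than we tolerate."""
    _, decoy_hits = score_response(checked)
    return decoy_hits > max_decoys


tick_everything = REAL_WORDS | DECOYS
careful_reader = {"mawkish", "turpitude", "leal"}

print(looks_unreliable(tick_everything))  # every decoy ticked
print(looks_unreliable(careful_reader))   # no decoys ticked
```

A flagged response could be dropped from the percentile data rather than shown an error. This only catches indiscriminate ticking; someone who ticks only the words that look plausible would pass.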
Site creator here -- you're right, there's no cheating detection. I ask people not to fill out the survey results if they're not being entirely truthful, but of course they are free to disregard the instructions.
But I created the site less interested in absolute vocabulary size numbers (these vary widely depending upon methodology), and more in relative changes among age groups, SAT scores, etc. And hopefully, cheating would not be correlated with any of those...
But at the end of the day, this is not a controlled, scientific survey. It is a voluntary quiz, though I am doing my best to control for other factors.
I'm really, really hoping that you're logging referrer data along with scores - if you're not already, you might consider starting. Even if this is not a source of proper scientific data, it would be highly interesting to see how, for instance, the Reddit referrals do vs. HN vs. Facebook, etc. We can probably all make some fairly accurate guesses about where the high and low performers would come from, but I'd be real curious to see the exact breakdowns.
Obvious point: the only people interested in finding out their scores will be the kind of people who think their vocab is something worth competing on. There's no way this is a fair sample across all English-speakers.
And yet not a slythy tove in sight, gyring or otherwise.
The thing that always surprises me with this sort of test is how many words I half-know: I recognise the word, I have a (correct) general sense of what it means, but either I couldn't articulate an accurate definition or the definition I would give is an uncommon usage and I hadn't come across the more common meaning(s).
Following the rules strictly (so I didn't tick the half-known words), I came in slightly below average for my age, assuming the trend continues beyond the 32 years that their table currently gives.
For comparison, if I also included the words I half-knew, I gained nearly 4,000 new words and went up about 20%.
ObScore: 36000, 32yo male US-english native speaker. There were a few more I'd seen but wasn't really sure enough to define. I maxed out the SAT verbal section (and got 1 wrong on math, which was enough to drop to 780/800) back in the 1990s.
Don't care too much for percentile and age stats at the end.. clearly doesn't represent general pop.
What IS interesting tho, from a language learner's perspective, is the vocab size estimation. A metric a lot of us use as a rough benchmark of vocab needed for fluency in a foreign language is 10,000 words.
Comparing this with what an educated adult native speaker knows in their own language (using my own truthful score of 24k) is pretty interesting.
Would love to have something like this to quickly gauge my vocab in other languages!
The psychology of these things is interesting to me. My reflexive reaction was, of course, "I have to know!" and then my immediate counter-reaction was "This is just intellectual phallometry and is ultimately of no consequence to me."
Of course, I very quickly rationalized away the counter-reaction and took the test anyway, and then considered sharing it with my friends. What drives this?
I scored 38500 - seemed to be a test that would be helped by reading a lot of older fantasy literature, where 'terpsichorean' and 'turpitude' (to give a couple 'terp' examples that spring to mind) are the sort of words that authors like Jack Vance liked to wheel out in order to create a mood.
I'm not sure that the people pointing to the failure to correlate with the SAT are adding much; I don't think the SAT really goes all-out on the more flowery bits of archaic vocabulary in the way that this test did.
My 3rd grade son got 10200, and enjoyed discussing the words he didn't get. I think every 3rd grader should know "mawkish". :-)
37,100 I'm ashamed I didn't do better. I'm considerably older than the average HN reader. I did degrees in 4 different subjects (mind you, I was classed officially as retarded at my high school - in the same classes as the arsonists).
So no-one should feel the score is that important. I'm a very mediocre programmer. I'd much rather halve my vocab score to double my maths ability.
I had a phase around 7th through 10th grade where I thought learning lots of vocabulary would make me smarter, especially words others didn't know well. (And so I'd use them in English essays for Extra Points since your grades are often determined by how little sense you make, because if the reader doesn't understand it obviously it's too smart for them!) I also had a general grammar nazi-ism.
Anyway, I think this exchange kind of tipped me over the edge to stop caring. (Of course that's led to forgetting a lot.)
William Faulkner, on Ernest Hemingway: "He has never been known to use a word that might send a reader to the dictionary."
Hemingway: "Poor Faulkner. Does he really think big emotions come from big words? He thinks I don't know the ten-dollar words. I know them all right. But there are older and simpler and better words, and those are the ones I use."
Of course, having some background in French and Latin probably helps for inferring a few words.
Ah, cool, I know 80 English words. There must be something utterly wrong with how this test works in Opera Mini: clicking on continue on page two brought me to page one, and going back and clicking continue again gave me just a subset of the choices from page one... (at least I guess it is 80, the number was displayed right over the middle of the word "words" in the result captcha).
Just from getting some friends to do this it seems to me the median score overall and the median score for each age are a bit inflated. Just my thoughts, but I think people aren't being 100% truthful. Although I may just have a poor vocabulary http://testyourvocab.com/?r=36208
"You will never become proficient in a foreign language by studying vocabulary lists.
Rather, you must hear and speak (or read and write) the language to gain proficiency.
The same is true for learning computer languages."
Coincidentally, I just happened to come across this quote in Peter Norvig's "Paradigms of Artificial Intelligence Programming".
I was actually a bit surprised, not by the specific number of words (that seems reasonably fair given some statistics posted above on this thread), but on the percentage ranking. I'm a doctoral student whose vocabulary score was in the 96th percentile on the GRE and I knew every word when I took the WAIS-IV. Here, my score of 28,300 is just a bit above average. Either people are lying (possible) or this curve is clearly not a normal curve representative of the population (most likely). I'm a pretty avid reader, even though most of the older authors like Dickens bore me (thanks ADD!), but the person that came up with those words is possibly the most voracious reader I have ever met.
I also just realized that taking the test primed me to write in a way more intelligent-sounding manner than I usually do. Not an LOL in sight!
I realise that the ego-stroking scoring is the driver behind this site's popularity, but I would also like to see definitions of words that I missed. It's pretty daunting to spend hours copying and pasting words into Google (well, ahem, maybe it is for some of us!)
I felt the same. Also there were some words I thought I knew what they meant, but my definition would only be partially correct.
Also some definitions you can guess at - do these count as part of your vocabulary or not? E.g. Clerisy, can be easily deduced to be the class of clerics, but not sure if I'd be able to say if it was a real word or not.
I think this test is not very telling for NNSs as it doesn't consider specialist vocabulary which many of us have a lot of, because of how we /really/ learned the language.
When I left school my English was very average. When I started communicating by email with people from all over the world, but mostly the US, it improved a lot in 1-2 years.
When I first went to a congress in the US in my mid-20s I was blown away by my aptitude to communicate in that language.
But these were all people from my field. What I'm saying is that the distribution of words pertaining certain subjects in my vocabulary is severely skewed by the field I work in -- visual effects (and IT).
I believe this goes for many NNSs.
>the distribution of words pertaining certain subjects in my vocabulary is severely skewed
You clearly have a very good vocabulary IMO. However, if I may, I think it should be "pertaining to certain subjects". It can be used without the "to" but sounds a bit conceited to this native speaker's ear.
I was intrigued by your estimation procedure and was thinking of a way of playing around with it:
(1) Create a random bit vector of size 45,000.
(2) Pick 40 positions in the vector at random and use those positions to estimate the proportion of ones in the original bit vector (so far this is easy).
(3) Select an additional 120 positions and use the more sophisticated procedure to refine the estimate from part (2).
I was wondering if you have code or pseudo-code for how you implemented part (3), specifically how you choose the 120 words for extra testing. It seems there are a lot of different natural ways you can do it. Did you write an academic paper about this?
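Steps (1) and (2) are straightforward to try out; a minimal Python sketch follows. Since the thread doesn't say how the site actually picks the extra 120 words, step (3) here is only a naive stand-in that pools 120 more uniformly random positions with the first 40. The helper names and the fixed seed are choices made for this example.

```python
import math
import random


def estimate_proportion(bits, positions):
    """Fraction of ones among the sampled positions."""
    return sum(bits[i] for i in positions) / len(positions)


def worst_case_se(n):
    """Upper bound on the standard error of a proportion from n samples (reached at p = 0.5)."""
    return 0.5 / math.sqrt(n)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 45_000

# (1) Random bit vector: 1 = word known, 0 = word unknown.
bits = [rng.randint(0, 1) for _ in range(N)]

# (2) Estimate the proportion of ones from 40 random positions.
order = rng.sample(range(N), 160)  # 160 distinct positions in random order
p40 = estimate_proportion(bits, order[:40])

# (3) Naive stand-in: pool the 120 additional positions with the first 40.
p160 = estimate_proportion(bits, order)

print(f"true {sum(bits) / N:.3f}  n=40 {p40:.3f}  n=160 {p160:.3f}")
print(f"worst-case standard error: {worst_case_se(40):.3f} vs {worst_case_se(160):.3f}")
print(f"estimated ones: {round(p160 * N)}")
```

Quadrupling the sample only halves the worst-case standard error, which is why uniform sampling is a weak use of 120 extra words. A real vocabulary is not a random bit vector either: known words cluster among the common ones, so a procedure that concentrates the extra words around the point where answers flip from known to unknown should do better than this baseline.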
I had 9 years of English in school. Because of my hobbies I read a lot of stuff in English. I also watched many TV shows and spent about 6 months living in Australia.
Yet I feel insecure even typing this. Knowing lots of words is one thing. But what makes it hard are all the subtleties you have to take care of when building sentences. I also think that I get grammar wrong most of the time.
Another thing is that my sentences are almost always way too long.
As someone else pointed out before: I don't want my former English teacher to read this, either.
Think I have the lowest score here. 16,400 words. English is not my native language but I speak English daily and I wouldn't say my English is bad. Pretty disappointed with the score and also surprised the median is way way higher than I expected.
Edit: And, also to add, I followed 2 criteria for whether I know the word or not.
1. What's the absolute definition?
2. And can I find the equivalent or meaning of it in my native language? (which is Tamil, an Indian language, if anyone cares.)
I think a lot of the words are ones that you will only ever encounter in reading fiction (and flowery/older fiction at that). Also, keep in mind that this test doesn't include field-specific technical jargon.
I scored only 20400, but it makes me ask myself: Perhaps I was being too honest? There were certainly words I'd seen before, and could make educated guesses as to the general meanings of, but I chose not to check those off.
I'm Canadian born and raised, with English as my first language. Honestly, I'm surprised to be told I'm that far below the median and average.
19,600. I'm willing to accept this, although I'm not going to lie -- I'm very upset at myself. I'm used to scoring 99th percentile in every standardized test; it's kind of a shock to realize that I'm nowhere near the median of even my age group, let alone the general populace (I'm 20).
That said, I'm currently reading A Dance with Dragons and there are tons of words in this series (A Song of Ice and Fire) that I'm not familiar with. Most of the ones I missed are words I recognize from this series, but since I'm not 100% sure of them, I left them unchecked.
I must admit, as an American male, a little older than you, my score was also so low that I'm too embarrassed to even mention it. All those years of cheating on vocabulary tests (merely by memorizing the words 5 minutes prior to taking the test) in high school did not help. It's one of those things I look back on in life and regret. Does anyone have any suggestions on ways to catch up?
edit: forgot to mention, while I am a native English speaker, both my parents tend to speak Polish most of the time, while I always reply in English. I have to wonder how much of an effect this had on me.
Read a lot. In fact, buy a Kindle and read a lot. It has a built-in dictionary which is quite decent, and you'll actually look up words that you would otherwise skim past with a mediocre context-based understanding.
I scored a little over 40k on this, and did well on other verbal tests for the general population when I was still taking tests.
I attribute much of my facility to 1) reading fantasy and 2) looking up words I don't know. Since I loathe interrupting the flow of a story, I read with a pencil and make a list of words to batch-learn later.
20.7K myself and I'm 24 from the UK. I think the test is a little harsh or doesn't really mean what people would think. On the second group of words, half of those words aren't in popular use any more -- it's almost like a lesson in history.
I'm 23, I took this thing half awake and came in at 24k. I'm not an avid reader. The test seems to pull random words from a dictionary and not words used in a general manner. Not only that but it missed all the delicious skill-trade grammar I know and love.
I thought in aDwD that "leal" was just yet another Kindle OCR mess-up for "loyal" until I found out it actually really is a word - an old English word for "loyal"! Goes to show... something, I guess! :)
Putting aside all the comparing. Many non-native speakers here say they read and watch many things in English. I do that too and I'm quite positive that 95% of all the reading and listening I do each day is in English.
Now with a low score of 17500 I wonder, if it isn't enough to completely indulge oneself in the language, what is?
Of course, watching the Simpsons all day won't teach me some of the rarely used words. But there must be some stepping stones. I still haven't read Wuthering Heights because I don't want to have a dictionary lying around just to understand the story. And looking up something, reading on and forgetting it at the end of the day is quite common for me.
Also I'm sure that 15 year old Americans haven't read that many novels, yet their vocabulary is supposed to be larger than that of most of the well read non-native speakers around here.
I was really surprised by the result of this. I was flipping through a friend's learner's Chinese-English dictionary that claimed to contain over 150,000 words and in a couple minutes of thumbing through it I didn't find any I didn't know. But on this test, I didn't even get 50,000. Then according to the info at the bottom, the median was far less than that.
Honestly I think the evaluation method is terrible. My collection of sci-fi/fantasy books alone probably contains over 100,000 headwords. A single biology textbook might have as many as they claim the median person knows. Avid WoW players would similarly destroy the curve (if the test included the kinds of words they'd know instead of archaic religious words).
What your friend's dictionary contains a hundred and fifty thousand of are almost certainly not the same thing that the OED contains only a hundred and seventy thousand of. Take a look at a random OED page sometime and see how many words you don't know.
I'm a non-native speaker and I fare pretty well with reading stuff but I'm a bit shocked at my result (< 19k).
The thing I find a bit funny is that of all the words I didn't check I've seen almost all of them in books and articles. When I see them in a sentence and in context I do understand them fine but I can't give a definition for them.
I wonder if this is common when reading another language? It might be a better idea to look up the words in a dictionary when seeing them but I just can't be bothered, after seeing them in context a few times I can usually get a feel for their meanings. There are a few exceptions to be sure, adjectives are particularly bad at this.
Got 21,500 or so. Even as a non-native speaker I was slightly disappointed with this result. Many of the words were just ridiculously obscure and esoteric, and I haven't even seen some of them anywhere in the literature I read.
I didn't see any science-related words when I took the test. Not sure if this is because of the process by which they made the word lists, or because there truly are not many common science-related words.
Too limited. It excludes words that are specifically American or British (in meaning or spelling), or slang, or scientific/medical, or anything labeled archaic, or anything else that isn't part of broad, general English. Also, no animals or ingredients, which depend too much on where you live.
I'm guessing they selected words from a large corpus of written English based on their relative frequency.
It isn't surprising that there isn't any technical vocabulary. Most technical vocabulary falls into one of the following categories:
b) Overloading of existing non-technical words
c) Names and other proper nouns
d) Phrases longer than one word
There's actually an argument for excluding highly specific vocabulary (some corpuses explicitly exclude textbooks for this reason) because knowledge of them doesn't correlate as well with overall vocabulary.
What difference does it make? The site doesn't say what it means in everyday life. I'm guessing if you exclude high achieving SAT vocab nerds, it finds the difference between people who care about the meaning of each word and people who will guess through context because they have no patience for a dictionary. Or people who don't read fancy texts, like The Scarlet Letter for example; after failing to read that, I stopped reading books.
29,500, non-native English speaker (but studied in the US). Retook the test and omitted all the words I could not define with total confidence on the spot; the original score was 31,400. The test is peculiar in that the distribution appears to be uneven. Subjectively there is a sharp break between words that one would know from Shakespeare, Tolkien and Dunsany, and words no one would ever know unless they studied the OED. For statistical significance they would need more words.
I just got "5,340" as a result. This is a hint for me that I spend too much time trying to improve my listening skills and too little trying to learn words outside the domain I mostly use English for (computers, programming, technology, ...).
I wonder if there is some good web site that helps you learn new words. An iPhone application would also work for me, but I need one that is also able to play the sound of the word. I searched a bit in the past without good results.
I'm not sure if vocabulary size matters once you reach around 25,000 words. The words I didn't know were in part because I've never had any need to know them; if I had run into any of them while reading anything written in the past 80 years, I'd be angry at the author for showing off.
When I was young, I thought that if I wanted to be a writer I should have a huge vocabulary... but now, when choosing words/synonyms I dismiss most options because they're much too obscure.
> I'm not sure if vocabulary size matters once you reach around 25,000 words.
This is what I was thinking. I scored 34K, and rarely encounter a word that I don't understand in regular speech or reading. I also know several thousand jargon words, none of which were on that test. I know what I need to know. Memorizing another 16K words to reach Shakespeare's magical 50K (and feel good about myself) would be a waste of precious mental resources.
Along with many of the flaws already addressed, the most relevant for me was "absolutely sure of" and also taking the words out of context. There were a good number of words that I was pretty sure I knew and, had I read them in context, would have been correct in my meaning and never given it a second thought.
I'm certain that if I took this same test using a base of 'novel in fulltext' vs. 'list of all unique words in the novel', my recog would be FAR better on the novel.
I scored far lower than I thought I would, and am genuinely surprised given that I tend to write a lot and have always thought I had a decent vocabulary.
I would love to be able to compare my score to what it would have been before moving to a non-English country and learning/speaking a new language. I definitely feel that a large part of my memory is now dedicated to Japanese and not English...
No matter how many I did (I'm fine, thanks), I would love to see a similar system that estimates one's vocabulary using already existing articles.
Ego matters aside, I'm curious whether there are any interesting correlations for writing published in magazines. For instance, between the estimated vocabulary size and the average price of ads (I bet that there is a huge correlation).
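A crude starting point for estimating vocabulary from existing articles could be counting distinct word forms across someone's texts. This is only a sketch under loud assumptions: the function names are made up, it counts surface forms rather than dictionary headwords, and it only measures the words a writer happened to use, which is a lower bound on what they know.

```python
import re

def distinct_words(text):
    """Return the set of lowercase word forms in a text (no lemmatization)."""
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))

def vocabulary_growth(texts):
    """Cumulative count of distinct word forms after each successive text."""
    seen = set()
    sizes = []
    for text in texts:
        seen |= distinct_words(text)
        sizes.append(len(seen))
    return sizes

# Hypothetical usage: feed in a writer's articles in order and watch the curve.
articles = ["The cat saw the Cat.", "A dog saw the cat too."]
print(vocabulary_growth(articles))
```

How quickly that curve flattens as more articles are added says more than the raw count, since longer texts inflate the total on their own.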
I got 35,900, and haven't lived in an English-speaking country in 10 years. I often get frustrated at myself when I feel like I'm losing my vocab. There were a couple of words on the list that I'm sure I once knew, but couldn't conjure up the meaning on the spot, so I skipped them. I was never a sci-fi/fantasy fan, but I do like to read literature for fun.
Same exact score. 22 years old with a Bachelor's degree. It doesn't concern me that I don't come across rare English words in my daily reading enough to know the right column. I also suspect that the average quiz-taker voted yes to words they were contextually familiar with, but couldn't give you a straight definition.
I moved to the US from Russia when I was 10, and my vocabulary is a pathetic 18,000 or so.
Which is funny, because I consider myself pretty well-read. I usually just infer the meaning of new words through their context without looking them up so I don't feel comfortable saying I know the definition of those words.
No cognates or false-friends with Portuguese. This probably knocks out at least half the dictionary, since Romance languages have plenty in common with English. False friends need to be avoided as well, since a Brazilian beginner will see "pretend" and assume he knows it means pretender, which actually means "intend." Interestingly, the no-Portuguese rule leaves the test with a strongly pronounced short Anglo-Saxon flavor.
Even so, many of the most obscure words are potentially guessable by someone with a good knowledge of Latin and Greek. Funambulist, opsimath, hypnopompic. I don't think a classical education will be much help with cantles and williwaws, though.
My result isn't too bad, but there are still quite a few words I can learn, especially at the end. I did cheat a little, since I learned a few of the more curious-looking ones when this showed up on /b/ 2 nights ago.
Wouldn't a multiple choice quiz with definitions be more accurate? Force people to choose a definition (or none of the above?) to show they actually know the word. You'd still have the issue of cheaters, but at least you would know people don't just assume they know the definition of "like".
Not sure I buy the results though. I would think that the rate of increase would start to decrease quite significantly after high school/college but it appears to stay pretty much linear throughout the data.
Site creator here -- That has been the most interesting finding so far -- I certainly didn't expect it. Unfortunately I haven't had enough participation yet from children, or older folks :), to see how it continues to extend in either direction.
I will be a lot more interested in the Brazilian site when it goes live! I'm not making any particular effort to expand my English vocabulary (having a smaller vocabulary is not among the top ten reasons my writing in English is worse than Shakespeare's) but my Portuguese vocabulary is muito terrivel. It would be great to have a way to measure my progress.
Edit: Oh. The Brazilian site is just a Portuguese version of the instructions for the English site? :(
No one getting high scores did so because of or despite their education. They did it by having the sort of brain that happens to retain words like uxoricide and reading copious quantities of material in which such abstruse words are employed.
I got a high score and went to public schools in the US, and didn't finish college either.
At a certain point, somewhere in the middle ranges of these scores, formal education at even the best schools will stop having any effect on your vocabulary. It becomes more a matter of where someone's interest lies. Do they read? A lot? Outside of the usual curriculum?
And at the upper end, it's more about personal predilections than anything else. Are rare words like little gems that you save and collect?
Actually, I immediately knew what "uxoricide" meant, not because I could recall seeing the word, but because of my education, which included four years of Latin in high school. ("Uxor" is Latin for "wife", and "-icide" is a very common suffix for "killing".)
So, I beg to differ. Education can definitely help. Bravo to you, though, for building your vocabulary on your own.
At one point, the sun was generally widely accepted to revolve around the earth.
It's likely that education improves one's vocabulary; however, there are a lot of variables there. People who go to good schools likely come from families where learning is important in any case, are wealthier, etc.
I checked all the boxes on the first page but one, "vibrissae" (roughly, whiskers), and saw even more boxes on the second page and groaned. So I punted, and went back to the first page, reloaded it, and checked only one box: vibrissae.
After finding 17 words on the second page, I left them all blank, following the same methodology. My vocabulary size was estimated to be 20 words.
From this, I deduced that my total vocabulary size was all the words ever known to any English speaker anywhere, anytime - minus 20.
This made me very pleased with myself, even though I knew my assumptions were pretty terrible.
(Interesting that the spell check in my browser doesn't recognize vibrissae.)
lampoon: verb: Publicly criticize (someone or something) by using ridicule or sarcasm.
I knew lampoon had something to do with criticism, so I checked the box, but I had no idea that the definition specified a public context. Does that mean I didn't know the word?
I suspect a problem with the test is that it's easy to know enough to figure out the gist of a word's definition without having any knowledge of the specificity of the definition, if that makes any sense.
Self-selected survey respondents != actual population.
Though I suppose, also -- IQ doesn't say much at all about what your vocabulary will be. If you don't read much, or mostly read popular literature, it doesn't matter how clever you are; your vocabulary won't be any bigger than the words you've encountered enough to form (or look up) a definition.
Though it's also worth noting that beyond a certain point, building a broader vocabulary isn't very useful; when you communicate, you generally need to limit yourself to vocabulary your audience knows. Likewise, when you're reading, it's pretty common that the authors using particularly obscure vocabulary are using it to muddy their meaning, not clarify it.
I got 25,600, but that was below the median of 27,123. I didn't select words I recognized but couldn't define, and I double-checked some definitions. I think I have a decent vocabulary. It's like I'm competing with the High IQ Society book club.
You need a test where the definition is part of the test. Otherwise anyone can just check boxes.
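One way a definition-based test could discount blind box-checking is the standard correction for guessing: each wrong answer costs 1/(k-1) points when there are k choices, so pure guessing averages out to zero. The sketch below is illustrative only; the function, the answer format, and the sample words are hypothetical, not anything the site implements.

```python
def corrected_score(answers, key, choices):
    """Score a multiple-choice definition quiz with a guessing penalty.

    answers: dict of word -> chosen option, or None if left blank
    key:     dict of word -> correct option
    choices: number of options per question (k)
    Blank answers are neither rewarded nor penalized.
    """
    if choices < 2:
        raise ValueError("need at least two choices per question")
    right = sum(1 for w, a in answers.items() if a is not None and a == key[w])
    wrong = sum(1 for w, a in answers.items() if a is not None and a != key[w])
    return right - wrong / (choices - 1)

# Hypothetical four-option quiz: two right, one wrong, one left blank.
key = {"lampoon": "b", "vibrissae": "a", "uxoricide": "d", "williwaw": "c"}
answers = {"lampoon": "b", "vibrissae": "a", "uxoricide": "a", "williwaw": None}
print(corrected_score(answers, key, choices=4))
```

It still can't stop someone from looking words up, but it removes the payoff for ticking everything.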
My god, none of this shit matters! Why does anyone put any stock in any of this kind of thing? The best a test can do is make you feel smug, the worst it can do is totally destroy your confidence. It's a lose-lose proposition.