Estimate the size of your English vocabulary (2014) (vocabulary.ugent.be)
119 points by pushedx on June 20, 2019 | 108 comments


Took it once and was very conservative on words I didn’t “know”. Got 49% with no false positives. Upon inspection if I’d answered yes to everything I thought was a word, I’d have gotten much higher with no false positives. So I took it again and answered yes to everything I could guess the meaning of and got 89% with no false positives.

An example of something I said yes to on the second pass that I’d have said no to on the first: I guess superexcellent is a word, the meaning is very clear but I don’t know that I’ve ever seen it used before and would never consider using it myself.


I had the inverse issue. I answered yes to too many non-words, despite the fact that, in hindsight, they weren't words - but I thought they sounded correct enough that they could be used as words.

Went back and also got a higher score by being less trigger happy with my right finger.


Yeah - somehow "humanly" is a word while "autereric" is not. Cats sit humanly, but filmmakers don't have auteureric predilections.

I got 87-0 being strict and 93-17 by including "maybes", and I'm not sure which is more reasonable.


Same issue here. I would never use “indigenously”, and the google links to the word are all to online dictionaries, not to actual usage examples.


I can't imagine how anybody could use it in a non-awkward or unclear way

Validity is not sanity; just conjulate the worbs, I guess


This test made me feel quite the blatherskite with my hemineglect of words a bibliolatrous individual like myself surely should know. It quite made me turn a pale shade of ecru; my vocabulary thinner than a shingling. Perhaps the ratiocinative thing to do would be to rusticate for a while, and form the cullet in my mind into something more .... ah, forgive my aposiopetic nature, for I must schuss off somewhere else.


Why waste time say lot word when few word do trick?


"Brevity is the soul of wit." -WS


"I'm anaspeptic, frasmotic, even compunctuous to have caused you such pericombobulations." https://www.youtube.com/watch?v=hOSYiT2iG08


"I'm losing my perspicacity!"


I wonder if this test is trying to detect something else than what it pretends to be. Dislexya, mispellings... It didn't feel like a vocabulary test.


Obviously it’s trying to detect left handedness


I think it might actually just swap the buttons if you are left handed


I'm left handed and it used F for "no" and J for "yes".


88% with 1 false positive, but it was 'wargish' which I think should be accepted as a real word.

To be fair, I did also hit yes for words that I could work out the greek/latin roots for.

---

I know the intent of this test, but English has a special problem in that there is no real definition of what is and is not an English word. For example, there are a large number of words on this test that come directly from another language and are used as a term of art in a specific field.

See: mora, furuncle, cercal

Then you get transliterations like dacoit.

Are units of currency automatically English: piastre

Another genre is words that may be technically English, but would never be used - see delegable (delegatable would be used)

Then you have several alternate spellings from British English/American English/alternative transliterations.


I would consider "wargish" to be a word meaning "in the style of a warg", with warg being a word for wolf that appears in modern English works (Tolkien being the obvious one).


Agreed. If an author described a character to me as "wargish", I would instantly have an idea in my head of what that person looked like - ugly, brutish, and long-faced like a wolf. It should then I think, by extension, be classed as a word; or at least not be used in a test like this.


That's how I would use it.


"That's perfectly cromulent."


As a native English speaker, answering only known words: 80% true positive, 0% false positive.

As a native English speaker, answering for known or plausible words (with an empty profile so as not to contaminate the test): 96% true positive, 23% false positive.


German, 56% true positive, 0% false positive. I could probably get about 90% of the word/non-word decisions right by guessing; the non-words are pretty non-English. There were also a handful of true positives where I tended to think I knew the meaning but decided to err on the other side because of the promised heavy penalty.

PS: Right-handed


It doesn't estimate the size of your vocab. It measures your propensity to distinguish wordy looking things from nonwordy looking things.


What counts as "English vocabulary"? I definitely don't count the name of the currency of Malaysia. It seems like they're relying on random selection from thefreedictionary and user feedback, but their take on "English word" vs "other language that happens to use the Latin alphabet" differs from mine.


I got “gaucho” and that scored as an English word. I’m not sure I’ve ever heard it used other than to talk about Spanish speaking cowboys.


I got "halal". Definitely a word you hear as an English speaker, but not a word that came from English.

I have a feeling Scrabble players do well on this test.


What does "came from English" even mean? At some point all the words in English came from something else, like French, Latin or old Norse.


halal is a loanword.


Yeah that's what I was getting at. Kind of like what the Germans call "denglish".


On a second run I got

Barras (French baras) n (Biography) Paul François Jean Nicolas, Vicomte de Barras. 1755–1829, French revolutionary: member of the Directory (1795–99)

Rigorous methodology this is not.


I felt smart having only missed 9 real words. Then I went to look them up. The definition for affricate made me feel real stupid again when I didn't understand almost every third word in the Google provided definition:

>a phoneme which combines a plosive with an immediately following fricative or spirant sharing the same place of articulation


I wouldn't feel bad...those (phoneme, plosive, fricative, spirant, and affricate) are specialized vocabulary words from a specific domain (linguistics). I only know them because I have an interest in the topic. Other than phoneme, they describe how words are pronounced and/or formed in the mouth.


Another linguistics enthusiast checking in. To expand on this, a “plosive” is a sound made with the tongue stopping the airflow and then releasing it (think “t”, “p”, “k” sounds), while a fricative involves continuous airflow (think “s”, “f”, and so on). What that definition is saying is that an “affricate” is a sound that combines two of these made with the tongue in the same position. In English, an example is the “t-ch” sound at the beginning of the word “chat”. Other languages have different examples, like the “t-s” sound written as ц in Russian.


I got 80%, although I got penalized for "prostuttion" which I accepted by accident. I think it's a bit much to penalize people for accepting misspellings of real words. After all if someone wrote "In Germany prostuttion is legal" the meaning is perfectly clear.


I got two slightly misspelled words and it didn't mark them as wrong, so this whole test seems wildly inconsistent (not to mention it seems to have some foreign words and even names marked as real English words, according to some other commenters...).


This was a cute test, but it dinged me for not "knowing" the word "toreador". Except I do: it's a Spanish word, not an English one. I also don't consider it a loan word, because the English word for "toreador" is "bullfighter".


This will speak to your soul, then: https://en.wikipedia.org/wiki/Uncleftish_Beholding


The English word for "royal" and "regal" is "kingly"


This is what's known as "overthinking a shitty online test."


It got me with seppuku.


89% with no false positives.

I wonder how tightly coupled this will be to education, gender, age, and especially handedness. Will righties rule and lefties drool, or will an unthinkable upset occur? I expect the U.K. to dominate because all the smart movies use U.K. actors, full of $2 words.


Hah, leftie here with the exact same score as you. I’m probably going to be thrown out from any statistics they derive as an outlier because I’m a high school dropout programmer who ravenously read books in his youth.


86% here with no false positives but with a four beer handicap and over 60yo but still "in the top level" (which has no definition). So I'll take this and skedaddle. Cheers :)


84% - 0%: Non native speaker. Have to brag :) And I beat my native speaker partner who had 86% - 13% = 73%. :)


I know the non-words are really "not English", but I wonder how similar the sentences would be if asked to "use in a sentence".


73% with 0% false positive. Most of my false negatives were "Can you really stick that many suffixes on this word?"


There were words like "bate" and "earliness" that I figured could have worked and would be legal, but I considered they could have been made up. Still happy with 79% - 0% as a native English speaker (well, Aussie).


The problem I have with this test is a slight dyslexia when hitting the appropriate key. Several times I knew I hit the wrong one a moment after, but had no way to go back and correct it.

Also, in some cases I noticed that there was what appeared to be a real word, but it was slightly misspelled, so I'm sure that counted as an incorrect input.


I sometimes hit the "no" key before I even fully processed the word. In my experience this is a problem I frequently have with this kind of test where I'm presented with such choices in quick succession. After a while I tend to just choose before I've really started thinking about it, and then realize I chose wrong.


I didn't see how to estimate the size of your vocabulary, I just saw a percentage. Did I miss something? I want to know something like, "based on your answers we estimate you know 8311 words."

Edit: oops replied to wrong parent


It is saying what percent of ALL words you know. So just take the total count of English words and multiply by your percentage.


The ‘total count’ is in hundreds of thousands, especially since the test includes some specialized terms. A comprehensive dictionary is estimated to have over a million words, but that might include trivial variations.

Meanwhile, a native speaker's everyday dictionary is typically on the order of ten to twenty thousand words. No way that their ‘reserve’ knowledge extends twenty times beyond this. I'd guess maybe several thousand infrequent and specialized words, with which the total knowledge would still clock in at low tens of thousands.


I don't think this can be right. the Oxford English Dictionary has over 300,000 main words, and I got a score over 80%. That would suggest I have a vocabulary of 240,000 words, which seems a bit high, to say the least!


Their FAQ claims 60,000 ("The word list should contain almost all English words...") so 48k+ words known, which is somewhat more believable. OED 2nd Edition (1989) has 171k full, current entries [1] (and another 47k obsolete, which I will disregard), so 137k+ words, which is quite a few but perhaps not beyond the realm of imagination.

[1] https://en.oxforddictionaries.com/explore/how-many-words-are...


But I think for that to work we'd need to know the size of _their_ pool of words, not the size of all known words (which is debated anyway)


The size of their pool is slightly over 60k.


There's a definite grey zone of wordy word-like strings that are being cobbled together from chunks of other words, where it's very easy to imagine a possible definition for a word that sounds like that. Also, once or twice, I'm noticing things that feel like proper first names.

Meanwhile, the upper 10% of words includes highly specialized terms from niche sections of fields such as geology, chemistry and medicine, which arguably, are not words, even if you've seen them before. In some respects, a proper name isn't a word, and so, is a corporate invented product name really a word? Isn't that also a proper noun like "John" or "Texas"?

In this 10% grey area, I see a few words that I know to be words, but I cannot immediately tell you the definition, and if I were to try and use them in a sentence, I'd be able to apply the word class (noun, verb, adjective, adverb) within the structure of a sentence. But, I don't actually know the word. I only know that it's probably a word, but since I can't guess the definition or use it, I should dump it as an unknown, but more often than not, I'm getting away with some of these, mostly because after multiple passes, I can detect how the non-words are being generated, and the real words possess a more obvious validity of structure.

That said, I'm able to hover at or above 90% with occasional disembiggening penalties for over-cromulence.



A lot of the time it's easy to guess which words are real based on which appear to be structured correctly. I scored much higher guessing (even though I had a couple of false positives) than when I answered truly.

Also a lot of the complex words are medicine/medical, which makes sense since there are an awful lot of them, but isn't necessarily representative of 73% (my score) of the words I will encounter outside of medical literature.


87% with no false positives. In my view many of these are not what I consider English ‘words’ - plant names, trade marks and archaic words. Still fun though.


This is a neat format but I wouldn't trust the results. Some of the "real" words are not actual English words (I got a Spanish word and an acronym) and some of the "fake" words are real, but obscure (for example, "coath"). Still a lot of fun, which seems to be the point.


A few words (kilocycle, advertency, implacability, teacupful, outcrop) I would accept in day to day use, but I wasn't sure if they were legal conjugations of the root words. In fact firefox's dictionary does not have advertency.


Today I learned that "effing" is actually considered a word nowadays.


"effing and blinding/jeffing" is something old people say, I imagine "effing" by itself as a minced curse is far older.


The test, that really estimates the size, not just the percent: http://testyourvocab.com/


I misinterpreted the test, since I assumed they were all real words and only selected the ones I at least could figure out the likely definition for. That said, I'm still feeling pretty good about my results, only missing a few real words (of which I recognized 2-3 of them, but didn't know the meaning), and had one non-word which was very similar to two real English words.


I don't understand the purpose of the test result.

So basically I know 83% of the English words... that were asked in this test. If next year 1000 new words are introduced to the dictionary, that doesn't automatically mean I know 830 of them. In other words the minimum size of my vocabulary is 83 words. Nothing more. Nothing less.


If you poll a hundred voters and 70 of them say they're voting Wolf and 30 of them say they're voting Sheep, do you know nothing more than at least 70 people will vote Wolf and 30 Sheep?


For the record, I found 67% of the words, and said yes to 0% of the nonwords. I proceeded conservatively, only saying yes when I have seen the word before, not when I guessed it should be a word.

They say this is fairly high for a native speaker.

I gave my mother tongue as German, so is this "fairly high" for a native German speaker?

It felt too easy.


84% with no false positives. I missed mostly medical lingo, one legal term and "stitchwort" which I did suspect was a plant but I had no idea which one.

This is pretty much what I expected as a non-native speaker, though even if you told me what stitchwort was in Finnish I'd still not know what it actually is besides a plant.


90% / 0% Native English, I was guessing most of the time. Really surprised the medical words didn't screw me over. If it sounded and looked like an English word, I would answer Yes. I answered No to anything that seemed even a little odd. I think if I retook the test my score would vary wildly...


Didn’t realize there were non-words in the tests and was getting disheartened at how many words I don’t know...


I wonder how they take into account the kind of people who take a test like this are likely to have a good vocabulary. I suppose that might not be true but I don't think I'd do something similar for a test that measured something along the lines of matching faces to pop musicians.


I bet the only truly important correlation is how many books a person has read. Too bad they didn’t ask that.

“Would you say you’ve read less than 10, less than 100, less than 1000, or all the books ever?”


I've read probably around 500 books in my life (and I'm 30), and I only got 70% with zero false positives.

I think it depends on the type of books one reads, not quantity.


It makes me sad that between your and the parent comment we take 100, 500 and 1000 to be a lot of books to have read in a lifetime.


The web lately did more for my dictionary than books, since I'm pretty much living on it while I'm getting my books in audio form. Audio helps with training to understand spoken language and improved the numbers of ‘read’ books a lot, but there's definitely the downside that unfamiliar words often fly by without impact.


96%. The two I missed were misclicks (e.g., I accidentally hit ‘no’ on ‘parboil.’)

A number of those were intuition, though.


Going on intuition I figured that “boxthorn” and “razorbills” were undoubtedly English words but not knowing them I followed the rules and answered no. They both are, apparently, English words. 87%.


Part of the 96% comes, I think, from the fact that I did doctoral work in English lit, which gives me a better “feel” for those sorts of archaisms. I can’t say that I knew all of the words that cropped up in the test, but many of them sounded plausible on the basis of having read quite a bit of pre-19thC stuff.


I found the test significantly easier on subsequent attempts, as I familiarized myself with how strict (as it turns out, not very) their definitions are - I wonder if they will do any correcting for this?

(Scores for reference: 70, 77, 87, 84, 84, 84)


93% with 3 false positives, all of which were literally me being a dumbass and hitting the wrong button because my hands REALLY wanted yes to be on F.

Which means nothing except that I've seen a lot of words and remember them.


If you're interested in this you might also like http://testyourvocab.com/

I'm working on something similar for German at the moment.


Non-native speaker, got a score of 63% with 1 false positive, so that makes it 60%.

However, it said that it is relatively high for a non-native speaker. How did they reach this conclusion?


From their FAQ, "...a proficient native speaker will know some 40,000 words of the list (i.e., 67%) ... [for second language speakers] our estimates range from 6,000 words (10%) for a medium proficiency speaker to 20,000 words (33%) for a high-proficiency speaker."

So well above their estimates, but they don't extrapolate on the source of those numbers - they might be available somewhere at http://crr.ugent.be


Aha, thanks for that data. Idk, somehow I feel second language speakers would perform better than 33%. Thanks for that link too.


84% but penalized 3% for saying "steepy" is a word.



That was fun!

My score:

You said yes to 63% of the existing words.

You said yes to 0% of the nonwords.

This gives you a corrected score of 63% - 0% = 63%.

I already knew I wasn't a wordsmith so that's not awful. About what I expected.
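
For anyone puzzling over the arithmetic on the result page, here is a minimal sketch of the corrected score. The sample sizes (70 words, 30 nonwords) come from the FAQ quoted elsewhere in this thread; the function name and the assumption that the site simply subtracts the two percentages are mine, not taken from the site's code.

```python
def corrected_score(word_yes, nonword_yes, n_words=70, n_nonwords=30):
    """Return (hit %, false-alarm %, corrected %) for one test run.

    word_yes: number of real words answered "yes"
    nonword_yes: number of nonwords answered "yes"
    The defaults of 70 and 30 are the per-test sample sizes from the FAQ.
    """
    if not (0 <= word_yes <= n_words and 0 <= nonword_yes <= n_nonwords):
        raise ValueError("counts out of range")
    hit_rate = 100 * word_yes / n_words
    false_alarm_rate = 100 * nonword_yes / n_nonwords
    # Assumed formula: the penalty is the raw false-alarm percentage.
    return hit_rate, false_alarm_rate, hit_rate - false_alarm_rate


hits, false_alarms, corrected = corrected_score(44, 0)
print(f"{hits:.0f}% - {false_alarms:.0f}% = {corrected:.0f}%")
```

With only 30 nonwords per run, a single false positive costs about 3.3 points, which matches the "penalized 3%" reports in this thread.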


You said yes to 79% of the existing words.

You said yes to 0% of the nonwords.

This gives you a corrected score of 79% - 0% = 79%.

This is a high level for a native speaker.

I'm impressed, I thought I would score way less since I'm not a native speaker.


For science! 67% for me.

My brain small. Sad!


Don't worry, I got less than that, but 0 bad guesses!

I'd bet there is a bias from those reporting scores here -- a humblebrag, for science, of course!


As a non-native speaker (English would be my 3rd language at best, maybe 4th) I expected to dramatically underperform compared to most scores here, but:

You said yes to 76% of the existing words.

You said yes to 0% of the nonwords.

This gives you a corrected score of 76% - 0% = 76%.

Which I find oddly high.


77% existing, 7% non existing.

"This is a high level for a native speaker."

I'm happy because I'm Swedish. I would like to see a histogram also by native language.


Didn't estimate my vocabulary, just scored results of my test.

From the description I was hoping for a statistical estimate of how many English worsd I know...


From the FAQ: "The master list includes 60,469 words and 304,275 nonwords. From these you get a random sample of 70 words and 30 nonwords. The word list should contain almost all English words (non-inflected forms)."

So just multiply by 60k for how many you know. You've also got the info you need for stuff like stddev.
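
A rough sketch of that multiplication, with a standard error attached. The list size and sample sizes are the FAQ figures quoted above; treating the 70 words and 30 nonwords as independent simple random samples, and the function name itself, are assumptions made for illustration.

```python
import math

MASTER_WORDS = 60_469          # size of the master word list, per the FAQ
N_WORDS, N_NONWORDS = 70, 30   # items shown in one test, per the FAQ


def estimate_vocabulary(word_yes, nonword_yes):
    """Scale the corrected score to the master list.

    Returns (estimated words known, standard error in words).
    Assumes each item is an independent draw (binomial sampling).
    """
    p = word_yes / N_WORDS        # share of real words accepted
    q = nonword_yes / N_NONWORDS  # share of nonwords accepted
    corrected = p - q
    # Variance of a difference of two independent sample proportions.
    se = math.sqrt(p * (1 - p) / N_WORDS + q * (1 - q) / N_NONWORDS)
    return corrected * MASTER_WORDS, se * MASTER_WORDS


estimate, stderr = estimate_vocabulary(55, 0)
print(f"about {estimate:,.0f} words, standard error about {stderr:,.0f}")
```

For 55 of 70 words and no false positives this comes out near 47,500 words with a standard error of roughly 3,000, so a single run of 100 items is a fairly coarse estimate.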


That's not how that works. To obtain an absolute number the words have to be ranked by frequency and then the 100 words have to be picked so that frequent words appear first and less frequent words appear at the end. The estimator then knows whether you know a 90%, 99%, 99.9% or 99.99% percentile word and uses that to estimate that you know enough words to cover x% of all words that will appear in a text. Then you use that percentile on your word ranking to count how many words are below that percentile.

The posted test site does nothing of the sort.

Imagine the test only asked you a single 90% percentile word. vocabulary.ugent.be would tell you: You know 100% of the words. The process I described would tell you: You know 90% of the words.

Obviously both tests become more accurate as the number of words increases but the vocabulary.ugent.be test requires you to test all 60k words to get an accurate result. The second test will probably need less than 1000 words to be dead on.

http://testyourvocab.com/ seems to be a pretty good implementation of the approach I described.
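
One way to read the frequency-ranked idea is as a stratified estimate: split the ranked list into frequency bands, measure the share of words known in each band, and scale by band size. The sketch below is a simplification under that assumption. The function name, the band boundaries, and the sample answers are all made up for illustration, and none of this is claimed to be what testyourvocab.com actually does.

```python
from collections import defaultdict


def estimate_known_words(responses, band_edges):
    """Stratified vocabulary estimate over frequency bands.

    responses: iterable of (frequency_rank, known) pairs, rank 1 = most frequent
    band_edges: ascending rank boundaries, e.g. [0, 1000, 10000, 60000];
                band i covers ranks band_edges[i] + 1 .. band_edges[i + 1]
    """
    asked = defaultdict(int)
    known = defaultdict(int)
    for rank, is_known in responses:
        for i in range(len(band_edges) - 1):
            if band_edges[i] < rank <= band_edges[i + 1]:
                asked[i] += 1
                known[i] += int(is_known)
                break
    total = 0.0
    for i in range(len(band_edges) - 1):
        if asked[i] == 0:
            continue  # no answers in this band, so it contributes nothing
        band_size = band_edges[i + 1] - band_edges[i]
        total += band_size * known[i] / asked[i]
    return total


# Hypothetical answers: common words known, rare ones not.
answers = [(10, True), (500, True), (2000, True),
           (8000, False), (30000, False), (50000, False)]
print(estimate_known_words(answers, [0, 1000, 10000, 60000]))
```

The point of the banding is that a few questions in the high-frequency band settle it quickly, so most of the questions can be spent where the answer is actually uncertain.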


The test you described is useful for determining how many words in a text you will know - i.e. what percentage in a given set of English. That's probably a more useful number, but it's not what parent asked: "I was hoping for [...] how many English worsd [sic] I know."

> Obviously both tests become more accurate as the number of words increases but the vocabulary.ugent.be test requires you to test all 60k words to get an accurate result.

They could never get an accurate result for words known in a body with this test - it needs a separate dataset of word frequency. That's not what they're testing.


Lol. "should contain almost"


I wonder how much of this study is permanently skewed by being on HN and its specific audience - as well as all the redos.


Weird that they provide a button to improve your score... wouldn't that mess with their results if many people retest themselves?


96% true positive, 0% false positive.

Agreed that this seems like a measurement of something other than vocabulary.


A few attempts with around 60%. I noticed my response times for known words were quite a bit higher than for those in question.


73% with 0% non-words. While I was taking it, it felt like I was doing worse, but not a bad result overall.


Interesting, found a word that isn't in the Merriam-Webster dictionary: araliaceous.


73% with 0% false positives, ESL

I am curious about how common is having a ~75% score


I really wish they gave the histogram of scores.

83% (86% known - 3% false positive)


Russian, 66% counting the 7% penalty for non-words.


87% and no false positives. Native speaker. Fun!



