
Estimate the size of your English vocabulary (2014) - pushedx
http://vocabulary.ugent.be
======
furyofantares
Took it once and was very conservative on words I didn’t “know”. Got 49% with
no false positives. Upon inspection if I’d answered yes to everything I
thought was a word, I’d have gotten much higher with no false positives. So I
took it again and answered yes to everything I could guess the meaning of and
got 89% with no false positives.

An example of something I said yes to on the second pass that I’d have said no
to on the first: I guess superexcellent is a word, the meaning is very clear
but I don’t know that I’ve ever seen it used before and would never consider
using it myself.

~~~
Aromasin
I had the inverse issue. I answered yes to too many non-words, despite the
fact that in hindsight they weren't words - but I thought they sounded
correct enough that they _could_ be used as a word.

Went back and also got a higher score by being less trigger happy with my
right finger.

~~~
trentlott
Yeah - somehow "humanly" is a word while "autereric" is not. Cats sit humanly,
but filmmakers don't have auteureric predilections.

I got 87-0 being strict and 93-17 by including "maybes", and I'm not sure
which is more reasonable.

------
minikomi
This test made me feel quite the blatherskite with my hemineglect of words a
bibliolatrous individual like myself surely should know. It quite made me turn
a pale shade of ecru; my vocabulary thinner than a shingling. Perhaps the
ratiocinative thing to do would be to rusticate for a while, and form the
cullet in my mind into something more .... ah, forgive my aposiopetic nature,
for I must schuss off somewhere else.

~~~
domsnar
Why waste time say lot word when few word do trick?

~~~
jbottoms
"Brevity is the soul of wit." -WS

------
alain94040
I wonder if this test is trying to detect something else than what it pretends
to be. Dislexya, mispellings... It didn't feel like a vocabulary test.

~~~
mxcrossb
Obviously it’s trying to detect left handedness

~~~
epse
I think it might actually just swap the buttons if you are left handed

~~~
alpaca128
I'm left handed and it used F for "no" and J for "yes".

------
csours
88% with 1 false positive, but it was 'wargish' which I think should be
accepted as a real word.

To be fair, I did also hit yes for words that I could work out the Greek/Latin
roots for.

\---

I know the intent of this test, but English has a special problem in that
there is no real definition of what is and is not an English word. For
example, there are a large number of words on this test that come directly
from another language and are used as a term of art in a specific field.

See: mora, furuncle, cercal

Then you get transliterations like dacoit.

Are units of currency automatically English: piastre

Another genre is words that may be technically English, but would never be
used - see delegable (delegatable would be used)

Then you have several alternate spellings from British English/American
English/alternative transliterations.

~~~
thenewwazoo
I would consider "wargish" to be a word meaning "in the style of a warg", with
warg being a word for wolf that appears in modern English works (Tolkien being
the obvious one).

~~~
Aromasin
Agreed. If an author described a character to me as "wargish", I would
instantly have an idea in my head of what that person looked like - ugly,
brutish, and long-faced like a wolf. It should then I think, by extension, be
classed as a word; or at least not be used in a test like this.

------
czr
As a native English speaker, answering only known words: 80% true positive, 0%
false positive.

As a native English speaker, answering for known or _plausible_ words (with
empty profile to not contaminate the test): 96% true positive, 23% false
positive.

~~~
danbruc
German, 56 % true positive, 0 % false positive, could probably get about 90 %
of the word/non-word decisions right by guessing, the non-words are pretty
non-English. There were also a handful of true positives where I tended to
think I knew the meaning but decided to err on the other side because of the
promised heavy penalty.

PS: Right-handed

------
hardlianotion
It doesn't estimate the size of your vocab. It measures your propensity to
distinguish wordy looking things from nonwordy looking things.

------
ebg13
What counts as "English vocabulary"? I definitely don't count the name of the
currency of Malaysia. It seems like they're relying on random selection from
thefreedictionary and user feedback, but their take on "English word" vs
"other language that happens to use the Latin alphabet" differs from mine.

~~~
wincy
I got “gaucho” and that scored as an English word. I’m not sure I’ve ever
heard it used other than to talk about Spanish speaking cowboys.

~~~
joezydeco
I got "halal". Definitely a word you hear as an English speaker, but not a
word that came from English.

I have a feeling Scrabble players do well on this test.

~~~
Ma8ee
What does "came from English" even mean? At some point all the words in
English came from something else, like French, Latin or Old Norse.

------
slg
I felt smart having only missed 9 real words. Then I went to look them up. The
definition for affricate made me feel real stupid again when I didn't
understand almost every third word in the Google provided definition:

> a phoneme which combines a plosive with an immediately following fricative or
> spirant sharing the same place of articulation

~~~
alanthonyc
I wouldn't feel bad...those (phoneme, plosive, fricative, spirant, and
affricate) are specialized vocabulary words from a specific domain
(linguistics). I only know them because I have an interest in the topic. Other
than _phoneme_ , they describe how words are pronounced and/or formed in the
mouth.

~~~
lovecg
Another linguistics enthusiast checking in. To expand on this, a “plosive” is
a sound made with the tongue stopping the airflow and then releasing it (think
“t”, “p”, “k” sounds), while a fricative involves continuous airflow (think
“s”, “f”, and so on). What that definition is saying is that an “affricate”
is a sound that combines two of these made with the tongue in the same
position. In English, an example is the “t-ch” sound at the beginning of the
word “chat”. Other languages have different examples, like the “t-s” sound
written as ц in Russian.

------
mantap
I got 80%, although I got penalized for "prostuttion" which I accepted by
accident. I think it's a bit much to penalize people for accepting
misspellings of real words. After all if someone wrote _" In Germany
prostuttion is legal"_ the meaning is perfectly clear.

~~~
alpaca128
I got two slightly misspelled words and it didn't mark them as wrong, so this
whole test seems wildly inconsistent (not to mention it seems to have some
foreign words and even names marked as real English words according to some
other commenters...).

------
thenewwazoo
This was a cute test, but it dinged me for not "knowing" the word "toreador".
Except I do: it's a Spanish word, not an English one. I also don't consider it
a loan word, because the English word for "toreador" is "bullfighter".

~~~
aasasd
This will speak to your soul, then:
[https://en.wikipedia.org/wiki/Uncleftish_Beholding](https://en.wikipedia.org/wiki/Uncleftish_Beholding)

------
merpnderp
89% with no false positives.

I wonder how tightly coupled this will be to education, gender, age, and
especially handedness. Will righties rule and lefties drool, or will an
unthinkable upset occur? I expect the U.K. to dominate because all the smart
movies use U.K. actors, full of $2 words.

~~~
wincy
Hah, leftie here with the exact same score as you. I’m probably going to be
thrown out from any statistics they derive as an outlier because I’m a high
school dropout programmer who ravenously read books in his youth.

~~~
sunstone
86% here with no false positives but with a four beer handicap and over 60yo
but still "in the top level" (which has no definition). So I'll take this and
skedaddle. Cheers :)

------
starmole
84% - 0%: Non native speaker. Have to brag :) And I beat my native speaker
partner who had 86% - 13% = 73%. :)

------
rgoulter
I know the non-words are really "not English", but I wonder how similar the
sentences would be if asked to "use in a sentence".

------
munchbunny
73% with 0% false positive. Most of my false negatives were "Can you really
stick that many suffixes on this word?"

------
martyvis
There were words like "bate" and "earliness" that I figured could have worked
and would be legal but considered they could have been made up. Still happy
with 79% - 0% as native English (well, Aussie)

------
derekp7
The problem I have with this test is a slight dyslexia when hitting the
appropriate key. Several times I knew I hit the wrong one a moment after, but
had no way to go back and correct it.

Also, in some cases I noticed that there was what appeared to be a real word,
but it was slightly misspelled, so I'm sure that counted as an incorrect
input.

~~~
tombh
I didn't see how to estimate the size of your vocabulary, I just saw a
percentage. Did I miss something? Like I want to know something like, based on
your answers we estimate you know 8311 words.

Edit: oops replied to wrong parent

~~~
cortesoft
It is saying what percent of ALL words you know. So just take the total count
of English words and multiply by your percentage.

~~~
inetsee
I don't think this can be right. The Oxford English Dictionary has over
300,000 main words, and I got a score over 80%. That would suggest I have a
vocabulary of 240,000 words, which seems a bit high, to say the least!

~~~
larkeith
Their FAQ claims 60,000 ("The word list should contain almost all English
words...") so 48k+ words known, which is somewhat more believable. OED 2nd
Edition (1989) has 171k full, current entries [1] (and another 47k obsolete,
which I will disregard), so 137k+ words, which is quite a few but perhaps not
beyond the realm of imagination.

[1] [https://en.oxforddictionaries.com/explore/how-many-words-
are...](https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-
the-english-language)

------
embiggulent
There's a definite grey zone of wordy word-like strings that are being cobbled
together from chunks of other words, where it's very easy to imagine a
possible definition for a word that sounds like that. Also, once or twice, I'm
noticing things that feel like proper first names.

Meanwhile, the upper 10% of words includes highly specialized terms from niche
sections of fields such as geology, chemistry and medicine, which arguably,
are not words, even if you've seen them before. In some respects, a proper
name isn't a word, and so, is a corporate invented product name really a word?
Isn't that also a proper noun like "John" or "Texas"?

In this 10% grey area, I see a few words that I know to be words, but I cannot
immediately tell you the definition, and if I were to try and use them in a
sentence, I'd be able to apply the word class (noun, verb, adjective, adverb)
within the structure of a sentence. But, I don't actually know the word. I
only know that it's probably a word, but since I can't guess the definition or
use it, I should dump it as an unknown, but more often than not, I'm getting
away with some of these, mostly because after multiple passes, I can detect
how the non-words are being generated, and the real words possess a more
obvious validity of structure.

That said, I'm able to hover at or above 90% with occasional disembiggening
penalties for over-cromulence.

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=7949313](https://news.ycombinator.com/item?id=7949313)

------
batiudrami
A lot of the time it's easy to guess what real words are based on what appear
to be structured correctly. I scored much higher guessing (even though I had a
couple of false positives) than when I answered truly.

Also a lot of the complex words are medicine/medical which makes sense since
there are an awful lot of them but isn't necessarily representative of 73% (my
score) of the words I will encounter outside of medical literature.

------
xxxxxxxx
87% with no false positives. In my view many of these are not what I consider
English ‘words’ - plant names, trade marks and archaic words. Still fun
though.

------
sterkekoffie
This is a neat format but I wouldn't trust the results. Some of the "real"
words are not actual English words (I got a Spanish word and an acronym) and
some of the "fake" words are real, but obscure (for example, "coath"). Still a
lot of fun, which seems to be the point.

------
finnthehuman
A few words (kilocycle, advertency, implacability, teacupful, outcrop) I would
accept in day to day use, but I wasn't sure if they were legal conjugations of
the root words. In fact firefox's dictionary does not have advertency.

------
hristov
Today I learned that "effing" is actually considered a word nowadays.

~~~
pessimizer
"effing and blinding/jeffing" is something old people say, I imagine "effing"
by itself as a minced curse is far older.

------
monort
The test, that really estimates the size, not just the percent:
[http://testyourvocab.com/](http://testyourvocab.com/)

------
starky
I misinterpreted the test, since I assumed they were all real words and only
selected the ones I at least could figure out the likely definition for. That
said, I'm still feeling pretty good about my results, only missing a few real
words (of which I recognized 2-3 of them, but didn't know the meaning), and
had one non-word which was very similar to two real English words.

------
imtringued
I don't understand the purpose of the test result.

So basically I know 83% of the English words... that were asked in this test.
If next year 1000 new words are introduced to the dictionary, that doesn't
automatically mean I know 830 of them. In other words the minimum size of my
vocabulary is 83 words. Nothing more. Nothing less.

~~~
taejo
If you poll a hundred voters and 70 of them say they're voting Wolf and 30 of
them say they're voting Sheep, do you know nothing more than at least 70 people
will vote Wolf and 30 Sheep?

------
hibbelig
For the record, I found 67% of the words, and said yes to 0% of the nonwords.
I proceeded conservatively, only saying yes when I have seen the word before,
not when I guessed it should be a word.

They say this is fairly high for a native speaker.

I gave my mother tongue as German, so is this "fairly high" for a native
German speaker?

It felt too easy.

------
chousuke
84% with no false positives. I missed mostly medical lingo, one legal term and
"stitchwort" which I did suspect was a plant but I had no idea which one.

This is pretty much what I expected as a non-native speaker; though
even if you told me what stitchwort was in Finnish I'd still not know what it
actually is besides a plant.

------
RenRav
90% / 0% Native English, I was guessing most of the time. Really surprised the
medical words didn't screw me over. If it sounded and looked like an English
word, I would answer Yes. I answered No to anything that seemed even a little
odd. I think if I retook the test my score would vary wildly...

------
noncoml
Didn’t realize there were non-words in the tests and was getting disheartened
at how many words I don’t know...

------
te_platt
I wonder how they take into account the kind of people who take a test like
this are likely to have a good vocabulary. I suppose that might not be true
but I don't think I'd do something similar for a test that measured something
along the lines of matching faces to pop musicians.

~~~
merpnderp
I bet the only truly important correlation is how many books a person has
read. Too bad they didn’t ask that.

“Would you say you’ve read less than 10, less than 100, less than 1000, or all
the books ever?”

~~~
RandomBacon
I've read probably around 500 books in my life (and I'm 30), and I only got
70% with zero false positives.

I think it depends on the type of books one reads, not quantity.

~~~
animal531
It makes me sad that between your and the parent comment we take 100, 500 and
1000 to be a lot of books to have read in a lifetime.

------
cbfrench
96%. The two I missed were misclicks (e.g., I accidentally hit ‘no’ on
‘parboil.’)

A number of those were intuition, though.

~~~
shereadsthenews
Going on intuition I figured that “boxthorn” and “razorbills” were undoubtedly
English words but not knowing them I followed the rules and answered no. They
both are, apparently, English words. 87%.

~~~
cbfrench
Part of the 96% comes, I think, from the fact that I did doctoral work in
English lit, which gives me a better “feel” for those sorts of archaisms. I
can’t say that I knew all of the words that cropped up in the test, but many
of them sounded plausible on the basis of having read quite a bit of pre-19thC
stuff.

------
larkeith
I found the test significantly easier on subsequent attempts, as I
familiarized myself with how strict (as it turns out, not very) their
definitions are - I wonder if they will do any correcting for this?

(Scores for reference: 70, 77, 87, 84, 84, 84)

------
micmil
93% with 3 false positives, all of which were literally me being a dumbass and
hitting the wrong button because my hands REALLY wanted yes to be on F.

Which means nothing except that I've seen a lot of words and remember them.

------
taejo
If you're interested in this you might also like
[http://testyourvocab.com/](http://testyourvocab.com/)

I'm working on something similar for German at the moment.

------
andromeda20
Non-native speaker, got a score of 63% with 1 false positive, so that makes it
60%.

However, it said that it is relatively high for a non-native speaker. How did
they reach this conclusion?

~~~
larkeith
From their FAQ, "...a proficient native speaker will know some 40,000 words of
the list (i.e., 67%) ... [for second language speakers] our estimates range
from 6,000 words (10%) for a medium proficiency speaker to 20,000 words (33%)
for a high-proficiency speaker."

So well above their estimates, but they don't elaborate on the source of
those numbers - they might be available somewhere at
[http://crr.ugent.be](http://crr.ugent.be)

~~~
andromeda20
Aha, thanks for that data. Idk, somehow I feel second language speakers would
perform better than 33%. Thanks for that link too.

------
bhouston
84% but penalized 3% for saying "steepy" is a word.

~~~
mamon
Turns out it is:
[https://www.merriam-webster.com/dictionary/steepy](https://www.merriam-webster.com/dictionary/steepy)

------
oblib
That was fun!

My score:

You said yes to 63% of the existing words.

You said yes to 0% of the nonwords.

This gives you a corrected score of 63% - 0% = 63%.

I already knew I wasn't a wordsmith so that's not awful. About what I
expected.
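
The corrected score in that readout can be sketched as below. This is a minimal sketch, assuming the percentages are simple proportions of the 70 words and 30 nonwords that the site's FAQ (quoted elsewhere in this thread) says each test draws; the function name `corrected_score` and its parameters are made up for illustration and are not the site's own code.

```python
def corrected_score(word_yes, words_shown, nonword_yes, nonwords_shown):
    """Percent of real words accepted minus percent of nonwords accepted."""
    hit_rate = 100.0 * word_yes / words_shown
    false_alarm_rate = 100.0 * nonword_yes / nonwords_shown
    return hit_rate - false_alarm_rate


# 44 of 70 real words accepted (about 63%), no nonwords accepted
print(round(corrected_score(44, 70, 0, 30)))  # prints 63
```

With one accepted nonword out of 30, the same call gives about 60, which lines up with the "63% with 1 false positive, so that makes it 60%" report elsewhere in the thread.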

~~~
ggcdn
For science! 67% for me.

My brain small. Sad!

~~~
benburleson
Don't worry, I got less than that, but 0 bad guesses!

I'd bet there is a bias from those reporting scores here -- a humblebrag, for
science, of course!

~~~
PappaPatat
As a non-native speaker (English would be my 3rd language at best, maybe 4th)
I expected to dramatically underperform compared to most scores here, but:

You said yes to 76% of the existing words.

You said yes to 0% of the nonwords.

This gives you a corrected score of 76% - 0% = 76%.

Which I find oddly high.

------
bobowzki
77% existing, 7% non existing.

"This is a high level for a native speaker."

I'm happy because I'm Swedish. I would like to see a histogram also by native
language.

------
philipswood
Didn't estimate my vocabulary, just scored results of my test.

From the description I was hoping for a statistical estimate of how many
English worsd I know...

~~~
larkeith
From the FAQ: "The master list includes 60,469 words and 304,275 nonwords.
From these you get a random sample of 70 words and 30 nonwords. The word list
should contain almost all English words (non-inflected forms)."

So just multiply by 60k for how many you know. You've also got the info you
need for stuff like stddev.
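
A rough sketch of that multiplication and the spread around it, assuming the 70 real words are a simple random sample from the 60,469-word list in the FAQ quote. The function name `estimate_vocab` is made up for illustration, and the sketch ignores the nonword penalty and lucky guesses, so treat the numbers as ballpark only.

```python
import math

MASTER_WORDS = 60_469  # word count from the FAQ quote
SAMPLE_WORDS = 70      # real words shown per test, same quote


def estimate_vocab(known_in_sample, sample_size=SAMPLE_WORDS, total=MASTER_WORDS):
    """Scale the sample proportion to the full list, with a binomial standard error."""
    p = known_in_sample / sample_size
    estimate = p * total
    # Standard error of a sample proportion, scaled up to the list size
    std_err = math.sqrt(p * (1 - p) / sample_size) * total
    return estimate, std_err


est, se = estimate_vocab(56)  # 56 of 70 is 80%
print(f"{est:.0f} +/- {se:.0f} words")
```

For an 80% score this comes out at roughly 48,000 words, give or take about 2,900, so a single 70-word sample pins the count down only loosely.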

~~~
imtringued
That's not how that works. To obtain an absolute number the words have to be
ranked by frequency and then the 100 words have to be picked so that frequent
words appear first and less frequent words appear at the end. The estimator
then knows whether you know 90%, 99%, or 99.9% percentile words and
uses that to estimate that you know enough words to cover x% of all words that
will appear in a text. Then you use that percentile on your word ranking to
count how many words are below that percentile.

The posted test site does nothing of the sort.

Imagine the test only asked you a single 90% percentile word.
vocabulary.ugent.be would tell you: You know 100% of the words. The process I
described would tell you: You know 90% of the words.

Obviously both tests become more accurate as the number of words increases but
the vocabulary.ugent.be test requires you to test all 60k words to get an
accurate result. The second test will probably need less than 1000 words to be
dead on.

[http://testyourvocab.com/](http://testyourvocab.com/) seems to be a pretty
good implementation of the approach I described.
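
A toy sketch of the frequency-ranked idea, not how testyourvocab.com actually works: each probe word stands in for the band of ranks between the previous probe and itself, and known bands are added up. The function name, the probe ranks, and the yes/no answers are all made up for illustration.

```python
def estimate_from_ranked_probes(probes):
    """probes: (frequency_rank, known) pairs sorted by rank, rank 1 = most common.

    Each probe represents every word ranked after the previous probe,
    up to and including its own rank.
    """
    estimate = 0
    previous_rank = 0
    for rank, known in probes:
        band_width = rank - previous_rank
        if known:
            estimate += band_width
        previous_rank = rank
    return estimate


# Hypothetical answers: probes get sparser as the words get rarer
probes = [(100, True), (1_000, True), (5_000, True), (10_000, True),
          (20_000, True), (40_000, False), (60_000, False)]
print(estimate_from_ranked_probes(probes))  # prints 20000
```

Seven probes are far too few for a stable number; the point is only that the probes are spread across the frequency ranking instead of drawn uniformly from the list.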

~~~
larkeith
The test you described is useful for determining how many words in a text you
will know - i.e. what percentage in a given set of English. That's probably a
more useful number, but it's not what parent asked: "I was hoping for [...]
how many English worsd [sic] I know."

> Obviously both tests become more accurate as the number of words increases
> but the vocabulary.ugent.be test requires you to test all 60k words to get
> an accurate result.

They could _never_ get an accurate result for words known in a body with this
test - it needs a separate dataset of word frequency. That's not what they're
testing.

------
JoshTko
I wonder how much of this study is permanently skewed by being on HN and its
specific audience - as well as all the redos.

------
darepublic
Weird that they provide a button to improve your score.. wouldn't that mess
with their results if many people retest themselves

------
erisinger
96% true positive, 0% false positive.

Agreed that this seems like a measurement of something other than vocabulary.

------
dondo
A few attempts with around 60%. I noticed my response times for known words
were quite a bit higher than for those in question.

------
IAmGraydon
73% with 0% non-words. While I was taking it, it felt like I was doing worse,
but not a bad result overall.

------
xvilka
Interesting, found a word that isn't in the Merriam-Webster dictionary: araliaceous.

------
rmujica
73% with 0% false positives, ESL

I am curious about how common a ~75% score is

------
abetusk
I really wish they gave the histogram of scores.

83% (86% known - 3% false positive)

------
p2t2p
Russian, 66% counting the 7% penalty for non-words.

------
pw
87% and no false positives. Native speaker. Fun!

