It doesn’t list inflections, proper names, adjectives for colors such as yellowish, or words used in an entry that derive from that entry (the dictionary mentions blearily and bleary-eyed being used in the definition of bleary).
They also say they occasionally had to use a word not in the list, but don’t say how often. Those words _are_ defined in the dictionary, so it is possible that the reference graph does not have any cycles.
So, I guess 3,000 is a good first guess.
I didn't go searching; "big" was literally the first word on the list I read after going down a few pages, and I wondered about "large", so I searched for it. I just looked a bit more, and there's "child", "childhood", and "grandchild", which, while not the same problem, does illustrate that they are fairly liberal with their inclusions: they appear to want to use the minimum vocabulary to define something idiomatically, which is a slightly different question from what's the minimum required.
This problem actually seems to share a lot in common with database normalization.
“Our language is an imperfect instrument created by ancient and ignorant men. It is an animistic language that invites us to talk about stability and constants, about similarities and normal and kinds, about magical transformations, quick cures, simple problems, and final solutions. Yet the world we try to symbolize with this language is a world of process, change, differences, dimensions, functions, relationships, growths, interactions, developing, learning, coping, complexity. And the mismatch of our ever-changing world and our relatively static language forms is a problem.” - Wendell Johnson
A great book and fantastic movie.
In fact, it almost seems like there is no other way to describe such problems. They are conceptual, ephemeral, not wholly in the realm of things you can see or witness, but only really describe.
Edit: Some Consequences of Four Incapacities is another that deals with how we understand things.
That's where 1984's "Newspeak" came from. See "Orwell, the Lost Writings".
Interesting. But the subject is the nature of the definition. What is the OED definition of definition (circularity intended):
a precise statement of the nature, properties, scope, or essential qualities of a thing; an explanation of a concept, etc.; a statement or formal explanation of the meaning of a word or phrase
Well, that's nice. The first component would be amenable to a sclerotic positivism (which denied subjective phenomena as inaccessible to measurement, ergo epiphenomena to be ignored; this was jettisoned by contemporary cognitivism and phenomenology); the second addresses the conceptual without a hint of pragmatic methodology; and the salient element of the third component is the word "meaning", which the OED defines as:
that which is or is intended to be expressed or indicated by a sentence, word, dream, symbol, action, etc.
So the definition of definition by the ipse dixit English authority on definitions alternates between a call for precision and some rather vague references to intentionality. That was the intent of the above tidbit on the topic of definition. Namely some labels for subjects are amenable to degrees of precision in definition while others with only conceptual referents will have their proffered definitions disputed, diluted, or otherwise hedged and seemingly imprecise.
Stephen Stich, in The Fragmentation of Reason, a personal overview of contemporary epistemology, alludes to the inherent vagueness of consensual definitions and eventually settles into what he calls pragmatic epistemology.
It doesn't matter how contested a word is. You can nevertheless describe its main conventional uses. That would simply be an empirical observation.
Also, just a friendly suggestion: you are writing too much, and using too many long and unnecessary words. Simplicity is often better, both analytically, and to read.
“Put out” also has a zillion variants in those two pages, but is itself also in the list of defining words.
(Though again I'm unsure if the endless English phrasal verbs are counted as distinct in these estimates, not doing which would probably be cheating.)
It's impossible. An English dictionary defined using English words has to have cycles.
The combination of a set of semantic primes and the rules of combining them forms a 'Natural Semantic Metalanguage', which is the core from which all the words in a given language would be built up.
The current agreed-upon number of semantic primes is 65 (see list at wikipedia links above).
That means that any English word can be defined using a lexicon of about 65 concepts in the English natural semantic metalanguage.
I'm going to get silly now, but I can't help but think the semantic primes - if you can avoid thinking of them as words or even conscious experience - represent some core set of cognitive axioms, like the primitive elements for constructing mental models. As you go to simpler life forms the "word list" would get smaller. If there is any truth to that, I wonder what potential primitives we are missing that would allow us to think more complex thoughts and whether you could measure species intelligence by their "vocabulary" and working out what concepts can't be expressed when one of the primitives is missing. What would happen if you lost the concept of above'ness?
The other thing I find interesting, and it might be no more than a coincidence, is how there are only the numbers one and two and then you have to use many or more. This in some way matches up with the idea of the Parallel individuation system, whereby young children can only precisely recognize quantities up to 3, or 1 + 2, and an adult can only precisely recognize quantities up to 4, or 2 + 2. After that, the brain uses the Approximate number system. So it's like there are only two slots to place a quantity.
This and the rest of the comment remind me of the Pirahã language, in which there are purportedly two numerals but researchers can't figure out what they are: https://en.wikipedia.org/wiki/Pirah%C3%A3_language#Numerals_...
> Frank et al. (2008) describes two experiments on four Pirahã speakers that were designed to test these two hypotheses. In one, ten spools of thread were placed on a table one at a time and the Pirahã were asked how many were there. All four speakers answered in accordance with the hypothesis that the language has words for 'one' and 'two' in this experiment, uniformly using hói for one spool, hoí for two spools, and a mixture of the second word and 'many' for more than two spools. The second experiment, however, started with ten spools of thread on the table, and spools were subtracted one at a time. In this experiment, one speaker used hói (the word previously supposed to mean 'one') when there were six spools left, and all four speakers used that word consistently when there were as many as three spools left.
I assume you see 3 objects on a table as a triangle. It's probably not equilateral, but any three objects on a table describe a triangle.
Make sure you can see 4 as a square, not 2+2. If you're stuck on seeing two pairs (or lines), try seeing 3+1 (a triangle and a point) instead. Then incorporate the point into the triangle...
Next, see pentagons. ... That's it.
I haven't tried to see "six"... Five was hard enough. :P
It looks interesting, certainly, but rather arbitrary. There are several pairs of opposites, which in a minimal language could be handled with the concept of "opposite", and I have no idea how you'd express some fundamental concepts of human experience such as hunger, cold, pain or surprise, while "live, die" do not seem to me to be such fundamental concepts: they seem more like concepts that need to be defined, for example by a philosopher or medical specialist, rather than experienced directly.
Admittedly, the original question is specifically about the English language, but toki pona is a nice experiment related to this.
> "A friend!" Shouted back the man. He ran toward Zaphod.
> "Oh yeah?" said Zaphod. "Anyone's friend in particular, or just generally well-disposed to people?"
Adams, Douglas. The Restaurant at the End of the Universe.
"jan pona" mije li toki wawa. ona li tawa tawa jan Zaphod.
"jan pona?" jan Zaphod li toki. "jan pona tawa jan wan anu ale?"
jan Douglas Adams. ma moku lon pini pi ma suli.
Prior HN discussion: https://news.ycombinator.com/item?id=16847691, https://news.ycombinator.com/item?id=2359174, & others
0. Get a dictionary.
1. Form a directed graph, with an edge from each word to every word that uses that word in its definition.
2. Remove all words that have no outgoing edges.
3. If you removed some words, go to step 1. Otherwise, all words left in the dictionary are minimal.
EDIT: If anyone knows of a machine-readable dictionary, I'd love to actually do this.
So if "multitudinous" isn't used in a definition of another word, you remove it from the set. Maybe you then find out that "myriad" was only used in the definition of multitudinous, so you can take myriad out, and so on.
GPT-2 (of recent OpenAI fame) uses 1.5 billion parameters and, though capable of interesting results, is far from human level. It also uses just text, so it's incomplete.
Another interesting metric is Bits Per Character - BPC. The state of the art is around 1.06 on English Wikipedia. This measures the average achievable compression on character sequences and doesn't include the size of the model, just the size of the compressed sequence.
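As a rough sketch (my own formulation, not from the comment above), BPC is just the average negative log2 probability a model assigns to each character it actually observes:

    import math

    def bits_per_character(char_probs):
        # char_probs: the model's predicted probability for each character
        # that actually occurred in the test sequence (hypothetical input)
        return -sum(math.log2(p) for p in char_probs) / len(char_probs)

    # A model assigning probability 0.48 to every correct character
    # achieves about 1.06 BPC: bits_per_character([0.48] * 1000) ~= 1.06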
Even then, one is rather constrained, and definitions frequently cross-reference other words to bootstrap the definition.
It sticks to a basic vocabulary, has an entry for every word it uses, and goes heavy on examples and pictures in preference to formal definitions. (And it's monolingual even though written mainly for learners in North America.)
I don't have it to check, but estimating from memory: around 2000 to 4000 words. I found it useful while bootstrapping up from Duolingo.
That is actually a really interesting challenge: to have a completely self-contained dictionary. Especially in 1963, before modern automation, the proofreading required must have been a Herculean task.
Perhaps this could be some kind of measure for answering this question in and of itself: what is the smallest useful self-contained natural language dictionary that one can write?
EDIT: Oh, fginionio came up with an intuitive approach to do this automatically below: https://news.ycombinator.com/item?id=19332041
The idea is to first index each word v in the lexicon of L (including w), starting at 1 and ending at n, whatever is the number of distinct words in the language. Alternatively, you can index _meanings_. Then (should be obvious where I'm going with this by this point) you map a sequence S_k of repetitions of w of length k in [1,n] to each k'th word, v_k, in L. So now L' is the language of n sequences S_1,...,S_n of w each of which maps to a word (or meaning) in L. And you have "defined" L in terms of a single word, the word w.
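A toy sketch of that encoding, assuming the lexicon of L has already been indexed as a list (the choice of w and the helper names are mine):

    def encode(k, w="word"):
        # the k-th word (1-based) of the lexicon becomes k repetitions of w
        return " ".join([w] * k)

    def decode(sequence, lexicon):
        # recover the original word by counting repetitions
        k = len(sequence.split())
        return lexicon[k - 1]

    # decode(encode(3), ["a", "b", "cat"]) -> "cat"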
But that's probably not at all what the reddit poster had in mind.
However, it should be noted that natural language is such that there's really no reason that we have many words- it's just convenient and helps us create new utterances without having to create long sequences of one word, as above. The important ability in human language is that we can combine words to create new utterances, forever- which we can do with one word just as well as with a few thousand.
Finally, I suspect that if there was a minimal set of (more than one!) words sufficient to define all other words (meanings) in a language, all natural languages would converge to about that number of words- which I really don't think is the case.
I'm pretty confident the goal is to choose a smallest subset of English so that, if you know this subset of English and are given a dictionary written in it, you can learn the entire vocabulary of full English.
That means you're not allowed to create any new words, so you can't create the magic uber-word w.
> if there was a minimal set of (more than one!) words sufficient to define all other words (meanings) in a language, all natural languages would converge to about that number of words- which I really don't think is the case.
This amounts to saying there is little to no redundancy in language. I'm not convinced. For example, once you've got "one" and "plus", the words "two", "three", "four", etc. are just convenience. Another example might be opposites: if you have "down", you don't absolutely have to have "up". But the thing is, people really like convenient ways of saying things. In fact, the economics probably drive you toward doing this. It makes for shorter sentences. Think of it like data compression: if a concept occurs often, you want a dedicated word for it so you can just say that word instead of saying the definition.
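A back-of-the-envelope illustration of that compression intuition (the numbers are mine, purely hypothetical):

    mentions = 1000        # times a frequent concept comes up
    paraphrase_len = 6     # words needed to spell it out each time
    coined_word_len = 1    # cost once there's a dedicated word for it

    print(mentions * paraphrase_len)   # 6000 words spent without the word
    print(mentions * coined_word_len)  # 1000 words spent with it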
So for English it would be rather easy to find this by looking up synonyms originating from French, German, and even the Scandinavian languages, and of course Latin.
Oh- w can be an English word. And the reddit post didn't say anything about not inventing a new language, with only English words (it would be a new language since it would have completely different grammar and semantics).
But I think you're right that what I propose above is totally cheating :)
Often the reason we have a word for a concept is precisely because no other combination of words would do. I'd suggest that the article's attempt to "define" one word with another is an oversimplification. It is not enough to sufficiently define a word, to convey its most common understanding. To declare a word superfluous, replaceable, one must define it absolutely. For many words I'd expect such a definition to fill an entire volume, not a short sentence.
The author also forgets that words have layers beyond literal meanings. Their tone, their length, and even their spelling can convey different meanings depending on context.
You don’t have to have felt schadenfreude for someone to explain to you what it is.
I took Webster's dictionary from the Project Gutenberg site. I started with 95712 words. After the initial throwing away of words that weren’t in any definitions, I was down to 4489 words. After expanding them, and throwing away words that weren’t in the expanded definitions, I was down to 3601 words. Setting recursive definitions as atoms and continuing got me down to 2565 words.
I feel this would be of interest to the thread, if anyone knows what I'm talking about or knows how to successfully Google for such a thing.
Finally, here you are. At the delcot of tondam, where doshes deave. But the doshery lutt is crenned with glauds.
Glauds! How rorm it would be to pell back to the bewl and distunk them, distunk the whole delcot, let the drokes uncren them.
But you are the gostak. The gostak distims the doshes. And no glaud will vorl them from you.
It has been on my to-play list for some time but I haven't got around to it yet.
"In Thing Explainer: Complicated Stuff in Simple Words, things are explained in the style of Up Goer Five, using only drawings and a vocabulary of the 1,000 (or "ten hundred") most common words."
You can read some of her books online, such as "Robinson Crusoe In Words Of One Syllable".
Hope you are one of the 10000 lucky ones whose mind is blown for the first time.
Or another one: "1"
We could extend it to cover words not conceivable by humans, and any universe, by using a program to simulate those, but (1) I assume the question implicitly assumes human words, though (2) it wouldn't require more words anyway.
The baby learns the words via example, not by definitions.
You could have 100 synonyms with the same "definition" but 100 different shades of meaning, implied degree of strength, or connotations.
You don't necessarily simplify anything by making people add additional words to get across those subtleties.
Of course some are useless equivalents, but many aren't.
It's just a thought experiment about how much you could optimize one dimension (number of words) if you didn't care at all about optimization anywhere else in language.
Not all synonyms amount to useless equivalents.
Occasionally it's useful to use a different word simply because one can; sometimes the felicitous utility of alternate mots justes serves its own purpose.
No such thing as a synonym. On the face of it, yes, many words share meanings. But a mutt is not the same as a dog, despite what thesaurus.com says.
What is the minimum number of words needed to define everything else?
You might not have words for it, but a fully logical being can decipher any bitstream given enough interactivity.
So start from 1 and 0, form the basis of mathematics and symbols, then build up physics from the very bottom.
1) Y = set of words in every definition of the words in set X
2) X = Y - X (all words in Y that are not in X)
3) Repeat from 1 if the set of words in X has changed
Does that reduce all words down to the actual minimal set of words required to define other ones? Since you can build upwards from the resulting set X to get the original set of words.
Also, this reminds me of the knapsack problem a little bit (for example, what is the minimum set of coins required to be able to make $X).
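The coin example reads like the classic coin-change problem; here's a small dynamic-programming sketch (my own, only to illustrate the analogy, not the word-pruning steps above):

    def min_coins(target, denominations):
        # fewest coins needed to make `target`; None if impossible
        INF = float("inf")
        best = [0] + [INF] * target
        for amount in range(1, target + 1):
            for coin in denominations:
                if coin <= amount and best[amount - coin] + 1 < best[amount]:
                    best[amount] = best[amount - coin] + 1
        return best[target] if best[target] != INF else None

    # min_coins(63, [1, 5, 10, 25]) -> 6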
mentioned by mjgeddes in this very thread
Do they have the experiences relevant to the word being defined? If not, what experiences do they have in common with the person providing the definition?
How intelligent are they? Can they understand complex concepts through logic, through examples or both?
How much do they know about English (besides the few words assumed known)?
Of course I see the obvious bootstrapping problem where you relate the encoding starting with just those two words but ... somehow I think that's easier to overcome than it seems ... as in, I think it must be possible.
If Helen Keller can write a book, surely I can relate digital encoding to a toddler over the course of a year or three, right ?
Source: My memory of something I read at British Council Library 17 years ago.
What effect this really had can be observed in the introduction of mechanical printing presses, which significantly reduced the spatial and temporal distances of information flow.
The internet might yet be another of those things...
One thing is that toki pona has no built-in comparatives at all. A usual thing is to say something like
mi sona e ijo mute ala. jan pi pali sama li sona e ijo mute.
'I know not many things. My colleague knows many things.'
ona li suli taso mi suli mute.
'She is big, but I am very big.'
jan ni li jo e mani mute. taso jan ante li jo e mani mute mute.
'This person has a lot of money. But the other person has lots and lots of money.'
Another thing is that there's no built-in way to make a relative clause at all.
mi sona e toki. mama meli mi li sona e toki sama.
'I know a language. My mother knows the same language.' (As opposed to 'My mother knows a/the language that I know'!)
mi sona e toki. mama meli mi li sona ala e toki ni.
'I know a language. My mother does not know this language.' (As opposed to 'I know a language that my mother doesn't (know)'!)
ona li pali e ijo. mi sona e jan ante. jan ni li pali kin e ijo ni.
'She does something. I know another person. This person also does this thing.' (As opposed to 'I know another person who does what she does'.)
moku mute li kama tan soweli. mi moku ala e moku ni.
'Many foods come from animals. I don't eat these foods.' (As opposed to 'I don't eat foods that come from animals'.)
It's also extremely tricky to construct specific tenses and specific logical conditions. The particle "la" can mean "when", "because", "also", or "if", and is only supposed to be used once per sentence. This is especially challenging when trying to contrast things that have happened with hypothetical conditions. For example
jan olin ona mije li moli la mi mute li pilin ike.
I intend this to mean 'we feel bad because his romantic partner died' but we can't really disambiguate, for example, 'we will feel bad when his romantic partner dies' or 'if his romantic partner dies, we will feel bad'.
You can qualify things with "tenpo pini/ni/kama la" ('in past/this/future time'), but you're not supposed to use more than one "la" in the same sentence, so it's discouraged to write things like
?tenpo pini la mi moku e ni la insa mi li pilin ike.
'Because, in the past, I ate this, my belly feels bad.'
You can try to break these up into multiple sentences.
tenpo pini la mi moku e moku jaki. mi pali e ni la mi kama pilin ike.
'In the past, I ate gross food. Since I did this, I started feeling bad.'
This gets really challenging if you have to refer to several different things of the same sort, which perhaps have conditional relations to one another that apply at different times or in different circumstances. For example, if you wanted to say "when my mother arrived, the plane that she was on was very warm because it had a broken air conditioning unit which the crew didn't know how to fix", you might end up making a long series of sentences that tell a story.
tenpo pini la mama mi li kama kepeken ilo tawa kon.
ona li kama la kon lon ilo li seli mute.
ni li kama tan ni: ilo lete li pakala. jan pali li sona ala pona e ilo lete.
In the past, my mother came using an air travel tool.
When she/it arrived, the air in the tool was very hot.
This happened because of this: the cooling tool broke. Workers did not know how to improve the cooling tool.
But some kinds of conditions don't necessarily lend themselves well to this form, like if I wanted to say "if she had known that this would happen, she wouldn't have taken this airplane", or quantifiers like "every Singaporean who goes to school in Singapore learns English and whatever the government defines as his or her family's language" or "everyone who was inside the building when the earthquake happened got injured by some object"...
I don't feel confident about my ability to describe the truth conditions of the latter two examples in toki pona in a way that's faithful to the English original.
It's also unclear to what extent we're allowed to stack "e ni:" and "tan ni:" in order to embed indirect discourse and chained reasons.
?ona li pilin pona tan ni: toki pona li pona tawa ona tan ni: ona li toki lili li jo ala e nimi mute.
'She was happy because of this: she liked toki pona because of this: it's a small language and doesn't have many words.'
Edit: also, NSM explications assume that you're deliberately defining new vocabulary in order to expand your language, which isn't really customary in toki pona. Even if we figure out how to express a concept or situation in toki pona, we don't then acquire a single word that we can use for that concept or situation in the future.
- "backwards j"
- "a circle"
- "a cross"
- "n, but rotated ninety degrees"
- "mirror of p"
- "vv, except no gap"
- "pixel-wise union n and l"
- "mirror of s, and make the lines straight"
Semantics are impossible anyway; I challenge you to define the word "dog".
Challenge: Do better, make sure you don't have circular dependencies.
The tradeoff being density of information, understandability to the readers, and conciseness.
fixed it for you.
00110001 00110000 00100000 01110111 01101111 01110010 01100100 01110011 00101110 00100000 00100010 00110001 00100010 00100000 01100001 01101110 01100100 00100000 00100010 00110000 00100010 00001010 00001010 01000110 01101001 01111000 01100101 01100100 00100000 01101001 01110100 00100000 01100110 01101111 01110010 00100000 01111001 01101111 01110101 00101110
Binary codes can be prefix-free and thus self-terminating.
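A toy illustration of the prefix-free idea (the code table is my own, hypothetical): because no codeword is a prefix of another, the decoder knows where each symbol ends without any separators.

    CODE = {"0": "e", "10": "t", "110": "a", "111": "o"}  # hypothetical prefix-free code

    def decode(bits):
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in CODE:       # a complete codeword: emit it and start fresh
                out.append(CODE[buf])
                buf = ""
        return "".join(out)

    # decode("0101100") -> "etae"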
To understand duck you must see a duck (Eat a duck, pet a duck, smell a duck, hear a duck)
Perhaps you could cheat and use pixels and coordinates, using English to draw photos and videos to explain ducks.
Duck is the common name for a large number of species in the waterfowl family Anatidae which also includes swans and geese. Ducks are divided among several subfamilies in the family Anatidae; they do not represent a monophyletic group (the group of all descendants of a single common ancestral species) but a form taxon, since swans and geese are not considered ducks. Ducks are mostly aquatic birds, mostly smaller than the swans and geese, and may be found in both fresh water and sea water. Ducks are sometimes confused with several types of unrelated water birds with similar forms, such as loons or divers, grebes, gallinules, and coots.
But you could also describe a duck in two simple words: "water bird". Apparently that's a real term: https://en.wikipedia.org/wiki/Water_bird
See Genesis 2:19-20 (and its placement/context). God shows Adam forms to be named.
Every ancient-enough religion all starts out, effectively, the same way, in their own "religious dialect": the universe was created, life on Earth was created, and then Mankind invented language. This has made a lot of people very angry and been widely regarded as a bad move.
Things were simply not named before we invented language and named things.
I'm not saying that new words aren't created, just that in a practical sense, unless you're creating new words to mess with someone trying to do this, it doesn't apply.
You have my most enthusiastic contrafribularities.
With math, a circular definition is unacceptable, and that's when the theorem comes into play.