
Bot that uses deep neural networks to generate plausible definitions of words - Houshalter
http://lexiconjure.tumblr.com/
======
_jomo
_ycombinator: n. a person who is designed to deceive or show hostility to
acceptable situations._

Best possible outcome.

[http://lexiconjure.tumblr.com/post/139734312765/ycombinator#...](http://lexiconjure.tumblr.com/post/139734312765/ycombinator#139734312765)

------
yoo1I
That's actually one of my favorite party games: Get out a volume of an
encyclopedia (I know, right!), find a word whose meaning is likely unknown to
the circle of people eagerly sitting around the room with a scrap of paper and
a pencil, announce it, and have everyone make up a fake definition of the word.
Mix them in a hat along with the copied-down correct definition, read them
aloud, and then everyone votes on the one most likely to be correct. The person
who collects the most votes wins.

 _So_ glad this tedious procedure finally got automated! Well... almost; we
still need the bots to vote for the most plausible one.

~~~
mgraczyk
Your game is called Balderdash or Fictionary.
[https://en.m.wikipedia.org/wiki/Balderdash](https://en.m.wikipedia.org/wiki/Balderdash)

~~~
yoo1I
Balderdash![0] I'll be the judge of what's called "The Dictionary Game"!

[0] [https://youtu.be/mFlCnohRzJ0](https://youtu.be/mFlCnohRzJ0)

------
skybrian
I wonder if there might be a way to generate a dictionary definition based on
a word2vec vector? It would be cool to generate a word that's halfway between
two other words, or to complete an analogy.
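The midpoint and analogy ideas above reduce to simple vector arithmetic once you have word vectors. A minimal sketch with made-up toy vectors (the 4-d vectors and tiny vocabulary here are invented for illustration; a real setup would load trained word2vec embeddings):

```python
# Toy sketch of the idea above: with word vectors in hand, "halfway
# between two words" is just the midpoint of their vectors, and an
# analogy is vector arithmetic (king - man + woman ~ queen).
# NOTE: these 4-d vectors and this tiny vocabulary are made up for
# illustration; real ones would come from a trained word2vec model.
import numpy as np

vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.5, 0.9, 0.0, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9, 0.1]),
    "royal": np.array([0.9, 0.45, 0.45, 0.0]),
}

def nearest(vec, exclude=()):
    """Vocabulary word with the highest cosine similarity to vec."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(vocab[w], vec))

# Halfway between "king" and "queen":
midpoint = (vocab["king"] + vocab["queen"]) / 2
print(nearest(midpoint, exclude=("king", "queen")))        # royal

# Completing the analogy man : king :: woman : ?
analogy = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(analogy, exclude=("king", "man", "woman")))  # queen
```

Feeding such a composite vector into a definition generator is the open part of the idea; the arithmetic itself is the easy half.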

~~~
Houshalter
That's a really good idea. It would test how much semantic information the
word vector actually contains about the word.

Another idea is to generate word vectors _from_ a dictionary. Instead of
trying to infer a word's meaning from context, a dictionary gives you its
meaning directly, and even alternate meanings. Or a hybrid approach might be
best: I recently read a paper where they used WordNet relationships to
substantially improve regular word2vec vectors.

------
Houshalter
It responds to tweets at:
[https://twitter.com/lexiconjure](https://twitter.com/lexiconjure)

The code is available here:
[https://github.com/rossgoodwin/lexiconjure/](https://github.com/rossgoodwin/lexiconjure/)

------
taliesinb
_neobayesian: n. a small round beetle which is cultivated for its foliage and
feeding on trees. Genus Neobaeya, family Characidae. mid 19th century: from
modern Latin Neobayea (from Greek neobaea ‘blood vessels’) + -AN._

What delightfully deranged nonsense!

 _ojno: n. (pl. ojnos) 1 a small piece of metal with a long glazed stem and a
pointed snout, used for making soft fabrics._

It's like funhouse mirror Borges.

------
dnautics
This one is deep in the uncanny valley. The definitions feel like an
ML-trained thing trying too hard, lacking the spark of just-so humor. I'll
take "felis catus is your taxonomic nomenclature" over this any day.

------
pbnjay
This is awesome! Can I get one that goes the other way? I give you a
definition and you give me a plausible but invalid word...

~~~
cfcef
That's possible. It's using Andrej Karpathy's char-rnn. Presumably it's doing
something like running the trained model with `th sample.lua -model
something.t7 -primetext '$WORD, '` and taking everything after the comma as
the definition. So to reverse this, you would take the dictionary corpus,
remove the leading `$WORD, ` from each entry and append `, $WORD` instead.
Then the model will be trained to predict a final word conditional on the
definition, and you can do the same thing with the new model: feed in
`-primetext '$DEFINITION, '` to get out a word.
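The corpus transformation described above can be sketched in a few lines. This is a hypothetical preprocessing helper, assuming each training entry has the shape `WORD, definition`:

```python
# Sketch of the corpus reversal described above: move the headword
# from the front of each entry to the end, so a char-rnn trained on
# the result learns to predict a word conditioned on its definition.
# Assumes each line looks like "WORD, definition text"; lines that
# don't match that shape are passed through unchanged.
def reverse_entry(line):
    word, sep, definition = line.partition(", ")
    if not sep:
        return line  # no "WORD, " prefix found; leave the line alone
    return f"{definition.strip()}, {word}"

print(reverse_entry("aardvark, a nocturnal burrowing mammal with a long snout"))
# a nocturnal burrowing mammal with a long snout, aardvark
```

Applied over the whole dictionary corpus, this produces training data for the definition-to-word direction without touching the model code itself.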

~~~
themadstork
Hey, I'm the creator. This would probably work, you'd want to use a unique
delimiter character that's not in the rest of the corpus --- so not a comma.
(I'm using the pipe character.)

------
maaarghk
I absolutely love this. Something about it really appeals to my sense of
humour on top of being impressed with the technical aspect of it.

------
sasas
> compsci n. (pl. compscises) a small compartment in a ship’s compass.

------
sigmar
would be cool to use this to derive definitions of slang

------
pmarreck
This actually doesn't seem that accurate...

~~~
Houshalter
It's very impressive as an AI demo. While it's far from perfect, it seems to
have a much better grasp of English than previous examples I have seen. It's
picking up on small subtleties, its grammar and spelling are decent, and it's
even remembering long strings of arbitrary characters, which is a difficult
task for RNNs, which have to learn many "memory cells" from scratch using
millions of floating point ops. And it's doing all that with an insect-sized
brain.

I don't know why this is. Perhaps training on dictionaries is a really good
way to teach NNs about English. Who would have guessed?

------
logicallee
The first definition seemed promising:

>Venprigon

>

>a city in SW Russia, on the River Danube; pop. 123,600

>(est. 2002).

No reason I wouldn't accept it. In particular, I thought the Danube flows
through Russia (edit: apparently not, but Russia borders it and is mentioned
in the Danube article). Why not? The name sounds Russian enough to my ears
(I don't speak it), and the population given is small enough that I wouldn't
have heard of it (if it had said 10 million, that would be a give-away).
Passes with flying colors for someone not great at geography; probably NOT a
pass for someone who knows geography well, since the Danube does not flow
through Russia. 10/10 for me.

But it drops off immediately:

>onomierren

>

>n. [mass noun] a disease caused by a strong feeling of blurred and deceptive
movements of the teeth.

>

>mid 19th century: from INO- ‘one’ + Greek meros ‘marriage’.

The whole definition is completely nonsensical; this is a 0/10. It doesn't
make sense for a disease to be "caused by a feeling" (feelings aren't ever
listed as causes of disease, but rather as symptoms), and the feeling of
blurred "vision" might maybe make sense if someone isn't reading carefully,
but blurred, and deceptive, movement of teeth doesn't even pass the least
attentive reading. The fact that it's listed as a mass noun is okay (lots of
diseases are), but the etymology isn't even trying: INO- doesn't mean one in
any medical language (everyone knows it's mono-, or maybe uni-), ino- isn't
even in the word onomierren, meros isn't Greek for marriage, and even if it
were, what the hell would "one marriage" have to do with a disease caused by
the movement of teeth? This is a 0/10.

The next one:

>clapter

>

>n. a person who delivers a clapted book, especially a computer file or
television programme or a program.

since I don't know what a clapted book is, it sounds plausible until the
repetition "or television programme or a program", neither of which sounds
like something one would deliver. If it just said "a person who delivers a
clapted book" I might find it plausible. 5/10.

>fengler

>

>n. a person who fengles or shares a fengue.

10/10. I don't know what a fengle is, but this seems perfectly plausible to
me.

>ambistrate

>

>n. [BIOLOGY] a plant or animal that is extremely hard or wide, as in a small
or more liquid or gland.

>

>early 18th century: from Latin ambistratus, from ambi- ‘money’ + stare ‘to
stand’.

Again this is completely nonsensical. Ambistrate sure sounds like a word,
specifically a verb, but it is then listed as a noun. Well okay. A plant OR
animal? Weird. That is extremely hard OR wide? Okay. And then it just drops
off into complete random garbage: "as in a small or more liquid or gland."
You can't even parse that grammatically. It's just random words.

The etymology sucks, ambi- doesn't mean money (ambiguous? ambivalent?
ambidextrous? etc), stare sounds okay to me.

This is like a 1/10.

>forepiscate

>

>n. [BIOLOGY] a plant or animal that foresees or is produced by a foreperson.

>

>forepiscitic adj. forepiscity n.

seems completely improbable: firstly, for a plant to be able to foresee, this
word (foresee) would have to have some meaning I don't know - and secondly,
the definition says that a foreperson can produce such a plant. This is
garbage, 1/10.

>salakala

>

>n. [mass noun] a Japanese colour like that of salad colour.

>

>Italian, literally ‘salted pepper’, from Latin salus ‘salt’.

again all over the place. We don't talk of "salad colours", and if it's a
Japanese colour (which salakala sounds like it could be), why is it given an
Italian etymology? Completely implausible, 1/10.

>quanspor

>

>n. a small round board on a plane figure with a slightly unstable joint.

>

>late 18th century: from Latin, ‘born’.

I guess the definition could sort of make sense, but "born" in Latin is
something around natus (nativity scene), or something with nasc- like
"nascent", or reNaissance (rebirth), or that sort of thing. This quanspor
crap doesn't share a single syllable.

Like 5/10, due to the technical jointmaking definition being nearly plausible
to me.

In sum, I would say this program does an _extremely_ poor job of deep
learning. Since Greek and Latin stems are in many ways predictable (after all,
lots of new words have been coined with them) it should do a much better job
of etymology and word construction. Then, too, it doesn't derive meanings that
are plausible from existing words and definitions. Instead it kind of seems to
just dump words together.

As a deep learning project I would say this shows very poor results. It's not
even shallow learning. I would be easy to trick by mentioning things like "a
flying insect of the genus __something i don't know__ native to __some
place___". I would also be extremely easy to trick via medical and other
technical invented terms - as long as the invention isn't something completely
implausible, like a disease caused by a feeling of blurred and deceptive
movement of teeth. Why can't it say it's caused by... something that actually
appears after the phrase "caused by" in real definitions?

interesting project, but very poor showing IMHO.

~~~
themadstork
Hey, I'm the creator, and I agree that it's a bit hit or miss. I think the
definitions from the words it generates on its own every 90 minutes (these are
the ones without anyone's twitter handle at the bottom) are (on average) quite
a bit better than those from the words people are tweeting at it, which it
typically (and somewhat expectedly) has a bit more trouble with.

This bot was really just a creative experiment, and I'm pretty new to machine
learning, so I'd love to hear any specific suggestions you might have to
improve it.

~~~
sdenton4
The interesting pattern I noticed in the output is a lack of context in
word roots: you get a word that's clearly a portmanteau receiving a
completely unrelated definition.

Fortunately, breaking a word into parts is exactly the kind of thing a
convolutional network should be great at...

