
Show HN: Language games – simple games made with word vectors - Der_Einzige
https://github.com/Hellisotherpeople/Language-games
======
r34
I wonder if there are any good sets of word vectors in languages other than
English (I'm specifically interested in Polish).

Same with any other language models and NLP related stuff. It's hard to find
anything valuable.

~~~
Der_Einzige
Yeah! fastText has some models in Polish, which I can convert into the
Magnitude format for you. File an issue for it and I'll get it done.

~~~
dcsan
Yes, I would look to fastText first; they have a huge library of vectors here:
[https://fasttext.cc/docs/en/crawl-vectors.html](https://fasttext.cc/docs/en/crawl-vectors.html)

Actually, Facebook Research is pretty good at gathering things; their ParlAI
project also gathers together other sources for conversational stuff.

------
ggggtez
It's interesting. There have been lots of ways to explore Word2Vec, but maybe
this helps people get a better intuitive understanding of its strengths and
weaknesses. (Hint: the farther the words are from the target, the more
meaningless the distance comparison becomes.)

------
YeGoblynQueenne
"Competative" word guessing?

(And it's "pedant", not "pendant").

~~~
PaulHoule
Yep, like most projects that use embeddings it doesn't seem that smart when
you look closely. It does "better than chance" but that's not a high standard!

------
johnnyheineken
Nice project!

I did something similar when I was still at school: scripts that emulated
Codenames [1] and provided hints for a given set of words [2].

I believe that is a possible extension to your project: emulating Codenames
games :)

[1]
[https://boardgamegeek.com/boardgame/178900/codenames](https://boardgamegeek.com/boardgame/178900/codenames)

[2] [https://github.com/johnnyheineken/codenames-AI](https://github.com/johnnyheineken/codenames-AI)

~~~
alew1
I’ve seen your Codenames work project — it’s neat!

I’ve also played around with word vector games — Robot Mind Meld [1] has you
and a robot working together to converge on the same word.

[1] [http://robotmindmeld.com](http://robotmindmeld.com)

------
SamBam
I'm failing to understand game 4:

    
    
        Player 0 it is your turn!
        The word you are trying to match in meaning is:
        antepenultimate
        Your char list is:
        ['i', 'g', 'j', 'u', 'b']
        Please input a valid word made from some or all of the provided characters
    

What's the right answer?

~~~
ggggtez
Importantly: there was no "right answer" in mind when it created the question.
The letters were random, as was the word.

It calculates the distance in vector space from your word to the given word.

I think it helps if you understand what Word2Vec is, as then it becomes clear
what is going on.
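
For anyone following along, here is a minimal sketch of that scoring step. It
is not the project's actual code: the vectors are made-up 3-dimensional toys,
`can_spell` and `score_guess` are hypothetical helper names, and cosine
similarity is assumed as the closeness measure (the game may well use a
different distance).

```python
import math
from collections import Counter

# Made-up toy vectors for illustration only; a real game would load
# pretrained embeddings instead.
TOY_VECTORS = {
    "big": [0.1, 0.8, 0.2],
    "jug": [0.0, 0.2, 0.9],
    "bug": [0.3, 0.3, 0.6],
    "third": [0.9, 0.1, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def can_spell(word, letters):
    """True if the word uses each provided letter no more often than given."""
    available = Counter(letters)
    needed = Counter(word)
    return all(available[ch] >= count for ch, count in needed.items())


def score_guess(guess, target_vec, letters, vectors=TOY_VECTORS):
    """Return the similarity to the target, or None for an invalid guess."""
    if not can_spell(guess, letters) or guess not in vectors:
        return None
    return cosine_similarity(vectors[guess], target_vec)


letters = ['i', 'g', 'j', 'u', 'b']
target_vec = [0.2, 0.7, 0.1]  # made-up stand-in for "antepenultimate"
for guess in ("big", "jug", "bug", "third"):
    print(guess, score_guess(guess, target_vec, letters))
```

Any word you can spell gets a score, so there is no single correct answer,
only guesses that land closer to or farther from the target.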

~~~
dcsan
So you didn't have a word near penultimate and then pick some letters from it?
It's really just open-ended what the users should choose? That does make it a
lot harder...

------
dcsan
Btw, what did you like most about using the Magnitude format? Lazy loading
seems good, depending on whether it takes time to load the SQLite DB at each
startup. One of the major pains of working on anything with word embeddings
is that server reloads / hot restarts take forever.
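
To illustrate why lazy loading helps with restarts, here is a rough sketch
using only the Python standard library. It is not Magnitude's actual schema or
API: the `vectors` table, `build_store`, and `LazyVectors` are hypothetical
names, and the point is only that opening the database is cheap while each
vector is read from disk the first time it is asked for.

```python
import sqlite3
import struct
from functools import lru_cache


def build_store(conn, vectors):
    """Write word -> vector pairs into a SQLite table as packed float32 blobs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vectors (word TEXT PRIMARY KEY, vec BLOB)"
    )
    rows = [(word, struct.pack(f"<{len(vec)}f", *vec))
            for word, vec in vectors.items()]
    conn.executemany("INSERT OR REPLACE INTO vectors VALUES (?, ?)", rows)
    conn.commit()


class LazyVectors:
    """Fetch vectors on demand instead of loading the whole file at startup."""

    def __init__(self, conn, cache_size=1024):
        self.conn = conn
        # Cache recent lookups so repeated queries skip the database.
        self.lookup = lru_cache(maxsize=cache_size)(self._fetch)

    def _fetch(self, word):
        row = self.conn.execute(
            "SELECT vec FROM vectors WHERE word = ?", (word,)
        ).fetchone()
        if row is None:
            return None
        blob = row[0]
        # Each float32 takes 4 bytes.
        return struct.unpack(f"<{len(blob) // 4}f", blob)


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
build_store(conn, {"cat": [0.5, 0.25, 1.0]})
store = LazyVectors(conn)
print(store.lookup("cat"))
```

With this shape, a hot restart only reopens the connection; nothing is read
until the first `lookup` call.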

The fastText vectors (linked elsewhere in this thread) also use subword
embeddings, so I think they potentially give better results. I used fastText
for some Chinese stuff and I think it worked better for that, since individual
_characters_ carry so much more meaning in Chinese than in Latin scripts.
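
For context, subword embeddings in fastText are built from character n-grams
of each word, with `<` and `>` added as boundary markers. The sketch below
only shows the n-gram extraction; `char_ngrams` is a hypothetical helper, not
a fastText function, and the 3 to 6 range is assumed from fastText's usual
defaults.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# Trigrams only, to keep the output short.
print(char_ngrams("where", min_n=3, max_n=3))

# A rare word still shares many n-grams with a more common relative,
# which is how a subword model can build a vector for it.
shared = set(char_ngrams("penultimate")) & set(char_ngrams("antepenultimate"))
print(len(shared))
```

A word's vector is then composed from the vectors of its n-grams, so words
missing from the vocabulary can still get a usable embedding.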

It seems the Magnitude Python API also has some other features like POS
tagging; I wonder how that compares to, say, spaCy.

------
dcsan
These are neat; it would be fun to put a ReactJS frontend on some of them!

