

A Better Strategy for Hangman - icefox
http://www.datagenetics.com/blog/april12012/index.html

======
lorax
An even better strategy would be to weight your guesses towards the most
information gain. If a letter appears often, but is in the same position most
of the time, getting it won't help you as much as getting a letter that
appears in fewer words but in more varied positions. Doing a quick check of
5-letter words in my /usr/share/dict/words file, I find these counts by
position (first through fifth letter):

S: 659, 62, 267, 260, 2282
E: 122, 829, 448, 1254, 679
A: 252, 1201, 678, 410, 307

That is, S, when it appears, is heavily skewed towards the last letter in the
word, so finding it won't help you figure out the word nearly as well as
finding the more evenly distributed E or A.
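
A rough sketch of how such a per-position count could be produced, assuming a plain list of words. The function name `positional_counts` and the tiny sample list are illustrative and not taken from the comment above; running it over /usr/share/dict/words should give a table of the same shape, though the exact numbers depend on the word file.

```python
def positional_counts(words, length=5):
    """For each letter, count how often it occurs at each position."""
    counts = {}
    for word in words:
        word = word.lower()  # assumption: fold case, so proper nouns count too
        if len(word) != length or not word.isalpha():
            continue
        for i, ch in enumerate(word):
            counts.setdefault(ch, [0] * length)[i] += 1
    return counts

# Tiny illustrative sample, not real dictionary data.
sample = ["trees", "bases", "salsa", "eagle", "areas"]
table = positional_counts(sample)
for letter in "sea":
    print(letter.upper(), table[letter])

# With a real word list (the path is system-dependent):
# with open("/usr/share/dict/words") as f:
#     table = positional_counts(line.strip() for line in f)
```

A letter whose counts pile up in one column (like S in the last position) splits the candidate words less evenly than one spread across columns.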

~~~
Natsu
Doesn't this all assume that the opponent is choosing a random word? If
they're trying to be evil, I'm sure they can pick words where each letter you
get gives little information.

For example, which letter do you guess if the puzzle is _og? You still have to
go through bcdfhjln even after you've burned however many guesses getting that
o & g.

~~~
ars
I once wrote a hangman game that did this in reverse. The computer made sure
to make you be as wrong as possible, while still ensuring your guesses were
"correct".

It had one problem in that it always tried to make sure you guessed wrong, but
by carefully choosing your letters you could cause it to exclude tons of words
because they contained the letter you guessed, leaving only a small number of
possible words.

It needed a refinement to occasionally allow you to guess correctly if that
maximized the number of possible words remaining that still fit the prior
guesses.
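
A minimal sketch of that refinement, assuming the game keeps a list of still-possible words instead of one fixed word. On each guess it groups the candidates by where the guessed letter would appear and keeps the largest group, even when that means conceding a correct guess. The names `pattern` and `evil_response` are made up for illustration; this is not the original game's code.

```python
from collections import defaultdict

def pattern(word, letter):
    """Positions the letter would reveal, e.g. 'o' in 'book' -> (1, 2)."""
    return tuple(i for i, ch in enumerate(word) if ch == letter)

def evil_response(candidates, letter):
    """Group candidates by reveal pattern and keep the largest group."""
    families = defaultdict(list)
    for word in candidates:
        families[pattern(word, letter)].append(word)
    # Largest family wins; ties go to the pattern that reveals fewer letters.
    best = max(families, key=lambda p: (len(families[p]), -len(p)))
    return best, families[best]

words = ["bog", "dog", "fog", "hog", "log", "bit"]
revealed, remaining = evil_response(words, "o")
print(revealed, remaining)
```

Here an always-say-no rule would leave only "bit" after the guess "o", while keeping the largest family concedes the O and leaves five words in play.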

------
r00k
I loved this blog post (and the other one I read, about Battleship), so I
wondered who was behind these great posts.

Their homepage says this "DataGenetics is a technology consultancy
specializing in unlocking the value stored in large databases. Using a variety
of techniques we can mine the trends in your data to help you maximize your
marketing and advertising campaigns."

Damn.

Yet another group of smart folks focused on more-effective advertising.

~~~
apawloski
That's where the money seems to be. It's a shame, but I can't blame groups
like these.

~~~
rbarooah
Why not? Do you think people have no responsibility for their choices other
than to maximize their financial gain?

~~~
apawloski
No. I don't believe that a person's intelligence level makes it appropriate
for any other party to decide what she should and should not do. I also
question what "responsibility" an intelligent person has in their career
choices. Responsibility to whom? Society? Should they be out curing cancer
instead? I cannot accept the notion that people in the aforementioned position
have a responsibility to anybody but themselves and their families.

~~~
rbarooah
I never said anything about intelligence level or other parties deciding what
people should do.

Are you able to consider the idea that someone's responsibility to themselves
and their families might lead them to make decisions other than maximizing
financial gain?

Also - I'm curious as to why you think family members are a special case?

~~~
apawloski
I think you may be reading my previous comment without considering its context
in the thread. I was initially writing in response to the observation that
this was "Yet another group of smart folks focused on more-effective
advertising." It wasn't clear to me that you and I pivoted from this.

Of course I'm able to consider the idea, rbarooah. What I reject is the notion
that we can project our own values on a person and then blame him for not
fulfilling them (thus my original comment).

~~~
rbarooah
I do recall the context.

What were you trying to convey with the phrase "It's a shame?" Was that not
your judgement of their choice?

Also, how do you propose to prevent other people from blaming one another for
things?

You didn't answer my question about why family members are exempt from your
general view - which I am genuinely curious about.

------
jtheory
I kept waiting for the other foot to drop, and it never did.

This is great if you're playing hangman against a computer. Probably, you're
not.

You're playing against a person, who probably has some sense of your
strategies.

Even when I was a kid, we didn't play hangman by choosing random words. What's
the fun in that? You notice how the other person plays, and in the next rounds
you pick words that break their strategies.

You notice they are doing the "common letters" or even "common vowels"
strategy, and you give them "lynx".

Or you trick them with a word that has a few easy-to-get letters, but is going
to be hard to guess the last few because there are so many possible matching
words -- I had a great one that I forget now... maybe "budder"?

Did everyone really play with randomly-chosen words? What a waste.

~~~
thyrsus
My favorite strategy: four letter word. Say no to everything until they guess
U: _ U _ _ then decide on the least likely word which includes none of their
guesses, often "junk".

~~~
doctoboggan
That isn't a strategy, that is cheating. ;)

------
hythloday
It's an interesting article, but I would have thought that the best* letter to
guess is the one that appears in as close to 50% of the possible words as
possible, rather than the most likely to appear, given that the hangman is
drawn per error, rather than per guess. Is my reasoning wrong?

* assuming as he does at the bottom the existence of a computer.

~~~
sdevlin
Your intuition is telling you that you should shoot for 50% so that you can
divide the search space in half regardless of the outcome. A right or wrong
answer will eliminate half of the possible remaining words.

But right and wrong guesses are not equally valuable in hangman. A wrong guess
will eliminate all words of a given length containing the given letter. But a
correct guess gives you more: you get the location (or locations) of the
letter in the word as a bonus. This will reduce your remaining search space
drastically.

Thus, you should optimize for the success of your next guess.

~~~
karamazov
Additionally, you're not penalized for correct guesses.

------
vasi
This is great, but it's just a first-order strategy! Once your opponent knows
you're following these (quite rational) rules, she could pick words where the
rules fail particularly badly.

I look forward to seeing your analysis of how to deal with that :)

~~~
jerf
The strategy given isn't to solve the puzzle, but to get at least one letter
on the board. So if you look at the table at the end, the worst case is a
three-letter word that only contains a K out of the letters given, which is
the worst case scenario on that entire board. Assuming your opponent sticks to
the dictionary, there is no way for them to ruin this strategy; they can only
push you into the worst cases given, which, once you figure out what's going
on, will play to the guesser's advantage, not the word-chooser's.

------
jh3
> There are no two letter words containing the letter C, Q, V or Z.

Does 'qi' count? <http://www.merriam-webster.com/dictionary/qi>

I know it counts when playing Words with Friends...

I also think 'za' is valid when playing Scrabble type games, but 'za'
technically isn't a word.

~~~
davnola
IIRC "zo" is also valid. It's a Tibetan yak hybrid.

Hmmm... Yak butter tea.

~~~
waqf
In case you cared, "zo" is valid in the UK/international Scrabble wordlist,
but not in the US (yet, until we persuade them to use the same wordlist as
everyone else).

------
kd5bjo
This came up as a pre-interview employment challenge that I did a few months
ago. I've uploaded my code here: <https://gist.github.com/2242816>.

I could make good arguments for both "most likely to be present" and "most
information gain", so I implemented a mixed strategy. I calculated the
information gain (in bits) and the probability of a correct guess, then used a
linear combination of them to determine which letter to guess. Additionally,
the weights changed over the course of the game so that correct guesses were
valued more as incorrect guesses became more scarce.

To determine appropriate weights, I ran a crude Monte Carlo simulation. The
final weights I ended up with were 0.60:0.03 with no incorrect guesses and
0.54:0.69 once there aren't any strikes left. (bits information gain :
p(correct guess))
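
A hedged sketch of what such a mixed score might look like; it is not the code from the gist. It assumes all remaining words are equally likely, measures information gain as the entropy of the reveal patterns a letter can produce, and linearly interpolates between the two weight pairs quoted above. The interpolation, the `danger` parameter (0.0 with no misses yet, 1.0 when the next miss loses), and all function names are assumptions.

```python
import math
from collections import Counter

def letter_stats(candidates, letter):
    """Return (expected information gain in bits, probability of a hit)."""
    n = len(candidates)
    patterns = Counter(
        tuple(i for i, ch in enumerate(word) if ch == letter)
        for word in candidates
    )
    bits = sum((c / n) * math.log2(n / c) for c in patterns.values())
    p_hit = 1 - patterns[()] / n  # the empty pattern means the letter is absent
    return bits, p_hit

def score(candidates, letter, danger):
    """Linear mix of bits and hit probability; weights drift with danger."""
    w_bits = 0.60 + danger * (0.54 - 0.60)
    w_hit = 0.03 + danger * (0.69 - 0.03)
    bits, p_hit = letter_stats(candidates, letter)
    return w_bits * bits + w_hit * p_hit

def best_letter(candidates, guessed, danger):
    letters = sorted(set("".join(candidates)) - set(guessed))
    return max(letters, key=lambda l: score(candidates, l, danger), default=None)

words = ["cat", "cot", "cut", "dog"]
print(best_letter(words, "", danger=0.0))  # favors the even split
print(best_letter(words, "", danger=1.0))  # favors the likely hit
```

With this toy list the early-game weights prefer a letter that splits the words in half, while the late-game weights prefer a letter that is present in three of the four words.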

------
squeakynick
Good comments. Being paranoid, I double checked the database I used! Neither
'qi' nor 'za' are in the database I used.

I'm not arguing that they are, or are not, words. I'm just saying that they
were not in the dictionary file I used :)

------
onemoreact
Cool, but assumes all words have an even probability.

Vs a weak player the ideal word list should only include 'common' words.

Vs an ideal player you need to assume they will pick from the list of words
that you are least likely to guess using optimum play.

Generically, optimum play ends up with a somewhat random list, aka 90% of the
time pick E, 10% of the time pick I etc.

PS: Actually generating this list is a 'hard' problem and very dictionary
dependent, but you can probably get reasonably close using some sort of
genetic algorithm and enough simulation time.

~~~
eru
> PS: Actually generating this list is a 'hard' problem and very dictionary
> dependent, but you can probably get reasonably close using some sort of
> genetic algorithm and enough simulation time.

Sounds like a challenge. I don't think, given a dictionary, finding the ideal
strategy for both parties will be that challenging. It's a fairly
straightforward two person zero-sum game. You can either model it assuming
hidden information, or concurrent play. (Which is basically the same here.)

As an interesting variation, you might allow the chooser to cheat: I.e. don't
make them write down the word in the first place, just require their play to
be consistent.

------
K2h
Two possible improvements that I think may be original (though I probably
missed them in the article and comments):

1) A hit or miss of a letter tells you a lot about what the new subsequent
optimal guess is. You could make a flow chart (a really big one) that shows
you the optimal letter to call out next based on what has hit and missed.

2) The position where a letter has hit tells you a lot about the target word.
If you have a computer, a regex makes it trivial to identify the next optimal
letter to guess. An 'optimal' table could be generated based off this pattern,
and it would be a huge table.

I kept feeling like every step of the way it was an infomercial, 'but wait...
there's more!' and I like that, it got me thinking.

I'd love to have my hangman bot go head to head with yours on random words.
that would be a fun little project.
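
A small sketch of the regex idea from point 2, assuming lowercase words and a board written with `_` for blanks. It filters the word list down to what still fits, then picks the untried letter found in the most remaining words (the article's heuristic, which is not necessarily optimal). `candidates_for` and `next_letter` are illustrative names.

```python
import re
from collections import Counter

def candidates_for(board, misses, words):
    """Words that fit a board like '_og', given the letters already missed."""
    revealed = set(board) - {"_"}
    # A blank cannot hold a missed letter, nor a revealed one (every
    # occurrence of a correctly guessed letter is already on the board).
    excluded = "".join(sorted(revealed | set(misses)))
    blank = f"[^{excluded}]" if excluded else "[a-z]"
    regex = re.compile("".join(blank if ch == "_" else ch for ch in board))
    return [w for w in words if regex.fullmatch(w)]

def next_letter(board, misses, words):
    """Untried letter contained in the most remaining candidates."""
    pool = candidates_for(board, misses, words)
    tried = set(board) | set(misses)
    counts = Counter(
        ch for w in pool for ch in sorted(set(w)) if ch not in tried
    )
    return counts.most_common(1)[0][0] if counts else None

words = ["bog", "cog", "dog", "fog", "dig", "dot", "bag"]
print(candidates_for("_og", "b", words))
print(next_letter("_o_", "", words))
```

Precomputing `next_letter` for every reachable board would give the huge table mentioned above; computing it on demand avoids storing it.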

------
eshvk
Very cool. I was implementing Hangman a few weeks back and I figured out the
first strategy: use the frequency distribution for the specific word length to
initiate your guessing. However, I thought about but discarded the second
strategy of conditioning on the previous wrong/right guesses, selecting the
words that remain, and then doing a histogram of their letters. Thinking back,
I find it interesting that looking at it from a complexity point of view made
me dread that approach: recomputing the frequency distributions is
significantly more expensive than blindly using the naive independence
assumption, and I was not sure how much of a win I would get by adding that
bit of optimization.

------
hammock
Instead of a table at the end, what if you were to boil it down to a single
list of letters that I can easily memorize? You could do this by weighting the
final table with the number of 1-, 2-, 3-letter (and so on) words in the
dictionary.

Put another way, what are the letters I should guess when I don't know the
length of the word?

While it wouldn't be perfect strategy, it would be easier to execute.
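
One way this could be computed, as a sketch: for each word length, take the fraction of words of that length containing each letter, weight it by that length's share of the dictionary, and sum. The weighting scheme and the name `length_weighted_order` are assumptions about what is meant here, and the sample words are made up.

```python
from collections import Counter, defaultdict

def length_weighted_order(words):
    """Rank letters by per-length frequency weighted by length popularity."""
    by_length = defaultdict(list)
    for w in words:
        by_length[len(w)].append(w)
    total = len(words)
    score = Counter()
    for group in by_length.values():
        weight = len(group) / total  # share of the dictionary at this length
        containing = Counter(ch for w in group for ch in set(w))
        for letter, n in containing.items():
            score[letter] += weight * n / len(group)
    # Highest score first; alphabetical order breaks ties.
    return sorted(score, key=lambda l: (-score[l], l))

words = ["at", "an", "cat", "can", "tan", "nun"]
print("".join(length_weighted_order(words)))
```

With these weights the score reduces to the fraction of all words that contain the letter, so the memorized list is simply "most common letter across the whole dictionary first".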

------
brownbat
There are lots of good further caveats in this thread.

We should have a hangman AI tournament.

------
grandalf
Does this mean that RSTLNE on wheel of fortune is non-optimal too?

~~~
dwd
Given your perception can generally fill in the vowels - picking any early on
would seem non-optimal.

~~~
grandalf
I think it's more a goal of eliminating uncertainty about which letter
occupies a slot.

For example, if I have guessed T and E and see _ _ E as the word I know it's
not the word "The".

------
rgejman
This seems like a problem ripe for solving with a markov model.

------
nraynaud
Playing hangman against Sheldon Cooper was a mistake in the first place :)

