

Hacking Scrabble (part 1) - zephod
http://blog.zephod.com/post/14167418899/hacking-scrabble-part-1

======
msluyter
Note: apparently he's using the SOWPODS (combined British & North American)
two-letter words. The tournament word list used in North America contains only
101 two-letter words:

[http://en.wikipedia.org/wiki/Official_Tournament_and_Club_Wo...](http://en.wikipedia.org/wiki/Official_Tournament_and_Club_Word_List#Current_edition)

Here's the TWL 2 letter word list:

<http://wordsolver.net/?tpl=twl2>

~~~
pge
I'd vote that memorizing a hundred 2 letter words (many of which you likely
already know) is easier than memorizing 26 mnemonics. Or, you can just know a
bunch and guess every once in a while, gambling that your opponents also don't
know all 101 for certain and don't want to risk challenging the one you put
down ;)

~~~
zem
actually, once you move to even the low-division tournament level, pretty much
everyone knows all the 2 letter words.

------
famousactress
I wonder if you could devise a series of phrases where every consecutive pair
of letters in the phrase is a valid two-letter word. Then it might be possible
to mentally retrieve the entire list by remembering a handful of phrases,
instead of one-per-letter.
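
A minimal sketch of that check in Python. `SAMPLE_TWOS` is a small illustrative sample, not the full tournament list, and the function names are made up for this example. It assumes non-letters are ignored, so pairs also run across word boundaries inside a phrase.

```python
# Small illustrative sample of valid two-letter words (NOT the full list).
SAMPLE_TWOS = {"HE", "EM", "MA", "AD", "WE", "ER", "ES", "SO", "OP",
               "AA", "QI", "ZA"}


def pairs(phrase):
    """Return every consecutive letter pair in phrase, skipping non-letters."""
    letters = [c for c in phrase.upper() if c.isalpha()]
    return [a + b for a, b in zip(letters, letters[1:])]


def is_chain(phrase, valid):
    """True if the phrase has at least one pair and every pair is valid."""
    found = pairs(phrase)
    return bool(found) and all(p in valid for p in found)


def uncovered(phrases, valid):
    """Valid words that appear in none of the phrases, sorted."""
    covered = {p for phrase in phrases for p in pairs(phrase)}
    return sorted(valid - covered)


print(is_chain("HEMAD", SAMPLE_TWOS))                    # HE, EM, MA, AD
print(uncovered(["HEMAD", "WER", "ESOP"], SAMPLE_TWOS))  # words still missing
```

Words such as AA or QI never show up as a pair inside an ordinary phrase, which is why some leftovers are probably unavoidable with this approach.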

[Edit: Turns out you can get kinda close: <https://gist.github.com/1474006>

This is really quickly scratched together (and there may be bugs), but there
don't appear to be that many valid words left completely out when using this
strategy against an american dictionary of 3k words. I'll bang on it more
later, or someone else feel free to do better]

[Edit: After seeing @msluyter's comment, I adjusted to look for only the
American 2-letter words. Ended up with much better results. This list of words
(and possibly a subset of them that I'm not checking for yet):

    
    
      'MELINE', 'ESOP', 'HEMAD', 'SHERE', 'WER', 'UNENAMORED', 'MYOHEMATIN', 'REHONOR', 'HEMATOSIN', 'ANEMATOSIS', 'TORET', 'HEMATID'
    

Covers all but these valid words:

    
    
      'AA', 'OE', 'ZA', 'QI', 'UH', 'AW', 'AY', 'BY', 'XU'
    ]

------
uptown
On the topic of Scrabble, can anybody provide some insight into how they'd go
about architecting the database for a game like WordSquared?
<http://wordsquared.com/>

It's similar to Scrabble ... but it has no playing-area boundaries. I'm
curious what the best way to represent this type of game-board data in a
database would be and was hoping somebody smarter than me could help make me
smarter.

~~~
jerf
You're pretty free in how you want to represent the database, because you're
unlikely to ever want to query (quickly) based on the contents of the tile, so
representation doesn't much matter. My first cut would be a 64x64 tile,
indexed by x & y from an arbitrary 0, each of which contains a 64x64 = 4096
(4K) string representation of the contents of the tile. There will be lots of
spaces, so my first cut at efficiency would be to simply do inline RLE and
embed ASCII numbers in the stream that represent the number of spaces, ie,
"4096" = empty tile, "2047a2048" represents a tile with one a in the middle,
etc. (This preserves some human readability, where gzip does not, but you
could use that too.) But frankly, at 4K+noise per tile, and no compelling need
to ever refer to the entire table at once, even the stupid string
representation will get you a _long_ way before it bothers any serious DB
server; that's enough to store about 250-ish million tiles in a single
terabyte (a bit of fudge in there for overhead). (And the live set would be
small enough to fit in very little ram so I'm not too worried about that.)

By the time that representation is your Biggest Problem, you'll be able to
afford someone to do something more clever.
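
A rough sketch of the tile scheme described above, in Python. The names (`locate`, `rle_encode`, `rle_decode`) are hypothetical, and it assumes a tile is stored as a 4096-character string with a space for each empty cell and no digits among the tile contents.

```python
import re

TILE_SIDE = 64
TILE_CELLS = TILE_SIDE * TILE_SIDE  # 4096 cells per tile
DIGITS = "0123456789"


def locate(x, y):
    """Map a board coordinate to ((tile_x, tile_y), offset inside the tile).

    Floor division keeps negative coordinates working, so the board can
    grow in any direction from the arbitrary origin.
    """
    tile_key = (x // TILE_SIDE, y // TILE_SIDE)
    offset = (y % TILE_SIDE) * TILE_SIDE + (x % TILE_SIDE)
    return tile_key, offset


def rle_encode(tile):
    """Replace each run of spaces with its length written as ASCII digits."""
    if len(tile) != TILE_CELLS or any(c in DIGITS for c in tile):
        raise ValueError("tile must be 4096 non-digit characters")
    out = []
    for match in re.finditer(r" +|[^ ]", tile):
        chunk = match.group()
        out.append(str(len(chunk)) if chunk[0] == " " else chunk)
    return "".join(out)


def rle_decode(encoded):
    """Inverse of rle_encode: expand digit runs back into spaces."""
    out = []
    for match in re.finditer(r"[0-9]+|[^0-9]", encoded):
        token = match.group()
        out.append(" " * int(token) if token[0] in DIGITS else token)
    tile = "".join(out)
    if len(tile) != TILE_CELLS:
        raise ValueError("encoded tile does not expand to 4096 cells")
    return tile


print(rle_encode(" " * TILE_CELLS))                # empty tile
print(rle_encode(" " * 2047 + "a" + " " * 2048))   # one 'a' in the middle
```

For the storage estimate: 10**12 bytes divided by 4096 bytes per uncompressed tile is about 244 million tiles, which matches the "250-ish million" figure before overhead.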

------
powrtoch
I upvoted this in hopes of generating some discussion about whether there
might be a better way to do this. Here's one idea:

With very few exceptions, all of these words will contain at least one vowel.
This suggests that we can (for the most part) compress the data from: (1) a
list of 24 lists of words, to (2) a list of 6 sets of 2 lists of words.

So now "a" looks like this: bdefhjklmnptyz -a- abdeghilmnrstwxy

Now you can apply the same mnemonic trick to each of these word lists as the
author does in the article, but you wind up only having to memorize 12 lists
instead of 24, plus a slight overhead for non-vowel words (there are 4).
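
One way to build that vowel-centred structure, sketched in Python. The word sample is a small illustrative subset rather than a full list, the names are invented for this example, and it assumes "y" counts as the sixth vowel.

```python
VOWELS = "aeiouy"  # assumption: 'y' is treated as the sixth vowel

# Small illustrative sample, NOT a complete word list.
SAMPLE_WORDS = ["aa", "ab", "ad", "ba", "fa", "ha", "be", "ed",
                "hm", "sh", "my", "oy"]


def vowel_index(words):
    """Group two-letter words around each vowel.

    Returns (index, leftovers): index[v]["before"] holds letters that can
    precede v, index[v]["after"] holds letters that can follow v, and
    leftovers holds the words with no vowel at all.
    """
    index = {v: {"before": set(), "after": set()} for v in VOWELS}
    leftovers = set()
    for word in words:
        first, second = word
        if first not in VOWELS and second not in VOWELS:
            leftovers.add(word)
        if second in VOWELS:
            index[second]["before"].add(first)
        if first in VOWELS:
            index[first]["after"].add(second)
    return index, leftovers


def render(vowel, index):
    """Format one vowel as 'prefixes -v- suffixes'."""
    before = "".join(sorted(index[vowel]["before"]))
    after = "".join(sorted(index[vowel]["after"]))
    return f"{before} -{vowel}- {after}"


index, leftovers = vowel_index(SAMPLE_WORDS)
print(render("a", index))
print(sorted(leftovers))
```

Words made of two vowels land in two places (once per vowel), which is the redundancy mentioned below.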

This can perhaps be improved further by removing redundancy somehow (though
maybe not, as the redundant structure might be easiest to check mentally).

~~~
zephod
Author here. To be honest I posted it in the first place to hopefully generate
some discussion about whether there might be a better way to do this(!). I
think memorising 26 mnemonics is possible (and i disagree with the below
commenter who says 124 individual words might be easier...) but perhaps the
information can be further compressed. Can you elaborate on how your method
works a little? How are the two lists derived?

~~~
powrtoch
Oh sorry, classic case of not realizing not everyone hears your thoughts :-)

For "a", there's a list where "a" is the first letter, and a list where "a" is
the second. What I wrote out above is an attempt to show the structure. The
letters on the left are possible prefixes for "a", the letters on the right
are possible suffixes. Then I stuck "-a-" itself in the middle because that
seems like a good visualization.

------
imurray
There's a very good review of what it takes to build a state-of-the-art AI
Scrabble player in the following paper (appears to be open-access, apologies
if not):

World-championship-caliber Scrabble, Brian Sheppard, _Artificial Intelligence_,
134(1–2):241–275, 2002.
[http://www.sciencedirect.com/science/article/pii/S0004370201...](http://www.sciencedirect.com/science/article/pii/S0004370201001667)

There's more to it than you might think. In particular, I was initially
surprised about how much of the game is about rack management. Although I know
this won't be a surprise to people who've played Scrabble seriously.

------
pheelicks
I don't really think it is worth generating these mnemonics. A lot of the
words in that list are really common words so you're making it harder for
yourself by re-remembering them in mnemonic form.

I think you should remove all the words you know first, as this will make it
much easier to remember. After that if you want to use the mnemonic approach,
then if that works for you then great.

Although, I learned most of the weird ones by just playing against a computer
which used them a lot. Eventually you remember them all. Incidentally, this
way you actually learn a couple interesting new words. Three-toed sloth
anyone?

------
jphackworth
It's not that hard to memorize the 2-letter words. Most of them do have a
meaning that makes sense. So it's easier to remember

aa = lava

ab = your abs

ad = tv ads

ae = scottish "one"

ag = agricultural

ah = ah that feels good

ai = a sloth

al = a tree

am = i am happy

an = an apple

ar = arrr i'm a pirate

as = as you were

at = where you at

aw = aww that's cute

ax = the weapon

ay = ay caramba

than it is to memorize "abdeghilmnrstwxy".

~~~
onemoreact
Comparing his approach vs yours I can't help but think a: _birthdays mangle
wax_ is easier to remember than that list.

~~~
fr0sty
But it is easier to check the GP's solution:

    
    
        return definitions.get(candidate) != null;
    

vs

    
    
        foreach c in definitions(candidate.charAt(0)) {
            if (c == candidate.charAt(1)) return true;
        }
        return false;
    

That leaves aside the general usefulness of knowing the definitions of words.

------
gillnana
This is very similar to anamonics[1], which are usually used for remembering
longer words.

Here is a massive list of them:
<http://www.poslarchive.com/math/scrabble/anamonics-twl2.html>

[1] <http://en.wikipedia.org/wiki/Anamonic>

------
bwwhite
I think these may be easier to memorize if the mnemonic started with the
letter in question, where possible.

------
fjania
A while ago I put together <http://twoletterscrabblewords.com/> to visualize
the two letter pairs that were possible. If you click on any of the tiles
you'll see the valid pairs, unless the tile is grey, in which case there are
no pairs for it.

I was hoping that it would be useful as a study tool for people who are more
visually inclined - I wouldn't have considered using mnemonics originally.

------
ja27
What about instead using a little base 2 to memorize? So 'b' can follow 'a' or
'o' and be followed by 'a', 'e', 'i' or 'o'. Take a as 16, e as 8, i as 4, o
as 2 and u as 1. So if you memorized '18b30' you could decode that in your
mind to the right combinations for 'b'. Obviously this doesn't work for the
combinations with two consonants and the words ending in 'y', but it's a
start. I think you'd pretty quickly learn common values like 30 is 'aeio'.
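
A small sketch of that encoding in Python, using the bit values from the comment. The function names and the sample word list are made up for illustration. As noted, "y" and consonant-only words fall outside the scheme and are simply skipped here.

```python
import re

# Bit values taken from the comment: a=16, e=8, i=4, o=2, u=1.
VOWEL_BITS = {"a": 16, "e": 8, "i": 4, "o": 2, "u": 1}


def encode(consonant, words):
    """Build a code like '18b30' from the two-letter words using consonant."""
    before = after = 0
    for first, second in set(words):
        if second == consonant and first in VOWEL_BITS:
            before |= VOWEL_BITS[first]   # vowel + consonant, e.g. 'ab'
        if first == consonant and second in VOWEL_BITS:
            after |= VOWEL_BITS[second]   # consonant + vowel, e.g. 'ba'
    return f"{before}{consonant}{after}"


def decode(code):
    """Expand a code like '18b30' back into the words it stands for."""
    match = re.fullmatch(r"(\d+)([a-z])(\d+)", code)
    if match is None:
        raise ValueError(f"not a valid code: {code!r}")
    before, consonant, after = int(match[1]), match[2], int(match[3])
    ending = [v + consonant for v, bit in VOWEL_BITS.items() if before & bit]
    starting = [consonant + v for v, bit in VOWEL_BITS.items() if after & bit]
    return ending, starting


b_words = ["ab", "ob", "ba", "be", "bi", "bo", "by"]  # 'by' is skipped
print(encode("b", b_words))
print(decode("18b30"))
```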

------
silverlight
Looking forward to the conclusion of this! Interesting and as a side benefit
might help me finally get a leg up on playing my wife in Scrabble :-).

------
stuaxo
Funny, I've been thinking about just this... my gf is far too good. I'll
probably never beat her, but getting within a reasonable percentage seems
achievable. Maybe writing some minigames would help: not just for learning
two-letter words, but for stuff like quickly spotting popular prefixes and
suffixes.

------
jmlacroix
How about approaching the problem from the suffix point of view?

Instead of having 25 prefixes, each with its suffixes, you have 20 suffixes,
each with its prefixes.

The prefix technique has an average of 4.96 suffixes per prefix, and the
suffix one has 6.2 prefixes on average for each suffix.

Maybe this would be easier to deal with?
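
Both averages are consistent with a 124-word list (25 × 4.96 = 124 and 20 × 6.2 = 124). A quick Python sketch for comparing the two groupings on any word list follows. The helper name and the tiny sample are only illustrative, so the sample's numbers will not match the figures above.

```python
from collections import Counter


def group_stats(words, position):
    """Group two-letter words by the letter at position (0 = first, 1 = second).

    Returns (number_of_groups, average_words_per_group).
    """
    groups = Counter(word[position] for word in set(words))
    return len(groups), sum(groups.values()) / len(groups)


# Tiny illustrative sample, NOT the real 124-word list.
sample = ["ab", "ad", "ba", "be", "by", "my", "oy"]
print(group_stats(sample, 0))  # grouped by first letter (prefix view)
print(group_stats(sample, 1))  # grouped by second letter (suffix view)
```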

------
ttttannebaum
Looking at your Python script. Might want to learn about the built-in set()
functionality. For example:

    def uniqueletters(s):
        return set(c for c in s)

    def intersect(word, charlist):
        return len(uniqueletters(word).intersection(charlist))

~~~
ttttannebaum
In fact:

    max(wordlist, key=lambda w: intersect(w, charlist))

should replace a good portion of the create_mnemonic function. But now I'm
just showing off. Just letting you know that Python is much cooler than you
know.
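
For what it's worth, here is a guess at how a greedy mnemonic builder based on `max` with a `key` could look. This is a sketch, not the author's `create_mnemonic`: the function name `greedy_mnemonic` is hypothetical, and it assumes a mnemonic word may only use letters from the target list, so that it never suggests an invalid pair.

```python
def greedy_mnemonic(wordlist, charlist):
    """Pick words greedily until every letter in charlist is covered.

    Returns (chosen_words, letters_still_uncovered).
    """
    allowed = set(charlist)
    # Assumption: drop any word containing a letter outside the target list.
    candidates = [w for w in wordlist if set(w) <= allowed]
    remaining = set(allowed)
    chosen = []
    while remaining and candidates:
        # The word covering the most letters that are still missing.
        best = max(candidates, key=lambda w: len(set(w) & remaining))
        gained = set(best) & remaining
        if not gained:
            break  # no candidate adds anything new
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


words = ["wax", "mangle", "birthdays", "zebra"]
print(greedy_mnemonic(words, "abdeghilmnrstwxy"))
```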

Haskell, however, could probably do this in a few lines.

~~~
zephod
I've got a pull request sitting on GitHub showing me exactly this. Awesome!
I'm about eight months into Python but my knowledge isn't particularly deep.

------
jgw
It's unfortunate that some Scrabble players are more likely to know dozens of
obscure two-letter words than how to spell "mnemonic".

~~~
zephod
A very good point. Poor quality control... I can fix that now :-)

~~~
jgw
Sorry, that probably came out more snotty than it should have.

My beef with Scrabble is that it really isn't a word game, so much as an
exercise in memorizing arbitrary strings of symbols. It's unfortunate, because
it doesn't really encourage people to develop their vocabulary, as in
"learning new words and their meanings", which is generally useful. Rather, it
encourages people to learn what arbitrary combinations are legal in a given
word list.

Ad absurdum - at one point, the world Scrabble champion was a fellow who
didn't actually speak English.

~~~
omaranto

      Ad absurdum - at one point, the world Scrabble champion was a fellow who didn't actually speak English.
    

That's not too surprising unless he played in English. Scrabble has editions
for many languages.

~~~
Someone
The guy being alluded to is Panupol Sujjayakorn
(<http://en.wikipedia.org/wiki/Panupol_Sujjayakorn>). He is Thai. In Thailand,
scrabble is used to learn English.

Unlike with Chinese, it is not impossible to make a Thai scrabble, but I doubt
there is one, given the size of its alphabet
(<http://en.wikipedia.org/wiki/Thai_alphabet>)

~~~
zem
i've been to the king's cup tournament in bangkok a couple of times. there's a
children's tournament run in parallel in the same venue, and it was a truly
amazing sight to see thousands of kids squaring off across scrabble boards. i
truly wish school scrabble would take off in india; as it is, this year we
didn't even manage to find two kids to go for the world youth scrabble
championship :(

------
zem
this was pretty much the way i memorised the 2s, except i did the mnemonics by
hand. (i've written a lot of little apps like this but from the inverse point
of view - i type in a hand-generated mnemonic and it tests it for correctness,
and tells me what letters are missing or incorrectly there)
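
A checker along those lines might look something like this in Python. The name `check_mnemonic` is hypothetical, and the word list is just the "a" words quoted earlier in the thread, not a full list.

```python
def check_mnemonic(letter, mnemonic, valid_words):
    """Compare a hand-made mnemonic against the real words starting with letter.

    Returns (missing, extra): second letters the mnemonic fails to cover,
    and letters it contains that do not form a valid word.
    """
    expected = {w[1] for w in valid_words if w[0] == letter}
    present = {c for c in mnemonic.lower() if c.isalpha()}
    return sorted(expected - present), sorted(present - expected)


# The 'a' words listed elsewhere in this thread.
A_WORDS = ["aa", "ab", "ad", "ae", "ag", "ah", "ai", "al",
           "am", "an", "ar", "as", "at", "aw", "ax", "ay"]

print(check_mnemonic("a", "birthdays mangle wax", A_WORDS))  # nothing wrong
print(check_mnemonic("a", "birthday mangle wax", A_WORDS))   # 's' is missing
```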

