I'm a non-coding lurker here, but the only thing that ever motivated me to write my own program was to solve the newspaper anagram puzzle, "Jumble", faster than my wife. I wrote it in Basic, then re-wrote it in assembly language, I forget when. Then in Fortran while homebound during the first summer of the pandemic - to atone for my past sin of neglecting to learn Fortran when I had a chance in the late 1960s.
About thirty years ago I had an account on a computer system that had a Boggle game, and I got tired of losing to it. So I wrote a program (in C!) which, when run, would search the grid and print out a list of all the words it found. Since I was using a bigger dictionary than the Boggle game itself, I found a lot more words. Ha ha, take that!
I proudly boasted to a friend about my winning Boggle solver and they said it was the pettiest thing they had ever heard of.
I did the same thing!
I stumbled upon an old ‘Al Zimmerman’ programming contest that involved creating wordsearch grids with the most words possible. I had a technique in mind, and wrote it in BASIC. Then I realized that was too slow. So I tried it in Java (which I’d heard was easy… it wasn’t… at least not to me). And then finally wrote it in C++, a language I found perfect in all respects. I actually ranked near the top in that contest, and I’ve been programming ever since. Ironically, I write programs to make puzzle books and I even have a Jumble knock-off on Amazon (sans cartoons… can’t draw worth a damn.)
With respect, the moment you mentioned "re-wrote it in assembly language"...you stopped being a non-coder! :-) By the way, very cool use of technology for somewhat everyday things (solving puzzles).
I did something similar for a newspaper puzzle unscrambler, pushing to rewrite it in lower-level languages (PowerShell, then C#, then Rust) and changing the algorithm each time, until it was near enough instant. First I sorted each word's letters; then I precomputed a lookup hashtable keyed on the sorted letters; then, to drop the overhead of sorting, I multiplied together a prime number for each letter of the alphabet and looked up a hash of those products; finally I sorted the integer results into an array to binary-search through, jumping to the matching offset in an answers array, with both arrays inlined in the code.
... and I don't use it, because unjumbling the word myself is satisfying, but typing the letters into a computer and getting the answer isn't.
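The final variant described above (sorted products, binary search, parallel answers array) can be sketched in Python, the language used elsewhere in this thread. The tiny wordlist here is a stand-in; in the original the arrays were precomputed and inlined in the source:

```python
import bisect

# First 26 primes, one per letter a-z
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def product(word):
    """Multiply the prime assigned to each letter; anagrams get equal products."""
    p = 1
    for c in word:
        p *= PRIMES[ord(c) - ord('a')]
    return p

# "Inlined" tables: products sorted ascending, answers at matching offsets.
entries = sorted((product(w), w) for w in ["bat", "tab", "cat", "act"])
products = [p for p, _ in entries]
answers = [w for _, w in entries]

def unjumble(scrambled):
    """Binary-search the products array, then walk the run of equal keys."""
    key = product(scrambled)
    i = bisect.bisect_left(products, key)
    out = []
    while i < len(products) and products[i] == key:
        out.append(answers[i])
        i += 1
    return out

print(unjumble("tba"))  # -> ['bat', 'tab']
```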
Not sure if you're asking about the technique, or the math, or the comparison with sorting, so here's a long answer of all of them - start like the secret codes from childhood by giving numbers to letters so A=1, B=2, C=3, D=4, etc.
Then go through a word and find the values for each letter and multiply them together, e.g. "tab" is 20 x 1 x 2 = 40 and hopefully an anagram that just rearranges the letters gets the same answer because multiplication doesn't change if you shuffle the numbers around, e.g. "bat" is 2 x 1 x 20 = 40 which is the same, "bat" and "tab" are anagrams... but with the integers it doesn't always work and different words can clash e.g. "fab" 6 x 1 x 2 = 12 and "cad" 3 x 1 x 4 = 12 have the same answer but are not anagrams.
Prime numbers help because the Fundamental Theorem of Arithmetic[1][2] says that there can't be any clashes when you multiply Primes, every number breaks down into a unique product of Primes (I can't prove that myself, but it is apparently true). So give the letters Prime numbers A=2, B=3, C=5, D=7, E=11, F=13, etc. and now "fab" 13 x 2 x 3 = 78 and "cad" 5 x 2 x 7 = 70 no longer clash. The only way to get the same answer is to have the same primes (in any order), so anagrams will have the same answer and non-anagrams will not.
Why it drops the overhead of sorting: sorting any collection requires looking at each item, comparing at least some of them, and swapping the positions of at least some of them, generally O(N log N) for N items. Looking up each letter in an array of prime values and multiplying once per letter needs no comparisons and no position swaps, so it is O(N) time. That gives this approach less work to do for each word, so it can finish faster.
It looks like this (Python, assuming lowercase ASCII letters, where 'a' is character code 97):

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]  # one prime per letter a-z
ascii_a = 97  # ord('a')
product = 1
for c in word:
    product *= primes[ord(c) - ascii_a]
Do that for the incoming word, and for every word in the wordlist, and see which have matching products, those are the anagrams. Or pre-compute for all the words in the wordlist and only do it for the incoming word and then lookup the matching ones.
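The precompute-and-lookup variant described above might look like this (the small wordlist is just a placeholder):

```python
from collections import defaultdict

# First 26 primes, one per letter a-z
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def product(word):
    """Multiply the prime assigned to each letter of the word."""
    p = 1
    for c in word:
        p *= PRIMES[ord(c) - ord('a')]
    return p

# Precompute once for the whole wordlist...
wordlist = ["bat", "tab", "cat", "act", "dog"]  # stand-in wordlist
by_product = defaultdict(list)
for w in wordlist:
    by_product[product(w)].append(w)

# ...then each incoming word is a single product plus one lookup.
print(by_product[product("tba")])  # -> ['bat', 'tab']
```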
Prime combinatorics - I think every programmer independently discovers the technique. You can also determine if a shorter word can be made from a longer word, e.g. "cat" from "catch", by looking for modulo results.
And the technique of Prime Combinatorics for an "alphabet" can be used to solve Poker hands, Blackjack hands, Match 3 puzzle games, slot machine reel positions, and a slew of other similar problems where you would ordinarily have to build a very complex logic table/switch-case/if-then-else decision tree.
You could just do counting sort if the big-O is important, but I'm a bit suspicious about that big-O anyway. A model where multiplying arbitrarily big numbers is constant time is a bit unrealistic, kind of feels like it's getting off the hook for a log factor for free.
It's also all such small values that I'm not sure the big-O matters either, I'm not confident which would win without just trying it. I'd _guess_ that usual sort probably just wins though, or counting sort if you really went out of the way to optimize it.
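For comparison, the counting-sort normalization suggested above also avoids comparison sorting: count each letter in one pass, then rebuild the key in alphabetical order. A sketch (assuming lowercase ASCII, as elsewhere in the thread):

```python
def counting_sort_key(word):
    """O(N + 26) normal form: letter counts expanded back out in a-z order.
    Anagrams produce identical keys, with no comparisons or swaps."""
    counts = [0] * 26
    for c in word:
        counts[ord(c) - ord('a')] += 1
    return ''.join(chr(ord('a') + i) * n for i, n in enumerate(counts))

print(counting_sort_key("tab"))  # 'abt'
print(counting_sort_key("bat") == counting_sort_key("tab"))  # True
```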
Too late to edit my earlier reply, but I remembered what happened - for my newspaper puzzle, only 9 letter words are possible answers, and all 9 letter word products fit in an int64. Then in C# it is faster, casual runs seem to knock 20-25% off the runtime (~0.035 seconds vs ~0.045 seconds), but in PowerShell it is slower.
(I'm curious if it would be possible to shuffle the primes around to bring all words in my full wordlist down to UInt64; the highest is 2810298024552111657086849344270208275 - microspectrophotometries - which is some quadrillion times too big. Rearranging the primes by letter frequency in the wordlist brings it down to 24474928756352445162715195352080 - electroencephalographically - which is still a trillion times too big, so I doubt it).
very interesting, both this comment and the sibling one. I didn't really consider how much could fit in an uint64. If all the words can be made to fit then that should indeed be quite fast (I was assuming bigints).
Huh, good call; it was 2 years ago that I did it, and I've just retested it; the values overflow an Int64 or a UInt64 for words like "abortiveness" (== 38199980262261136506). Trying it in PowerShell, that overflowed to a double and then errored when assigning it to an Int64.
Looking what was happening, I find C# doesn't throw overflow exceptions by default and instead wraps around. That takes 0.05 seconds for all 172k words compared to 0.1 seconds for sorting all words so it ran a lot faster, yes, but it was wrong and may have clashed words. Same with Rust and overflows, it seems. Turning it to System.Numerics.BigInteger in C# makes it take about the same time 0.1s for both. In Python which handles any size integers it takes a lot more time, 0.14 seconds for sorting, 0.4 seconds for products.
That means my embedded arrays of products are all overflowed as well. What a good example for arguing that programs should be correct first, and fast second. (I remember being suspicious of overflows, but I don't remember what I tried that convinced me all the word results would fit into an int64; whatever it was, it must have been bad).
Given that he added that middle "initial" himself and he didn't actually have a middle name, I like thinking that he added it precisely to make that joke work. Probably not true, but I still like it.
That was pretty good for a one-word anagram. Back in the 1990s I wrote a program that generated anagrams for longer phrases and I was surprised to find these prescient ones:
Saddam Hussein = He damns Saudis
Charles Manson = Slasher con man
David Letterman = Dead mitral vent
Mary Jo Kopechne = My joke chaperon *
Benito Mussolini = So, I bout Leninism
Lee Harvey Oswald = Oe, why ever Dallas? *
* "Chaperon" is a valid alternate spelling of "chaperone"
Things like that explain how some people get obsessed with numerology. Except, I find the anagrams like you and the previous commenter mentioned to be more impressive than anything numerologists produce.
Some of those are almost creepy in how they make so much sense.
We're pattern matching machines, if you throw enough randomness at us we will find patterns anyway.
Probably because looking at evolution it's more helpful to see a predator that isn't there, than to miss a predator that is there. It intuitively makes sense that we would tilt towards false positives over false negatives in our perception. Since one has a higher cost attached than the other.
Which is why I find some of the discussion around whether GPTs are intelligent interesting. I see the argument quite often that they merely match patterns and combine things. Which to me seems very much akin to us. They lack the ability to interact with the world and there is no feedback loop as of now, but it seems to me that there is something very human about the AI we programmed.
The most remarkable name anagram for me remains that of 1990s UK conservative Health Secretary and government minister Virginia Bottomley (now Baroness Bottomley of Nettlestone), whose name rearranges to “I’m an evil Tory bigot”.
In terms of this ‘chunk scoring’ method this scores very high (15 I think?), which definitely confirms its value as a way of rating anagram quality.
> I guess I’ll be known for nothing more than being the man who realized that “Spiro Agnew” was “grow a penis.” Gore Vidal said, “It could be ‘grow a spine,’ too, but yours is better.”
Boy, they're really socking it to that Spiro Agnew guy again.
How many generated anagrams do you have to skim through to find these gems? Was the program written in a way that limits the output to phrases that make at least some sense?
It had features for things like limiting output to anagrams that contain a maximum number of words, or words with a minimum length, so you wouldn't have to slog through every permutation. Usually the most interesting ones had the fewest words. But "interesting" is subjective so if you really wanted to find juicy ones you would have to spend more time combing through the results by hand.
Ted Kennedy, her "chaperon," was male, so it works perfectly. Although I don't think the difference between the spellings is a gender thing, but rather a British/American thing.
Similar to this, he produces a standard form for each word, but breaks each letter into letter pieces or 'atoms' which gives much more freedom for moving between words.
Definitely give it a watch. If you are not familiar with Tom7's videos, he has a hilarious whimsical style while also bringing to life completely out there ideas with some brilliant technical skill.
I thought integrals / triangles was much more interesting, despite the shorter length. None of the other long pairs have any meaningful relationship, except for maybe excitation / intoxicate
Megachiropterans are fruit bats. They are really stinking adorable, especially when compared to smaller insect-eating bats, and for that reason are sometimes called flying foxes. In short, they make good subjects for cinematographers.
What’s nostalgic to me is that this is such a classic Perl pattern: hashing manipulated strings to find relationships. Prior to Perl doing this was painful. Boost wasn’t a thing. Python was in its cradle and Java was still struggling with beans. Perl removed the barriers between complex coding ideas and an implementation that C wasn’t ready for. The time from thought to prototype was near instant compared to current compiled languages.
Ummm, the author used awk for the first version in the 1990s.
> This was easy to do, even at the time, when the word list itself, at 2.5 megabytes, was a file of significant size. Perl and its cousins were not yet common; in those days I used Awk. But the task is not very different in any reasonable language:
As a concrete example, while my awk skills are extremely rusty, here's a program which will normalize the input line, create a table mapping the normalized name to the matching original lines, then at the end it only displays the ones with at least 5 anagrams.
I tried to avoid modern awk features, like asort, to be something that would have worked in the 1980s:
{
    # Convert to normal form:
    #  1. Fold to lower case
    #  2. Bin the letters to get frequency counts
    #  3. Only consider lower case ASCII letters
    split(tolower($0), letters, "");
    # Ignore asort() in modern awks and do a bin sort instead.
    for (i in letters) {
        c = letters[i];
        repeats[c] = repeats[c] c;
    }
    normal_form = "";
    for (i = 97; i <= 122; i++) {
        c = sprintf("%c", i); # no chr() in a 1980s awk
        normal_form = normal_form repeats[c];
    }
    table[normal_form] = table[normal_form] "," $0;
    delete repeats;
}
END {
    # Only show the ones with at least 5 matches
    for (i in table) {
        match_str = table[i];
        split(match_str, matches, ",");
        num_matches = length(matches) - 1;
        if (num_matches >= 5) {
            # print the number of matches, then the match string
            printf("%d%s\n", num_matches, match_str);
        }
    }
}
When I try it on a word list I have handy, here are the most common words:
Certainly Perl is more succinct, though note that even up to Perl 4 in the early 1990s you would need to use the string concatenation method to store the list of matches in the table.
But, "painful"? No. Not to someone who knew how to use awk.
The point is that you don’t need a hash table implementation. And the normalization that is used as the sorting key doesn’t otherwise have good hash-like qualities (like constant size and a good distribution) and thus doesn’t especially deserve to be called a hash.
It can be a simple ~20 line C program that checks whether the previous line has the same normalized key as the current line. It doesn't require hashing. I didn't say you could do it all with standard unix programs.
You pipe the sorted list into awk (for example) and append the second field to a list as long as the value of the first field remains the same. Whenever the value of the first field changes, and in the END block, you output the list (which contains the matching anagrams) and reset it to empty.
No hash table needed, just splitting the line into the two fields, equality comparison, and appending values to a list.
Go try your method and spot the bug. If you can’t, then respond and I’ll tell you. hint: you need one more manipulation to find the answer before or after the sort, there’s a command line version of it but it uses .. a hash.
I wonder whether someone made a similar effort for finding the "most rhyming" words in English. Of course, since this would be based on sound, it would need to incorporate additional information besides spelling.
It also starts a huge argument about what kinds of rhyming are permitted (visual, sound, which accent, etc) - and some languages “rhyme” on things we wouldn’t use in English like the emphasis, etc.
While I personally agree with the end result (cinematographer / megachiropteran possibly being the best anagram), it's worth exploring other scoring mechanisms, I suppose: some kind of alliteration/cadence score to quantify how the words roll in succession. Maybe throw in semantic distance as well, probably using CLIP embeddings, to quantify the surprise element (or closeness, for that matter). Gonna play around with this and see what I can do.
Waiting for someone to post a 4-symbol APL version (which makes use of a language feature added by an eccentric hedge fund millionaire in 2002 and removed in 2017, to ensure nobody ever runs it).
Why shouldn’t my programs be intense neutron stars of weird symbols, if there’s a superhuman intelligence always at hand to explain and improve the code?
Assembly language generation by symbolic AI (compilers) has been here for 60 years. The "prompts" are very precise and predictably behaved, but require users to learn a specialized language. (Most such languages tend to be concerned with the "how" rather than the "what".)

With the new AI, you just specify the "what". Then you get a buggy program, in one of the above specialized languages for the old AI, and the rest of the prompts in the chat are edit instructions on how the AI should fix the code to make it work.
That doesn't work in Dyalog APL, comes back with RANK ERROR.
It's idiomatic to pick ⊃ the first result of ⎕NGET to get just the lines to work on, and not the other things like file encoding.
Then grade-up-right-train-each (⍋⊢)¨ is redundant, it's the same as grade-up-each ⍋¨
The grade is an array of which indices to take to put the argument in sorted order and I don't think it makes sense to group ⌸ by that since the grade isn't the same for different arrangements of letters, so the whole approach breaks down there. I think the words have to be sorted, e.g to pick out 'bat' and 'tab' as the same letters:
Then 1<≢¨ would be "1 is less than the count (tally) of each" which fits somewhere in the solution, but not there and won't work on my array. We don't need to know the actual sorted letters so we can inline the bitmask of which answers matter or not in the first column:
Any with a 1 in the first column are anagrams, and 0 in the first column are not. And the other columns are indices into the wordlist where the matching words are, so words[1,3] picks out 'bat' and 'tab' and it's probably possible to filter rows with 1 in the first column:
It can probably be done shorter and cleaner with more skill than I have. It's a lot quicker to execute than the PowerShell version I commented, but took a lot longer to code.
(That expands to {⍵[⍋⍵]}¨words to sort each word, then use that on the left of Key ⌸ with words on the right, and Key feeds the count and indices into the custom function which is ⊢∘⊂ that takes the indices with right-tack ⊢ and throws away the count, and encloses them with ⊂. That gives nested arrays of words which sort the same, including individual words that sort like nothing else. Then (⊢⊢⍤/⍨1<≢¨) added to the left is counting the words in each nesting and filtering out the single ones, and I think ⊢⍤/ is a bodge to use compress in a train without it being misread as reduce when both use the same symbol / ?)
Yes. @1 indicates where the first argument of the implicit function is inserted. (It's like _ in some partial application syntaxes, but more general because there can be multiple implicit arguments.) The previous steps produced a list of word lists, so we indicate that this giant list, which is the implicit first argument of this function of the pipeline, is going into the first argument position of sort.
The : symbol causes argument 2 of sort to be defaulted, as if it were omitted. That's the comparison function, which defaults to less. In TXR Lisp, you can explicitly default optional arguments with : which enables you to give arguments to later optional arguments to the right of those.
We then specify argument 3, the function for selecting the key to sort by for each element. Each element is a list of words (anagrams of equal length). We pick the first one with car, and take its length: chain will compose those functions for us.
We use square brackets because the function names are in the function namespace: square brackets provide a function application language in which variables and functions are in one namespace.
In a book release event for Stefan Fatsis's book on tournament Scrabble players, Word Freak, the host posed the "megachiropteran" anagram as a trivia question for a book giveaway.
One of the top players in my club _instantly_ replied "cinematographer," and added, "but megachiropteran isn't in the OSPD [Official Scrabble Players' Dictionary], so it doesn't really count."
Why do scrabble players care about words that long? Beyond fun trivia.
What's the longest word ever played in an actual game of Scrabble? And though this is hard to get data for, what's the longest word that ever could have been played but missed?
Either just by having developed their brains in a way that word facts just stick, or by constantly seeking out new word facts because that's what they love doing.
I believe that the gentleman in this case instantly saw the word 'cinematographer' hiding in 'megachiropteran' because his brain is a highly trained anagramming machine, and knew that it wasn't allowed because he knew all of the allowed 15-letter words, just in case he might get the opportunity to play one.
High-level players memorize staggering numbers of words. All 2- and 3-letter words is entry-level. All X-J-Q-Z words, all Q-without-U words, all words ending in -MAN, all 70+ 7-letter words of the form SATINE+... top-level players know the 4000 4-letter words, the 5000 5's, and way, way more.
That one has real meaning too. Being naturally inclined to mentally run through all the worst case scenarios for everything has definitely saved me time over the years.
For any Spanish speakers in here, I got inspired by this and borrowed Mark Dominus' code to find the "best anagram" in Spanish, my results are here: https://github.com/pilaf/anagramas
One of the longest anagrams I know that doesn't involve obscure medical words is "astronomers" and "moon starers." Both phrases contain the same 11 letters in a different order, and the pair is quite interesting, as both relate to looking up at the night sky and studying celestial bodies.
There's something magically target-rich about anagrams.
My full first, middle and last name is long enough with common letters that there are several uproariously serendipitous and embarrassingly obscene (even for me) anagrams of my full name.
So bad I would never post them here. Much worse than you could possibly imagine. Take my word for it: you don't want to know.
The only advice I'll share is that parents should carefully screen their baby names with the advanced anagram server, and choose short names with unusual letters that have lower chances of backfiring.
Megachiropteran lit up 'helicopter' for me (as soon as he wrote that it means mega-bat), and chiro- naturally zinged the word chiropractor. My first guess was that '-opter' must have something to do with flight, with the o shared as a connective, and that it means 'bone-flying'. Wrong: turns out -pter is Greek for 'wing', and so we have helico-pter (helix-wing) and our featherless friends are chiro-pter (hand-wing).
Qui suis-je ?
Me voici Sultan !
Nuitisme vocal
Silice mouvant
Io, vent musical !
Le voici musant
Mot inclus à vie
Ce motival insu
Là vous émincit
Cultivons amie
Si il vaut ce nom
Son val muet ici
Indice:
Si nul vice à mot
Vu ici slame ton
Nom c'est via lui
Vaincu tel, omis
Who am I?
Here I am, Sultan!
Vocal nightism
Moving silica
Io, musical wind!
Here it is, musing
Word included for life
This unknown motive
There, it slices you down
Let's cultivate, friend
If it's worth that name
Its silent valley here
Hint:
If no word vice
Seen here, slams your
Name, it's through it
Defeated as such, omitted
I think the best anagrams are the ones that create meaningful phrases, so I would probably train an SVM or a simple linear layer on BERT (or other encoder) embeddings and use it to score the results.
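A rough sketch of that scoring idea, with random vectors standing in for real BERT embeddings (in practice you'd embed each candidate phrase with an encoder and train on human-labelled examples); everything here, including the labels, is placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder embeddings of candidate anagram phrases.
n, dim = 200, 32
X = rng.normal(size=(n, dim))      # phrase embeddings (placeholder)
y = rng.integers(0, 2, size=n)     # labels: 1 = "meaningful phrase"

# A "simple linear layer": logistic regression via plain gradient descent.
w, b = np.zeros(dim), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= 0.5 * (X.T @ (p - y) / n)          # gradient step on weights
    b -= 0.5 * float(np.mean(p - y))        # gradient step on bias

# Score candidates: higher = more likely to read as a meaningful phrase.
scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

With real embeddings you would rank each anagram's candidate phrases by this score and surface only the top few.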
Actually, early Gen X. I don’t think most millennials were old enough for a lot of the jokes in '90s Simpsons, as they were young teens. But glad you got it!
one thing i miss from the "olden" days is the fun we used to have coming up with unassisted anagram discoveries and sharing them with each other. i think my personal favourite was antipyretic/pertinacity
It just searched Bing and loaded up this very article. And man I'm getting really sick of the unrelated ChatGPT comments on literally every post, no offense.
IMO it's reading "actually" as "I can't believe someone did this (that someone could be so stupid)" rather than as "this is an event that definitely occurred". Didn't seem too bad to me.
You basically said, “Can you believe someone actually was so stupid as to ask this question on stackexchange? I didn’t believe anyone could actually be that stupid, so I wasn’t going to address it, but it turns out there actually are people that stupid, and they actually outed themselves by posting their question out in the open! So, here I am having to address it in my blog post in case there are other people who actually are equally stupid.”
I’m pretty sure I wrote an anagram solver as a teenager that just spins through permutations forever. I guess I was that stupid once. No comment on whether I’m cured now. Your point is valid, the wording can be read to imply negativity, but that’s also interpreting it in a negative light. The author was also defending his warning against an imaginary rebuttal, justifying the mention of not wasting time solving anagrams the naive brute-force way. I don’t think he was trying to be elitist or offensive, and it is in fact true that people will solve anagrams the slow way, so it doesn’t hurt to mention it, and it can even help on occasion. We don’t have to take it as condescending.
I didn't “basically” say any of that. I didn't say any of it at all; you made it all up and then attributed it to me. All I actually did was to state a fact: someone on StackExchange actually asked this question.
All those thoughts about what a stupid question it is, and how stupid someone must have been to ask it—all those are thoughts you had, not me.
FWIW, I also originally read that line to imply "I thought this was obvious, but apparently I need to be explicit anyway".
Given the tone of the rest of the article I assumed that I'd either read something into it that wasn't there or was missing some context - for example, perhaps the person on SO had insisted that this was the correct approach, and you'd gone to some lengths to show that it was not and were exasperated by it by the time you wrote the article.
At any rate - it didn't really take away from your article for me, but I did see it as condescending.
misrelation / orientalism; superintended / unpredestined; incorporate / procreation (don't mind if i do!); predators / teardrops (a cause and effect); counteridea / reeducation (a bit synonymous); streamlined / derailments (quite opposite!); truculent / unclutter; colonialist / oscillation; renavigate / vegetarian; persistent / prettiness; paternoster / penetrators (hmm); obscurantist / subtractions; nectarines / transience (a story of ripeness); definability / identifiably; indiscreet / iridescent; excitation / intoxicate; discounter / reductions (how logical!)
One small suggestion I have: add a point for pairs with different starting letters, and another point for pairs with different ending letters.