I found the best anagram in English (2017) (plover.com)
342 points by Amorymeltzer on May 5, 2023 | 162 comments



Some of my favorite pairs from the big list:

misrelation / orientalism; superintended / unpredestined; incorporate / procreation (don't mind if i do!); predators / teardrops (a cause and effect); counteridea / reeducation (a bit synonymous); streamlined / derailments (quite opposite!); truculent / unclutter; colonialist / oscillation; renavigate / vegetarian; persistent / prettiness; paternoster / penetrators (hmm); obscurantist / subtractions; nectarines / transience (a story of ripeness); definability / identifiably; indiscreet / iridescent; excitation / intoxicate; discounter / reductions (how logical!)

One small suggestion I have: add a point for pairs with different starting letters, and another point for pairs with different ending letters.


Also "Xenoparasite Exasperation" - would have made a great title for a 1960s/70s Sci-Fi B movie...


in French we have chien (dog) / niche (doghouse)


Almost verlan!


iench is the verlan dog ;)


I'm a non-coding lurker here, but the only thing that ever motivated me to write my own program was to solve the newspaper anagram puzzle, "Jumble", faster than my wife. I wrote it in Basic, then re-wrote it in assembly language, I forget when. Then in Fortran while homebound during the first summer of the pandemic - to atone for my past sin of neglecting to learn Fortran when I had a chance in the late 1960s.


I dunno man, I’ve never heard of a “non-coder” rewriting the same program in three languages, including one round ‘just because’


About thirty years ago I had an account on a computer system that had a Boggle game, and I got tired of losing to it. So I wrote a program (in C!) which, when run, would search the grid and print out a list of all the words it found. Since I was using a bigger dictionary than the Boggle game itself, I found a lot more words. Ha ha, take that!

I proudly boasted to a friend about my winning Boggle solver and they said it was the pettiest thing they had ever heard of.


I did the same thing! I stumbled upon an old ‘Al Zimmerman’ programming contest that involved creating wordsearch grids with the most words possible. I had a technique in mind, wrote it in BASIC. Then realized that was too slow. So I tried it in Java (which I’d heard was easy… it wasn’t… at least not to me). And then finally wrote it in C++, a language I found perfect in all respects. I actually ranked near the top in that contest, and I’ve been programming ever since. Ironically, I write programs to make puzzle books and I even have a Jumble knock-off on Amazon (sans cartoons… can’t draw worth a damn.)


With respect, the moment you mentioned "re-wrote it in assembly language"...you stopped being a non-coder! :-) By the way, very cool use of technology for somewhat everyday things (solving puzzles).


I did similar for a newspaper puzzle unscrambler, pushing to rewrite it in lower-level languages (PowerShell, then C#, then Rust) and changing the algorithm: from sorting letters, to precomputing a lookup hashtable of sorted letters, to multiplying prime numbers (one for each alphabet letter) to drop the overhead of sorting, to a hash lookup of those products, to sorting the integer results into an array to binary-search through and jump to the matching offset in an answers array, with those inlined in the code, until it was near enough instant.

... and I don't use it, because unjumbling the word myself is satisfying, but typing the letters into a computer and getting the answer isn't.


> to multiply prime numbers one for each alphabet letter to drop the overhead of sorting

how does that work?


Not sure if you're asking about the technique, or the math, or the comparison with sorting, so here's a long answer of all of them - start like the secret codes from childhood by giving numbers to letters so A=1, B=2, C=3, D=4, etc.

Then go through a word and find the values for each letter and multiply them together, e.g. "tab" is 20 x 1 x 2 = 40 and hopefully an anagram that just rearranges the letters gets the same answer because multiplication doesn't change if you shuffle the numbers around, e.g. "bat" is 2 x 1 x 20 = 40 which is the same, "bat" and "tab" are anagrams... but with the integers it doesn't always work and different words can clash e.g. "fab" 6 x 1 x 2 = 12 and "cad" 3 x 1 x 4 = 12 have the same answer but are not anagrams.

Prime numbers help because the Fundamental Theorem of Arithmetic[1][2] says that there can't be any clashes when you multiply Primes, every number breaks down into a unique product of Primes (I can't prove that myself, but it is apparently true). So give the letters Prime numbers A=2, B=3, C=5, D=7, E=11, F=13, etc. and now "fab" 13 x 2 x 3 = 78 and "cad" 5 x 2 x 7 = 70 no longer clash. The only way to get the same answer is to have the same primes (in any order), so anagrams will have the same answer and non-anagrams will not.

It drops the overhead of sorting because sorting any collection requires looking at each item, comparing at least some of them, and swapping the positions of at least some of them: generally O(N log N) for N items. Looking up each letter in an array of prime values and multiplying once per letter needs no comparisons and no swaps, so it is O(N) time. That gives this approach less work to do for each word, so it can finish faster.

It looks like (Python, assuming lowercase ASCII letters where 'a' starts at code 97):

    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
              47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
    ascii_a = 97
    product = 1

    for c in word:
        product *= primes[ord(c) - ascii_a]
Do that for the incoming word, and for every word in the wordlist, and see which have matching products, those are the anagrams. Or pre-compute for all the words in the wordlist and only do it for the incoming word and then lookup the matching ones.
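The pre-compute-and-group variant described here can be sketched in full (a minimal sketch with a toy wordlist; the function names are made up for illustration):

```python
from collections import defaultdict

# Primes for 'a'..'z'; anagrams share the same product of letter primes.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def product(word):
    p = 1
    for c in word:
        p *= PRIMES[ord(c) - ord('a')]
    return p

def anagram_groups(words):
    # Bucket every word by its prime product; buckets with 2+ words
    # are anagram sets.
    groups = defaultdict(list)
    for w in words:
        groups[product(w)].append(w)
    return [g for g in groups.values() if len(g) > 1]

# Toy stand-in for the real wordlist:
print(anagram_groups(["bat", "tab", "dog", "fab", "cad"]))
# → [['bat', 'tab']]
```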

[1] https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithme...

[2] https://www.varsitytutors.com/hotmath/hotmath_help/topics/pr...


Prime combinatorics - I think every programmer independently discovers the technique. You can also determine if a shorter word can be made from a longer word, e.g. "cat" from "catch", by checking the modulo result: the longer word's product is divisible by the shorter word's product exactly when it contains all of the shorter word's letters.

And the technique of Prime Combinatorics for an "alphabet" can be used to solve Poker hands, Blackjack hands, Match 3 puzzle games, slot machine reel positions, and a slew of other similar problems where you would ordinarily have to build a very complex logic table/switch-case/if-then-else decision tree.
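The modulo check mentioned here can be sketched in a few lines (a hedged sketch; the helper names are invented):

```python
# Letter primes for 'a'..'z'.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def product(word):
    p = 1
    for c in word:
        p *= PRIMES[ord(c) - ord('a')]
    return p

def can_make(shorter, longer):
    # "cat" can be made from "catch": every prime factor of
    # product("cat") also appears in product("catch"), so the
    # division leaves no remainder.
    return product(longer) % product(shorter) == 0

print(can_make("cat", "catch"))  # → True
print(can_make("dog", "catch"))  # → False
```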


Is it actually faster?

You could just do counting sort if the big-O is important, but I'm a bit suspicious about that big-O anyway. A model where multiplying arbitrarily big numbers is constant time is a bit unrealistic, kind of feels like it's getting off the hook for a log factor for free.

It's also all such small values that I'm not sure the big-O matters either, I'm not confident which would win without just trying it. I'd _guess_ that usual sort probably just wins though, or counting sort if you really went out of the way to optimize it.
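For comparison, a counting-sort normal form avoids comparison sorting entirely: count each letter into 26 buckets and emit the letters in order (a sketch, assuming lowercase ASCII input):

```python
def normal_form(word):
    # Counting sort over a fixed 26-letter alphabet:
    # O(len(word) + 26), and no big-integer multiplication involved.
    counts = [0] * 26
    for c in word:
        counts[ord(c) - ord('a')] += 1
    return ''.join(chr(ord('a') + i) * n for i, n in enumerate(counts))

print(normal_form("tab"))  # → 'abt'
print(normal_form("bat"))  # → 'abt'
```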


Too late to edit my earlier reply, but I remembered what happened - for my newspaper puzzle, only 9 letter words are possible answers, and all 9 letter word products fit in an int64. Then in C# it is faster, casual runs seem to knock 20-25% off the runtime (~0.035 seconds vs ~0.045 seconds), but in PowerShell it is slower.

(I'm curious if it would be possible to shuffle the primes around to bring all words in my full wordlist down to UInt64; the highest is 2810298024552111657086849344270208275 for "microspectrophotometries", which is some quadrillion times too big. Reassigning the primes by letter frequency in the wordlist brings it down to 24474928756352445162715195352080 for "electroencephalographically", which is still a trillion times too big, so I doubt it).
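The 9-letter claim is easy to sanity-check: even the worst case, nine copies of the largest letter prime, fits in a signed 64-bit integer (assuming the usual a=2 ... z=101 assignment):

```python
# Worst-case product for a 9-letter word: "zzzzzzzzz" with z -> 101,
# the 26th prime.
worst = 101 ** 9          # about 1.09e18
int64_max = 2 ** 63 - 1   # about 9.22e18
print(worst < int64_max)  # → True
```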


very interesting, both this comment and the sibling one. I didn't really consider how much could fit in an uint64. If all the words can be made to fit then that should indeed be quite fast (I was assuming bigints).


Huh, good call; it was 2 years ago that I did it, and I've gone back to test it; the values overflow an Int64 or a UInt64 for words like "abortiveness" (== 38199980262261136506). Trying it in PowerShell, that overflowed to a double and then errored when assigning it to an Int64.

Looking what was happening, I find C# doesn't throw overflow exceptions by default and instead wraps around. That takes 0.05 seconds for all 172k words compared to 0.1 seconds for sorting all words so it ran a lot faster, yes, but it was wrong and may have clashed words. Same with Rust and overflows, it seems. Turning it to System.Numerics.BigInteger in C# makes it take about the same time 0.1s for both. In Python which handles any size integers it takes a lot more time, 0.14 seconds for sorting, 0.4 seconds for products.

That means my embedded arrays of products are all overflowed as well. What a good example for arguing that programs should be correct first, and fast second. (I remember being suspicious of overflows, but I don't remember how I concluded that all the word results would fit into an Int64; whatever my reasoning was, it must have been bad.)


We regret to inform you that you are now a former non-coder.


My personal favourite is that an anagram of Banach-Tarski is Banach-Tarski Banach-Tarski.


Guess what the B in "Benoit B. Mandelbrot" stands for?


Given that he added that middle "initial" himself and he didn't actually have a middle name, I like thinking that he added it precisely to make that joke work. Probably not true, but I still like it.


Just like in the original, I think it’s only going to work in theory.


It only has a score of 6 though, in the typical formulation.


aha, I see what you did there


underrated comment


Deductions Kramer.


"deduction markers", no?


Discounted remark.


That was pretty good for a one-word anagram. Back in the 1990s I wrote a program that generated anagrams for longer phrases and I was surprised to find these prescient ones:

Saddam Hussein = He damns Saudis

Charles Manson = Slasher con man

David Letterman = Dead mitral vent

Mary Jo Kopechne = My joke chaperon *

Benito Mussolini = So, I bout Leninism

Lee Harvey Oswald = Oe, why ever Dallas? **

* "Chaperon" is a valid alternate spelling of "chaperone"

** Yes, "oe" is a word


Clint Eastwood = Old West Action is still my favorite serendipitous anagram.

I also like the mathematically correct

ELEVEN PLUS TWO = TWELVE PLUS ONE

especially because it's also a numeric anagram

11 + 2 = 12 + 1


Things like that explain how some people get obsessed with numerology. Except, I find the anagrams like you and the previous commenter mentioned to be more impressive than anything numerologists produce.

Some of those are almost creepy in how they make so much sense.


We're pattern matching machines, if you throw enough randomness at us we will find patterns anyway.

Probably because looking at evolution it's more helpful to see a predator that isn't there, than to miss a predator that is there. It intuitively makes sense that we would tilt towards false positives over false negatives in our perception. Since one has a higher cost attached than the other.

Which is why I find some of the discussion around whether GPTs are intelligent interesting. I see the argument quite often that they merely match patterns and combine things, which to me seems very much akin to us. They lack the ability to interact with the world and there is no feedback loop as of now, but it seems to me that there is something very human to the AI we programmed.


And it’s 13 letters long


The Spanish translation of this is ONCE MAS CUATRO = CATORCE MAS UNO (11 + 4 = 14 + 1).


Which (including spaces) is 15 characters long :)


Mother in law = Woman Hitler is another classic


Does that count as Godwinning the anagrams discussion?


Godwin’s law = wild wagons


okay that one caught me off guard


> ELEVEN PLUS TWO = TWELVE PLUS ONE

Goedel's incompleteness theorem fans would be drooling all over this


Why?


Because it would be another way of expressing this idea https://en.wikipedia.org/wiki/G%C3%B6del_numbering (yes it's not easy to understand)


The most remarkable name anagram for me remains that of 1990s UK conservative Health Secretary and government minister Virginia Bottomley (now Baroness Bottomley of Nettlestone), whose name rearranges to “I’m an evil Tory bigot”.

In terms of this ‘chunk scoring’ method this scores very high (15 I think?), which definitely confirms its value as a way of rating anagram quality.


Ronald Wilson Reagan = Insane Anglo Warlord


Dick Cavett, https://www.nytimes.com/2018/08/04/style/dick-cavett-intervi...

> I guess I’ll be known for nothing more than being the man who realized that “Spiro Agnew” was “grow a penis.” Gore Vidal said, “It could be ‘grow a spine,’ too, but yours is better.”

Boy, they're really socking it to that Spiro Agnew guy again.


That's why they're called nattering nabobs of negativism.


His pal Richard Milhous Nixon = Nix Hanoi child rumors


Also "Orwellian radon snag"


I did Wikipedia article titles and found:

Atlantic casino resort spa / Carter assassination plot


That would make an excellent 'not to be confused with'.


How many generated anagrams do you have to skim through to find these gems? Was the program written in a way that limits the output to phrases that make at least some sense?


It had features for things like limiting output to anagrams that contain a maximum number of words, or words with a minimum length, so you wouldn't have to slog through every permutation. Usually the most interesting ones had the fewest words. But "interesting" is subjective so if you really wanted to find juicy ones you would have to spend more time combing through the results by hand.

You can find the documentation, a Win32 executable, and the source code here: https://www.kmoser.com/anagrams/


Yeah but chaperon would be masculine and chaperone feminine.


Ted Kennedy, her "chaperon," was male, so it works perfectly. Although I don't think the difference between the spellings is a gender thing, but rather a British/American thing.


This is great! It reminds me of a Tom7 video of a similar ilk: Anagrams, but where you can break apart letters: "Anagraphs" - https://www.youtube.com/watch?v=qTBAW-Eh0tM.

Similar to this, he produces a standard form for each word, but breaks each letter into letter pieces or 'atoms' which gives much more freedom for moving between words.

Definitely give it a watch. If you are not familiar with Tom7's videos, he has a hilarious whimsical style while also bringing to life completely out there ideas with some brilliant technical skill.


Don't forget the utterly classic anagrams we made out of "Information Superhighway" when that term was first introduced to a skeptical public.

https://ad1c.us/infobahn.htm


LOL. Those are great. I encourage folks to look.


Writing a program to find grammatically correct sentence anagrams seems like quite the challenge.


Some gems in there, like

    human parties | puritan shame


I thought integrals / triangles was much more interesting, despite the shorter length. None of the other long pairs have any meaningful relationship, except for maybe excitation / intoxicate


What, you don't think all married people are admirers of sidearms? /s

> 7 admirer married
> 7 admires sidearm


This one reminded me of those captcha cartoons that were popular for a while. “Admires sidearm” in curved writing and a simple drawing.


It seemed to me that integral/triangle (singular) should really get the same score.


Megachiropterans are fruit bats. They are really stinking adorable, especially when compared to smaller insect-eating bats, and for that reason are sometimes called flying foxes. In short, they make good subjects for cinematographers.


What’s nostalgic to me is that this is such a classic Perl pattern: hashing manipulated strings to find relationships. Prior to Perl doing this was painful. Boost wasn’t a thing. Python was in its cradle and Java was still struggling with beans. Perl removed the barriers between complex coding ideas and an implementation that C wasn’t ready for. The time from thought to prototype was near instant compared to current compiled languages.


Ummm, the author used awk for the first version in the 1990s.

> This was easy to do, even at the time, when the word list itself, at 2.5 megabytes, was a file of significant size. Perl and its cousins were not yet common; in those days I used Awk. But the task is not very different in any reasonable language:


I know. I’m talking about Perl. Try to keep up.


Do pay attention.

You wrote "Prior to Perl doing this was painful.".

I highlighted how you contradicted the author's claim that it was easy to do using awk. Awk existed long before Perl.


As a concrete example, while my awk skills are extremely rusty, here's a program which will normalize the input line, create a table mapping the normalized name to the matching original lines, then at the end it only displays the ones with at least 5 anagrams.

I tried to avoid modern awk features, like asort, to be something that would have worked in the 1980s:

  {
      # Convert to normal form:
      #  1. Fold to lower case
      #  2. Bin the letters to get frequency counts
      #  3. Only consider lower case ASCII letters
      split(tolower($0), letters, "");
  
      # Ignore asort() in modern awks and do a bin sort instead.
      for (i in letters) {
          c = letters[i];
          repeats[c] = repeats[c] c;
      }
  
      normal_form = "";
      for (i=97; i<=122; i++) {
          c = sprintf("%c", i); # no chr() in a 1980s awk
          normal_form = normal_form repeats[c];
      }
  
      table[normal_form] = table[normal_form] "," $0
      delete repeats;
  }
  END {
      # Only show the ones with at least 5 matches
      for (i in table) {
          match_str = table[i];
          split(match_str, matches, ",");
          num_matches = length(matches)-1
          if (num_matches >= 5) {
              # print the number of matches, then the match string
              printf("%d%s\n", num_matches, match_str);
          }
      }
  }
When I try it on a word list I have handy, here are the most common words:

  % awk -f anagram.awk < words_alpha.txt | sort -n -t, | tail -5
  13,elaps,lapse,leaps,lepas,pales,peals,pleas,salep,saple,sepal,slape,spale,speal
  14,anestri,antsier,asterin,eranist,nastier,ratines,resiant,restain,retains,retinas,retsina,stainer,starnie,stearin
  14,apers,apres,asper,pares,parse,pears,prase,presa,rapes,reaps,repas,spaer,spare,spear
  14,arest,aster,astre,rates,reast,resat,serta,stare,strae,tares,tarse,tears,teras,treas
  15,alerts,alters,artels,estral,laster,lastre,rastle,ratels,relast,resalt,salter,slater,staler,stelar,talers
Certainly Perl is more succinct, though note that even up to Perl 4 in the early 1990s you would need to use the string concatenation method to store the list of matches in the table.

But, "painful"? No. Not to someone who knew how to use awk.


What’s the word for when snark backfires?


snark backfires = barf earns kicks


“Sorry”?


krans


You don't _need_ to hash anything.

The simplest way to do it, is to convert all the words to (normal_form, orig_word) pairs, write the list to a file, then sort it.

It will be trivial to find the words with common normal form after the sort.

(Of course, you wouldn't catch me trying to implement that with C if perl is an option...)
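The sort-then-scan idea can be sketched without any hash table (a minimal Python sketch with a toy wordlist; in practice you'd write the pairs to a file and use the system sort):

```python
# Build (normal_form, word) pairs, sort them, then scan adjacent
# entries: grouping falls out of the sort order, no hash table needed.
words = ["bat", "dog", "tab", "cow", "wok"]  # toy stand-in
pairs = sorted((''.join(sorted(w)), w) for w in words)

groups, current, prev_key = [], [], None
for key, word in pairs:
    if key != prev_key:
        if len(current) > 1:
            groups.append(current)
        current, prev_key = [], key
    current.append(word)
if len(current) > 1:
    groups.append(current)

print(groups)  # → [['bat', 'tab']]
```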


But the normal form as defined by the article is a hash, even if you then use sorting instead of bucketing


The point is that you don’t need a hash table implementation. And the normalization that is used as the sorting key doesn’t otherwise have good hash-like qualities (like constant size and a good distribution) and thus doesn’t especially deserve to be called a hash.


How do you extract the last (largest) entry for each normalized key from the sorted list? What is the command line function?


It can be a simple ~20 line C program that checks whether the previous line has the same normalized key as the current line. It doesn't require hashing. I didn't say you could do it all with standard unix programs.


You pipe the sorted list into awk (for example) and append the second field to a list as long as the value of the first field remains the same. Whenever the value of the first field changes, and in the END block, you output the list (which contains the matching anagrams) and reset it to empty.

No hash table needed, just splitting the line into the two fields, equality comparison, and appending values to a list.


Go try your method and spot the bug. If you can’t, then respond and I’ll tell you. hint: you need one more manipulation to find the answer before or after the sort, there’s a command line version of it but it uses .. a hash.


Even in C it’s not that hard to make a basic hash table. It’s probably like 20 lines of code or something.


Quick shout out to the New York Times game "Spelling Bee". It involves making lots of words from 6 letters plus 1 mandatory letter.

https://www.nytimes.com/puzzles/spelling-bee
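A solver for that puzzle shape is a one-filter exercise (a sketch; the letters and wordlist below are arbitrary toy choices, and the NYT's own accepted word list differs):

```python
def spelling_bee(words, letters, mandatory):
    # Valid words use only the 7 allowed letters, contain the
    # mandatory letter, and (per the game's rules) are 4+ letters.
    allowed = set(letters) | {mandatory}
    return [w for w in words
            if len(w) >= 4 and mandatory in w and set(w) <= allowed]

print(spelling_bee(["lintel", "title", "lend", "tile", "nil"],
                   "inteab", "l"))
# → ['lintel', 'title', 'tile']
```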


I wonder whether someone made a similar effort for finding the "most rhyming" words in English. Of course, since this would be based on sound, it would need to incorporate additional information besides spelling.


It also starts a huge argument about what kinds of rhyming are permitted (visual, sound, which accent, etc) - and some languages “rhyme” on things we wouldn’t use in English like the emphasis, etc.




The latter on diminutives is interesting. Wikipedia:

> Productive diminutives are infrequent to nonexistent in Standard English in comparison with many other languages.

https://en.m.wikipedia.org/wiki/List_of_diminutives_by_langu...


While I personally agree with the end result - cinematographer / megachiropteran possibly being the best anagram - it's worth exploring other scoring mechanisms, I suppose: some kind of alliteration / cadence score to quantify how the words roll in succession. Maybe throw in semantic distance as well, probably using CLIP embeddings, to quantify the surprise element (or closeness, for that matter). Gonna play around with this and see what I can do.


Sounds like fun -- here's a try at alliteration scoring for a start: https://github.com/darius/alliterate


Thanks for the share, will look into it. Cheers


  This is the TXR Lisp interactive listener of TXR 285.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  TXR's sound system features 120 dB separation between quarreling audiophiles.
  1> (flow "/usr/share/dict/words"
           file-get-lines
           (group-by sort)
           hash-values
           (keep-if cdr)
           (sort @1 : [chain car len]))
  (("ho" "oh") ("am" "ma") ("em" "me") ("no" "on") ("ah" "ha") ("it" "ti")
   ("mu" "um") ("eh" "he") ("ate" "eat" "eta" "tea") ("bar" "bra")
  [...]
   ("certification" "rectification") ("accouterments" "accoutrements")
   ("peripatetic's" "precipitate's") ("amphitheaters" "amphitheatres")
   ("colonialist's" "oscillation's") ("enumeration's" "mountaineer's")
   ("imperfections" "perfectionism") ("antiparticles" "paternalistic")
   ("broadcaster's" "rebroadcast's") ("impressiveness" "permissiveness")
   ("conservation's" "conversation's") ("tablespoonfuls" "tablespoonsful")
   ("certifications" "rectifications") ("amphitheater's" "amphitheatre's")
   ("certification's" "rectification's") ("impressiveness's" "permissiveness's"))


Waiting for someone to post a 4-symbol APL version (which makes use of a language feature added by an eccentric hedge fund millionaire in 2002 and removed in 2017, to ensure nobody ever runs it).


According to GPT-4, an equivalent APL one-liner is:

(⊢⌷⍨∘⍋∘≢¨)↑1<≢¨⊢⌸(⍋⊢)¨⎕NGET'/usr/share/dict/words'1


AI-assisted APL could totally be a thing.

Why shouldn’t my programs be intense neutron stars of weird symbols, if there’s a superhuman intelligence always at hand to explain and improve the code?


Assembly language generation by symbolic AI (compilers) has been here for 60 years. The "prompts" are very precise and predictably behaved, but they require users to learn a specialized language. (Most such languages tend to be concerned with the "how" rather than the "what".)

With the new AI, you just specify the "what". Then you get a buggy program, in one of the above specialized languages for the old AI, and the rest of the prompts in the chat are edit instructions on how the AI should fix the code to make it work.


That doesn't work in Dyalog APL, comes back with RANK ERROR.

It's idiomatic to pick ⊃ the first result of ⎕NGET to get just the lines to work on, and not the other things like file encoding.

Then grade-up-right-train-each (⍋⊢)¨ is redundant, it's the same as grade-up-each ⍋¨

The grade is an array of which indices to take to put the argument in sorted order, and I don't think it makes sense to group ⌸ by that since the grade isn't the same for different arrangements of letters, so the whole approach breaks down there. I think the words have to be sorted, e.g. to pick out 'bat' and 'tab' as the same letters:

          {⍺, ≢⍵}⌸{⍵[⍋⍵]}¨words ← 'bat' 'dog' 'tab' 'cow' 'wok'

    abt 2
    dgo 1
    ...
Then 1<≢¨ would be "1 is less than the count (tally) of each" which fits somewhere in the solution, but not there and won't work on my array. We don't need to know the actual sorted letters so we can inline the bitmask of which answers matter or not in the first column:

          {(1<≢⍵),⍵}⌸{⍵[⍋⍵]}¨words ← 'bat' 'dog' 'tab' 'cow' 'woka'
    ┌→────┐
    ↓1 1 3│
    │0 2 0│
    │0 4 0│
    │0 5 0│
    └~────┘
Any with a 1 in the first column are anagrams, and 0 in the first column are not. And the other columns are indices into the wordlist where the matching words are, so words[1,3] picks out 'bat' and 'tab' and it's probably possible to filter rows with 1 in the first column:

          {⍵[;1]⌿⍵}{(1<≢⍵),⍵}⌸{⍵[⍋⍵]}¨words←'bat' 'dog' 'tab' 'cow' 'bta' 'racecar' 'carrace'
and then drop the first column:

          1↓[2]{⍵[;1]⌿⍵}{(1<≢⍵),⍵}⌸{⍵[⍋⍵]}¨words
Then uhh filter out the zeros to avoid index error, and index into the word list, so:

          {words[⍵/⍨⍵>0]}¨⊂[2]1↓[2]{⍵[;1]⌿⍵}{(1<≢⍵),⍵}⌸{⍵[⍋⍵]}¨words←⊃⎕NGET 'wordlist.txt' 1

    ┌→───────────────────────────────────────────────────
    │ ┌→────────────┐ ┌→────────────────┐ ┌→────────────┐
    │ │ ┌→──┐ ┌→──┐ │ │ ┌→────┐ ┌→────┐ │ │ ┌→──┐ ┌→──┐ │
    │ │ │aah│ │aha│ │ │ │aahed│ │ahead│ │ │ │aal│ │ala│ │ [...]
    │ │ └───┘ └───┘ │ │ └─────┘ └─────┘ │ │ └───┘ └───┘ │
    │ └∊────────────┘ └∊────────────────┘ └∊────────────┘
    └∊───────────────────────────────────────────────────
    
It can probably be done shorter and cleaner with more skill than I have. It's a lot quicker to execute than the PowerShell version I commented, but took a lot longer to code.


(⊢⊢⍤/⍨1<≢¨)({⍵[⍋⍵]}¨⊢∘⊂⌸⊢)words or something like that maybe


Ah!

(That expands to {⍵[⍋⍵]}¨words to sort each word, then use that on the left of Key ⌸ with words on the right, and Key feeds the count and indices into the custom function which is ⊢∘⊂ that takes the indices with right-tack ⊢ and throws away the count, and encloses them with ⊂. That gives nested arrays of words which sort the same, including individual words that sort like nothing else. Then (⊢⊢⍤/⍨1<≢¨) added to the left is counting the words in each nesting and filtering out the single ones, and I think ⊢⍤/ is a bodge to use compress in a train without it being misread as reduce when both use the same symbol / ?)


PowerShell 7:

    PS C:\> get-content wordlist.txt 
            | group { -join ([char[]]$_ | sort) }
            | where count -ge 2
            | sort {$_.Name.Length}
    
     Count Name                      Group
    ----- ----                      -----
        2 ho                        {ho, oh}
        2 do                        {do, od}
        2 ay                        {ay, ya}
    [...]
        2 aeghhiiloooppssty         {pathophysiologies, physiopathologies}
        2 aacghhiilloooppsty        {pathophysiological, physiopathological}
        2 aceghhiimooopprrst        {microphotographies, photomicrographies}


Can you explain that last line (sort @1 : [chain car len])?


Yes. @1 indicates where the first argument of the implicit function is inserted. (It's like _ in some partial application syntaxes, but more general because there can be multiple implicit arguments.) The previous steps produced a list of word lists, so we indicate that this giant list, which is the implicit first argument of this function of the pipeline, is going into the first argument position of sort.

The : symbol causes argument 2 of sort to be defaulted, as if it were omitted. That's the comparison function, which defaults to less. In TXR Lisp, you can explicitly default optional arguments with : which enables you to give arguments to later optional arguments to the right of those.

We then specify argument 3, the function for selecting the key to sort by for each element. Each element is a list of words (anagrams of equal length). We pick the first one with car, and take its length: chain will compose those functions for us.

We use square brackets because the function names are in the function namespace: square brackets provide a function application language in which variables and functions are in one namespace.


Thanks! It's interesting to see some ideas from other Lisp dialects. Maybe I'll learn a bit more TXR.


In a book release event for Stefan Fatsis's book on tournament Scrabble players, Word Freak, the host posed the "megachiropteran" anagram as a trivia question for a book giveaway.

One of the top players in my club _instantly_ replied "cinematographer," and added, "but megachiropteran isn't in the OSPD [Official Scrabble Players' Dictionary], so it doesn't really count."


Why do scrabble players care about words that long? Beyond fun trivia.

What's the longest word ever played in an actual game of Scrabble? And though this is hard to get data for, what's the longest word that ever could have been played but missed?


Either just by having developed their brains in a way that word facts just stick, or by constantly seeking out new word facts because that's what they love doing.

I believe that the gentleman in this case instantly saw the word 'cinematographer' hiding in 'megachiropteran' because his brain is a highly trained anagramming machine, and knew that it wasn't allowed because he knew all of the allowed 15-letter words, just in case he might get the opportunity to play one.

High-level players memorize staggering numbers of words. All 2- and 3- letter words is entry-level. All X-J-Q-Z, all Q-without-U, all words you ending in -MAN, all 70+ 7-letter words of the form SATINE+... top-level players know the 4000 4-letter words, the 5000 5's, and way, way more.

This (https://www.yahoo.com/lifestyle/scrabble-records-highest-sco...) lists cases in which 15-letter words were played by adding prefixes or suffixes to other words.


I don't know why, but I really like soapstone teaspoons.


I will counter that with.. timesaving negativism.


I’ve seen better.


That one has real meaning too. Being naturally inclined to mentally run through all the worst case scenarios for everything has definitely saved me time over the years.


Over the long run, minimizing the maximally worst cases works.


For any Spanish speakers in here, I got inspired by this and borrowed Mark Dominus' code to find the "best anagram" in Spanish, my results are here: https://github.com/pilaf/anagramas

And here are the top scorers:

11 actinometro / cortamiento

11 aeronáutico / ecuatoriano

11 anemometría / mareamiento

11 apriscadero / esparcidora

11 arremetimiento / meritoriamente

11 atrasamiento / metatarsiano

11 camastronería / sacramentario

11 entropezada / panderetazo

11 importación / piromántico

12 anemométrica / maceramiento
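The scores in that list are, as I understand Dominus's metric, minimum chunk counts: cut one word into as few contiguous pieces as possible such that the pieces can be rearranged to spell the other. A brute-force sketch of that scorer (my own memoized search over used letter positions, not the code from either repo; exponential in the worst case but fine at dictionary word lengths):

```python
from functools import lru_cache

def min_chunks(a: str, b: str) -> int:
    """Minimum number of contiguous chunks that a must be cut into so
    the chunks can be reordered to spell b (a and b must be anagrams)."""
    n = len(a)
    assert sorted(a) == sorted(b)

    @lru_cache(maxsize=None)
    def dfs(mask: int) -> int:
        done = bin(mask).count("1")  # letters of b already spelled
        if done == n:
            return 0
        best = n + 1
        for start in range(n):
            if mask & (1 << start):
                continue
            m, i, j = mask, start, done
            # extend the chunk while consecutive letters keep matching b
            while i < n and not (m & (1 << i)) and j < n and a[i] == b[j]:
                m |= 1 << i
                i += 1
                j += 1
                best = min(best, 1 + dfs(m))  # try each chunk length
        return best

    return dfs(0)
```

A pair like aeronáutico / ecuatoriano scoring 11 on 11 letters means no two adjacent letters survive the shuffle together, which is exactly what makes an anagram feel "good".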


One of the longest anagrams I know that doesn't involve obscure medical words is "astronomers" and "moon starers." Both of these phrases contain the same letters but in a different order, making them anagrams of each other. This anagram contains 11 letters and is quite interesting, as both phrases relate to looking up at the night sky and studying celestial bodies.

ChatGPT


The wordsmith link in my other comment tells me that one is a triple on that theme:

astronomers = moon starers = no more stars


There's something magically target-rich about anagrams.

My full first, middle and last name is long enough with common letters that there are several uproariously serendipitous and embarrassingly obscene (even for me) anagrams of my full name.

So bad I would never post them here. Much worse than you could possibly imagine. Take my word for it: you don't want to know.

The only advice I'll share is that parents should carefully screen their baby names with the advanced anagram server, and choose short names with unusual letters that have lower chances of backfiring.

https://wordsmith.org/anagram/advanced.html


https://wordsmith.org/anagram/ is nice for finding multi-word anagrams, and it has a hall of fame link also.


Megachiropteran lit up 'helicopter' for me (as soon as he wrote that it means mega-bat), and chiro- naturally zinged the word 'chiropractor'. My first guess was that '-opter' must have something to do with flight, with the o shared as a connective, making the word mean 'bone-flying'. Wrong: turns out -pter is Greek for 'wing', and so we have helico-pter (helix-wing) and our featherless friends are chiro-pter (hand-wing).


Made me think of ornithopter (from Greek ornis, ornith- "bird" and pteron "wing")


This is linked from a link, and describes anagrams being used to prove you were the first discoverer of something: https://spark.iop.org/puzzle-hookes-law#:~:text=When%20Hooke....


Seems the best ones are exclusively of Greek or Latin influence. What are the best ones with Germanic or Celtic influence?


I’m going to hear the Paul Simon lyric as “Don’t I know you from the megachiropteran’s party?” from now on


If it is Greig Fraser's party, sure.


Shameless plug:

https://github.com/dfhoughton/ranagrams

I find it a good time killer. Note: there's a word list in the repo, but `cargo install ranagrams` starts you off without one.


    Qui suis-je ?
    
    Me voici Sultan !
    Nuitisme vocal
    Silice mouvant
    Io, vent musical !
    
    Le voici musant
    Mot inclus à vie
    Ce motival insu
    Là vous émincit
    
    Cultivons amie
    Si il vaut ce nom
    Son val muet ici
    
    Indice:
    
    Si nul vice à mot
    Vu ici slame ton
    Nom c'est via lui
    Vaincu tel, omis
    
    
    
    Who am I?
    
    Here I am, Sultan!
    Vocal nightism
    Moving silica
    Io, musical wind!
    
    Here it is, musing
    Word included for life
    This unknown motive
    There, it slices you down
    
    Let's cultivate, friend
    If it's worth that name
    Its silent valley here
    
    Hint:
    
    If no word vice
    Seen here, slams your
    Name, it's through it
    Defeated as such, omitted


I think the best anagrams are the ones that create meaningful phrases. I would probably train an SVM or a simple linear layer on BERT (or other encoder) embeddings and use it to score the results.


Link didn’t work for me. Here is a mirror:

http://archive.today/vyp1c


Hm... I was prepared to be skeptical, but the winning pair is indeed a really great anagram. Good stuff!


Jeremy’s Iron


Tell me you’re a millennial who grew up watching The Simpsons without etc etc.


Actually early Gen X. I don't think most millennials were old enough for a lot of the jokes in '90s Simpsons, as they were young teens. But glad you got it!


“Are you being sarcastic, dude?”

I was born in 1982, and a lot of the humor went over my head til much later. But the first ten seasons or so are still indelibly etched in my memory.


one thing i miss from the "olden" days is the fun we used to have coming up with unassisted anagram discoveries and sharing them with each other. i think my personal favourite was antipyretic/pertinacity


The world, overcrowded and jostling itself with all of its masculine calumnies....


Feels like the edit distance algorithm would be the natural fit here for scoring.


This came up the last time this was discussed here. It didn't work well.

https://gist.github.com/anonymous/431b163b2a2d532bfd0a3bdcc7...

See https://blog.plover.com/lang/anagram-scoring-3.html for more details.
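For anyone who wants to reproduce that experiment, plain Levenshtein distance is only a few lines of dynamic programming. A sketch (my own two-row implementation, not the code from the gist):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum insertions, deletions, and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```

Ranking anagram pairs by this number is exactly the scheme proposed above; as the linked posts found, it rewards long words with shuffled repeats rather than genuinely scrambled pairs.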


How does that do on the longest example found by the author, which he (correctly IMO) dismisses as uninteresting?

cholecystoduodenostomy / duodenocholecystostomy


Scored by Levenshtein distance, that is still fourth-highest.


Trigram overlap is a better measure.
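A sketch of what a trigram-overlap dissimilarity could look like; this is my own Jaccard formulation, not a canonical definition, and it assumes words of at least three letters:

```python
def trigram_score(a: str, b: str) -> float:
    """1 minus the Jaccard overlap of letter trigrams; higher means the
    anagram pair shares less local letter structure."""
    tri = lambda w: {w[i:i + 3] for i in range(len(w) - 2)}
    ta, tb = tri(a), tri(b)
    return 1 - len(ta & tb) / len(ta | tb)
```

Because it measures shared three-letter runs directly, a pair like cholecystoduodenostomy / duodenocholecystostomy, whose big chunks simply trade places, scores as highly similar instead of floating to the top.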


> coprophagist

Now that's something...


Yeah, I wish I hadn't looked up the meaning.


Here's my answer for best English anagram:

11 + 2 = 12 + 1

Eleven plus two = twelve plus one.


> regulation urogenital

Eyeballed the entire list*. This is the most 2023 one I could find.

(*no)


My favorite has always been: A man a plan a canal Panama


That's a palindrome, not an anagram. If you'd like to see the director's cut of that palindrome, it's available here:

http://www.fun-with-words.com/palin_panama.html

I've always been partial to pangrams: sentences that use every letter in the alphabet at least once, typically shooting for a short sentence.

My favorite is "Pack my box with five dozen liquor jugs". Not minimal, but delightful and uses only words that most mortals know.


My favourite is that the 49 word one was found by Guy Steele.

Yes, this Guy Steele: https://en.wikipedia.org/wiki/Guy_L._Steele_Jr.


See also https://norvig.com/palindrome.html with code and a bit of Steele followup.


Clint Eastwood

Old West Action


Satan, oscillate my metallic sonatas.


That’s a palindrome but a good EP by Soundgarden (Nude Dragons).


[flagged]


It just searched Bing and loaded up this very article. And man I'm getting really sick of the unrelated ChatGPT comments on literally every post, no offense.


OK, fair enough. I will stop doing it. I get down voted a lot for it :)


I apologize for the snarky comment, I guess I just have GPT fatigue.


Tldr:

Rewrite all words so the letters in each word are rearranged alphabetically. Don't do this other thing suggested on StackExchange

Look at the list of anagrams

Recognize that the list is actually boring

Get silly with words and nerd out

The end!
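That first step is the standard anagram-signature trick: two words are anagrams exactly when their letters sort to the same string. A minimal Python sketch, assuming nothing more than a plain word list:

```python
from collections import defaultdict

def group_anagrams(words):
    """Bucket words by their sorted-letter signature; keep only
    signatures shared by at least two distinct words."""
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)
    return {sig: ws for sig, ws in groups.items() if len(set(ws)) > 1}
```

One linear pass over the dictionary replaces the StackExchange approach of comparing (or permuting) every pair, which is why the article dismisses the latter.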


uh, you forgot the actual tl;dr, which is

     cinematographer --> megachiropteran


> I wouldn't have mentioned this, but someone on StackExchange actually asked this question.

I wonder if the author realizes how condescending this is.


I'm listening. Please explain.


imo it's an interpretation of "actually" as "I can't believe someone did this (that someone could be so stupid)" rather than "this is an event that definitely occurred". Didn't seem too bad to me.


You basically said, “Can you believe someone actually was so stupid as to ask this question on stackexchange? I didn’t believe anyone could actually be that stupid, so I wasn’t going to address it, but it turns out there actually are people that stupid, and they actually outed themselves by posting their question out in the open! So, here I am having to address it in my blog post in case there are other people who actually are equally stupid.”


I’m pretty sure I wrote an anagram solver as a teenager that just spins through permutations forever. I guess I was that stupid once; no comment on whether I’m cured now. Your point is valid: the wording can be read as implying negativity, but that’s also interpreting it in a negative light. The author was defending his warning against an imaginary rebuttal, justifying the mention so readers don’t waste time solving anagrams the naive brute-force way. I don’t think he was trying to be elitist or offensive, and it is in fact true that people will solve anagrams the slow way, so it doesn’t hurt to mention it, and it can even help on occasion. We don’t have to take it as condescending.


I didn't “basically” say any of that. I didn't say any of it at all; you made it all up and then attributed it to me. All I actually did was to state a fact: someone on StackExchange actually asked this question.

All those thoughts about what a stupid question it is, and how stupid someone must have been to ask it—all those are thoughts you had, not me.


FWIW, I also originally read that line to imply "I thought this was obvious, but apparently I need to be explicit anyway".

Given the tone of the rest of the article I assumed that I'd either read something into it that wasn't there or was missing some context - for example, perhaps the person on SO had insisted that this was the correct approach, and you'd gone to some lengths to show that it was not and were exasperated by it by the time you wrote the article.

At any rate - it didn't really take away from your article for me, but I did see it as condescending.


Thanks for letting me know.



