What are the "worst" spelling bee pangrams? (billmill.org)
104 points by tptacek on March 17, 2024 | 85 comments



The minimum puzzle length for spelling bee is 20 words iirc. The dictionary is also a highly curated list of “common” words. What constitutes a valid word is up to Sam, the NYT editor. It’s designed to make the puzzles doable by the average solver. You’ll notice that a lot of the words in the OP are very esoteric.

Source: helped build SB at NYT.


Thanks for the insight!

There's no actual answer to the question, given that the word list can be set inconsistently, so I had to choose _something_ to go off.

The best word list I was able to find is the one hosted by https://www.sbsolver.com/ , but unfortunately they don't distribute it.

I got a somewhat better word list for part 2: https://notes.billmill.org/blog/2024/03/mitzVah_-_the__worst...

But it's still imperfect. However, a lot of the words I expected to be invalid have actually been in puzzles before, so it's not easy to guess which are going to be good and which aren't.

edit: there have been puzzles with as few as 16 words before: https://www.sbsolver.com/stats/count/low

edit 2: I modified the program to print puzzles with at least 16 words, and the "worst" puzzles it found with that constraint are:

unbEknown, jawbonE, monadnocK, woRkgRoup, daGlock, moonwalK, confLux, buLLhorn, yOkOzuna, Fraught, hogliKe
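A minimal sketch of that kind of search, in Python. It assumes a lowercase, pre-filtered word list and the usual Spelling Bee scoring (1 point for a 4-letter word, 1 point per letter for longer words, 7 bonus points for a pangram), and takes "worst" to mean lowest total score. The function names and the `min_words` cutoff are illustrative, not the code from the post.

```python
def word_score(word: str, letters: frozenset) -> int:
    """Assumed scoring: 4-letter words 1 point, longer words 1 point per
    letter, plus 7 for a pangram (uses all seven letters)."""
    score = 1 if len(word) == 4 else len(word)
    if set(word) == letters:
        score += 7
    return score


def answers(words, letters: frozenset, center: str):
    """Words of 4+ letters that use only `letters` and include `center`."""
    return [w for w in words
            if len(w) >= 4 and center in w and set(w) <= letters]


def worst_puzzles(words, min_words=16):
    """(total score, letters, center) for every puzzle with at least
    `min_words` answers, lowest score first. Puzzles are seeded from
    pangrams: words with exactly 7 distinct letters and no S."""
    pangram_sets = {frozenset(w) for w in words
                    if len(set(w)) == 7 and "s" not in w}
    results = []
    for letters in pangram_sets:
        for center in letters:
            found = answers(words, letters, center)
            if len(found) >= min_words:
                total = sum(word_score(w, letters) for w in found)
                results.append((total, "".join(sorted(letters)), center))
    return sorted(results)
```

With `min_words=20` it approximates the cutoff mentioned at the top of the thread; with 16 it matches the lowest answer count seen in real puzzles.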


I think your word list is still considerably too large. Zero chance in my mind that jouk or qajaq, for example, would make it to the NYT wordlist. (I don't think they'd even be accepted in the crossword, which has looser standards, unless there was a very specific theme that called for them). Apart from being obscure, their only use seems to be as non-standard spellings, for juke and kayak respectively. The Spelling Bee doesn't even accept UK spellings.

At least 5 of the proposed pangrams wouldn't make the cut, either.


I agree with you. Do you have a better one? I can only work with what I have.


Perhaps you could scrape https://nytbee.com/, mentioned in the thread, for the historical answers, or contact the owner. Also @banana_giraffe, a commenter on this thread, seems to have scraped 6 years of data.

Then you could take a very permissive wordlist and filter it using the historical data. For all words of six distinct letters or fewer, you could determine whether they were allowed, not allowed, or indeterminate (no puzzle ever appeared that would have allowed them). My gut feeling is that you'd be left with very few indeterminate words, though jouk and qajaq might well be among them - review those manually.
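Roughly what that filtering could look like, as a sketch: `history` is assumed to be a list of `(letters, center, answers)` tuples scraped from past puzzles, and `candidates` a permissive, lowercase list of 4+ letter words. Both inputs and the function name are hypothetical.

```python
def classify(candidates, history):
    """Label each candidate word using past puzzles.

    "allowed": it was an accepted answer in some puzzle.
    "not allowed": some puzzle could have accepted it, but it wasn't listed.
    "indeterminate": no past puzzle had the right letters and center.
    """
    status = {}
    for word in candidates:
        letters = set(word)
        verdict = "indeterminate"
        for puzzle_letters, center, answers in history:
            if center in letters and letters <= set(puzzle_letters):
                if word in answers:
                    verdict = "allowed"
                    break  # one acceptance is enough
                verdict = "not allowed"
        status[word] = verdict
    return status
```

The words that come back as indeterminate are the ones to review by hand.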


[1] has a couple of references to lists of common words used as inspiration.

[1] https://gitlab.com/engmark/xkcd-passphrase-generator


They've loosened up the rules since I left many moons ago, likely to expand the number of puzzles without repeats. IIRC we had about 5 years' worth at the beginning.


Very cool, thanks! I play every morning. There are times when Sam's curation is very frustrating. It would be nice to submit other valid words and have the game verify them as a way to score "bonus" points. Oftentimes I find that there are some baffling omissions and, after the fact, some truly bizarre inclusions. It would be nice to be able to score points based on your own vocabulary while still having the game's score based on whatever common denominator Sam comes up with.


Agreed, I get frustrated a few days a week with curation inclusions and exclusions. DOORYARD, MICROMINI, NONCOM, ROMCOM are in the list and IMO shouldn't be. UNTENDED, MONOTONIC, BOLE are not in the list but IMO deserve to be.


As a non-native speaker, this is the first time I've ever heard the word "bole" used.


Agreed. When I type a good word that isn’t accepted, I usually just stop playing that day’s puzzle. My guess is that Sam is not very scientifically literate. Simple weather words like cyclonic, adiabatic, or advection: no dice. And then you get some pretty obscure literary words.

Makes me want to make a free clone that includes science words, and isn’t afraid of the letter S.


I think your definition of "simple" doesn't agree with the average person's. I guarantee you that 98% of people don't know the word "adiabatic".


It’s a very common word in many technical domains. Not like it’s a guy’s name or something.


"It's a very common word in extremely niche domains" doesn't make something common, unfortunately.


It is also very culturally biased; some loanwords are more present than others


The omissions that kill me are common nautical terms.


And so many "non-American" words are rejected too, like whinge, colour, metre, etc.


"Whinge" is just a plain old misspelling of "Whine," nothing more or less. We don't need "Whinge" to become a word; we already have "Whine."


"Whinge" definitively is a word[1] and has existed since old English.

[1] https://www.oed.com/search/dictionary/?scope=Entries&q=Whing...


Well, it shouldn't be. It's entirely redundant with 'whine'.


Where do you stand on "guarantee" vs "warranty"?

That's not to mention common phrases with a built-in redundancy such as "Cease and desist"? "Face mask"? "Free gift"?

Idiomatic English has lots of redundancies of one kind or another.


Those aren't really redundant, though. I can say, "I guarantee that this product will work," but nobody would say, "I warranty that this product will work." (You could argue that guarantee is redundant with warrant, but the police don't go to the judge to request a search guarantee. Both words are needed.)

I can't think of a single use case for whinge that wouldn't be equally satisfied by whine. Can you?


I can't think of a single use case for nought that wouldn't be equally satisfied by "zero". Or courgette that wouldn't be satisfied by zucchini, or both of those by "baby marrow". Or why use "oregano" when you could use "wild marjoram"? A perfectly good English name that has been largely displaced by an Italian word.

English (as all modern languages) has tons and tons of exact synonyms and other types of redundant words. It's just a normal part of usage.

In this specific case I personally prefer whinge for emphasizing the complaint and whine for emphasising the noise, so I don't really think they are redundant - I think they are slightly different.


No it isn't. What gave you that daft idea?


It's a stupid word, one that we don't need. 'Whine' works just fine. Why was it necessary to add a 'g'?

You can see 'whinge' gaining ground very recently at the expense of 'whine' here:

https://books.google.com/ngrams/graph?content=whine%2Cwhinge...

This is an outrage, and must be stopped. :-P


no homophones. whine shares a phonemic address with wine, while whinge staked out a plot of its very own, even if it's just a slapdash pop-up tent next door to the local drunkard, binge. hinge shares a smartly bricked-up border with both.


It feels like they could just use something like the Google Ngram viewer to filter the words.
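As a rough sketch of that idea: `freq` below is a hypothetical dict of per-word relative frequencies (say, exported offline from Ngram data), and the one-per-million threshold is an arbitrary assumption, not anything the NYT is known to use.

```python
def common_words(words, freq, min_per_million=1.0):
    """Keep words whose relative frequency meets a per-million threshold.

    Words missing from `freq` are treated as frequency 0 and dropped.
    """
    return [w for w in words if freq.get(w, 0.0) * 1_000_000 >= min_per_million]
```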


I’m still pissed about advection!


What does “isn’t afraid of the letter S” mean?


spelling bee puzzles never contain S. Originally they didn't include puzzles with e+d or i+n+g either.
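The letter-set constraints described here fit in a small predicate. This is a sketch that assumes those are the only rules involved (seven distinct letters, no S, and, for the older puzzles, no E+D pair and no I+N+G trio); the `legacy` flag is a hypothetical name.

```python
def allowed_letter_set(letters: str, legacy: bool = False) -> bool:
    """Check a puzzle's letters against the rules described above."""
    s = set(letters.lower())
    if len(s) != 7 or "s" in s:
        return False  # must be seven distinct letters, never an S
    if legacy and ({"e", "d"} <= s or {"i", "n", "g"} <= s):
        return False  # early puzzles avoided easy -ED / -ING endings
    return True
```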


I still remember my disappointment when I entered HEMOPHAGE and it was deemed "not a valid word".


naphtha and caracal.


And naphthalene, which would have been a pangram that day.


I seem to remember both those words were put forward in the hints page, too!


> Source: helped build SB at NYT.

Wow. Another example of HN at its finest.

Great work on the game btw. My gf introduced me to the game and we love it. Though, we play a variation of it against one another in which we open the game on a single screen and whoever finds a pangram first wins.


To share remotely, one player gets halfway to genius, pangram forbidden, and the second player gets over the line. After that you use SB Buddy (no peeking at hints) to get to Queen Bee.

The most annoying missing wordlist words are naphtha and caracal. An objective measure of word-use frequency should determine the words included. Probably super-obscure articles of clerical costume should not be.


That may be the target, but there have been a handful of Spelling Bees with fewer than 20 words in the answer list. For instance, March 27, 2023 had 16 answers:

> MORTIFY, FORTIFY, FIFTY, FORTY, MOTIF, FIRM, FOOT, FORM, FORT, FROM, IFFY, MIFF, RIFF, RIFT, ROOF, TIFF


Very possible. They've loosened the rules a bit since I originally generated the puzzles, e.g. they now allow more than one vowel, and i+n+g and e+d are allowed in puzzles, possibly more.


> They've loosened the rules a bit since I originally generated the puzzles, e.g. they now allow more than one vowel, ….

The original Bees allowed only one vowel? That must have made it really tough to get long Bees!


Maybe it was 2. It's been a while.


the "ed" and "ing" puzzles are just annoying.


Toot, trot, tort, and moot aren't legal?


From the list of words, I think F was the required (central) letter.


As others have pointed out, all words must contain the center letter, which was "F" on this day; the outer letters were "I M O R T Y".
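For reference, the check each answer has to pass, as a short Python sketch that ignores the curated word list entirely: at least four letters, contains the center letter, and uses nothing outside the seven puzzle letters. The function name is illustrative.

```python
def is_valid_answer(word: str, center: str, outer: str) -> bool:
    """Spelling Bee shape check only; says nothing about the word list."""
    allowed = set(center + outer)
    return len(word) >= 4 and center in word and set(word) <= allowed
```

With center "f" and outer letters "imorty", FORT and IFFY pass, while TOOT and MOOT fail because they lack the F.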


You have to use the center letter, whatever it was.


So you need a 10-letter pangram that generates like 19 4-letter words. :)


> What constitutes a valid word is up to Sam, the NYT editor

I think the only listed words I'd think would get approved are jukebox, quixotic, and gimmickry.


Is it the same dictionary Letter Boxed uses?


Definitely not! I play both and Letter Boxed accepts many more words.


Related aside: if you like Spelling Bee, try NYT’s new game Strands: https://www.nytimes.com/games/strands

It’s in beta right now and so I believe is accessible to everyone. Make sure to read the instructions.

I love puzzles but for some reason I’ve never been into word/dictionary games (Spelling Bee; Boggle; Scrabble) but since hearing about Strands via https://www.theatlantic.com/technology/archive/2024/03/stran... I’ve played every day.


Thanks for this. Funny that it's not on my Games page (I'm an NYT Games subscriber) unlike, I think, Connections, which was shown on that page with a "beta" flag.


Indeed that’s a very good game. Thanks.


I love when people get fascinated by a puzzle and deep dive into it like this. Great article.

Similarly I was fascinated by LetterBoxed, another NYT game and took a crack at a solver. https://hlfshell.ai/posts/letter-puzzles/


I’m a fan of LetterBoxed, too.

The NYT’s solutions are always two words, which I rarely get on my own. But once, a couple of years ago, I discovered a one-word solution to one day’s puzzle: LEXICOGRAPHY. Very elegant, I thought to myself smugly.

I happened to remember that solution a couple of months ago, and I decided to see if I could find others. I am not a programmer, but by asking ChatGPT 4 for help I was able to create a Python program and run it in Google Colab using a large list of English words that I had compiled from various online word lists.

Here is the beginning of the resulting list of one-word solutions to LetterBoxed:

acetylcholinesterase [a c e] [h i l] [n o r] [s t y]

acetylcholinesterases [a c e] [h i l] [n o r] [s t y]

achondroplastic [a c d] [h i l] [n o p] [r s t]

acknowledgement [a c d] [e g k] [l m n] [o t w]

acknowledgment [a c d] [e g k] [l m n] [o t w]

The code that ChatGPT 4 wrote for me and the full list of solutions are here:

https://gally.net/temp/20240318onewordsolutionstoletterboxed...


Whoops. Looking through that list of words again, I see that some of the puzzle configurations could not yield those solutions, because adjacent letters in the solution are in the same triplet. The configuration for acknowledgment, for example, has a and c in the same triplet, which is not possible in LetterBoxed. My bad. I’ll take a look at the code again later.
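The missing constraint is easy to state in code. A sketch of a one-word LetterBoxed check, assuming the rules as described in this thread: four sides of three letters, consecutive letters must come from different sides, and every letter on the box is used. `solves_letterboxed` is a hypothetical name, not taken from the linked code.

```python
def solves_letterboxed(word: str, sides) -> bool:
    """True if `word` alone solves the box given as four 3-letter strings."""
    side_of = {ch: i for i, side in enumerate(sides) for ch in side}
    if any(ch not in side_of for ch in word):
        return False  # uses a letter that isn't on the box
    if any(side_of[a] == side_of[b] for a, b in zip(word, word[1:])):
        return False  # two consecutive letters sit on the same side
    return set(word) == set(side_of)  # every box letter must be used
```

The `lca / eop / xgh / iry` layout used in the check for LEXICOGRAPHY is just one arrangement it solves, built by assigning its letters to sides by position modulo 4.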


With Claude 3 Opus’s help, I think I have fixed the code. The corrected code, which took more than two hours to run in Google Colab, and the resulting list of one-word solutions to LetterBoxed are here:

https://gally.net/temp/20240318onewordsolutionstoletterboxed...


I'll admit that I'm impressed it was able to one-shot create a solution to such a complex search problem.


It wasn’t quite one-shot. I had multiple interactions with ChatGPT on the first run-through before I could get the code to work, and a couple with Claude after I noticed that the partitions weren’t correct.

But all I did was report the errors to the LLMs and paste their revised code back into Colab. The core logic of the search algorithm was created entirely by the LLMs based on my natural-language description of the LetterBoxed rules and the solutions I was looking for. I could not have written that code myself.


I have a short program to solve LetterBoxed. I can't measure how fast it is.

  #include <array>
  #include <algorithm>
  #include <bitset>
  #include <iostream>
  #include <string_view>
  #include <vector>
  #include <unistd.h>
  #include <sys/mman.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  
  struct Word {
    using Bits = std::bitset<32>; // A bit for each a..z, plus an "error" bit at index 26.
    using Rules = std::array<Bits, 27>; // Map from letter indices to forbidden letters (plus a spare).
  
    std::string_view str;
    Bits bits;
  
    static unsigned ix(char c) { return c - 'a'; };  // 'a'..'z' -> 0..25; others -> >25
    bool ok() const { return !bits.test(26) && str.size() > 1; }
    explicit Word(std::string_view s, Bits accept = {0x3ffffff}, Rules const& rules = {}) : str(s) {
      for (unsigned i = 0, prev_ix = 26, bit_ix = 0; i != str.size(); prev_ix = bit_ix, ++i) {
        bit_ix = std::min(ix(str[i]), 26u);
        if (accept.test(bit_ix) && !rules.at(prev_ix).test(bit_ix)) { bits.set(bit_ix); }
        else { bits.set(26); return; }}}
  };
  
  int main(int ac, char** av) {
    auto usage = [av](){ std::cerr << "usage: " << av[0] << " abcdefghijkl [<wordlist>]\n"; };
    if (ac != 2 && ac != 3) { return usage(), 1; }
    char const* const  name = (ac == 3) ? av[2] : "/usr/share/dict/american-english-large";
    const int fd = ::open(name , O_RDONLY);
    if (fd < 0) { return std::cerr << av[0] << ": failed to open " << name  << '\n', 3; }
    const std::size_t file_size = ::lseek(fd, 0, SEEK_END);
    void const* const addr = ::mmap(nullptr, file_size, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { return std::cerr << av[0] << ": failed to map " << name << '\n', 3; }
  
    const auto  target = Word{av[1]};
    if (target.bits.test(26) || target.str.size() != 12 || target.bits.count() != 12)
      { return usage(), 3; }
  
    const auto rules = [w=target.str, ix=Word::ix](Word::Rules rules = {}) {
      for (int i = 0; i < 12; i += 3)
        rules.at(ix(w[i])) = rules.at(ix(w[i+1])) = rules.at(ix(w[i+2])) = Word(w.substr(i, 3)).bits;
      return rules;
    }();
  
    const auto  candidates = [&](std::array<std::vector<Word>, 26> candidates = {}) {
      for (std::string_view  in = {static_cast<char const*>(addr), file_size}; !in.empty();) {
        const Word word{in.substr(0, in.find('\n')), target.bits, rules};
        if (word.ok()) { candidates.at(Word::ix(word.str.front())).push_back(word); }
        in.remove_prefix(std::min(in.size(), word.str.size() + 1));  // Skip past '\n' if present; substr() would throw at EOF without one.
      }
      return candidates;
    }();
  
    for (auto const& firsts : candidates) {
      for (const auto first : firsts) {
        for (const auto second : candidates.at(Word::ix(first.str.back()))) {
          if ((first.bits | second.bits) == target.bits) {
            std::cout << first.str << ' ' << second.str << '\n'; }}}}
  }


The double-backlink at the bottom is amusing - I recognize this bug from painful experience. OP counted each of the 2 links in the other article as a backlink, and didn't deduplicate. But... if you deduplicate, then you are unable to do true bidirectional backlinks to the original caller. You can't figure out 'which' link in the other article is meant.

You need HTML IDs set on each, but they have to be different ID-anchors (obviously, and also because it'd be bad/invalid HTML to have duplicate IDs). So you actually need to track not just the other article but the ID inside the other article, and a way to generate those link IDs to begin with (since you definitely don't want to do it by hand). Gets tricky.
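A minimal sketch of that bookkeeping, assuming each page's outgoing links are available in document order: every link gets its own anchor id, numbered per target, and each backlink records the source page plus that anchor, so the target can link back to the exact spot. The id format and function names are hypothetical.

```python
from collections import defaultdict


def slug(title: str) -> str:
    """Crude id-safe slug: lowercase, non-alphanumerics become '-'."""
    return "".join(c if c.isalnum() else "-" for c in title.lower())


def assign_link_ids(source: str, link_targets):
    """Return (target, source, anchor_id) records, one per link occurrence.

    Repeated links to the same target get -1, -2, ... suffixes, so ids
    stay unique within the source page.
    """
    counts = defaultdict(int)
    records = []
    for target in link_targets:
        counts[target] += 1
        records.append((target, source, f"link-{slug(target)}-{counts[target]}"))
    return records
```

The source page would render each link with its `id`, and the target's backlinks section would point at `source#anchor_id`. A real generator would also need to guard against two titles that slug to the same string.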


Yup, here's the problem [0]:

  def calculate_backlinks(
      pages: Dict[str, Page], attachments: Dict[str, Attachment]
  ) -> None:
      for page in pages.values():
          for link in page.links:
              linked_page = find(pages, attachments, link)
              if not linked_page:
                  info(f"unable to find link", link, page.title)
                  continue
              # No deduplication: a page that links here twice is appended twice.
              linked_page.backlinks.append(page)

No deduplication performed. Since he's parsing the backlinks directly out of the markdown, you don't have to worry about a recursive loop where the backlinks section on one page appears as links in another. A simple solution would be to change the datatype of backlinks from list to set.
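One wrinkle with switching to a `set`: it loses the order backlinks were found in, and it only works if `Page` objects are hashable (a plain `@dataclass` with the default `eq=True` isn't). A sketch of an order-preserving alternative that dedupes on a key; `dedupe_backlinks` and keying on the title are assumptions, not part of the linked run.py.

```python
def dedupe_backlinks(backlinks, key=lambda item: item):
    """Remove repeated backlinks, keeping the first occurrence of each.

    `key` picks what counts as "the same" backlink (e.g. a page title),
    so the items themselves don't need to be hashable.
    """
    seen = set()
    unique = []
    for item in backlinks:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique
```

It could be applied after the loop as `linked_page.backlinks = dedupe_backlinks(linked_page.backlinks, key=lambda p: p.title)`.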

[0] From OP's static site generator: https://github.com/llimllib/obsidian_notes/blob/main/run.py


For those wondering about the word "quooke", which is listed at the bottom as one of the 6 official solutions, it is the "(obsolete, nonce word) simple past and past participle of quake": https://en.wiktionary.org/wiki/quooke

Spenser coined it to use in the Faerie Queene in 1590:

His horses backe, yet to and fro long shooke,

And tottred like two towres, which through a tempest quooke

I confess that it's hard for me to get excited about solving puzzles to find obsolete nonce words.


quooke is a valid Scrabble word that is almost certainly not valid for the Spelling Bee; however, I don't have a very good Spelling Bee word list.

So you might still be interested in the spelling bee, they mostly don't allow words like that.

(And thanks for the etymology!)


>"(obsolete, nonce word)"

Note: 'Nonce' is British slang for paedophile, though did you mean 'nonsense'?


No, as 'chownie' says, I was using it in its original technical sense of a word that is created for the occasion: https://en.wikipedia.org/wiki/Nonce_word

Nonce is also used in cryptography for a number that is arbitrary and only used once: https://en.wikipedia.org/wiki/Cryptographic_nonce


No, "nonce" meaning paedophile is very modern.

Until the 1970s nonce meant "appears only once" and referred to figures and terms which never found common use after being coined.


Ironic that `equivoke`, defined as a word or phrase that has multiple meanings, is the word that when shuffled has the fewest meanings.


Here is a TUI player and solver that maybe is of interest:

https://github.com/philshem/open-spelling-bee


As an aside, I’ve been frustrated at some of the game list’s omissions.

When “ullage” was not in the list, but a week or two later “doggo” was, I considered giving up.

Fun game, but frustrating and annoying at times.


The one-liner to solve that SB is

  grep -v '[^victmze]' /usr/share/dict/words |
    grep .... | grep e

A program to generate all possible NYTM SB puzzles is at https://github.com/ncm/nytm-spelling-bee . The alterations to match the online version are trivial. It runs in well under 100 ms on a modern CPU.

There are bigger dictionaries packaged, e.g. wamerican-huge.


Very neat. Would it be more efficient to start with every known pangram (~35k) and calculate the score from there instead of scoring every possible set of 7 letters (~8B) and then filtering for pangrams?


The problem is that if you do that, you have to test every word (~42k) against every pangram (~16k) to see if it's a subset.

I just wrote two quick test programs to find the score of the set of words for every pangram, and my approach takes 6s vs 15s for your proposed approach: https://gist.github.com/llimllib/cc01daa8be8ced13ddeb6c76cf1...
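One more alternative, sketched in Python and not necessarily what either gist does: index the word list by letter-set bitmask once, then for each puzzle look up only the 64 letter subsets (of the six outer letters) that contain the center letter. Function names are illustrative; words are assumed to be lowercase a-z.

```python
from collections import defaultdict
from itertools import combinations


def letter_mask(word: str) -> int:
    """26-bit mask with bit i set if the word contains chr(ord('a') + i)."""
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def build_index(words):
    """Map each letter-set mask to the 4+ letter words with exactly that set."""
    index = defaultdict(list)
    for w in words:
        if len(w) >= 4:
            index[letter_mask(w)].append(w)
    return index


def puzzle_words(index, letters: str, center: str):
    """Answers for one puzzle via the 64 subsets of the six outer letters,
    each combined with the center letter."""
    center_bit = letter_mask(center)
    outer_bits = [letter_mask(c) for c in set(letters) - {center}]
    found = []
    for r in range(len(outer_bits) + 1):
        for combo in combinations(outer_bits, r):
            mask = center_bit
            for bit in combo:
                mask |= bit
            found.extend(index.get(mask, []))
    return found
```

Building the index is a single pass over the word list; after that, each puzzle costs 64 dictionary lookups regardless of dictionary size.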


It is certainly possible on some days to hit the "Solid" score, which is the terminal rank for free users, with only a pangram.


Wrote this, which runs basically instantly: https://pastebin.com/ax4eTKMr . I hope I didn't make some embarrassing mistake, but it seems to match the results of the code in part 2.


> Every day's puzzle is guaranteed to have at least one pangram.

I’m almost certain that wasn’t true for at least one puzzle early this year, but haven’t been able to come up with an exact date.


Official rules say otherwise: "Each puzzle includes at least one “pangram,” which uses every letter."

https://www.nytimes.com/puzzles/spelling-bee -> click on 'How to Play'


I've always assumed they generate the puzzle starting from a pangram.


That's exactly how it works. The script scanned the dictionary for pangrams and then built the puzzles from that list, filtering out pangrams that didn't meet the other rules for the game.


This is the assumption I started with when I started building my clone.


The 2140 or so puzzles I've grabbed over the years all have at least one pangram. Number of pangrams per puzzle (pangrams: puzzles):

> 1: 1642, 2: 364, 3: 90, 4: 24, 5: 12, 6: 5, 7: 2, 8: 1

That 8-pangram day was December 16, 2021.


https://nytbee.com/ is an excellent source for historical data, if you'd like to try to track it down.


Am I missing something or does this not take the center-letter rule into account?


Part 2 does


Just came here to say: bill mill is an awesome human and I'm glad to see his work appreciated here!



