The minimum puzzle length for Spelling Bee is 20 words iirc. The dictionary is also a highly curated list of “common” words. What constitutes a valid word is up to Sam, the NYT editor. It’s designed to make the puzzles doable by the average solver. You’ll notice that a lot of the words in the OP are very esoteric.
But it's still imperfect. However a lot of the words I expected to be invalid have actually been in puzzles before, so it's not easy to guess which are going to be good and which aren't.
I think your word list is still considerably too large. Zero chance in my mind that jouk or qajaq, for example, would make it to the NYT wordlist. (I don't think they'd even be accepted in the crossword, which has looser standards, unless there was a very specific theme that called for them.) Apart from being obscure, their only use seems to be as non-standard spellings of juke and kayak respectively. The Spelling Bee doesn't even accept UK spellings.
At least 5 of the proposed pangrams wouldn't make the cut, either.
Perhaps you could scrape https://nytbee.com/, mentioned in the thread, for the historical answers, or contact the owner. Also @banana_giraffe, a commenter on this thread, seems to have scraped 6 years of data.
Then you could take a very permissive wordlist and filter it using the historical data. For all words of six distinct letters or fewer, you could determine whether they were allowed, not allowed, or indeterminate (no puzzle ever appeared that would have allowed them). My gut feeling is that you'd be left with very few indeterminate words, though jouk and qajaq might well be among them - review those manually.
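The filtering idea above could be sketched roughly like this. It is only a sketch under stated assumptions: the `Puzzle` tuple layout, the `fits` and `classify` names, and the toy history are made up for illustration and are not taken from nytbee.com or any real scrape. It also assumes a word counts as allowed if any past puzzle accepted it.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical record of one past puzzle:
# (center letter, all seven letters, accepted answers).
Puzzle = Tuple[str, FrozenSet[str], FrozenSet[str]]


def fits(word: str, center: str, letters: FrozenSet[str]) -> bool:
    """True if the puzzle's letters could have spelled the word."""
    return len(word) >= 4 and center in word and set(word) <= letters


def classify(words: Iterable[str], history: List[Puzzle]) -> Dict[str, str]:
    """Label each word 'allowed', 'not allowed', or 'indeterminate'."""
    result = {}
    for word in words:
        # One verdict per past puzzle in which the word was spellable.
        verdicts = [
            word in answers
            for center, letters, answers in history
            if fits(word, center, letters)
        ]
        if not verdicts:
            result[word] = "indeterminate"  # never had a chance to appear
        elif any(verdicts):
            result[word] = "allowed"
        else:
            result[word] = "not allowed"
    return result


# Toy history, invented for this example only.
toy_history = [
    ("a", frozenset("tackled"), frozenset({"cake", "lake", "tackle", "tackled"})),
]
labels = classify(["cake", "talc", "jouk"], toy_history)
```

Words that end up as "indeterminate" are the ones to review by hand.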
They've loosened up the rules since I left many moons ago, likely to expand the number of puzzles without repeats. IIRC we had about 5 years' worth at the beginning.
Very cool, thanks! I play every morning. There are times when Sam's curation is very frustrating. It would be nice to submit other valid words and have the game verify them as a way to score "bonus" points. Oftentimes I find that there are some baffling omissions and, after the fact, some truly bizarre inclusions. It would be nice to be able to score points based on your own vocabulary while still having the game's score based on whatever common denominator Sam comes up with.
Agreed, I get frustrated a few days a week with curation inclusions and exclusions. DOORYARD, MICROMINI, NONCOM, ROMCOM are in the list and IMO shouldn't be. UNTENDED, MONOTONIC, BOLE are not in the list but IMO deserve to be.
Agreed. When I type a good word that isn’t accepted, I usually just stop playing that day’s puzzle. My guess is that Sam is not very scientifically literate. Simple weather words like cyclonic, adiabatic, or advection: no dice. And then you get some pretty obscure literary words.
Makes me want to make a free clone that includes science words, and isn’t afraid of the letter S.
Those aren't really redundant, though. I can say, "I guarantee that this product will work," but nobody would say, "I warranty that this product will work." (You could argue that guarantee is redundant with warrant, but the police don't go to the judge to request a search guarantee. Both words are needed.)
I can't think of a single use case for whinge that wouldn't be equally satisfied by whine. Can you?
I can't think of a single use case for nought that wouldn't be equally satisfied by "zero". Or courgette that wouldn't be satisfied by zucchini, or both of those by "baby marrow". Or why use "oregano" when you could use "wild marjoram"? A perfectly good English name that has been largely displaced by an Italian word.
English (as all modern languages) has tons and tons of exact synonyms and other types of redundant words. It's just a normal part of usage.
In this specific case I personally prefer whinge for emphasizing the complaint and whine for emphasizing the noise, so I don't really think they are redundant - I think they are slightly different.
no homophones. whine shares a phonemic address with wine, while whinge staked out a plot of its very own, even if it's just a slapdash pop-up tent next door to the local drunkard, binge. hinge shares a smartly bricked-up border with both.
Great work on the game btw. My gf introduced me to the game and we love it. Though, we play a variation of it against one another in which we open the game on a single screen and whoever finds a pangram first wins.
To share remotely, one player gets halfway to genius, pangram forbidden, and the second player gets over the line. After that you use SB Buddy (no peeking at hints) to get to Queen Bee.
The most annoying missing wordlist words are naphtha and caracal. An objective measure of word-use frequency should determine the words included. Probably super-obscure articles of clerical costume should not be.
That may be the target, but there have been a handful of Spelling Bees with fewer than 20 words in the answer list. For instance, March 27, 2023 had 16 answers:
Very possible. They've loosened the rules a bit since I originally generated the puzzles, e.g. they now allow more than one vowel, i+n+g and e+d are allowed in puzzles, possibly more.
Thanks for this. Funny that it's not on my Games page (I'm an NYT Games subscriber) unlike, I think, Connections, which was shown on that page with a "beta" flag.
The NYT’s solutions are always two words, which I rarely get on my own. But once, a couple of years ago, I discovered a one-word solution to one day’s puzzle: LEXICOGRAPHY. Very elegant, I thought to myself smugly.
I happened to remember that solution a couple of months ago, and I decided to see if I could find others. I am not a programmer, but by asking ChatGPT 4 for help I was able to create a Python program and run it in Google Colab using a large list of English words that I had compiled from various online word lists.
Here is the beginning of the resulting list of one-word solutions to LetterBoxed:
acetylcholinesterase [a c e] [h i l] [n o r] [s t y]
acetylcholinesterases [a c e] [h i l] [n o r] [s t y]
achondroplastic [a c d] [h i l] [n o p] [r s t]
acknowledgement [a c d] [e g k] [l m n] [o t w]
acknowledgment [a c d] [e g k] [l m n] [o t w]
The code that ChatGPT 4 wrote for me and the full list of solutions are here:
Whoops. Looking through that list of words again, I see that some of the puzzle configurations could not yield those solutions, because adjacent letters in the solution are in the same triplet. The configuration for acknowledgment, for example, has a and c in the same triplet, which is not possible in LetterBoxed. My bad. I’ll take a look at the code again later.
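For anyone who wants to check a configuration by hand, the adjacency rule described here could be tested with something like the sketch below. This is not the code from either LLM session; the function names are made up, and the lexicography layout in the usage note is a hypothetical configuration built for the example, not the one from the actual NYT puzzle.

```python
from typing import Sequence


def is_valid_path(word: str, sides: Sequence[str]) -> bool:
    """True if `word` can be traced on a box whose sides are `sides`.

    Every letter must sit on some side, and two consecutive letters
    may never come from the same side.
    """
    side_of = {letter: i for i, side in enumerate(sides) for letter in side}
    if any(ch not in side_of for ch in word):
        return False
    return all(side_of[a] != side_of[b] for a, b in zip(word, word[1:]))


def is_one_word_solution(word: str, sides: Sequence[str]) -> bool:
    """True if `word` is a valid path that also uses every letter on the box."""
    return is_valid_path(word, sides) and set(word) == set("".join(sides))
```

With the configuration listed above, `is_one_word_solution("acknowledgment", ["acd", "egk", "lmn", "otw"])` is False because a and c share a side, while a made-up layout such as `["lca", "eop", "xgh", "iry"]` does accept "lexicography".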
With Claude 3 Opus’s help, I think I have fixed the code. The corrected code, which took more than two hours to run in Google Colab, and the resulting list of one-word solutions to LetterBoxed are here:
It wasn’t quite one-shot. I had multiple interactions with ChatGPT on the first run-through before I could get the code to work, and a couple with Claude after I noticed that the partitions weren’t correct.
But all I did was report the errors to the LLMs and paste their revised code back into Colab. The core logic of the search algorithm was created entirely by the LLMs based on my natural-language description of the LetterBoxed rules and the solutions I was looking for. I could not have written that code myself.
The double-backlink at the bottom is amusing - I recognize this bug from painful experience. OP counted each of the 2 links in the other article as a backlink, and didn't deduplicate. But... if you deduplicate, then you are unable to do true bidirectional backlinks to the original caller. You can't figure out 'which' link in the other article is meant.
You need HTML IDs set on each, but they have to be different ID-anchors (obviously, and also because it'd be bad/invalid HTML to have duplicate IDs). So you actually need to track not just the other article but the ID inside the other article, and a way to generate those link IDs to begin with (since you definitely don't want to do it by hand). Gets tricky.
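One way to keep track of "which" link is meant, as a rough sketch: number each outgoing link per target while walking the source page, and store the generated anchor ID alongside the source in the backlink. Everything here is an assumption for illustration (the `build_backlinks` name, the `ref-<target>-<n>` ID scheme, and the idea that page names are already safe to use in an HTML id); it is not how the OP's generator works.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_backlinks(
    links_by_page: Dict[str, List[str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each target page to a list of (source page, anchor id) pairs.

    The anchor id is unique within its source page, so every backlink
    can point at one specific link instead of at the page as a whole.
    """
    backlinks: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, targets in links_by_page.items():
        seen: Dict[str, int] = defaultdict(int)
        for target in targets:
            seen[target] += 1  # 1 for the first link to target, 2 for the second, ...
            anchor = f"ref-{target}-{seen[target]}"
            backlinks[target].append((source, anchor))
    return dict(backlinks)


result = build_backlinks({"a": ["b", "b", "c"]})
```

The same anchor string would then have to be written into the `id` attribute of the matching link when the source page is rendered.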
def calculate_backlinks(
    pages: Dict[str, Page], attachments: Dict[str, Attachment]
) -> None:
    for page in pages.values():
        for link in page.links:
            linked_page = find(pages, attachments, link)
            if not linked_page:
                info(f"unable to find link", link, page.title)
                continue
            linked_page.backlinks.append(page)
No deduplication performed. Since he's parsing the backlinks directly out of the markdown, you don't have to worry about a recursive loop where the backlinks section on one page appears as links in another. A simple solution would be to change the datatype of backlinks from list to set.
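A minimal sketch of the list-to-set change, assuming a stripped-down `Page` class. The real `Page`, `find`, and `info` from the OP's code are not shown in this thread, so the dataclass below and the plain dict lookup are stand-ins, not the original implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


# eq=False keeps the default identity-based __hash__, so Page objects
# can be stored in a set even though they are mutable.
@dataclass(eq=False)
class Page:
    title: str
    links: List[str] = field(default_factory=list)
    backlinks: Set["Page"] = field(default_factory=set)


def calculate_backlinks(pages: Dict[str, Page]) -> None:
    for page in pages.values():
        for link in page.links:
            linked_page = pages.get(link)  # stand-in for the original find()
            if linked_page is None:
                continue
            # A set ignores repeated additions of the same page.
            linked_page.backlinks.add(page)


pages = {
    "a": Page("a", links=["b", "b"]),  # links to b twice
    "b": Page("b"),
}
calculate_backlinks(pages)
```

Sets are unordered, so the backlinks would need sorting (by title, say) before rendering to keep the output stable between builds.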
For those wondering about the word "quooke", which is listed at the bottom as one of the 6 official solutions, it is the "(obsolete, nonce word) simple past and past participle of quake": https://en.wiktionary.org/wiki/quooke
Spenser coined it to use in the Faerie Queene in 1590:
His horses backe, yet to and fro long shooke,
And tottred like two towres, which through a tempest quooke
I confess that it's hard for me to get excited about solving puzzles to find obsolete nonce words.
grep -v '[^victmze]' /usr/share/dict/words |
grep .... | grep e
A program to generate all possible NYTM SB puzzles is at https://github.com/ncm/nytm-spelling-bee . The alterations to match the online version are trivial. It runs in well under 100 ms on a modern CPU.
There are bigger dictionaries packaged, e.g. wamerican-huge.
Very neat. Would it be more efficient to start with every known pangram (~35k) and calculate the score from there instead of scoring every possible set of 7 letters (~8B) and then filtering for pangrams?
Wrote this, which runs basically instantly: https://pastebin.com/ax4eTKMr . I hope I didn't make some embarrassing mistake, but it seems to match the results of the code in part 2.
That's exactly how it works. The script scanned the dictionary for pangrams and then built the puzzles from that list, filtering out pangrams that didn't meet the other rules for the game.
Source: helped build SB at NYT.
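For readers following along, the pangram-first approach might look something like the sketch below. It is not the NYT script or the pastebin code. The `puzzles` and `score` names are invented, the word list is a toy, and the scoring (1 point for a four-letter word, 1 point per letter for longer words, 7 bonus points for a pangram) is the commonly described rule, assumed here rather than taken from this thread.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple


def score(word: str) -> int:
    """Assumed scoring: 4 letters = 1 point, longer = length, pangram = +7."""
    points = 1 if len(word) == 4 else len(word)
    return points + 7 if len(set(word)) == 7 else points


def puzzles(words: Iterable[str]) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (center, letters, answers) for every pangram-derived puzzle."""
    # Group playable words by their set of distinct letters.
    by_letters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for word in words:
        letters = frozenset(word)
        if len(word) >= 4 and len(letters) <= 7:
            by_letters[letters].append(word)

    # Only letter sets that some pangram uses can form a puzzle.
    for pangram_letters in [s for s in by_letters if len(s) == 7]:
        usable = [
            w for s, ws in by_letters.items() if s <= pangram_letters for w in ws
        ]
        for center in sorted(pangram_letters):
            answers = sorted(w for w in usable if center in w)
            yield center, "".join(sorted(pangram_letters)), answers


toy_words = ["tackled", "cake", "lake", "deck", "talc", "zzzz"]
all_puzzles = list(puzzles(toy_words))
```

Filtering for the other game rules (minimum answer count, banned letters, and so on) would happen on the yielded puzzles.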