
Exploration of NYT Crossword answers from 1994-2017 - nastynate
https://github.com/jtanwk/nytcrossword
======
rabidrat
It sounds like they wanted to do the scraping themselves, but for anyone else
who wants to play with this data, it's available for download at
[http://xd.saul.pw/data](http://xd.saul.pw/data).

~~~
slig
Also here:
[https://github.com/doshea/nyt_crosswords](https://github.com/doshea/nyt_crosswords)
and a couple of other repos on GH.

~~~
rabidrat
Thanks for posting this, but not all data is created equal. These crosswords
only go back to 1976, are in .json format, and require nontrivial processing
to get into a useful format for data analysis. By contrast, I very
meticulously designed the formats available at xd.saul.pw to be easy to use.
You can even extract some interesting results with just e.g. `grep` and `wc`.
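As a rough illustration of that kind of quick count, here is a Python sketch of a `grep ... | wc -l` style tally over the puzzles. It rests on assumptions not checked against the download: that the puzzle files end in `.xd`, and that each clue line ends in ` ~ ANSWER` (e.g. `A1. Clue text ~ ANSWER`). The function name `count_answer` is hypothetical.

```python
from pathlib import Path


def count_answer(root: Path, answer: str) -> int:
    """Count clue lines whose answer equals `answer` (case-insensitive).

    Assumed (hypothetical) layout: *.xd text files anywhere under `root`,
    with clue lines formatted like 'A1. Clue text ~ ANSWER'.
    """
    target = answer.upper()
    total = 0
    for path in root.rglob("*.xd"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if " ~ " not in line:
                continue  # not a clue line under the assumed format
            # The answer is whatever follows the last ' ~ ' separator.
            _, _, found = line.rpartition(" ~ ")
            if found.strip().upper() == target:
                total += 1
    return total
```

Usage might look like `count_answer(Path("xd-puzzles"), "NSFW")`, where `xd-puzzles` is a hypothetical local directory holding the unpacked files. Matching only the text after the last ` ~ ` avoids counting clues whose text merely contains the word, which a plain `grep` would also pick up.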

~~~
slig
Thank you, I did not realize at first that the link you posted is so complete!
I PM'd you on reddit a little while ago asking for the dataset, but didn't
have the time to play with it yet. Thanks again for your time and
consideration.

------
pwaivers
This was actually really fascinating to me. I loved to see all the words that
are _important_:

- nsfw - This one is hilarious. Of all the internet terms (e.g. AFAIK, AMA,
LOL), this is the most used.

- laiditontheline - Why is this important in 2016? Does "Laid it on the line"
have a particular meaning?

- idriselba - This one is great too. Stringer Bell!

------
jordanpg
It's interesting that word length emerges as such a clear proxy for
difficulty. I often find it easier to think of a longer word or phrase
(assuming no wordplay) than something short and obscure -- the parameter space
is much smaller.

Also interesting to see the step increase between Thursday and Friday. This
finding is consistent with my anecdotal experiences, but I doubt it's
deliberate on the part of the editors. I think there is probably a technical
explanation for this. For example, maybe there are fewer words available of
medium to long length between the set of shorter words and multi-word answers,
so no smooth transition between Thursday and Friday is possible.
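One way to check both the length-as-difficulty trend and the Thursday-to-Friday step on a dataset like the xd.saul.pw one is to average answer length per weekday. Below is a minimal Python sketch. It assumes the answers have already been parsed into hypothetical `(weekday, answer)` pairs. The sample pairs are illustrative only and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_length_by_weekday(entries):
    """Return the average answer length for each weekday.

    `entries` is an iterable of (weekday, answer) pairs, e.g. ("Fri", "IDRISELBA").
    Only letters are counted, so any spaces or punctuation are ignored.
    """
    lengths = defaultdict(list)
    for weekday, answer in entries:
        lengths[weekday].append(sum(ch.isalpha() for ch in answer))
    return {day: mean(vals) for day, vals in lengths.items()}


# Illustrative, made-up pairs (weekday labels are not from real puzzles).
sample = [("Mon", "ERA"), ("Mon", "OLEO"), ("Fri", "LAIDITONTHELINE"), ("Fri", "NSFW")]
print(mean_length_by_weekday(sample))  # {'Mon': 3.5, 'Fri': 9.5}
```

On real data, a smooth rise across the week would support length as a proxy, and a jump between the Thu and Fri averages would show the step directly.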

~~~
finnh
Sunday's word length slots in nicely between Thursday & Friday, which matches
what I've read the NYT editors desire (for Sunday to be "roughly as difficult
as Thu/Fri") ... and gainsays your hypothesis =)

Rather than the set of possible answers being U-shaped, maybe it's a size
thing: it is hard to fit in N medium-sized words unless you have the extra
space of the Sunday puzzle.

The top & bottom of Fri/Sat puzzles are often split into just two words (one
long, one short); this might inform our developing hypothesis.

------
ChristianGeek
That third chart is a thing of beauty!

~~~
hackeraccount
what was the clue for CCCCCCC?

14 across - sound made by someone choking

