
Visualizing language usage in New York Times news coverage - lelf
http://chronicle.nytlabs.com/?keyword=google.programming.radio.olympic&format=percent
======
wickawic
I love how strikingly editorial changes show up:

"Theater" vs. "Theatre":
[http://chronicle.nytlabs.com/?keyword=theater.theatre](http://chronicle.nytlabs.com/?keyword=theater.theatre)

As well as how quickly culturally accepted terms are replaced:

"Mrs." vs. "Ms.":
[http://chronicle.nytlabs.com/?keyword=mrs..ms](http://chronicle.nytlabs.com/?keyword=mrs..ms).

~~~
draugadrotten
Another cultural change -
[http://chronicle.nytlabs.com/?keyword=gay..homosexual](http://chronicle.nytlabs.com/?keyword=gay..homosexual)

~~~
eevilspock
homosexual, gay, same sex, lgbt:
[http://chronicle.nytlabs.com/?keyword=gay.homosexual.lgbt.sa...](http://chronicle.nytlabs.com/?keyword=gay.homosexual.lgbt.same%20sex)
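
For anyone who wants to script these comparisons: the links in this thread suggest the query format is dot-separated terms in `keyword`, `%20` for spaces, and an optional `format=count` or `format=percent`. A minimal sketch, assuming that pattern holds (the function name `chronicle_url` is made up here and the pattern is inferred from the links, not taken from any documented API):

```python
from urllib.parse import quote

BASE = "http://chronicle.nytlabs.com/"

def chronicle_url(terms, fmt=None):
    """Build a Chronicle query URL from a list of search terms.

    Assumption: terms are joined with "." and spaces are percent-encoded,
    as seen in the links posted in this thread.
    """
    # quote() never escapes ".", so a term that itself contains a period
    # (e.g. "mrs.") cannot be told apart from the separator.
    keyword = ".".join(quote(term, safe="") for term in terms)
    url = f"{BASE}?keyword={keyword}"
    if fmt is not None:
        url += f"&format={fmt}"
    return url

print(chronicle_url(["gay", "homosexual", "lgbt", "same sex"]))
print(chronicle_url(["new york"], fmt="count"))
```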

------
danso
The OCR anomalies are very interesting...for example, it seems that the word
"fuck" occasionally found its way into the NYT pages throughout history:

[http://chronicle.nytlabs.com/?keyword=fuck&format=count](http://chronicle.nytlabs.com/?keyword=fuck&format=count)

But if you look at the pre-digital-text archive period (pre-1990?), you won't
see "fuck" in the actual articles, such as the ones in the 1880s...I tried
other cusswords and saw that they would appear in the very-old stories but no
other time in NYT history...So I'm guessing that there is some fuzzy matching
going on to compensate for the text in the scanned archives.

~~~
FatalLogic
There are several hits for 'Internet' in 1853, and hundreds for 'email' in the
1890s. To avoid that, maybe they could weight the OCR recognition based on a
word's first occurrence in dictionaries (backdated a decade or two).
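
A rough sketch of that idea, simplified to a hard cutoff rather than a weighting. Everything here is hypothetical: the `FIRST_ATTESTED` table holds placeholder years rather than verified dictionary dates, and nothing is known about how the real OCR pipeline works.

```python
# Placeholder first-attestation years, NOT verified dictionary data.
# A real system would load these from a dictionary dataset.
FIRST_ATTESTED = {"internet": 1974, "email": 1979}

def plausible_hit(word, article_year, slack_years=20, first_attested=FIRST_ATTESTED):
    """Return True if an OCR match for `word` is plausible in `article_year`.

    Words with no known first-attestation year are always accepted.
    The cutoff is backdated by `slack_years` to allow for early usage.
    """
    first = first_attested.get(word.lower())
    if first is None:
        return True
    return article_year >= first - slack_years

def filter_hits(hits, **kwargs):
    """Keep only the (word, year) pairs that pass the plausibility check."""
    return [(word, year) for (word, year) in hits if plausible_hit(word, year, **kwargs)]

hits = [("Internet", 1853), ("email", 1995), ("war", 1860)]
print(filter_hits(hits))
```

A softer version could return a confidence weight instead of a boolean, so borderline matches are down-weighted rather than dropped.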

------
jasode
I was about to ask why the word "equation" has been trending upwards since 1960
even though the related word "equations" (plural) has not. It also doesn't
exactly track the word "mathematics". But after I added the word "music", it
reminded me that the Y axis shows "equation" at less than 1%. I suppose that's
within the realm of random noise as far as word-count statistics go. Maybe NYT
had an odd mix of staff writers who happened to use "equation" more often in
journalism that's unrelated to any larger cultural context.

[http://chronicle.nytlabs.com/?keyword=equation.mathematics.m...](http://chronicle.nytlabs.com/?keyword=equation.mathematics.music.equations)

(unclick "music")

~~~
discostrings
Perhaps non-mathematical usage of "equation" in phrases like "entered the
equation" and "wasn't part of the equation" has been increasing?

I certainly hear it in conversation often enough, but I've never taken note of
how frequently it's written.

~~~
jasode
Yes, that seems possible.

The Google Books Ngram Viewer doesn't have any hits for "enter the equation".
However, the words "equation" and "equations" have the opposite trend from
NYT.[1] But to emphasize again, the hits are less than .01%, so calling this a
"trend" is potentially abusing the word.

[1][https://books.google.com/ngrams/graph?content=equation%2Cequ...](https://books.google.com/ngrams/graph?content=equation%2Cequations%2C+music&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cequation%3B%2Cc0%3B.t1%3B%2Cequations%3B%2Cc0%3B.t1%3B%2Cmusic%3B%2Cc0)

------
Glide
Love it.

Just a simple one of past presidents (last names)
[http://chronicle.nytlabs.com/?keyword=clinton.bush.obama.rea...](http://chronicle.nytlabs.com/?keyword=clinton.bush.obama.reagan.carter.nixon.gerald%20ford.watergate)
(didn't want to try ford because of the automotive bailout, but that might
have given better results)

[http://chronicle.nytlabs.com/?keyword=social%20security.bail...](http://chronicle.nytlabs.com/?keyword=social%20security.bailout.crash.unemployment.jobs)

Seems like "jobs" wasn't thrown around during the Great Depression like it is today.

These match up pretty well...

[http://chronicle.nytlabs.com/?keyword=politics.supreme%20cou...](http://chronicle.nytlabs.com/?keyword=politics.supreme%20court)

~~~
eevilspock
try this one for presidents:
[http://chronicle.nytlabs.com/?keyword=president%20nixon.pres...](http://chronicle.nytlabs.com/?keyword=president%20nixon.president%20kennedy.president%20ford.president%20carter.president%20reagan.president%20bush.president%20obama.president%20clinton.president%20johnson)

Seems like "jobs" has somewhat replaced "employment" and "unemployed":
[http://chronicle.nytlabs.com/?keyword=jobs.unemployed.unempl...](http://chronicle.nytlabs.com/?keyword=jobs.unemployed.unemployment.employment)

~~~
jojohack
I'd be careful with "jobs" since case appears to be ignored, so it's likely
also picking up references to Steve Jobs (or anyone with that last name).
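
To illustrate the effect, here is a small sketch comparing case-insensitive and case-sensitive whole-word counts. The sample sentence and the `count_matches` helper are invented for illustration, and nothing here reflects how the NYT tool actually matches text.

```python
import re

def count_matches(text, term, case_sensitive=False):
    """Count whole-word occurrences of `term` in `text`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags))

# Made-up sample text.
sample = "Steve Jobs said new jobs were coming. Jobs are scarce."

print(count_matches(sample, "jobs"))                       # case ignored
print(count_matches(sample, "jobs", case_sensitive=True))  # lowercase only
print(count_matches(sample, "Jobs", case_sensitive=True))  # capitalized only
```

Note that the capitalized count still mixes the surname with the sentence-initial "Jobs are scarce", so case sensitivity alone wouldn't fully separate the two.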

------
eamsen
Some interesting symbolic stats:

[1]
[http://chronicle.nytlabs.com/?keyword=computer.smartphone.mo...](http://chronicle.nytlabs.com/?keyword=computer.smartphone.mobile)

[2]
[http://chronicle.nytlabs.com/?keyword=security.freedom](http://chronicle.nytlabs.com/?keyword=security.freedom)

[3]
[http://chronicle.nytlabs.com/?keyword=men.women](http://chronicle.nytlabs.com/?keyword=men.women)

[4]
[http://chronicle.nytlabs.com/?keyword=homophobic.homosexual....](http://chronicle.nytlabs.com/?keyword=homophobic.homosexual.queer)

[5]
[http://chronicle.nytlabs.com/?keyword=love.money](http://chronicle.nytlabs.com/?keyword=love.money)

~~~
pavanky
I think his vs her is more interesting than men vs women.

[http://chronicle.nytlabs.com/?keyword=his.her](http://chronicle.nytlabs.com/?keyword=his.her)

Surprisingly, the difference seems to be fairly constant.

~~~
eamsen
I would expect that to be more constant when individuals are addressed. My
interest was towards addressing men and women as a group, which - to some
degree - would reflect the 'relevance' of the group for the given time in
media.

------
pumainmotion
It took me a long time to realize that articles related to a topic can be seen
by clicking on the year. This instruction is hidden below the screen fold.
It'd also be nice to see actual numbers (along with the %) for each year
during hover.

It'd also be useful to see a curated list of topics to select from, instead of
just randomly picking some related topics on page load. Hopefully they harvest
some interesting suggestions they receive through @nyt. There are some fun
topic-suggestions in this thread already.

Overall, a nifty tool that I wish existed for all news sources in the world
and worked across all languages.

Thanks to the creator(s)!

------
bellerocky
Wow, nice. And look, maybe it can even predict a likely future:

[http://chronicle.nytlabs.com/?keyword=vietnam.korea.iraq.rus...](http://chronicle.nytlabs.com/?keyword=vietnam.korea.iraq.russia)

------
scott_s
This surprised me; even at the height of the phrase "war on terror", it was
still matched by "cold war", which was going down, then bounced up:
[http://chronicle.nytlabs.com/?keyword=cold%20war.war%20on%20...](http://chronicle.nytlabs.com/?keyword=cold%20war.war%20on%20terror)

Also, a nice way of visualizing just how far back this data reaches:
[http://chronicle.nytlabs.com/?keyword=reconstruction](http://chronicle.nytlabs.com/?keyword=reconstruction)

------
chatmasta
Here's an encouraging one:

[http://chronicle.nytlabs.com/?keyword=beatles.bieber](http://chronicle.nytlabs.com/?keyword=beatles.bieber)

------
ghshephard
I love how "Netscape", "Loudcloud", and "Opsware" were mere flickering events
for the NYT, but "Andreessen" got traction and has been on an upwards trend
for the last ten years. Also - a unique spelling, while probably annoying when
he was growing up, certainly makes it easier to track his presence in media.

------
davidbarker
Does anyone have an idea why the occurrence of "New York" drastically drops
(62,000 articles vs. 29,000 articles) between 1980 and 1981?

[http://chronicle.nytlabs.com/?keyword=new%20york&format=coun...](http://chronicle.nytlabs.com/?keyword=new%20york&format=count)

~~~
FatalLogic
Perhaps they stopped using datelines for stories?

It was probably a change in style or format, not a reduction in the number of
stories about New York, because the percentages of stories about other cities
drop in a similar way at the same time.

[http://chronicle.nytlabs.com/?keyword=new%20york.washington....](http://chronicle.nytlabs.com/?keyword=new%20york.washington.moscow.london&format=percent)

Bonus... at the end of that chart, I think you can see when the NYT really did
begin to expand its proportion of non-local news.

~~~
davidbarker
Aaah, that makes sense. Thanks!

------
pdevr
Note: Don't take the titles seriously.

Publishing schedule changes?
[http://chronicle.nytlabs.com/?keyword=saturday.sunday.monday...](http://chronicle.nytlabs.com/?keyword=saturday.sunday.monday.tuesday.wednesday.thursday.friday)

Strong correlation:
[http://chronicle.nytlabs.com/?keyword=moon.mars](http://chronicle.nytlabs.com/?keyword=moon.mars)

Each peak higher than the previous leader:
[http://chronicle.nytlabs.com/?keyword=microsoft.google.faceb...](http://chronicle.nytlabs.com/?keyword=microsoft.google.facebook.twitter)

Resilience:
[http://chronicle.nytlabs.com/?keyword=computer.phone](http://chronicle.nytlabs.com/?keyword=computer.phone)

And then there was one:
[http://chronicle.nytlabs.com/?keyword=fax.iphone](http://chronicle.nytlabs.com/?keyword=fax.iphone)

Exponential decrease of area:
[http://chronicle.nytlabs.com/?keyword=telegram.fax.iphone](http://chronicle.nytlabs.com/?keyword=telegram.fax.iphone)

There's still hope:
[http://chronicle.nytlabs.com/?keyword=science.pop%20music](http://chronicle.nytlabs.com/?keyword=science.pop%20music)

We meet again, after almost a century:
[http://chronicle.nytlabs.com/?keyword=america.russia](http://chronicle.nytlabs.com/?keyword=america.russia)

The baby boomers had it so good:
[http://chronicle.nytlabs.com/?keyword=marriage.divorce](http://chronicle.nytlabs.com/?keyword=marriage.divorce)

Didn't last:
[http://chronicle.nytlabs.com/?keyword=blessings](http://chronicle.nytlabs.com/?keyword=blessings)

Interesting:
[http://chronicle.nytlabs.com/?keyword=luck](http://chronicle.nytlabs.com/?keyword=luck)

Added:

Increasing aspirations or inflation?
[http://chronicle.nytlabs.com/?keyword=millionaire.billionair...](http://chronicle.nytlabs.com/?keyword=millionaire.billionaire)

Obligatory:
[http://chronicle.nytlabs.com/?keyword=hacker](http://chronicle.nytlabs.com/?keyword=hacker)

------
debt
What if more articles were published in a particular year? I mean, I assume the
data would be skewed if more words existed in a particular year. I figured
they'd measure how the language of, say, a random bunch of 5000 words has
changed over the course of the life of NYT.
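
For what it's worth, the `format=percent` links elsewhere in this thread suggest the tool can show the share of articles per year instead of raw counts, which would correct for that. A minimal sketch of such a normalization, with made-up numbers and a hypothetical helper name (this is not the tool's actual code):

```python
def percent_of_articles(matches_by_year, totals_by_year):
    """Convert raw per-year match counts into a percentage of that year's articles.

    Years with zero total articles are skipped to avoid dividing by zero.
    """
    result = {}
    for year, total in totals_by_year.items():
        if total == 0:
            continue
        result[year] = 100.0 * matches_by_year.get(year, 0) / total
    return result

# Made-up numbers: the raw count doubles, but so does the number of articles.
matches = {1980: 50, 1981: 100}
totals = {1980: 1000, 1981: 2000}
print(percent_of_articles(matches, totals))
```

With these toy numbers the raw count doubles from one year to the next while the percentage stays flat, which is the skew the normalization removes.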

------
kasperset
Search Query: ipad. This article looks interesting:
[http://www.nytimes.com/1999/11/11/technology/state-of-the-ar...](http://www.nytimes.com/1999/11/11/technology/state-of-the-art-the-web-without-microsoft.html)

------
ISL
These plots would be even more interesting with a logarithmic scale option.

In Chronicle-golf, the most common word I can find is "new" at 74% of articles
in 1978 (which beats out "who","what","when", "why", and "how"...).

~~~
andrioni
"I" gets ~79% in 1869.

~~~
ISL
Whoa - "can" scores 100% in only one year.

------
closetnerd
The obvious one, some religious denominations:
[http://chronicle.nytlabs.com/?keyword=jewish.christian.musli...](http://chronicle.nytlabs.com/?keyword=jewish.christian.muslim.buddhist.hindu)

------
mnarayan01
[http://chronicle.nytlabs.com/?keyword=nigger.chink.chinaman](http://chronicle.nytlabs.com/?keyword=nigger.chink.chinaman)

I wonder how much of the change is terms becoming slurs and how much is NYT
racism.

------
pdevr
How expensive food items are "created": truffles, caviar:
[http://chronicle.nytlabs.com/?keyword=truffles.caviar](http://chronicle.nytlabs.com/?keyword=truffles.caviar)

------
pravda
Civil War, World War II, and more!

[http://chronicle.nytlabs.com/?keyword=assault.attack](http://chronicle.nytlabs.com/?keyword=assault.attack)

Ok, what happened in 1871?

------
johncoltrane
[http://chronicle.nytlabs.com/?keyword=negro](http://chronicle.nytlabs.com/?keyword=negro)

------
officialjunk
the largest percentages i found so far are for the words: president and war.

~~~
draugadrotten
The trends for "russia" and "military" worry me.

------
BTurkE
An interesting one: try "code". There's an odd spike in 1934.

~~~
FatalLogic
It's very likely to be related to the National Recovery Administration, which
was formed in 1933, and set price codes and codes of fair practice. It must
have generated a huge amount of public debate and lots of news stories.

[http://en.wikipedia.org/wiki/National_Recovery_Administratio...](http://en.wikipedia.org/wiki/National_Recovery_Administration)

------
c0achmcguirk
c# was way more popular in the 1800's than today.

------
lkrubner
Interesting that "republican" is much more common than "democrat" and has been
for all the decades covered by the graph.

~~~
_delirium
You're comparing different things there: Republican is both the noun and
adjective form, while Democrat is only the noun form. "Clinton is a Democrat"
and "Bush is a Republican", but "Obama was the 2008 Democratic candidate" vs.
"Dole was the 1996 Republican candidate".

You could try to adjust for that by comparing Republican vs. the sum of
Democratic+Democrat, but that also pulls in unrelated uses of both terms:
"democratic reforms in $countryname" and "Irish republicans", especially since
it isn't case-sensitive. Which then probably overcounts "democratic", because
it's used in that non-US-party sense more than "republican" is.

In general this kind of thing makes it _very_ tricky to conclude things from
pure word or n-gram frequency counts, since without more semantic annotation
there are a ton of confounding issues.
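
One small wrinkle if you do try the Democratic+Democrat comparison on article counts: adding the two numbers double-counts any article that uses both words. A sketch of counting each article once instead, assuming you had a term-to-article-ID index (the `toy_index` data and function names below are invented, and nothing like this is exposed by the tool as far as this thread shows):

```python
def articles_mentioning_any(index, terms):
    """Return the set of article IDs that mention at least one of `terms`.

    `index` maps a lowercase term to the set of article IDs containing it.
    """
    matched = set()
    for term in terms:
        matched |= index.get(term.lower(), set())
    return matched

# Invented toy index: article 4 uses both "democrat" and "democratic".
toy_index = {
    "republican": {1, 2, 3},
    "democrat": {2, 4},
    "democratic": {4, 5},
}

naive_sum = len(toy_index["democrat"]) + len(toy_index["democratic"])
combined = len(articles_mentioning_any(toy_index, ["Democrat", "Democratic"]))
print(naive_sum, combined)
```

This only fixes the double counting; it does nothing about the unrelated senses ("democratic reforms", "Irish republicans") described above.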

