
Who Wrote the Anti-Trump New York Times Op-Ed? Using Tidytext - ehudla
http://varianceexplained.org/r/op-ed-text-analysis/
======
minimaxir
The trick behind NLP that's surprisingly under-discussed is that you _must_
use apples-to-apples comparisons between datasets. Using simple tweets as a
proxy for style against a professionally-edited NYT op-ed is silly, but this
analysis (along with the previous analyses) is transparent about that fact.

~~~
everdev
The OpEd reads more like a speech. I'd assume there are some publicly
available speeches (or videos with subtitles) that can be analysed instead.

Some articles mention that anonymous sources specifically take note of the
speech patterns of other people so they can use those to throw off the trail.
I'd be very surprised if the author didn't at least try to do that, or agree
to have the NYT rewrite each sentence while preserving the meaning.

It's an interesting exercise and might be able to rule out some
authors, but it seems like someone harboring such opinions and wishing to
express them to some extent (and has with others -- in the resistance) will be
discovered in short order.

Outing this person might be an even bigger story than the OpEd itself. But I
put my money on old-fashioned journalism rather than text analysis.

~~~
alex_anglin
Presumably the candidates for consideration have speech writers and staff who
would either write a given speech outright or influence it heavily. So, that
would likely rule out any speeches related to their position.

~~~
kevinmchugh
The speechwriters and staff are in some cases valid guesses in their own
right, but don't have a unique corpus.

------
chx
Yeah so a linguist already looked at this

[https://www.thedailybeast.com/this-linguist-helped-catch-
the...](https://www.thedailybeast.com/this-linguist-helped-catch-the-
unabomber-heres-his-take-on-the-new-york-times-resistance-op-ed)

> “What’s been done so far is not scientific and simply silly,” he said.

People. Don't get too drunk on your own kool-aid. Do I need to write an op ed
that there are adults in the room :P ?

~~~
girvo
This is a very quick dismissal, when the article _itself_ mentions this! Its
entire purpose is to show off their neat library, not to give a substantive
prediction on who the author is.

------
whatshisface
It's worth noting that Sec. Mike Pompeo used to be the director of the CIA and
would probably know someone who could tell him about NLP and how to avoid it.
The difference between identifying Shakespeare and identifying present-day
politicians is that the politicians alive today have a chance of knowing about
technology used today!

[https://en.wikipedia.org/wiki/Mike_Pompeo](https://en.wikipedia.org/wiki/Mike_Pompeo)

~~~
threeseed
Not sure if you've ever worked at a large company but that isn't how it works.
CEOs don't know most of the people at the lower levels other than at company
functions e.g. roadshow. They spend 99% of the time with their direct reports
and the board.

And even if Pompeo did know a Data Scientist who could tell him about NLP why
would he ask them how to avoid being detected when writing a scathing op-ed
piece? You don't think that person would potentially leak?

~~~
williamscales
I'm not sure, perhaps if he had been presented with briefings containing
evidence derived from these techniques he might be aware of their existence.

At any rate, I would expect him to be familiar with the idea of analyzing a
text to find a leaker. It just seems like he in particular would be more
careful.

But who knows? Speculating about this stuff is fun. You have some great
points, remember how incompetent Manafort was?

------
mc32
Is it not also fairly likely that someone who wanted to remain unknown would
mask their own lexicon so as to make it harder to detect who the real
author(s) were?

If someone were to do something like this, would it not be in their interest
to assume someone else's lexicon or, at least, coopt the lexicon of others in
general to throw off the lexicographic 'scent'?

In other words, we can't simply presume that this is the author(s)'
natural language.

~~~
seth_tr
At the end of the article the author discusses this, quoting this tweet:
[https://twitter.com/dmimno/status/1037490608015925248](https://twitter.com/dmimno/status/1037490608015925248)

"Now might be a good time to remind everyone that “distinctive phrases” and
rare words (high TF-IDF) are not as good for stylometry as subtle differences
like “and” vs “the” ratios. If you can easily notice it, someone can easily
spoof it."
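
A minimal sketch of the "and" vs "the" ratio idea from that tweet, not the
article's method. The word list and the function names here are made up for
illustration; real stylometry would use a much longer function-word list and
far more text per author.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch; real studies typically use a few hundred such words).
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "but")


def function_word_profile(text):
    """Return each function word's share of all tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


def profile_distance(a, b):
    """Sum of absolute differences between two profiles (Manhattan distance)."""
    return sum(abs(a[word] - b[word]) for word in FUNCTION_WORDS)
```

You would build one profile per candidate author and one for the op-ed, then
rank candidates by `profile_distance`. A small distance only means the
function-word habits look similar, not that the author has been found.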

~~~
emmelaich
Then again stylometry would be the thing you'd expect to change most between a
tweet and an article published with style checkers and editors.

------
vipulved
One would assume that the author, who decided to publish a time-insensitive
piece anonymously, was smart enough to employ language to intentionally
misdirect analytical methods.

~~~
coldcode
I think the editors likely worked the content over as well to mix multiple
styles into it, which would make analysis like the OP difficult or impossible.
If you know people will analyze content using common tools, you can too and
keep mixing until it's too fuzzy to be id'd.

~~~
coldcode
They say they didn't, but that doesn't mean the author didn't do it
themselves.

------
sn41
I have a suggestion, which others may have thought of: can you randomize the
style using programs?

1\. You write an essay.

2\. [Word-level randomization] An algorithm looks at it and, as a first cut,
replaces uncommon words with common equivalents.

3\. [Sentence-level randomization] Changes each sentence to a randomly
selected grammatically correct equivalent.

4\. [Paragraph-level randomization] Adds spurious repetitions, replaces some
paragraphs with automatically summarized versions.

i.e. destroy the rhetorical effect, but get the message across so as to defeat
NLP-based identification. If the content of the message is important, and the
final version is okay to the author, this might be safer for anonymous
disclosures.
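
A rough sketch of step 2 only (the word-level pass), assuming you already have
some table of uncommon-to-common substitutions. The `COMMON_EQUIVALENTS` table
and `simplify_words` name are hypothetical; a real tool would need a thesaurus
plus word-frequency data, and steps 3 and 4 are much harder than this.

```python
import re

# Hypothetical lookup table for illustration only.
COMMON_EQUIVALENTS = {
    "utilize": "use",
    "commence": "start",
    "endeavor": "try",
}


def simplify_words(text, table=COMMON_EQUIVALENTS):
    """Replace each word found in `table` with its common equivalent."""

    def swap(match):
        word = match.group(0)
        replacement = table.get(word.lower())
        if replacement is None:
            return word  # not in the table, leave it alone
        # Keep a leading capital so sentence starts still look right.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)
```

Note this only touches vocabulary. Per the function-word point made elsewhere
in the thread, it would probably leave the subtler signals intact.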

~~~
tmalsburg2
Perhaps it would be enough to just Google-translate the text to some other
language and back?

~~~
sn41
Yes, translating different paragraphs into different languages and back might
work.

------
mcguire
I wonder if there are any ethical issues in outing someone using these
techniques.

~~~
minimaxir
Analyzing writing patterns to determine the author has been a tool for
decades and predates statistical techniques like TF-IDF.

~~~
TheCoelacanth
How is that relevant? Are old things automatically ethical?

~~~
BurningFrog
Well, they're common knowledge.

If I refuse to do something millions of others can easily do, it won't change
anything.

------
williamstein
This situation reminds me a little of the Unabomber, who was finally caught
entirely because somebody (his brother) simply recognized his writing style.

~~~
alex_anglin
Ted Kaczynski's writings were probably influenced a great deal by his mental
illness, which would make him an outlier. In contrast, those in bureaucracies
will likely tend towards milquetoast language over time, making recognition
more difficult.

------
village-idiot
I think if I planned to write a piece like this, I'd give an outline to a
close confidant, such as my spouse, and have them write it. Assuming said
confidant doesn't have a ton of public writing samples, this would frustrate
any textual analysis.

~~~
the_watcher
Or write an outline, have someone at the NYT write it, then send it back to
you and edit it for clarity and accuracy.

~~~
pawelmurias
Someone at the NYT could just write it without the outline.

------
wodenokoto
I did an end-of-semester project on stylometry for my master's, and nowhere in
our literature review did we see anyone comparing TF-IDF scores for
authorship attribution.

Other than some guy on twitter saying TF-IDF, on what grounds is this
classifier chosen?

Others mention that "This is about showing off the library". In that case, the
authors should choose to showcase their tools on a problem they are meant to
solve.

"Check out my hammer, here's how to use it: First, you get some screws..."

------
ada1981
I really wish some news outlet was circulating the headline “text analysis
shows Trump to be most likely writer of anonymous memo!”

------
aphextron
The op-ed was a plant, personally approved by Trump. This has literally been
his standard MO for decades: plant fake stories and weaponize the press for
his own use. And it works. He effectively ended the entire Manafort/Cohen news
cycle overnight.

Think about the plausibility of this story for even a moment. There's a list
of less than 10 people that could have possibly written the op-ed, and what it
amounts to legally is a _written admission of high treason_ , a felony
punishable by death. Do you seriously think _any_ reasonable person would ever
do that and assume they would get away with it?

The entire administration has been one massive psy-ops campaign against the
American public. Don't listen to a single word they say. Just watch their
actions, and react with your own judgement accordingly.

~~~
learc83
>and what it amounts to legally is a written admission of high treason, a
felony punishable by death.

Here is the legal definition of treason in the US (at the federal level).

"Treason against the United States, shall consist only in levying War against
them, or in adhering to their Enemies, giving them Aid and Comfort. No Person
shall be convicted of Treason unless on the Testimony of two Witnesses to the
same overt Act, or on Confession in open Court."

Nothing in that letter comes close to fitting that definition.

> He effectively ended the entire Manafort/Cohen news cycle overnight.

The Manafort Cohen news cycle was already ended by the Woodward book preview.
All this letter did was make the accusations in the book look even more
plausible.

~~~
cabaalis
I haven't heard anything at all from my peers about the Woodward book. I have
heard about the essay. So if it was a plant, it was geared towards the book
and not recent trials.

------
dboreham
Unless "gutless anonymous" is a fool, they got someone else (e.g. NYT
staffers) to write the actual text. Better to look at individuals who had
recent physical contact with NYT senior level employees since they would
certainly avoid telephone calls and email. Fairly likely the manifesto was
written on a Remington or Underwood.

~~~
willart4food
Step 1: write the Op-Ed

Step 2: using google translate, translate the article from English to a
different language, let's say German.

Step 3: still using google translate, translate the German version into a
different language, let's say French

Step 4: translate from French back to English

Step 5: have someone else re-edit it back into English

~~~
spacehome
If you're that paranoid, why would you trust Google?

