
Looking for the author behind NYT op-ed using data science - ehudla
https://github.com/mkearney/resist_oped/blob/master/resist-oped-text-similarity.md
======
was_boring
Awesome! I was actually thinking of doing something similar, but with some key
differences:

\- Try to find transcripts and speeches each "suspect" has given.

\- Probably use a classification algo to try to determine who the author
could be (most likely KNN).

\- I hadn't settled on the features yet, but was thinking of either a simple
one-hot encoding, entity embeddings, or tf-idf.

I then hit a moral dilemma -- should I do this if it could become ammo for
damaging someone's career (when that is not my intention)?
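
For anyone curious what the tf-idf + KNN idea in the list above might look like, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not the method from the linked write-up: the candidate labels and sample sentences are hypothetical placeholders (not real transcripts), and the function names are made up for this example. The smoothed idf, ln((1 + n) / (1 + df)) + 1, mirrors scikit-learn's default; a real attempt would more likely just use scikit-learn's `TfidfVectorizer` and `KNeighborsClassifier`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    # Smoothed idf, so terms present in every document still get some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vecs.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vecs, idf


def vectorize(text, idf):
    """Vectorize a new text with the training idf; unseen terms are dropped."""
    tf = Counter(tokenize(text))
    total = sum(tf.values()) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_author(query, samples, k=3):
    """samples: list of (author, text) pairs. Returns the majority author
    among the k training texts most similar to the query."""
    vecs, idf = tfidf_vectors([text for _, text in samples])
    q = vectorize(query, idf)
    scored = sorted(
        ((cosine(q, v), author) for v, (author, _) in zip(vecs, samples)),
        reverse=True,
    )
    top = [author for _, author in scored[:k]]
    votes = Counter(top)
    best = max(votes.values())
    # On a vote tie, prefer the tied author with the highest-ranked match.
    return next(a for a in top if votes[a] == best)


# Hypothetical placeholder data, NOT real speeches by anyone.
SAMPLES = [
    ("candidate_a", "We must restore steady leadership and guard our institutions."),
    ("candidate_a", "Steady leadership protects the institutions we serve."),
    ("candidate_b", "Tariffs and trade deals will bring the jobs back home."),
    ("candidate_b", "Better trade deals mean more jobs at home."),
]

print(knn_author("Our institutions need steady leadership.", SAMPLES, k=3))
# → candidate_a
```

KNN has no training step, which makes it easy to try, but with a few short snippets per "suspect" the vote mostly reflects shared topic words rather than writing style, so a toy like this says nothing about who actually wrote the op-ed.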

------
bcherny
Aren't there ethical considerations in outing the author? Clearly he/she
intended to remain anonymous, for obvious reasons.

~~~
xref
I wouldn't think so; it's not like doxxing someone. The author did something
very public and very political, and America, at least, doesn't have any rules
against identifying that person.

Separately, I don't think any good will come from identifying them.

------
rossdavidh
Interesting. I put more weight on the author's caveats about this method than
on the method itself, which is fine, since I think it was presented only as an
example of what could be done.

Also, if the person most likely (by this analysis) is actually the one who
wrote the op-ed, it was either:

\- the POTUS, which seems unlikely (although who knows)

\- the secretary for the POTUS, who types his tweets as well

\- the one person in the President's circle who he cannot fire

------
hk__2
I wonder if there could be a way to prevent this by e.g. paying a bunch of
random people to rewrite parts of the text in their “own” style (with a review
to ensure the meaning isn’t lost).

------
RickJWagner
I think if the source cares about his/her job, they'd intentionally throw in
some misdirection (e.g. 'lodestar').

------
rootusrootus
This is definitely interesting. However, I have assumed from the beginning
that, protests to the contrary notwithstanding, this op-ed was thoroughly
edited by _someone_ to prevent such an analytical outing.

------
chx
[https://news.ycombinator.com/item?id=17943416](https://news.ycombinator.com/item?id=17943416)
...

~~~
csmark
So the person interviewed is a linguist whose colleagues "helped crack the
Unabomber case by analyzing Ted Kaczynski's 35,000 word manifesto."

So the Washington Post published the Unabomber's 35,000-word manifesto in its
daily paper in 1995 as a special section. The Unabomber's brother recognized
the writing and turned him in. Investigators didn't crack the case.
His brother received the $1 million reward from the Department of
Justice for turning him in.

This isn't fake news, but it is incorrect. Linguistic experts were involved,
but their role doesn't go much deeper than that.

