
Jack the Ripper letters written by the same person, forensic linguist concludes - fern12
http://www.manchester.ac.uk/discover/news/jack-the-ripper-letter-mystery/
======
sixhobbits
I did my master's thesis on authorship verification, which is exactly this
problem (deciding if two texts are written by the same author).

I experimented with clustering, SVMs, neural nets, etc. for a long time, and
got mostly disappointing results.

Even when "modern methods" give very high confidence scores, the problem is
very messy and complicated, and usually the training data is different enough
from the actual data (in supervised learning scenarios) as to bring the result
into question.

I don't have access to the paper, so can't say much more, but I've seen a lot
of very good-looking results that are in fact questionable.

Still a fascinating problem!

~~~
sundarurfriend
From the abstract[1]:

> After compiling the ‘Jack the Ripper Corpus’ consisting of the 209 letters
> linked to the case, a cluster analysis of the letters is carried out using
> the Jaccard distance of word 2-grams. The quantitative results and the
> discovery of certain shared distinctive lexicogrammatical structures support
> the hypothesis that the two most iconic texts responsible for the creation
> of the persona of Jack the Ripper were written by the same person.

So it seems to be a combination of cluster analysis and manual linguistic
reading of lexicogrammatical structures. It would be interesting if these
studies were done as blind studies, with another letter from the period (or a
convincing known fake) also used as a control. Do they find those shared
distinctive lexicogrammatical structures in the control too, because they're
looking for them now? Does the cluster analysis give significantly higher
confidence to grouping the possibly-real letters, compared to grouping one
possibly-real letter with the known fake?

Those would make this more interesting and less questionable, especially for a
first study into this.

[1] [https://academic.oup.com/dsh/advance-article-abstract/doi/10...](https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqx065/4824843)
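
For anyone unfamiliar with the measure named in the abstract, here is a rough sketch of Jaccard distance over word 2-grams, plus the kind of control comparison suggested above. This is only an illustration, not the paper's actual pipeline: the tokenizer, the function names `word_bigrams` and `jaccard_distance`, and the three sample sentences are all made up for the example.

```python
import re


def word_bigrams(text):
    """Return the set of adjacent word pairs (word 2-grams) in a text."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(words, words[1:]))


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(a & b) / len(union)


# Invented placeholder sentences, NOT text from the Ripper corpus.
letter_a = "the red cat sat on the mat"
letter_b = "the red cat sat on the bed"
control = "a dog ran in the park"

# Distance between the two candidate texts vs. candidate and control.
d_ab = jaccard_distance(word_bigrams(letter_a), word_bigrams(letter_b))
d_ac = jaccard_distance(word_bigrams(letter_a), word_bigrams(control))

print(f"a vs b:       {d_ab:.3f}")
print(f"a vs control: {d_ac:.3f}")
```

A blind version of the study would want the a-vs-b distance to be clearly smaller than a-vs-control across many controls, not just one. On real letters the 2-gram sets are sparse, so the gap is likely to be much less clean than in this toy case.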

------
fnord123
Is this the same discipline that thinks Columbus was from Northeastern Spain?

[https://en.wikipedia.org/wiki/Origin_theories_of_Christopher...](https://en.wikipedia.org/wiki/Origin_theories_of_Christopher_Columbus#Catalan_hypothesis)

Toilet paper.

~~~
taneq
Andrew Wakefield, the man who published the fraudulent "vaccines cause autism"
paper, was a medical researcher. Do we then discount all other medical
researchers?

~~~
fnord123
There are absolutely huge problems in medical research regarding statistical
strength and p-hacking. But, I get your point.

In this case, copycats of Jack the Ripper were trying to emulate his (her?)
idiolect. So one expects the entire exercise to be hopeless and the conclusion
to hold no predictive value.

~~~
msl
> In this case, copycats of Jack the Ripper were trying to emulate his (her?)
> idiolect. So one expects the entire exercise to be hopeless and the
> conclusion to hold no predictive value.

The letters in question were sent before the first letter was published. How
would a copycat know what to emulate before the first letter's publication? It
was also found that the later letters were not very good at copying the
original author's style. This is all in the article.

