Ask HN: Algorithms for text fingerprinting?
92 points by vixsomnis on June 15, 2015 | 41 comments
I remember reading an article a year or so ago about (the NSA) identifying users based on how they write: vocabulary, spelling mistakes, grammar, dialect, and so on.
This interests me because it is extremely difficult to change the vocabulary I use when writing and speaking. Being able to estimate how similar two pieces of text are would be useful.
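One publicly documented starting point from stylometry (the research field behind authorship attribution) is to compare how often each short character sequence appears in two texts. Character n-grams pick up some of the signals mentioned above, such as spelling habits and punctuation, without needing a grammar parser. The following is a minimal sketch assuming character trigrams and cosine similarity; the function names are hypothetical, it is not the method from the article, and scores on samples this short say little about who wrote them.

```python
import math
from collections import Counter


def char_ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two n-gram count vectors; 0.0 if either is empty."""
    # Counter returns 0 for missing keys, so n-grams absent from b add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Hypothetical helper: 1.0 means identical n-gram profiles, 0.0 means none shared."""
    return cosine_similarity(char_ngram_profile(text_a, n), char_ngram_profile(text_b, n))


if __name__ == "__main__":
    print(style_similarity("I think that colour is nice", "I think that color is nice"))
```

Real authorship-attribution work typically combines many such features (function-word frequencies are another classic one) with a trained classifier and much longer writing samples.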
The closest thing I can think of is the proprietary algorithms used to check for plagiarism, for instance at schools and universities.
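Since those algorithms are proprietary, here is a well-known public technique for spotting copied passages instead: shingling. Each document is split into overlapping runs of k words, each run is hashed into a fingerprint, and the overlap of the two fingerprint sets is measured with the Jaccard index. This is a minimal sketch assuming k = 3, lowercase word tokens, and CRC-32 as the hash; the names are hypothetical and it is not what any particular plagiarism service uses. It measures shared wording, so on its own it detects copying rather than an author's style.

```python
import re
import zlib


def shingle_fingerprints(text: str, k: int = 3) -> set[int]:
    """Hash every run of k consecutive words into a 32-bit CRC fingerprint."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Texts with fewer than k words produce an empty set.
    return {
        zlib.crc32(" ".join(words[i:i + k]).encode("utf-8"))
        for i in range(len(words) - k + 1)
    }


def jaccard(a: set[int], b: set[int]) -> float:
    """Shared fingerprints divided by all distinct fingerprints (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = shingle_fingerprints("the quick brown fox jumps over the lazy dog")
    b = shingle_fingerprints("the quick brown fox leaps over the lazy dog")
    # A 32-bit hash can collide, but that is unlikely for inputs this small.
    print(jaccard(a, b))
```

The two sample sentences share 4 of their 10 distinct three-word shingles. Schemes such as winnowing keep only a subset of each document's hashes so that large collections can be indexed and compared.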
Are there any publicly available algorithms for this? Where can I go to learn more? (Academic journals?) Am I just searching DuckDuckGo for the wrong terms?