
Show HN: Person Identification by Keyboard Typing - indutny
https://gradtype.darksi.de/
======
javabean_
Sometimes it helps to know the jargon around this topic. If you search for
"keystroke dynamics", several articles and GitHub repos turn up.

Experienced morse-code interpreters used to be able to recognize who was at
the other end that day by their typical intervals and mistakes.

~~~
amelius
There are also related fields, e.g. identify people based on _what_ they type
(not _how_ they type it).

[https://stackoverflow.com/questions/4771293/can-an-authors-u...](https://stackoverflow.com/questions/4771293/can-an-authors-unique-literary-style-be-used-to-identify-him-her-as-the-autho)

------
erenhatirnaz
Here[1] is your anonymization tool: kloak - Keystroke-level online
anonymization kernel.

[1]: [https://github.com/vmonaco/kloak](https://github.com/vmonaco/kloak)

~~~
iamgopal
So my keyboard obfuscation USB tool on Kickstarter is going to be a hit with
privacy lovers.

------
Dunedan
Authenticating users via typing isn't new. In fact it was already available
more than 12 years ago. Unfortunately the original source isn't available
anymore, but Bruce Schneier covered it back then:
[https://www.schneier.com/blog/archives/2005/11/authenticatin...](https://www.schneier.com/blog/archives/2005/11/authenticating.html)

I also found a paper in German from the company which tried to commercialize
it in the following years:
[http://www.horst-goertz.de/hgs-wordpress/wp-content/uploads/...](http://www.horst-goertz.de/hgs-wordpress/wp-content/uploads/2013/09/3_Preis_2008_2.pdf)

~~~
coretx
British intelligence "authenticated" Russian embassy staff somewhere in the
midst of the Cold War, and they were using typewriters. So it's far older than
12 years. Also: since the authors of this gimmick tool want you to train it on
your typing specifics, it implies they made it overcomplicated.

------
jeremya
This is interesting. I think the weird underscore between words (which
indicates space) is throwing me off though. I'd recommend removing that.

~~~
Willox
The underscore definitely [feels like it] makes me miss most of the commas.

------
lettergram
I’ve seen something similar for a mobile app:

[https://www.onenigma.com/case-study/typingID](https://www.onenigma.com/case-study/typingID)

A friend built it a few years ago. It’s integrated into at least one app, and
has some serious potential. I suspect a regular keyboard would be the same.

------
boltzmannbrain
The streaming anomaly detection benchmark NAB includes keyboard typing data
(amongst other interesting sources) [1]. Comparing the algorithms would be
quite interesting; the algorithm in [2] is unsupervised and doesn't need
training data. The benchmark dataset and algorithms are open-source.

[1] [https://github.com/numenta/NAB/tree/master/data#real-data](https://github.com/numenta/NAB/tree/master/data#real-data)

[2] Ahmad & Lavin, Neurocomputing 2016,
[https://www.sciencedirect.com/science/article/pii/S092523121...](https://www.sciencedirect.com/science/article/pii/S0925231217309864?via%3Dihub)

~~~
indutny
Will check it out, thank you!

------
narcindin
Cool stuff! My name kept showing up along with a few others, but I wonder if
that is due to insufficient users in my typing speed range. I'll try again
later.

Please make commas more visible. The upturned underscore obscures it.

------
zamadatix
Interesting concept. I did 62 sentences and got "⭐️⭐️⭐️⭐️⭐️" as a rating, but
none of my last 5 were matched to me. I figured it would be pretty easy to pick
me out, as I use
[http://mkweb.bcgsc.ca/carpalx/?full_optimization](http://mkweb.bcgsc.ca/carpalx/?full_optimization)
as my main layout, and thought that would make my key distances very unique.

On a usability front I echo that the "␣" is definitely confusing, especially
next to a ",".

~~~
indutny
Very interesting! I don't think that I ever had a sample from a non-standard
keyboard layout in the dataset.

While it might sound like this should make your samples very different from
others, it could actually act the other way and confuse the network.
Hopefully, this will be improved in the next version of it, which I'll start
training right after this data collection sprint.

~~~
zamadatix
I popped a few more in until I hit the maximum 91 stored (either that or it
broke). Thanks for fixing the way the space character is displayed.

You could use
[https://developer.mozilla.org/en-US/docs/Web/API/KeyboardEve...](https://developer.mozilla.org/en-US/docs/Web/API/KeyboardEvent/code)
to take advantage of keyboard layout information to more easily classify a
user. E.g. if a German user is typing in English, the key they press to get
"z" will register with `code` "KeyY" but with `key` "z". Even if the neural
net just picks up on "non-QWERTY = weird", I think it would significantly help
its ability to classify these corner-case users.

I look forward to seeing how future versions fare!
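A minimal TypeScript sketch of that suggestion, assuming the page stores both `KeyboardEvent.key` and `KeyboardEvent.code` for every keystroke. The `KeyRecord` type, the helper names, and the 0.1 threshold are hypothetical; the check only compares letter keys against what US QWERTY would produce at the same physical position.

```typescript
// Hypothetical shape of one recorded keystroke: the two KeyboardEvent fields.
interface KeyRecord {
  key: string;  // character produced by the active layout, e.g. "z"
  code: string; // physical key position, e.g. "KeyY"
}

// Fraction of letter keystrokes whose produced character differs from what a
// US QWERTY layout would produce at the same physical position.
function layoutMismatchRatio(records: KeyRecord[]): number {
  let letters = 0;
  let mismatches = 0;
  for (const r of records) {
    // Only physical letter keys: codes "KeyA" .. "KeyZ".
    if (!/^Key[A-Z]$/.test(r.code)) continue;
    // Skip keys that did not produce a single character (e.g. "Dead", "Shift").
    if (r.key.length !== 1) continue;
    letters++;
    const qwertyChar = r.code.charAt(3).toLowerCase();
    if (r.key.toLowerCase() !== qwertyChar) mismatches++;
  }
  return letters === 0 ? 0 : mismatches / letters;
}

// Crude flag: "non-QWERTY" if more than `threshold` of the letters disagree.
// The 0.1 default is an arbitrary illustrative value.
function looksNonQwerty(records: KeyRecord[], threshold = 0.1): boolean {
  return layoutMismatchRatio(records) > threshold;
}
```

Fed a German user typing "zeit", where the physical "KeyY" produces "z", one of four letters disagrees (ratio 0.25), so the flag fires. A layout implemented in keyboard firmware would likely not show up this way, because the browser would already receive the remapped physical codes.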

~~~
clarry
What about those corner case users that use a non-qwerty layout but the layout
is in hardware (or keyboard firmware) rather than software?

------
cookiecaper
While typing, a GitHub username that contains my surname showed up. That isn't
all _that_ unlikely -- my surname is among the top 50 in the US -- but it
makes me wonder to what extent genetics determine similarity in typing
patterns. Maybe I can use this to find some long-lost relatives. ;)

~~~
nv-vn
Similar but I saw two friends show up very early on (and then keep showing
up). Maybe people self-select friends based on typing style?

------
beaker52
I made something like this (without the neural networks), just recording the
typing cadence of users entering their password. If the cadence matches within
a tolerance, then it can be used as a crude indicator of identity, along with
other factors like IP, geolocation, browser fingerprint, etc.
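A rough TypeScript sketch of the cadence check described above, assuming keydown timestamps in milliseconds are captured for each password entry and that every enrollment entry has the same length (same password, no corrections). The function names and the tolerance value are illustrative assumptions, not beaker52's implementation.

```typescript
// Intervals (ms) between consecutive keydown timestamps of one entry.
function intervals(timestamps: number[]): number[] {
  const out: number[] = [];
  for (let i = 1; i < timestamps.length; i++) {
    out.push(timestamps[i] - timestamps[i - 1]);
  }
  return out;
}

// Enrollment: average the interval at each position over several entries.
function enroll(entries: number[][]): number[] {
  if (entries.length === 0) throw new Error("need at least one entry");
  const len = intervals(entries[0]).length;
  const profile: number[] = [];
  for (let i = 0; i < len; i++) profile.push(0);
  for (const e of entries) {
    const iv = intervals(e);
    if (iv.length !== len) throw new Error("entries must have equal length");
    for (let i = 0; i < len; i++) profile[i] += iv[i] / entries.length;
  }
  return profile;
}

// Crude match: mean absolute deviation from the profile must be within toleranceMs.
function matchesCadence(profile: number[], attempt: number[], toleranceMs: number): boolean {
  const iv = intervals(attempt);
  if (iv.length !== profile.length) return false;
  let total = 0;
  for (let i = 0; i < iv.length; i++) total += Math.abs(iv[i] - profile[i]);
  return total / iv.length <= toleranceMs;
}
```

For example, enrolling [0, 100, 250, 300] and [0, 120, 230, 320] gives the profile [110, 130, 70]; an attempt at [0, 115, 240, 310] deviates by about 3.3 ms on average and passes a 20 ms tolerance. On its own this is only a weak signal, which is why it is combined with the other factors mentioned.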

------
falkenb0t
Funnily enough, I had a classmate in my grad neural networks class who did
exactly this. His results seemed to be pretty similar to yours, though I don't
think he publicly hosted it anywhere.

Regardless, this is an interesting application of NNs.

~~~
indutny
Would they be interested in collaborating with me on this project? My email
address is listed on my GitHub profile:
[https://github.com/indutny/](https://github.com/indutny/)

Thank you!

~~~
falkenb0t
He very well might be interested. I'll shoot him a message and let you know if
he is.

------
DoctorOetker
I have often wondered if it would be possible to make a simple game of typing
3 or 4 digit sequences (similar to dactylo typing games like "type the words
before they fall to the bottom of the screen"), and then find out whether PIN
code subsequences or digit transitions have a distinct timing pattern. If that
were possible, it would be very creepy: it would imply that browser-based
games, a keylogger, or your employer could extract your PIN code from enough
typing material (say, in some spreadsheet)...

------
grenoire
One interesting issue that I encountered is with the way the US International
layout works. Normally when I type ', that is a dead key for accenting
characters; the thing is, ' followed by a character that can't be accented
with it (say t, or s) yields 't or 's, whereas when I type that way here I
have to explicitly type '␣ to get '. I tried it on my own with the
KeyboardEvent API and got the same problem. I'm not sure what the fix is, but
it sure messes up my writing.

~~~
indutny
Oh gosh! This is very unexpected. I'm considering removing apostrophes, so
this might help.

------
emsal
I'm incredibly impressed that you implemented the whole LSTM model in
JavaScript [1] as well as Python. This indirectly gave me a lot of
implementation insight, so thanks!

[1]:
[https://github.com/indutny/gradtype/blob/master/src/model.js](https://github.com/indutny/gradtype/blob/master/src/model.js)

~~~
indutny
It is easier than exporting the model to TF.js. I wish the tooling was better,
but there is way too much custom TF code in gradtype to export it easily.

------
amelius
This seems worse than browser fingerprinting since it identifies people, not
browsers.

Does the Tor browser contain a way to combat this?

~~~
verbify
They'd need to detect and store your keystrokes. If you have JavaScript
enabled that's trivial, but hopefully you have JavaScript disabled.

------
daok
After entering the 20 sentences, getting "sorry the server is down trying
later" is the worst user experience ever. If the server is down, please do
something before someone enters 20 sentences... or have a "try again" button
to submit again...

~~~
indutny
Sorry to hear that! The web page sends each sentence to the server once you
type it, but it looks like something has gone wrong.

I'll check server logs to identify the problem. Sorry again!

------
carapace
I remember someone did this using a smart phone sitting on the same desk as
the keyboard as the sensor (and maybe the processor, I forget), but the idea
was to recreate what they were typing rather than biometric identification.

------
user5994461
The _ and the , are a deal breaker for me, especially when following each
other. That's not readable. Ain't copying 30 sentences like that.

~~~
indutny
Sorry, should be fixed now!

~~~
user5994461
Much better, thanks.

So I wrote a lot of sentences and I don't understand what is supposed to
happen. Do I need to register for the thing to work? I don't want to.

~~~
indutny
I'm afraid that without registering, not much will happen, sorry. The whole
point is that, given past samples, it can identify new, previously unseen
samples. The identification is the crucial part of the demo.

------
new4thaccount
My college had some computer science folks who paid each student $10 cash to
type a page of text as training data.

~~~
froindt
Out of curiosity, did you go to Iowa State? I helped run the study, and the
results were very good.

Unfortunately, we ran into some miscommunication between DARPA and the
university, which cut the project short.

We were also working on mobile identification which included the timing
between letter combinations (2-grams if memory serves right) as well as
accelerometer data.
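To make the 2-gram idea concrete, here is a hedged TypeScript sketch that averages the keydown-to-keydown latency for each pair of consecutive characters. The `KeyDown` shape and the function name are hypothetical; this is not the study's code, and it ignores key-release times and accelerometer data entirely.

```typescript
// Hypothetical record of one keydown event.
interface KeyDown {
  key: string;  // character typed
  time: number; // keydown timestamp in ms
}

// Mean latency (ms) between consecutive keydowns, grouped by the 2-gram they form.
function digraphLatencies(events: KeyDown[]): Record<string, number> {
  const sums: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (let i = 1; i < events.length; i++) {
    const gram = events[i - 1].key + events[i].key;
    const dt = events[i].time - events[i - 1].time;
    sums[gram] = (sums[gram] || 0) + dt;
    counts[gram] = (counts[gram] || 0) + 1;
  }
  const means: Record<string, number> = {};
  for (const gram in sums) {
    if (Object.prototype.hasOwnProperty.call(sums, gram)) {
      means[gram] = sums[gram] / counts[gram];
    }
  }
  return means;
}
```

Typing "thethe" with keydowns at 0, 100, 180, 300, 390 and 460 ms gives th = 95, he = 75 and et = 120. A per-user profile is then a collection of these averages over the digraphs two sessions share; how to weight and compare them is left open here.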

~~~
new4thaccount
I did not. I actually went to college in the south. I wonder how many
participated?

~~~
froindt
We had a couple thousand people participate if I remember right. It was a
moderately high percentage of the student body. This was 2014 era.

~~~
new4thaccount
Interesting. Judging by the lines, we probably had a similar number.

I wonder how many schools set up something similar? At least two, I guess :).

------
sergiotapia
Broken on apostrophe.

`don't` got stuck on the `'` char.

~~~
indutny
That's very strange, sorry! What browser/keyboard layout are you using?

------
fc373745
got about 60 sentences in, and my name popped up here and there. my name's
first occurrence was at sentence ~40.

Pretty neat.

------
peppershaker
Could someone please explain how this works. I’m familiar with runs but not
sure how this is set up.

~~~
indutny
This is an ML model:
[https://github.com/indutny/gradtype/blob/master/src/model.js...](https://github.com/indutny/gradtype/blob/master/src/model.js#L133-L147)

It is trained using (almost) a triplet loss:
[http://openaccess.thecvf.com/content_cvpr_2016/papers/Cheng_...](http://openaccess.thecvf.com/content_cvpr_2016/papers/Cheng_Person_Re-Identification_by_CVPR_2016_paper.pdf)
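For readers unfamiliar with it, the standard triplet loss can be sketched in TypeScript as below. This is the textbook form with squared Euclidean distances and a fixed margin, not gradtype's exact variant (indutny says "almost"), and the function names are illustrative.

```typescript
// Squared Euclidean distance between two embedding vectors of equal length.
function squaredDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Standard triplet loss for one (anchor, positive, negative) triple:
//   max(0, d(anchor, positive) - d(anchor, negative) + margin)
// Anchor and positive come from the same typist, negative from someone else.
function tripletLoss(anchor: number[], positive: number[], negative: number[], margin: number): number {
  return Math.max(0, squaredDistance(anchor, positive) - squaredDistance(anchor, negative) + margin);
}
```

With anchor [0, 0], positive [1, 0], negative [0.5, 0] and margin 1, the loss is 1 - 0.25 + 1 = 1.75; moving the negative out to [3, 0] drops it to 0, since the negative is then farther away than the positive by more than the margin. Minimizing this over many triples pulls one person's samples together in embedding space, which is what lets new samples be matched against stored ones.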

------
stanislavb
This is quite interesting. I’m curious whether it can be used for “forgotten
password”... maybe not

~~~
indutny
I think it has the potential to be used as 2FA. However, for this to work,
much more training data is required.

~~~
frosted-flakes
It doesn't seem like using it for automatic 2FA would be practical. For one
thing, it wouldn't allow for password managers, and if the environment changes
significantly it wouldn't match up (on mobile, using an on-screen or touch
keyboard, finger pecking on a laptop while eating at the dinner table, etc.)

~~~
indutny
I imagined that it would just ask to enter a pre-defined phrase along with the
password.

------
HNLurker2
I wonder how it would react to random input, i.e. me just typing nonsense.

------
hislaziness
One more demo of this I saw:
[https://vikasdesai.github.io/keystroke-dynamics/](https://vikasdesai.github.io/keystroke-dynamics/)

------
pancho111203
Are you planning to open source the dataset generated?

~~~
indutny
Most of the collected data is published to github repo:
[https://github.com/indutny/gradtype/tree/master/datasets](https://github.com/indutny/gradtype/tree/master/datasets)

------
helsinki
I heard the NSA has been doing this for nearly a decade.

~~~
draugadrotten
"NSA, CIA, and every other TLA have done it since the age of typewriters"
would be more accurate.

I was already reading about a snooping attack of this type in 2005.
[https://www.schneier.com/blog/archives/2005/09/snooping_on_t...](https://www.schneier.com/blog/archives/2005/09/snooping_on_tex.html)

------
ARandomerDude
Creepy. Why on Earth would you build something like this?

Amazing how people in tech are so flippant about building tools that are
almost exclusively useful for tyrants.

~~~
wsgreen
Pretty sure bank apps do this for fraud prevention. Preventing criminals from
stealing your money is pretty altruistic.

~~~
loeg
I don't think I've ever had to type a sentence into a bank app, much less
30-40.

~~~
rococode
You've probably typed a username or password hundreds of times though, or at
least a fair portion of their userbase has. "This user normally types their
password in 3 seconds but today they did it in 0.2 seconds" is a reasonable
way to raise a red (or at least yellow) flag.
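A minimal sketch of that kind of red flag, assuming the service keeps a history of how long each past password entry took. It uses a plain z-score against the mean and sample standard deviation; the function name and the threshold are hypothetical choices, not anything a particular bank is known to use.

```typescript
// True when a new entry duration is an outlier relative to the user's history,
// measured as a z-score against the mean and sample standard deviation.
function isUnusualDuration(historyMs: number[], newMs: number, maxZ: number): boolean {
  const n = historyMs.length;
  if (n < 2) return false; // too little history to estimate a spread
  let sum = 0;
  for (const d of historyMs) sum += d;
  const mean = sum / n;
  let squares = 0;
  for (const d of historyMs) squares += (d - mean) * (d - mean);
  const std = Math.sqrt(squares / (n - 1)); // sample standard deviation
  if (std === 0) return newMs !== mean; // constant history: any change stands out
  return Math.abs(newMs - mean) / std > maxZ;
}
```

With a history of 2900, 3000 and 3100 ms (mean 3000, standard deviation 100), a 200 ms entry sits 28 standard deviations away and is flagged at a threshold of 3. As loeg points out in reply, a password manager fill would look much like that fast entry, so a real system would have to account for it.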

~~~
loeg
I pretty much exclusively use 1pass, so the bank gets very little typing
information from me when it fills in my password.

