Show HN: Person Identification by Keyboard Typing (darksi.de)
202 points by indutny on May 11, 2019 | 80 comments



Sometimes it helps to know the jargon around this topic. If you search for "keystroke dynamics", several articles and GitHub repos turn up.

Experienced Morse code interpreters used to be able to recognize who was at the other end that day by their typical intervals and mistakes.


There are also related fields, e.g. identifying people based on what they type (not how they type it).

https://stackoverflow.com/questions/4771293/can-an-authors-u...


I'll investigate it. Thank you!


Here[1] is your anonymization tool: kloak - Keystroke-level online anonymization kernel.

[1]: https://github.com/vmonaco/kloak


So my keyboard obfuscation USB tool on Kickstarter is going to be a hit with privacy lovers.


Thx for the link, do you know if it works?


Authenticating users via typing isn't new. In fact it was already available more than 12 years ago. Unfortunately the original source isn't available anymore, but Bruce Schneier covered it back then: https://www.schneier.com/blog/archives/2005/11/authenticatin...

I also did find a paper in German from the company which tried to commercialize it in the following years: http://www.horst-goertz.de/hgs-wordpress/wp-content/uploads/...


British intelligence "authenticated" Russian embassy staff somewhere in the midst of the Cold War, and they were using typewriters. So it's way older than 12 years. Also: since the authors of this gimmick tool want you to train it by typing specific sentences, it implies they made it overcomplicated.


There are multiple companies doing that. BioCatch as a random example. But it does not, for very inherent reasons, work in browser environments. I did not test this classifier, but it'll fail once you switch from chrome to FF.


It’s older than that. Michael Crichton wrote about it and published source code in an Applesoft BASIC magazine in the early 80s.


This is interesting. I think the weird underscore between words (which indicates space) is throwing me off though. I'd recommend removing that.


The underscore definitely [feels like it] makes me miss most of the commas.


Good point! I've added a small note to explain it. Will re-work the thing later.


This is fixed now.


I’ve seen something similar for a mobile app:

https://www.onenigma.com/case-study/typingID

A friend built it a few years ago. It’s integrated into at least one app, and has some serious potential. I suspect a regular keyboard would be the same.


Part of the streaming anomaly detection benchmark dataset NAB has keyboard typing data (amongst other interesting sources) [1]. Comparing the algorithms would be quite interesting; the algorithm in [2] is unsupervised and doesn't need training data. The benchmark dataset and algorithms are open-source.

[1]: https://github.com/numenta/NAB/tree/master/data#real-data

[2]: Ahmad & Lavin, Neurocomputing 2016, https://www.sciencedirect.com/science/article/pii/S092523121...


Will check it out, thank you!


Cool stuff! My name kept showing up along with a few others, but I wonder if that is due to insufficient users in my typing speed range. I'll try again later.

Please make commas more visible. The upturned underscore obscures it.


Interesting concept. I did 62 sentences and got "⭐️⭐️⭐️⭐️⭐️" as a rating but none of my last 5 are matched up. I figured it would be pretty easy to pick me out as I use http://mkweb.bcgsc.ca/carpalx/?full_optimization as my main layout and thought that would make my key distances very unique.

On a usability front I echo that the "␣" is definitely confusing, especially next to a ",".


Very interesting! I don't think that I ever had a sample from a non-standard keyboard layout in the dataset.

While it might sound like this should make your samples very different from others, it could actually act the other way and confuse the network. Hopefully, this will be improved in the next version of it, which I'll start training right after this data collection sprint.


I popped a few more in until I hit the maximum 91 stored (either that or it broke). Thanks for fixing the way the space character is displayed.

You could use https://developer.mozilla.org/en-US/docs/Web/API/KeyboardEve... to take advantage of keyboard layout information to more easily classify a user. E.g. if a German user is typing in English, their `code` will register as "KeyY" but their `key` will register as "z". Even if the neural net just picks up on "non-QWERTY = weird", I think it would significantly help its ability to classify these corner-case users.

I look forward to seeing how future versions fare!
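
A rough sketch of the `code` vs. `key` idea, written in Python over events that have already been logged (each event is assumed to be a dict carrying the browser's `code` and `key` values). The `QWERTY` table, `layout_mismatches`, and `looks_non_qwerty` are made-up names for illustration and are not part of gradtype:

```python
# Characters that a few physical key positions produce on a US QWERTY layout.
# Deliberately tiny; a real table would cover every letter key.
QWERTY = {"KeyQ": "q", "KeyW": "w", "KeyA": "a", "KeyY": "y", "KeyZ": "z"}


def layout_mismatches(events):
    """Return (code, key) pairs whose character differs from US QWERTY."""
    mismatches = []
    for event in events:
        expected = QWERTY.get(event["code"])
        if expected is not None and event["key"].lower() != expected:
            mismatches.append((event["code"], event["key"]))
    return mismatches


def looks_non_qwerty(events):
    """True when at least one known key position yields an unexpected character."""
    return len(layout_mismatches(events)) > 0


# A German (QWERTZ) user pressing the physical "KeyY" position gets "z".
german_events = [{"code": "KeyY", "key": "z"}, {"code": "KeyA", "key": "a"}]
us_events = [{"code": "KeyY", "key": "y"}, {"code": "KeyA", "key": "a"}]
```

With these samples, `looks_non_qwerty(german_events)` is true and `looks_non_qwerty(us_events)` is false. This can only detect layouts applied in software, because the browser has to report a `code` that disagrees with the `key`.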


What about those corner case users that use a non-qwerty layout but the layout is in hardware (or keyboard firmware) rather than software?


Thank you for taking it to all 5 stars rating. That helps a lot!


While typing, a GitHub username that contains my surname showed up. That isn't all that unlikely -- my surname is among the top 50 in the US -- but it makes me wonder to what extent genetics determine similarity in typing patterns. Maybe I can use this to find some long-lost relatives. ;)


Similar but I saw two friends show up very early on (and then keep showing up). Maybe people self-select friends based on typing style?


I made something like this (without the neural networks) just recording the typing cadence of users entering their password. If the cadence matches within a tolerance then it can be used as a crude indicator of identity, along with other factors like IP, geolocation, browser fingerprint etc.
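
A minimal sketch of that kind of tolerance check, assuming the only data available is a list of key-down timestamps in milliseconds for each password entry. The function names and the 40 ms default tolerance are illustrative guesses, not the commenter's actual implementation:

```python
from statistics import mean


def intervals(timestamps_ms):
    """Return the gaps between consecutive key-down timestamps."""
    return [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def cadence_matches(enrolled_ms, attempt_ms, tolerance_ms=40.0):
    """True when the mean absolute difference between gaps is within tolerance."""
    reference, attempt = intervals(enrolled_ms), intervals(attempt_ms)
    if not reference or len(reference) != len(attempt):
        return False  # a different number of keystrokes cannot be compared
    error = mean([abs(r - a) for r, a in zip(reference, attempt)])
    return error <= tolerance_ms


enrolled = [0, 120, 250, 400, 520]
same_user = [0, 130, 245, 410, 515]
other_user = [0, 60, 300, 330, 700]
```

Here `cadence_matches(enrolled, same_user)` is true and `cadence_matches(enrolled, other_user)` is false. As the comment says, a match like this is only a crude signal and would be combined with other factors.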


Funnily enough, I had a classmate in my grad neural networks class who did exactly this. Results on his end seemed to be pretty similar to yours, though I don't think he publicly hosted it anywhere.

Regardless, this is an interesting application of NNs


Would they be interested in collaborating with me on this project? My email address is listed on my GitHub profile: https://github.com/indutny/

Thank you!


He very well might be interested. I'll shoot him a message and let you know if he is


I have often wondered if it would be possible to make a simple game of typing 3 or 4 digit sequences (similar to the typing games like "type the words before they fall to the bottom of the screen"), and then find out if PIN code subsequences or digit transitions have a distinct timing pattern. If so, it would be very creepy, because it would imply that browser-based games, a keylogger, or your employer could extract your PIN code from enough typing material (say in some spreadsheet)...


One interesting issue that I encountered is with the way the US International layout works. Normally when I type ', that is a dead key for accenting characters; the thing is, ' then a character that can't be accented with it (say t, or s), yields me with 't or 's, whereas when I type that way here I have to explicitly do '␣ to yield '. I tried it on my own with the KeyEvent API and got the same problem, not sure what the fix is but it sure messes my writing up.


Oh gosh! This is very unexpected. I'm considering removing apostrophes, so this might help.


I'm incredibly impressed that you implemented the whole LSTM model in JavaScript [1] as well as Python. This indirectly gave me a lot of implementation insight, so thanks!

[1]: https://github.com/indutny/gradtype/blob/master/src/model.js


It is easier than exporting the model to TF.js. I wish the tooling was better, but there is way too much custom TF code in gradtype to export it easily.


This seems worse than browser fingerprinting since it identifies people, not browsers.

Does the Tor browser contain a way to combat this?


They'd need to detect and store your keystrokes. If you have JavaScript enabled that's trivial, but hopefully you have JavaScript disabled.


After entering the 20 sentences, getting "sorry the server is down trying later" is the worst user experience ever. If the server is down, please do something before someone enters 20 sentences... or have a "try again" button to submit again...


Sorry to hear that! The web page sends each sentence to the server once you type it, but it looks like something has gone wrong.

I'll check server logs to identify the problem. Sorry again!


I remember someone did this using a smart phone sitting on the same desk as the keyboard as the sensor (and maybe the processor, I forget), but the idea was to recreate what they were typing rather than biometric identification.


The _ and the , are a deal breaker for me, especially when following each other. That's not readable. Ain't copying 30 sentences like that.


Sorry, should be fixed now!


much better thanks.

so I wrote a lot of sentences and I don't understand what is supposed to happen. Do I need to register for the thing to work? I don't want to.


I'm afraid that without registering there is not much to happen, sorry. The whole point is that given past samples it can identify new previously unseen samples. The identification is the crucial part of the demo.


FWIW I typed in all 20 sentences and got a "server error" at or below where the sentences appear. Using Chrome on Windows.


Broken on apostrophe.

`don't` got stuck on the `'` char.


That's very strange, sorry! What browser/keyboard layout are you using?


My college had some computer science folks that gave out $10 cash per student to type a page of text for obtaining training data.


Out of curiosity, did you go to Iowa State? I helped run the study, and there were very good results.

Unfortunately we ran into some mistakes in the communication between DARPA and the university which cut the project short.

We were also working on mobile identification which included the timing between letter combinations (2-grams if memory serves right) as well as accelerometer data.
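
For anyone wondering what "timing between letter combinations" could look like as a feature, here is a small sketch that averages the delay for each 2-gram. It assumes input as (character, key-down timestamp in ms) pairs; `digraph_latencies` is a hypothetical helper, not code from the study:

```python
from collections import defaultdict
from statistics import mean


def digraph_latencies(keystrokes):
    """Map each 2-gram to the mean delay (ms) between its two key presses.

    `keystrokes` is a list of (character, key-down timestamp in ms) pairs.
    """
    gaps = defaultdict(list)
    for (first, t1), (second, t2) in zip(keystrokes, keystrokes[1:]):
        gaps[first + second].append(t2 - t1)
    return {pair: mean(times) for pair, times in gaps.items()}


sample = [("t", 0), ("h", 90), ("e", 170), (" ", 300), ("t", 420), ("h", 500)]
features = digraph_latencies(sample)
```

In this sample "th" occurs twice (90 ms and 80 ms), so its feature is the average of the two. A per-user table of such averages could then be compared between sessions.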


I'm intrigued by this. Most of my mobile typing is done with the Google keyboard swipe gesture thing. Is there a way to identify a user by this, since the words are entered in the form field one at a time? Also, accelerometer data as in the way the user holds the phone? I find it interesting because for the past week or so I've been trying to fix my bad posture, and as a result the way I hold my phone has changed drastically. I wonder how this type of system deals with situations like that.


I did not. I actually went to college in the south. I wonder how many participated?


We had a couple thousand people participate if I remember right. It was a moderately high percentage of the student body. This was 2014 era.


Interesting. Judging by the lines, we probably had a similar amount.

I wonder how many schools set up something similar? At least two I guess :).


This is a very reasonable sum for a project with funding.

GradType was created using my personal funds. Any help is greatly appreciated!


got about 60 sentences in, and my name popped up here and there. my name's first occurrence was at sentence ~40.

Pretty neat.


Could someone please explain how this works? I’m familiar with runs but not sure how this is set up.



This is quite interesting. I’m curious whether it can be used for “forgotten password”... maybe not


I think it has the potential to be used as a 2FA. However, for this to work - much more training data is required.


Maybe better as a locking trigger. When a system detects the previously authenticated user is no longer the active user it could lock keychains, tokens and sudo timers. This is less intrusive compared to locking a complete system every time you take a step out of the room.


It doesn't seem like using it for automatic 2FA would be practical. For one thing, it wouldn't allow for password managers, and if the environment changes significantly it wouldn't match up (on mobile, using an on-screen or touch keyboard, finger pecking on a laptop while eating at the dinner table, etc.)


I imagined that it would just ask to enter a pre-defined phrase along with the password.


It is used on Coursera to verify students (e.g., to make sure that the assignment is submitted by the authorized user).


I wonder how it will react to random input, I mean me just typing nonsense.



Are you planning to open source the dataset generated?


Most of the collected data is published to the GitHub repo: https://github.com/indutny/gradtype/tree/master/datasets


Of course, but with masked identifiers.


I heard the NSA has been doing this for nearly a decade.


"NSA, CIA, TLA have done it since the age of typewriters" would be more accurate.

I read about a type of snooping attack already in 2005. https://www.schneier.com/blog/archives/2005/09/snooping_on_t...


Creepy. Why on Earth would you build something like this?

Amazing how people in tech are so flippant about building tools that are almost exclusively useful for tyrants.


I really don't think that this is a fair appraisal of this. It's a really novel look into how our habits identify us and the creation of it, in itself, doesn't reflect any mal-intent. If you're really so disheartened by the potential use-cases of typing ID like this, you are now aware of it in the first place! You could set out to build a simple tool that randomly adds noise to your keyboard IO and effectively stops this from working. Past that, this is something that others have almost certainly already figured out (be it big tech or government entities) but, even if they hadn't, creation like this cannot be considered a wrongful act because it's impossible to know the long-term effects of anything before it exists.
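
To make the "randomly adds noise" idea concrete, here is a toy sketch that delays each key event by a random amount while keeping the events in order. It works on a list of timestamps instead of real keyboard IO, and `obfuscate`, `max_delay_ms`, and the 100 ms default are assumptions for illustration only; whether this much jitter actually defeats a classifier is untested:

```python
import random


def obfuscate(timestamps_ms, max_delay_ms=100, rng=None):
    """Delay each event by a random amount, keeping the original order.

    Assumes `timestamps_ms` is sorted in ascending order.
    """
    if rng is None:
        rng = random.Random()
    released = []
    last = float("-inf")
    for t in timestamps_ms:
        # Hold the event back by 0..max_delay_ms, but never release it
        # before the previous event, so the typed text stays the same.
        out = max(t + rng.uniform(0, max_delay_ms), last)
        released.append(out)
        last = out
    return released


original = [0, 100, 150, 400]
noisy = obfuscate(original, max_delay_ms=100, rng=random.Random(0))
```

The released timestamps stay in order and each one lands at most `max_delay_ms` after the real key press, so the cost is a small, bounded input lag.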


1) You should definitely research who Fedor Indutny is.

2) If you think companies aren't already doing this, you're lying to yourself.

3) This is a harmless experiment made to spotlight how simple it is to identify someone just by their typing. This could be used in a multitude of ways.

4) Stop generalizing "people in tech", and stop assuming everyone builds stuff for no reason. This is a deliberate experiment, made on purpose, for a very good reason.


Pretty sure bank apps do this for fraud prevention. Preventing criminals from stealing your money is pretty altruistic.


I don't think I've ever had to type a sentence into a bank app, much less 30-40.


You've probably typed a username or password hundreds of times though, or at least a fair portion of their userbase has. "This user normally types their password in 3 seconds but today they did it in 0.2 seconds" is a reasonable way to raise a red (or at least yellow) flag.
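
A back-of-the-envelope version of that flag, assuming the app keeps a short history of how long each password entry took in seconds. The 3-standard-deviation threshold and the `is_suspicious` name are illustrative choices, not something any bank is known to use:

```python
from statistics import mean, stdev


def is_suspicious(history_s, attempt_s, threshold=3.0):
    """Flag an entry time far outside the user's usual range."""
    if len(history_s) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history_s), stdev(history_s)
    if sigma == 0:
        return attempt_s != mu
    # Distance from the user's average, in standard deviations.
    return abs(attempt_s - mu) / sigma > threshold


history = [3.1, 2.8, 3.3, 2.9, 3.0]
```

With this history, a 0.2 second entry (as a paste or autofill might produce) is flagged, while a 3.2 second entry is not. As noted in the replies, password manager users would trip this constantly, so it could only ever be one weak signal among several.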


I pretty much exclusively use 1pass, so there's not much information there on the bank fill side of things.


No, banks won’t do this, most banks still struggle with basics like storing passwords securely.


Coursera uses this to make sure exams are given by the same person who registers for the course.


Why would you think "people in tech" are some homogenous group to generalize about?


Unfortunately, tyrants often pay well.



