Hacker News
Guessing the pressed keyboard keys by analyzing the audio from the microphone (github.com)
194 points by Osiris30 9 months ago | 33 comments



Attempted to train it by typing for ~2 minutes. Basically typed everything on the GitHub page and tried getting predictions. Results were disappointing. I didn't see any accurate predictions. Even with the default p/q program I see mostly random results.

I tested on a 15" MacBook Pro 2018 (the latest version of the keyboard, which is softer to type on and less noisy).


Sounds like the author has the same experience: if the keyboard is not mechanical, then this library doesn’t work:

https://github.com/ggerganov/kbd-audio/issues/3


Correct. I just made a short video to demonstrate what a working setup looks/sounds like:

https://www.youtube.com/watch?v=2OjzI9m7W10


Macbooks usually have multiple mics for noise filtering. That could also be an issue.


I've tested this on my Filco Majestouch 2 with MX Browns but it barely managed to detect anything, and even then it didn't do very well. I think this is also highly dependent on the microphone used (I used the one in my Logitech webcam).


Still, I suspect that certain words, when quickly typed, have specific sounds.


The p/q test fails on my MacBook Pro 2013 as well. Plugging in an external mechanical keyboard makes it work, so it's definitely due to the soft type of keyboard and not the mic.


I've always thought Twitch streamers were opening up an attack vector through this exact method.

Cool to see someone follow through with it. Any streamers out there should figure out a way of avoiding keypress bleed or muting their mics when typing sensitive info (e.g. passwords).


Don’t Skype & Type! Acoustic Eavesdropping in Voice-Over-IP

https://www.math.unipd.it/~dlain/papers/2017-skype.pdf


Yes, countermeasures such as playing other sounds around those frequencies and/or filtering the microphone at those frequencies would be interesting to explore. Filtering the mic seems the less annoying option from a user perspective!


Most streamers, myself included, use a filter called a noise gate, which requires the mic volume to reach a certain threshold before anything is broadcast. This filters out the majority of background noise on the stream.
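
A minimal sketch of what a noise gate does, assuming mono float samples in the range -1.0 to 1.0. The function name, the -40 dBFS threshold, and the 256-sample frame size are illustrative choices, not taken from any particular streaming tool. Real gates usually add attack, hold, and release times instead of cutting frame by frame.

```python
import math

def noise_gate(samples, threshold_db=-40.0, frame_size=256):
    """Silence every frame whose RMS level is below threshold_db (dBFS)."""
    out = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Level relative to full scale; a silent frame is treated as -inf dB.
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db >= threshold_db:
            out.extend(frame)                # loud enough: pass through
        else:
            out.extend([0.0] * len(frame))   # too quiet: mute the frame
    return out

# A quiet frame (about -60 dBFS) followed by a loud one (about -6 dBFS).
signal = [0.001] * 256 + [0.5] * 256
gated = noise_gate(signal)
```

Note that a gate like this only removes keypresses that are quieter than the threshold. Anything typed while the streamer is talking, or typed loudly enough on its own, still passes through.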


Perhaps accidentally, a YubiKey or something similar would avoid this attack.


Now determine the keypresses by filming the vibrations of a potato chip bag[0]

[0]http://news.mit.edu/2014/algorithm-recovers-speech-from-vibr...


They manage to do it (just sound, not keypresses) with a 60fps DSLR camera at the end by examining the rows of the video - does this mean that for any videos already in existence, sound can be decoded from the images?


For any old 60fps home movie of your potato chips: maybe.


Not for handheld videos.


Wouldn't you be able to filter out the camera motion, as long as the movement due to sound was fixed relative to something in frame?


I would say the motion blur and the rolling shutter effect would overpower the subtle vibrations due to audio.


Rolling shutter is exactly why this works.

It's not a complicated paper; give it a go.


If someone hunts-and-pecks, how close can you get with gaze tracking?


Could you maybe do away with training beforehand on a single source and instead use multiple, very sensitive microphones and triangulate the locations of the keys being pressed? The estimations might not be accurate, but you could put the results through a smartphone typo correction algorithm.
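
A rough sketch of the triangulation idea, under strong assumptions: four microphones at known positions in the keyboard plane, perfectly measured arrival-time differences, and no echoes. The microphone layout, keyboard dimensions, and function names are all hypothetical. It brute-force searches a grid for the point whose predicted time differences best match the measured ones.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def tdoa(source, mics):
    """Arrival-time differences (seconds) of each mic relative to mics[0]."""
    times = [math.dist(source, m) / SPEED_OF_SOUND for m in mics]
    return [t - times[0] for t in times[1:]]

def locate(measured, mics, width=0.45, depth=0.15, step=0.005):
    """Grid-search a width x depth area (metres) for the best-matching point."""
    best, best_err = None, float("inf")
    nx, ny = int(round(width / step)), int(round(depth / step))
    for i in range(nx + 1):
        for j in range(ny + 1):
            p = (i * step, j * step)
            # Squared mismatch between predicted and measured time differences.
            err = sum((a - b) ** 2 for a, b in zip(tdoa(p, mics), measured))
            if err < best_err:
                best, best_err = p, err
    return best

# Hypothetical mic positions (metres) around a 45 cm x 15 cm keyboard.
mics = [(-0.10, -0.10), (0.55, -0.10), (-0.10, 0.25), (0.55, 0.25)]
true_key = (0.10, 0.05)
estimate = locate(tdoa(true_key, mics), mics)
```

In practice the time differences would have to be estimated from the recordings (for example by cross-correlation), and resolution is limited by the sample rate: at 48 kHz one sample corresponds to about 7 mm of path difference, against a key pitch of roughly 19 mm.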


This is super cool. Very similar to timing attacks. I wonder if there is a way to tune the model: essentially tailor the probability of each key to a person, using maybe a sentence or so of their typing.


Yeah, this is exactly the sort of thing where augmenting the base model with a language model will have some good returns...


In keytap2 I'm trying to make use of the statistical distribution of n-grams in the language. The idea is to first group the unknown keys into clusters based on how similar they sound. The prediction is then performed by breaking the resulting substitution cipher (assuming each cluster corresponds to a letter).
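
To illustrate the second step only, here is a toy sketch of scoring candidate cluster-to-letter mappings with bigram statistics. It is not keytap2's code: the tiny training text, the cluster IDs, and the function names are made up for the example, and a real attempt would use bigram counts from a large corpus plus a search (such as hill climbing) over mappings.

```python
import math
from collections import Counter

# Toy training text standing in for real language statistics.
CORPUS = "the cat sat on the mat and the hat sat on the cat"

def bigram_model(text):
    """Return a function giving the smoothed log-probability of a bigram."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    vocab = 27 * 27  # 26 letters plus space, squared
    # Add-one smoothing so unseen bigrams keep a small non-zero probability.
    return lambda bg: math.log((counts[bg] + 1) / (total + vocab))

def decode(clusters, mapping):
    """Turn a sequence of cluster IDs into text under a candidate mapping."""
    return "".join(mapping[c] for c in clusters)

def score(clusters, mapping, logp):
    """Sum of bigram log-probabilities of the decoded text (higher is better)."""
    text = decode(clusters, mapping)
    return sum(logp(text[i:i + 2]) for i in range(len(text) - 1))

logp = bigram_model(CORPUS)
clusters = [0, 1, 2, 3, 1, 4, 0]  # hypothetical cluster IDs for seven keypresses
good = {0: "t", 1: "h", 2: "e", 3: " ", 4: "a"}  # decodes to "the hat"
bad = {0: "h", 1: "t", 2: "e", 3: " ", 4: "a"}   # decodes to "hte tah"

print(score(clusters, good, logp) > score(clusters, bad, logp))
```

With such a short message the best-scoring mapping is not guaranteed to be the correct one, which is why longer recordings help this kind of attack.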


There is more research on this than what you have cited here. Unfortunately, the literature goes under the catchy keyword "acoustic keyboard emanations"; try that in Google Scholar.


Can this be combined with, or does it already use, the position of the keys relative to the microphone? Seems like a good way to map where keys are (quieter keypresses, i.e. lower decibels, would be keys further from the mic).
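
As a back-of-the-envelope sketch of the loudness idea: in a free field, the level of a small sound source drops by about 6 dB for every doubling of distance, so a level difference between two keypresses can be turned into a distance ratio. The function name is hypothetical, and the estimate assumes both keys are struck equally hard with no reflections, which is rarely true on a real desk.

```python
def distance_ratio(level_near_db, level_far_db):
    """Estimate how many times further the quieter key is from the mic.

    Uses the free-field inverse-distance law:
    level difference (dB) = 20 * log10(d_far / d_near).
    """
    return 10 ** ((level_near_db - level_far_db) / 20)

# A keypress measured about 6 dB quieter is roughly twice as far away.
ratio = distance_ratio(-20.0, -26.0206)
```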


Primary use case is more tracking and surveillance, I presume?


For the technology itself, sure.

For this particular open-source hobbyist project, not likely; those who're most likely to attempt to use such a tactic maliciously are likely to keep it to themselves rather than demonstrate that the security concern exists.


Primary use case is hacking. A keylogger without any software or trace on the target computer.


This is another reason why you want 2FA.


Someone has seen the movie Sneakers! The blind guy did this in that movie.


In what scene? I don't think he actually did... I'm pretty sure when they're trying to crack the guy's password they're using video to record and zoom in to watch him and he's blocking it.


definitely keep buying mechanical keyboards....



