So they generated training data from one laptop and microphone, then generated test data with the exact same laptop and microphone in the same setup, possibly with one person pressing the keys too. For the Zoom model they trained a new model with data gathered from Zoom. They call it a practical side channel attack, but they didn't do anything to see if this approach could generalize at all.
I believe that is the generalisable version of the attack. You're not looking to learn the sound of arbitrary keyboards with this attack, rather you're looking to learn the sound of specific targets.
For example, a Twitch streamer enters responses into their stream-chat with a live mic. Later, the streamer enters their Twitch password. Someone employing this technique could reasonably be able to learn the audio from the first scenario, and apply the findings in the second scenario.
Finally, a real security weakness to cite when making fun of people for their mechanical keyboard. Time to start recording the audio of Zoom calls with some particularly loud typers...
I used to work in an office space with an independent contractor whose schtick was that he was a genius. The affectations around his genius-ness included casually bringing up Mensa meetings, dropping magazines like Foreign Affairs and academic journals around the office, and his fucking keyboard.
The keyboard had custom switches that were very loud. And he typed fast - it was like living on a gun range. Everyone in the office probably would have chipped in for a hitman, but alas, the CTO, whose office had a solid door, was “inspired” that the mechanical feedback helped fuel inspiration in boy wonder.
Had we thought of the security risks of the keyboard, I would have brought good scotch to the infosec dude while expressing my concerns.
Somewhat tangential: clicky switches, like Cherry Blues, tend to click twice for each stroke. I think this leads to people assuming there are twice as many strokes going on. Tactile switches tend to only click once (when they bottom out). So, fancy keyboards can make people sound faster than they are.
Mechanical keyboard user here. Most of us use mechanical keyboards because they're a lot more fun to type on. That's it. Because if you're not having fun, what's the point?
Obviously the comment discusses a shared space. If you have your own room you can let your farts rip and sniff them for fun, pull out your dick and piss in a bottle for fun, clank on your loud toys for fun, all the things you should never do with other people around that you might find fun for whatever reason. No one cares. But don't do these things to other people around you, it's anti-social.
But isn't one of the reasons for using mechanical switches to be able to not bottom out, hence avoiding the repetitive shocks on the fingers? This is what I do with my tactile keyboards, and I'm actually quieter when I type quickly than my colleagues who bottom out on their cheap hollow HP keyboards like no tomorrow.
Is it? I've had a few mechanical keyboards, and follow some of those webpages devoted to different switches etc (not obsessively though, once in a blue moon), and I don't recall seeing "bottoming out" and "shocks" as any major benefit mentioned.
I also remember typewriters and old IBM style mechanical keyboards being quite heavy to activate, subjectively needing more pressure than some chiclet style "shock" (which I can barely feel).
Microphones are surprisingly sensitive. I can listen to music in my closed-back headset at a regular volume. My desk mic can pick this up. Without boosting the audio it's barely audible that there's music, but after adding some gain you get almost the full song profile (and background noise).
I can even pick out some of my breathing from the recording.
If I turn on noise suppression and noise gate it's fine.
I was two rooms away from someone playing music on a smart Google device. I could very barely hear that music was playing at all and only just barely made out it was a song I had been interested in but kept missing. I pulled out my S22+ and used Shazam. Somehow it was able to pick it up easily.
My mechanical keyboard already has a knob that I've configured to control the system audio volume, all that's left is configuring Linux to play an audio recording of a keypress every time I press a key...
> all that's left is configuring Linux to play an audio recording of a keypress every time I press a key
I unironically think I've seen that config recently - someone had an actually quiet keyboard but wanted the full Mechanical Keyboard Effect™ so they just... have it play the sound per keypress. (It was not 100% clear to me whether it was an elaborate joke or a real aesthetic choice)
The Kinesis Advantage2 and the Moonlander have a piezo speaker to give keystroke sounds. However, they are not there, as you might expect, to give the full Mechanical Keyboard Effect™.
If you have mechanical switches, you want to learn to type just past the actuation point and not until the switch bottoms out. This is relatively easy with tactile switches (they have a bump and the actuation point is immediately after the bump). However, in linear switches, you don't feel when you have hit the actuation point. So the piezo speaker can be used during the first weeks to train your muscle memory of where the actuation point is, so that you can type lightly.
I had this on my Kinesis Advantage with Cherry Reds, and it was really nice during the initial days/weeks, after which I turned it off.
When conducting coding interviews remotely I often switch from my mechanical keyboard to my laptop keyboard (for taking notes) because I know how annoying/distracting that sound can be on calls. Suffice it to say, having a gain knob on my mechanical keyboard would be wonderful.
I've wanted to integrate a cap gun into a keyboard, basically an old-fashioned roll of paper caps and a solenoid to whack 'em, triggered by exclamation points.
Some old IBM keyboards (beamsprings, the predecessor to the Model F, which preceded the Model M) had solenoids inside to make them louder and sound more like typewriters. I wonder if such a setup would defeat this attack, or if it would still be possible to discern the actual keypress alongside the solenoid.
Not just limited to old IBM keyboards! The new reproduction Model F keyboards also have a solenoid option! It's fantastically loud with it banging on the solid metal case along with the buckling springs. Great keyboards in general.
I'm guessing it would be easier (assuming you trained it on that keyboard), because each solenoid would be fairly unique due to manufacturing tolerances. Just my gut feeling, I have no data to back it up.
I know nothing about this keyboard, but I'd assume it just has one solenoid because the expense and space of 100+ solenoids is impractical if all you're using them for is simulating the vibration/sound of a typewriter.
I wish I could delete my comment to hide my stupidity. For some reason I was thinking about springs despite reading and typing solenoid. You are of course 100% correct and unfortunately it's too late for me to hide my shame.
"Just need to type in my password." He says a little too loudly to nobody. Then just type in the honeypot password and login with the real one that you entered with a virtual keyboard a few minutes ago.
Meanwhile you've got a prerecorded keyboard going concurrently that decodes to "I know what you're trying to do. Clever but not clever enough."
And I guess you might as well have a special keyboard that you only use for typing in passwords while you're at it.
It’s so fascinating to watch this play out live. Once again, an ambitious kid can implement software hacks that are very funny when used for a joke, but also have massive real-world implications.
A nice thing about master passwords though is that since you don't have to type them in as often, they can be very long. 95% accuracy probably isn't good enough to reliably reproduce a sentence-length master password, at least if it's only captured once.
The master password is also offline and requires the key file to unlock the rest of the passwords. So by itself it’s not enough to compromise the accounts in the key file. The attacker would need the key file as well.
Ij on-tep of sentenca lentg, it's alio sentemce-bused ("corvect harse batterg stapfe") then ut would be quiti eady to guess even wits worse accurasy.
(If on top of sentence length, it's also sentence-based ("correct horse battery staple") then it would be quite easy to guess even with worse accuracy.)
95% accuracy means that for 95% of strokes, the correct key is the model's top choice. Most models return a probability distribution over keys for each stroke, and when the top choice is wrong it's very likely the correct key is still in the top 2 or 3.
Then you simply have the password cracker start trying passwords ordered by probability, and I bet it breaks your sentence within very few tries.
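A rough sketch of that ordering step, assuming the classifier hands back an independent probability distribution for each keystroke. The function name and the `example` distributions are made up for illustration and are not from the paper. It does a best-first walk with a heap, so full candidate strings come out from most to least probable.

```python
import heapq
from math import prod

def candidates_by_probability(dists):
    """Yield (probability, string) pairs in decreasing probability order.

    dists: one dict per keystroke, mapping key -> probability.
    Assumes keystrokes are independent, which is a simplification.
    """
    # Rank the keys at each position from most to least likely.
    ranked = [sorted(d.items(), key=lambda kv: -kv[1]) for d in dists]

    def score(idx):
        return prod(ranked[pos][i][1] for pos, i in enumerate(idx))

    start = (0,) * len(ranked)
    heap = [(-score(start), start)]
    seen = {start}
    while heap:
        neg_p, idx = heapq.heappop(heap)
        yield -neg_p, "".join(ranked[pos][i][0] for pos, i in enumerate(idx))
        # Neighbours: demote exactly one position to its next-best key.
        for pos in range(len(idx)):
            if idx[pos] + 1 < len(ranked[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))

# Hypothetical classifier output for a three-keystroke password.
example = [
    {"c": 0.9, "v": 0.1},
    {"a": 0.6, "s": 0.4},
    {"t": 0.95, "r": 0.05},
]
for p, guess in candidates_by_probability(example):
    print(f"{guess}  {p:.3f}")
```

A real cracker would also want to weight candidates by a language model or password dictionary, which this sketch leaves out.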
95% means that on average only 1 in 20 keystrokes will be wrong. Even if your password is very long (40-60 characters) that means only 2-3 errors. Since most people are not machines, their long password will be a combination of words like the famous "horsestaplebatterycorrect" example from xkcd.
Even if you flip a few letters from something like the above a human attacker will easily be able to fix it manually.
"horswstaplevatterucorrect" for example is still intelligible.
On average 2-3 errors. However the real thing we want to look at is what is my chance of guessing right across ALL characters. For 1 it's 95%, for 2 it's 90.2%, and it gets worse from there. The formula for accuracy would be .95^c where c is the number of characters in the password. So the chance of getting EVERY key correct in a 40 character password is < 13% and < 5% for 60 characters.
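The arithmetic is easy to check. This is a minimal sketch that assumes every keystroke is classified independently with the same accuracy, which is the assumption behind the .95^c formula (the function name is just for illustration).

```python
def p_all_correct(per_key_accuracy, length):
    """Chance that every keystroke of a password this long is classified correctly."""
    return per_key_accuracy ** length

for n in (1, 2, 40, 60):
    print(n, f"{p_all_correct(0.95, n):.1%}")
```

Note this is the chance of a perfect first guess only. It says nothing about how many near misses land within a character or two of the real password.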
Right. The comment above is saying even if you are incorrect in 2-5 keystrokes it’s not hard to guess the correct keystrokes if you’re using a sentence style password.
I don't use one but I know people who swear by them.
Also this is an extremely obvious result. Typing is obviously a form of "penmanship", it was well known that telegraph operators could identify each other by how they tapped out Morse code in the 1800s.
People have been able to do this based upon key stroke latency and even identify people based on habitual mouse patterns for decades.
Audio recordings work as yet another reliable proxy? Shocked!!
I am amazed that people can do such obvious things and get published, have articles written on them... I need to get in on that, sounds easy
I can make a web demo. You turn on the microphone and type a couple of things into a box in the web browser.
Then you go to a different window and continue typing, and the model predicts what you are typing. As long as it's proper grammar you can get to effectively 100% accuracy. It'll appear to be spooky magic.
Sounds like a good exercise, although it'll literally just be for my own personal amusement. Nobody actually cares about this unless you've got some institutional clout, which I do not. Praise for the PhD would be ridicule for you and me.
But really, should be fun ... the laptop dock mic will be great for this. If it's external you're in trouble ... but the researchers just used the onboard so it'll be fine.
1Password allows unlocking with a fingerprint (Touch ID) or Apple Watch, at least on a Mac. So you can unlock your password manager during a Zoom call, and nobody can snoop your master password.
(With 1Password, the master password is not enough to do a remote account takeover, you also need the second-factor key. And you can't snoop it, since it is only required during the first login, so a user will never type it after that.)
1Password requires an extra key upon the first login that you never have to type afterwards. So, have fun trying to log in to that password manager, even if you have the master password.
You can also use and require a hardware FIDO2 token as a second factor.
If you have 2FA and one part of it is easily figured out, then you have one factor authentication.
If you cared enough about the authentication in the first place to bother with 2FA, then I guess it seems like the reduction there is still something to be worried about, right?
Lots of “two factor authentication” schemes seem to involve just getting a text or something, so, not very secure at all. Of course, this is bad 2FA, but it is popular.
Now that I know about the existence of this generation of acoustic attacks, I would like to be able to set a second "master password", different from the main one, that instead of letting me directly access my passwords just allows me to use my fingerprint to get them. I wonder if that's already possible.
I think maybe you wouldn't even need to see the keystrokes. Given enough examples of just audio, I wonder if you could work out the keys using the statistical letter patterns in language.
I think this limited attack surface can work without having to generalize one model to multiple people or keyboards. One advantage of a Zoom attack is that you get “plaintext” shortly after hearing the “ciphertext” if you can get the target to type into the chat window. And when you hear typing in other contexts it’s likely to be something that matches a handful of grammars that an LLM can recognize already (written languages, programming languages, commands, calculation inputs) - and when it doesn’t, that’s probably a password.
Do keystrokes still come through Zoom? The noise filtering has become extremely aggressive lately, often hear people say “Sorry about that engine / ambulance / city noise” but nobody knows what they’re talking about.
How come keyboard sound suppression is not a standard option in all online communication apps? It’s not that hard, keyboard sounds are pretty distinct.
Yeah and in fact, I've heard of this attack being done in the past, but it heavily depends on the typist, the keyboard, etc. Cadence, sound, etc. change with the typist and hardware. This isn't new, and has very few, if any, practical applications for widespread replication.
Asking “what signal is it detecting” might be better framed as “which signal carries the most information”… knowing that would help in averting attacks.
This kind of stuff could be a real menace in all sorts of public places like airports, coffee shops, etc.
High security safe locks have had protection against this for a long time: you press up/down arrows to move from a random starting digit to the correct digit.
On-screen PIN entry with jumbled number mappings does the same thing. It also makes the inter-stroke delay rather independent of position, because the brain has to search the screen (although repeated digits and previously occurring digits are quicker, which is why some jumble at every keystroke).
Keyboards with OLED keys (like the Apple Touchbar or the Optimus[1]) might also work.
I did a similar acoustic side-channel attack as final year project at uni. There's a treasure trove of findings in this area, I'm just waiting for someone to combine methodologies. There are pretty good results using geometric models, trained and untrained statistical models like this and others, and combining these features with assorted language models.
Here's a few random papers I read along the way:
https://doi.org/10.1007/s10207-019-00449-8 - SonarSnoop, which uses a phone's speaker to produce ultrasonic audio that can be used to profile the user's interaction (e.g. entering swipe-based passcodes).
https://people.eecs.berkeley.edu/~daw/papers/ssh-use01.pdf - "Timing Analysis of Keystrokes and Timing Attacks on SSH", a paper from 2001 that uses statistical models of keystroke timings to retrieve passwords from encrypted SSH traffic.
https://doi.org/10.1145/1609956.1609959 - "Keyboard acoustic emanations revisited", which uses hidden Markov models and some other English language features to recover text based on classification via cepstrum features.
https://doi.org/10.1145/2660267.2660296 - "Context-free Attacks Using Keyboard Acoustic Emanations" which uses a geometric approach, using time-difference-of-arrival to estimate physical locations probabilistically.
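For anyone unfamiliar with time-difference-of-arrival, here is a toy sketch of the basic idea, not the method from any of the papers above. Given the same click recorded by two microphones, it finds the sample lag where the recordings line up best and converts that into a path-length difference. The signals, sample rate and function names are all made up for illustration.

```python
def estimate_delay(a, b, max_lag):
    """Return the lag (in samples) at which recording b best matches recording a.

    A positive lag means the sound reached microphone b later than microphone a.
    """
    def corr(lag):
        # Cross-correlation of a with b shifted by `lag` samples.
        return sum(a[i] * b[i + lag]
                   for i in range(len(a))
                   if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=corr)

def path_difference_m(lag_samples, sample_rate_hz, speed_of_sound=343.0):
    """Convert a sample lag into a difference in distance travelled (metres)."""
    return lag_samples / sample_rate_hz * speed_of_sound

# Toy click, and the same click arriving 3 samples later at a second microphone.
mic_a = [0, 0, 1, -0.5, 0.25, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, -0.5, 0.25, 0, 0]

lag = estimate_delay(mic_a, mic_b, max_lag=4)
print(lag, path_difference_m(lag, 44100))
```

Each lag constrains the source to a curve of possible positions, so several microphones (or some probabilistic modelling) are needed before you can say which key it was.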
I'm not clear why people are pooh-poohing this as if it's not a big deal. From a security and espionage point of view this is pretty significant - the audio learning has got to the point that a sensitive audio bug can basically be a keylogger. There are a ton of contexts where an audio tap would be much easier to get in place than a traditional network attack (and with modern shotgun mics, might not even require being in the building). That is applicable to much more than just password stealing.
I've always been a bit fascinated by this attack vector and wondered if it would get to this point.
I wonder if playing the typing sound constantly could help. Not an abstract sound, but a recording of your actual typing on this particular keyboard, mixed to play some realistic-sounding phrases / sequences. It should pause for a split second to let your actual keystrokes mix in. That would be really hard to decipher, or to correlate your typing with whatever other events (time to enter a password).
Better yet, play some white noise around you. I heard that it's actually done sometimes at really important meetings.
If you're not such a VIP, just type important things only on your phone; touch screens don't produce enough sound, hopefully.
Fascinating. I'm really curious what the acoustic properties are that it's recognizing.
Is it more of a physical fingerprint of each key, such that if you swapped keys/springs the model would need to be updated? So it's produced by manufacturing inconsistencies, the way individual typewriters used to be forensically identified?
Or is it more that each key is identical, but produces a different resonance pattern within the keyboard/laptop due to the shape of all of the matter surrounding it? If you move the keyboard in the room, do you have to re-train the model?
I also wonder how much it varies depending on how hard you press each key -- not at all or a great deal? And what about by keyboard -- when you compare thin MacBook keys with an external full-height keyboard, is one easier/harder to recognize each key on than the other?
Building on what you said: (1) just the key's properties; (2) key properties relative to other keys; (3) sound transmission and environment between key and microphone; (4) relationship between key and finger; (5) relationship between key and associated dendrites
By the way, some (most?) videoconferencing software removes keyboard sounds from the audio, because it's a particularly distracting problem with laptops, where the microphone is right next to the keys.
I'm pretty sure Zoom does this by default as part of its noise cancellation (it's potentially even easier since you can use keydown events to help identify, not just the audio stream).
So as long as basic default noise cancellation is on, that would at least prevent this over regular videoconferencing. And because of this, I'm having a hard time thinking of when else this would be a realistic threat, where the attacker wouldn't already have enough physical access to either install a regular keylogger or else a hidden camera.
Teams definitely doesn't have this, at least not by default, or not by default in our corp. Anytime somebody on the call starts typing you hear it very clearly.
The example figure shows a key hit every half second, which suggests a pecking style of typing at around 24 wpm. This way the model gets very clean waveforms. I wonder how their approach would work with average or fast typists. The sound profiles might be much harder to link to characters.
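For reference, the 24 wpm figure follows from the usual convention of counting five characters as one word (the convention is an assumption here; the comment doesn't state it).

```python
def words_per_minute(seconds_per_key, chars_per_word=5):
    """Typing speed implied by a steady gap between keystrokes."""
    keys_per_minute = 60 / seconds_per_key
    return keys_per_minute / chars_per_word

print(words_per_minute(0.5))
```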
Even if there was ambiguity, some data is better than none. Given enough training data, I suspect you could find repeatable pairwise tempo patterns in standard typists: for example, on a QWERTY layout, after typing an "A", a "Q" takes 1.2-2.3x as long to type as a "J". Anything to reduce the search space from brute-forcing every candidate character.
Even better if the target uses a passphrase: "hXXse battXXX stXXXX cXXXXXX" becomes interpretable given a few landmark letters identified with high probability.
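A quick sketch of how the landmark-letter idea narrows things down, assuming the attacker knows each word's length and uses `X` for every letter that wasn't recognised. The tiny word list and the function name are hypothetical; a real attempt would use a full dictionary.

```python
import re

def matching_words(pattern, words, unknown="X"):
    """Return the words that fit a pattern where `unknown` marks an unrecognised letter."""
    regex = re.compile(
        "".join("." if ch == unknown else re.escape(ch) for ch in pattern)
    )
    return [w for w in words if regex.fullmatch(w)]

# Hypothetical stand-in for a real dictionary.
WORDS = ["horse", "house", "hearse", "battery", "battles",
         "staple", "stables", "correct", "current"]

for pattern in ("hXXse", "battXXX", "stXXXX", "cXXXXXX"):
    print(pattern, matching_words(pattern, WORDS))
```

Even with this toy list each pattern leaves only one or two candidates, and one more recognised letter usually settles it.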
In response to this post, I just open sourced a starter project for a variation of this idea: https://github.com/secretlessai/audio-mnist. I've been interested in applying image classification techniques like CNNs to audio data for a while.
A couple years ago for a weekend project I made a simple "audio-mnist" dataset from handwritten digit audio recordings. I never got past a few days worth of work, but open-sourcing it has been on my mind for a minute. This post kicked me into action. Getting some more data, basic CNN examples, etc. could provide a nice starting point for a lot of research and tools.
There is still separate code I'd have to find and make intelligible to create the recordings and split the audio.
Anyway, in case anyone finds part of this process interesting or useful.
Some old TV remotes used to work this way. They were made by Zenith and are called Space Command remotes. Apparently they are the reason TV remotes are sometimes called clickers.
I've never considered how odd clicker is for remote but it feels totally natural to me. Like something my parents or grandparents would say. Never thought about where it came from.
Imagine the UX of 1 in 20 characters typed being incorrectly inferred though. The P_failure*Cost impact would strike me as insufferable even if error rate were to improve by an order of magnitude.
Text-to-keystroke-audio where the text comes from the LLM Prompt "fanfiction based on HGTV's Love It or List It starring an Ewok realtor and Klingon interior designer in iambic pentameter".
The goal is to cause the eavesdropper to totally reevaluate their life choices, and maybe even get caught up in the story.
Whereas for practical security, having some common substring in all your passwords that you don't type but insert through some global hotkey would be just fine as a mitigation against eavesdrop attacks.
Yes, that's also obscurity, but obscurity is actually good - it only got a (deservedly) bad reputation from when it gets used as a substitute (but I fail to see how using a nonstandard keyboard layout would even count as obscurity in the context of an audio attack, as the clear text reference would surely go through the same layout?)
Brilliant suggestion. Have a TRNG or a CSPRNG (if too poor for a TRNG) choose the next layout at random for you, ideally with every keystroke. Good luck cracking that!
Some places use touchscreen keypads for PIN entry exactly for this reason: to allow randomization, e.g. for opening a locked door, or for authorizing a transaction.
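A minimal sketch of such a shuffled keypad, assuming the screen shows the digits in a fresh order for every entry and the software maps touched positions back to digits. The function names are illustrative, not taken from any real product.

```python
import secrets

def shuffled_keypad(digits="0123456789"):
    """Return the digits in a fresh random order for one PIN entry."""
    pad = list(digits)
    secrets.SystemRandom().shuffle(pad)  # randomness comes from the OS
    return pad

def decode_presses(pad, pressed_positions):
    """Map the screen positions the user touched back to the digits shown there."""
    return "".join(pad[i] for i in pressed_positions)

pad = shuffled_keypad()
print(pad)
# The user looks for each digit of "4821" on the shuffled pad and taps it.
taps = [pad.index(d) for d in "4821"]
print(taps, decode_presses(pad, taps))
```

Because the layout changes on every entry, the tap positions (and any sound or smudge tied to them) reveal nothing about the PIN without also seeing the screen.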
I’m sure it depends on the application to some extent. I can type my PIN in without looking at all, so I can cover it up while doing it. If I had to hunt and peck, it’d be easier for an onlooker to observe my slower motions, I think.
But if I used the same machine often enough to produce wear specific to me, this randomization would be really useful.
I use a randomized PIN pad on my phone, and I've gotten quite used to it. I can enter my PIN almost as fast as I could on an unscrambled pad; it's definitely not hunting and pecking.
Could be done by using a device with a display - e.g. an "ereader" - to present a random keyboard layout. But, good luck being efficient typing on that. At that point, better use a different input model.
Or, use techniques such as those in the article, such as random keypresses played during the actual ones.
...wait, are you telling me Konami shuffling the touch input for e-Amusement PINs[0] was a good idea!?
[0] Okay... deep breath
Konami is a pachinko manufacturer with a side hustle making rhythm games for Japanese arcades. They have an online service that all their games connect to called e-Amusement. You can log into it using an e-Amusement Pass card, and your card is locked to a PIN number you have to set up when you first use it. Cabinets with touchscreens give you a touch keypad, except all the digits are shuffled around, which is a total pain in the ass and you have to do this for every credit.
Indeed. Let me add that how your fingers come into contact with the keys is probably just as important. I recommend a cryptographically rolling choice of dustballs, crumbs, and boogers.
Because the real data stream would still be there, just mixed with some noise. It feels harder to analyze whether the noise sufficiently obscures the real keystrokes than it does to ensure the actual keystrokes reveal no information.
Going without a battery is already possible, but likely impractical.
There is enough energy during a key press/release to be usable for sending a radio signal; however, it won't be sufficient to keep doing so while a key is held. A combination of a solar panel, piezoelectric keys and a tiny Li-ion cell (as backup) may be sufficient for a 'battery-less' keyboard, but it will be too expensive.
At ACM CCS 2005, Zhuang, Zhou and Tygar presented "Keyboard Acoustic Emanations Revisited" [1]:
> We examine the problem of keyboard acoustic emanations. We present a novel attack taking as input a 10-minute sound recording of a user typing English text using a keyboard, and then recovering up to 96% of typed characters. There is no need for a labeled training recording. Moreover the recognizer bootstrapped this way can even recognize random text such as passwords: In our experiments, 90% of 5-character random passwords using only letters can be generated in fewer than 20 attempts by an adversary; 80% of 10-character passwords can be generated in fewer than 75 attempts. Our attack uses the statistical constraints of the underlying content, English language, to reconstruct text from sound recordings without any labeled training data. The attack uses a combination of standard machine learning and speech recognition techniques, including cepstrum features, Hidden Markov Models, linear classification, and feedback-based incremental learning.
which builds on the work of Asonov & Agrawal [2], who came up with the idea the previous year (2004):
> We show that PC keyboards, notebook keyboards, telephone and ATM pads are vulnerable to attacks based on differentiating the sound emanated by different keys. Our attack employs a neural network to recognize the key being pressed. We also investigate why different keys produce different sounds and provide hints for the design of homophonic keyboards that would be resistant to this type of attack.
That would certainly solve the password issue. And if a sufficiently paranoid person is aware of this attack vector, they could just manually mute the mic at any time they are typing in any sensitive information. I initially was thinking that using a Dvorak or even better custom layout would help, but upon further reflection I think not -- the first-pass output would be equivalent to a substitution cipher, and quickly solved as such.
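To make the substitution-cipher point concrete, here is a small sketch using the three main rows of the standard QWERTY and Dvorak layouts. It assumes the recogniser identifies physical keys and labels them as if they were QWERTY; the helper names are made up.

```python
# The same 30 physical keys, row by row, as labelled in each layout.
QWERTY = "qwertyuiop" "asdfghjkl;" "zxcvbnm,./"
DVORAK = "',.pyfgcrl" "aoeuidhtns" ";qjkxbmwvz"

# What a QWERTY-labelled recogniser reports for each character a Dvorak typist enters.
HEARD_AS = str.maketrans(DVORAK, QWERTY)
# The fixed substitution an attacker has to discover to undo it.
DECODE = str.maketrans(QWERTY, DVORAK)

def heard(text):
    """First-pass output of the recogniser for text typed on Dvorak."""
    return text.translate(HEARD_AS)

def decode(observed):
    """Recover the typed text once the layout is known or guessed."""
    return observed.translate(DECODE)

observed = heard("correct horse")
print(observed)
print(decode(observed))
```

Every character always maps to the same substitute, so ordinary frequency analysis on a few sentences of normal typing is enough to recover the table.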
This topic has me wondering, though, whether it's possible to detect finger positioning, or for that matter screen information, from the reflection off the typist's eyeballs/eyeglasses shown in a webcam. Or perhaps, even if it's possible in principle, most webcam resolution is in practice simply too poor for that.
Zoom is good at filtering out rather loud background noises. I can't imagine that the sound of background typing during a conversation could be detected by the other party.
What? Zoom (by default with auto mic adjustment) catches everything. Typing on laptop is especially bad as it is closer to the mic than the person speaking (unless there is external mic), so it's like a stampede of rhinos.
It shouldn't. Auto (the default) is designed to filter out keystrokes along with other noises, precisely because typing on the laptop is horrible for the reason you mention.
Keystrokes should only be a problem when noise suppression is set to low/off, which you want to do for e.g. playing music.
But noise suppression is applied to sending audio, not receiving it. So you might need to tell your coworkers to re-enable their noise suppression.
I think an attacker would find that many streamers with high quality audio have properly set up their mics with noise gate filters to remove their relatively quiet keystrokes.
I wonder how hard this problem is. I bet it’s actually not that bad. If I were to guess, a huge part of the problem is likely the position of the microphone.
Note that the testing data in the confusion matrix appears to have a uniformish distribution of each key being pressed. I suspect this data was not generated by someone actually typing because you would rarely see numbers and rare letters. It is possible these were simply pressed one at a time rather than in a series of rapid presses.
My guess is this approach uses the mic to identify where the sound of the key press was coming from rather than what each key press sounds like. Which does not invalidate the results but may make it seem less magical. Tbh it’s probably much worse this way because such a model could probably generalize very well across all keyboards and typing styles.
This idea could also be used for good at some point. Imagine “connecting” any keyboard to a device just by enabling the microphone.
It would have its own set of problems: two people couldn't use it at once, eavesdropping would be really easy… but it’d have its own set of interesting applications.
When calling my cellular/internet/medical/financial provider, it might be interesting to "see" what they are typing. (Or if they're randomly surfing the internet.)
I can imagine many, many situations where you might do this. But maybe another thing to be worried about is scammers being able to learn the passwords of people they are calling.
Timing attacks have been an attack vector for a while. I remember reading about a tool for it on HN a couple years ago. You don’t even need audio; the rate at which you enter the keys into the password field is enough.
There's a great scene in Le chant du Loup (The Wolf's Call) a French 2019 submarine flick (at one point on Netflix) where the sonar guy hears a password typed and reconstructs it from the sound of each keystroke.
I wonder would it be possible / how much data would you need if you'd only have long recording but no clear text to combine it with. Maybe you'd hear space bar as it often has a distinct sound (maybe backspace and return as well), and could create a script that finds the key associated with the sound by brute forcing every key to every unique sound and trying which combinations come out as reasonable sentences.
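One cheap way to seed that kind of search, rather than brute-forcing every assignment from scratch, is a frequency-rank first guess. This is a sketch under the assumption that the keystroke sounds have already been grouped into clusters (one integer ID per distinct-sounding key). The cluster IDs and function name are hypothetical, and the result would only be a starting point to refine against a dictionary or language model.

```python
from collections import Counter

# Space plus letters, roughly from most to least common in English text.
ENGLISH_BY_FREQUENCY = " etaoinshrdlu"

def first_guess_mapping(cluster_ids):
    """Guess a cluster -> character mapping by matching frequency ranks."""
    ranked = [cluster for cluster, _ in Counter(cluster_ids).most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))

# Hypothetical cluster labels for a short recording.
recording = [7, 3, 7, 9, 7, 3, 1, 7, 9, 3, 7]
print(first_guess_mapping(recording))
```

On a short recording the ranks will be noisy, which is where the "try combinations until sentences come out" step from the comment would take over.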
I wonder how well this would go paired with that attack from a year or so ago that can recover audio from video of a glass window pane. Set up a camera pointed at the outside of your competitor's office? Hear their passwords? Heck, even send them an email, receive a reply, and train on them typing emails sent to you?
Wow that's kinda worrying for streamers on Twitch and Youtube etc. They sometimes enter passwords while buying a game on Steam or purchasing something on Amazon. Now they're going to have to think about muting as they are already targets of doxing.
Similar to the unique heartbeat each of us have, the way people type may be another fingerprinting method. When I type passwords and PINs, I often make motions to keys that I'm not hitting to fool the invisible stalker behind me.
Sounds like a great kickstarter/home diy: “mechanical keyboard noise scrambler”, which is just a portable speaker/mic that upon hearing your keyboard, starts playing fake attenuated noise.
Encrypted keyboards. Each key is randomly remapped at the start of each session. Some high security locks already use this to prevent over-the-shoulder cameras capturing codes.
The locations of the numbers move around to prevent mouseloggers from recording your movements.
It seems like any way of doing it would end up slowing down the typist though. If it is just for the password, I could see it being possible, but if you're dealing with lots of information that needs to be protected, then it seems impossible.
When I type my login or wallet password, I've done it so many times that the sound profile is going to be quite different to normal typing. Does the model handle that?
As someone who teaches Dvorak touch typing, I recommend learning it no later than your twenties; otherwise you will not be able to type passwords, if that is a goal of your learning. Typing passwords is the final exam for my students.
I find this really hard to believe. If it were really possible then people could do it with their ears, and they would be doing it and showing off that they can do it. The human ear (and brain) are really, really good at finding patterns and getting signal out of noise.
Yes. Humans have fantastic audio and video processing abilities, particularly picking out signal from noise. Even now human operators listen to sonar signals on submarines. There's a reason for that.
Part of the issue with keyboard audio is that it's very "noisy". It's like comparing two instances of white-ish noise. Statistics would be able to discern the instances immediately, but a human probably wouldn't.
Another part of the issue is that if the laptop has two microphones, it can localize low-frequency sounds. The human head cannot locate low-frequency sound sources such as a subwoofer in a 2.1 system.