New acoustic attack steals data from keystrokes with 95% accuracy (bleepingcomputer.com)
455 points by mikece 9 months ago | 227 comments



So they generated training data from one laptop and microphone, then generated test data with the exact same laptop and microphone in the same setup, possibly with one person pressing the keys too. For the Zoom model they trained a new model with data gathered from Zoom. They call it a practical side channel attack, but they didn't do anything to see if this approach could generalize at all.


I believe that is the generalisable version of the attack. You're not looking to learn the sound of arbitrary keyboards with this attack, rather you're looking to learn the sound of specific targets.

For example, a Twitch streamer enters responses into their stream-chat with a live mic. Later, the streamer enters their Twitch password. Someone employing this technique could reasonably be able to learn the audio from the first scenario, and apply the findings in the second scenario.


Finally, a real security weakness to cite when making fun of people for their mechanical keyboard. Time to start recording the audio of Zoom calls with some particularly loud typers...


I used to work in an office space with an independent contractor whose schtick was that he was a genius. The affectations around his genius-ness included casually bringing up Mensa meetings, dropping magazines like Foreign Affairs and academic journals around the office, and his fucking keyboard.

The keyboard had custom switches that were very loud. And he typed fast - it was like living on a gun range. Everyone in the office probably would have chipped in for a hitman, but alas, the CTO, whose office had a solid door, was “inspired” that the mechanical feedback helped fuel inspiration in boy wonder.

Had we thought of the security risks of the keyboard, I would have brought good scotch to the infosec dude while expressing my concerns.


Somewhat tangential: clicky switches, like Cherry Blues, tend to click twice for each stroke. I think this leads to people assuming there are twice as many strokes going on. Tactile switches tend to only click once (when they bottom out). So, fancy keyboards can make people sound faster than they are.


I don't think that's quite right. Many switches including tactiles will make a sound when the switch tops out, from the stem hitting the housing.

As far as I know, Cherry blues only click once and the second sound you hear on a keypress is just the topping out sound.

https://cdn-blog.adafruit.com/uploads/2016/09/Blue.gif


And they make little o rings to dampen that if you're hardcore.


Add a guy that bottoms out the keys and you will have an additional "click".


> it was like living on a gun range

Thanks for this metaphor. I know of at least one guy to whom this metaphor could be applied as well.


Not inspired enough to hire him properly apparently...


Mechanical keyboard user here. Most of us use mechanical keyboards because they're a lot more fun to type on. That's it. Because if you're not having fun, what's the point?


I don't know, typing?

Else, something like Mai Tais on the beach sounds more fun, maybe it's just me...


But Mai Tais on the beach don't get you money, and if you are going to type on something, it's better to make sure it's comfortable for you.


But uncomfortable for others. Surely you all know it bothers a good amount at least some of your colleagues, right?


Well, not everybody works in an open plan, a shared office, or in an office building.


Obviously the comment discusses a shared space. If you have your own room you can let your farts rip and sniff them for fun, pull out your dick and piss in a bottle for fun, clank on your loud toys for fun, all the things you should never do with other people around that you might find fun for whatever reason. No one cares. But don't do these things to other people around you, it's anti-social.


> Surely you all know it bothers a good amount at least some of your colleagues

Quiet switches for the office, clicky switches for home. Not exactly a hard problem to solve :)


But isn't one of the reasons for using mechanical switches to be able to not bottom out, hence avoiding the repetitive shocks on the fingers? This is what I do with my tactile keyboards, and I'm actually quieter when I type quickly than my colleagues who bottom out on their cheap hollow HP keyboards like no tomorrow.


Is it? I've had a few mechanical keyboards, and follow some of those webpages devoted to different switches etc (not obsessively though, once in a blue moon), and I don't recall seeing "bottoming out" and "shocks" as any major benefit mentioned.

I also remember typewriters and old IBM style mechanical keyboards being quite heavy to activate, subjectively needing more pressure than some chiclet style "shock" (which I can barely feel).


Mai tais on the beach don't let you signal what a cool hacker you are. When the point of a thing is signaling, normal arguments don't apply.


Not according to the article. Microphones are sensitive enough to mount the attack on quieter keyboards.


Microphones are surprisingly sensitive. I can listen to music in my closed-back headset at a regular volume. My desk mic can pick this up. Without boosting the audio it's barely audible that there's music, but after adding some gain you get almost the full song profile (and background noise).

I can even pick out some of my breathing from the recording.

If I turn on noise suppression and noise gate it's fine.


I was two rooms away from someone playing music on a smart Google device. I could very barely hear that music was playing at all and only just barely made out it was a song I had been interested in but kept missing. I pulled out my S22+ and used Shazam. Somehow it was able to pick it up easily.


What we clearly need are louder keyboards - which overload the mic so as to render keystrokes indistinguishable.


Adding a gain knob to my keyboard, be right back.


My mechanical keyboard already has a knob that I've configured to control the system audio volume, all that's left is configuring Linux to play an audio recording of a keypress every time I press a key...


> all that's left is configuring Linux to play an audio recording of a keypress every time I press a key

I unironically think I've seen that config recently - someone had an actually quiet keyboard but wanted the full Mechanical Keyboard Effect™ so they just... have it play the sound per keypress. (It was not 100% clear to me whether it was an elaborate joke or a real aesthetic choice)


The Kinesis Advantage2 and the Moonlander have a piezo speaker to give keystroke sounds. However, they are not there, as you might expect, to give the full Mechanical Keyboard Effect™.

If you have mechanical switches, you want to learn to type just past the actuation point and not until the switch bottoms out. This is relatively easy with tactile switches (they have a bump and the actuation point is immediately after the bump). However, in linear switches, you don't feel when you have hit the actuation point. So the piezo speaker can be used during the first weeks to train your muscle memory of where the actuation point is, so that you can type lightly.

I had this on my Kinesis Advantage with Cherry Reds, and it was really nice during the initial days/weeks, after which I turned it off.


You want https://github.com/zevv/bucklespring then.

Lagniappe: “To temporarily silence bucklespring, for example to enter secrets, press ScrollLock twice”


When conducting coding interviews remotely I often switch from my mechanical keyboard to my laptop keyboard (for taking notes) because I know how annoying/distracting that sound can be on calls. Suffice it to say, having a gain knob on my mechanical keyboard would be wonderful.


I've wanted to integrate a cap gun into a keyboard, basically an old-fashioned roll of paper caps and a solenoid to whack 'em, triggered by exclamation points.


Some old IBM keyboards (beamsprings, the predecessor to the Model F, which preceded the Model M) had solenoids inside to make them louder and sound more like typewriters. I wonder if such a setup would defeat this attack, or if it would still be possible to discern the actual keypress alongside the solenoid.


Not just limited to old IBM keyboards! The new reproduction Model F keyboards also have a solenoid option! It's fantastically loud with it banging on the solid metal case along with the buckling springs. Great keyboards in general.


I'm guessing it would be easier (assuming you trained it on that keyboard), because each solenoid would be fairly unique due to manufacturing tolerances. Just my gut feeling, I have no data to back it up.


I know nothing about this keyboard, but I'd assume it just has one solenoid because the expense and space of 100+ solenoids is impractical if all you're using them for is simulating the vibration/sound of a typewriter.


I wish I could delete my comment to hide my stupidity. For some reason I was thinking about springs despite reading and typing solenoid. You are of course 100% correct and unfortunately it's too late for me to hide my shame.


Or auto-mute upon key press.


Or just use a password manager.


Alternatively, constant random key press sounds playing in the background.


I'll just have to add significantly more background clickity clacks as obfuscation.


My thought was to run psyops all the time.

"Just need to type in my password." He says a little too loudly to nobody. Then just type in the honeypot password and login with the real one that you entered with a virtual keyboard a few minutes ago.

Meanwhile you've got a prerecorded keyboard going concurrently that decodes to "I know what you're trying to do. Clever but not clever enough."

And I guess you might as well have a special keyboard that you only use for typing in passwords while you're at it.


It’s so fascinating to watch this play out live. Once again, an ambitious kid can implement software hacks that are very funny when used for a joke, but also have massive real-world implications.


I fear that "in the name of security" is going to ruin everything.


Good luck with my mech steno keyboard.


I guess more reason to just use a password manager to autofill your password?


Only if it doesn't only rely on a master password


A nice thing about master passwords though is that since you don't have to type them in as often, they can be very long. 95% accuracy probably isn't good enough to reliably reproduce a sentence-length master password, at least if it's only captured once.


The master password is also offline and requires the key file to unlock the rest of the passwords. So by itself it’s not enough to compromise the accounts in the key file. The attacker would need the key file as well.


>a sentence-length master password

Ij on-tep of sentenca lentg, it's alio sentemce-bused ("corvect harse batterg stapfe") then ut would be quiti eady to guess even wits worse accurasy.

(If on top of sentence length, it's also sentence-based ("correct horse battery staple") then it would be quite easy to guess even with worse accuracy.)


potential solution: keep a few intentional typos in your passphrases. It also makes dictionary attacks much harder.


now you have to remember the the typos


Plus, if they can tell what the actual words would be, then brute forcing the typos is trivial


95% accuracy means that for 95% of strokes, the correct key is the model's top choice. Most models return a probability distribution over keys for each stroke, and when the top choice is wrong it's very likely the correct key is in the top 2 or 3.

Then you simply have the password cracker start trying passwords ordered by probability, and I bet it breaks your sentence within very few tries.
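To make the "ordered by probability" part concrete, here is a minimal sketch of a best-first enumeration over per-keystroke distributions. The function name and the probabilities are made up for illustration; a real classifier's output would take their place, and this is not the paper's method, just one plausible way to feed a cracker.

```python
import heapq
from math import log

def guesses_by_probability(per_key_probs, limit):
    """Return up to `limit` candidate strings, most probable first.

    per_key_probs: one dict per keystroke mapping candidate key -> probability
    (hypothetical classifier output, assumed independent per keystroke).
    """
    # Rank each position's candidates from most to least likely.
    ranked = [sorted(d.items(), key=lambda kv: -kv[1]) for d in per_key_probs]

    def cost(idx):
        # Negative log-probability of the chosen candidate at every position.
        return -sum(log(ranked[pos][i][1]) for pos, i in enumerate(idx))

    start = (0,) * len(ranked)
    heap = [(cost(start), start)]
    seen = {start}
    out = []
    while heap and len(out) < limit:
        _, idx = heapq.heappop(heap)
        out.append("".join(ranked[pos][i][0] for pos, i in enumerate(idx)))
        # Neighbours: demote exactly one position to its next-best candidate.
        for pos in range(len(idx)):
            if idx[pos] + 1 < len(ranked[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (cost(nxt), nxt))
    return out

# Made-up distributions for a three-keystroke secret.
example = [
    {"c": 0.95, "v": 0.03, "x": 0.02},
    {"a": 0.60, "s": 0.35, "q": 0.05},
    {"t": 0.90, "r": 0.10},
]
print(guesses_by_probability(example, 5))
```

The second keystroke is the uncertain one here, so the second guess only swaps that position, which is the point being made: a wrong top choice costs the attacker a handful of extra tries, not the whole search space.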


95% means that on average only 1 in 20 keystrokes will be wrong. Even if your password is very long (40-60 characters) that means only 2-3 errors. Since most people are not machines, their long password will be a combination of words like the famous "horsestaplebatterycorrect" example from xkcd.

Even if you flip a few letters from something like the above a human attacker will easily be able to fix it manually.

"horswstaplevatterucorrect" for example is still intelligible.


On average 2-3 errors. However the real thing we want to look at is what is my chance of guessing right across ALL characters. For 1 it's 95%, for 2 it's 90.2%, and it gets worse from there. The formula for accuracy would be .95^c where c is the number of characters in the password. So the chance of getting EVERY key correct in a 40 character password is < 13% and < 5% for 60 characters.
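The two numbers being argued about (chance of a perfect capture versus expected number of wrong keys) can be checked in a few lines. This is only a sketch and assumes every keystroke is classified independently at the same 95% accuracy, which the article does not guarantee; the function names are mine.

```python
def p_all_correct(per_key_accuracy, length):
    # Probability that every one of `length` independent keystrokes is right.
    return per_key_accuracy ** length

def expected_errors(per_key_accuracy, length):
    # Mean number of misclassified keystrokes in a password of that length.
    return (1 - per_key_accuracy) * length

for c in (1, 2, 40, 60):
    print(c, round(p_all_correct(0.95, c), 4), round(expected_errors(0.95, c), 2))
```

Both views are consistent: a perfect 40-character capture is unlikely, yet the typical capture is only a couple of characters off.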


Right. The comment above is saying even if you are incorrect in 2-5 keystrokes it’s not hard to guess the correct keystrokes if you’re using a sentence style password.

You don’t need to guess every character.
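A rough sketch of how that correction could be automated for the "horswstaplevatterucorrect" example: split the noisy string into dictionary words while minimising the number of substituted characters. It assumes the only errors are wrong keys (no dropped or extra keystrokes), and the tiny word list is invented for the demo; a real attempt would use a large dictionary.

```python
def repair_passphrase(noisy, wordlist):
    """Split `noisy` into dictionary words, minimising substituted characters.

    Assumes substitution errors only, so every word keeps its length.
    Returns (mismatches, words), or None if no segmentation fits.
    """
    n = len(noisy)
    best = [None] * (n + 1)  # best[i] = (cost, words) covering noisy[:i]
    best[0] = (0, [])
    for i in range(n):
        if best[i] is None:
            continue
        cost_so_far, words = best[i]
        for w in wordlist:
            j = i + len(w)
            if j > n:
                continue
            # Hamming distance between the candidate word and this slice.
            mismatches = sum(a != b for a, b in zip(noisy[i:j], w))
            cand = (cost_so_far + mismatches, words + [w])
            if best[j] is None or cand[0] < best[j][0]:
                best[j] = cand
    return best[n]

# Invented mini-dictionary with a few near-miss decoys.
wordlist = ["horse", "house", "table", "staple", "stable",
            "battery", "buttery", "correct", "collect", "hat"]
print(repair_passphrase("horswstaplevatterucorrect", wordlist))
```

Hamming distance is a deliberate simplification; handling missed or doubled keystrokes would need an edit-distance variant.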


What if the password is typed twice? You can easily figure it out then.


that's pretty high when you can use a computer to run the guesses


Doesn't everybody require more than just a password?

Offline you need the database which isn't public.

Online you usually need something else on new machines to get at the true master password.


[insert yubikey plug]

I don't use one but I know people who swear by them.

Also this is an extremely obvious result. Typing is obviously a form of "penmanship", it was well known that telegraph operators could identify each other by how they tapped out Morse code in the 1800s.

People have been able to do this based upon key stroke latency and even identify people based on habitual mouse patterns for decades.

Audio recordings work as yet another reliable proxy? Shocked!!

I am amazed that people can do such obvious things and get published, have articles written on them... I need to get in on that, sounds easy

I can make a web demo. You turn on the microphone type a couple things into a box on the web browser.

Then you go to a different window and continue typing and then the model predicts what you are typing. As long as it's proper grammar you can get to effectively 100% accuracy. It'll appear to be spooky magic.

I just might take the time.


You sound confident enough that I'd like to see you show that off :P.


sounds like a good exercise although it'll literally just be for my own personal amusement. Nobody actually cares about this unless you've got some institutional clout which I do not. Praise for the PhD would be ridicule for you and me.

But really, should be fun ... the laptop dock mic will be great for this. If it's external you're in trouble ... but the researchers just used the onboard so it'll be fine.


Don't type your master password on zoom calls


Or use your fingerprint


why is that ?


1Password allows unlocking with a fingerprint (Touch ID) or Apple Watch, at least on a Mac. So you can unlock your password manager during a Zoom call, and nobody can snoop your master password.

(With 1Password, the master password is not enough to do a remote account takeover, you also need the second-factor key. And you can't snoop it, since it is only required during the first login, so a user will never type it after that.)


What actually are you going to do if you spy on my zoom call and learn my master password is bigjarofpickles?


Hacker: tedunangst, what’s your email? Wanna invite you to that thing!

Hacker: man, I hate typing passwords. Do you use password managers? Any reccos?

… I am become hacker, destroyer of tedunangst’s bank account.


1Password requires an extra key upon the first login that you never have to type afterwards. So, have fun trying to log in to that password manager, even if you have the master password.

Also, you can also use and require a hardware FIDO2 token as second factor.


Or just use 2fa


If you have 2FA and one part of it is easily figured out, then you have one factor authentication.

If you cared enough about the authentication in the first place to bother with 2FA, then I guess it seems like the reduction there is still something to be worried about, right?

Lots of “two factor authentication” schemes seem to involve just getting a text or something, so, not very secure at all. Of course, this is bad 2FA, but it is popular.


Perfect is the enemy of good. Text based 2FA is compromisable relatively easily but at least it's an extra hurdle.


It's the "or just" being the issue there, not the "use 2fa".


which is the point of 2fa – when the 1st factor fails the 2nd holds


Now that I know about the existence of this generation of acoustic attacks, I would like to be able to set a second "master password", different from the main one, that instead of giving me direct access to my passwords just allows me to use my fingerprint to get them. I wonder if it's already possible.


I think maybe you wouldn't even need to see the keystrokes. Given enough examples of just audio, I wonder if you could work out the keys using the statistical letter patterns in language.


And there are therefore millions of hours of video that could be attack surface area already in the wild


for a few years I've used rtx voice to remove keyboard typing and other background noise


seems like a very niche case to be warranting the headline and Hackernews front page


I think this limited attack surface can work without having to generalize one model to multiple people or keyboards. One advantage of a Zoom attack is that you get “plaintext” shortly after hearing the “ciphertext” if you can get the target to type into the chat window. And when you hear typing in other contexts it’s likely to be something that matches a handful of grammars that an LLM can recognize already (written languages, programming languages, commands, calculation inputs) - and when it doesn’t, that’s probably a password.


Do keystrokes still come through Zoom? The noise filtering has become extremely aggressive lately, often hear people say “Sorry about that engine / ambulance / city noise” but nobody knows what they’re talking about.


It's for a targeted attack. It doesn't need to be generalized.


How come keyboard sound suppression is not a standard option in all online communication apps? It’s not that hard, keyboard sounds are pretty distinct.


Maybe because it's easier said in an HN comment than in real life


VoiceMeter worked reasonably well for me after some tinkering with sliders. Nvidia RTX voice should filter that out too.


Yeah and in fact, I've heard of this attack being done in the past, but it heavily depends on the typist, the keyboard, etc. Cadence, sound, etc changes with the typist and hardware. This isn't new, and has very few, if any practical applications for wide spread replication.


The answer is that likely all the above are used.

Asking for “what signal it is detecting” might be better asked from a “what is the greatest signal bearing information” being used… which would help in averting attacks.

This kind of stuff could be real menacing in all sorts of public places like airports, coffee shops and etc.


Seems simple to defend - use a password manager.


until you have to type your password to unlock it


High security safe locks have had protection against this for a long time: you press up/down arrows to move from a random starting digit to the correct digit.

On-screen PIN entry with jumbled number mappings does the same thing. It also makes the inter-stroke delay rather independent of position, because the brain has to search the screen (although repeated digits and previously occurring digits are quicker, which is why some jumble at every keystroke).

Keyboards with OLED keys (like the Apple Touchbar or the Optimus[1]) might also work.

[1] https://www.artlebedev.com/optimus/popularis/


Biometric unlock or PIN ? I have to type my master password on restart, hopefully you can do that off screen.


your password manager hopefully uses an additional factor to enable it on a new device, so definitely avoid typing that in on Twitch


Good enough for PoC.


it is definitely possible to generalise this, a couple of years ago I did the same with a pair of microphones.


I did a similar acoustic side-channel attack as final year project at uni. There's a treasure trove of findings in this area, I'm just waiting for someone to combine methodologies. There are pretty good results using geometric models, trained and untrained statistical models like this and others, and combining these features with assorted language models.

Here's a few random papers I read along the way:

https://doi.org/10.1007/s10207-019-00449-8 - SonarSnoop, which uses a phone's speaker to produce ultrasonic audio that can be used to profile the user's interaction (e.g. entering swipe-based passcodes).

https://people.eecs.berkeley.edu/~daw/papers/ssh-use01.pdf - "Timing Analysis of Keystrokes and Timing Attacks on SSH", a paper from 2001 that uses statistical models of keystroke timings to retrieve passwords from encrypted SSH traffic.

https://doi.org/10.1145/1609956.1609959 - "Keyboard acoustic emanations revisited", which uses hidden Markov models and some other English language features to recover text based on classification via cepstrum features.

https://doi.org/10.1145/2660267.2660296 - "Context-free Attacks Using Keyboard Acoustic Emanations" which uses a geometric approach, using time-difference-of-arrival to estimate physical locations probabilistically.


I'm not clear why people are pooh-poohing this as if it's not a big deal. From a security and espionage point of view this is pretty significant - the audio learning has got to the point that a sensitive audio bug can basically be a key logger. There are a ton of contexts where an audio tap would be much easier to get in place than a traditional network attack (and with modern shotgun mics, might not even require being in the building). That is applicable to much more than just password stealing.

I've always been a bit fascinated by this attack vector and wondered if it would get to this point.


Yes it seems like any possible physical side channel (eg Tempest as well) is now amenable to machine learning approaches. Very interesting indeed.


I wonder if playing the typing sound constantly could help. Not an abstract sound, but recording of your actual typing on this particular keyboard, mixed to play some realistic-sounding phrases / sequences. It should pause for a split second to let your actual keystrokes mix in. That would be really hard to decipher, or to correlate your typing with whatever other events (time to enter a password).

Better yet, play some white noise around you. I heard that it's actually done sometimes at really important meetings.

If you're not such a VIP, just type important things only on your phone; touch screens don't produce enough sound, hopefully.


you would need to tie microphone input with the actual keys typed, and enough of it to train a model. nothingburger


Fascinating. I'm really curious what the acoustic properties are that it's recognizing.

Is it more of a physical fingerprint of each key, such that if you swapped keys/springs the model would need to be updated? So it's produced by manufacturing inconsistencies, the way individual typewriters used to be forensically identified?

Or is it more each key being identical, but producing a different resonance pattern within the keyboard/laptop due to the shape of all of the matter surrounding it? If you move the keyboard in the room, do you have to re-train the model?

I also wonder how much it varies depending on how hard you press each key -- not at all or a great deal? And what about by keyboard -- when you compare thin MacBook keys with an external full-height keyboard, is one easier/harder to recognize each key on than the other?


Building on what you said: (1) just the key's properties; (2) key properties relative to other keys; (3) sound transmission and environment between key and microphone; (4) relationship between key and finger; (5) relationship between key and associated dendrites


I presume typing style matters as well. How quickly you reach each key, rhythm, how hard you tend to hit a specific key.

My sense is that they profile the person more than the keyboard.


By the way, some (most?) videoconferencing software removes keyboard sounds from the audio, because it's particularly a distracting problem with laptops where the microphone is right next to the keys.

I'm pretty sure Zoom does this by default as part of its noise cancellation (it's potentially even easier since you can use keydown events to help identify, not just the audio stream).

So as long as basic default noise cancellation is on, that would at least prevent this over regular videoconferencing. And because of this, I'm having a hard time thinking of when else this would be a realistic threat, where the attacker wouldn't already have enough physical access to either install a regular keylogger or else a hidden camera.


Teams definitely doesn't have this, at least not by default, or not by default in our corp. Anytime somebody on the call starts typing you hear it very clearly.


Meetings between organizations, multi-office cafeterias, or coffee shops, perhaps.


If any random webpage is granted access to the microphone, I would think this could be a problem.


Georgi Gerganov created one a few years ago

https://github.com/ggerganov/kbd-audio


The example figure shows a key hit every half second, which suggests a pecking style of typing at around 24 wpm. This way the model gets very clean waveforms. I wonder how their approach would work with average or fast typists. The sound profiles might be much harder to link to characters.
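For anyone checking the 24 wpm figure, the arithmetic is short. The sketch below assumes the usual typing-test convention of 5 characters per word; the function name is mine.

```python
def words_per_minute(seconds_per_key, chars_per_word=5):
    # One key every `seconds_per_key` seconds, converted to words per minute.
    keys_per_minute = 60 / seconds_per_key
    return keys_per_minute / chars_per_word

print(words_per_minute(0.5))  # one keystroke every half second
```

By the same conversion, a typist at 80 wpm leaves 0.15 s between keystrokes, which is where overlapping key sounds would start to matter.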


Even if there was ambiguity, some data is better than none. Given enough training data, I suspect you could find repeatable patterns in standard typists: on a qwerty layout, after typing an "A", "Q" takes 1.2-2.3x as long to type as a "J" kind of pairwise tempo patterns. Anything to reduce the search space from brute-forcing every candidate character.

Even better if the target uses a passphrase, "hXXXse battXXX stXXXXX cXXXXXX" becomes interpretable given a few landmark letters identified with high probability.


The Soviets successfully listened to typewriters back in the 1970s.


Impressive. To be fair, a lot of typewriters jam if you press more than one key at a time, plus they are very loud.


What's more impressive is that the vibration of the glass windows can be used, too.


In response to this post, I just open sourced a starter project to a variation of this idea: https://github.com/secretlessai/audio-mnist. I've been interested in doing image classification techniques like CNN on audio data for a while.

A couple years ago for a weekend project I made a simple "audio-mnist" dataset from handwritten digit audio recordings. I never got past a few days worth of work, but open-sourcing it has been on my mind for a minute. This post kicked me into action. Getting some more data, basic CNN examples, etc. could provide a nice starting point for a lot of research and tools.

There is still separate code I'd have to find and make intelligible to create the recordings and split the audio.

Anyway, in case anyone finds part of this process interesting or useful.


Would love a wireless keyboard that works using this! It wouldn’t need any battery, charging or syncing!


Some old TV remotes used to work this way. They were made by Zenith and are called Space Command remotes. Apparently they are the reason TV remotes are sometimes called clickers.

https://www.theverge.com/23810061/zenith-space-command-remot...


I've never considered how odd clicker is for remote but it feels totally natural to me. Like something my parents or grandparents would say. Never thought about where it came from.


Imagine the UX of 1 in 20 characters typed being incorrectly inferred though. The P_failure*Cost impact would strike me as insufferable even if error rate were to improve by an order of magnitude.


I was thinking it could be a keyboard designed to make special sounds so it can be interpreted very accurately


Time to inject background audio of me typing "fuck you" into my zoom calls.


Text-to-keystroke-audio where the text comes from the LLM Prompt "fanfiction based on HGTV's Love It or List It starring an Ewok realtor and Klingon interior designer in iambic pentameter".

The goal is to cause the eavesdropper to totally reevaluate their life choices, and maybe even get caught up in the story.


Tactical noise!


That might make it even easier to decipher. A nice reference point.


Using an image classifier on spectrograms is pretty funny. Not a bad idea, given image classifiers are dime a dozen, but still.


It's actually quite common. One of the big bird recognition apps does just this.
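For readers wondering why an image classifier applies at all: a spectrogram is just a 2D grid of magnitudes (time by frequency), so it can be handed to a vision model like any grayscale picture. Below is a deliberately naive, standard-library-only sketch of building one. The frame size, hop, and synthetic tone are arbitrary choices for the demo, not the paper's parameters, and the classifier itself is left out.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame lists magnitudes for bins 0..frame_len//2."""
    # Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT: fine for a sketch, far too slow for real audio.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Synthetic stand-in for a recording: a pure tone centred on DFT bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
image = spectrogram(tone)
print(len(image), "frames x", len(image[0]), "frequency bins")
```

In practice one would use an FFT library and a mel scale, then feed `image` (or a stack of them, one per keystroke) to whatever off-the-shelf classifier is handy.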


There are multiple apps for this? Seems like PBS KIDS should own the authoritative one, and the licensing.


I don't use the qwerty layout, I use colemak. Likely this mitigates this for myself.


This is just security through obscurity. For real security, you need a cryptographically rolling keyboard layout.


My sister in law uses voice recognition and dictation software, so she doesn't even use a keyboard! Totally safe!


Whereas for practical security, having some common substring in all your passwords that you don't type but insert through some global hotkey would be just fine as a mitigation against eavesdrop attacks.

Yes, that's also obscurity, but obscurity is actually good - it only got a (deservedly) bad reputation from when it gets used as a substitute (but I fail to see how using a nonstandard keyboard layout would even count as obscurity in the context of an audio attack, as the clear text reference would surely go through the same layout?)


Brilliant suggestion. Have a TRNG or a CSPRNG (if too poor for a TRNG) choose the next layout at random for you, ideally with every keystroke. Good luck cracking that!


Some places use touchscreen keypads for PIN entry exactly for this reason: to allow randomization, e.g. for opening a locked door, or for authorizing a transaction.


That is interesting.

I’m sure it depends on the application to some extent. I can type my pin in without looking at all, so I can cover it up while doing it. If I had to hunt and peck, it’d be easier for an onlooker to observe my slower motions I think.

But if I used the same machine often enough to produce wear specific to me, this randomization would be really useful.


I use a randomized PIN pad on my phone, and I've gotten quite used to it. I can enter my PIN almost as fast as I could on an unscrambled pad; it's definitely not hunting and pecking.


Do they randomize the key locations though?

Otherwise, you leave behind grease where your fingers touched


Yes, the layout is randomized every time you use it.
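A minimal sketch of what such a scrambled pad might do under the hood, assuming a ten-digit keypad reshuffled from the OS random source on every use. The function names are hypothetical and not taken from any real PIN-pad product.

```python
import secrets

def shuffled_keypad(digits="0123456789"):
    keys = list(digits)
    # SystemRandom draws from the OS CSPRNG rather than a seeded PRNG.
    secrets.SystemRandom().shuffle(keys)
    return keys

def pin_from_taps(layout, tapped_positions):
    # The same physical positions yield different digits under each layout.
    return "".join(layout[p] for p in tapped_positions)

layout = shuffled_keypad()
print(layout, pin_from_taps(layout, [0, 3, 3, 7]))
```

Because the position-to-digit mapping changes each time, grease marks, observed finger positions, or per-position tap sounds from one session say nothing about the digits entered in the next.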


Could be done by using a device with a display - e.g. an "ereader" - to present a random keyboard layout. But, good luck being efficient typing on that. At that point, better use a different input model.

Or, use techniques such as those in the article, such as random keypresses played during the actual ones.


Some banks went through a phase of this - website would present an on screen keyboard for the password field with a randomized layout.

I'm sure customer frustration was huge.


Even using Vim or Emacs would add some obufsCTRL[dbiobfuscation from all the spurious keystrokes.


...wait, are you telling me Konami shuffling the touch input for e-Amusement PINs[0] was a good idea!?

[0] Okay... deep breath

Konami is a pachinko manufacturer with a side hustle making rhythm games for Japanese arcades. They have an online service that all their games connect to called e-Amusement. You can log into it using an e-Amusement Pass card, and your card is locked to a PIN number you have to set up when you first use it. Cabinets with touchscreens give you a touch keypad, except all the digits are shuffled around, which is a total pain in the ass and you have to do this for every credit.


Indeed. Let me add that how your fingers come into contact with the keys is probably just as important. I recommend a cryptographically rolling choice of dustballs, crumbs, and boogers.


Why not just a keyboard that produces random noise?


Finally, a use for Buffy's Swearing Keyboard.

Or possibly the exact opposite of that, I can't tell if it's a one-to-one mapping on mobile: https://www2.b3ta.com/buffyswear/

(Also, I'm feeling my age now, given how many years have elapsed since that kind of thing passed for internet culture…)


Because the real data stream would still be there, just mixed with some noise. It feels harder to analyze whether the noise sufficiently obscures the real keystrokes than it does to ensure the actual keystrokes reveal no information.


I wouldn't be so sure. I'm pretty confident that statistical analysis would give away your layout, assuming there's enough data.


Stealing your layout.


At least it would have, until just now, when you recklessly disclosed your secret keyboard layout. :P


That's the equivalent of a shift cipher with a well known offset.


This specific attack could also be easily mitigated by dictating your passwords instead.


Couldn't they just translate the detected keystrokes to colemak layout?
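
A rough sketch of that translation step, assuming the classifier labels each physical key by its QWERTY legend and that the typist is using standard Colemak (only the letter keys and `;` are covered; `retranslate` is a hypothetical name):

```python
# Physical keys, row by row, as labelled on a QWERTY board...
QWERTY = "qwertyuiopasdfghjkl;zxcvbnm"
# ...and the character each of those same keys produces under Colemak.
COLEMAK = "qwfpgjluy;arstdhneiozxcvbkm"

TO_COLEMAK = str.maketrans(QWERTY, COLEMAK)


def retranslate(detected):
    """Convert QWERTY-labelled key detections into the text typed on Colemak."""
    return detected.translate(TO_COLEMAK)
```

For example, a Colemak typist entering "hello" presses the keys a QWERTY-trained model would report as `hkuu;`, and `retranslate("hkuu;")` maps that back.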


Yes, but you would have to know the layout or try all possible layouts.


this is a targeted attack, it won't do much at all.


Now they can make wireless keyboards that don't need a battery or radio!


Going without a battery is already possible, but likely impractical.

There is enough energy during a key press/release to be usable for sending a radio signal; however, it won't be sufficient to do it while holding a key. A combination of a solar panel, piezoelectric keys and a tiny li-ion (as backup) may be sufficient for a 'battery-less' keyboard, but it will be too expensive.


Could you send a separate 'key up' signal on release from the energy of the up-stroke?


That would likely require a (beefier) spring to store the energy, as releasing the key alone doesn't require any force.


This is hardly a new concept btw.

At ACM CCS in 2005, Zhuang, Zhou and Tygar presented Keyboard Acoustic Emanations Revisited [1]:

    We examine the problem of keyboard acoustic emanations. We
    present a novel attack taking as input a 10-minute sound recording
    of a user typing English text using a keyboard, and then recovering 
    up to 96% of typed characters. There is no need for a labeled
    training recording. Moreover the recognizer bootstrapped this way
    can even recognize random text such as passwords: In our experiments, 
    90% of 5-character random passwords using only letters can
    be generated in fewer than 20 attempts by an adversary; 80% of 10-
    character passwords can be generated in fewer than 75 attempts.
    Our attack uses the statistical constraints of the underlying content, 
    English language, to reconstruct text from sound recordings
    without any labeled training data. The attack uses a combination
    of standard machine learning and speech recognition techniques,
    including cepstrum features, Hidden Markov Models, linear classification, 
    and feedback-based incremental learning

This builds on the work of Asonov & Agrawal [2], who came up with the idea the previous year (2004):

    We show that PC keyboards, notebook keyboards, telephone 
    and ATM pads are vulnerable to attacks based on
    differentiating the sound emanated by different keys. Our
    attack employs a neural network to recognize the key being 
    pressed. We also investigate why different keys produce
    different sounds and provide hints for the design of homophonic 
    keyboards that would be resistant to this type of attack.
[1] https://dl.acm.org/doi/10.1145/1609956.1609959

[2] https://ieeexplore.ieee.org/document/1301311



So microphones need to get muted automatically by password prompts, seems simple enough in principle.


That would certainly solve the password issue. And if a sufficiently paranoid person is aware of this attack vector, they could just manually mute the mic at any time they are typing in any sensitive information. I initially was thinking that using a Dvorak or even better custom layout would help, but upon further reflection I think not -- the first-pass output would be equivalent to a substitution cipher, and quickly solved as such.

This topic has me wondering, though, whether it's possible to detect finger positioning, or for that matter screen information, from the reflection off the typist's eyeballs/eyeglasses shown in a webcam. Perhaps even if it's possible in principle, in practice most webcam resolution is simply too poor for that.


Zoom is good at filtering out rather loud background noises. I can't imagine that the sound of background typing during a conversation could be detected by the other party.


What? Zoom (by default with auto mic adjustment) catches everything. Typing on a laptop is especially bad as it is closer to the mic than the person speaking (unless there is an external mic), so it's like a stampede of rhinos.


It shouldn't. Auto (the default) is designed to filter out keystrokes along with other noises, precisely because typing on the laptop is horrible for the reason you mention.

Keystrokes should only be a problem when noise suppression is set to low/off, which you want to do for e.g. playing music.

But noise suppression is applied to sending audio, not receiving it. So you might need to tell your coworkers to re-enable their noise suppression.


In this case the parent comment is considering Zoom as an ally, while you are considering it an adversary.

So, in case that “what” was intended to denote some confusion, there is the most likely source.


If you’re on macOS, you can use the voice isolation mic mode.


I think about this attack when streamers on Twitch log into websites etc.


I think an attacker would find that many streamers with high quality audio have properly set up their mics with noise gate filters to remove their relatively quiet keystrokes.
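
To illustrate, a noise gate is conceptually just a threshold on signal level. The sketch below is a toy per-sample version, not what any streaming software actually ships; `noise_gate`, `threshold` and `hold` are hypothetical names, and real gates work on a smoothed envelope with attack and release times.

```python
def noise_gate(samples, threshold, hold=0):
    """Silence samples quieter than `threshold`.

    `hold` keeps the gate open for that many samples after the last
    loud one, so word endings are not chopped off.
    """
    out = []
    remaining = 0
    for s in samples:
        if abs(s) >= threshold:
            remaining = hold      # loud sample: open the gate
            out.append(s)
        elif remaining > 0:
            remaining -= 1        # quiet sample, but still in the hold window
            out.append(s)
        else:
            out.append(0.0)       # gate closed: replace with silence
    return out
```

Note that a gate only helps while the streamer is silent: keystrokes made while talking (or inside the hold window) pass straight through.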


I wonder how hard this problem is. I bet it’s actually not that bad. If I were to guess, a huge part of the problem is likely the position of the microphone.

Note that the testing data in the confusion matrix appears to have a uniformish distribution of each key being pressed. I suspect this data was not generated by someone actually typing because you would rarely see numbers and rare letters. It is possible these were simply pressed one at a time rather than in a series of rapid presses.

My guess is this approach uses the mic to identify where the sound of the key press was coming from rather than what each key press sounds like. Which does not invalidate the results but may make it seem less magical. Tbh it’s probably much worse this way because such a model could probably generalize very well across all keyboards and typing styles.


This idea could also be used for good at some point. Imagine “connecting” any keyboard to a device just by enabling the microphone.

It would have its own set of problems: no two people could use it at once, eavesdropping would be really easy… but it’d have its own set of interesting applications.


New? The Soviets listened to typewriters in the 1970s.


But what passwords are you typing while on zoom and why aren't you on mute?


When calling my cellular/internet/medical/financial provider, it might be interesting to "see" what they are typing. (Or if they're randomly surfing the internet.)


Given your username, you might find this interesting:

https://en.m.wikipedia.org/wiki/Tempest_(codename)

TEMPEST considered almost everything from electromagnetic leakage to exactly the attack described here.


How long are you talking to them that you've been able to record samples of the sound of all their keystrokes and perform this analysis?


Call support, get the URLs and logins for all their internal apps. Ouch!


Presumably all their backoffice stuff is only accessible via VPN. Oh, wait...


I can imagine many, many situations where you might do this. But maybe another thing to be worried about is scammers being able to learn the passwords of the people they are calling.


Timing attacks have been an attack vector for a while. I remember reading about a tool for this on HN a couple of years ago. You don’t even need audio; the rate at which you enter the keys into the password field is enough.
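
The raw signal here is just the gap between consecutive key presses. A minimal sketch, assuming the attacker has already obtained press timestamps in milliseconds somehow (`inter_key_intervals` is a hypothetical helper, and turning the gaps into key guesses is the hard part this leaves out):

```python
def inter_key_intervals(press_times_ms):
    """Return the delay between each pair of consecutive key presses."""
    return [later - earlier
            for earlier, later in zip(press_times_ms, press_times_ms[1:])]
```

For instance, `inter_key_intervals([0, 120, 310, 400])` gives `[120, 190, 90]`.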


How do you get the rate?


Maybe any one of your browser tabs has JS listening to the accelerometer. It doesn't even require a permission, AFAIK.


Looking at the traffic of an SSH session?


I seriously doubt that.


There's a great scene in Le chant du Loup (The Wolf's Call), a French 2019 submarine flick (at one point on Netflix), where the sonar guy hears a password typed and reconstructs it from the sound of each keystroke.

https://youtu.be/a9Gz7Bg07u8

This attack is about as realistic as the film: a parallel universe where million to one chances happen nine times out of ten.


I wonder whether this would be possible, and how much data you'd need, if you only had a long recording but no clear text to pair it with. Maybe you'd hear the space bar, as it often has a distinct sound (maybe backspace and return as well), and could write a script that finds the key associated with each sound by brute-forcing every key against every unique sound and checking which combinations come out as reasonable sentences.
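
As a very rough first pass before any brute forcing, one could rank the distinct sounds by how often they occur and line them up against typical English frequencies. The sketch below assumes the recording has already been segmented and clustered into integer sound IDs (that part is hand-waved), and `guess_by_frequency` is a hypothetical name.

```python
from collections import Counter

# Space first, then letters in approximate descending English frequency.
ENGLISH_BY_FREQUENCY = " etaoinshrdlu"


def guess_by_frequency(cluster_ids):
    """Pair the most common sound clusters with the most common characters."""
    ranked = [cid for cid, _ in Counter(cluster_ids).most_common()]
    # zip stops at the shorter sequence, so rare clusters stay unassigned.
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))
```

This would only give a starting guess; refining it against a dictionary or language model is where the actual search would happen.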


I wonder how well this would go paired with that attack from a year or so ago that can recover audio from video of a glass window pane. Set up a camera pointed at the outside of your competitor's office? Hear their passwords? Heck, even send them an email, receive a reply, and train on them typing emails sent to you?


I heard about stuff like this years ago, and how the CIA could get passwords by pointing long distance microphones at people's windows.

I suspected that the famously terrible Treasury Direct website with its on-screen keyboard was a half-assed attempt to prevent this sort of attack.


Wow that's kinda worrying for streamers on Twitch and Youtube etc. They sometimes enter passwords while buying a game on Steam or purchasing something on Amazon. Now they're going to have to think about muting as they are already targets of doxing.


Similar to the unique heartbeat each of us has, the way people type may be another fingerprinting method. When I type passwords and PINs, I often make motions to keys that I'm not hitting to fool the invisible stalker behind me.


Sounds like a great kickstarter/home diy: “mechanical keyboard noise scrambler”, which is just a portable speaker/mic that upon hearing your keyboard, starts playing fake attenuated noise.


What would be a good quality/price ratio microphone for this sort of keystroke sound recording?

It would be nice to try to tokenize the strokes and then try to assign labels probabilistically.


Encrypted keyboards. Each key is randomly remapped at the start of each session. Some high security locks already use this to prevent over-the-shoulder cameras capturing codes.


The bank pin UI from the game RuneScape comes to mind. https://imgur.io/UAgrY7e?r

The locations of the numbers move around to prevent mouseloggers from recording your movements.

It seems like any way of doing it would end up slowing down the typist though. If it is just for the password, I could see it being possible, but if you're dealing with lots of information that needs to be protected, then it seems impossible.


When I type my login or wallet password, I've done it so many times that the sound profile is going to be quite different to normal typing. Does the model handle that?


There’s an app somewhere that removes your keyboard audio from your audio streams. Sounds like it is a vulnerability remediation.


Passwords aren't the only at-risk category. "This presentation is a tire fire" is a vector, too.


Some systems have a setting to disable touchpad for x milliseconds after a key press.

Do we need something similar for microphones too?
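
Something along these lines, perhaps. This is only a sketch of the timing logic, assuming the audio pipeline is buffered by a few milliseconds so the click itself can be caught; `is_muted`, `pre_ms` and `post_ms` are made-up names and defaults.

```python
def is_muted(t_ms, key_press_times_ms, pre_ms=20, post_ms=200):
    """Return True if time `t_ms` falls in the mute window around any key press.

    The window starts `pre_ms` before each press (this requires buffered
    audio) and ends `post_ms` after it.
    """
    return any(k - pre_ms <= t_ms < k + post_ms for k in key_press_times_ms)
```

The obvious downside is that anyone who types while talking would constantly cut themselves off.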


Users will do anything and everything to avoid using FOSS, which by definition doesn't spy on the user.


How does it hold up against me shrieking loudly while I type? Specifically, screaming at my keyboard.


So, from this point on, one time passwords only? I can't imagine any other proper solution.


Biometrics, physical security tokens, etc.


If this means the end of those loud mechanical keyboards then good. I never liked the clicking noise.


No it means the beginning of people playing recordings of loud mechanical keyboards all day to thwart the snooping algorithms.


I thought about something like this in 1999. It can also be done with loud beeps, like those of an ATM.


Physical Access Owns, as usual.


Would mechanical keyboards be easier targets for this than quieter ones?


Oh cool, so it's time to learn Dvorak or other keyboard setups.


You'd need to randomise the keyboard layout every so often, perhaps every 100 strokes.


As someone who teaches Dvorak touch typing, I recommend doing it no later than your sweet twenties, because otherwise you will not be able to type passwords, if that is a goal of your learning. Typing passwords is the final exam for my students.


It's always a good time to get a moonlander. :o)


A plain old desk fan makes an excellent white noise generator


If this means I have to abandon my clicky keyboard I give up.


That would be really terrible for streamers


We’re entering a post-privacy era jesus


I use 1Password and have never, ever typed a password, so I am probably safe.


The risk isn't limited to passwords:

"...passwords, discussions, messages, or other sensitive information..."


Two words for you: Master password.


Touch ID


Death metal.

Suck it.


Very interesting that this is even possible. But it seems somewhat dangerous, since making an audio recording is very easy.


I find this really hard to believe. If it were really possible then people could do it with their ears, and they would be doing it and showing off that they can do it. The human ear (and brain) are really, really good at finding patterns and getting signal out of noise.


You're really surprised that computers can outperform humans at pattern recognition?


Yes. Humans have fantastic audio and video processing abilities, particularly picking out signal from noise. Even now human operators listen to sonar signals on submarines. There's a reason for that.


Part of the issue with keyboard audio is that it's very "noisy". It's like comparing two instances of white-ish noise. Statistics would be able to discern the instances immediately, but a human probably wouldn't.


Another part of the issue is that if the laptop has two microphones, it can localize low-frequency sounds. The human head cannot locate low-frequency sound sources such as a subwoofer in a 2.1 system.


Computers are better at stuff than humans? Impossible! I am the king of math, no machine beats me in calculating numbers!


Piano players can do it if the typist uses a piano keyboard. Also 88 keys but arranged in one row.


I think that a person could do this too with enough training.


This isn't new. The Soviets listened to typewriters back in the 1970s.



