Malicious attackers can extract PIN codes and text messages from audio recorded by smart speakers up to 1.6 feet away. That’s according to a new study coauthored by researchers at the University of Cambridge, which showed that it’s possible to capture virtual keyboard taps with the microphones on nearby devices like Alexa- and Google Assistant-powered speakers.
Amazon Echo, Google Home, and other smart speakers pack microphones that are always on, in the sense that they continuously process the audio they hear to detect wake phrases like “OK Google” and “Alexa.” These wake-phrase detectors occasionally send audio data to remote servers; studies have found that up to a minute of audio can be uploaded without any wake phrase being spoken, whether by accident or in the absence of privacy controls. Reporting has revealed that accidental activations have exposed contract workers to private conversations, and the researchers note that such activations could reveal sensitive information like passwords if a victim happens to be within range of the speaker.
The researchers assume, for the purposes of the study, that a malicious attacker has access to the microphones on a speaker. (They might place a call to the speaker, tamper with it, or gain access to its raw audio logs, for instance.) They also assume that the device from which the attacker wants to extract information is held close to the speaker’s microphones, and that its make and model are known to the attacker.
In experiments, the researchers used a ReSpeaker, a 6-microphone accessory for the Raspberry Pi designed to run Alexa on the Pi while providing access to raw audio. As the coauthors note, the setup is similar to the Amazon Echo’s microphone array, minus the center microphone found on Echo models.
Taps on the “victim” device — in this case an HTC Nexus 9 tablet, a Nokia 5.2 smartphone, and a Huawei Mate20 Pro — show up in audio recordings as a short 1-2 millisecond spike at frequencies between 1,000 and 5,500 Hz, followed by a longer burst of frequencies in and around 500 Hz, according to the coauthors. The sound waves propagate both through solid material like smartphone screens and through the air, making them easy for a microphone to pick up.
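The acoustic signature described above — a 1-2 ms burst of 1,000-5,500 Hz energy standing out from the background — lends itself to a simple detection sketch. The brick-wall filter, energy window, and threshold below are illustrative assumptions, not the researchers’ actual pipeline:

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz

def bandpass(audio, lo=1000.0, hi=5500.0):
    """Crude FFT brick-wall filter keeping only the 1,000-5,500 Hz tap band."""
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(audio.size, d=1.0 / SAMPLE_RATE)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=audio.size)

def detect_taps(audio, threshold=25.0):
    """Return sample indices where band-limited energy jumps above baseline."""
    band = bandpass(audio)
    win = int(0.001 * SAMPLE_RATE)  # ~1 ms energy window, matching the spike
    energy = np.convolve(band ** 2, np.ones(win), mode="same")
    flagged = energy > threshold * np.median(energy)
    # Rising edges of the flagged mask mark the starts of distinct taps.
    return np.flatnonzero(np.diff(flagged.astype(int)) == 1)

# Demo: quiet background noise with a 2 ms, 3 kHz click injected at
# t = 0.5 s as a stand-in for a screen tap.
rng = np.random.default_rng(0)
audio = 0.01 * rng.standard_normal(SAMPLE_RATE)  # 1 s of recording
t = np.arange(int(0.002 * SAMPLE_RATE)) / SAMPLE_RATE
audio[22050:22050 + t.size] += 0.5 * np.sin(2 * np.pi * 3000.0 * t)
taps = detect_taps(audio)
print("tap detected near sample:", int(taps[0]) if taps.size else None)
```

A real attack would have to cope with speech, music, and handling noise in the recording, which is why the researchers trained a model to reject false positives rather than rely on a fixed threshold.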
The team trained an AI model to detect taps in recordings and distinguish actual taps from false positives. They then built a separate set of classifiers to identify candidate digits and letters from the taps flagged by the first stage. Given just 10 guesses, the results suggest that 5-digit PINs can be recovered up to 15% of the time and that text can be inferred with 50% accuracy.
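The second stage — turning per-tap predictions into ranked PIN guesses — can be sketched as follows. The synthetic per-key “signatures,” nearest-template scoring, and top-2-per-tap candidate set here are stand-ins for the paper’s trained classifiers, shown only to illustrate how 10 guesses are spent:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Hypothetical per-key templates: each digit 0-9 gets a synthetic
# 8-dimensional "spectral signature". Real features would be extracted
# from the recorded tap audio.
signatures = rng.standard_normal((10, 8))

def tap_features(digit, noise=0.8):
    """Simulate the noisy feature vector a recorded tap might yield."""
    return signatures[digit] + noise * rng.standard_normal(8)

def digit_scores(feat):
    """Score every candidate digit by closeness to its template."""
    return -np.linalg.norm(signatures - feat, axis=1)  # higher = likelier

pin = [3, 1, 4, 1, 5]  # the "victim's" PIN
scores = np.stack([digit_scores(tap_features(d)) for d in pin])  # (5, 10)

# Build candidate PINs from the top-2 digits per tap, rank them by total
# score, and keep the 10 best guesses.
top2 = np.argsort(-scores, axis=1)[:, :2]
cands = list(product(*top2))  # 2**5 = 32 candidates
totals = [sum(scores[i, d] for i, d in enumerate(c)) for c in cands]
guesses = [list(map(int, cands[i])) for i in np.argsort(totals)[::-1][:10]]
print("true PIN in top-10 guesses:", pin in guesses)
```

Ranking joint candidates this way is why a per-tap classifier that is only modestly accurate can still place the true PIN among a handful of guesses.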
The researchers note that their proposed attack might not be possible on Alexa and Google Assistant devices as they exist today, because neither Amazon nor Google allows third-party skills to access raw audio recordings. Moreover, phone cases or screen protectors could alter the tap acoustics and provide some measure of protection against snooping. But they assert that their work demonstrates how any device with a microphone and access to audio logs could be exploited to collect sensitive information.
“This shows that remote keyboard-inference attacks are not limited to physical keyboards but extend to virtual keyboards too,” they wrote in a paper describing their work. “As our homes become full of always-on microphones, we need to work through the implications.”