You probably cannot recall how many times you’ve heard, “Your call may be recorded for quality assurance purposes.” Just as your fingerprint or faceprint is considered distinctive enough for biometric recognition systems, your voiceprint is unique enough to identify you. But unlike fingerprints, which are captured when you touch something, or facial recognition biometrics, which require you to go out in public to have your image captured, your voice can be captured from the privacy of your home.
The FBI Biometric Center of Excellence said that voice recognition systems are “a popular choice for remote authentication due to the availability of devices for collecting speech samples (e.g., telephone network and computer microphones) and its ease of integration.” Furthermore, the FBI believes voice biometrics will be a “reliable and consistent means of identification for use in remote recognition.” Deploying voice recognition requires no “special equipment” other than a good-quality microphone, which most of us already have thanks to our mobile phones.
Slate ran an interesting article about how law enforcement can identify you via VoiceGrid Nation, a system created by SpeechPro, a company that operates under that name in the United States and as “Speech Technology Center” in Russia. That article prompted me to read up on SpeechPro and its voice recognition technology.
This image shows how VoiceGrid works, and here is some other information gleaned from the company’s documentation. Its voice matching technology can “automatically separate the voices within a two-person dialog and send each voice individually for matching” and is being used as “part of a comprehensive plan to best leverage existing and new audio data.” Even setting aside NSA surveillance via intercepted calls, the whitepaper gives numerous examples of passive sources of voice recognition data that have “already been collected.” These include voicemail; recordings made while speaking to commercial service providers such as banks, cell phone companies, and cable TV companies; 911 calls; suspect interviews; and court recordings.
The company’s technology uses three methods for voice matching and an algorithm that automatically compares “voice models against voice recording obtained from different sources such as cell phones, land lines, covert recordings and recorded investigative interviews.” When the methods are combined, the company reports 90% identification accuracy within 15 seconds. According to VoiceGrid’s “key figures”:
· 3 seconds is the minimum required speech pattern for analysis.
· In 5 seconds, it can search and match against 10,000 voice samples.
· 10 seconds is the average time for feature extraction.
· Executes up to 100 simultaneous searches.
· Accommodates up to 1,000 active users.
· Stores up to 2,000,000 samples.
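SpeechPro does not disclose how its three matching methods work. As a generic illustration of the comparison step only, here is a minimal sketch that assumes each recording has already been reduced to a fixed-length “voice model” vector and ranks the stored models by cosine similarity against a probe recording. The vectors, the record names, and the 0.8 threshold are all hypothetical; this is not VoiceGrid’s algorithm.

```python
import math

# Generic speaker-identification sketch; NOT SpeechPro's method.
# Assumes voice models are fixed-length vectors already extracted from audio.

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(probe, gallery, threshold=0.8):
    """Compare a probe voice model against every stored model.

    Returns (record_id, score) for the best match, or (None, score) when
    even the best score falls below the (hypothetical) decision threshold.
    """
    best_id, best_score = None, -1.0
    for record_id, model in gallery.items():
        score = cosine_similarity(probe, model)
        if score > best_score:
            best_id, best_score = record_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

# Hypothetical enrolled voice models (real systems use far more dimensions).
gallery = {
    "record_001": [0.9, 0.1, 0.3],
    "record_002": [0.1, 0.8, 0.5],
}

match, score = identify([0.85, 0.15, 0.35], gallery)
print(match, round(score, 3))
```

The threshold is what turns a similarity ranking into a yes/no identification: set it lower and more recordings get matched to someone, including the wrong someone.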
The VoiceGrid ID leaflet states, “Voice recording is recognized as a non-invasive technology,” but don’t confuse that with privacy-friendly. While the documentation doesn’t explicitly state how to capture covert recordings, it does advise that even a high-quality microphone must not be poorly placed in a room “where it is susceptible to echoes, ambient noise from fluorescent lighting and too far away from the subject.” It also gives law enforcement some “best practice” tips on collecting voice data when booking a suspect. These include “prompting the subject to say the alphabet, counting to 30 and stating their name and address. However because of the text independent nature of the technology it is not so important what the subject says as how much they say or in other words as long as they are talking, the necessary data can be collected.”
Just as you can recognize a friend’s voice on the phone even when he or she sounds sick, upset, or intoxicated, or when there is loud background noise, a SpeechPro VoiceGrid Voice Recognition whitepaper explains how the technology overcomes “unique challenges” such as channel effects. Examples include differences in the quality of the device microphones used to obtain recordings, as well as the need to separate voice data from background noise such as music or traffic horns. “Additionally, once the voice sample has been collected, the technology must compensate for voice variability such as emotional state, illness and physical condition such as slurring of words if the subject [is] under the influence of drugs or alcohol.”
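One standard way to handle channel effects, though not necessarily the one SpeechPro uses, is cepstral mean normalization. A fixed microphone or phone line acts roughly like a constant filter, which shows up as a constant offset in cepstral features such as MFCCs, so subtracting each coefficient’s average across all frames cancels it. This is a minimal sketch assuming the features have already been extracted; the toy frames and the channel offset are made up for illustration.

```python
# Cepstral mean normalization (CMN) sketch; a generic technique, not
# necessarily SpeechPro's. Assumes cepstral features are already extracted.

def cepstral_mean_normalize(frames):
    """Subtract each coefficient's mean across all frames.

    `frames` is a list of equal-length cepstral feature vectors (e.g. MFCCs),
    one per short analysis window. A constant channel offset added to every
    frame shifts the mean by the same amount, so it is removed (up to
    floating-point rounding).
    """
    if not frames:
        return []
    n = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]

# Hypothetical "clean" features and a made-up constant offset standing in
# for the coloration added by a particular microphone or phone line.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
channel_offset = [0.5, -1.5]
recorded = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_recorded = cepstral_mean_normalize(recorded)
print(normalized_clean)
print(normalized_recorded)
```

After normalization the two recordings look the same, which is the point: the same voice heard through different devices should produce comparable features. CMN only addresses this kind of steady channel coloration; additive background noise such as music or traffic requires separate techniques.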
Voice biometric identification is used by government, law enforcement, the telecommunications industry, and some commercial applications for “identification and verification uses.” Voiceprints can be extracted from audio or video and used in “any situation where audio may be the only lead or evidence.” Cited examples include domestic abuse, calls violating protection orders, prank calls, false 911 emergency calls, terrorist threat calls, agency radio communication abuse, inmate call monitoring, kidnapping, extortion, corruption, and gang and organized crime communication. The Mexican Federal Police have a nationwide system that provides “voice recognition against a database of over 600,000 records” taken “during the criminal booking process as well as from public workers.” The system was “projected to grow to over 1 million records during 2012.”
According to the SpeechPro whitepaper published in 2011, the company obtained a “commitment from a State Justice Agency to deploy a pilot system for the purposes of performance studies and further development of best practices for obtaining voice samples in a booking environment” in the USA. A “beta version of the system” was supposed to have been delivered by the end of 2011. SpeechPro president Aleksey Khitrov told Slate’s Ryan Gallagher that the company’s voice recognition systems are used in “more than 70 countries,” across the Americas, Europe, and Asia. Khitrov added that “the company is working with a number of agencies in the United States at a state and federal level.”
How does SpeechPro know that its technology won’t be used to allow “state security agencies to very effectively monitor and identify phone calls made by targeted political dissidents (or anyone else for that matter)”? Khitrov assured Gallagher that the technology is “used for only very noble causes” and that SpeechPro tries to ensure proper use: “we work with trusted law enforcement agencies and try to make sure that they use it properly.”