Amazon has confirmed a report that one of its Echo devices recorded a family's conversation and then messaged it to a random person on the family's contact list, who is an employee of a family member.
But Amazon, in a statement emailed to Computerworld, confirmed every privacy advocate's worst nightmare with its explanation: “Echo woke up due to a word in background conversation sounding like 'Alexa.' Then, the subsequent conversation was heard as a 'send message' request. At which point, Alexa said out loud 'To whom?' At which point, the background conversation was interpreted as a name in the customer’s contact list. Alexa then asked out loud, '[contact name], right?' Alexa then interpreted background conversation as 'right.' As unlikely as this string of events is, we are evaluating options to make this case even less likely.”
For the record, the family says they didn't hear the Echo saying anything. Both versions may be correct. If the family was in a heated discussion — and had no reason to focus their attention on the Echo device — they might not have heard or noticed the Echo speaking.
Personally, as disturbing as this incident is, I find it all too likely. In communicating with Siri, I have often heard it "mishear" a word as a command and then act on it. Once I was preparing to text someone when my landline phone (yes, I still have one) rang. When the call was over, I was amused to see that Siri's voice recognition had transcribed my end of that phone conversation and was about to text it to my contact. Had it interpreted any word I spoke as being close to "send," it would have sent the message.
Amazon did not respond to a request for an interview by deadline. Had it done so, I would have asked what words the family spoke that Echo interpreted as commands and confirmation words. Had the family actually said those command words in their conversation, at exactly the right points, this situation would be different. It would then merely be an issue of Echo's verification commands needing to be more particular. After all, someone saying the word "right" in a sentence shouldn't be enough of a verification. And what about something that merely rhymes with "right"?
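To make the "more particular" point concrete, here is a minimal Python sketch of the difference between a loose check and a strict one. It is not Amazon's logic, which has not been published; the phrase list and the function names `loose_confirm` and `strict_confirm` are hypothetical, and it assumes the speech has already been transcribed to text.

```python
import re

# Hypothetical list of phrases that count as an explicit confirmation.
CONFIRMATIONS = {"yes", "yes send it", "right", "that's right", "confirm"}


def normalize(utterance: str) -> str:
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    return " ".join(cleaned.split())


def loose_confirm(utterance: str) -> bool:
    """Accept any utterance that merely contains the word 'right'."""
    return "right" in normalize(utterance).split()


def strict_confirm(utterance: str) -> bool:
    """Accept only when the whole utterance is a confirmation phrase."""
    return normalize(utterance) in CONFIRMATIONS


background = "I think the oak would look right in the hallway"
print(loose_confirm(background))   # the loose check is fooled
print(strict_confirm(background))  # the strict check is not
print(strict_confirm("Right."))    # a deliberate answer still works
```

A strict check like this still would not catch a misheard word, but it would stop a stray "right" buried in a sentence from counting as consent.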
Let's take this up a level, with a reference to every IT person's favorite acronym this week (GDPR) and how far we allow software to function autonomously (I'm looking at you, machine learning).
One of the provisions of GDPR, which kicked in on Friday (May 25), is that companies must report germane data breaches within 72 hours. But 72 hours from what? That's where things get interesting. Here's the way GDPR phrases it: "In the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority competent in accordance with Article 55, unless the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons."
In this context, the controller is the breached company. We have to ask: When does a company become "aware of" something? Is it when a human employee of that company becomes aware of it? If so, which human employee? The CIO? The CISO? The CEO? And for a breach, is someone aware of it when they first hear a preliminary indication that something may or may not have happened? Or is it only when that person becomes truly convinced that a breach did indeed happen, which can be months later?
But let's go back to that initial question: Is it in fact when a human employee becomes aware? What if a server (perhaps one run by the vendor of an antivirus package the company uses) sends an alert to a company-controlled server saying that a virus has been detected? And what if the company server takes action on its own to negate that threat? Do those actions constitute the company knowing?
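How much the answer matters is easy to show with a little date arithmetic. The Python sketch below takes the 72-hour window from the regulation's text; the three events and their timestamps are invented purely for illustration, and `notification_deadline` is a hypothetical helper, not anything defined by GDPR.

```python
from datetime import datetime, timedelta, timezone

# The 72-hour window comes from the GDPR text quoted above.
NOTIFY_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest notification time, given a chosen moment of 'awareness'."""
    return aware_at + NOTIFY_WINDOW


# Hypothetical timeline: three candidate moments of "awareness".
events = {
    "server receives automated alert":
        datetime(2018, 5, 28, 3, 0, tzinfo=timezone.utc),
    "analyst sees preliminary indication":
        datetime(2018, 5, 28, 14, 0, tzinfo=timezone.utc),
    "CISO is convinced a breach occurred":
        datetime(2018, 6, 20, 9, 0, tzinfo=timezone.utc),
}

deadlines = {label: notification_deadline(t) for label, t in events.items()}

for label, deadline in deadlines.items():
    print(f"{label}: notify by {deadline:%Y-%m-%d %H:%M} UTC")

# Gap between the earliest and latest possible deadlines.
spread = max(deadlines.values()) - min(deadlines.values())
print(f"spread between interpretations: {spread}")
```

In this made-up timeline, the choice of trigger moves the reporting deadline by more than three weeks.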
With artificial intelligence's machine learning capabilities soaring through enterprise IT these days, how many system actions will soon go way beyond what any programmer had intended? Machine learning is designed to look for patterns and to extrapolate from that information and recommend actions. As speed becomes ever more critical, especially with security decisions, many companies will program the systems to act on the machine-learning extrapolations without waiting for human confirmation.
This gets us back to the Echo situation. Amazon's A.I. efforts here allow the system to take actions based on what it thinks it hears, with no human confirmation — or, at best, an insufficiently stringent human confirmation mechanism.
Was the fault here poor voice recognition or faulty interpretation of what the voice recognition produced? Amazon, in its efforts to be seamless and magical, is letting its systems do quite a bit, on the premise that they are usually accurate. And what happens when they're wrong? This Washington state family just found out.
Note to IT: Machine-learning systems pose the exact same type of danger. When IT lets systems make decisions without verifiable human confirmation, unhappiness is a certainty. In this Echo incident, it just happened to be a relatively innocuous conversation about hardwood-floor choices. With machine-learning security systems, it might be something far more dangerous.
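What "verifiable human confirmation" could look like in practice is sketched below in Python. This is one possible design under stated assumptions, not a description of any real product: the `ProposedAction` fields, the `decide` function, and the 0.99 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProposedAction:
    description: str
    confidence: float   # the model's own confidence, 0.0 to 1.0
    irreversible: bool  # e.g. sending data outside the company


def decide(
    action: ProposedAction,
    approver: Optional[Callable[[ProposedAction], bool]] = None,
    auto_threshold: float = 0.99,  # hypothetical cutoff
) -> str:
    """Return 'execute', 'hold', or 'reject' for a proposed action."""
    # Only reversible, high-confidence actions may run unattended.
    if not action.irreversible and action.confidence >= auto_threshold:
        return "execute"
    # Everything else waits until a human is available to decide.
    if approver is None:
        return "hold"
    return "execute" if approver(action) else "reject"


quarantine = ProposedAction("quarantine suspicious file", 0.995, False)
export = ProposedAction("send customer contact file", 0.999, True)

print(decide(quarantine))                       # runs on its own
print(decide(export))                           # held for a human
print(decide(export, approver=lambda a: False)) # human says no
```

The design choice worth noting is that the model's confidence never overrides the irreversibility flag: however sure the system is, an action that cannot be undone waits for a person.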
What if a system combining machine learning and voice recognition thinks it hears someone in IT saying, "Send our customer contact file to our direct rival"? And what if it then mumbles a confirmation request and waits for someone in the room to say, "Right"?
Echo, thank you. You may have awakened a lot of people this week to some scary realities.