Should you tell customers they're talking to AI?

When chatbots can flawlessly pass for human in every interaction, do you disclose to users that they're talking to AI? That's the dilemma.

Pay attention to Amazon. The company has a proven track record of mainstreaming technologies.

Amazon single-handedly mainstreamed the smart speaker with its Echo device, first released in November 2014. Or consider its role in mainstreaming enterprise on-demand cloud services with Amazon Web Services (AWS). That's why a new Amazon service for AWS should be taken very seriously.

Amazon last week introduced a new service for AWS customers called Brand Voice, a capability within Amazon Polly, the company's cloud text-to-speech service. Brand Voice enables enterprise customers to work with Amazon engineers to create unique, AI-generated voices.

It's easy to predict that Brand Voice will mainstream voice itself as a form of "sonic branding" that interacts with customers at massive scale. (Sonic branding has traditionally meant jingles, the sounds products make, and very short snippets of music or noise that remind consumers about a brand. Examples include the startup sounds of popular versions of Mac OS and Windows, or AOL's "You've got mail!" greeting back in the day.)

In the era of voice assistants, the sound of the voice itself is the new sonic branding. Brand Voice exists to let AWS customers craft a sonic brand through a custom simulated human voice that interacts conversationally in customer-service interactions online or on the phone.

The created voice could be that of an actual person, or a fictional person with specific voice qualities that convey the brand -- or, as with Amazon's first example customer, somewhere in between. Amazon worked with KFC in Canada to build a voice for Colonel Sanders. The idea is that chicken enthusiasts can chit-chat with the Colonel via Alexa. Technologically, Amazon could have simulated the voice of KFC founder Harland David Sanders. Instead, KFC opted for a more generic Southern-accented voice.

Amazon's voice-generation process is revolutionary. It uses a generative neural network that converts the individual sounds a person makes while speaking into a visual representation of those sounds. A voice synthesizer then converts those visuals into an audio stream: the voice. The result of this training approach is that a custom voice can be created in hours rather than months or years. Once created, that custom voice can read text generated by the chatbot AI during a conversation.
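To make the two-stage idea concrete -- sound rendered as a "visual representation," then a synthesizer turning that picture back into audio -- here is a minimal sketch in plain NumPy. This is emphatically not Amazon's neural method; it's a classical stand-in that uses a magnitude spectrogram as the visual stage and the Griffin-Lim algorithm (1984) as the synthesizer stage. All function names are my own.

```python
import numpy as np

N_FFT, HOP = 256, 64  # analysis window size and hop, in samples

def stft(x):
    """Short-time Fourier transform: audio -> complex spectrogram."""
    win = np.hanning(N_FFT)
    frames = range(0, len(x) - N_FFT, HOP)
    return np.array([np.fft.rfft(x[i:i + N_FFT] * win) for i in frames])

def istft(S):
    """Inverse STFT via overlap-add: spectrogram -> audio."""
    win = np.hanning(N_FFT)
    out = np.zeros(HOP * len(S) + N_FFT)
    norm = np.zeros_like(out)
    for i, frame in enumerate(S):
        s = i * HOP
        out[s:s + N_FFT] += np.fft.irfft(frame) * win
        norm[s:s + N_FFT] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, iters=60):
    """Recover audio from a magnitude-only spectrogram by
    iteratively estimating the missing phase."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    audio = istft(mag * phase)
    for _ in range(iters):
        phase = np.exp(1j * np.angle(stft(audio)))
        audio = istft(mag * phase)
    return audio

# Demo: take a 440 Hz tone, keep only its spectrogram
# (the "visual" stage), then resynthesize the audio from it.
sr = 8000
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 440 * t)
spectrogram = np.abs(stft(original))   # sound as a picture
rebuilt = griffin_lim(spectrogram)     # picture back to sound
```

A real system like Polly replaces both stages with trained neural networks (an acoustic model predicting spectrogram-like features from text, and a neural vocoder), which is what makes hours-scale voice creation plausible; the pipeline shape, however, is the same.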

Brand Voice enables Amazon to leapfrog rivals Google and Microsoft, each of which has created dozens of stock voices for cloud customers to choose from. The problem with Google's and Microsoft's offerings, however, is that they're not custom or unique to each customer, and are therefore useless for sonic branding.

But they'll come along. In fact, Google's Duplex technology already sounds notoriously human. And Google's Meena chatbot, which I told you about recently, will be able to engage in incredibly human-like conversations. Combine the two, add the future benefit of custom voices as a service (CVaaS) for enterprises, and Google could leapfrog Amazon. A huge number of startups and universities are also developing voice technologies that enable customized voices that sound totally human.

How will the world change when thousands of companies can quickly and easily create custom voices that sound like real people?

We'll be hearing voices

The best way to predict the future is to follow multiple current trends, then speculate about what the world looks like if all those trends continue until that future at their current pace. (Don't try this at home, folks. I'm a professional.)

Here's what's likely: AI-based voice interaction will replace almost everything.

  • Future AI versions of voice assistants like Alexa, Siri, Google Assistant and others will increasingly replace web search, and serve as intermediaries in our formerly written communications like chat and email.
  • Nearly all text-based chatbot scenarios -- customer service, tech support and so on -- will be replaced by spoken-word interactions. The same backends that service the chatbots will be given voice interfaces.
  • Most of our interaction with devices -- phones, laptops, tablets, desktop PCs -- will become voice interactions.
  • The smartphone will be largely supplanted by augmented reality glasses, which will be heavily biased toward voice interaction.
  • Even news will be decoupled from the news reader. News consumers will be able to choose any news source -- audio, video and written -- and also choose their favorite news "anchor." For example, Michigan State University got a grant recently to further develop their conversational agent, called DeepTalk. The technology uses deep learning to enable a text-to-speech engine to mimic a specific person's voice. The project is part of WKAR Public Media's NextGen Media Innovation Lab, the College of Communication Arts and Sciences, the I-Probe Lab, and the Department of Computer Science and Engineering at MSU. Their goal is to enable news consumers to pick any actual newscaster, and have all their news read in that anchor's voice and style of speaking.

In a nutshell, within five years we'll all be talking to everything, all the time. And everything will be talking to us. AI-based voice interaction represents a massively impactful trend, both technologically and culturally.

The AI disclosure dilemma

As an influencer, builder, seller and buyer of enterprise technologies, you're facing a future ethical dilemma within your organization that almost nobody is talking about. The dilemma: When chatbots that speak with customers can flawlessly pass for human in every interaction, do you disclose to users that it's AI?

That sounds like an easy question: Of course, you do. But there are and will increasingly be strong incentives to keep that a secret -- to fool customers into thinking they're speaking to a human being. It turns out that AI voices and chatbots work best when the human on the other side of the conversation doesn't know it's AI.

A study published recently in Marketing Science, called "The Impact of Artificial Intelligence Chatbot Disclosure on Customer Purchases," found that chatbots used by financial services companies were as good at sales as experienced salespeople. But here's the catch: When those same chatbots disclosed that they weren't human, sales fell by nearly 80 percent.

It's easy now to advocate for disclosure. But when none of your competitors are disclosing and you're getting clobbered on sales, that's going to be a tough argument to win.

Another related question is about the use of AI chatbots to impersonate celebrities and other specific people -- or executives and employees. This is already happening on Instagram, where chatbots trained to imitate the writing style of certain celebrities will engage with fans. As I detailed in this space recently, it's only a matter of time before this capability comes to everyone.

It gets more complicated. Between now and some far-off future when AI really can fully and autonomously pass as human, most such interactions will actually involve human help for the AI -- help with the actual communication, help with the processing of requests and forensic help analyzing interactions to improve future results.

What is the ethical approach to disclosing human involvement? Again, the answer sounds easy: Always disclose. But most makers of advanced voice-based AI have elected either not to disclose that people participate in the AI-based interactions, or to bury the disclosure in legal mumbo jumbo that nobody reads. Nondisclosure or weak disclosure is already the industry standard.

When I ask professionals and nonprofessionals alike, almost everybody likes the idea of disclosure. But I wonder whether this impulse is based on the novelty of convincing AI voices. As we get used to and even expect the voices we interact with to be machines, rather than hominids, will it seem redundant at some point?

Of course, future blanket laws requiring disclosure could render the ethical dilemma moot. Last summer, the State of California passed the Bolstering Online Transparency (BOT) act, lovingly referred to as the “Blade Runner” bill, which legally requires any bot-based communication that tries to sell something or influence an election to identify itself as non-human.

Other legislation is in the works at the national level that would require social networks to enforce bot disclosure requirements and would ban political groups or people from using AI to impersonate real people.

Laws requiring disclosure remind me of the GDPR cookie rules. Everybody likes the idea of privacy and disclosure. But the European legal requirement to notify every user on every website that cookies are involved turns web browsing into a farce. Those pop-ups feel like annoying spam. Nobody reads them. It's just constant harassment by the browser. After the 10,000th pop-up, your mind rebels: "I get it. Every website has cookies. Maybe I should emigrate to Canada to get away from these pop-ups."

At some point in the future, natural-sounding AI voices will be so ubiquitous that everyone will assume it's a robot voice, and in any event probably won't even care whether the customer service rep is biological or digital.

That's why I'm leery of laws that require disclosure. I much prefer self-policing on the disclosure of AI voices.

IBM published last month a policy paper on AI that advocates guidelines for ethical implementation. In the paper, IBM writes: “Transparency breeds trust; and the best way to promote transparency is through disclosure, making the purpose of an AI system clear to consumers and businesses. No one should be tricked into interacting with AI.” That voluntary approach makes sense, because it will be easier to amend guidelines as culture changes than to amend laws.

It's time for a new policy

AI-based voice technology is about to change our world. Our ability to tell the difference between a human and machine voice is about to end. The tech change is certain. The culture change is less certain.

For now, I recommend that we technology influencers, builders and buyers oppose legal requirements for the disclosure of AI voice technology, but also advocate for, develop and adhere to voluntary guidelines. The IBM guidelines are solid, and a good place to start.

Oh, and get on that sonic branding. Your robot voices now represent your company's brand.

Copyright © 2020 IDG Communications, Inc.
