Why is retail afraid of voice recognition for mobile apps?

Retailers have overwhelmingly avoided any attempts to dabble with voice recognition, but that is a huge error.


Retailers have overwhelmingly avoided any attempts to dabble with voice recognition (even mobile music integration is only beginning to get attention), but that is a huge error.

One of the inherent limitations in GUI design for mobile as well as desktop devices is the pulldown menu. Granted, it has the huge advantage of delivering a clean and relatively effortless experience. But instead of letting customers phrase a question however they like, your pulldown forces them to choose from pre-programmed answers.

That is, in effect, forcing shoppers to interact with your system in your way, rather than your system tailoring its content to the customer. This tactic was used by designers 10 years ago because technology limitations made it the best option. But technology has come a long way since then, and GUIs need to incorporate those changes.

Envision a shopper driving to the mall being able to ask his Macy's mobile app, "Limiting your answers to blouses size XX to ZZ, in either blue, green or yellow, tell me everything that your store at location YYYY has in stock from either designer 1234 or 6789."

It's certainly true that your system may not be able to handle requests phrased so elaborately. But even that could be a positive. With the current pulldown approach, if your pulldowns frustrate a customer, there is no practical way for you to learn that. The customer clicks away unhappy, and you'll have no idea why.

The open-ended approach, especially when integrated with voice recognition, will give your team a record of every request that failed, including when it was made and sometimes who made it.
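To make that idea concrete, here is a minimal Python sketch of an open-ended query handler that tries to turn a free-form request into structured search filters and, when it can't, records the failed request (with a timestamp and, if known, a customer ID) instead of silently dropping it. Everything here is hypothetical and for illustration only: the vocabularies, the names `interpret`, `handle_request` and `QueryLog`, and the crude keyword and regex matching, which merely stands in for real natural-language interpretation. It is not how Siri or any retailer's app actually works.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical vocabularies; a real app would pull these from its product catalog.
KNOWN_COLORS = {"blue", "green", "yellow", "red", "black", "white"}
KNOWN_CATEGORIES = {"blouses", "shirts", "shoes", "jeans"}


@dataclass
class FailedRequest:
    text: str                   # what the shopper actually asked
    timestamp: datetime         # when it was asked
    customer_id: Optional[str]  # who asked, if known (e.g., via a mobile cookie)


@dataclass
class QueryLog:
    failures: list = field(default_factory=list)


def interpret(text: str) -> dict:
    """Crude keyword/regex matching standing in for real natural-language interpretation."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    filters = {}
    categories = sorted(words & KNOWN_CATEGORIES)
    if categories:
        filters["category"] = categories[0]
    colors = sorted(words & KNOWN_COLORS)
    if colors:
        filters["colors"] = colors
    size = re.search(r"size\s+(\w+)\s+to\s+(\w+)", text, re.IGNORECASE)
    if size:
        filters["size_range"] = (size.group(1), size.group(2))
    store = re.search(r"location\s+(\w+)", text, re.IGNORECASE)
    if store:
        filters["store"] = store.group(1)
    designer = re.search(r"designers?\s+(\w+)(?:\s+or\s+(\w+))?", text, re.IGNORECASE)
    if designer:
        filters["designers"] = [g for g in designer.groups() if g]
    return filters


def handle_request(text: str, log: QueryLog, customer_id: Optional[str] = None) -> Optional[dict]:
    """Return structured filters, or log the request as a failure and return None."""
    filters = interpret(text)
    if "category" not in filters:
        # Unlike a pulldown, the failed request itself is kept for follow-up and analytics.
        log.failures.append(FailedRequest(text, datetime.now(timezone.utc), customer_id))
        return None
    return filters
```

Fed the Macy's-style request above, `handle_request` returns filters for blouses, the three colors, the size range, store YYYY and both designers. A request it can't parse, such as "Do you have anything nice for my mother?", comes back as `None` and lands in `log.failures` along with when it was asked and, if the shopper is known, who asked it. The keyword matching is deliberately naive; the point is the failure log, which a pulldown menu never produces.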

Why is that useful? Let's say the shopper used a mobile cookie and was known to the system. How powerful would it be for that shopper to receive a call from customer service apologizing for your system's failure and, crucially, providing an explicit answer? As in, "I am sorry that our automated system didn't understand your request, but I do. I checked and here's what that store has in stock, in the sizes/colors and designers you specified." Think you have a good shot at closing that sale?

Also, how valuable will that information be to your analytics team? The pulldown approach blocks feedback, whereas the open-ended approach delivers it automatically.

This kind of voice system requires two basic software capabilities: voice recognition and natural-language interpretation (that is, the system's ability to figure out what the shopper is probably trying to ask). Both have improved dramatically in the last couple of years, to the point where something that wasn't feasible in 2013 may indeed work decently today.

How much have they improved? Let's take a look.

There have been plenty of reviews of the leading mobile voice systems (mostly Apple's Siri, Amazon's Alexa, Google Now and Microsoft's Cortana), and they typically find that each one excels in different areas. Siri usually comes out slightly ahead among voice systems trying to do it all, which is what most customers want.

I have recently been testing the latest revision of Apple CarPlay, which integrates Siri into the more challenging world of a car.

I have a few concerns about CarPlay (and lots of retail-relevant praise for it), but I first need to echo a complaint that many other CarPlay reviews have raised: how it connects. CarPlay strongly prefers a hard-wired connection, which is fine, although a bit old-fashioned. Given that it also supports Bluetooth, why not allow simultaneous connections? That way, if the wire gets tugged loose while driving on the highway at 72 mph (excuse me: In case any state troopers, or my wife, are reading this, I meant to say 55 mph), the Bluetooth connection keeps the music playing and the map directions directing.

Speaking of which, it would be nice if the map downloaded enough data to keep working even if the connection is briefly lost. Heck, even my 11-year-old Toyota's original nav system (which I have refused to upgrade) can keep the directions flowing if the satellite signal gets blocked.

But I digress. The point of CarPlay is that it uses Siri to make a wide range of Apple capabilities available (including those of an ever-increasing list of third-party apps) without the driver having to take his/her eyes off of the road. Great goal. Now if only Apple delivered on it consistently, we'd be somewhere.

Let's be fair. CarPlay is layering difficulties on top of difficulties. It has to deal with any problems inherent in Siri, all of the problems of getting third-party apps to behave as expected, and then all of the challenges of a moving car.

For example, one of CarPlay's big claims to fame is making phone calls. When it works, it's impressive. A week ago, while driving, I remembered that I had a doctor's appointment and realized I was going to be a tad late. I told Siri to call Dr. X. Siri said that it had found three listings and read them off. Not quite sure what to do, I said "Call the second one." And it did. Impressive.

But the next day, I asked it to call a business that I knew was in my contacts. It said that it found two listings. And yet I didn't hear the familiar beep that indicates Siri is listening. I spoke my response but nothing happened. I asked again and got the same response. Turns out that this time, for no particular reason, Siri displayed the choices on my screen and I had to touch the one I wanted. Remember that part about not having to look at the screen while driving? Oh well.

Today I tried another of Siri's most bragged-about functions. I told Siri, "Remind me to drop off the books in my trunk at the library when I leave home." It acknowledged and indeed flashed up the precisely phrased reminder. When I left home an hour later, I was several blocks away when I realized that I had heard no reminder.

"Hey, Siri." Siri activated, and the conversation that followed felt like talking with a 4-year-old. "Siri, what are my reminders?" There was just that one. "When are you to remind me?" "When you leave your house." "Have I left my house?" "Yes." Pause. I wanted to ask it "So why didn't you deliver the reminder?" but I figured it would say some variation of "I don't know" and I would give up.

But in general, it is surprisingly intelligent about performing a variety of functions and handling them however I phrase things. If I misremember the name of a song, it's uncannily clever about figuring out which song in my collection I probably meant.

It's particularly good at finding new places (those that are not in my contacts) and then directing me there. Problem: It will identify a place and ask me if I want to be directed there. I'll say yes, but instead of doing so, it pops up a screen asking for confirmation. Remember, Apple, that the whole point of Apple CarPlay is for voice to handle almost everything so that I don't crash into a tree. Pretend it's an apple tree, if that helps.

All of these gripes aside, if Apple (not to mention Google, Amazon and even Microsoft) can make all of these interactions happen in natural voice, why has this magic yet to appear in the mobile apps of Walmart, Macy's, Costco, Walgreens, Target, Home Depot or Safeway?
