The Mental Burden of Voice Interfaces

9to5Toys has a review of the Amazon Fire Cube and this bit stood out to me:

In theory, having complete voice control over your home theater feels like the way of the future. Siri and Apple TV have dabbled lightly in this area when it comes to surfacing content, and eliminating the need to dig through various services to find that episode of Seinfeld. But while voice control sounds like a fun venture, it is decidedly not after evenings of calling out to Alexa.
After a while, I found that I was more comfortable using the included remote than my voice.

I’m a big fan of voice assistants, and I think they’re a big piece of where computing platforms are going in the future.

That said, my hesitance about voice as the primary interface for everything is very similar to the concerns I had about chatbots a few years ago. Chatbots were a cool implementation of technology, but they simply didn’t make sense for everything. Where voice falls short is browsing for content.

If I know exactly what I want to do, voice interfaces are great! “Turn on the living room lights” and “watch season 3 episode 15 of Superstore” work well, but a visual interface is far better for browsing.

Some of the things we do on our devices are intentional and some are passive. I can buy things from Amazon or tweet with my voice today, and that is all very cool. But I also like to scroll through Twitter to see what people are saying, and I like to browse Amazon sometimes even if I’m not shopping for something specific. Voice input simply doesn’t work well for these types of experiences.

For a species that prioritizes visual stimulus in almost everything we do, it’s a little unrealistic for anyone to suggest voice-only is the future. I’m a big fan of voice interfaces, but they’re just one piece of the puzzle.
