

This article presents content originally published in "Design Mind," a design journal operated by frog, under the supervision of Mr. Noriaki Okada of Dentsu Inc. BX Creative Center.

frog

When it comes to communicating with machines, clicking a mouse used to be the norm. But today, we live in an era where computers respond to us with voice when we speak to them.

As an interaction designer, my job is to enable humans to communicate with computers. When I started working, communication with computers was largely done by clicking a mouse. My role was to guide users through the graphical user interface (GUI), showing them where to click and where to type on the keyboard. Now I also work on touch interfaces, enabling users to interact anytime, anywhere with small mobile computers (so-called smartphones) through taps and swipes.

Voice user interfaces (VUIs) now seem poised to drive the next interaction revolution. Until now, users had to learn how to use GUIs. With voice technology, we can instead make computers "speak" our language.

At frog, we're also seeing increased demand for voice UI. Wanting to gain a broader industry perspective, I attended the second annual All About Voice conference in Munich last fall. Hosted by 169 Labs, a developer of voice applications, the conference covered a wide range of topics, from the current state of smart speakers to designing the personality of voice assistants. At conferences like this (though recently virtual), several key questions usually emerge. This time, they were: "Is voice even an important theme?" and "Is voice UI just a passing trend? Or will it fundamentally change how people communicate with the world?"

The conclusion? The paradigm shift toward voice UI is now imminent, and there's no turning back.

Generational Demand

Today's children live in a world where they can operate smartphones, lighting, and even home appliances simply by speaking to them. While touchscreens made computers more intuitive than before, I believe advances in voice UI might even eliminate the need to learn how to use those touchscreens. Anyone capable of conversation will be able to engage in natural, "voice-first" interactions. As someone belonging to the oldest segment of the millennial generation, I still feel a certain awkwardness about microwaves equipped with Amazon's Alexa voice assistant. But I imagine our children's generation won't even question it.

At the conference, many speakers acknowledged that the underlying voice technology is still in its "growing pains" phase, yet it is growing remarkably fast. Smart speakers are rapidly spreading into homes and cars worldwide.

Andrea Muttoni of Amazon's Alexa development team frequently referenced this trend while discussing smart speaker-compatible devices launched in late September 2019. Muttoni openly stated that Amazon's vision is to embed Alexa into every conceivable device in daily life, from glasses to microwaves. "Alexa everywhere" is the goal, he declared.

Another example showcasing the momentum of voice technology is Google's "always listening" wireless earbuds, the Pixel Buds 2. Furthermore, Google's latest smartphone, the Pixel 4, features "Raise to talk," which activates Google Assistant the moment you lift the device. This indicates Google anticipates voice will eventually become the primary means of device interaction.

Why Voice Matters Now

There are many reasons why voice is considered the future interaction model. Let's explore a few.

1. The proliferation of smart speakers
Bret Kinsella, founder of Voicebot.ai, noted in a presentation that the number of US households with smart speakers increased by nearly 40% between 2018 and 2019. This means that by September 2019, approximately 32% of the US population, or over 80 million households, had a smart speaker. Adoption is also steadily increasing in Europe, with penetration rates at the end of 2019 reaching 21.1% in the UK and 11.6% in Germany.

2. Compatibility with an Inclusive Society
High-quality voice UIs are also key to an inclusive society. For people with disabilities affecting vision, mobility, or motor function, voice technology provides a means to communicate and control their lives in ways that suit them, both in physical activities and digital life. It also offers seniors and those prone to social isolation valuable opportunities to connect with others and find emotional comfort.

3. Speaking is Natural
Speaking is a far more natural way to interact than clicking or tapping. Of course, how natural it feels is heavily influenced by the personality of the voice UI. Adva Levin, founder and CEO of Pretzel Labs, which develops voice apps for children, shared this perspective: "Designing a voice assistant's personality is very much like creating a character. How old are they? What's their background? How do they speak?" These questions have now become critical design considerations.

How should we speak?

As humans, we have high expectations for technology that mimics us. Voice is fundamental to who a person is; it is a far more intimate and emotionally resonant means of interaction than clicking a button. Precisely because of this, when a computer fails to respond well in conversation, the frustration is much greater.

Unfortunately, conversation is inherently messy, even between people speaking the same language. While the human brain is adept at handling this messiness, computers are not. They prioritize logic over emotional nuance, making them highly prone to misinterpreting speech.

"The quality of a voice app will be judged by how it handles misunderstandings during conversation," said Jon Bloom, a senior conversation designer at Google, in a talk about error handling.

According to Bloom, one of the biggest challenges is "recognition." The problem here arises when the device cannot hear the user's voice (e.g., due to excessive background noise in the room) or when it cannot understand what the user is saying (e.g., due to long silences or unusual phrasing). Among the various possible scenarios, Bloom emphasizes that the most crucial thing is for the voice assistant to know when and how to ask for clarification in a way that feels comfortable for the user.

For example, a typical response when a voice assistant doesn't understand a user's request is to ask them to repeat the question or rephrase it. However, if this happens two or three times and the assistant still doesn't understand, it might be better to simply turn off the microphone and have the user start over, rather than continuing to frustrate them. Whether this is the "right" response depends on how far the conversation had progressed at that point and the nature of the conversation itself.
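The escalation pattern described above can be sketched in a few lines of Python. This is only an illustration, not Bloom's or Google's implementation: the names `handle_turn`, `Turn`, and `MAX_REPROMPTS`, the prompt wording, and the limit of two re-prompts are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed limit: the article only says "two or three times".
MAX_REPROMPTS = 2

# Hypothetical prompt wording; each retry asks in a slightly different way.
REPROMPTS = [
    "Sorry, I didn't catch that. Could you say it again?",
    "I'm still not getting it. Could you try putting it another way?",
]


@dataclass
class Turn:
    action: str  # "answer", "reprompt", or "end_session"
    prompt: str  # what the assistant says, if anything


def handle_turn(understood: bool, failures_so_far: int) -> Tuple[Turn, int]:
    """Return the next turn and the updated count of consecutive failures."""
    if understood:
        # A successful turn resets the failure counter.
        return Turn("answer", ""), 0
    if failures_so_far >= MAX_REPROMPTS:
        # Stop asking: close the microphone and let the user start over.
        return Turn("end_session", "Sorry, I'm having trouble. Let's start over."), 0
    # Ask for clarification, varying the wording on each attempt.
    return Turn("reprompt", REPROMPTS[failures_so_far]), failures_so_far + 1
```

In a real product, the limit would likely depend on how far the conversation has progressed, since ending the session late in a long task costs the user more than ending it on the first request.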

Another challenge Bloom highlights is the human attention span (or rather, its brevity). While completing all travel reservations by voice sounds convenient, in reality, when there are 20 flights, most people won't wait for the computer to read each one aloud. In some cases, it's faster to take a multi-modal approach: display the list of 20 flights on a smartphone screen and let the user look at it. Voice designers must therefore determine what makes sense in each situation. This calls for convergent design thinking, the ability to see beyond "disciplinary walls." Fixating solely on voice can become a barrier to this kind of innovation.
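As a rough sketch of that decision, the following Python function chooses between reading results aloud and handing them to a screen. The function name `plan_response`, the threshold `MAX_SPOKEN_ITEMS`, and the response wording are hypothetical; the article does not prescribe a specific cutoff.

```python
from typing import Dict, List

# Assumed threshold: how many options a listener will tolerate hearing.
MAX_SPOKEN_ITEMS = 3


def plan_response(options: List[str], has_screen: bool) -> Dict[str, object]:
    """Decide what to say and what (if anything) to show on a screen."""
    count = len(options)
    if count <= MAX_SPOKEN_ITEMS:
        # Short lists are fine to read aloud in full.
        speech = f"I found {count} options: " + ", ".join(options) + "."
        return {"speech": speech, "display": []}
    if has_screen:
        # Long list and a screen is available: summarize by voice, show the list.
        speech = f"I found {count} options. I've put the list on your screen."
        return {"speech": speech, "display": list(options)}
    # Long list, voice only: read the first few and offer to narrow down.
    top = ", ".join(options[:MAX_SPOKEN_ITEMS])
    speech = (
        f"I found {count} options. The first {MAX_SPOKEN_ITEMS} are: {top}. "
        "Would you like to narrow it down?"
    )
    return {"speech": speech, "display": []}
```

For the 20-flight case, `plan_response(flights, has_screen=True)` returns a one-sentence spoken summary and passes all 20 entries to the display instead of reading them out.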

frog and Voice UI

frog already works with numerous client companies aiming to integrate voice into their products and services across automotive, healthcare, consumer goods, and more. Recently, when advising companies, we increasingly examine these use cases and propose where voice can best enhance the customer experience. When should voice technology be used at home, in the car, or at work? In what situations is voice mode most efficient, or most enjoyable?

Conversely, understanding where to choose interaction models other than voice is also part of our responsibility. For example, in a car, functions operable by voice (like navigation systems or media players) should also support touch input, allowing users to switch to it in noisy environments. Furthermore, even if a destination is given to the navigation system via voice, it might be easier to confirm the route on the display map rather than listening to spoken directions afterward. As designers, we must understand that various situations are possible.

Until technology advances further and can handle complex instructions under difficult conditions, this multi-modal approach—the ability to switch between voice mode and modes using visuals or touch—should be an effective means of addressing the current limitations of voice assistants.
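One way to picture this switching is a small rule that falls back to touch when voice is unlikely to work. The sketch below is an assumption-laden illustration: `choose_input_mode`, the noise limit, and the confidence floor are invented for the example and are not values from frog or from any particular speech platform.

```python
# Assumed thresholds, for illustration only.
NOISE_LIMIT_DB = 70.0   # ambient noise above which speech input is treated as unreliable
MIN_CONFIDENCE = 0.6    # recognition confidence below which the system stops trusting voice


def choose_input_mode(noise_db: float, last_confidence: float) -> str:
    """Return "voice" or "touch" depending on current conditions."""
    if noise_db > NOISE_LIMIT_DB:
        # Too loud (e.g. open windows on a highway): offer touch controls.
        return "touch"
    if last_confidence < MIN_CONFIDENCE:
        # The last utterance was poorly recognized: offer touch as a fallback.
        return "touch"
    return "voice"
```

The same function could be extended with other signals, but even this simple rule reflects the point that voice should be one available mode rather than the only one.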

Designing an artificial yet "human-like" personality

This multi-modal approach often draws inspiration from convergent design techniques. Convergent design is a method of creating transformative solutions and experiences by integrating products, services, and digital technologies. At frog, when discussing this type of strategy with clients, we are sometimes asked for advice on an element unique to voice: personality. While GUI designers can express brand personality to some extent through chosen colors, typefaces, and images, designing voice UIs is entirely different.

In voice design, you cannot ignore words. Conversation itself is the interface, so you can't just use placeholder text. Designers working with voice must understand how people respond emotionally to voice and to the distinct characteristics of different voices. This requires knowledge of social sciences like psychology, sociology, and linguistics, and sometimes even humanities fields like literature, philosophy, and history.

In his lecture, Kinsella stated, "What matters in voice is that people can be human. That machines understand us without us having to learn machine language." The possibility of bringing human-centered voice experiences within reach—or within earshot—of those who need them most excites me as an interaction designer.

This article is also published in the web magazine "AXIS".


Author

frog


frog is a company that delivers global design and strategy. We transform businesses by designing brands, products, and services that deliver exceptional customer experiences. We are passionate about creating memorable experiences, driving market change, and turning ideas into reality. Through partnerships with our clients, we enable future foresight, organizational growth, and the evolution of human experience. http://dentsu-frog.com/
