Key Takeaways
- Voice user interfaces (VUIs) rely on advanced AI processes, including speech-to-text (STT), natural language understanding (NLU), and text-to-speech (TTS), to enable seamless conversations between humans and machines.
- VUIs can enhance business operations through use cases like simulation and training (e.g., virtual subject matter experts for sales teams), dissemination and retrieval (e.g., digital twins for customer care), and acquisition and automation (e.g., voice AI for candidate intake in recruitment).
- People naturally anthropomorphize machines, making it crucial to design AI voices and personas with relatable, human-like qualities to improve user experience and engagement.
- As VUIs become more human-like, businesses must evaluate the ethical implications, ensuring alignment with values, transparency in AI usage, and a focus on beneficial outcomes for users.
How many of you know the name HAL 9000? It’s hard to believe that it’s been more than half a century since the 1968 film, 2001: A Space Odyssey, graced the silver screen. For much of the general public, the fictional HAL 9000—the hauntingly calm-voiced artificial intelligence (AI) character aboard the Discovery—was our first introduction to the concept of voice user interface, or VUI. The film, and HAL in particular, offered a glimpse into an exciting but uncertain future of human and machine interaction.
Now, the future is here. Contemporary companies are faced with AI tools developing at the speed of light. Leaders must determine how they want to use these tools—and, most importantly, how to do it ethically.
Exploring the vast world of voice user interface (VUI)
Let’s dig deeper into how we got here. Most of us are familiar with VUI or voice UI technologies like Siri or Alexa. But the very recent advent of large language models (LLMs) like ChatGPT and improvements in natural language understanding (NLU) have put us on the precipice of a fundamental change in the way we interact with data and machines.
For the last 50 years, the dominant paradigm for human-computer interaction (HCI) has been the graphical user interface, or GUI. Apple famously popularized the desktop GUI, which made it easy for users to read, write, and delete data using a simple metaphor of desktops and folders instead of command-line interfaces like DOS. Now you could click on a folder and drag it to the trash instead of issuing a DOS command. The GUI helped usher in a new age of personal computing.
Over the last 40 years, that paradigm has barely changed. We still use a mouse and a graphical interface to perform most tasks on our computers. This approach has been wildly successful, but with increasingly complex computing tasks and the advent of large language models, is it not time for something new? Are VUIs the answer?
The reality is that we won’t necessarily see one HCI paradigm supplant the other; rather, a hybrid form of HCI will emerge as AI-powered VUIs come online. For example, say we want to edit a set of photos. We may still use a mouse and GUI to push around pixels and make fine adjustments, but perhaps now we will use a VUI to ask the computer nicely to please search for and show us photos of a man with a taxi in an urban scene in our database. This is possible because of significant advances in LLM technology and natural language understanding (NLU). Because of AI, we now have machines that understand our intent, know our preferences, and anticipate our needs—even our moods! With AI and LLMs, it will be possible to create “emotional interfaces” that can better interact with human users. In addition to impacting many people's daily lives, AI-powered voice UI, also known as voice bots, can potentially transform many different business sectors.
What is voice UI?
But what is VUI? VUI is a new interface layer consisting of a voice and a character, both of which must be designed to interact with human users. As with Siri, a voice must be created that users feel comfortable interacting with. The tone and choice of words also create a persona, or character, which can motivate or dissuade a user from interacting with the VUI.
Back to Siri and Alexa. Most of us think of these VUIs as very limited in regard to how we can interact with them. Now, with the addition of LLMs, we can make them robust characters capable of remarkable human-like interaction with the intelligence of a PhD! These capabilities create many possibilities as well as concerns for the ethical use of these technologies.
Voice interfaces enable us to interact with machines in a way that is fundamentally more human. Ray Kurzweil, a famous inventor, futurist, and author, argues that by making machines emotional, people can better engage with them and thus be more productive. While this idea of emotional computing may seem like a sci-fi fantasy a la the movie Her, recent developments with LLMs are making this possible.
How do VUIs work?
So, how do VUIs work? In the simplest terms, VUIs as an interface can be embedded into any device that has a speaker and microphone, pretty much any device we now have! Additionally, they can be embedded in a web page, or accessed by phone or a hardware device like a smart speaker.
Here’s a general flow for how VUIs work:
- When a person speaks to a voice-powered AI, the machine must first capture the audio and convert the user's voice into text, a process known as speech-to-text (STT).
- Next, natural language understanding (NLU) interprets that text to work out what the user actually means.
- The text is then passed to an LLM, like Meta's Llama, which formulates a response as text output.
- Finally, that text response must be converted back to voice, known as text-to-speech (TTS). A voice service like ElevenLabs helps speak the text back to the user.
All of this must happen on the fly and in the cloud, meaning most of the computational work is done on a remote server rather than on the device itself.
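The loop above can be sketched in a few lines of code. This is a minimal illustration only: the three helper functions are hypothetical stand-ins for real cloud services (an STT API, an LLM endpoint, and a TTS API), with stub implementations so the example runs on its own.

```python
# Minimal sketch of one conversational turn: listen -> think -> speak.
# Each helper is a hypothetical stand-in for a real cloud service call.

def speech_to_text(audio: bytes) -> str:
    """Hypothetical STT step; a real system would call a speech API."""
    return audio.decode("utf-8")  # stub: pretend the audio is already text

def generate_reply(user_text: str) -> str:
    """Hypothetical LLM step that formulates a text response."""
    return f"You said: {user_text}"  # stub: echo instead of a real model

def text_to_speech(reply_text: str) -> bytes:
    """Hypothetical TTS step that returns synthesized audio."""
    return reply_text.encode("utf-8")  # stub: text bytes in place of audio

def handle_turn(audio_in: bytes) -> bytes:
    """One full turn of the VUI pipeline: STT -> LLM -> TTS."""
    user_text = speech_to_text(audio_in)
    reply_text = generate_reply(user_text)
    return text_to_speech(reply_text)

audio_out = handle_turn(b"turn on the AC")
print(audio_out.decode("utf-8"))  # -> You said: turn on the AC
```

In a production system, each of these steps would be a network call to a remote service, and the round trip must be fast enough that the conversation still feels natural.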
Voice communication between humans, by contrast, seems to happen without any effort, but there is vast computation happening in our brains. When a human and a machine converse, we are modeling some of these speech processes of humans but with an artificial brain—the LLM.
To make these conversations feel natural and accessible, designers need to create front-end features for this interface, just as they do for GUIs. This work roughly consists of developing synthetic voices, prompt engineering the LLM to behave in a certain way, and training the LLM on the right knowledge to make its responses useful.
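The "prompt engineering" part of that design work often amounts to writing a system message that fixes the bot's persona before any user input arrives. Here is a hedged sketch using the common role/content message shape; the persona name and wording are invented for illustration.

```python
# Hypothetical persona prompt: a system message that shapes the voice
# bot's tone before any user turn. "Ava" and the wording are examples.
PERSONA_PROMPT = (
    "You are Ava, a friendly virtual subject matter expert. "
    "Answer in short, spoken-style sentences suitable for text-to-speech."
)

def build_messages(user_text: str) -> list[dict]:
    """Assemble a chat-style message list in the common role/content shape."""
    return [
        {"role": "system", "content": PERSONA_PROMPT},
        {"role": "user", "content": user_text},
    ]

messages = build_messages("What does this product do?")
print(messages[0]["role"])  # -> system
```

The system message is what turns a generic LLM into a consistent character; changing it changes the persona without retraining anything.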
The advantages of voice user interface in business
Let’s look at another pop culture example. In Star Trek IV: The Voyage Home, Spock must relearn everything he knows after recovering from his injuries. He stands surrounded by screens where he must master a lifetime of knowledge. He simultaneously talks with various VUIs that teach him and test him as he rapidly grows his knowledge base. At the same time as he talks to the computers, he also inputs some responses to queries on touchscreens. This example shows that, depending on the type of task, certain HCI paradigms are more useful than others.
Here’s a VUI example from the mobility space. If someone is driving, keeping their eyes on the road is the most important thing. So, voice commands to do things we used to do with our hands, like turn on the AC, make a phone call, or answer a message, will now become more important. It may also mean that you develop a much more personal and affectionate relationship with your car! Who wants to talk to a robot with no personality?
Humans have a natural tendency to anthropomorphize cats, dogs, and even our cars. That tendency suggests we will want to do the same with our machines. So, how will we use these new capabilities?
Let’s look at the recruiting industry. At Aquent—a staffing and managed services company—there’s an opportunity to create synthetic personas to accomplish many complex tasks. As I see it, the use cases for voice AI fall under three areas:
- Simulation and training.
- Dissemination and retrieval.
- Acquisition and automation.
Simulation and training
Let’s first consider a scenario in which a Sales Team will be selling a new product or service. Often, a Sales Enablement Team will create training materials and will use subject matter experts to help Sales Teams understand and sell the products. What if we create a set of virtual SMEs with voice AIs and make these available to sales teams 24 hours a day?
We can also build in quiz-like modules to make sure salespeople are proficient in the subject and set up for success. With a growing array of products and services, salespeople often face a steep learning curve before they can sell a product effectively. Creating knowledge bases with human-like personas is a potentially powerful way to help them understand and sell a product.
Dissemination and retrieval
Next, voice AI agents could be used to proactively communicate and stay in touch with our clients. As customer care agents' workloads increase, they could have a “digital twin” of themselves that sends messages to clients, follows up to see if they need anything, and, if the voice AI spots something that needs immediate attention, alerts the human agent. Though this starts to blur the line between human and machine communications, we have had automated messages and communications for a long time; here, they simply become more intelligent.
Acquisition and automation
Finally, there is real potential in using these agents to gather information from job candidates. For example, when talent wants to sign up with an agency, they can input their information by typing or speaking to a voice AI, which holds a casual conversation to interview them, records their information, posts it to an internal database, and makes it available to recruiters looking for similar candidates.
The question is, will candidates like or accept this type of pre-interview? That depends mainly on the quality of the interaction, how a staffing company discloses the use of AI, and whether it results in returns on our investments in these technologies. These are exciting times indeed, and many questions remain to be answered.
Ethical considerations to be aware of with voice AI
Voice AI is an exciting frontier for emerging technologies. For many, it will literally embody the idea of AI because it makes it so human-like. Similar to the character of Samantha in Her, people will start to relate differently to machines that seem so human. Is this a good thing?
Sam Altman, the CEO of OpenAI, thinks so and is actively working to give AI a human-like voice. Regarding the capability of voice AI systems, he said, “I suspect that in a couple of years on almost any topic, the most interesting, maybe the most empathetic conversation that you could have will be with an AI.”
Before jumping all in, we must ask the hard questions of why and how we will use automation technologies and AI-powered voice bots in our enterprises. Then, we must look at our values and see if they align with the consequences of our choices. We must ask ourselves if the voice of AI is an ethical one. That is to say, are we using it in a way that privileges people first? In our rush for ever more efficient processes, are we thinking about the human first and the reciprocal effects on our lives? One must suspect that whoever holds the keys to these technologies may reap winner-take-all rewards. A human-centered approach to AI might help us decide what we should and should not do with these technologies.
For example, should we automate all processes that we can so we can create maximum efficiency without considering where in the system humans end up? How does that potential displacement affect the health of a society? Should corporations reap all the rewards? Can we design AI systems that enable instead of eliminate human labor?
Every corporation should draft an AI ethics charter and set an example for how we ought to use AI. Just as sustainability goals commit companies to making a positive contribution to society by reducing their carbon footprint, should AI's ethical footprint be included in every company charter? The Facebook credo of “move fast and break things” may not work with AI.
The future of conversational AI
VUI signals a significant paradigm shift in how we interact with computers. However, the power of these increasingly human-like interfaces demands careful consideration. While advancements in NLU and empathetic AI personas hold promise for creating more authentic and engaging VUI interactions, addressing inherent biases and fostering user acceptance of AI-driven interviews remain crucial hurdles. As business leaders embrace this technology, a thoughtful and ethical approach is paramount. Prioritizing transparency in AI usage and ensuring beneficial outcomes for both businesses and individuals will be essential in shaping a future where VUIs are a fixture of our everyday lives.