Fighting Bias Voice AI

Fighting Bias in Voice Artificial Intelligence

November – December 2021 (2 months)
Working solo

Background

The world is becoming increasingly digitized every year. One of the ethical challenges it faces is adopting inclusivity and tolerance for all values. In the first part assignment for this project, I analyzed how various AI (Artificial Intelligence) powered devices can be biased against different groups of people.

The Problem

I decided to narrow the “Bias in AI” topic down to the AI voice technologies that recognize real-time human language and respond to it. Recent studies show that voice recognition technologies are still biased against groups such as older people and males versus females (with a preference towards females). However, individuals with a non-native English tongue are the most affected by biased voice technologies. So the question to consider is:

How might we fight bias in speech recognition and NLP (Natural Language Processing) technologies so that technology solutions are inclusive for all?

DISCOVER (RESEARCH)

Exploratory Research

Key findings from the exploratory research

1. Today’s natural language processing and speech recognition algorithms are unintentionally biased against various groups of people.
2. The worst affected group is non-native English speakers and people with distinct accents.
3. This results in increased financial costs due to additional customer service time, user disappointment, and decline in customer loyalty, and the opportunity cost of missed sales for home speech recognition devices.

Origins of AI

AI is a research field that has been around since the 1940s. The whole idea of Artificial intelligence is to try to understand the nature and design of “intelligent agents”.
AI aims to find a way to artificially mimic human thinking, feeling, and other human features. With modern technology, we have never been closer to achieving this aim.
Machine learning is a subdivision of AI that has gained a lot of applications in the business world. In machine learning, experts do not have to provide knowledge of the system.

What is Bias in machine learning?

If they favour one social group against another, machine learning algorithms will continue doing that, forming preferential bias towards or against one social group.

The Washington Post collaborated with two research organizations to evaluate thousands of voice commands given by more than 100 persons in nearly 20 cities to examine the accent imbalance of intelligent speakers. They discovered that the systems revealed significant differences in how individuals from various regions of the United States are perceived.
However, those with foreign accents experienced the most significant difficulties. In one investigation, the system demonstrated that speech from the test group had roughly 30% more mistakes when compared to what Alexa believed it had heard:

Overall accuracy by accent group from Washington Post

Explanatory Research (Challenges in the NLP Sector )

Key findings:

  • We can categorize NLP algorithms into two high-level categories by the input type: Speech and text-based NLPs. For this research, we will be focusing only on speech-based NLPs.
  • There are many different applications for speech and text-based algorithms. All these applications are created by companies that use commercially available APIs (application programming interfaces) so they do not create the algorithms. Companies that provide APIs control and create the algorithms.
  • All NLP Algorithms get better at understanding language by analyzing speech input.
  • Users whose speech NLP algorithms do not understand well enough may choose not to use NLP-powered devices and services. Unfortunately, this situation may create a toxic feedback loop because, at the same time, the only way for NLPs to learn is by listening to distinctive accents.
Research Process & Approach

In my research of the challenges for the NLP sector, I continue looking into available studies and literature, hoping to:
1) Understand how the NLPs work;
2) Figure out what the touchpoints for the user experience are;
3) Look into possible reasons for biases.

What is NLP, and how does it work

There have been many attempts to define or summarise what natural language processing is. In general, it attempts to comprehend written and spoken language using technology. As mentioned above, the modern NLP is tackled by artificial intelligence and Machine learning algorithms. NLP has many applications, such as text recognition for web scraping, automatic article writing, analyzing user-generated content for censorship, and many more.

Classification of NLP algorithms


At a very high level and by function, we can categorize NLP technologies into two main types:

  • Speech recognition technologies
  • Text recognition technologies.

Most NLPs are “hidden” from the user, and the user never gets to interact with them. However, some NLPs have direct access and interaction with the users, and for this research, I will only be looking at them.

Natural language processing applications categorisation
Top Three Commercially Available NLP APIs For Speech Recognition

1. IBM Watson Speech-to-Text API

  • Transcribes English speech to text;
  • Provides developers with speech transcription abilities for applications;
  • Speech recognition accuracy is dependable on the quality of input audio;
  • Service only transcribes words it knows.

2. Google Assistant API

  • Can be embedded into devices to enable voice control;
  • Provides a way to have a conversation with smart devices;
  • Voice control for phone applications, speakers, smart displays, automobiles, watches, laptops, TV, and other Google Home devices;
  • Allows to conduct a search, manage tasks, add reminders, and control home devices.

3. Houndify by SoundHound

  • Voice-enabled conversation functionality
  • It enables developers to integrate this functionality into their apps or create new ones.

What Causes Biases In NLP Technologies?

According to the Washington Post article (reviewed in the previous chapter), Algorithms need diverse inputs to learn and ultimately understand different accents. However, data Scientists say that too many people training, testing, and working with the systems sound the same. That means accents that are less common or prestigious are more likely to be misunderstood, met with silence, or the dreaded “Sorry, I didn’t get that.” (Harwell, 2018).
In other words, one of the main functions of the algorithm is to self-educate and self-correct. However, NLPs face the toxic feedback loop as those exact user groups who would be the most beneficial for the algorithms to listen to avoid talking to it due to shame and frustration of not being understood.

User Interview and Testing

To better understand and empathize with the users trying to use the NLPs, I conducted quick user research with one of my close friends, a non-native English speaker with a strong accent.
I asked him to perform five queries* with Google phone assistant:

  • Please find a carrot cake recipe.
  • How long does it take to drive to Wexford from Bray?
  • Who is the tallest man on the planet?
  • What year did the first moon landing happen?
  • What are the opening times in Smyths Carrikmines today?

Google Assistant was great at catching everything he said. After the test, I conducted a short user interview that facilitated building the empathy map that led to developing a primary persona in the Define/Synthesis phase.

Note: this research does not indicate real-world situations because users do not ask pre-formulated questions in the real world. However, I was more interested in his feedback and thoughts than the performance of Google’s NLP API.

DEFINE (SYNTHESIS)

Affinity Diagram

I started the “Define” phase of the UX Process by synthesizing user research findings in Affinity Diagram. First, I reviewed User tests and noted the meaningful insights on post-it notes. Then I categorized the messages into distinctive categories. Different colours mean different respondents (users and stakeholders).

User Empathy Map and Persona

User Pain-Points interview and testing facilitated building the empathy map that led to developing a primary persona – Miguel Hernandez.

User Empathy Map

Primary Persona

DEVELOP(IDEATION)

Ideation Techniques

Mindmapping

During the exploratory and later explanatory research, I gathered a lot of information about the bias in NLP Algorithithms and the effect on users and businesses.

Using the mindmap technique in this proposal
The mindmap I came up with (Figure1) helped me understand that I should focus on finding ways to increase diversity in voice input data.

Mindmap ideation for reducing Bias in Voice AIs
Crazy Eights

The next step in the ideation process was coming up with as many ideas as possible. I thought the most suitable way to develop rapid ideas was to use the strictly timed Crazy 8s technique.

Crazy 8s for this UX proposal
Using the Crazy 8s technique, I devised eight sketches representing eight ideas for tackling bias in voice technologies. I then discussed the ideas with my UX design lecturer. This discussion was a great help in narrowing down and polishing the final concept.

Crazy Eights Sketch

Rapid Prototype

Rapid Prototype

Journey Map – Context

Google is aware of bias in its NLP algorithms and concerned about its NLP API being sold to other companies; it has come up with a solution.
The solution is to design a cheaper version of the Nest device, hoping it will appeal to a broader variety of users and motivate them to buy and use the device despite their worries about their voices not being recognized.
In addition, it comes with an educational marketing campaign targeting the above audience in the US, Ireland, and the UK.

Journey Map Actors

Journey Map

Journey map

Business Opportunity Discovered
I discovered the opportunity to include a language learning module in the new assistant device. I assumed that only a reduced price alone would not be a sufficient factor for increasing the sales for the device (which in turn would provide it with a broader range of accents).
Therefore, a language training module should serve as an attractive feature and draw a unique range of people with accents that are more difficult to understand (which, in this case, is very beneficial to the algorithm).

DELIVER (IMPLEMENTATION)

Hypothesis Statement

We believe that a new Google Assistant device – Nest Lingua with a well-executed educational/marketing campaign and an English Language Training module will draw a broader range of people to use it. According to our research, this will result in less biased NLP algorithms because of more diverse learning datasets.

We will measure the success by:
1. The sales amount of Nest Lingua devices and the number of people who turn the language training module on;
2. By conducting experiments similar to the Washington Post experiment every six months. We aim to ensure that understanding the accents of distinct languages reaches an understanding of white individuals from the West Coast USA.


Prototype: Google Nest Landing Page

NEST homepage prototype

Unmoderated Usability Testing

User Screening
To perform unmoderated user research, I use a usertesting.com website. It allows specifying a target audience by nationality, interests, occupation and many other factors. They also allow using of screening questions.
Deciding what users to include/exclude from the research was very tough. I was puzzled about limiting myself to English-speaking countries and the rest of the world.
After much hesitation, I decided to opt for all countries and introduced several screener questions. I based my decision on the fact that there are more non-native English speakers worldwide than in English – speaking countries, and I will be more likely to find suitable candidates if I don’t restrict countries.
I decided to look for someone who does not have native or excellent English skills and is interested in improving their English. I wanted to test if the prototype would be appealing for this category.
They would also be technologically savvy as they should know what AI-based assistants are and be familiar with the YouTube Music service.

Testing Social Media Advert

By asking the following questions, I wanted to know if the message was clear if they knew what we were selling and where the advert was leading them. I was also interested in whether the design and composition were appealing.

  1. This is a sponsored ad on Facebook. Please explain in your own words what you think this advert offers. [Verbal response]
  2. How likely would you click on it if you saw it in your Facebook feed? [Multiple choice: Not likely, Likely, Very likely]
  3. In the last question, we asked how likely you would click on this ad. Would you please explain your answer? [Verbal response]
Testing Landing Page

I designed 1-4 questions to examine if the user grasps the website’s purpose, the USP (unique selling proposition), whether the pricing is straightforward and the overall visual attractiveness of the site.
Question 5 seeks to unveil any missing information on the website that I could include. It also provides valuable ideas for the video content.
Questions 9-11: This user research was unmoderated, meaning I could not witness a user trying to use and engage with the prototype. However, I wanted to include several questions to help me know if the website’s navigation is straightforward and if users understand the iconology.

  1. Here you can see a homepage prototype (the links are not working yet). What do you think is the primary purpose of this website?
    [Verbal response]
  2. Looking at this website, what are the main differences between regular and lingua Nest versions? [Verbal response]
  3. How much is this product? What is included in the price? [Verbal response]
  4. How visually attractive is this website to you? [5-point Rating scale: Very attractive to Very unattractive]
  5. At the bottom of the website, there is a video. What would you expect to see in that video? [Verbal response]
  6. Supposing you’d like to buy this product. What steps would you take? [Verbal response]
  7. What do you think will happen if you click on the first (from the right) icon on the top bar of the page? [Verbal response]
  8. What do you think will happen if you click on the SHOP link? [Verbal response]
  9. How would you improve this website? Is there anything missing (e.g. is there something you would like to know before deciding to go ahead with the purchase)? [Verbal response]
  10. How appealing is the offer to get a free YouTube Music subscription for a year?
    (Multiple choice):
    -Not appealing, I don’t care about YouTube Music,
    -Not appealing, I already have a YouTube Music account,
    -Appealing, I would love to have YouTube Music for a year,
    -Not sure yet, I don’t know what YouTube Music is.
  11. What features of the Nest Lingua product are the most attractive to you? [Verbal response]
  12. What concerns might stop you from buying this device? [Verbal response]

Now retrospectively, I know I made several mistakes when designing this questionnaire. It turns out; users are good at verbalizing their responses, so there was no need to include questions that explain their multiple-choice answers.
Also, I should have included a couple of additional screener questions, such as if they are familiar with youtube music or assistants, as one of the users was utterly unaware of what an assistant is.
However, using userresearch.com for the first time, I was impressed by the quality of responses and the tones of valuable and actionable information I obtained.

Design Iterations: Advert

I tested the prototype and the social media advert with three users. For this research, I will call them User A, User B, and User C. All three are male, in their 20s (20, 29 and 22 years old, respectively). User A and User B come from India and are Indian, while User C comes from the United Kingdom but was born in Middle East Arabic countries. All three are non-native English speakers.

The good:
All three users think that advert is appealing and visually attractive. They love bright colours. User A and User C indicated that they are likely or very likely to click if they saw it in their Facebook feed.

Improvements:
User B said he would not click on the ad because he bought a similar assistant a couple of months ago and doesn’t think he needs a second one.
User A thought we should highlight Nest Lingua more, especially where it says “Google’s assistant and language trainer.”

As a result, I made the following changes:

  • “Google Assistant and language trainer” text moved below Nest Lingua;
  • Font Color changed from dark to red to highlight;
  • The price moved down towards the product to give it significance;
  • Slightly increased “Nest Lingua” text font size.
advert before
Before usability test
advert after
After usability test

Landing Page Testing Notes

1. Overall Understanding of the Website’s Idea, Pricing and Unique Selling Proposition

The good:
All three users understand the primary purpose of the website. Users B and C saw and read the subhead, while User A did not.
User B and User C think the website is very appealing and modern
All three users understood the pricing and the discount promotion.

Improvements:
User A could not explain the difference between a regular Nest and a Nest Lingua (he didn’t notice the subhead of the website. The subhead should be more apparent in presenting the USP of the product.

User A thinks the website is not appealing and looks somewhat fishy. Too basic and too minimalistic. He doesn’t like a logo or recognise the NEST word as a logo. He thinks there should be a “proper” logo at the top left-hand side of the page. He also believes that the website is too “crammed” and that the space is used inefficiently.

User B did not get the idea at first because he didn’t know there was a scroll down. However, when he came across the section that explained how the language module works, he became very excited about the product. He also mentioned that he has an audible subscription and listens to audiobooks to learn English better.

Google Nest Landing Page before iterations
2. Expected Video Content

Users have similar ideas for the video.
For example, user A would like to see “how the device would look in the room”, its visual properties, how it is used, and understand the USP (because he didn’t notice the subhead).
User B wants to see how the language module works and whether the conversations are live, real-time, or pre-recorded. Finally, user C was interested in unpackaging and hands-on product use.

3. Testing Usability

The good:
All three users understood the icons and links on the prototype. Also, they all could ideally indicate the correct steps towards purchasing the product.

Improvements:
During User Testing of this section, I discovered the following usability pain points for the users:
The scroll bar or some link to the bottom is missing (heuristics: status of the journey)
The Shop link and the cart icon on the top bar are too close, and User B made a good point that it was confusing.

4. Validating the Offer and Testing Its Presentation

The good:
All three users read and understand the How-To section (how the language module works).
User B and User C think the website is beautiful and well-marketed.

User B was particularly interested in the language component and would like to hear more about it. User C also mentioned that the language component is a great addition, although the “traditional assistant features” would be more enticing.

Users A and C find the YouTube Music subscription appealing, while User B says he does not care about it.

Improvements:
User A thinks the design should be more attractive; he misses the reviews section (he believes they should go after the video). He also noted shipping speed and cost were missing.
User B mentioned that he would like to know if the price is for the whole range of NEST Lingua devices or just for this single model.
User C would like to hear if the device supports native languages other than English if a person doesn’t speak English.
Users A and B were not happy about the price. They think the price is too high, while User C thinks it is very affordable.

When asked what concerns might stop them from buying the device, User A also mentioned other competitors (he believes other competitors on the market provide the same product but are cheaper).
Users B and C also said they already own similar devices (both mentioned Amazon Alexa), which would be their primary concern against buying the Nest Lingua. However, user B also noted that the device he already owns understands his accent well enough, so he doesn’t find accent-friendliness as a unique selling point. In contrast, the language training sounds very attractive to him.

Google Nest Landing Page before iterations

Design Iterations: Landing Page

Landing page testing resulted in the following changes:

  • Changed the header icon placement.
  • Changed the “user” icon symbol in the header.
  • Changed the text underneath the main head to: “We created this Nest device to comprehend distinct English accents better. The new Nest Lingua provides real-time conversational English Training. It is a unique solution for people worldwide who want to improve their language skills.”
  • I moved the Nest Lingua image to the right and included information about shipping, returns and warranty on the left.
  • Introduced a scroll-down button.
  • Added a headline to the user testimonials section.
  • Included an Icon to see more user testimonials.
Google Nest Landing Page before iterations
Before usability test
Landing page After usability test
After usability test