Designing Digital Future

How to perform software tests for voice apps?

Many everyday products have not only become smarter but now also integrate with voice-first applications. Each device that connects to a voice-first app requires a custom skill that enables interaction between the device and the user, and these skills have to be tested thoroughly and refined so that they give appropriate responses. Voice interaction has gained traction because speaking to an assistant is more convenient than typing, and voice queries can also return results faster.

As the use of digital assistants grows, optimizing for voice search is becoming critical to SEO success. Voice technology is another step toward improving the user experience through semantics: it relies on Natural Language Processing (NLP) to recognize voice texture, interests, and behaviour. Testing remains one of the major bottlenecks in creating successful voice apps, and two factors determine the success of voice-first apps or devices: speed and accuracy. Most of the challenges that voice app testers face fall into three areas: the high volume of datasets, languages and accents, and different age groups.

 

Challenges in voice-first skill testing:

 

High volume of datasets: Operating a voice-first device involves little more than the user saying the wake word, followed by a command or instruction, and the device or application responding with the appropriate action. Although this looks like a simple function from the outside, there are many layers of complexity underneath.
Apart from the regular types of testing (unit, system, integration, performance, endurance, and so on), we also need to ensure that voice apps and devices are tested against many different commands or utterances for the same skill.
Testers also need to keep in mind that these services evolve: their responses may change as personalization features learn from past interactions. This adds another layer of complexity.

So how do you make sure your tests cover the different ways a question can be phrased, and that the skill derives the correct answer from each of them? Skill testing requires large data sets of utterances, so that you can verify the skill responds appropriately no matter how the question is framed.

However, it is practically impossible to compile such an exhaustive list by hand, and testers cannot work through such high volumes manually when a sprint may last only a few days.
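One common way to reduce this manual effort is to generate utterance variations from templates instead of writing each one by hand. The Python sketch below expands a few phrasings of a hypothetical weather intent with every combination of slot values, and records the slot values the skill should extract from each utterance. The templates, slot names, and values are illustrative assumptions, and sending each utterance to the skill (for example through text-to-speech) is left out.

```python
from itertools import product
from string import Formatter

# Hypothetical phrasings of one "get weather" intent; {city} and {day} are slots.
TEMPLATES = [
    "what's the weather in {city} {day}",
    "will it rain in {city} {day}",
    "tell me the forecast for {city} {day}",
]
# Hypothetical values to substitute into each slot.
SLOT_VALUES = {
    "city": ["London", "Mumbai", "Austin"],
    "day": ["today", "tomorrow", "this weekend"],
}


def slot_names(template):
    """Return the sorted placeholder names used in a template string."""
    return sorted({field for _, field, _, _ in Formatter().parse(template) if field})


def expand(templates, slot_values):
    """Return (utterance, expected_slots) pairs for every template/value combination."""
    cases = []
    for template in templates:
        names = slot_names(template)
        # product() over the slot value lists yields every combination of values.
        for combo in product(*(slot_values[name] for name in names)):
            slots = dict(zip(names, combo))
            cases.append((template.format(**slots), slots))
    return cases


for utterance, slots in expand(TEMPLATES, SLOT_VALUES)[:3]:
    print(utterance, slots)
```

Three templates and two slots with three values each already yield 27 test cases; each new template or value grows the set multiplicatively, which is why generated data sets scale where hand-written lists do not.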

Language and accents: People in different regions speak different languages, so voice assistants need to understand and answer queries in many languages. How do you test a skill in multiple languages? This becomes a real challenge when you sell your devices worldwide, where the most crucial factor is personalizing the experience for each market. Testers will not know every language in which the device needs to be tested.

Accent also matters in voice-first testing: even within the same language, accents can vary significantly. A language like English, which is spoken the world over, requires much more testing because accents differ from region to region. This makes it challenging to identify the various dialects and test them thoroughly enough to get accurate results.
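One way to organise multilingual and multi-accent coverage is as a test matrix: each utterance is paired with every voice profile of every locale that speaks its language, so English utterances are exercised with several regional accents. In the Python sketch below, the locales, utterances, intents, and voice names are all hypothetical placeholders; an actual harness would render each case with a text-to-speech voice of that accent, play it to the device, and compare the recognized intent with `expected_intent`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceTestCase:
    locale: str
    voice: str            # hypothetical TTS voice / accent profile name
    utterance: str
    expected_intent: str


# Hypothetical utterances per language, each paired with the intent it should trigger.
UTTERANCES = {
    "en": [("turn on the lights", "LightsOn"), ("what's the time", "GetTime")],
    "de": [("schalte das Licht ein", "LightsOn")],
}
# Hypothetical accent profiles per locale; English is covered by several regional voices.
VOICES = {
    "en-US": ["en-US-voice-a", "en-US-voice-b"],
    "en-GB": ["en-GB-voice-a"],
    "en-IN": ["en-IN-voice-a"],
    "de-DE": ["de-DE-voice-a"],
}


def build_matrix(utterances, voices):
    """Cross every locale's voices with the utterances for that locale's language."""
    cases = []
    for locale, locale_voices in voices.items():
        language = locale.split("-")[0]
        for voice in locale_voices:
            for text, intent in utterances.get(language, []):
                cases.append(VoiceTestCase(locale, voice, text, intent))
    return cases


for case in build_matrix(UTTERANCES, VOICES):
    print(case)
```

Adding a new accent then means adding one voice entry rather than writing new tests, and the same utterance list is reused for every English-speaking region.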

Different age groups: When we sell a product, we want it to appeal to a broad audience around the world, and that audience spans different age groups. Voice-first devices are not built for any specific age group, so you need to ensure that they provide an excellent user experience to customers of all ages.
A young schoolchild phrases a question differently from an older person, and their needs differ as well, so the device must handle queries from all age groups. This means the data set has to include questions phrased by people of various ages, and it is up to the tester to create such data sets.
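Once utterances are labelled with the age group of the person who phrased them, a simple check can flag intents whose data set is missing an age group. This is a minimal Python sketch with a hypothetical, hand-labelled data set; the intent names, age-group labels, and sample phrasings are assumptions for illustration only.

```python
from collections import defaultdict

AGE_GROUPS = ("child", "adult", "senior")

# Hypothetical labelled data set: (intent, age group of the speaker, utterance).
DATASET = [
    ("PlayMusic", "child", "play the dinosaur song"),
    ("PlayMusic", "adult", "play my workout playlist"),
    ("PlayMusic", "senior", "could you put on some Frank Sinatra, please"),
    ("SetTimer", "adult", "set a timer for ten minutes"),
    ("SetTimer", "senior", "would you time ten minutes for me"),
]


def coverage_gaps(dataset, age_groups=AGE_GROUPS):
    """Map each intent present in the data set to the age groups it is missing."""
    seen = defaultdict(set)
    for intent, group, _utterance in dataset:
        seen[intent].add(group)
    gaps = {}
    for intent, groups in seen.items():
        missing = sorted(set(age_groups) - groups)
        if missing:
            gaps[intent] = missing
    return gaps


print(coverage_gaps(DATASET))
```

Here `coverage_gaps(DATASET)` returns `{'SetTimer': ['child']}`, telling the tester that no child-style phrasing of the timer request has been collected yet.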

 

How to overcome these challenges?

 

The key ingredient that can help testers overcome these challenges is test automation. With large volumes of data, manual testing becomes impractical, so automating voice-first testing is the solution.

Automation can cover end-to-end use cases from device to cloud, including device configuration, environment setup, connectivity testing, speech translation and recognition, multilingual support, device functionality, device-to-cloud connectivity, and data generation. It therefore addresses the challenges described above and much more.

 

Manual Testing Drawbacks

 

How would you structure your testing environment for these voice-first devices? Would you take the usual manual testing route, or develop automation so that you have a consistent workflow? One drawback of manual testing is that it can be expensive to run at a large scale. Offshoring this type of testing to countries such as India, China, or Vietnam could also be difficult if you wanted to test a skill aimed primarily at English speakers. If you intend your app to be used globally, you would need to hire testers who speak every language your skill supports, and locating testers with that specific skill set can be expensive and time-consuming. Having your testers speak to the devices themselves may not be feasible either, since after a long testing session you will probably end up with testers with hoarse throats. Some tools ease this by generating the commands you want to test beforehand, in whatever language you need.

The drawback of this approach is that everything has to be generated beforehand, which is time-consuming, and you may not know in advance every command you want to test. This rigidity is a real detriment when testing devices that are conversational by nature. Consider a pizza ordering application: not all pizza chains offer the same toppings, so to test such an app you would have to compile every topping the various chains provide and have those lines recorded ahead of time. A dynamic solution is preferable.
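A more dynamic alternative is to build order utterances at test time from each chain's current menu, so nothing has to be recorded ahead of time. The Python sketch below assumes hypothetical chains, menus, and phrasings; in practice the menu might be loaded from the chain's catalogue and each generated sentence handed to a text-to-speech engine. Each utterance is returned together with the toppings the app should recognise, and passing a `seed` makes a run reproducible, which helps when a failure needs to be replayed.

```python
import random

# Hypothetical menu data; in practice this could be loaded at test time
# from each chain's catalogue instead of being hard-coded.
MENUS = {
    "ChainA": ["pepperoni", "mushrooms", "olives", "jalapenos"],
    "ChainB": ["ham", "pineapple", "mushrooms"],
}
# Hypothetical ways a customer might phrase an order.
PHRASINGS = [
    "I'd like a large pizza with {toppings}",
    "can I get a pizza with {toppings}",
]


def join_toppings(items):
    """Join toppings the way a person would say them: 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def order_utterances(chain, n, max_toppings=2, seed=None):
    """Generate n (utterance, expected_toppings) pairs from the chain's menu."""
    rng = random.Random(seed)
    menu = MENUS[chain]
    orders = []
    for _ in range(n):
        count = rng.randint(1, min(max_toppings, len(menu)))
        toppings = rng.sample(menu, count)
        phrasing = rng.choice(PHRASINGS)
        orders.append((phrasing.format(toppings=join_toppings(toppings)), sorted(toppings)))
    return orders


for text, expected in order_utterances("ChainB", 3, seed=1):
    print(text, "->", expected)
```

Supporting a new chain then only requires its menu; the phrasings and the expected-result bookkeeping are shared.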

 

Automation Strengths

 

Automated testing of voice-first devices is the answer. Because the tests interact with a voice-first device through its voice user interface, they are platform-agnostic and device-agnostic: since we test at the voice level, the same automated tests can be executed against any kind of voice app running on the device, and one set of tests can be reused across platforms and devices. Automation does pose its own set of challenges, however. A voice device does not always respond the same way to the same request, either because the speech recognition engine is imperfect and may have difficulty understanding you, or because the app is deliberately programmed to vary its responses so that it sounds more human. The automation framework therefore has to be smart enough to deal with multi-turn conversations: it must interpret each response from the voice app and reply appropriately to keep the conversation going until it ends naturally. Being able to run the automation on autopilot in this way makes test execution more robust and useful.
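The multi-turn handling described above can be sketched as a small conversation driver: instead of expecting one exact reply, it matches each response against a list of accepted patterns, answers accordingly, repeats itself when the app did not understand, and stops when a closing response arrives. `FakePizzaSkill` is a hypothetical stand-in that varies its wording and occasionally "mishears"; a real framework would instead speak to the device and transcribe its replies. The rules, phrasings, and function names are assumptions for illustration, not any particular tool's API.

```python
import random
import re

REPEAT = object()  # sentinel: say the previous utterance again

# Each rule maps a pattern in the app's response to the tester's next reply.
# A reply of None marks a response that ends the conversation successfully.
RULES = [
    (re.compile(r"\bsize\b", re.I), "large"),
    (re.compile(r"is that right|place the order", re.I), "yes"),
    (re.compile(r"on its way|order placed", re.I), None),
    (re.compile(r"didn't catch", re.I), REPEAT),
]


class FakePizzaSkill:
    """Hypothetical voice app: varies its wording and sometimes mishears."""

    def __init__(self, seed=None, mishear_rate=0.2):
        self.rng = random.Random(seed)
        self.mishear_rate = mishear_rate
        self.state = "start"

    def __call__(self, utterance):
        # Simulate an imperfect speech recognizer: the state does not advance.
        if self.rng.random() < self.mishear_rate:
            return "Sorry, I didn't catch that."
        if self.state == "start":
            self.state = "size"
            return self.rng.choice(["What size would you like?", "Sure! Which size?"])
        if self.state == "size":
            self.state = "confirm"
            return self.rng.choice([
                f"A {utterance} pizza, is that right?",
                f"Got it, {utterance}. Shall I place the order?",
            ])
        if self.state == "confirm" and utterance == "yes":
            self.state = "done"
            return self.rng.choice(["Your order is on its way.", "Order placed. Enjoy!"])
        return "Sorry, I didn't catch that."


def run_conversation(app, opening, rules, max_turns=20):
    """Drive a multi-turn conversation; return the transcript or raise on failure."""
    transcript = []
    utterance = opening
    for _ in range(max_turns):
        response = app(utterance)
        transcript.append((utterance, response))
        for pattern, reply in rules:
            if pattern.search(response):
                break
        else:
            raise AssertionError(f"Unexpected response: {response!r}")
        if reply is None:
            return transcript  # the conversation ended naturally
        if reply is not REPEAT:
            utterance = reply
    raise AssertionError("Conversation did not end within max_turns")


transcript = run_conversation(FakePizzaSkill(seed=42), "order a pizza", RULES)
for said, heard in transcript:
    print(f"> {said}\n< {heard}")
```

Because the rules match on intent-bearing phrases rather than full sentences, the same test passes whichever wording the app picks, while a response that matches no rule fails the test and reports the offending text.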