IT Management

One way to help mobile workers access enterprise applications is to provide a data interface that runs on the worker’s smartphone. However, you can also reach mobile workers through a voice interface to back-end applications. To do this, you need software that performs both automatic speech recognition (ASR) and text-to-speech (TTS).

TTS is relatively easy. Many people use this feature on their laptops. The voice sounds a little mechanical, and most systems have trouble pronouncing certain words, but overall, it does the job. The tougher part is automatic speech recognition. A system that only has to recognize the speech patterns of a small set of users has much less work to do than one that must recognize all the accents and sound envelopes found in the general population. Typically, voice recognition systems face a trade-off between a large user base and a large vocabulary. Each takes a lot of processing power, so a given system generally supports one or the other, but not both.

Once the system can recognize spoken words, the next challenge is to pick out sentences and assign meaning to what the user said. Computer technology still isn’t at a state where it can recognize natural language, so the system has to coach, or prompt, the user to say things in a way it can understand. Once users realize they can’t talk to the system as they would to another human, the interface can be quite useful for certain applications.

Here’s an example: A field service automation application dispatches an engineer by calling his cell phone, informing him through TTS that there’s an outstanding work order, and providing the necessary details. The engineer then responds with spoken commands to ask for more information, accept the dispatch, or decline the order. The same system might allow the worker to close an order after the job is finished.
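
As a rough illustration, the command handling in this example could be sketched as a small state machine. This is a minimal Python sketch, not the design of any particular product: the state names, the command phrases, and the function handle_command are all assumptions made for illustration, and the sketch presumes the speech software has already converted the engineer's reply to text.

```python
from enum import Enum


class OrderState(Enum):
    OFFERED = "offered"
    ACCEPTED = "accepted"
    DECLINED = "declined"
    CLOSED = "closed"


# Hypothetical command vocabulary: phrase -> (state required, resulting state).
TRANSITIONS = {
    "accept": (OrderState.OFFERED, OrderState.ACCEPTED),
    "decline": (OrderState.OFFERED, OrderState.DECLINED),
    "close": (OrderState.ACCEPTED, OrderState.CLOSED),
}


def handle_command(state, command, details):
    """Return (new_state, text_to_speak) for one recognized command."""
    command = command.strip().lower()
    if command == "more information":
        # The order is unchanged; the details are read back through TTS.
        return state, details
    if command in TRANSITIONS:
        required, new_state = TRANSITIONS[command]
        if state is required:
            return new_state, f"Order {new_state.value}."
        return state, f"You cannot {command} an order that is {state.value}."
    # Anything outside the small vocabulary triggers a coaching prompt.
    return state, "Please say accept, decline, close, or more information."
```

Keeping the vocabulary to a handful of phrases reflects the point made earlier: the system prompts the user toward replies it can handle instead of attempting to interpret free-form speech.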

Although this set of procedures could be implemented by having a human make the call, a single dispatcher can service only a certain number of field engineers. In addition, after placing the call, the dispatcher must enter status changes and other updates into the system; it’s hard to ensure this is done accurately.

Another way of providing this functionality is through a data interface, whereby a text message is sent to the engineer, prompting him to click on a URL to get more information. However, the speech-enabled field service automation system has many advantages over this approach. When exchanges are short, users frequently prefer the voice interface over the data interface. Some people don’t like the limited input systems provided by smartphones, and sometimes engineers have a hard time stopping what they’re doing to use both hands to enter data.

The voice interface is often provided by a software package independent from the enterprise application. The speech-enablement software takes text from the application and converts it to voice, which it speaks to the user. When the user talks, the software converts speech to text, provides some sanity checking, and then passes text back to the application. 
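
The round trip described here could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: speak and listen are hypothetical stand-ins for whatever TTS and ASR engines are in use, voice_exchange is an invented name, and the sanity check is simply a match against a list of allowed replies. A real speech-enablement package would likely do considerably more.

```python
from typing import Callable, Iterable, Optional


def voice_exchange(
    prompt_text: str,
    allowed_replies: Iterable[str],
    speak: Callable[[str], None],   # hypothetical TTS hook: text -> audio
    listen: Callable[[], str],      # hypothetical ASR hook: audio -> text
    max_attempts: int = 3,
) -> Optional[str]:
    """Speak the application's text, then return a validated reply or None."""
    allowed = {reply.lower() for reply in allowed_replies}
    for _ in range(max_attempts):
        speak(prompt_text)
        # Normalize case and whitespace before checking the recognized text.
        heard = " ".join(listen().lower().split())
        if heard in allowed:
            return heard  # only validated text goes back to the application
        speak("Sorry, I did not understand.")
    return None
```

Passing the TTS and ASR functions in as parameters mirrors the separation described above: the enterprise application deals only in text and never touches audio directly.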

A voice interface isn’t useful in all circumstances; for example, it is a poor fit when there’s a lot of interaction between the user and the application. However, when the exchange is simple, a voice interface is sometimes the most natural way for workers in the field to access an enterprise application using a smartphone.