Given the current state of the art in language and speech technology, errors are unavoidable in present-day spoken dialogue systems. One of the main concerns in dialogue design is therefore how to decide whether the system has understood the user correctly. In human-human communication, dialogue participants continuously send and receive signals on the status of the information being exchanged. We claim that if spoken dialogue systems were able to detect such cues and change their strategy accordingly, the interaction between user and system would improve. The goals of the present study are twofold: (i) to find out which positive and negative cues people actually use in human-machine interaction in response to explicit and implicit verification questions, and how informative these signals are, and (ii) to explore the possibilities of spotting errors automatically and on-line. To reach these goals, we first perform a descriptive analysis, followed by experiments with memory-based machine learning techniques. It appears that people systematically use negative/marked cues when there are communication problems. The machine learning experiments suggest that it may be possible to spot errors automatically and on-line with high accuracy, in particular when focussing on combinations of cues. This kind of information may turn out to be highly relevant for spoken dialogue systems, e.g., by providing quantitative criteria for changing the dialogue strategy or speech recognition engine.
Krahmer, E. J., Swerts, M. G. J., Theune, M., & Weegels, M. F. (2001). Error detection in spoken human-machine interaction. International Journal of Speech Technology, 4(1), 19-30. https://doi.org/10.1023/A:1009648614566, https://doi.org/10.1080/10630739708724546