%0 Journal Article
%T Estimating a User's Internal State before the First Input Utterance
%A Yuya Chiba
%A Akinori Ito
%J Advances in Human-Computer Interaction
%D 2012
%I Hindawi Publishing Corporation
%R 10.1155/2012/865362
%X This paper describes a method for estimating the internal state of a user of a spoken dialog system before his/her first input utterance. When actually using a dialog-based system, the user is often perplexed by the prompt. A typical system provides more detailed information to a user who is taking time to make an input utterance, but such assistance is a nuisance if the user is merely considering how to answer the prompt. To respond appropriately, the spoken dialog system should be able to consider the user's internal state before the user's input. Conventional studies on user modeling have focused on the linguistic information of the utterance for estimating the user's internal state, but this approach cannot estimate the user's state until the end of the user's first utterance. Therefore, we focused on the user's nonverbal output, such as fillers, silence, or head movements, up to the beginning of the input utterance. The experimental data was collected on a Wizard of Oz basis, and the labels were decided by five evaluators. Finally, we conducted a discrimination experiment with the trained user model using combined features. As a three-class discrimination result, we obtained about 85% accuracy in an open test. 1. Introduction Speech is the most basic medium of human-human communication and is expected to be one of the main modalities of more flexible man-machine interaction, alongside various intuitive interfaces that replace traditional text-based interfaces. One major topic of speech-based interfaces is the spoken dialog system. Studies on spoken dialog systems have attempted to introduce a user model, which models the user's internal states, to make the dialog more flexible. The user's internal states represent various aspects of the user, such as belief [1], preference [2], emotion [3], and familiarity with the system [4–6].
These aspects can also be categorized according to their persistency: because the user's knowledge and preference are persistent, they can be used for personalizing the dialog system [7]. Other internal states, such as emotion or belief, are transient and so are used for making a dialog more natural and smooth. These kinds of internal state should be estimated session by session. In this paper, we focus on the latter, transient states. These internal states are estimated based on the verbal and nonverbal information included in the interaction between the user and the system. In this work, we consider a system-initiative dialog system that presents a prompt message at the beginning of a session. In such a system, a session between the
%U http://www.hindawi.com/journals/ahci/2012/865362/