Output details
13 - Electrical and Electronic Engineering, Metallurgy and Materials
University of Birmingham : A - Electronic, Electrical and computer engineering
Gaze-contingent automatic speech recognition
Human communication is multi-modal, combining speech, gesture, lip movement and gaze, particularly in noisy environments. The relationship between the focus of gaze and the words spoken is variable, asynchronous and complex, which poses a major challenge for automatic interpretation systems. This paper presents a new probabilistic model for fusing such “loosely coupled” modalities and demonstrates it by applying it to speech and gaze in a map navigation task. The results are the first to demonstrate that including gaze in speech recognition improves performance. This has significant potential impact for human-computer interaction in any visually oriented civil or military application.