%0 Journal Article
%T Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations
%A Md. Rabiul Islam
%A Md. Abdus Sobhan
%J Applied Computational Intelligence and Soft Computing
%D 2014
%I Hindawi Publishing Corporation
%R 10.1155/2014/831830
%X The aim of this paper is to propose a feature fusion based Audio-Visual Speaker Identification (AVSI) system under varied illumination conditions. Among the different fusion strategies, feature-level fusion is used for the proposed AVSI system, with a Hidden Markov Model (HMM) for learning and classification. Since the feature set contains richer information about the raw biometric data than any other level, integration at the feature level is expected to provide better authentication results. In this paper, Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, and Active Shape Model (ASM) based appearance and shape facial features are concatenated to form the visual feature vectors. These combined audio and visual features are then fused at the feature level. To reduce the dimensionality of the audio and visual feature vectors, Principal Component Analysis (PCA) is used. The VALID audio-visual database, with four different levels of lighting conditions, is used to measure the performance of the proposed system. Experimental results demonstrate the significance of the proposed audio-visual speaker identification system under various combinations of audio and visual features.

1. Introduction

Human speaker identification is bimodal in nature [1, 2]. In a face-to-face conversation, we listen to what others say and at the same time observe their lip movements, facial expressions, and gestures.
In particular, when environmental noise impairs listening, visual information plays an important role in speech understanding [3]. Even in a clean environment, speech recognition performance improves when the talking face is visible [4]. In general, an audio-only speaker identification system is not adequate to meet the variety of user requirements for person identification. The AVSI system promises to alleviate some of the drawbacks encountered by audio-only identification. Visual speech information can play an important role in improving natural and robust human-computer interaction [5, 6]. Indeed, various important human-computer components, such as speaker identification, verification [7], localization [8], speech event detection [9], speech signal separation [10], coding [11], video indexing and retrieval [12], and text-to-speech [13], have been shown to benefit from the visual channel [14]. An audio-visual identification system can significantly improve the performance of
%U http://www.hindawi.com/journals/acisc/2014/831830/
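The fusion pipeline described in the abstract, concatenating MFCC and LPCC audio vectors with ASM shape and appearance vectors, then reducing the dimension with PCA, can be sketched as follows. This is a minimal NumPy illustration: the feature dimensions, the random placeholder data, and the SVD-based `pca` helper are assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 200

# Placeholder per-frame features (dimensions are illustrative assumptions).
mfcc = rng.normal(size=(n_frames, 13))    # stand-in for MFCC audio features
lpcc = rng.normal(size=(n_frames, 13))    # stand-in for LPCC audio features
shape = rng.normal(size=(n_frames, 20))   # stand-in for ASM shape features
appear = rng.normal(size=(n_frames, 30))  # stand-in for ASM appearance features

# Feature-level fusion: concatenate per-frame vectors into one long vector.
audio = np.hstack([mfcc, lpcc])       # (200, 26)
visual = np.hstack([shape, appear])   # (200, 50)
fused = np.hstack([audio, visual])    # (200, 76)

def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)                             # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt = components
    return Xc @ Vt[:n_components].T

# Reduce the fused vectors before HMM training; 30 is an arbitrary choice here.
reduced = pca(fused, 30)
print(fused.shape, reduced.shape)  # (200, 76) (200, 30)
```

The reduced vectors would then serve as the observation sequence for HMM training and classification, as in the system the abstract outlines.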