%0 Journal Article
%T Multiple Feature Fusion Based on Co-Training Approach and Time Regularization for Place Classification in Wearable Video
%A Vladislavs Dovgalecs
%A Rémi Mégret
%A Yannick Berthoumieu
%J Advances in Multimedia
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/175064
%X The analysis of video acquired with a wearable camera is a challenge that the multimedia community is facing with the proliferation of such sensors in various applications. In this paper, we focus on the problem of automatic visual place recognition in a weakly constrained environment, targeting the indexing of video streams by topological place recognition. We propose to combine several machine learning approaches in a time-regularized framework for image-based place recognition indoors. The framework combines the power of multiple visual cues and integrates the temporal continuity information of video. We extend it with a computationally efficient semisupervised method that leverages unlabeled video sequences for improved indexing performance. The proposed approach was applied to challenging video corpora. Experiments on a public database and a real-world video sequence database show the gain brought by the different stages of the method.
1. Introduction
Due to recent achievements in the miniaturization of cameras and their embedding in smart devices, the number of video sequences captured with wearable cameras has increased substantially. This opens new application fields and renews problems posed earlier to the multimedia research community. For instance, visual lifelogs can record the daily activities of a person and constitute a rich source of information for the task of monitoring persons in their daily life [1–4]. Recordings captured with a wearable camera depict an inside-out view, close to the subjective view of the camera wearer. They are a unique source of information, with applications such as a memory refresh aid or an additional source of information for the analysis of various activities and behavior-related events in a healthcare context. This often comes at the price of content with very high variability, rapid camera displacement, and poorly constrained environments in which the person moves. Searching for specific events in such multimedia streams is therefore particularly challenging. As was shown in [5, 6], multiple aspects of the video content and its context can be taken into account to provide a complete view of activity-related events: location, presence of objects or persons, hand movements, and external information such as Global Positioning System (GPS), Radio Frequency Identification (RFID), or motion sensor data. Among these, location is an important piece of contextual information that restricts the possible number of ongoing activities. Obtaining this information directly from the video stream is an interesting application in
%U http://www.hindawi.com/journals/am/2013/175064/
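As an illustration of the temporal-continuity idea the abstract mentions (exploiting the fact that consecutive frames of wearable video usually come from the same place), below is a minimal sketch of smoothing per-frame classification scores over a sliding temporal window before picking a place label. This is not the paper's time-regularization method; the function name, the window-averaging choice, and the window size are all assumptions made for illustration.

```python
import numpy as np

def smooth_scores(frame_scores: np.ndarray, window: int = 15) -> np.ndarray:
    """Average per-frame class scores over a centered temporal window.

    frame_scores: (n_frames, n_classes) array of classifier scores.
    Returns an array of the same shape with temporally smoothed scores.
    Hypothetical helper; the paper's actual regularizer is more elaborate.
    """
    n_frames, _ = frame_scores.shape
    half = window // 2
    smoothed = np.empty_like(frame_scores, dtype=float)
    for t in range(n_frames):
        # Clip the window at the sequence boundaries.
        lo, hi = max(0, t - half), min(n_frames, t + half + 1)
        smoothed[t] = frame_scores[lo:hi].mean(axis=0)
    return smoothed

# Usage: assign a place label per frame after smoothing.
scores = np.random.rand(100, 5)            # toy scores: 100 frames, 5 places
labels = smooth_scores(scores).argmax(axis=1)
```

The design intuition is simply that averaging scores over neighboring frames suppresses isolated misclassifications caused by motion blur or brief occlusions, at the cost of some lag around true place transitions.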