
Multiple Feature Fusion Based on Co-Training Approach and Time Regularization for Place Classification in Wearable Video

DOI: 10.1155/2013/175064


Abstract:

The analysis of video acquired with a wearable camera is a challenge that the multimedia community is facing with the proliferation of such sensors in various applications. In this paper, we focus on the problem of automatic visual place recognition in a weakly constrained environment, targeting the indexing of video streams by topological place recognition. We propose to combine several machine learning approaches in a time-regularized framework for image-based place recognition indoors. The framework combines the power of multiple visual cues and integrates the temporal continuity information of video. We extend it with a computationally efficient semi-supervised method that leverages unlabeled video sequences for improved indexing performance. The proposed approach was applied to challenging video corpora. Experiments on a public database and a real-world video sequence database show the gain brought by the different stages of the method.

1. Introduction

Due to recent achievements in the miniaturization of cameras and their embedding in smart devices, the number of video sequences captured with such wearable cameras has increased substantially. This opens new application fields and renews problems posed earlier to the multimedia research community. For instance, visual lifelogs can record the daily activities of a person and constitute a rich source of information for the task of monitoring persons in their daily life [1–4]. Recordings captured with a wearable camera depict an inside-out view, close to the subjective view of the camera wearer. This is a unique source of information, with applications such as a memory refresh aid or an additional source of information for the analysis of various activities and behavior-related events in a healthcare context. It often comes at the price of content with very high variability, rapid camera displacement, and poorly constrained environments in which the person moves. Searching for specific events in such multimedia streams is therefore particularly challenging. As was shown in [5, 6], multiple aspects of the video content and its context can be taken into account to provide a complete view of activity-related events: location, presence of objects or persons, hand movements, and external information such as Global Positioning System (GPS), Radio Frequency Identification (RFID), or motion sensor data. Amongst these, location is an important piece of contextual information that restricts the number of possible ongoing activities. Obtaining this information directly from the video stream is an interesting application in
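To make the two mechanisms named in the abstract concrete, the following is a minimal sketch of co-training per-feature place classifiers on labeled and unlabeled frames, followed by temporal regularization of the fused per-frame scores before the final place decision. It assumes two precomputed feature matrices per frame (two visual "views"), integer place labels, and scikit-learn; it illustrates the general techniques only and is not the authors' exact pipeline.

import numpy as np
from sklearn.svm import LinearSVC


def class_scores(clf, X):
    # Per-class decision scores with a uniform (n_samples, n_classes) shape.
    s = clf.decision_function(X)
    if s.ndim == 1:                       # binary case: expand to two columns
        s = np.stack([-s, s], axis=1)
    return s


def co_train(Xa_l, Xb_l, y_l, Xa_u, Xb_u, rounds=5, per_round=20):
    # Blum-Mitchell-style co-training on two visual "views" of the same frames.
    Xa, Xb, y = Xa_l, Xb_l, y_l
    pool = np.arange(len(Xa_u))           # indices of still-unlabeled frames
    clf_a, clf_b = LinearSVC(), LinearSVC()
    for _ in range(rounds):
        clf_a.fit(Xa, y)
        clf_b.fit(Xb, y)
        if len(pool) == 0:
            break
        # Each view pseudo-labels the unlabeled frames it is most confident
        # about; those frames then augment the shared training set.
        for clf, X_view in ((clf_a, Xa_u), (clf_b, Xb_u)):
            if len(pool) == 0:
                break
            s = class_scores(clf, X_view[pool])
            take = np.argsort(s.max(axis=1))[-per_round:]
            idx = pool[take]
            Xa = np.vstack([Xa, Xa_u[idx]])
            Xb = np.vstack([Xb, Xb_u[idx]])
            y = np.concatenate([y, clf.classes_[s[take].argmax(axis=1)]])
            pool = np.delete(pool, take)
    return clf_a, clf_b


def fuse_and_smooth(clf_a, clf_b, Xa_seq, Xb_seq, window=15):
    # Late fusion of the two views plus a sliding-window temporal average,
    # exploiting the fact that consecutive frames usually show the same place.
    s = class_scores(clf_a, Xa_seq) + class_scores(clf_b, Xb_seq)
    kernel = np.ones(window) / window
    smoothed = np.stack(
        [np.convolve(s[:, c], kernel, mode="same") for c in range(s.shape[1])],
        axis=1)
    return clf_a.classes_[smoothed.argmax(axis=1)]   # one place label per frame

The actual framework fuses more than two visual cues and applies a more principled time regularization; the sliding-window average above merely stands in for that temporal continuity constraint.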

References

[1]  A. Doherty and A. F. Smeaton, “Automatically segmenting lifelog data into events,” in Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS '08), pp. 20–23, May 2008.
[2]  E. Berry, N. Kapur, L. Williams et al., “The use of a wearable camera, SenseCam, as a pictorial diary to improve autobiographical memory in a patient with limbic encephalitis: a preliminary report,” Neuropsychological Rehabilitation, vol. 17, no. 4-5, pp. 582–601, 2007.
[3]  S. Hodges, L. Williams, E. Berry et al., “SenseCam: a retrospective memory aid,” in Proceedings of the 8th International Conference on Ubiquitous Computing (Ubicomp '06), pp. 177–193, 2006.
[4]  R. Mégret, D. Szolgay, J. Benois-Pineau et al., “Indexing of wearable video: IMMED and SenseCAM projects,” in Workshop on Semantic Multimodal Analysis of Digital Media, November 2008.
[5]  A. Torralba, K. P. Murphy, W. T. Freeman, and M. A. Rubin, “Context-based vision system for place and object recognition,” in Proceedings of the 9th IEEE International Conference on Computer Vision, vol. 1, pp. 273–280, October 2003.
[6]  A. Quattoni and A. Torralba, “Recognizing indoor scenes,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 413–420, June 2009.
[7]  R. Mégret, V. Dovgalecs, H. Wannous et al., “The IMMED project: wearable video monitoring of people with age dementia,” in Proceedings of the International Conference on Multimedia (MM '10), pp. 1299–1302, ACM, October 2010.
[8]  S. Karaman, J. Benois-Pineau, R. Mégret, V. Dovgalecs, J.-F. Dartigues, and Y. Gaëstel, “Human daily activities indexing in videos from wearable cameras for monitoring of patients with dementia diseases,” in Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10), pp. 4113–4116, August 2010.
[9]  C. Schüldt, I. Laptev, and B. Caputo, “Recognizing human actions: a local SVM approach,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), pp. 32–36, August 2004.
[10]  P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior recognition via sparse spatio-temporal features,” in Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72, October 2005.
[11]  L. Ballan, M. Bertini, A. del Bimbo, and G. Serra, “Video event classification using bag of words and string kernels,” in Proceedings of the 15th International Conference on Image Analysis and Processing (ICIAP '09), pp. 170–178, 2009.
[12]  D. I. Kosmopoulos, N. D. Doulamis, and A. S. Voulodimos, “Bayesian filter based behavior recognition in workflows allowing for user feedback,” Computer Vision and Image Understanding, vol. 116, no. 3, pp. 422–434, 2012.
[13]  M. Stikic, D. Larlus, S. Ebert, and B. Schiele, “Weakly supervised recognition of daily life activities with wearable sensors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2521–2537, 2011.
[14]  D. H. Nguyen, G. Marcu, G. R. Hayes et al., “Encountering SenseCam: personal recording technologies in everyday life,” in Proceedings of the 11th International Conference on Ubiquitous Computing (Ubicomp '09), pp. 165–174, ACM, September 2009.
[15]  M. A. Pérez-Quiñones, S. Yang, B. Congleton, G. Luc, and E. A. Fox, “Demonstrating the use of a SenseCam in two domains,” in Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06), p. 376, June 2006.
[16]  S. Karaman, J. Benois-Pineau, V. Dovgalecs et al., Hierarchical Hidden Markov Model in Detecting Activities of Daily Living in Wearable Videos for Studies of Dementia, 2011.
[17]  J. Pinquier, S. Karaman, L. Letoupin et al., “Strategies for multiple feature fusion with Hierarchical HMM: application to activity recognition from wearable audiovisual sensors,” in Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12), pp. 1–4, July 2012.
[18]  N. Sebe, M. S. Lew, X. Zhou, T. S. Huang, and E. M. Bakker, “The state of the art in image and video retrieval,” in Proceedings of the 2nd International Conference on Image and Video Retrieval, pp. 1–7, May 2003.
[19]  S.-F. Chang, D. Ellis, W. Jiang et al., “Large-scale multimodal semantic concept detection for consumer video,” in Proceedings of the International Workshop on Multimedia Information Retrieval (MIR '07), pp. 255–264, ACM, September 2007.
[20]  J. Košecká, F. Li, and X. Yang, “Global localization and relative positioning based on scale-invariant keypoints,” Robotics and Autonomous Systems, vol. 52, no. 1, pp. 27–38, 2005.
[21]  C. O. Conaire, M. Blighe, and N. O’Connor, “Sensecam image localisation using hierarchical SURF trees,” in Proceedings of the 15th International Multimedia Modeling Conference (MMM '09), p. 15, Sophia-Antipolis, France, January 2009.
[22]  J. Košecká, L. Zhou, P. Barber, and Z. Duric, “Qualitative image based localization in indoors environments,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-3–II-8, June 2003.
[23]  Z. Zivkovic, O. Booij, and B. Krose, “From images to rooms,” Robotics and Autonomous Systems, vol. 55, no. 5, pp. 411–418, 2007.
[24]  L. Fei-Fei and P. Perona, “A bayesian hierarchical model for learning natural scene categories,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 524–531, June 2005.
[25]  D. Nister and H. Stewenius, “Scalable recognition with a vocabulary tree,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, 2006.
[26]  O. Linde and T. Lindeberg, “Object recognition using composed receptive field histograms of higher dimensionality,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 1–6, August 2004.
[27]  S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: spatial pyramid matching for recognizing natural scene categories,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2169–2178, 2006.
[28]  A. Bosch and A. Zisserman, “Scene classification via pLSA,” in Proceedings of the 9th European Conference on Computer Vision (ECCV '06), May 2006.
[29]  J. Sivic and A. Zisserman, “Video google: a text retrieval approach to object matching in videos,” in Proceedings of the 9th IEEE International Conference On Computer Vision, pp. 1470–1477, October 2003.
[30]  J. Knopp, Image Based Localization [Ph.D. thesis], Czech Technical University in Prague, Faculty of Electrical Engineering, Prague, Czech Republic, 2009.
[31]  M. W. M. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba, “A solution to the simultaneous localization and map building (SLAM) problem,” IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 229–241, 2001.
[32]  L. M. Paz, P. Jensfelt, J. D. Tardós, and J. Neira, “EKF SLAM updates in O(n) with divide and conquer SLAM,” in Proceedings of IEEE International Conference on Robotics and Automation (ICRA '07), pp. 1657–1663, April 2007.
[33]  J. Wu and J. M. Rehg, “CENTRIST: a visual descriptor for scene categorization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1489–1501, 2011.
[34]  H. Bülthoff and A. Yuille, “Bayesian models for seeing shapes and depth,” Tech. Rep. 90-11, Harvard Robotics Laboratory, 1990.
[35]  P. K. Atrey, M. Anwar Hossain, A. El Saddik, and M. S. Kankanhalli, “Multimodal fusion for multimedia analysis: a survey,” Multimedia Systems, vol. 16, no. 6, pp. 345–379, 2010.
[36]  A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, “SimpleMKL,” The Journal of Machine Learning Research, vol. 9, pp. 2491–2521, 2008.
[37]  S. Nakajima, A. Binder, C. Müller et al., “Multiple kernel learning for object classification,” in Workshop on Information-based Induction Sciences, 2009.
[38]  A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman, “Multiple kernels for object detection,” in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 606–613, October 2009.
[39]  J. Yang, Y. Li, Y. Tian, L. Duan, and W. Gao, “Group-sensitive multiple kernel learning for object categorization,” in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 436–443, October 2009.
[40]  M. Guillaumin, J. Verbeek, and C. Schmid, “Multimodal semi-supervised learning for image classification,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 902–909, Laboratoire Jean Kuntzmann, LEAR, INRIA Grenoble, June 2010.
[41]  J. Yang, Y. Li, Y. Tian, L. Duan, and W. Gao, “Multiple kernel active learning for image classification,” in Proceedings of IEEE International Conference on Multimedia and Expo (ICME '09), pp. 550–553, July 2009.
[42]  A. Abdullah, R. C. Veltkamp, and M. A. Wiering, “Spatial pyramids and two-layer stacking SVM classifiers for image categorization: a comparative study,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '09), pp. 5–12, June 2009.
[43]  J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[44]  L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.
[45]  A. Uhl and P. Wild, “Parallel versus serial classifier combination for multibiometric hand-based identification,” in Proceedings of the 3rd International Conference on Advances in Biometrics (ICB '09), vol. 5558, pp. 950–959, 2009.
[46]  N. Wanas, Feature based architecture for decision fusion [Ph.D. thesis], 2003.
[47]  M.-E. Nilsback and B. Caputo, “Cue integration through discriminative accumulation,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. II578–II585, July 2004.
[48]  A. Pronobis and B. Caputo, “Confidence-based cue integration for visual place recognition,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 2394–2401, October-November 2007.
[49]  A. Pronobis, O. Martinez Mozos, and B. Caputo, “SVM-based discriminative accumulation scheme for place recognition,” in Proceedings of IEEE International Conference on Robotics and Automation (ICRA '08), pp. 522–529, May 2008.
[50]  F. Lu, X. Yang, W. Lin, R. Zhang, and S. Yu, “Image classification with multiple feature channels,” Optical Engineering, vol. 50, no. 5, Article ID 057210, 2011.
[51]  P. Gehler and S. Nowozin, “On feature combination for multiclass object classification,” in Proceedings of the 12th International Conference on Computer Vision, pp. 221–228, October 2009.
[52]  X. Zhu, “Semi-supervised learning literature survey,” Tech. Rep. 1530, Department of Computer Sciences, University of Wisconsin, Madison, Wis, USA, 2008.
[53]  X. Zhu and A. B. Goldberg, Introduction to Semi-Supervised Learning, Morgan and Claypool Publishers, 2009.
[54]  O. Chapelle, B. Scholkopf, and A. Zien, Semi-Supervised Learning, MIT Press, Cambridge, Mass, USA, 2006.
[55]  M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: a geometric framework for learning from labeled and unlabeled examples,” The Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
[56]  U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[57]  D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” Advances in Neural Information Processing Systems, vol. 16, pp. 321–328, 2004.
[58]  S. Melacci and M. Belkin, “Laplacian support vector machines trained in the primal,” The Journal of Machine Learning Research, vol. 12, pp. 1149–1184, 2011.
[59]  B. Nadler and N. Srebro, “Semi-supervised learning with the graph Laplacian: the limit of infinite unlabelled data,” in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), 2009.
[60]  A. Blum and T. Mitchell, “Combining labeled and unlabeled data with co-training,” in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT' 98), pp. 92–100, October 1998.
[61]  D. Zhang and W. S. Lee, “Validating co-training models for web image classification,” in Proceedings of SMA Annual Symposium, National University of Singapore, 2005.
[62]  W. Tong, T. Yang, and R. Jin, “Co-training for large scale image classification: an online approach,” Analysis and Evaluation of Large-Scale Multimedia Collections, pp. 1–4, 2010.
[63]  M. Wang, X.-S. Hua, L.-R. Dai, and Y. Song, “Enhanced semi-supervised learning for automatic video annotation,” in Proceedings of IEEE International Conference on Multimedia and Expo (ICME '06), pp. 1485–1488, July 2006.
[64]  V. E. van Beusekom, I. G. Sprinkhuizen-Kuyper, and L. G. Vuurpijl, “Empirically evaluating co-training,” Student Report, 2009.
[65]  W. Wang and Z.-H. Zhou, “Analyzing co-training style algorithms,” in Proceedings of the 18th European Conference on Machine Learning (ECML '07), pp. 454–465, 2007.
[66]  C. Dong, Y. Yin, X. Guo, G. Yang, and G. Zhou, “On co-training style algorithms,” in Proceedings of the 4th International Conference on Natural Computation (ICNC '08), vol. 7, pp. 196–201, October 2008.
[67]  S. Abney, Semisupervised Learning for Computational Linguistics, Computer Science and Data Analysis Series, Chapman & Hall, University of Michigan, Ann Arbor, Mich, USA, 2008.
[68]  D. Yarowsky, “Unsupervised word sense disambiguation rivaling supervised methods,” in Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics (ACL '95), pp. 189–196, University of Pennsylvania, 1995.
[69]  W. Wang and Z.-H. Zhou, “A new analysis of co-training,” in Proceedings of the 27th International Conference on Machine Learning, pp. 1135–1142, May 2010.
[70]  C. M. Bishop, Pattern Recognition and Machine Learning. Information Science and Statistics, Springer, Secaucus, NJ, USA, 2006.
[71]  B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, Mass, USA, 2002.
[72]  R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: a library for large linear classification,” The Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[73]  A. J. Smola, B. Schölkopf, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
[74]  A. Pronobis, O. Martínez Mozos, B. Caputo, and P. Jensfelt, “Multi-modal semantic place classification,” The International Journal of Robotics Research, vol. 29, no. 2-3, pp. 298–320, 2010.
[75]  T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin, “The elements of statistical learning: data mining, inference and prediction,” The Mathematical Intelligencer, vol. 27, no. 2, pp. 83–85, 2005.
[76]  S. Rüping, A Simple Method For Estimating Conditional Probabilities For SVMs. American Society of Agricultural Engineers, 2004.
[77]  T. Tommasi, F. Orabona, and B. Caputo, “An SVM confidence-based approach to medical image annotation,” in Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access (CLEF '08), pp. 696–703, 2009.
[78]  K. Grauman and T. Darrell, “The pyramid match kernel: discriminative classification with sets of image features,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1458–1465, October 2005.
[79]  J. Luo, A. Pronobis, B. Caputo, and P. Jensfelt, “The KTH-IDOL2 database,” Tech. Rep., Kungliga Tekniska Hoegskolan, CVAP/CAS, 2006.
