
Mobile Visual Recognition on Smartphones

DOI: 10.1155/2013/843727


Abstract:

This paper addresses the recognition of large-scale outdoor scenes on smartphones by fusing the outputs of inertial sensors with computer vision techniques. The main contributions can be summarized as follows. First, we propose an overlap region divide (ORD) method to partition the image position area, which quickly finds the nearest visiting area and reduces the search range compared with traditional approaches. Second, the vocabulary tree-based approach is improved by introducing a gravity-aligned geometric consistency constraint (GAGCC). Our method involves no operation in the high-dimensional feature space and does not assume a global transform between a pair of images, so it substantially reduces computational complexity and memory usage, making city-scale image recognition feasible on a smartphone. Experiments on a collected database of 0.16 million images show that the proposed method achieves excellent recognition performance while keeping the average recognition time at about 1 s.

1. Introduction

In recent years smartphones have developed rapidly, and almost all inexpensive models are equipped with a camera, GPS, wireless networking, and gravity sensing. The improvements in imaging capability and computational power have given rise to many exciting mobile applications. Among these is mobile visual location recognition, in which users take a picture of a place of interest with their smartphone to retrieve information related to the captured landmark from anywhere [1–3]. Most current applications adopt a client-server (C/S) mode: image information [4, 5] (such as a compressed image, image descriptors, and the image location) is transferred to a remote server over a wireless network or 3G, a search is carried out on the server, and the related information is returned to the phone for display. In such systems, sets of local features [6–9] represent the images, and image matching is based on a vocabulary tree (VT) [10–12]. Features of the query image are quantized into visual words by the VT algorithm, and scalable textual indexing and retrieval schemes are then applied to find similar candidate images in the database [10]. However, existing systems have some inherent limits. For example, the growing number of city-scale candidate images requires more retrieval time, which affects the efficiency of mobile visual recognition applications. Moreover, word quantization loses discriminative power, and the spatial relations of the features will
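To make the VT pipeline described above concrete, the following is a minimal sketch of hierarchical quantization in the style of Nistér and Stewénius [10]: each descriptor descends a k-ary tree of cluster centres, and the leaf it reaches is its visual word. The tree layout and the Python types are illustrative assumptions, not the paper's implementation.

    import numpy as np

    # Hedged sketch: quantizing a descriptor with a hierarchical k-means
    # vocabulary tree, as in the VT scheme [10] the text describes. At each
    # internal node the descriptor follows the nearest cluster centre; the
    # leaf reached is the visual word used for inverted-file retrieval.
    class Node:
        def __init__(self, centers=None, children=None, word_id=None):
            self.centers = centers    # (k, d) array of centres; None at a leaf
            self.children = children or []
            self.word_id = word_id    # leaf index, i.e. the visual word

    def quantize(root, descriptor):
        node = root
        while node.children:
            nearest = int(np.argmin(
                np.linalg.norm(node.centers - descriptor, axis=1)))
            node = node.children[nearest]
        return node.word_id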
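The abstract describes the ORD (overlap region divide) step only at a high level; the sketch below shows one plausible reading, assuming the map is pre-divided into overlapping square cells so that a GPS fix indexes its nearest area directly and only that area's images enter the search. The 500 m cell size and half-cell overlap are invented here for illustration.

    # Hedged sketch of an overlap-region-divide style lookup. Cells overlap
    # by half an edge, so every position lies comfortably inside the cell
    # whose centre is nearest, and one rounding step finds that cell.
    CELL = 500.0      # cell edge in metres (assumed)
    STEP = CELL / 2   # centre spacing, i.e. 50% overlap (assumed)

    def area_id(x, y):
        # x, y: planar coordinates in metres (e.g. projected from GPS)
        return (round(x / STEP), round(y / STEP))

    database = {}     # (i, j) -> list of image ids, built offline

    def index_image(image_id, x, y):
        database.setdefault(area_id(x, y), []).append(image_id)

    def candidates(x, y):
        # Online query: only the images of the nearest area are searched,
        # which is how the search range is reduced.
        return database.get(area_id(x, y), [])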
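Similarly, the gravity-aligned geometric consistency constraint (GAGCC) is only named in this excerpt, so the following is a speculative sketch of the general idea: the phone's inertial sensors give the gravity direction in each image, keypoint orientations are expressed relative to it, and a match is kept only if the two gravity-aligned orientations agree, with no global image-to-image transform ever estimated. The 15-degree tolerance and the data layout are assumptions.

    import math

    TOLERANCE = math.radians(15.0)   # allowed orientation disagreement (assumed)

    def gravity_aligned(theta, gravity_angle):
        # Express a keypoint orientation relative to the sensed gravity
        # direction, so it becomes comparable across viewpoints.
        return (theta - gravity_angle) % (2.0 * math.pi)

    def filter_matches(matches, gravity_query, gravity_db):
        # matches: iterable of (theta_query, theta_db) keypoint orientations.
        # Keep a match only if its gravity-aligned orientations agree; no
        # global transform between the two images is assumed.
        kept = []
        for tq, td in matches:
            diff = gravity_aligned(tq, gravity_query) - gravity_aligned(td, gravity_db)
            diff = (diff + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
            if abs(diff) <= TOLERANCE:
                kept.append((tq, td))
        return kept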

References

[1]  G. Takacs, V. Chandrasekhar, N. Gelfand et al., “Outdoors augmented reality on mobile phone using loxel-based visual feature organization,” in Proceedings of the 1st International ACM Conference on Multimedia Information Retrieval (MIR '08), pp. 427–434, New York, NY, USA, August 2008.
[2]  G. Schroth, R. Huitl, D. Chen, M. Abu-Alqumsan, A. Al-Nuaimi, and E. Steinbach, “Mobile visual location recognition,” IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 77–89, 2011.
[3]  S. S. Tsai, D. Chen, G. Takacs et al., “Fast geometric re-ranking for image-based retrieval,” in Proceedings of the 17th IEEE International Conference on Image Processing (ICIP '10), pp. 1029–1032, Hong Kong, September 2010.
[4]  D. M. Chen, G. Baatz, K. Köser et al., “City-scale landmark identification on mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 737–744, Providence, RI, USA, June 2011.
[5]  G. Baatz, K. Köser, D. Chen, R. Grzeszczuk, and M. Pollefeys, “Handling urban location recognition as a 2D homothetic problem,” in Proceedings of the European Conference on Computer Vision (ECCV '10), pp. 266–279, 2010.
[6]  D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[7]  H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: speeded up robust features,” in Proceedings of the 9th European Conference on Computer Vision (ECCV '06), pp. 404–417, Graz, Austria, May 2006.
[8]  V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod, “CHoG: compressed histogram of gradients: a low bit-rate feature descriptor,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 2504–2511, Miami, Fla, USA, June 2009.
[9]  S. Leutenegger, M. Chli, and R. Y. Siegwart, “BRISK: binary robust invariant scalable keypoints,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 2548–2555, Barcelona, Spain, November 2011.
[10]  D. Nistér and H. Stewénius, “Scalable recognition with a vocabulary tree,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, June 2006.
[11]  Z. Wu, Q. Ke, M. Isard, and J. Sun, “Bundling features for large scale partial-duplicate web image search,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 25–32, Miami, Fla, USA, June 2009.
[12]  Z. Wu, Q. Ke, J. Sun, and H.-Y. Shum, “A multi-sample, multi-tree approach to bag-of-words image representation for image retrieval,” in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 1992–1999, Kyoto, Japan, October 2009.
[13]  J. Sivic and A. Zisserman, “Video Google: a text retrieval approach to object matching in videos,” in Proceedings of the 9th IEEE International Conference on Computer Vision, vol. 2, pp. 1470–1477, Nice, France, October 2003.
[14]  O. Chum and J. Matas, “Matching with PROSAC: progressive sample consensus,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 220–226, June 2005.
[15]  X. Wang, M. Yang, and K. Yu, “Efficient re-ranking in vocabulary tree based image retrieval,” in Proceedings of the 45th Asilomar Conference on Signals, Systems and Computers, pp. 855–859, 2011.
[16]  H. Jégou, M. Douze, and C. Schmid, “Hamming embedding and weak geometric consistency for large scale image search,” in Proceedings of the European Conference on Computer Vision, pp. 304–317, October 2008.
[17]  X. Wang, M. Yang, T. Cour, S. Zhu, K. Yu, and T. X. Han, “Contextual weighting for vocabulary tree based image retrieval,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 209–216, Barcelona, Spain, November 2011.
[18]  A. Kumar, J.-P. Tardif, R. Anati, and K. Daniilidis, “Experiments on visual loop closing using vocabulary trees,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
[19]  D. Kurz and S. Benhimane, “Inertial sensor-aligned visual feature descriptors,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 161–166, Providence, RI, USA, June 2011.
