%0 Journal Article
%T Mobile Visual Recognition on Smartphones
%A Zhenwen Gui
%A Yongtian Wang
%A Yue Liu
%A Jing Chen
%J Journal of Sensors
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/843727
%X This paper addresses the recognition of large-scale outdoor scenes on smartphones by fusing the outputs of inertial sensors with computer vision techniques. The main contributions can be summarized as follows. Firstly, we propose an overlap region divide (ORD) method to partition the image position area, which is fast enough to find the nearest visiting area and also reduces the search range compared with traditional approaches. Secondly, the vocabulary tree-based approach is improved by introducing a gravity-aligned geometric consistency constraint (GAGCC). Our method involves no operation in the high-dimensional feature space and does not assume a global transform between a pair of images. It thus substantially reduces the computational complexity and memory usage, which makes city-scale image recognition feasible on a smartphone. Experiments on a collected database of 0.16 million images show that the proposed method delivers excellent recognition performance while keeping the average recognition time at about 1 s.

1. Introduction
In recent years, smartphones have developed rapidly; almost all inexpensive smartphones are equipped with cameras, GPS, wireless networking, and gravity sensors. The improvements in imaging capabilities and computational power have given rise to many exciting mobile applications. Among these is mobile visual location recognition, where users take a picture of a place of interest with their smartphone to retrieve information related to the captured landmark anywhere [1–3]. Most current applications adopt a client-server (C/S) mode to transfer image information [4, 5] (such as the compressed image, image descriptors, and image location) to a remote server over a wireless network or 3G; a search process is carried out on the server, and the related information is then returned to the phone for display. In such systems, sets of local features [6–9] are used to represent image information, and the image matching algorithms are based on a vocabulary tree (VT) [10–12]. Features of the query image are quantized into visual words through the VT, and scalable textual indexing and retrieval schemes are then applied to find similar candidate images in the database [10]. However, there are some inherent limits in existing systems. For example, a growing, city-scale set of candidate images needs more time for retrieval, which affects the efficiency of mobile visual recognition applications. Moreover, the word quantization loses discriminative power, and the spatial relations of the features will
%U http://www.hindawi.com/journals/js/2013/843727/
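
The excerpt describes ORD only at a high level, so the following is a minimal sketch under loud assumptions: that "overlap region divide" means tiling the map into square cells with an overlap margin, indexing each database image into every cell within that margin of its GPS position, so a query's GPS fix falls into exactly one cell whose image list bounds the search range. Cell size, overlap, and all names below are invented for illustration, not the paper's method.

    # Hedged sketch of an "overlap region divide" lookup (assumed reading
    # of ORD): overlapping square cells over a local metric map, so one
    # cell lookup per query bounds the candidate search range.
    import math
    from collections import defaultdict

    class OverlapGrid:
        def __init__(self, cell=200.0, overlap=50.0):
            self.cell, self.overlap = cell, overlap   # metres (assumed units)
            self.areas = defaultdict(list)            # (i, j) cell -> image ids

        def _cells(self, x, y):
            """All cells within the overlap margin of position (x, y)."""
            r = self.overlap
            out = set()
            for dx in (-r, 0.0, r):
                for dy in (-r, 0.0, r):
                    out.add((math.floor((x + dx) / self.cell),
                             math.floor((y + dy) / self.cell)))
            return out

        def add(self, img_id, x, y):
            # Duplicate each database image into every cell it overlaps,
            # so borderline queries are not missed later.
            for c in self._cells(x, y):
                self.areas[c].append(img_id)

        def candidates(self, x, y):
            # Query exactly one cell: the "nearest visiting area" for this
            # GPS fix; its image list is the reduced search range.
            key = (math.floor(x / self.cell), math.floor(y / self.cell))
            return self.areas.get(key, [])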
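
The quantize-then-index pipeline named in the introduction (query features quantized into visual words through a VT, then scalable textual indexing and retrieval [10]) can be sketched as a hierarchical k-means tree with TF-IDF inverted-file voting, in the spirit of the scalable vocabulary tree of [10]. The branching factor, depth, and helper names here are illustrative assumptions, not the authors' implementation.

    # Hedged sketch: hierarchical k-means "vocabulary tree" quantization
    # plus TF-IDF inverted-file scoring; parameters are illustrative.
    import numpy as np
    from collections import defaultdict
    from sklearn.cluster import KMeans

    class VocabTree:
        def __init__(self, branch=4, depth=3):
            self.branch, self.depth = branch, depth
            self.centers = {}        # internal node -> child cluster centers
            self.leaf_ids = {}       # leaf node -> visual word id
            self.next_leaf = 0

        def fit(self, descriptors):
            self._split(descriptors, node=(), level=0)
            return self

        def _split(self, X, node, level):
            if level == self.depth or len(X) < self.branch:
                self.leaf_ids[node] = self.next_leaf   # make this node a leaf
                self.next_leaf += 1
                return
            km = KMeans(n_clusters=self.branch, n_init=4).fit(X)
            self.centers[node] = km.cluster_centers_
            for c in range(self.branch):
                self._split(X[km.labels_ == c], node + (c,), level + 1)

        def quantize(self, d):
            """Descend the tree greedily; return the leaf's visual-word id."""
            node = ()
            while node in self.centers:
                c = np.argmin(np.linalg.norm(self.centers[node] - d, axis=1))
                node = node + (int(c),)
            return self.leaf_ids[node]

    def build_index(tree, db_images):
        """db_images: list of (image_id, descriptor array). Returns an
        inverted file {word: {image_id: term count}} and per-word IDF."""
        inv = defaultdict(lambda: defaultdict(int))
        for img_id, descs in db_images:
            for d in descs:
                inv[tree.quantize(d)][img_id] += 1
        n = len(db_images)
        idf = {w: np.log(n / len(post)) for w, post in inv.items()}
        return inv, idf

    def query(tree, inv, idf, descs):
        """TF-IDF voting: each quantized query feature votes for the
        database images sharing its visual word; highest score wins."""
        scores = defaultdict(float)
        for d in descs:
            w = tree.quantize(d)
            for img_id, tf in inv.get(w, {}).items():
                scores[img_id] += tf * idf.get(w, 0.0)
        return max(scores, key=scores.get) if scores else None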
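
The excerpt does not specify GAGCC beyond "no operation in the high-dimensional feature space" and "no global transform between a pair of images", so the following is one plausible, assumption-laden reading: express each keypoint's orientation relative to the gravity direction projected into its image (from the inertial sensor), then keep only the matches whose gravity-relative orientation differences agree, via a 1-D histogram vote rather than a fitted image-to-image transform. All names and the voting scheme are assumptions.

    # Hedged sketch of a gravity-aligned consistency filter (assumed
    # reading of GAGCC): weak-geometry voting on orientation differences
    # measured relative to the projected gravity direction.
    import numpy as np

    def gravity_angle(orientations, gravity_dir):
        """Re-express keypoint orientations (radians) relative to the
        gravity direction projected into the image plane."""
        g = np.arctan2(gravity_dir[1], gravity_dir[0])
        return (np.asarray(orientations) - g) % (2 * np.pi)

    def filter_matches(ang_q, ang_db, matches, bins=16):
        """matches: list of (query_idx, db_idx) feature pairs. Vote on the
        gravity-relative orientation difference of each match and keep the
        peak bin, so no global transform is ever estimated."""
        diffs = np.array([(ang_q[i] - ang_db[j]) % (2 * np.pi)
                          for i, j in matches])
        hist, edges = np.histogram(diffs, bins=bins, range=(0, 2 * np.pi))
        k = np.argmax(hist)
        keep = (diffs >= edges[k]) & (diffs < edges[k + 1])
        return [m for m, ok in zip(matches, keep) if ok]

Because each feature's angle is checked per match, the filter tolerates pairs of images related by different local motions, which is consistent with the abstract's claim that no single global transform is assumed.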