Gaze Target Detection Based on Predictive Consistency Embedding

DOI: 10.12677/JISP.2023.122015, pp. 144-157

Keywords: Gaze Target Detection, Gaze Following, Domain Adaptation, RGB Image, Depth Image


Abstract:

In this paper, we study the problem of gaze target detection in images taken from a third-person perspective, and we propose a deep architecture that infers where people in the scene are looking. The model is trained on scene images, depth images, and head images, which carry rich contextual information. Unlike existing techniques, our model requires no supervision of gaze angles and does not rely on head-orientation or eye information. Extensive experiments show that our method achieves stronger performance on multiple benchmark datasets. We also study a domain adaptation approach to gaze target detection that uses consistency embedding to align the source and target domains, enabling our model to handle the gap between datasets effectively.
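
Since the full text is not included here, the following PyTorch sketch only illustrates the pipeline the abstract describes: three encoders for the scene, depth, and head images, feature fusion, a decoder that predicts a gaze heatmap, and a toy consistency loss that pulls source- and target-domain embeddings together. Every name and layer choice in it (GazeTargetNet, encoder, feat_dim, consistency_embedding_loss, mean-feature matching) is a hypothetical stand-in for illustration, not the authors' method.

```python
# Minimal sketch of the three-branch pipeline described in the abstract.
# NOT the authors' implementation: all names, layer shapes, and the
# concatenation-based fusion are assumptions made purely for illustration.
import torch
import torch.nn as nn

def encoder(in_ch: int, out_ch: int) -> nn.Sequential:
    """A small conv encoder; one copy per input modality (assumed design)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 7, stride=2, padding=3), nn.ReLU(),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(128, out_ch, 3, stride=2, padding=1), nn.ReLU(),
    )

class GazeTargetNet(nn.Module):
    """Fuses scene, depth, and head-crop features into a gaze heatmap."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.scene_enc = encoder(3, feat_dim)  # RGB scene image
        self.depth_enc = encoder(1, feat_dim)  # estimated depth map
        self.head_enc = encoder(3, feat_dim)   # cropped head image
        self.decoder = nn.Sequential(          # fused features -> heatmap
            nn.Conv2d(3 * feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(feat_dim, 1, 1),
        )

    def forward(self, scene, depth, head):
        # Assumes all three inputs are resized to the same resolution so
        # their feature maps align spatially for channel-wise concatenation.
        fused = torch.cat(
            [self.scene_enc(scene), self.depth_enc(depth), self.head_enc(head)],
            dim=1,
        )
        return self.decoder(fused)  # (B, 1, H/2, W/2) heatmap logits

def consistency_embedding_loss(src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for consistency-embedding alignment: penalize the distance
    between mean source- and target-domain features. The paper's actual loss
    is not given in this excerpt."""
    return (src.mean(dim=0) - tgt.mean(dim=0)).pow(2).sum()
```

In a domain-adaptation training loop of this shape, consistency_embedding_loss would be added to the heatmap regression loss, with src and tgt taken from the fused features of source- and target-domain batches respectively.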
