Consensus Contrastive Sampling Network for Weakly-Supervised Temporal Action Localization

DOI: 10.12677/CSA.2024.142019, PP. 183-199

Keywords: Temporal Action Localization, Weakly-Supervised Method, Consensus Foreground Sampling, Contrastive Sampling


Abstract:

Weakly supervised temporal action localization uses only video-level labels and does not require costly action-instance annotations, which gives it significant research value. Its central difficulty is that foreground clips in a video are submerged among surrounding background clips, making it hard to obtain accurate foreground samples for training. This paper focuses on the differences between background and foreground clips in the temporal class activation sequence and proposes a consensus contrastive sampling network. The network uses a multi-head attention module to enhance action features. To alleviate the interference of background samples with foreground samples, the network designs a random sampling strategy for confusable samples, which is used to learn a proposal distribution for foreground sampling. To promote convergence of the foreground distribution, the network jointly considers multi-stage foreground sampling rules in a multi-stage consensus sampling module. In addition, because foreground and background samples in the foreground/background transition regions are highly similar and hard to distinguish, the network adds a contrastive sampling module that, combined with multi-stage consensus sampling, mines hard foreground samples and refines their features with contrastive learning. Experiments on the THUMOS 14 and ActivityNet v1.3 datasets show that the proposed method achieves performance comparable to existing weakly supervised temporal action localization methods.
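The abstract describes mining hard foreground snippets near the foreground/background transition of the temporal class activation sequence (CAS) and refining their features with contrastive learning. The following NumPy sketch illustrates that general idea only; the function names, the top-k mining rule, and the InfoNCE-style loss are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def mine_foreground(cas, k_easy, k_hard):
    """Split snippet indices of a 1-D class activation sequence (CAS)
    into easy foreground (highest activations) and hard foreground
    (the next-ranked snippets, near the transition region)."""
    order = np.argsort(cas)[::-1]            # indices by descending activation
    easy = order[:k_easy]                    # confident foreground snippets
    hard = order[k_easy:k_easy + k_hard]     # ambiguous transition snippets
    return easy, hard

def info_nce(anchors, positives, negatives, tau=0.07):
    """InfoNCE-style loss: pull anchor (hard foreground) features toward
    positives (easy foreground) and away from negatives (background)."""
    def sim(a, b):
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return a @ b.T / tau                 # cosine similarity / temperature
    pos = np.exp(sim(anchors, positives)).sum(axis=1)
    neg = np.exp(sim(anchors, negatives)).sum(axis=1)
    return float(np.mean(-np.log(pos / (pos + neg))))

# Toy example: 5 snippets ranked by CAS score.
cas = np.array([0.9, 0.8, 0.6, 0.2, 0.1])
easy, hard = mine_foreground(cas, k_easy=2, k_hard=1)

# A hard-foreground feature aligned with foreground features should
# yield a lower loss than one aligned with background features.
fg_anchor = np.array([[1.0, 0.0]])
fg_feats  = np.array([[1.0, 0.0], [0.9, 0.1]])
bg_feats  = np.array([[0.0, 1.0], [0.1, 0.9]])
loss_aligned   = info_nce(fg_anchor, fg_feats, bg_feats)
loss_misaligned = info_nce(fg_anchor, bg_feats, fg_feats)
```

In a real model the anchors would be learned snippet embeddings and the easy/hard split would follow the paper's multi-stage consensus sampling rather than a single top-k cut.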

