OALib Journal期刊
ISSN: 2333-9721
费用：99美元

投递稿件

查看量	下载量

相关文章
更多...

Computer Science and Application 2024

图数据领域的频繁项集挖掘
Frequent Itemset Mining in the Graph Data Field

DOI: 10.12677/CSA.2024.141017, PP. 158-172

邱文韬, 兰红

Keywords: 频繁项集挖掘，图数据，频繁子图，图神经网络，机器学习
Frequent Itemset Mining, Graph Data, Frequent Subgraphs, Graph Neural Networks, Machine Learning

Full-Text Cite this paper Add to My Lib

Abstract:

本文对图数据中的频繁子图挖掘算法进行了综述。追溯了从传统非图数据频繁项集挖掘技术到图数据频繁子图挖掘的演化，详细阐述了经典算法如类Apriori方法和gSpan算法的原理与应用。也对近年来兴起的基于图表示学习的先进算法，如SPMiner、NSIC、LSS和NeurSC进行了系统的介绍和比较。本文不仅回顾了算法的历史演进，还对各类算法进行了详细的分析与讨论，通过分析这些算法的性能和特点，揭示了它们的优势与局限。最后，展望了图神经网络在频繁子图挖掘领域的未来发展方向，并指出了这些技术在生物网络分析、社交网络分析等领域应用的广阔前景。
We present a comprehensive review of frequent subgraph mining algorithms in graph data. We trace the evolution from traditional frequent itemset mining techniques for non-graph data to the current state-of-the-art in frequent subgraph mining for graph data, elaborating on the principles and applications of classic algorithms such as Apriori-like methods and the gSpan algorithm. We also systematically introduce and compare recent advanced algorithms that have emerged from graph representation learning, including SPMiner, NSIC, LSS, and NeurSC. We not only recount the historical progression of these algorithms but also provide a detailed analysis and discussion of each category. By examining their performance and characteristics, we reveal their advantages and limitations. Finally, we anticipate the future trajectory of graph neural networks in the field of fre-quent subgraph mining and highlight the vast potential for these technologies in applications like biological network analysis and social network analysis.

References

[1]	Agrawal, R., Imielinski, T. and Swami, A. (1993) Mining Association Rules between Sets of Items in Large Databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington DC, 25-28 May 1993, 207-216. https://doi.org/10.1145/170035.170072
[2]	Han, J. and Kamber, M. (2012) Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco.
[3]	Han, J. and Kamber, M. (2001) Applying the Aprio-ri-Based Graph Mining Method to Mutagenesis Data Analysis. Journal of Computer Aided Chemistry, 2, 87-92. https://doi.org/10.2751/jcac.2.87
[4]	Agrawal, R. and Srikant, R. (1994) Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, 12-15 September 1994, 487-499.
[5]	Han, J., Pei, J. and Yin, Y. (2004) Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Data Mining and Knowledge Discovery, 8, 53-87. https://doi.org/10.1023/B:DAMI.0000005258.31418.83
[6]	Inokuchi, A., Washio, T. and Motoda, H. (2000) An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. Principles of Data Mining and Knowledge Discovery, Lyon, 13-16 September 2000, 13-23. https://doi.org/10.1007/3-540-45372-5_2
[7]	Inokuchi, A., Nishimura, T.K. and Motoda, H. (2002) A Fast Algo-rithm for Mining Frequent Connected Subgraph. IBM Research, Tokyo Research Laboratory, Tokyo.
[8]	Inokuchi, A., Washio, T. and Motoda, H. (2003) Complete Mining of Frequent Patterns from Graphs: Mining Graph Data. Machine Learning, 50, 321-354.
[9]	Jun, H.W., et al. (2004) Spin: Mining Maximal Frequent Subgraphs from Graph Databases. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, 22-25 August 2004, 581-586.
[10]	Holder, L.B., Cook, D.J. and Djoko, S. (1994) Substucture Discovery in the SUBDUE System. AAAIWS’94: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Seattle, 31 July-1 August 1994, 169-180. https://doi.org/10.1109/TKDE.2005.127
[11]	Deshpande, M., Kuramochi, M., Wale, N. and Karypis, G. (2003) Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds. IEEE Transactions on Knowledge and Data Engineering, 17, 1036-1050. https://doi.org/10.1109/TKDE.2005.127
[12]	Yan, X.F. and Han, J.W. (2002) gSpan: Graph-Based Substructure Pattern Mining. 2002 IEEE International Conference on Data Mining, Maebashi City, 9-12 December 2002, 721-724.
[13]	Kuramochi, M. and Karypis, G. (2001) Frequent Subgraph Discovery. Proceedings 2001 IEEE Interna-tional Conference on Data Mining, San Jose, 29 November-2 December 2001, 313-320.
[14]	Huan, J., Wang, W. and Prins, J. (2003) Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. 3rd IEEE International Con-ference on Data Mining, Melbourne, 22 November 2003, 549-552.
[15]	Nijssen, S. and Kok, J.N. (2004) A Quickstart in Frequent Structure Mining Can Make a Difference. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, 22-25 August 2004, 647-652. https://doi.org/10.1145/1014052.1014134
[16]	Ribeiro, P., et al. (2021) A Survey on Subgraph Counting: Con-cepts, Algorithms and Applications to Network Motifs and Graphlets. ACM Computing Surveys, 54, Article No. 28.
[17]	Bhattarai, B., Liu, H. and Huang, H.H. (2019) CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching. Proceedings of the 2019 International Conference on Management of Data, Amsterdam, 30 June-5 July 2019, 1447-1462. https://doi.org/10.1145/3299869.3300086
[18]	Bi, F., Chang, L.J., Lin, X.M., Qin, L. and Zhang, W.J. (2016) Efficient Subgraph Matching by Postponing Cartesian Products. Proceedings of the 2016 International Confer-ence on Management of Data, San Francisco, 26 June-1 July 2016, 1199-1214. https://doi.org/10.1145/2882903.2915236
[19]	Archibald, B., et al. (2019) Sequential and Parallel Solution-Biased Search for Subgraph Algorithms. 16th International Conference, CPAIOR 2019, Thessaloniki, 4-7 June 2019, 20-38. https://doi.org/10.1007/978-3-030-19212-9_2
[20]	Ahmed, N.K., Neville, J., Rossi, R.A. and Duffield, N. (2015) Efficient Graphlet Counting for Large Networks. IEEE International Conference on Data Mining, Atlantic City, 14-17 November 2015, 1-10. https://doi.org/10.1109/ICDM.2015.141
[21]	Ortmann, M. and Brandes, U. (2017) Efficient Orbit-Aware Triad and Quad Census in Directed and Undirected Graphs. Applied Network Science, 2, Article No. 13. https://doi.org/10.1007/s41109-017-0027-2
[22]	Melckenbeeck, I., Audenaert, P., Colle, D. and Pickavet, M. (2018) Efficiently Counting All Orbits of Graphlets of Any Order in a Graph Using Autogenerated Equations. Bioinformatics, 34, 1372-1380. https://doi.org/10.1093/bioinformatics/btx758
[23]	Bodirsky, M. (2016) Graphs and Homomorphisms. In: Schr？-der, B., Ed., Ordered Sets: An Introduction with Connections from Combinatorics to Topology, Springer International Publishing, Cham, 155-171.
[24]	Cook, S.A. (1971) The Complexity of Theorem-Proving Procedures. Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, Shaker Heights, 3-5 May 1971, 151-158. https://doi.org/10.1145/800157.805047
[25]	Pinar, A., Comandur, S. and Vishal, V. (2017) ESCAPE: Efficiently Counting All 5-Vertex Subgraphs. Proceedings of the 26th International Conference on World Wide Web, Perth, 3-7 April 2017, 1431-1440. https://doi.org/10.1145/3038912.3052597
[26]	Zhang, H., Yu, J.X., Zhang, Y.K., Zhao, K.F. and Cheng, H. (2020) Distributed Subgraph Counting: A General Approach. Proceedings of the VLDB Endowment, 13, 2493-2507. https://doi.org/10.14778/3407790.3407840
[27]	Chen, X. and Lui, J.C. (2018) Mining Graphlet Counts in Online Social Networks. ACM Transactions on Knowledge Discovery from Data (TKDD), 12, Article No. 41. https://doi.org/10.1145/3182392
[28]	Chen, Z., Chen, L., Villar, S. and Bruna, J. (2020) Can Graph Neural Net-works Count Substructures? https://arxiv.org/abs/2002.04025
[29]	Bressan, M., Chierichetti, F., Kumar, R. and Leucci, S. and Panconesi, A. (2017) Counting Graphlets: Space vs Time. Proceedings of the 10th ACM International Conference on Web Search and Data Mining, Cambridge, 6-10 February 2017, 557-566. https://doi.org/10.1145/3018661.3018732
[30]	Bhuiyan, M.A., Rahman, M., Rahman, M. and Al Hasan, M. (2012) GUISE: Uniform Sampling of Graphlets for Large Graph Analysis. 2012 IEEE 12th International Conference on Data Mining, Brussels, 10-13 December 2012, 91-100. https://doi.org/10.1109/ICDM.2012.87
[31]	Bressan, M., Leucci, S. and Panconesi, A. (2019) Motivo: Fast Motif Counting via Succinct Color Coding and Adaptive Sampling. Proceedings of the VLDB Endowment, 12, 1651-1663. https://doi.org/10.14778/3342263.3342640
[32]	Wang, H., et al. (2022) Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching. https://arxiv.org/abs/2201.11251
[33]	Scarselli, F., et al. (2009) The Graph Neural Network Model. IEEE Trans-actions on Neural Networks, 20, 61-80. https://doi.org/10.1109/TNN.2008.2005605
[34]	Leskovec, J., Kipf, T. and van der Maaten, L. (2023) CS224w: Machine Learning with Graphs. http://web.stanford.edu/class/cs224w/
[35]	Ying, R., Wang, A., You, J. and Leskovec, J. (2020) Frequent Subgraph Mining by Walking in Order Embedding Space. https://snap.stanford.edu/frequent-subgraph-mining/
[36]	Bonnici, V., Giugno, R., Pulvirenti, A., Shasha, D. and Ferro, A. (2013) A Subgraph Isomorphism Algorithm and Its Application to Biochemical Data. BMC Bioinformatics, 14, S13. https://doi.org/10.1186/1471-2105-14-S7-S13
[37]	Jüttner, A. and Madarasi, P. (2018) VF2++—An Improved Subgraph Isomorphism Algorithm. Discrete Applied Mathematics, 242, 69-81. https://doi.org/10.1016/j.dam.2018.02.018
[38]	Mhedhbi, A. and Salihoglu, S. (2019) Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins. Proceedings of the VLDB Endowment, 12, 1692-1704. https://doi.org/10.14778/3342263.3342643
[39]	Shang, H.C., Zhang, Y., Lin, X.M. and Yu, J.X. (2008) Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism. Proceedings of the VLDB Endow-ment, 1, 364-375. https://doi.org/10.14778/1453856.1453899
[40]	Liu, X., et al. (2020) Neural Subgraph Isomorphism Counting. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 6-10 July 2020, 1959-1969. https://doi.org/10.1145/3394486.3403247
[41]	Zhao, K.F., Yu, J.X., Zhang, H., Li, Q.Y. and Rong, Y. (2021) A Learned Sketch for Subgraph Counting. Proceedings of the 2021 International Conference on Management of Data, 20-25 June 2021, 2142-2155. https://doi.org/10.1145/3448016.3457289
[42]	Wang, H., et al. (2022) Neural Subgraph Counting with Wasser-stein Estimator. Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data, Philadelphia, 12-17 June 2022, 160-175. https://doi.org/10.1145/3514221.3526163
[43]	Ying, Z.T., Lou, Z.Y., You, J.X., et al. (2020) Neural Subgraph Matching. https://doi.org/10.48550/arXiv.2007.03092
[44]	Davitkova, A., Gjurovski, D. and Michel, S. (2021) LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs. https://arxiv.org/abs/2102.10588
[45]	Ullmann, J.R. (1976) An Algorithm for Subgraph Isomorphism. Journal of the ACM, 23, 31-42. https://doi.org/10.1145/321921.321925

Full-Text

comments powered by Disqus

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133

WeChat 1538708413

图数据领域的频繁项集挖掘Frequent Itemset Mining in the Graph Data Field

图数据领域的频繁项集挖掘
Frequent Itemset Mining in the Graph Data Field