
- 2018

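Distributed Mining of Embedded Tree Patterns from Transactional Data

Zhao Wen (赵文), Wu Xiaoying (吴小莹)

Keywords: embedded tree patterns, distributed pattern mining, MapReduce, workload balancing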


Abstract:

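To discover valuable rules in large-scale datasets more efficiently, this paper proposes TETPM, an iterative algorithm for mining frequent embedded unordered tree patterns. Two workload partitioning strategies are designed for it: TETPM-P and TETPM-E. TETPM-P partitions the workload by pattern, whereas TETPM-E partitions it by pattern instance. Experimental evaluation shows that both algorithms can effectively mine frequent embedded patterns from large datasets. TETPM-P is better suited to datasets in which the number of instances per pattern is more evenly balanced, while TETPM-E is better suited to larger datasets.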
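
The difference between partitioning by pattern and partitioning by pattern instance can be illustrated with a toy load calculation. The sketch below is not the TETPM implementation, and the abstract does not describe how either strategy assigns work. The function names, the modulo assignment rule, and the instance counts are all hypothetical, chosen only to show why skewed instance counts may favor instance-level splitting.

```python
def partition_by_pattern(instance_counts, num_workers):
    """Hypothetical pattern-level split: all instances of one pattern
    go to the same worker, chosen here by pattern id modulo worker count."""
    loads = [0] * num_workers
    for pattern_id, count in instance_counts.items():
        loads[pattern_id % num_workers] += count
    return loads


def partition_by_instance(instance_counts, num_workers):
    """Hypothetical instance-level split: instances are spread evenly
    across workers regardless of which pattern they belong to."""
    total = sum(instance_counts.values())
    base, extra = divmod(total, num_workers)
    # The first `extra` workers take one additional instance each.
    return [base + (1 if w < extra else 0) for w in range(num_workers)]


# Invented, deliberately skewed data: pattern 0 has far more instances.
counts = {0: 1000, 1: 10, 2: 10, 3: 10}

print(partition_by_pattern(counts, 2))   # per-worker load, pattern-level
print(partition_by_instance(counts, 2))  # per-worker load, instance-level
```

In this toy setting the pattern-level split leaves one worker with almost all the instances, while the instance-level split evens out the load. A real instance-level scheme would also have to combine partial results for patterns whose instances land on different workers, a cost this sketch does not model.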

