-  2017 

SHELL: A Real-Time Cardinality Estimation Algorithm for Stream Data

Keywords: big data processing techniques, stream data, cardinality estimation, parallel algorithm


Abstract:

Cardinality estimation is valuable in stream-data query optimization, network security, data compression, and related fields. Existing probabilistic cardinality estimation algorithms must scan static historical data before they can produce an estimate. Because stream data is continuous, fast, and real-time, it cannot be persisted first and analyzed afterward, so these traditional algorithms cannot be applied directly to big-data stream processing. Based on a study of the real-time distributed stream-processing mechanisms of Spark and Storm and of traditional cardinality estimation algorithms, a real-time cardinality estimation algorithm for stream data, SHELL (Streaming HypErLogLog), is designed and implemented. Experiments show that, without loss of accuracy, SHELL processes 6.0×10^5 to 6.8×10^5 messages per sliding time window, which meets the requirements of real-time processing.
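The abstract does not describe SHELL's internals, so the following is only a generic illustration of the idea it builds on: a standard HyperLogLog sketch, with one sketch per batch of messages and a sliding window obtained by merging the most recent sketches. It is not the authors' implementation and does not use Spark or Storm. The class names `HyperLogLog` and `SlidingWindowCounter`, the precision `p=12`, the SHA-1 based hash, and the batch-per-window layout are all assumptions made for this sketch.

```python
import hashlib
import math
from collections import deque


class HyperLogLog:
    """Standard HyperLogLog sketch with m = 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the item (hash choice is an assumption).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)       # small-range (linear counting) correction
        return raw


class SlidingWindowCounter:
    """Hypothetical helper: keeps one sketch per batch, estimates over the last k batches."""

    def __init__(self, window_batches=3, p=12):
        self.p = p
        self.batches = deque(maxlen=window_batches)  # oldest batch is evicted automatically

    def push_batch(self, items):
        sketch = HyperLogLog(self.p)
        for item in items:
            sketch.add(item)
        self.batches.append(sketch)

    def estimate(self):
        merged = HyperLogLog(self.p)
        for sketch in self.batches:
            merged = merged.merge(sketch)
        return merged.estimate()
```

Because merging is a register-wise maximum, per-batch sketches can be built independently (for example on separate workers) and combined cheaply when the window slides, which is why this structure suits stream processing. With `p=12` the expected relative error is roughly 1.04/sqrt(4096), about 1.6%.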
