2018

Implementation of Parallel Grouping and Aggregation in Distributed Database Systems

DOI: 10.3969/j.issn.1000-5641.2018.05.005

Keywords: OceanBase, GroupBy, Hash, data distribution


Abstract:

With the growing demand for data statistics and analysis in new Internet applications, grouping and aggregation have become among the most frequent requests in data analysis applications. This paper analyzes the operating principles of Aggregation and GroupBy, which are common in OLAP-like (on-line analytical processing) applications. To address the drawbacks of the sort-based grouping used by typical transactional databases, it proposes two concrete implementations of Hash grouping and aggregation, together with a strategy that uses statistical information to dynamically decide both the number of Hash buckets and which Hash GroupBy scheme to apply. Exploiting the multi-replica nature of distributed databases, it further proposes a node-level parallel scheme in which the Hash GroupBy operator is pushed down to the nodes. The approach is implemented in the open-source database OceanBase. Experiments show that choosing the Hash GroupBy scheme dynamically from statistical information is substantially more efficient than sort-based grouping.
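The contrast the abstract draws can be illustrated with a minimal sketch. This is not the paper's OceanBase implementation, and the abstract does not describe its two Hash schemes in detail. The function names, the SUM aggregate, and the sample rows below are all assumptions made for illustration. The sketch shows sort-based grouping, single-pass hash grouping, and a node-level variant in which each node aggregates its own rows and only the partial results are merged.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_group_sum(rows):
    """Sort-based grouping: sort by key, then sum each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _, value in run)
        for key, run in groupby(ordered, key=itemgetter(0))
    }


def hash_group_sum(rows):
    """Hash-based grouping: a single pass with one hash-table entry per group."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def merge_partials(partials):
    """Combine per-node partial sums into the final per-group sums."""
    final = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            final[key] += total
    return dict(final)


# Hypothetical (group key, value) rows held by two nodes.
node_a = [("x", 1), ("y", 2), ("x", 3)]
node_b = [("y", 4), ("z", 5)]

# Each node aggregates locally; only the small partial results are merged.
result = merge_partials([hash_group_sum(node_a), hash_group_sum(node_b)])
```

Merging partial results works here because SUM is decomposable. An aggregate such as AVG would need each node to return a sum and a count rather than a finished average.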
