%0 Journal Article
%T Scalable Implementations of Descriptive Statistics on Hadoop
%J -
%D 2019
%X Big Data is one of the most popular technologies of our time. The volume of data is increasing day by day, driven by continuous data generation sources such as social media, sensor data, and the Internet of Things. The massive growth in the amount of data accumulated around the world requires different approaches to storing, processing, and analyzing big data. A set of quantitative data has many features, and descriptive statistics can describe these features in a meaningful and manageable form without having to list every value in the dataset. However, standard statistical techniques are not well suited to big data because of its size, complexity, and velocity. Although many off-the-shelf statistical tools are available for analyzing quantitative data, they are not always compatible with big data file systems. In this paper, we describe our implementations of descriptive statistics algorithms over big data and demonstrate their scalability through experiments on a small Hadoop cluster with 196 threads. This study shows that computing descriptive statistics for large datasets can benefit from the distributed computation features of a Hadoop cluster.
%K Big Data
%K Descriptive Statistics
%K Hadoop
%K MapReduce
%U http://dergipark.org.tr/nicel/issue/46083/544379