
OALib Journal
ISSN: 2333-9721
Fee: US$99



A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data

DOI: 10.4236/ojmsi.2024.122002, PP. 33-42

Keywords: Compositional Data, Linear Regression Model, Least Square Method, Robust Least Square Method, Synthetic Data, Aitchison Distance, Maximum Likelihood Estimation, Expectation-Maximization Algorithm, k-Nearest Neighbor, Mean Imputation


Abstract:

Compositional data, which carry only relative information, are an important data type in machine learning and related fields. They are typically recorded as closed data, meaning that the parts of each observation sum to a constant such as 100%. The linear regression model is one of the most widely used statistical techniques for uncovering relationships between random variables of interest, and maximum likelihood estimation (MLE) is the standard method for estimating its parameters, which are then used for tasks such as prediction and analyzing the partial effects of the independent variables. Data quality, however, is a significant challenge in machine learning, especially when observations are missing, and recovering missing data can be costly and time-consuming. The expectation-maximization (EM) algorithm has been proposed for such situations. EM is an iterative method for finding maximum likelihood (ML) or maximum a posteriori (MAP) estimates of the parameters of statistical models that depend on unobserved variables or data. In the expectation (E) step, it uses the current parameter estimates to construct the expected log-likelihood; in the maximization (M) step, it finds the parameters that maximize that expected log-likelihood. This study examined how well the EM algorithm performs on a synthetic compositional dataset with missing observations, using both ordinary least squares regression and its robust counterpart. The efficacy of the EM algorithm
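The abstract describes the E and M steps only in words. The following is a minimal sketch of EM used as an imputation method, written in standard-library Python. It is not the paper's implementation, and every design choice in it is an assumption: each synthetic composition has three parts closed to sum to 1; the analysis runs on additive log-ratio (alr) coordinates `x` and `y`, which are assumed bivariate normal with `y = 1 + 2x + noise`; and values are deleted completely at random, never both in the same row. All function names (`closure`, `alr`, `make_data`, `e_step`, `m_step`, `obs_loglik`, `em_impute`) are hypothetical. The E-step replaces each missing value with its conditional mean given the observed one and adds the conditional variance to the second moment. The M-step then recomputes the mean and covariance, and the ML regression coefficients of `y` on `x` are `slope = sxy / sxx` and `intercept = my - slope * mx`.

```python
import math
import random


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total` (e.g. 1 or 100)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def alr(comp):
    """Additive log-ratio coordinates, using the last part as the divisor."""
    return [math.log(p / comp[-1]) for p in comp[:-1]]


def make_data(n=500, missing=0.2, seed=1):
    """Hypothetical synthetic design: 3-part compositions whose alr coordinates
    satisfy y = 1 + 2x + noise; a value of x or y (never both) is then deleted
    completely at random with total probability `missing`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = 1.0 + 2.0 * z1 + rng.gauss(0.0, 0.5)
        x, y = alr(closure([math.exp(z1), math.exp(z2), 1.0]))
        u = rng.random()
        if u < missing / 2:
            x = None
        elif u < missing:
            y = None
        rows.append((x, y))
    return rows


def e_step(rows, mx, my, sxx, sxy, syy):
    """Expected sufficient statistics (x, y, x^2, y^2, xy) for every row."""
    stats = []
    for x, y in rows:
        if x is not None and y is not None:
            stats.append((x, y, x * x, y * y, x * y))
        elif y is None:  # conditional distribution of y given x
            ey = my + sxy / sxx * (x - mx)
            vy = syy - sxy * sxy / sxx
            stats.append((x, ey, x * x, ey * ey + vy, x * ey))
        else:  # conditional distribution of x given y
            ex = mx + sxy / syy * (y - my)
            vx = sxx - sxy * sxy / syy
            stats.append((ex, y, ex * ex + vx, y * y, ex * y))
    return stats


def m_step(stats):
    """ML update of the mean vector and covariance matrix."""
    n = len(stats)
    mx, my, mxx, myy, mxy = (sum(col) / n for col in zip(*stats))
    return mx, my, mxx - mx * mx, mxy - mx * my, myy - my * my


def obs_loglik(rows, mx, my, sxx, sxy, syy):
    """Observed-data log-likelihood of the bivariate normal model."""
    det = sxx * syy - sxy * sxy
    ll = 0.0
    for x, y in rows:
        if x is not None and y is not None:
            dx, dy = x - mx, y - my
            q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
            ll += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * q
        elif y is None:
            ll += -0.5 * (math.log(2.0 * math.pi * sxx) + (x - mx) ** 2 / sxx)
        else:
            ll += -0.5 * (math.log(2.0 * math.pi * syy) + (y - my) ** 2 / syy)
    return ll


def em_impute(rows, tol=1e-10, max_iter=500):
    """Run EM from complete-case estimates; return params, imputed rows, log-lik trace."""
    cc = [(x, y) for x, y in rows if x is not None and y is not None]
    params = m_step([(x, y, x * x, y * y, x * y) for x, y in cc])
    history = [obs_loglik(rows, *params)]
    for _ in range(max_iter):
        params = m_step(e_step(rows, *params))
        history.append(obs_loglik(rows, *params))
        if abs(history[-1] - history[-2]) < tol:
            break
    # Conditional means under the final estimates serve as the imputed values.
    imputed = [(ex, ey) for ex, ey, *_ in e_step(rows, *params)]
    return params, imputed, history


rows = make_data()
(mx, my, sxx, sxy, syy), imputed, history = em_impute(rows)
slope = sxy / sxx  # ML regression slope of y on x
print(f"iterations={len(history) - 1} slope={slope:.3f} intercept={my - slope * mx:.3f}")
```

The observed-data log-likelihood is tracked because EM guarantees it never decreases, which makes it a cheap check on the implementation. Initializing from the complete cases keeps the starting covariance positive definite. The paper's comparisons with robust least squares, k-nearest-neighbor imputation, and mean imputation are not reproduced in this sketch.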

