%0 Journal Article
%T A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data
%A Yisa Adeniyi Abolade
%A Yichuan Zhao
%J Open Journal of Modelling and Simulation
%P 33-42
%@ 2327-4026
%D 2024
%I Scientific Research Publishing
%R 10.4236/ojmsi.2024.122002
%X Compositional data, which carry relative information,
are important in machine learning and related fields. They are typically
recorded as closed data, that is, data whose parts sum to a constant such as 100%.
The linear regression model is among the most widely used statistical techniques
for identifying hidden relationships between underlying random variables of interest,
and it appears in a wide range of applications. However, data
quality is a significant challenge in machine learning, especially when data are missing.
When estimating linear regression parameters, which are useful for tasks such as future
prediction and partial effects analysis of independent variables, maximum likelihood
estimation (MLE) is the method of choice. However, many datasets contain missing
observations, and recovering them can be costly and time-consuming. To address
this issue, the expectation-maximization (EM) algorithm has been suggested as a
solution for situations involving missing data. The EM algorithm iteratively finds
maximum likelihood or maximum a posteriori (MAP) estimates of parameters in
statistical models that depend on unobserved variables or data. Using the current
parameter estimate as input, the expectation (E) step constructs the expected
log-likelihood function. The maximization (M) step then finds the parameters that
maximize the expected log-likelihood determined in the E step. This study examined
how well the EM algorithm performed on a synthetic compositional
dataset with missing observations. It used both ordinary least squares regression
and its robust least squares version. The efficacy of the EM algorithm
%K Compositional Data
%K Linear Regression Model
%K Least Square Method
%K Robust Least Square Method
%K Synthetic Data
%K Aitchison Distance
%K Maximum Likelihood Estimation
%K Expectation-Maximization Algorithm
%K k-Nearest Neighbor
%K Mean Imputation
%U http://www.scirp.org/journal/PaperInformation.aspx?PaperID=131654