Entering the Era of Data Science: Targeted Learning and the Integration of Statistics and Computational Data Analysis

DOI: 10.1155/2014/502678

Abstract:

This outlook paper reviews the research of van der Laan’s group on Targeted Learning, a subfield of statistics concerned with constructing data-adaptive estimators of user-supplied target parameters of the probability distribution of the data, together with corresponding confidence intervals, while relying only on realistic statistical assumptions. Targeted Learning makes full use of state-of-the-art machine learning tools while preserving the identity of statistics as a field concerned with both accurate estimation of the true target parameter value and assessment of uncertainty, so that sound statistical conclusions can be drawn. We also provide a philosophical and historical perspective on Targeted Learning and relate it to new developments in Big Data. We conclude with some remarks on the immediate relevance of Targeted Learning to the current Big Data movement.

1. Introduction

In Section 2 we begin by reviewing some basic statistical concepts, such as the data probability distribution, the statistical model, and the target parameter. These allow us to define the field of Targeted Learning, a subfield of statistics that develops data-adaptive estimators of user-supplied target parameters of data distributions based on high-dimensional data under realistic assumptions (e.g., incorporating the state of the art in machine learning) while preserving statistical inference.
This also allows us to clarify how Targeted Learning differs from typical current practice in data analysis, which relies on unrealistic assumptions, and to describe the key ingredients of targeted minimum loss based estimation (TMLE), a general tool for achieving the goals set out by Targeted Learning: a substitution estimator, construction of an initial estimator through super-learning, targeting of the initial estimator to achieve asymptotic linearity with a known influence curve by solving the efficient influence curve estimating equation, and statistical inference based on a normal limiting distribution. Targeted Learning resurrects the pillars of statistics, such as the facts that a model represents actual knowledge about the data-generating experiment and that a target parameter represents the feature of the data-generating distribution we want to learn from the data. In this manner, Targeted Learning defines a truth and sets a scientific standard for estimation procedures, whereas current practice typically defines a parameter as a coefficient in a misspecified parametric model (e.g., logistic linear regression, repeated measures generalized linear
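
The TMLE ingredients listed above (initial estimator, targeting step, substitution estimator, influence-curve-based inference) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the target parameter is chosen to be the average treatment effect of a binary treatment A on a binary outcome Y with one binary confounder W, the simulated data and the function names `simulate` and `tmle_ate` are hypothetical, and the super-learning step is replaced by a deliberately misspecified initial estimator (it ignores W) so that the effect of the targeting step is visible.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate(n, seed=0):
    """Hypothetical data: W confounds both treatment A and outcome Y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if w else 0.3) else 0
        y = 1 if rng.random() < expit(-1.0 + 1.0 * a + 1.5 * w) else 0
        data.append((w, a, y))
    return data


def tmle_ate(data):
    n = len(data)

    # 1. Initial estimator of E[Y | A, W]. Assumption: a crude stand-in for
    #    super-learning that ignores W and is therefore misspecified.
    q_init = {}
    for a in (0, 1):
        ys = [y for (_, ai, y) in data if ai == a]
        q_init[a] = sum(ys) / len(ys)

    # 2. Treatment mechanism g(W) = P(A = 1 | W), by stratified proportions.
    g = {}
    for w in (0, 1):
        treated = [a for (wi, a, _) in data if wi == w]
        g[w] = sum(treated) / len(treated)

    def h(a, w):
        # "Clever covariate" of the logistic fluctuation model.
        return a / g[w] - (1 - a) / (1 - g[w])

    # 3. Targeting: maximum likelihood for eps in
    #    logit Q_eps(A, W) = logit Q_init(A) + eps * h(A, W),
    #    found with one-dimensional Newton iterations.
    eps = 0.0
    for _ in range(50):
        score, info = 0.0, 0.0
        for (w, a, y) in data:
            hv = h(a, w)
            p = expit(logit(q_init[a]) + eps * hv)
            score += hv * (y - p)
            info += hv * hv * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break

    def q_star(a, w):
        return expit(logit(q_init[a]) + eps * h(a, w))

    # 4. Substitution estimator: plug the targeted fit into the parameter map.
    psi = sum(q_star(1, w) - q_star(0, w) for (w, _, _) in data) / n

    # 5. Inference from the estimated efficient influence curve.
    ic = [h(a, w) * (y - q_star(a, w)) + q_star(1, w) - q_star(0, w) - psi
          for (w, a, y) in data]
    se = math.sqrt(sum(v * v for v in ic) / n / n)
    return {
        "naive": q_init[1] - q_init[0],        # untargeted plug-in
        "psi": psi,                            # targeted estimate
        "ci": (psi - 1.96 * se, psi + 1.96 * se),
        "mean_ic": sum(ic) / n,                # near 0: equation is solved
    }


if __name__ == "__main__":
    result = tmle_ate(simulate(20000, seed=1))
    print(result)
```

Under this simulation the true average treatment effect is about 0.213, while the untargeted plug-in based on the misspecified initial fit converges to about 0.347. Because the treatment mechanism is estimated consistently here, the targeted estimate should land close to the truth, and `mean_ic` being numerically zero confirms that the efficient influence curve estimating equation has been solved. A realistic analysis would replace step 1 with a cross-validated ensemble of learners rather than this toy fit.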
