Online Monitoring Method of High-Dimensional Data Streams Based on False Discovery Rate

DOI: 10.12677/sa.2024.132031, pp. 307-314

Keywords: False Discovery Rate, Symmetric Data Aggregation, High-Dimensional Data Streams, Statistical Process Control


Abstract:

Most work on monitoring multiple data streams assumes that the streams are independent. This paper gives a general framework for online monitoring of high-dimensional data streams from the perspective of statistical process control. Because the data may follow a variety of distributions, a symmetric data aggregation method is used to construct a robust monitoring statistic. The asymptotic symmetry of this statistic is used to select a data-driven threshold, and correlated, non-normal data streams are monitored online based on the false discovery rate (FDR). An AR(1) model is used to characterize the correlation between data streams, and the FDR and power of the proposed method are studied through Monte Carlo simulation. The numerical results indicate that the proposed method performs satisfactorily.
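The threshold selection described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions: the monitoring statistic for each stream is taken to be the product of scaled means from two halves of a window (a simplified stand-in for symmetric data aggregation), the noise is Gaussian with known unit variance, and the AR(1) model is read as giving correlation `rho ** |i - j|` between streams `i` and `j`. The names `symmetric_threshold`, `ar1_vector`, `scaled_means`, and `one_time_point`, and all default parameter values, are hypothetical.

```python
import math
import random


def symmetric_threshold(w, alpha):
    """Smallest t > 0 with (1 + #{w_j <= -t}) / max(#{w_j >= t}, 1) <= alpha."""
    for t in sorted({abs(x) for x in w if x != 0}):
        neg = sum(1 for x in w if x <= -t)   # mirror-image count, estimates false hits
        pos = sum(1 for x in w if x >= t)    # streams that would be flagged at t
        if (1 + neg) / max(pos, 1) <= alpha:
            return t
    return math.inf  # no threshold meets the target, so nothing is flagged


def ar1_vector(p, rho, rng):
    """One observation of p streams with corr(x_i, x_j) = rho ** |i - j| (assumed)."""
    x = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(p - 1):
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


def scaled_means(rows):
    """sqrt(n) * column mean for each stream (unit variance assumed known)."""
    n = len(rows)
    return [math.sqrt(n) * sum(col) / n for col in zip(*rows)]


def one_time_point(p=200, n=40, shifted=range(20), delta=1.5,
                   rho=0.5, alpha=0.2, seed=0):
    """Simulate one monitoring window and return (flagged, FDP, power)."""
    rng = random.Random(seed)
    shifted = set(shifted)
    rows = []
    for _ in range(n):
        x = ar1_vector(p, rho, rng)
        # Out-of-control streams receive a mean shift of size delta.
        rows.append([v + delta if j in shifted else v for j, v in enumerate(x)])
    half = n // 2
    t1 = scaled_means(rows[:half])
    t2 = scaled_means(rows[half:])
    # Product statistic: roughly symmetric about 0 for in-control streams,
    # large and positive when both halves see the same shift.
    w = [a * b for a, b in zip(t1, t2)]
    thr = symmetric_threshold(w, alpha)
    flagged = [j for j, x in enumerate(w) if x >= thr]
    false_hits = sum(1 for j in flagged if j not in shifted)
    fdp = false_hits / max(len(flagged), 1)
    power = (len(flagged) - false_hits) / max(len(shifted), 1)
    return flagged, fdp, power
```

Calling `one_time_point()` with different `seed` values and averaging the returned false discovery proportions gives a Monte Carlo estimate of the FDR in this toy setting; averaging the third return value estimates the power.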

