Generalized Estimating Equations in Longitudinal Data Analysis: A Review and Recent Developments

DOI: 10.1155/2014/303728

Abstract:

The Generalized Estimating Equation (GEE) is a marginal model widely applied to longitudinal/clustered data analysis in clinical trials and biomedical studies. We provide a systematic review of GEE, covering basic concepts as well as several recent developments motivated by practical challenges in real applications. The topics covered include the selection of the “working” correlation structure, sample size and power calculation, and the issue of informative cluster size, because these aspects play important roles in the use of GEE and its statistical inference. A brief summary and discussion of potential research interests regarding GEE are provided at the end.

1. Introduction

The Generalized Estimating Equation (GEE) is a general statistical approach for fitting a marginal model to longitudinal/clustered data, and it has been widely applied in clinical trials and biomedical studies [1–3]. One example of longitudinal data comes from a study of orthodontic measurements on children, including 11 girls and 16 boys. The response is the distance (in millimeters) from the center of the pituitary to the pterygomaxillary fissure, measured repeatedly at ages 8, 10, 12, and 14 years. The primary goal is to investigate whether there is a significant gender difference in dental growth and in the temporal trend as age increases [4]. In such data, responses from the same individual tend to be “more alike”; thus incorporating within-subject and between-subject variation into model fitting is necessary to improve the efficiency of estimation and the power [5]. Several simple methods exist for repeated-measures analysis, for example, ANOVA/MANOVA for repeated measures, but their limitation is the inability to incorporate covariates. Two types of approaches, mixed-effect models and GEE [6, 7], are traditional and widely used in practice.

Of note, these two methods differ in their modeling emphasis, and the choice between them depends on the study objectives. In particular, the mixed-effect model is an individual-level approach that adopts random effects to capture the correlation between observations of the same subject [7]. GEE, on the other hand, is a population-level approach based on a quasilikelihood function and provides population-averaged estimates of the parameters [8]. In this paper, we focus on the latter, providing a review and recent developments of GEE. As is well known, GEE has several defining features [9–11]. The variance-covariance matrix of responses

References

[1]  Z. Feng, P. Diehr, A. Peterson, and D. McLerran, “Selected statistical issues in group randomized trials,” Annual Review of Public Health, vol. 22, pp. 167–187, 2001.
[2]  G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Analysis, John Wiley & Sons, 2004.
[3]  J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.
[4]  R. F. Potthoff and S. N. Roy, “A generalized multivariate analysis of variance model useful especially for growth curve problems,” Biometrika, vol. 51, pp. 313–326, 1964.
[5]  L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.
[6]  K. Y. Liang and S. L. Zeger, “Longitudinal data analysis using generalized linear models,” Biometrika, vol. 73, pp. 13–22, 1986.
[7]  M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.
[8]  R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.
[9]  P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.
[10]  G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.
[11]  D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.
[12]  P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.
[13]  N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.
[14]  R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.
[15]  B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.
[16]  Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.
[17]  S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.
[18]  Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.
[19]  A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.
[20]  S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.
[21]  Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.
[22]  Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.
[23]  A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.
[24]  G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.
[25]  Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.
[26]  W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.
[27]  A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.
[28]  M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.
[29]  L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.
[30]  J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.
[31]  M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.
[32]  M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.
[33]  S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.
[34]  B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.
[35]  A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.
[36]  J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.
[37]  V. J. Carey and Y.-G. Wang, “Working covariance model selection for generalized estimating equations,” Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.
[38]  W. Pan, “Akaike's information criterion in generalized estimating equations,” Biometrics, vol. 57, no. 1, pp. 120–125, 2001.
[39]  H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.
[40]  J. A. Nelder and Y. Lee, “Likelihood, quasi-likelihood and pseudolikelihood: some comparisons,” Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.
[41]  J. Cui, “QIC program and model selection in GEE analyses,” The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.
[42]  J. Cui and G. Qian, “Selection of working correlation structure and best model in GEE analyses of longitudinal data,” Communications in Statistics—Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.
[43]  L. Y. Hin and Y. G. Wang, “Working-correlation-structure identification in generalized estimating equations,” Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.
[44]  J. A. Nelder and D. Pregibon, “An extended quasi-likelihood function,” Biometrika, vol. 74, no. 2, pp. 221–232, 1987.
[45]  M. Wang, M. Kong, and S. Datta, “Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes,” Statistical Methods in Medical Research, vol. 20, no. 4, pp. 347–367, 2011.
[46]  E. Cantoni, J. M. Flemming, and E. Ronchetti, “Variable selection for marginal longitudinal generalized linear models,” Biometrics, vol. 61, no. 2, pp. 507–514, 2005.
[47]  Y.-G. Wang and X. Lin, “Effects of variance-function misspecification in analysis of longitudinal data,” Biometrics, vol. 61, no. 2, pp. 413–421, 2005.
[48]  N. R. Chaganty and H. Joe, “Efficiency of generalized estimating equations for binary responses,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 66, no. 4, pp. 851–860, 2004.
[49]  M. Gosho, C. Hamada, and I. Yoshimura, “Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data,” Communications in Statistics, vol. 40, no. 21, pp. 3839–3856, 2011.
[50]  M. Gosho, C. Hamada, and I. Yoshimura, “Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data,” Communications in Statistics, vol. 43, no. 1, pp. 62–81, 2014.
[51]  M. J. Jang, Working correlation selection in generalized estimating equations [Dissertation], University of Iowa, 2011.
[52]  J. Chen and N. A. Lazar, “Selection of working correlation structure in generalized estimating equations via empirical likelihood,” Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 18–41, 2012.
[53]  P. M. Westgate, “A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions,” Statistics and Probability Letters, vol. 83, no. 6, pp. 1553–1558, 2013.
[54]  P. M. Westgate, “Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach,” Biometrical Journal, vol. 56, no. 3, pp. 461–476, 2014.
[55]  P. M. Westgate, “Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data,” Statistics in Medicine, vol. 33, no. 13, pp. 2222–2237, 2014.
[56]  J. Ye, “On measuring and correcting the effects of data mining and model selection,” Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.
[57]  J. J. Shuster, Practical Handbook of Sample Size Guidelines for Clinical Trials, CRC Press, Boca Raton, Fla, USA, 1993.
[58]  G. Liu and K.-Y. Liang, “Sample size calculations for studies with correlated observations,” Biometrics, vol. 53, no. 3, pp. 937–947, 1997.
[59]  W. J. Shih, “Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations,” Biometrical Journal, vol. 39, no. 8, pp. 899–908, 1997.
[60]  S. R. Lipsitz and G. M. Fitzmaurice, “Sample size for repeated measures studies with binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1233–1239, 1994.
[61]  W. Pan, “Sample size and power calculations with correlated binary data,” Controlled Clinical Trials, vol. 22, no. 3, pp. 211–227, 2001.
[62]  N. Breslow, “Tests of hypotheses in overdispersed Poisson regression and other quasi likelihood models,” Journal of the American Statistical Association, vol. 85, pp. 565–571, 1990.
[63]  E. W. Lee and N. Dubin, “Estimation and sample size considerations for clustered binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1241–1252, 1994.
[64]  D. J. Sargent, J. A. Sloan, and S. S. Cha, “Sample size and design considerations for phase II clinical trials with correlated observations,” Controlled Clinical Trials, vol. 20, no. 3, pp. 242–252, 1999.
[65]  C. S. Li, “Semiparametric negative binomial regression models,” Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 475–486, 2010.
[66]  W. H. Greene, “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Tech. Rep., New York University, 1994.
[67]  P. Lambert, “Modeling of repeated series of count data measured at unequally spaced times,” Applied Statistics, vol. 45, pp. 31–38, 1996.
[68]  M. S. Pepe and G. L. Anderson, “A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data,” Communications in Statistics, Series B, vol. 23, pp. 939–951, 1994.
[69]  M. Wang and Q. Long, “Modified robust variance estimator for generalized estimating equations with improved small-sample performance,” Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.
[70]  M. Taljaard, A. D. McRae, C. Weijer et al., “Inadequate reporting of research ethics review and informed consent in cluster randomised trials: Review of random sample of published trials,” British Medical Journal, vol. 342, Article ID d2496, 2011.
[71]  L. A. Mancl and T. A. DeRouen, “A covariance estimator for GEE with improved small-sample properties,” Biometrics, vol. 57, no. 1, pp. 126–134, 2001.
[72]  M. P. Fay and B. I. Graubard, “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.
[73]  W. Pan, “On the robust variance estimator in generalised estimating equations,” Biometrika, vol. 88, no. 3, pp. 901–906, 2001.
[74]  W. Pan and M. M. Wall, “Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations,” Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.
[75]  X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.
[76]  D. M. Farewell, “Marginal analyses of longitudinal data with an informative pattern of observations,” Biometrika, vol. 97, no. 1, pp. 65–78, 2010.
[77]  J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, “A 5-year study of attachment loss and tooth loss in community-dwelling older adults,” Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.
[78]  S. J. Arbes Jr., H. Ágústsdóttir, and G. D. Slade, “Environmental tobacco smoke and periodontal disease in the United States,” American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.
[79]  J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.
[80]  E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.
[81]  J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.
[82]  E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.
[83]  X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.
[84]  T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.
[85]  M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.
[86]  S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.
[87]  Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.
[88]  Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.
[89]  B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.
[90]  J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.
[91]  S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.
[92]  D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.
[93]  C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.
[94]  C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.
[95]  D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.
[96]  R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.
[97]  P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.
[98]  A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.
[99]  L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.
