Objectives: We introduce a special form of the Generalized Poisson Distribution. The distribution has a single parameter, yet its variance exceeds its mean, a phenomenon known as “overdispersion”. We discuss potential applications of the distribution as a model for counts and, under the assumption of independence, perform statistical inference on the ratio of two means, with a generalization to testing the homogeneity of several means.

Methods: Bayesian methods depend on the choice of prior distributions for the population parameters. In this paper, we describe a Bayesian approach to estimation and inference on the parameters of several independent Inflated Poisson Distributions (IPD) under two possible priors: the first is proportional to the reciprocal of the square root of the Poisson parameter, and the second is a conjugate Gamma prior. The parameters of the Gamma prior are estimated in the empirical Bayes framework by maximum likelihood (ML), using the nonlinear mixed models procedure (NLMIXED) in SAS. With these priors, we construct highest posterior density (HPD) intervals on the ratio of two IPD parameters and test the homogeneity of several populations.

Results: We encountered convergence problems when estimating the hyperparameters of the posterior distribution with NLMIXED. However, direct maximization of the predictive density produced solutions to the maximum likelihood equations. We apply the methodologies to RNA-seq read count data of gene expression values.
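The abstract does not state the one-parameter special form explicitly, so the sketch below works with the two-parameter generalized Poisson distribution of Consul and Jain, P(X = x) = θ(θ + λx)^(x−1) e^(−θ−λx) / x!, assuming that is the parent family. Its mean is θ/(1 − λ) and its variance is θ/(1 − λ)^3, so for 0 < λ < 1 the variance exceeds the mean; λ = 0 gives the ordinary Poisson. The function names `gpd_pmf` and `gpd_moments` are hypothetical, and the truncation point `x_max` is an assumption that is adequate only when λ is not close to 1.

```python
import math


def gpd_pmf(x: int, theta: float, lam: float) -> float:
    """P(X = x) for the Consul-Jain generalized Poisson distribution.

    Assumes theta > 0 and 0 <= lam < 1; lam = 0 recovers the Poisson pmf.
    """
    if x < 0:
        return 0.0
    rate = theta + lam * x
    # Work on the log scale so large x does not overflow:
    # log P = log(theta) + (x - 1) * log(theta + lam * x) - (theta + lam * x) - log(x!)
    log_p = math.log(theta) + (x - 1) * math.log(rate) - rate - math.lgamma(x + 1)
    return math.exp(log_p)


def gpd_moments(theta: float, lam: float, x_max: int = 2000):
    """Total mass, mean and variance computed from the pmf truncated at x_max."""
    total = first = second = 0.0
    for x in range(x_max + 1):
        p = gpd_pmf(x, theta, lam)
        total += p
        first += x * p
        second += x * x * p
    return total, first, second - first * first


theta, lam = 2.0, 0.3
total, mean, var = gpd_moments(theta, lam)
print(f"mass={total:.6f} mean={mean:.4f} (theory {theta / (1 - lam):.4f}) "
      f"var={var:.4f} (theory {theta / (1 - lam) ** 3:.4f})")
```

The numerical moments should agree with the closed forms, which makes the overdispersion concrete: with θ = 2 and λ = 0.3 the variance is roughly twice the mean.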
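The Results mention that directly maximizing the predictive density succeeded where NLMIXED had convergence trouble. Because the IPD likelihood is not written out in the abstract, the following sketch illustrates the idea on a Poisson-Gamma stand-in (an assumption, not the paper's model): with x_i | θ_i ~ Poisson(θ_i) and θ_i ~ Gamma(α, rate β), each count is marginally negative binomial, and the empirical Bayes hyperparameters maximize the sum of those log marginals. For fixed α the maximizing β is α / x̄, so only a one-dimensional golden-section search over log α is needed. The names `log_predictive` and `fit_gamma_hyperparameters` and the search bounds are hypothetical.

```python
import math


def log_predictive(counts, alpha, beta):
    """Log predictive (marginal) density of the counts under a Poisson-Gamma stand-in.

    x_i | theta_i ~ Poisson(theta_i), theta_i ~ Gamma(shape=alpha, rate=beta),
    so each x_i is marginally negative binomial.
    """
    n, s = len(counts), sum(counts)
    ll = sum(math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1) for x in counts)
    return ll + n * alpha * math.log(beta / (beta + 1.0)) - s * math.log(beta + 1.0)


def fit_gamma_hyperparameters(counts, log_alpha_bounds=(-6.0, 8.0), tol=1e-8):
    """Empirical Bayes (alpha, beta) by directly maximizing log_predictive.

    For fixed alpha the maximizing beta is alpha / mean(counts), so a 1-D
    golden-section search over log(alpha) on the profile likelihood suffices.
    """
    mean = sum(counts) / len(counts)
    if mean <= 0:
        raise ValueError("need at least one positive count")

    def profile(log_alpha):
        alpha = math.exp(log_alpha)
        return log_predictive(counts, alpha, alpha / mean)

    a, b = log_alpha_bounds
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile(d)
    alpha = math.exp(0.5 * (a + b))
    return alpha, alpha / mean


alpha_hat, beta_hat = fit_gamma_hyperparameters([0, 1, 2, 5, 9, 0, 3, 14, 1, 0])
print(f"alpha={alpha_hat:.4f} beta={beta_hat:.4f}")
```

One plausible source of non-convergence is visible here: when the sample variance does not exceed the mean, the profile likelihood keeps increasing in α, and the search simply runs to the upper bound. A fitted α at the edge of `log_alpha_bounds` is therefore a signal of little or no overdispersion rather than a genuine optimum.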
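To make the HPD interval on a ratio of two parameters concrete, the sketch below again uses a Poisson stand-in for the IPD likelihood (an assumption). Under a Gamma(a0, rate b0) prior, the posterior of a Poisson mean from n counts with total s is Gamma(a0 + s, rate b0 + n); setting a0 = 0.5 and b0 = 0 gives the prior proportional to θ^(−1/2), which for a Poisson likelihood is the Jeffreys prior. Draws of θ1/θ2 from the two independent posteriors are sorted, and the shortest window holding 95% of them is taken as a Monte Carlo HPD interval. This simulation approach is a swapped-in technique, not necessarily how the paper constructs its intervals, and the names `gamma_posterior`, `hpd_interval` and `ratio_hpd` are hypothetical.

```python
import math
import random


def gamma_posterior(counts, a0, b0):
    """(shape, rate) of the Gamma posterior of a Poisson mean under a Gamma(a0, rate=b0) prior.

    a0 = 0.5, b0 = 0 corresponds to the improper prior proportional to theta ** -0.5.
    """
    return a0 + sum(counts), b0 + len(counts)


def hpd_interval(draws, mass=0.95):
    """Shortest interval containing a fraction `mass` of the draws (Monte Carlo HPD)."""
    xs = sorted(draws)
    k = max(1, math.ceil(mass * len(xs)))  # number of draws the interval must cover
    i = min(range(len(xs) - k + 1), key=lambda j: xs[j + k - 1] - xs[j])
    return xs[i], xs[i + k - 1]


def ratio_hpd(counts1, counts2, a0=0.5, b0=0.0, mass=0.95, n_draws=20000, seed=1):
    """Monte Carlo HPD interval for theta1 / theta2 from independent Gamma posteriors."""
    rng = random.Random(seed)
    shape1, rate1 = gamma_posterior(counts1, a0, b0)
    shape2, rate2 = gamma_posterior(counts2, a0, b0)
    # random.gammavariate(alpha, beta) treats beta as a *scale*, so pass 1 / rate.
    ratios = [
        rng.gammavariate(shape1, 1.0 / rate1) / rng.gammavariate(shape2, 1.0 / rate2)
        for _ in range(n_draws)
    ]
    return hpd_interval(ratios, mass)


lo, hi = ratio_hpd([12, 15, 9, 20, 14], [5, 7, 4, 6, 8])
print(f"95% HPD for theta1/theta2: ({lo:.3f}, {hi:.3f})")
```

The conjugate Gamma prior is used by passing its hyperparameters, for example the empirical Bayes estimates, as `a0` and `b0`. Because the ratio posterior is right-skewed, the HPD interval is typically shifted left of the equal-tailed interval computed from the same draws.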