Data modeling plays a crucial role across research domains because of its wide practical applications, especially when handling complex datasets. This study explored a one-parameter discrete distribution developed using the weighted combining discretization method. Its statistical properties were derived and expressed mathematically, including moments, skewness, kurtosis, covariance, index of dispersion, order statistics, entropies, mean residual life, residual coefficient of variation function, stress-strength models, and premium principles. These properties indicated that the model is suited to right-skewed data with heavy tails, particularly data exhibiting overdispersion and increasing failure rates. The research described in detail a range of estimation techniques, including maximum product of spacings, method of moments, Anderson-Darling, right-tail Anderson-Darling, maximum likelihood, least squares, weighted least squares, Cramér-von Mises, and percentile. A simulation study ranked these estimators to determine the most effective one across various sample sizes. The proposed model was then applied to real-world datasets, where it outperformed traditional models such as the geometric, Poisson, and negative binomial distributions. Overall, the results pointed to the proposed model as a versatile and effective tool for modeling over-dispersed and skewed data, with promising implications for future research in diverse fields.
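As a rough illustration of the overdispersion mentioned above, the index of dispersion of a count sample is its variance divided by its mean; values above 1 indicate more spread than a Poisson model would allow. The sketch below is not taken from the paper: the function name `index_of_dispersion`, the use of the population variance, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def index_of_dispersion(counts):
    """Return variance / mean for a sample of non-negative counts.

    Assumption: the population variance is used; a sample variance
    would give a slightly larger value for small samples.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("index of dispersion is undefined when the mean is zero")
    return pvariance(counts) / m


# Hypothetical count data, invented for this example only.
sample = [0, 0, 1, 1, 2, 3, 5, 8, 13]
iod = index_of_dispersion(sample)

print(f"index of dispersion = {iod:.3f}")
print("over-dispersed" if iod > 1 else "not over-dispersed relative to Poisson")
```

A value well above 1, as for this invented sample, is the situation in which a Poisson fit is usually inadequate and an over-dispersed discrete model becomes a candidate.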
Citation: Mahmoud El-Morshedy, Mohamed S. Eliwa, Mohamed El-Dawoody, Hend S. Shahen. A weighted hybrid discrete probability model: Mathematical framework, statistical analysis, estimation techniques, simulation-based ranking, and goodness-of-fit evaluation for over-dispersed data[J]. Electronic Research Archive, 2025, 33(4): 2061-2091. doi: 10.3934/era.2025091