
Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. (English) Zbl 1403.62028

Summary: A hidden truncation hyperbolic (HTH) distribution is introduced, and finite mixtures thereof are applied for clustering. A stochastic representation of the HTH distribution is given and its density is derived. A hierarchical representation is described, which aids in parameter estimation. Finite mixtures of HTH distributions are presented and their identifiability is proved. The convexity of the HTH distribution, which is important in clustering applications, is discussed, and some theoretical results in this direction are presented. The relationship between the HTH distribution and other skewed distributions in the literature is discussed. Illustrations are provided, both of the HTH distribution itself and of the application of finite mixtures thereof to clustering.
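For orientation, a finite mixture of HTH distributions has the generic density of any \(G\)-component finite mixture model, and clustering then works from the posterior membership probabilities. This is a sketch of the standard finite-mixture set-up, not the paper's own notation: \(f_{\mathrm{HTH}}(\cdot \mid \boldsymbol{\theta}_g)\) stands for the HTH density derived in the paper (not reproduced here), and the symbols \(\pi_g\), \(\boldsymbol{\theta}_g\), \(G\) and \(\hat{z}_{ig}\) are illustrative.

```latex
f(\mathbf{x} \mid \boldsymbol{\vartheta}) = \sum_{g=1}^{G} \pi_g \, f_{\mathrm{HTH}}(\mathbf{x} \mid \boldsymbol{\theta}_g),
\qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1,
\hat{z}_{ig} = \frac{\pi_g \, f_{\mathrm{HTH}}(\mathbf{x}_i \mid \boldsymbol{\theta}_g)}
                    {\sum_{h=1}^{G} \pi_h \, f_{\mathrm{HTH}}(\mathbf{x}_i \mid \boldsymbol{\theta}_h)},
\qquad i = 1, \dots, n, \; g = 1, \dots, G.
```

In this set-up, identifiability means that if two such mixtures define the same density, they have the same number of components and, up to a relabelling of the components, the same mixing proportions and component parameters. A hard clustering is then typically obtained by assigning observation \(i\) to the component \(g\) with the largest \(\hat{z}_{ig}\).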

MSC:

62E15 Exact distribution theory in statistics
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62H12 Estimation in multivariate analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Arellano-Valle, R. B.; Azzalini, A., On the unification of families of skew-normal distributions, Scand. J. Stat., 33, 561-574 (2006) · Zbl 1117.62051
[2] Arellano-Valle, R. B.; Bolfarine, H.; Lachos, V. H., Bayesian inference for skew-normal linear mixed models, J. Appl. Stat., 34, 663-682 (2007) · Zbl 1516.62125
[3] Arellano-Valle, R. B.; Genton, M. G., On fundamental skew distributions, J. Multivariate Anal., 96, 93-116 (2005) · Zbl 1073.62049
[4] Arnold, B. C.; Beaver, R. J., Hidden truncation models, Sankhyā Ser. A, 62, 23-35 (2000) · Zbl 0973.62041
[5] Arnold, B. C.; Beaver, R. J., Skewed multivariate models related to hidden truncation and/or selective reporting (with discussion), TEST, 11, 7-54 (2002) · Zbl 1033.62013
[6] Arnold, B. C.; Beaver, R. J.; Groeneveld, R. A.; Meeker, W. Q., The nontruncated marginal of a truncated bivariate normal distribution, Psychometrika, 58, 471-488 (1993) · Zbl 0794.62075
[7] Arslan, O., Variance-mean mixture of the multivariate skew normal distribution, Statist. Papers, 56, 353-378 (2015) · Zbl 1309.62043
[8] Azzalini, A.; Browne, R. P.; Genton, M. G.; McNicholas, P. D., On nomenclature for, and the relative merits of, two formulations of skew distributions, Statist. Probab. Lett., 110, 201-206 (2016) · Zbl 1376.60024
[9] Banfield, J. D.; Raftery, A. E., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821 (1993) · Zbl 0794.62034
[10] Baricz, Á., On a product of modified Bessel functions, Proc. Amer. Math. Soc., 137, 189-193 (2009) · Zbl 1195.33008
[11] Barndorff-Nielsen, O., Hyperbolic distributions and distributions on hyperbolae, Scand. J. Stat., 5, 151-157 (1978) · Zbl 0386.60018
[12] Browne, R. P.; McNicholas, P. D., A mixture of generalized hyperbolic distributions, Canad. J. Statist., 43, 176-198 (2015) · Zbl 1320.62144
[13] Campbell, J. G.; Fraley, C.; Murtagh, F.; Raftery, A. E., Linear flaw detection in woven textiles using model-based clustering, Pattern Recognit. Lett., 18, 1539-1548 (1997)
[14] Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, Pattern Recognit., 28, 781-793 (1995)
[15] Celeux, G.; Hurn, M.; Robert, C. P., Computational and inferential difficulties with mixture posterior distributions, J. Amer. Statist. Assoc., 95, 957-970 (2000) · Zbl 0999.62020
[16] Charytanowicz, M.; Niewczas, J.; Kulczycki, P.; Kowalski, P. A.; Lukasik, S.; Zak, S., Complete gradient clustering algorithm for features analysis of X-ray images, (Information Technologies in Biomedicine, Advances in Intelligent and Soft Computing, vol. 69 (2010), Springer, Berlin/Heidelberg), 15-24
[17] Chen, J.; Tan, X.; Zhang, R., Inference for normal mixture in mean and variance, Statist. Sinica, 18, 443-465 (2008) · Zbl 1135.62018
[18] Cramér, H., Mathematical Methods of Statistics (1946), Princeton University Press, Princeton, NJ · Zbl 0063.01014
[19] Day, N. E., Estimating the components of a mixture of normal distributions, Biometrika, 56, 463-474 (1969) · Zbl 0183.48106
[20] Demarta, S.; McNeil, A. J., The \(t\) copula and related copulas, Int. Statist. Rev., 73, 111-129 (2005) · Zbl 1104.62060
[21] Dharmadhikari, S. W.; Joag-Dev, K., Unimodality, Convexity, and Applications (1988), Academic Press, Boston, MA · Zbl 0646.62008
[22] Eberlein, E.; Keller, U., Hyperbolic distributions in finance, Bernoulli, 1, 281-299 (1995) · Zbl 0836.62107
[23] Fraley, C.; Raftery, A. E., Model-based clustering, discriminant analysis, and density estimation, J. Amer. Statist. Assoc., 97, 611-631 (2002) · Zbl 1073.62545
[24] Franczak, B. C.; Browne, R. P.; McNicholas, P. D., Mixtures of shifted asymmetric Laplace distributions, IEEE Trans. Pattern Anal. Mach. Intell., 36, 1149-1157 (2014)
[25] Galimberti, G.; Soffritti, G., Multivariate linear regression with non-normal errors: A solution based on mixture models, Statist. Comput., 21, 523-536 (2011) · Zbl 1221.62106
[26] Galimberti, G.; Soffritti, G., A multivariate linear regression analysis using finite mixtures of \(t\) distributions, Comput. Statist. Data Anal., 71, 138-150 (2014) · Zbl 1471.62070
[27] Good, I. J., The population frequencies of species and the estimation of population parameters, Biometrika, 40, 237-264 (1953) · Zbl 0051.37103
[28] Hathaway, R. J., A constrained formulation of maximum-likelihood estimation for normal mixture distributions, Ann. Statist., 13, 795-800 (1985) · Zbl 0576.62039
[29] Ho, H. J.; Lin, T.-I.; Chen, H.-Y.; Wang, W.-L., Some results on the truncated multivariate \(t\) distribution, J. Statist. Plann. Inference, 142, 25-40 (2012) · Zbl 1229.62068
[30] Holzmann, H.; Munk, A.; Gneiting, T., Identifiability of finite mixtures of elliptical distributions, Scand. J. Stat., 33, 753-763 (2006) · Zbl 1164.62354
[31] Hubert, L.; Arabie, P., Comparing partitions, J. Classification, 2, 193-218 (1985)
[32] Jørgensen, B., Statistical Properties of the Generalized Inverse Gaussian Distribution (1982), Springer-Verlag, New York · Zbl 0486.62022
[33] Karlis, D.; Meligkotsidou, L., Finite mixtures of multivariate Poisson distributions with application, J. Statist. Plann. Inference, 137, 1942-1960 (2007) · Zbl 1116.60006
[34] Kotz, S.; Kozubowski, T. J.; Podgórski, K., The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance (2001), Birkhäuser, Boston · Zbl 0977.62003
[35] Lee, S. X.; McLachlan, G. J., EMMIXuskew: An R package for fitting mixtures of multivariate skew \(t\) distributions via the EM algorithm, J. Statist. Softw., 55, 1-22 (2013)
[36] Lee, S. X.; McLachlan, G. J., Model-based clustering and classification with non-normal mixture distributions, Statist. Methods Appl., 22, 427-454 (2013) · Zbl 1332.62209
[37] Lee, S. X.; McLachlan, G. J., Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results, Statist. Comput., 24, 181-202 (2014) · Zbl 1325.62107
[38] Leroux, B. G., Consistent estimation of a mixing distribution, Ann. Statist., 20, 1350-1360 (1992) · Zbl 0763.62015
[39] Lin, T.-I., Robust mixture modeling using multivariate skew-\(t\) distributions, Statist. Comput., 20, 343-356 (2010)
[40] Lin, T.-I., Maximum likelihood estimation for multivariate skew normal mixture models, J. Multivariate Anal., 100, 257-265 (2009) · Zbl 1152.62034
[41] Lin, T.-I.; Lee, J. C.; Hsieh, W. J., Robust mixture modeling using the skew-\(t\) distribution, Statist. Comput., 17, 81-92 (2007)
[42] McLachlan, G.; Peel, D., Finite Mixture Models (2000), Wiley, New York · Zbl 0963.62061
[43] McNeil, A. J.; Frey, R.; Embrechts, P., Quantitative Risk Management: Concepts, Techniques and Tools (2005), Princeton University Press, Princeton, NJ · Zbl 1089.91037
[44] McNicholas, P. D., Mixture Model-Based Classification (2016), Chapman & Hall/CRC Press, Boca Raton, FL
[45] McNicholas, P. D., Model-based clustering, J. Classification, 33, 331-373 (2016) · Zbl 1364.62155
[46] McNicholas, S. M.; McNicholas, P. D.; Browne, R. P., A mixture of variance-gamma factor analyzers, (Ahmed, S. E., Big and Complex Data Analysis: Methodologies and Applications (2017), Springer, Cham), 369-385 · Zbl 1381.62187
[47] McNicholas, P. D.; Murphy, T. B., Parsimonious Gaussian mixture models, Statist. Comput., 18, 285-296 (2008)
[48] Meng, X.-L.; Rubin, D. B., Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika, 80, 267-278 (1993) · Zbl 0778.62022
[49] Montanari, A.; Viroli, C., A skew-normal factor model for the analysis of student satisfaction towards university courses, J. Appl. Stat., 37, 473-487 (2010) · Zbl 1511.62401
[50] Morris, K.; McNicholas, P. D., Clustering, classification, discriminant analysis, and dimension reduction via generalized hyperbolic mixtures, Comput. Statist. Data Anal., 97, 133-150 (2016) · Zbl 1468.62144
[51] Murray, P. M.; Browne, R. P.; McNicholas, P. D., Mixtures of skew-\(t\) factor analyzers, Comput. Statist. Data Anal., 77, 326-335 (2014) · Zbl 1506.62132
[52] Pyne, S.; Hu, X.; Wang, K.; Rossin, E.; Lin, T.-I.; Maier, L. M.; Baecher-Allan, C.; McLachlan, G. J.; Tamayo, P.; Hafler, D. A.; De Jager, P. L.; Mesirow, J. P., Automated high-dimensional flow cytometric data analysis, Proc. Natl. Acad. Sci., 106, 8519-8524 (2009)
[53] Redner, R. A.; Walker, H. F., Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev., 26, 195-239 (1984) · Zbl 0536.62021
[54] Sahu, S. K.; Dey, D. K.; Branco, M., A new class of multivariate skew distributions with application to Bayesian regression models, Canad. J. Statist., 31, 129-150 (2003) · Zbl 1039.62047
[55] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 461-464 (1978) · Zbl 0379.62005
[56] Stephens, M., Dealing with label switching in mixture models, J. R. Stat. Soc. Ser. B Stat. Methodol., 62, 795-809 (2000) · Zbl 0957.62020
[57] Teicher, H., Identifiability of finite mixtures, Ann. Math. Statist., 34, 1265-1269 (1963) · Zbl 0137.12704
[60] Ueda, N.; Nakano, R., Deterministic annealing EM algorithm, Neural Netw., 11, 271-282 (1998)
[61] Wang, W.-L.; Lin, T.-I., Maximum likelihood inference for the multivariate \(t\) mixture model, J. Multivariate Anal., 149, 54-64 (2016) · Zbl 1341.62138
[65] Yao, W., A profile likelihood method for normal mixture with unequal variance, J. Statist. Plann. Inference, 140, 2089-2098 (2010) · Zbl 1184.62029
[66] Yao, W., Bayesian mixture labeling and clustering, Commun. Stat. - Theory Methods, 41, 403-421 (2012) · Zbl 1244.62038
[67] Yao, W.; Lindsay, B. G., Bayesian mixture labeling by highest posterior density, J. Amer. Statist. Assoc., 104, 758-767 (2009) · Zbl 1388.62007
[68] Zhou, H.; Lange, K. L., On the bumpy road to the dominant mode, Scand. J. Stat., 37, 612-631 (2010) · Zbl 1226.62027