×

A quasi-Bayesian perspective to online clustering. (English) Zbl 1404.62068

The subject of the paper is an analysis of clustering algorithms for a high frequency stream of data. A new adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic estimation of the (unknown and changing) number of clusters is proposed. The resulting clusters are time-dependent. It is proved that the proposed approach is supported by minimax regret bounds. Based on reasoning similar to the one that led to the creation of a reversible-jump MCMC algorithm (cf. P. J. Green [Biometrika 82, No. 4, 711–732 (1995; Zbl 0861.62023)], Q. F. Gronau, H. Singmann and E. Wagenmakers [“bridgesampling: an R package for estimating normalizing constants”, https://doi.org/10.31222/osf.io/v94h6 (2017)] and see the RJMCMC software https://swmath.org/software/21805) an implementation (called PACBO – Probability Approximately Correct Bayesian On-line Clustering, cf. [D. McAllester and T. Akinbiyi, in: Empirical inference. Festschrift in honor of Vladimir N. Vapnik. Berlin: Springer. 95–103 (2013; Zbl 1325.62100)]) is proposed for which a convergence is a guarantee. Numerical experiments illustrate the potential of the procedure.
Software: PACBO (see https://cran.r-project.org/web/packages/PACBO/index.html; https://swmath.org/software/15756), RJMCMC (see https://swmath.org/software/21805).

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
65C05 Monte Carlo methods
62C20 Minimax procedures in statistical decision theory
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] P. Alquier., Transductive and Inductive Adaptive Inference for Regression and Density Estimation. PhD thesis, Université Paris 6, 2006.
[2] P. Alquier and G. Biau. Sparse single-index model., Journal of Machine Learning Research, 14:243–280, 2013. · Zbl 1320.62177
[3] P. Alquier and B. Guedj. An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization., Mathematical Methods of Statistics, 2017. · Zbl 1381.62222 · doi:10.3103/S1066530717010045
[4] P. Alquier and K. Lounici. PAC-Bayesian theorems for sparse regression estimation with exponential weights., Electronic Journal of Statistics, 5:127–145, 2011. · Zbl 1274.62463 · doi:10.1214/11-EJS601
[5] J.-Y. Audibert., Une approche PAC-bayésienne de la théorie statistique de l’apprentissage. PhD thesis, Université Paris 6, 2004.
[6] J.-Y. Audibert. Fast learning rates in statistical inference through aggregation., The Annals of Statistics, 37(4) :1591–1646, 2009. · Zbl 1360.62167 · doi:10.1214/08-AOS623
[7] K. S. Azoury and M. K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions., Machine Learning, 43(3):211–246, 2001. · Zbl 0988.68173 · doi:10.1023/A:1010896012157
[8] W. Barbakh and C. Fyfe. Online clustering algorithms., International Journal of Neural Systems, 18(3):185–194, 2008.
[9] P. L. Bartlett, T. Linder, and G. Lugosi. The minimax distortion redundancy in empirical quantizer design., IEEE Transactions on Information Theory, 44(5) :1802–1813, 1998. · Zbl 0964.94015 · doi:10.1109/18.705560
[10] J.-P. Baudry, C. Maugis, and B. Michel. Slope heuristics: overview and implementation., Statistics and Computing, 22(2):455–470, 2012. · Zbl 1322.62007 · doi:10.1007/s11222-011-9236-1
[11] R. B. Calinski and J. Harabasz. A dendrite method for cluster analysis., Communications in Statistics, 3:1–27, 1974. · Zbl 0273.62010
[12] O. Catoni., Statistical Learning Theory and Stochastic Optimization. École d’Été de Probabilités de Saint-Flour 2001. Springer, 2004. · Zbl 1076.93002
[13] O. Catoni., PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, volume 56 of Lecture notes – Monograph Series. Institute of Mathematical Statistics, 2007. · Zbl 1277.62015
[14] N. Cesa-Bianchi. Analysis of two gradient-based algorithms for on-line regression., Journal of Computer and System Sciences, 59(3):392–411, 1999. · Zbl 0961.68148 · doi:10.1006/jcss.1999.1635
[15] N. Cesa-Bianchi and G. Lugosi., Prediction, Learning and Games. Cambridge University Press, New York, 2006. · Zbl 1114.91001
[16] N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent., IEEE Transactions on Neural Networks, 7(3):604–619, 1996.
[17] N. Cesa-Bianchi, D. Helmbold, N. Freund, Y. Haussler, and M. K. Warmuth. How to use expert advice., Journal of the ACM, 44(3):427–485, 1997. · Zbl 0890.68066 · doi:10.1145/258128.258179
[18] N. Cesa-Bianchi, Y. Mansour, and G. Stoltz. Improved second-order bounds for prediction with expert advice., Machine Learning, 66(2):321–352, 2007. ISSN 1573-0565. URL https://doi.org/10.1007/s10994-006-5001-7. · Zbl 1137.68525
[19] A. Choromanska and C. Monteleoni. Online clustering with experts. In, Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 227–235, 2012.
[20] I. Csiszár. I-divergence geometry of probability distributions and minimization problems., Annals of Probability, 3:146–158, 1975. · Zbl 0318.60013
[21] A. S. Dalalyan and A. B. Tsybakov. Aggregation by exponential weighting and sharp oracle inequalities. In, Learning theory (COLT 2007), Lecture Notes in Computer Science, pages 97–111, 2007. · Zbl 1203.62063
[22] A. S. Dalalyan and A. B. Tsybakov. Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity., Machine Learning, 72:39–61, 2008. · Zbl 1470.62054
[23] A. S. Dalalyan and A. B. Tsybakov. Mirror averaging with sparsity priors., Bernoulli, 18(3):914–944, 2012a. · Zbl 1243.62008 · doi:10.3150/11-BEJ361
[24] A. S. Dalalyan and A. B. Tsybakov. Sparse regression learning by aggregation and Langevin Monte-Carlo., Journal of Computer and System Sciences, 78(5) :1423–1443, 2012b. · Zbl 1244.62054 · doi:10.1016/j.jcss.2011.12.023
[25] P. Dellaportas, J. J. Forster, and I. Ntzoufras. On Bayesian model and variable selection using MCMC., Statistics and Computing, 12(1):27–36, 2002. · Zbl 1247.62086 · doi:10.1023/A:1013164120801
[26] A. Fischer. On the number of groups in clustering., Statistics and Probability Letters, 81 :1771–1781, 2011. · Zbl 1225.62083 · doi:10.1016/j.spl.2011.07.005
[27] S. Gerchinovitz., Prédiction de suites individuelles et cadre statistique classique : étude de quelques liens autour de la régression parcimonieuse et des techniques d’agrégation. PhD thesis, Université Paris-Sud, 2011.
[28] A. D. Gordon., Classification, volume 82 of Monographs on Statistics and Applied Probability. Chapman Hall/CRC, Boca Raton, 1999. · Zbl 0929.62068
[29] P. J. Green. Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination., Biometrika, 82(4):711–732, 1995. · Zbl 0861.62023 · doi:10.1093/biomet/82.4.711
[30] B. Guedj and P. Alquier. PAC-Bayesian estimation and prediction in sparse additive models., Electronic Journal of Statistics, 7:264–291, 2013. · Zbl 1337.62075 · doi:10.1214/13-EJS771
[31] B. Guedj and S. Robbiano. PAC-Bayesian high dimensional bipartite ranking., Journal of Statistical Planning and Inference, 2017. · Zbl 1432.62183 · doi:10.1016/j.jspi.2017.10.010
[32] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering data streams: theory and practice., IEEE Transactions on Knowledge and Data Engineering, 15(3):511–528, 2003.
[33] J. A. Hartigan., Clustering Algorithms. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, New York, 1975. · Zbl 0372.62040
[34] L. Kaufman and P. Rousseeuw., Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Mathematical Statistics. Wiley-Interscience, Hoboken, 1990. · Zbl 1345.62009
[35] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors., Information and Computation, 132(1):1–63, 1997. · Zbl 0872.68158 · doi:10.1006/inco.1996.2612
[36] J. Kivinen and M. K. Warmuth. Averaging expert predictions. In, Computational Learning Theory: 4th European Conference (EuroCOLT ’99), pages 153–167. Springer, 1999.
[37] A. N. Kolmogorov and V. M. Tikhomirov. \(ϵ\)-entropy and \(ϵ\)-capacity of sets in function spaces., American Mathematical Society Translations, 17:277–364, 1961. · Zbl 0133.06703
[38] Samuel Kotz and Saralees Nadarajah., Multivariate T-Distributions and Their Applications. Cambridge University Press, 2004. · Zbl 1100.62059
[39] W. J. Krzanowski and Y. T. Lai. A criterion for determination the number of clusters in a data set., Biometrics, 44:23–34, 1988. · Zbl 0707.62122 · doi:10.2307/2531893
[40] L. Li., PACBO: PAC-Bayesian Online Clustering, 2016. URL https://CRAN.R-project.org/package=PACBO. R package version 0.1.0.
[41] E. Liberty, R. Sriharsha, and M. Sviridenko. An algorithm for online \(k\)-means clustering. In, Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 81–89. SIAM, 2016. · Zbl 1430.68455
[42] N. Littlestone and M. K. Warmuth. The weighted majority algorithm., Information and Computation, 108(2):212–216, 1994. · Zbl 0804.68121 · doi:10.1006/inco.1994.1009
[43] D. A. McAllester. Some PAC-Bayesian theorems., Machine Learning, 37(3):355–363, 1999a. · Zbl 0945.68157 · doi:10.1023/A:1007618624809
[44] D. A. McAllester. PAC-Bayesian model averaging. In, Proceedings of the 12th annual conference on Computational Learning Theory, pages 164–170. ACM, 1999b.
[45] G. W. Milligan and M. C. Cooper. An examination of procedures for determining the number of clusters in a data set., Psychometrika, 50:159–179, 1985.
[46] A. Petralias and P. Dellaportas. An MCMC model search algorithm for regression problems., Journal of Statistical Computation and Simulation, 83(9) :1722–1740, 2013. · Zbl 1453.62177
[47] C. P. Robert and G. Casella., Monte Carlo Statistical Methods. Springer, New York, 2004. · Zbl 1096.62003
[48] G. O. Roberts and J. S. Rosenthal. Harris Recurrence of Metropolis-Within-Gibbs and Trans-Dimensional Markov Chains., Annals of Applied Probability, 16(4) :2123–2139, 2006. · Zbl 1121.60076 · doi:10.1214/105051606000000510
[49] M. Seeger. PAC-Bayesian generalization bounds for gaussian processes., Journal of Machine Learning Research, 3:233–269, 2002. · Zbl 1088.68745 · doi:10.1162/153244303765208377
[50] M. Seeger., Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. PhD thesis, University of Edinburgh, 2003. · Zbl 1088.68745 · doi:10.1162/153244303765208377
[51] J. Shawe-Taylor and R. C. Williamson. A PAC analysis of a Bayes estimator. In, Proceedings of the 10th annual conference on Computational Learning Theory, pages 2–9. ACM, 1997.
[52] R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a dataset via the gap statistic., Journal of the Royal Statistical Society, 63:411–423, 2001. · Zbl 0979.62046 · doi:10.1111/1467-9868.00293
[53] V. Vovk. Competitive on-line statistics., International Statistical Review, 69(2):213–248, 2001. · Zbl 1211.62200 · doi:10.1111/j.1751-5823.2001.tb00457.x
[54] O. Wintenberger. Optimal learning with Bernstein online aggregation., Machine Learning, 106(1):119–141, 2017. · Zbl 1412.68196 · doi:10.1007/s10994-016-5592-6
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.