Semi-supervised learning using ensembles of multiple 1D-embedding-based label boosting. (English) Zbl 1334.68194
Summary: The paper continues the development of the multiple 1D-embedding-based (or, 1D multi-embedding) methods for semi-supervised learning, which is preliminarily introduced by the author [ibid. 14, No. 2, Article ID 1640002, 11 p. (2016; Zbl 1333.68236)]. This paper puts the development in a more general framework and creates a new method, which employs the ensemble technique to integrate multiple 1D embedding-based regularization and label boosting for semi-supervised learning (SSL). It combines parallel ensemble and serial ensemble. In each stage of parallel ensemble, the dataset is first smoothly mapped onto multiple 1D sequences. On each 1D embedded data, a classical regularization method is applied to construct a weak classifier. All of these weak classifiers are then integrated to an ensemble of 1D labeler (E1DL), which together with a nearest neighbor cluster (NNC) algorithm extracts a newborn labeled subset from the unlabeled set. The subset is believed to be correctly labeled with a high confidence, so that it joins with the original labeled set for the next learning stage. Repeating this process, we gradually obtain a boosted labeled set and the process will not stopped until the updated labeled set reaches a certain size. Finally, we use E1DL to build the target classifier, which labels all points of the dataset. In this paper, we also set the universal parameters for all experiments to make the algorithm as a parameter-free one. The validity of our method in the classification of the handwritten digits is confirmed by several experiments. Comparing to several other popular SSL methods, our results are very promising.
MSC:
68T05 | Learning and adaptive systems in artificial intelligence |
68T10 | Pattern recognition, speech recognition |
Keywords:
multiple 1D embedding; 1D multi-embedding; interpolation; regularization; label boosting; ensemble; semi-supervised learningCitations:
Zbl 1333.68236Software:
SGTlightReferences:
[1] | 1. S. Abney, Understanding the Yarowsky algorithm, Comput. Linguist.30(3) (2004) 365-395. genRefLink(16, ’S0219691316400014BIB001’, ’10.1162 · Zbl 1234.68396 |
[2] | 2. J. H. Ahlberg, E. N. Nilson and J. L. Walsh, The Theory of Splines and Their Applications (Academic Press, 1967). · Zbl 0158.15901 |
[3] | 3. H.-J. Bandelt and V. Chepoi, Metric graph theory and geometry: A survey, Contemp. Math.453(49-86) (2008). · Zbl 1169.05015 |
[4] | 4. M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput.15(6) (2003) 1373-1396. genRefLink(16, ’S0219691316400014BIB004’, ’10.1162 |
[5] | 5. M. Belkin and P. Niyogi, Using manifold structure for partially labeled classification, Adv. Neural Inform. Process. Syst.15 (2003) 929-936. |
[6] | 6. M. Belkin, P. Niyogi and V. Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res.7 (2006) 2399-2434. genRefLink(128, ’S0219691316400014BIB006’, ’000245390700005’); · Zbl 1222.68144 |
[7] | 7. R. Bellman, Adaptive Control Processes: A Guided Tour (Princeton University Press, 1961). · Zbl 0103.12901 |
[8] | 8. C. M. Bishop, M. Svensén and C. K. I. Williams, Developments of the generative topographic mapping, Neurocomputing21 (1998) 203-224. genRefLink(16, ’S0219691316400014BIB008’, ’10.1016 |
[9] | 9. J. Bondy and U. S. R. Murty, Graph Theory (Springer, 2008). genRefLink(16, ’S0219691316400014BIB009’, ’10.1007 |
[10] | 10. Breiman, Bagging predictors, Mach. Learn.24(2) (1996) 123-140. genRefLink(128, ’S0219691316400014BIB010’, ’A1996UZ38000003’); · Zbl 0858.68080 |
[11] | 11. S. Bubeck and U. von Luxburg, Nearest neighbor clustering: A baseline method for consistent clustering with arbitrary objective functions, J. Mach. Learn. Res.10 (2009) 657-698. genRefLink(128, ’S0219691316400014BIB011’, ’000270824500005’); · Zbl 1235.68134 |
[12] | 12. Y. Cao, J. Li, F. Zhao, R. Bian and X. Liu, Comparative study on face recognition based on svm of one-against-one and one-against-rest methods, in 8th Int. Conf. on Future Generation Communication and Networking (FGCN) (2014), pp. 104-107. |
[13] | 13. O. Chapelle, B. Schölkopf and A. Zien, Semi-Supervised Learning (MIT Press, 2006). genRefLink(16, ’S0219691316400014BIB013’, ’10.7551 |
[14] | 14. C. K. Chui and J. Wang, Randomized anisotropic transform for nonlinear dimensionality reduction, Int. J. Geomath.1(1) (2010) 23-50. genRefLink(16, ’S0219691316400014BIB014’, ’10.1007 |
[15] | 15. C. K. Chui and J. Z. Wang, Dimensionality reduction of hyper-spectral imagery data for feature classification, in Handbook of Geomathematics, eds. W. Freeden, Z. Nashed and T. Sonar (Springer, 2010). genRefLink(16, ’S0219691316400014BIB015’, ’10.1007 · Zbl 1197.86004 |
[16] | 16. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, Vol. 9 (American Mathematical Society, 1996). genRefLink(16, ’S0219691316400014BIB016’, ’10.1090 |
[17] | 17. R. R. Coifman and M. Gavish, Harmonic analysis of digital data bases, in Wavelets and Multiscale Analysis: Theory and Applications, Applied and Numerical Harmonic Analysis (Springer, 2011), pp. 161-197. genRefLink(16, ’S0219691316400014BIB017’, ’10.1007 · Zbl 1250.68096 |
[18] | 18. R. R. Coifman and S. Lafon, Diffusion maps, Appl. Comput. Harmon. Anal.21 (2006) 5-30. genRefLink(16, ’S0219691316400014BIB018’, ’10.1016 · Zbl 1095.68094 |
[19] | 19. T. F. Cox and M. A. A. Cox, Multidimensional Scaling (Chapman & Hall, 1994). |
[20] | 20. F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc.39(1) (2002) 1-49. genRefLink(16, ’S0219691316400014BIB020’, ’10.1090 |
[21] | 21. C. Dan, U. Meier and J. Schmidhuber, Multi-column deep neural network for image classification, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 3642-3649. |
[22] | 22. C. de Boor, A Pratical Guide to Splines (Springer, 1978). genRefLink(16, ’S0219691316400014BIB022’, ’10.1007 |
[23] | 23. C. Ding and X. He, K-means clustering via principal component analysis, in Proc. 21th Int. Conf. Machine Learning, Banff, Canada (2004). |
[24] | 24. D. L. Donoho and C. Grimes, Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. USA100 (2003) 5591-5596. genRefLink(16, ’S0219691316400014BIB024’, ’10.1073 |
[25] | 25. Y. Freund and R. E. Schapire, A short indroduction to bootsting, J. Japanese Soc. Artifi. Intell.14(5) (1999) 771-780. |
[26] | 26. M. Gavish, B. Nadler and R. R. Coifman, Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning, in Proc. 27th Int. Conf. on Machine Learning (Omnipress, 2010), pp. 367-374. |
[27] | 27. G. R. Haffari and A. Sarkar, Analysis of semi-supervised learning with the Yarowsky algorithm, preprint (2012), arXiv:1206.5240. |
[28] | 28. J. J. Hull, A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell.16(55) (1994) 550-554. genRefLink(16, ’S0219691316400014BIB028’, ’10.1109 |
[29] | 29. A. K. Jain, M. N. Murty and P. J. Flynn, Data clustering: A review, ACM Comput. Surv.31(3) (1999) 264-323. genRefLink(16, ’S0219691316400014BIB029’, ’10.1145 |
[30] | 30. T. Joachims, Transductive learning via spectral graph partitioning, in Proc. ICML-03, 20th Int. Conf. Machine Learning (ICML-03, 2003), pp. 290-297. |
[31] | 31. I. T. Jolliffe, Principal Component Analysis (Springer, 1986). genRefLink(16, ’S0219691316400014BIB031’, ’10.1007 · Zbl 1011.62064 |
[32] | 32. M. I. Jordan, Learning in graphical models, in Proc. NATO Advanced Study Institute on Learning in Graphical Models, Erice, Italy, September 27-October 7 (MIT Press, 1996). |
[33] | 33. W. Li, M. Zeiler, S. Zhang, Y. LeCun and R. Fergus, Regularization of neural network using dropconnect, J. Mach. Learn. Res.28 (2013) 1058-1066. |
[34] | 34. F. J. Provost and V. Kolluri, A survey of methods for scaling up inductive learning algorithms, Data Min. Knowl. Discov.3 (1999) 131-169. genRefLink(16, ’S0219691316400014BIB034’, ’10.1023 |
[35] | 35. I. Ram, I. Cohen and M. Elad, Facial image compression using patch-ordering-based adaptive wavelet transform, IEEE Signal Process. Lett.21(10) (2014) 1270-1274. genRefLink(16, ’S0219691316400014BIB035’, ’10.1109 |
[36] | 36. I. Ram, I. Cohen and M. Elad, Patch-ordering-based wavelet frame and its use in inverse problems, IEEE Trans. Image Process.23(7) (2014) 2779-2792. genRefLink(16, ’S0219691316400014BIB036’, ’10.1109 |
[37] | 37. I. Ram, M. Elad and I. Cohen, Generalized tree-based wavelet transform, IEEE Trans. Signal Process.59(9) (2011) 4199-4209. genRefLink(16, ’S0219691316400014BIB037’, ’10.1109 · Zbl 1392.94419 |
[38] | 38. I. Ram, M. Elad and I. Cohen, Redundant wavelets on graphs and high dimensional data clouds, IEEE Signal Process. Lett.19(5) (2012) 291-294. genRefLink(16, ’S0219691316400014BIB038’, ’10.1109 |
[39] | 39. I. Ram, M. Elad and I. Cohen, Image denoising using nl-means via smooth patch ordering, in IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2013), pp. 1350-1354. |
[40] | 40. I. Ram, M. Elad and I. Cohen, Image processing using smooth ordering of its patches, IEEE Trans. Image Process.22(7) (2013) 2764-2774. genRefLink(16, ’S0219691316400014BIB040’, ’10.1109 |
[41] | 41. C. H. Reinsch, Smoothing by spline functions, Numer. Math.10 (1967) 177-183. genRefLink(16, ’S0219691316400014BIB041’, ’10.1007 |
[42] | 42. M. Revow, C. K. I. Williams and G. Hinton, Using generative models for handwritten digit recognition, IEEE Trans. Pattern Anal. Mach. Intell.18(6) (1996) 592-606. genRefLink(16, ’S0219691316400014BIB042’, ’10.1109 |
[43] | 43. L. Rokach, Ensemble-based classifiers, Artif. Intell. Rev.33 (2010) 1-39. genRefLink(16, ’S0219691316400014BIB043’, ’10.1007 |
[44] | 44. K. Rose, Deterministic annealing for clustering, compression, classification, regression, and related optimization problems, Proc. IEEE86(11) (1998) 2210-2239. genRefLink(16, ’S0219691316400014BIB044’, ’10.1109 |
[45] | 45. S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science290(5500) (2000) 2323-2326. genRefLink(16, ’S0219691316400014BIB045’, ’10.1126 |
[46] | 46. G. Shakhnarovich, T. Darrell and P. Indyk, Nearest-Neighbor Methods in Learning and Vision, Theory and Practice (MIT, 2006). |
[47] | 47. J. B. Tenenbaum, V. de Silva and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science290(5500) (2000) 2319-2323. genRefLink(16, ’S0219691316400014BIB047’, ’10.1126 |
[48] | 48. V. Vapnik, Statistical Learning Theory (Wiley-Interscience, 1998). · Zbl 0935.62007 |
[49] | 49. J. Wang, Geometric Structure of High-Dimensional Data and Dimensionality Reduction (Springer, 2012). · Zbl 1250.68010 |
[50] | 50. J. Wang, Semi-supervised learning using multiple one-dimensional embedding based adaptive interpolation, Int. J. Wavelets, Multiresolut. Inform. Process.14(2) (2016) 11 pp. · Zbl 1333.68236 |
[51] | 51. K. Q. Weinberger, B. D. Packer and L. K. Saul, Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization, in Proc. of the 10th Int. Workshop on AI and Statistics (AISTATS, 2005), pp. 381-388. |
[52] | 52. M. Wozniak, M. Grana and E. Corchado, A survey of multiple classifier systems as hybrid systems, Inform. Fussion16 (2014) 3-17. genRefLink(16, ’S0219691316400014BIB052’, ’10.1016 |
[53] | 53. D. Yarowsky, Unsupervised word sense disambiguation rivaling supervised methods, in Proc. of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL, 1995), pp. 189-196. |
[54] | 54. Z. Y. Zhang and H. Y. Zha, Principal manifolds and nonlinear dimensionality reduction via local tangent space alignment, SIAM J. Sci. Comput.26(1) (2004) 313-338. genRefLink(16, ’S0219691316400014BIB054’, ’10.1137 |
[55] | 55. Z.-H. Zhou, J. Wu and W. Tang, Ensembling neural networks: Many could be better than all, Artif. Intell.137 (2002) 239-263. genRefLink(16, ’S0219691316400014BIB055’, ’10.1016 · Zbl 0995.68077 |
[56] | 56. X. Zhu, Semi-supervised learning literature survey, Computer Sciences TR-1530, University of Wisconsin-Madison (2008). |
[57] | 57. X. Zhu, Z. Ghahramani and J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in Proc. 20th Int. Conf. on Machine Learning (AAAI Press, 2003), pp. 912-919. |
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.