
Cancer feature selection and classification using a binary quantum-behaved particle swarm optimization and support vector machine. (English) Zbl 1359.92055

Summary: This paper addresses feature gene selection for cancer classification, in which an optimization algorithm selects a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) algorithm for cancer feature gene selection, coupled with a support vector machine (SVM) for cancer classification. First, we describe the proposed BQPSO algorithm, a discretized version of the original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, BQPSO coupled with SVM (BQPSO/SVM), binary PSO coupled with SVM (BPSO/SVM), and a genetic algorithm coupled with SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets: Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages over the other two algorithms in accuracy, robustness, and the number of feature genes selected.
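The wrapper idea in the summary (score a binary 0-1 gene mask by the LOOCV accuracy of a classifier trained on the selected genes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the function name `loocv_accuracy`, the toy data `X` and `y`, a 1-nearest-neighbour classifier standing in for the SVM, and an exhaustive enumeration of masks standing in for the BQPSO search, whose update rule is not given in this record.

```python
from itertools import product


def loocv_accuracy(X, y, mask):
    """LOOCV accuracy of a 1-NN classifier using only genes where mask[j] == 1.

    1-NN is a stand-in for the SVM named in the summary (assumption).
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for k in range(len(X)):
            if k == i:
                continue  # leave sample i out of the training set
            dist = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


# Hypothetical toy data: gene 0 separates the classes, gene 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 4.0], [1.0, 1.2], [1.1, 4.8], [1.2, 0.9]]
y = [0, 0, 0, 1, 1, 1]

# Exhaustive search over all masks, standing in for the swarm search:
# prefer higher LOOCV accuracy, then fewer selected genes.
best_mask = max(
    product((0, 1), repeat=len(X[0])),
    key=lambda m: (loocv_accuracy(X, y, m), -sum(m)),
)
print(best_mask, loocv_accuracy(X, y, best_mask))
```

Exhaustive enumeration is only feasible for a handful of genes; with thousands of genes per microarray, a stochastic search such as BQPSO, BPSO, or a GA explores the mask space instead.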

MSC:

92C50 Medical applications (general)
62P10 Applications of statistics to biology and medical sciences; meta analysis
68T05 Learning and adaptive systems in artificial intelligence

Software:

LIBSVM; rda; GeneSrF
