Multi-step methods for choosing the best set of variables in regression analysis. (English) Zbl 1200.62076

Summary: H. Konno and R. Yamamoto [ISE 07-01, Dept. Indust. Syst. Eng., Chuo Univ. (2007)] formulated the problem of choosing the best set of explanatory variables from a large number of candidate variables in a linear regression model as a mixed 0-1 integer linear programming problem and showed that it can be solved by state-of-the-art integer programming software. We propose multi-step methods for calculating a close-to-optimal solution for problem instances that the single-step method of Konno and Yamamoto may be unable to solve. We show that a multi-step method can generate a nearly optimal solution in a fraction of the computation time of the single-step method. We also demonstrate that the best set of variables in terms of the squared error can be recovered under the normality assumption.
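To make the single-step idea concrete, the following is a minimal sketch of how a variable-selection problem can be written as a mixed 0-1 integer linear program. It is not the formulation from the paper, whose details the summary does not give. The sketch assumes an absolute-deviation fitting criterion, a big-M link between each coefficient and its 0-1 indicator, and a fixed number `s` of selected variables. The function name `best_subset_lad` and the bound `big_m` are hypothetical, and `scipy.optimize.milp` (SciPy 1.9 or later) stands in for the integer programming software.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def best_subset_lad(X, y, s, big_m=100.0):
    """Pick exactly s columns of X minimizing the sum of absolute residuals.

    Illustrative only. Variable layout: [b0, b_1..b_p, z_1..z_p, u_1..u_n],
    where z_j is a 0-1 indicator and u_t bounds the t-th absolute residual.
    big_m is an assumed bound on |b_j|; it must exceed the true coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    n_var = 1 + 2 * p + n
    b_idx = slice(1, 1 + p)
    z_idx = slice(1 + p, 1 + 2 * p)
    u_idx = slice(1 + 2 * p, n_var)

    # Objective: minimize sum_t u_t.
    c = np.zeros(n_var)
    c[u_idx] = 1.0

    ones, eye_n, zeros_np = np.ones((n, 1)), np.eye(n), np.zeros((n, p))
    # u_t >= y_t - (b0 + x_t.b)  and  u_t >= (b0 + x_t.b) - y_t
    fit_pos = np.hstack([ones, X, zeros_np, eye_n])
    fit_neg = np.hstack([-ones, -X, zeros_np, eye_n])

    eye_p, zeros_p1, zeros_pn = np.eye(p), np.zeros((p, 1)), np.zeros((p, n))
    # -M z_j <= b_j <= M z_j, so b_j is forced to 0 whenever z_j = 0.
    link_up = np.hstack([zeros_p1, eye_p, -big_m * eye_p, zeros_pn])
    link_lo = np.hstack([zeros_p1, eye_p, big_m * eye_p, zeros_pn])

    # Cardinality constraint: sum_j z_j = s.
    card = np.zeros((1, n_var))
    card[0, z_idx] = 1.0

    constraints = [
        LinearConstraint(fit_pos, y, np.full(n, np.inf)),
        LinearConstraint(fit_neg, -y, np.full(n, np.inf)),
        LinearConstraint(link_up, np.full(p, -np.inf), np.zeros(p)),
        LinearConstraint(link_lo, np.zeros(p), np.full(p, np.inf)),
        LinearConstraint(card, [float(s)], [float(s)]),
    ]

    # Only the z block is integer; b0 and b are free, u is nonnegative.
    integrality = np.zeros(n_var)
    integrality[z_idx] = 1
    lower = np.full(n_var, -np.inf)
    upper = np.full(n_var, np.inf)
    lower[z_idx], upper[z_idx] = 0.0, 1.0
    lower[u_idx] = 0.0

    res = milp(c, constraints=constraints, integrality=integrality,
               bounds=Bounds(lower, upper))
    if not res.success:
        raise RuntimeError(res.message)

    selected = np.flatnonzero(res.x[z_idx] > 0.5)
    return selected, res.x[0], res.x[b_idx], res.fun
```

Calling `best_subset_lad(X, y, s=2)` returns the indices of the selected columns, the intercept, the coefficient vector, and the total absolute deviation. The big-M bound keeps the model linear but has to be chosen with care: too small a value cuts off the true coefficients, while a very large one weakens the linear relaxation and slows the solver.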

MSC:

62J05 Linear regression; mixed models
65C60 Computational problems in statistics (MSC2010)
90C09 Boolean programming
90C10 Integer programming

References:

[1] Arthanari, T.S., Dodge, Y.: Mathematical Programming in Statistics. Wiley, New York (1981) · Zbl 0549.62002
[2] Balas, E.: Disjunctive programming: properties of the convex hull of feasible points. Discrete Appl. Math. 89, 3–44 (1998) · Zbl 0921.90118 · doi:10.1016/S0166-218X(98)00136-X
[3] Crowder, H., Johnson, E.L., Padberg, M.: Solving large-scale zero-one linear programming problems. Oper. Res. 31, 803–834 (1983) · Zbl 0576.90065 · doi:10.1287/opre.31.5.803
[4] Efroymson, M.A.: Multiple regression analysis. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers. Wiley, New York (1960)
[5] Furnival, G.M., Wilson, R.W. Jr.: Regressions by leaps and bounds. Technometrics 16, 499–511 (1974) · Zbl 0294.62079 · doi:10.2307/1267601
[6] Galindo, J., Tamayo, P.: Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Comput. Econ. 15, 107–143 (2000) · Zbl 0969.91006 · doi:10.1023/A:1008699112516
[7] Judge, G.G., Hill, R.C., Griffiths, W.E., Lütkepohl, H., Lee, T.-C.: Introduction to the Theory and Practice of Econometrics. Wiley, New York (1998)
[8] Konno, H., Kawadai, N., Wu, D.: Estimation of failure probability using semi-definite logit model. Comput. Manag. Sci. 1, 59–73 (2004) · Zbl 1114.62353
[9] Konno, H., Koshizuka, T.: Mean-absolute deviation model. IIE Trans. 37, 893–900 (2005) · doi:10.1080/07408170591007786
[10] Konno, H., Yamamoto, R.: Choosing the best set of variables in regression analysis using integer programming. ISE 07-01 Department of Industrial and Systems Engineering, Chuo University, February 2007 · Zbl 1178.62069
[11] Konno, H., Yamazaki, H.: Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Manag. Sci. 37, 519–531 (1991) · doi:10.1287/mnsc.37.5.519
[12] Osborne, M.R.: On the computation of stepwise regression. Aust. Comput. J. 8, 61–68 (1976) · Zbl 0336.65009
[13] Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33, 60–100 (1991) · Zbl 0734.90060 · doi:10.1137/1033004
[14] Pardalos, P., Boginski, V., Vazacopoulos, A.: Data Mining in Biomedicine. Springer, Berlin (2007) · Zbl 1130.92034
[15] Wolsey, L.A.: Integer Programming. Wiley, New York (1998)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.