Abstract
In a recent article (Konno and Yamamoto in ISE 07-01, Department of Industrial and Systems Engineering, Chuo University, February 2007), one of the authors formulated the problem of choosing the best set of explanatory variables from a large number of candidate variables in a linear regression model as a mixed 0–1 integer linear programming problem and showed that it can be solved by state-of-the-art integer programming software.
In this paper, we propose multi-step methods for calculating a nearly optimal solution of problems that may not be solvable by the single-step method presented in Konno and Yamamoto (ISE 07-01, Department of Industrial and Systems Engineering, Chuo University, February 2007). We show that a multi-step method can generate a nearly optimal solution in a fraction of the computation time of the single-step method.
We also demonstrate that the best set of variables in terms of the squared error can be recovered under a normality assumption.
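For orientation, one common way to write a variable-selection problem of this kind as a mixed 0–1 integer linear program is sketched below. This is an illustrative formulation under stated assumptions, not necessarily the one used in the cited report. It assumes an absolute-deviation fitting criterion, which keeps the objective linear but is not stated in the abstract. All symbols are introduced here for illustration only: T observations (y_t, x_{1t}, ..., x_{nt}), n candidate variables, at most s selected variables, and a sufficiently large bound M on the coefficients.

```latex
\begin{align*}
\text{minimize}\quad   & \sum_{t=1}^{T} (u_t + v_t) \\
\text{subject to}\quad & u_t - v_t = y_t - \beta_0 - \sum_{j=1}^{n} \beta_j x_{jt}, && t = 1, \dots, T, \\
                       & -M z_j \le \beta_j \le M z_j,                             && j = 1, \dots, n, \\
                       & \sum_{j=1}^{n} z_j \le s, \\
                       & u_t \ge 0,\; v_t \ge 0,                                   && t = 1, \dots, T, \\
                       & z_j \in \{0, 1\},                                         && j = 1, \dots, n.
\end{align*}
```

At an optimal solution u_t + v_t equals the absolute residual of observation t, and z_j = 0 forces \beta_j = 0, so the binary variables z_j indicate which candidate variables enter the model.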
References
Arthanari, T.S., Dodge, Y.: Mathematical Programming in Statistics. Wiley, New York (1981)
Balas, E.: Disjunctive programming: properties of the convex hull of feasible points. Discrete Appl. Math. 89, 3–44 (1998)
Crowder, H., Johnson, E.L., Padberg, M.: Solving large-scale zero-one linear programming problems. Oper. Res. 31, 803–834 (1983)
Efroymson, M.A.: Multiple regression analysis. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers. Wiley, New York (1960)
Furnival, G.M., Wilson, R.W. Jr.: Regressions by leaps and bounds. Technometrics 16, 499–511 (1974)
Galindo, J., Tamayo, P.: Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Comput. Econ. 15, 107–143 (2000)
Judge, G.R., Hill, R.C., Griffiths, W.E., Lutkepohl, H., Lee, T.-C.: Introduction to the Theory and Practice of Econometrics. Wiley, New York (1998)
Konno, H., Kawadai, N., Wu, D.: Estimation of failure probability using semi-definite logit model. Comput. Manag. Sci. 1, 59–73 (2004)
Konno, H., Koshizuka, T.: Mean-absolute deviation model. IIE Trans. 37, 893–900 (2005)
Konno, H., Yamamoto, R.: Choosing the best set of variables in regression analysis using integer programming. ISE 07-01, Department of Industrial and Systems Engineering, Chuo University (February 2007)
Konno, H., Yamazaki, H.: Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Manag. Sci. 37, 519–531 (1991)
Osborne, M.R.: On the computation of stepwise regression. Aust. Comput. J. 8, 61–68 (1976)
Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33, 60–100 (1991)
Pardalos, P., Boginski, V., Vazacopoulos, A.: Data Mining in Biomedicine. Springer, Berlin (2007)
Wolsey, L.A.: Integer Programming. Wiley, New York (1998)
Cite this article
Konno, H., Takaya, Y. Multi-step methods for choosing the best set of variables in regression analysis. Comput Optim Appl 46, 417–426 (2010). https://doi.org/10.1007/s10589-008-9193-6
DOI: https://doi.org/10.1007/s10589-008-9193-6