Skip to main content
Log in

Logical analysis of numerical data

  • Published:
Mathematical Programming Submit manuscript

Abstract

“Logical analysis of data” (LAD) is a methodology developed since the late eighties, aimed at discovering hidden structural information in data sets. LAD was originally developed for analyzing binary data by using the theory of partially defined Boolean functions. An extension of LAD for the analysis of numerical data sets is achieved through the process of “binarization” consisting in the replacement of each numerical variable by binary “indicator” variables, each showing whether the value of the original variable is above or below a certain level. Binarization was successfully applied to the analysis of a variety of real life data sets. This paper develops the theoretical foundations of the binarization process studying the combinatorial optimization problems related to the minimization of the number of binary variables. To provide an algorithmic framework for the practical solution of such problems, we construct compact linear integer programming formulations of them. We develop polynomial time algorithms for some of these minimization problems, and prove NP-hardness of others.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. D. Angluin, Queries and concept learning,Machine Learning 2 (1988) 319–342.

    Google Scholar 

  2. E. Balas and A. Ho, Set covering algorithms using cutting planes, heuristics and subgradient optimization: A computational study,Mathematical Programming 12 (1980) 37–60.

    MATH  MathSciNet  Google Scholar 

  3. R. Bellman,Introduction to Matrix Analysis (McGraw-Hill, New York, 1960).

    MATH  Google Scholar 

  4. J.C. Bioch and T. Ibaraki, Complexity of identification and dualization of positive Boolean functions,Information and Computation 123 (1995) 50–63.

    Article  MATH  MathSciNet  Google Scholar 

  5. E. Boros, V. Gurvich, P.L. Hammer, T. Ibaraki and A. Kogan, Decompositions of partially defined Boolean functions,Discrete Applied Mathematics 62 (1995) 51–75.

    Article  MATH  MathSciNet  Google Scholar 

  6. E. Boros, P.L. Hammer, T. Ibaraki, A. Kogan, Logical analysis of numerical data, RUTCOR Research Report RRR 04-97, RUTCOR, Rutgers University, 1997.

  7. E. Boros, P.L. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz and I. Muchnik, An implementation of logical analysis of data, RUTCOR Research Report RRR 22-96, RUTCOR, Rutgers University, 1996.

  8. E. Boros, T. Ibaraki and K. Makino, Error-free and best-fit extensions of partially defined Boolean functions, RUTCOR Research Report RRR 14-95, RUTCOR, Rutgers University, 1995.

  9. E. Boros, T. Ibaraki and K. Makino, Extensions of partially defined Boolean functions with missing data, RUTCOR Research Report RRR 06-96, RUTCOR, Rutgers University, 1996.

  10. P. Clark and T. Niblett, The CN2 induction algorithm,Machine Learning 3 (1989) 261–283.

    Google Scholar 

  11. Y. Crama, P.L. Hammer and T. Ibaraki, Cause-effect relationships and partially defined Boolean functions,Annals of Operations Research 16 (1988) 299–326.

    Article  MathSciNet  Google Scholar 

  12. R. Dechter and J. Pearl, Structure identification in relational data,Artificial Intelligence 58 (1992) 237–270.

    Article  MATH  MathSciNet  Google Scholar 

  13. O. Ekin, P.L. Hammer and A. Kogan, On connected Boolean functions, RUTCOR Research Report RRR 36-96, RUTCOR, Rutgers University, 1996.

  14. M.R. Garey and D.S. Johnson,Computers and Intractability (Freeman, New York, 1979).

    MATH  Google Scholar 

  15. P.L. Hammer, Partially defined Boolean functions and cause-effect relationships, in:Proceedings of the International Conference on Multi-Attribute Decision Making Via OR-Based Expert Systems, University of Passau, Passau, Germany, 1986.

    Google Scholar 

  16. A.B. Hammer, P.L. Hammer and I. Muchnik, Logical analysis of Chinese productivity patterns, RUTCOR Report RR 1-96, RUTCOR, Rutgers University, 1996.

  17. S.J. Hong, R-MINI: An iterative approach for generating minimal rules from examples,IEEE Transactions on Knowledge and Data Engineering, to appear.

  18. O.L. Mangasarian, Mathematical programming in machine learning, in: G. Di Pillo and F. Giannessi, eds.,Nonlinear Optimization and Applications (Plenum Publishing, New York, 1996) 283–295.

    Google Scholar 

  19. I. Muchnik and L. Yampolsky, A pilot study of the concurrent validity of logical analysis of data, as applied to two Beck inventories, RUTCOR Technical Report RTR 02-95, RUTCOR, Rutgers University, 1995.

  20. S. Muroga,Threshold Logic and Its Applications (Wiley-Interscience, New York, 1971).

    MATH  Google Scholar 

  21. P.M. Murphy and D.W. Aha, UCI repository of machine learning databases: Machine readable data repository, Department of Computer Science, University of California, Irvine, 1994.

    Google Scholar 

  22. J.R. Quinlan, Induction of decision trees,Machine Learning 1 (1986) 81–106.

    Google Scholar 

  23. J.R. Quinlan and R. Rivest, Inferring decision trees using minimum description length principle,Information and Computation 80 (1989) 227–248.

    Article  MATH  MathSciNet  Google Scholar 

  24. L.G. Valiant, A theory of the learnable,Communications of the ACM 27 (1984) 1134–1142.

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

The authors gratefully acknowledge the partial support by the Office of Naval Research (grants N00014-92-J1375 and N00014-92-J4083).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Boros, E., Hammer, P.L., Ibaraki, T. et al. Logical analysis of numerical data. Mathematical Programming 79, 163–190 (1997). https://doi.org/10.1007/BF02614316

Download citation

  • Received:

  • Accepted:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02614316

Keywords

Navigation