
Predicate logic based image grammars for complex pattern recognition. (English) Zbl 1235.68219

Summary: Predicate logic based reasoning approaches provide a means of formally specifying domain knowledge and manipulating symbolic information to explicitly reason about different concepts of interest. Extension of traditional binary predicate logics with the bilattice formalism permits the handling of uncertainty in reasoning, thereby facilitating their application to computer vision problems. In this paper, we propose using first-order predicate logics, extended with a bilattice-based uncertainty handling formalism, as a means of formally encoding pattern grammars, to parse a set of image features, and detect the presence of different patterns of interest. Detections from low-level feature detectors are treated as logical facts and, in conjunction with logical rules, used to drive the reasoning. Positive and negative information from different sources, as well as uncertainties from detections, are integrated within the bilattice framework. We show that this approach can also generate proofs or justifications (in the form of parse trees) for each hypothesis it proposes, thus permitting direct analysis of the final solution in linguistic form. Automated logical rule weight learning is an important aspect of the application of such systems in the computer vision domain. We propose a rule weight optimization method which casts the instantiated inference tree as a knowledge-based neural network, interprets rule uncertainties as link weights in the network, and applies a constrained back-propagation algorithm to converge upon a set of rule weights that give optimal performance within the bilattice framework. Finally, we evaluate the proposed predicate logic based pattern grammar formulation via application to the problems of (a) detecting the presence of humans under partial occlusions and (b) detecting large complex man-made structures as viewed in satellite imagery. We also evaluate the optimization approach on real as well as simulated data and show favorable results.
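
To make the idea of integrating positive and negative evidence concrete, the following is a minimal sketch, not the authors' implementation. It assumes a square bilattice over [0,1] x [0,1], where each truth value is a pair (evidence for, evidence against), with the product t-norm and the probabilistic-sum t-conorm; the summary does not fix any of these choices. The names `Belief`, `fire_rule`, the rule weights, and the detector confidences are all hypothetical.

```python
from dataclasses import dataclass


def t_norm(a: float, b: float) -> float:
    """Product t-norm (an assumed choice)."""
    return a * b


def t_conorm(a: float, b: float) -> float:
    """Probabilistic-sum t-conorm, dual to the product t-norm."""
    return a + b - a * b


@dataclass(frozen=True)
class Belief:
    """Element of an assumed square bilattice: (evidence for, evidence against)."""
    pro: float
    con: float

    def __and__(self, other: "Belief") -> "Belief":
        # Conjunction along the truth ordering.
        return Belief(t_norm(self.pro, other.pro), t_conorm(self.con, other.con))

    def __or__(self, other: "Belief") -> "Belief":
        # Disjunction along the truth ordering.
        return Belief(t_conorm(self.pro, other.pro), t_norm(self.con, other.con))

    def combine(self, other: "Belief") -> "Belief":
        # Join along the information ordering: accumulate evidence from
        # independent sources, whether it supports or opposes the hypothesis.
        return Belief(t_conorm(self.pro, other.pro), t_conorm(self.con, other.con))

    def negate(self) -> "Belief":
        # Negation swaps evidence for and evidence against.
        return Belief(self.con, self.pro)


def fire_rule(weight: float, body: list, negative: bool = False) -> Belief:
    """Evaluate one weighted rule: weight AND all body facts.

    A negative rule contributes its result as evidence against the head.
    """
    result = Belief(weight, 0.0)
    for fact in body:
        result = result & fact
    return result.negate() if negative else result


# Hypothetical detector outputs treated as logical facts.
head = Belief(0.8, 0.0)
torso = Belief(0.6, 0.0)
not_on_ground_plane = Belief(0.4, 0.0)

# human <- head, torso           (hypothetical weight 0.9)
# not human <- not_on_ground_plane (hypothetical weight 0.5)
support = fire_rule(0.9, [head, torso])
opposition = fire_rule(0.5, [not_on_ground_plane], negative=True)
human = support.combine(opposition)
print(human)
```

In this sketch the final value keeps the support and the opposition for the hypothesis as separate components, so conflicting evidence stays visible rather than being averaged away. In a weight-learning setting such as the one described in the summary, the numeric rule weights would be the quantities adjusted by training; here they are fixed by hand.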

MSC:

68T10 Pattern recognition, speech recognition
68Q42 Grammars and rewriting systems
68T45 Machine vision and scene understanding

Software:

darch; PRISM; AdaBoost.MH

References:

[1] CAVIAR Dataset (2003). http://homepages.inf.ed.ac.uk/rbf/caviar/ .
[2] Arieli, O., Cornelis, C., & Deschrijver, G. (2006). Preference modeling by rectangular bilattices. In Proc. 3rd international conference on modeling decisions for artificial intelligence (MDAI’06) (3885) (pp. 22–33). · Zbl 1235.68231
[3] Arieli, O., Cornelis, C., Deschrijver, G., & Kerre, E. (2005). Bilattice-based squares and triangles. In Symbolic and quantitative approaches to reasoning with uncertainty (pp. 563–575). · Zbl 1122.03310
[4] Binford, T. O., & Levitt, T. S. (2003). Evidential reasoning for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7).
[5] Csurka, G., Dance, C. R., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision, ECCV (pp. 1–22).
[6] Cussens, J. (1999). Loglinear models for first-order probabilistic reasoning. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence.
[7] Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR05 (pp. I: 886–893).
[8] Felzenszwalb, P. (2001). Learning models for object recognition. In CVPR01 (pp. I:1056–1062).
[9] Fern, A. (2005). A simple-transition model for structured sequences. In International joint conference on artificial intelligence.
[10] Fidler, S., & Leonardis, A. (2007). Towards scalable representations of object categories: Learning a hierarchy of parts. In Proc. IEEE conf. computer vision pattern recognition (CVPR).
[11] Fitting, M. C. (1990). Bilattices in logic programming. In 20th international symposium on multiple-valued logic, Charlotte (pp. 238–247). Los Alamitos: IEEE CS Press.
[12] Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119–139. · Zbl 0880.68103 · doi:10.1006/jcss.1997.1504
[13] Friedman, N., Getoor, L., Koller, D., & Pfeffer, A. (1999). Learning probabilistic relational models. In Proceedings of the sixteenth international joint conference on artificial intelligence.
[14] Gavrila, D. (2000). Pedestrian detection from a moving vehicle. In ECCV00 (pp. II: 37–49).
[15] Gavrila, D., & Philomin, V. (1999). Real-time object detection for smart vehicles. In ICCV99 (pp. 87–93).
[16] Geman, S., & Johnson, M. (2003). Probability and statistics in computational linguistics, a brief review. In Mathematical foundations of speech and language processing (pp. 1–26). Berlin: Springer.
[17] Ginsberg, M. L. (1988). Multivalued logics: Uniform approach to inference in artificial intelligence. Computational Intelligence.
[18] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554. · Zbl 1106.68094 · doi:10.1162/neco.2006.18.7.1527
[19] Jin, Y., & Geman, S. (2006). Context and hierarchy in a probabilistic image model. In CVPR (pp. 2145–2152).
[20] Julesz, B. (1981). Textons, the elements of texture perception and their interactions. Nature, 290, 91–97. · doi:10.1038/290091a0
[21] Kersting, K., & De Raedt, L. (2001). Towards combining inductive logic programming with Bayesian networks. In Proceedings of the eleventh international conference on inductive logic programming. · Zbl 1006.68518
[22] Kokkinos, I., & Yuille, A. (2009). HOP: Hierarchical object parsing. In Proc. IEEE conf. computer vision pattern recognition (CVPR).
[23] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998a). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2325. · doi:10.1109/5.726791
[24] LeCun, Y., Bottou, L., Orr, G. B., & Müller, K.-R. (1998b). Efficient backprop. In Neural networks: Tricks of the trade. Berlin: Springer.
[25] Leibe, B., Seemann, E., & Schiele, B. (2005). Pedestrian detection in crowded scenes. In CVPR05 (pp. I: 878–885).
[26] Leung, T., & Malik, J. (2001). Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43. · Zbl 0972.68606
[27] Lin, L., Peng, S., Porway, J., Zhu, S., & Wang, Y. (2007a). An empirical study of object category recognition: Sequential testing with generalized samples. In ICCV07 (pp. 1–8).
[28] Lin, Z., Davis, L., Doermann, D., & DeMenthon, D. (2007b). Hierarchical part-template matching for human detection and segmentation (pp. 1–8).
[29] Mahoney, J. J., & Mooney, R. J. (1993). Combining neural and symbolic learning to revise probabilistic rule bases. In Hanson, S. J., Cowan, J. D., & Giles, C. L. (Eds.) Advances in neural information processing systems (Vol. 5, pp. 107–114). San Mateo: Morgan Kaufmann.
[30] Mann, W. B. (1995). Three dimensional object interpretation of monocular gray-scale images. Ph.D. thesis, Department of Electrical Engineering, Stanford University.
[31] Papageorgiou, C., Evgeniou, T., & Poggio, T. (1998). A trainable pedestrian detection system. In Intelligent Vehicles (pp. 241–246).
[32] Poggio, T., & Girosi, F. (1990). Regularization algorithms that are equivalent to multilayer networks. Science, 978–982. · Zbl 1226.92005
[33] Ponce, J., Chelberg, D., & Mann, W. (1989). Invariant properties of straight homogeneous generalized cylinders and their contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(9), 951–966. · doi:10.1109/34.35498
[34] Ramesh, V. Performance characterization of image understanding algorithms. Ph.D. thesis, University of Washington, Seattle.
[35] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation (pp. 318–362).
[36] Sato, T., & Kameya, Y. (1997). Prism: A symbolic statistical modeling language. In Proceedings of the fifteenth international joint conference on artificial intelligence.
[37] Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336. · Zbl 0945.68194 · doi:10.1023/A:1007614523901
[38] Schweizer, B., & Sklar, A. (1963). Associative functions and abstract semigroups. Publ. Math. Debrecen. · Zbl 0119.14001
[39] Shet, V., Harwood, D., & Davis, L. (2005). Vidmap: video monitoring of activity with prolog. In IEEE AVSS (pp. 224–229).
[40] Shet, V., Harwood, D., & Davis, L. (2006). Multivalued default logic for identity maintenance in visual surveillance. In ECCV (pp. IV: 119–132).
[41] Shet, V., Neumann, J., Ramesh, V., & Davis, L. (2007). Bilattice-based logical reasoning for human detection. In CVPR.
[42] Shet, V., Singh, M., Bahlmann, C., & Ramesh, V. (2009). Predicate logics based image grammars for complex pattern recognition. In First international workshop on stochastic image grammars. · Zbl 1235.68219
[43] Sochman, J., & Matas, J. (2005). Waldboost: Learning for time constrained sequential detection. In CVPR05 (pp. II: 150–156).
[44] Taskar, B., Abbeel, P., & Koller, D. (2002). Discriminative probabilistic models for relational data. In Proceedings of the eighteenth conference on uncertainty in artificial intelligence.
[45] Todorovic, S., & Ahuja, N. (2008). Learning subcategory relevances for category recognition. In CVPR08.
[46] Towell, G. G., Shavlik, J. W., & Noordewier, M. O. (1990). Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of the eighth national conference on artificial intelligence (pp. 861–866).
[47] Tu, Z., & Zhu, S. (2002). Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 657–673. · doi:10.1109/34.1000239
[48] Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer. · Zbl 0833.62008
[49] Varma, M., & Zisserman, A. (2005). A statistical approach to texture classification from single images. International Journal of Computer Vision: Special Issue on Texture Analysis and Synthesis.
[50] Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In IEEE conference on computer vision and pattern recognition (CVPR’01).
[51] Viola, P., & Jones, M. J. (2001). Robust real-time object detection (Tech. Rep. CRL 2001/01). Cambridge Research Laboratory.
[52] Walker, L. L., & Malik, J. (2004). When is scene recognition just texture recognition. Vision Research, 44, 2301–2311. · doi:10.1016/j.visres.2003.10.009
[53] Wang, W., Pollak, I., Wong, T., Bouman, C., Harper, M. P., & Siskind, J. M. (2006). Hierarchical stochastic image grammars for classification and segmentation. IEEE Transactions on Image Processing, 15, 3033–3052. · doi:10.1109/TIP.2006.877496
[54] Wu, B., & Nevatia, R. (2005). Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In ICCV, Beijing.
[55] Wu, B., & Nevatia, R. (2007). Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors. International Journal of Computer Vision, 75(2), 247–266. · doi:10.1007/s11263-006-0027-7
[56] Zhu, L., Lin, C., Huang, H., Chen, Y., & Yuille, A. (2008). Unsupervised structure learning: Hierarchical recursive composition, suspicious coincidence and competitive exclusion. In Computer vision–ECCV.
[57] Zhu, Q., Yeh, M., Cheng, K., & Avidan, S. (2006). Fast human detection using a cascade of histograms of oriented gradients. In CVPR06 (pp. II: 1491–1498).
[58] Zhu, S. C., & Mumford, D. (2006). A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2(4), 259–362. · Zbl 1198.68160 · doi:10.1561/0600000018