
Recovering relative depth from low-level features without explicit T-junction detection and interpretation. (English) Zbl 1270.68343

Summary: This work presents a novel computational model for relative depth order estimation from a single image, based on low-level local features that encode perceptual depth cues such as convexity/concavity, inclusion, and T-junctions in a quantitative manner, taking information at different scales into account. These multi-scale features are based on a measure of how likely a pixel is to belong simultaneously to different objects (interpreted as connected components of level sets) and, hence, to be occluded in some of them, which provides a hint about the local depth order relationships. They are computed directly and efficiently on the discrete image data, without requiring the detection and interpretation of edges or junctions. Their behavior is clarified and illustrated on some simple images. The relative depth order over the image is then recovered by globally integrating these local features through a non-linear diffusion filtering of bilateral type. The validity of the proposed features and of the integration approach is demonstrated by experiments on real images and by comparison with state-of-the-art monocular depth estimation techniques.
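
The integration step named in the summary, a non-linear diffusion filtering of bilateral type, can be pictured as repeated intensity-guided averaging of the local cues. The Python sketch below is only a minimal illustration of that idea under assumed inputs (a grayscale image with values in [0, 1] and a map of local depth cues); it is not the authors' algorithm, and the function name, parameters, and default values are hypothetical.

```python
import numpy as np

def bilateral_depth_diffusion(image, cues, n_iters=10, radius=5,
                              sigma_s=3.0, sigma_r=0.1):
    """Iteratively average local depth cues with bilateral (intensity-guided)
    weights, so that cue values spread within regions of similar intensity
    but are blocked at strong contours.  Illustrative sketch only; the
    parameter names and default values are assumptions, not the paper's."""
    image = image.astype(np.float64)        # grayscale image, values in [0, 1]
    depth = cues.astype(np.float64)         # local depth-cue map, same shape
    h, w = image.shape

    # Spatial Gaussian weights for a (2*radius + 1)^2 window around each pixel.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))

    for _ in range(n_iters):
        new_depth = np.empty_like(depth)
        for i in range(h):
            for j in range(w):
                i0, i1 = max(0, i - radius), min(h, i + radius + 1)
                j0, j1 = max(0, j - radius), min(w, j + radius + 1)
                win_img = image[i0:i1, j0:j1]
                win_dep = depth[i0:i1, j0:j1]
                win_sp = spatial[i0 - i + radius:i1 - i + radius,
                                 j0 - j + radius:j1 - j + radius]
                # Range weight: pixels whose intensity is close to the centre
                # (likely the same object) contribute more to the average.
                win_rng = np.exp(-((win_img - image[i, j]) ** 2)
                                 / (2.0 * sigma_r ** 2))
                weights = win_sp * win_rng
                new_depth[i, j] = np.sum(weights * win_dep) / np.sum(weights)
        depth = new_depth
    return depth


# Toy usage: a bright square on a dark background with a single positive cue
# inside the square; the cue spreads within the square but not across its edge.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
cues = np.zeros_like(img)
cues[32, 32] = 1.0
relative_depth = bilateral_depth_diffusion(img, cues)
```

Because the range term suppresses averaging across strong intensity edges, repeated filtering propagates the cue values within regions while keeping distinct objects at distinct relative depths, which is the qualitative behavior a bilateral-type diffusion of depth cues is meant to have.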

MSC:

68T45 Machine vision and scene understanding
94A08 Image processing (compression, reconstruction, etc.) in information and communication theory
92C55 Biomedical imaging and signal processing