3D scene reconstruction from multiple spherical stereo pairs. (English) Zbl 1270.68349

Summary: We propose a 3D environment modelling method using multiple pairs of high-resolution spherical images. Spherical images of a scene are captured using a rotating line-scan camera. Reconstruction is based on stereo image pairs with a vertical displacement between camera views. A 3D mesh model for each pair of spherical images is reconstructed by stereo matching. For accurate surface reconstruction, we propose a PDE-based disparity estimation method which produces continuous depth fields with sharp depth discontinuities even in occluded and highly textured regions. A full environment model is constructed by fusion of partial reconstructions from spherical stereo pairs at multiple widely spaced locations. To avoid camera calibration steps for all camera locations, we calculate 3D rigid transforms between capture points using feature matching and register all meshes into a unified coordinate system. Finally, a complete 3D model of the environment is generated by selecting the most reliable observations among overlapping surface measurements, considering surface visibility, orientation and distance from the camera. We analyse the characteristics and behaviour of errors for spherical stereo imaging. Performance of the proposed algorithm is evaluated against ground truth from the Middlebury stereo test bed and LIDAR scans. Results are also compared with conventional structure-from-motion algorithms. The final composite model is rendered from a wide range of viewpoints with high-quality textures.
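The vertical-baseline spherical geometry underlying the reconstruction reduces, along each epipolar meridian, to a simple triangulation: the angular disparity between the two views determines the range of a scene point. The following is a minimal sketch of that relation, not the authors' implementation; the function name, parameter names, and the convention that polar angles are measured from the upward baseline axis are illustrative assumptions.

import numpy as np

def spherical_range(theta_bottom, theta_top, baseline):
    # Range of a scene point from the bottom camera centre for a
    # vertical-baseline spherical stereo pair.  theta_bottom and theta_top
    # are the polar angles (radians, measured from the upward baseline axis)
    # of the matched point in the bottom and top spherical images;
    # baseline is the vertical separation of the two camera centres.
    disparity = theta_top - theta_bottom  # angular disparity along the epipolar meridian
    # Law of sines in the triangle formed by the two camera centres and the point;
    # as disparity -> 0 the point recedes to infinity.
    return baseline * np.sin(theta_top) / np.sin(disparity)

For example, with baseline = 0.6 m, theta_bottom = 1.20 rad and theta_top = 1.25 rad, the point lies roughly 11.4 m from the bottom camera centre.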

MSC:

68T45 Machine vision and scene understanding
68T10 Pattern recognition, speech recognition
35Q94 PDEs in connection with information and communication

Software:

SURF

References:

[1] Agarwal, S., Snavely, N., Simon, I., Seitz, S., & Szeliski, R. (2009). Building Rome in a day. In Proceedings of ICCV, pp. 72–79.
[2] Aiger, D., Mitra, N., & Cohen-Or, D. (2008). 4-points congruent sets for robust surface registration. In Proceedings of SIGGRAPH, pp. 1–10.
[3] Akbarzadeh, A., Frahm, J.-M., Mordohai, P., Clipp, B., Engels, C., Gallup, D., Merrell, P., Phelps, M., Sinha, S., Talton, B., Wang, L., Yang, Q., Stewenius, H., Yang, R., Welch, G., Towles, H., Nister, D., & Pollefeys, M. (2006). Towards urban 3d reconstruction from video. In Proceedings of 3DPVT, pp. 1–8.
[4] Alvarez, L., Deriche, R., Papadopoulo, T., & Sánchez, J. (2007). Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision, 75(3), 371–385. · Zbl 1477.68322 · doi:10.1007/s11263-007-0041-4
[5] Alvarez, L., Deriche, R., Sánchez, J., & Weickert, J. (2002). Dense disparity map estimation respecting image discontinuities: A pde and scale-space based approach. Journal of Visual Communication and Image Representation, 13(1), 3–21. · doi:10.1006/jvci.2001.0482
[6] Anguelov, D., Dulong, C., Filip, D., Frueh, C., Lafon, S., Lyon, R., et al. (2010). Google street view: Capturing the world at street level. IEEE Computer, 43(6), 32–38.
[7] Asai, T., Kanbara, M., & Yokoya, N. (2005). 3d modeling of outdoor environments by integrating omnidirectional range and color images. In Proceedings of 3DIM, pp. 447–454.
[8] Banno, A., & Ikeuchi, K. (2009). Disparity map refinement and 3d surface smoothing via directed anisotropic diffusion. In Proceedings of 3DIM.
[9] Banno, A., & Ikeuchi, K. (2010). Omnidirectional texturing based on robust 3d registration through euclidean reconstruction from two spherical images. Computer Vision and Image Understanding, 114(4), 491–499. · doi:10.1016/j.cviu.2009.12.005
[10] Bay, H., Ess, A., Tuytelaars, T., & Gool, L. (2008). SURF: Speeded up robust features. Computer Vision and Image Understanding, 110, 346–359. · doi:10.1016/j.cviu.2007.09.014
[11] Ben-Ari, R., & Sochen, N. (2007). Variational stereo vision with sharp discontinuities and occlusion handling. In Proceedings of ICCV, pp. 1–7.
[12] Benosman, R., & Devars, J. (1998). Panoramic stereovision sensor. In Proceedings of ICPR, pp. 767–769.
[13] Besl, P., & McKay, N. (1992). A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256. · doi:10.1109/34.121791
[14] Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In Proceedings of ECCV, pp. 25–36. · Zbl 1098.68736
[15] Burt, P. J. (1981). Fast filter transforms for image processing. Computer Graphics and Image Processing, 16, 20–51.
[16] Chen, S. (1995). QuickTime VR – an image-based approach to virtual environment navigation. In Proceedings of SIGGRAPH, pp. 29–38.
[17] Chen, Y., & Medioni, G. (1992). Object modeling by registration of multiple range images. Image and Vision Computing, 10(3), 145–155.
[18] Cornelis, N., Leibe, B., Cornelis, K., & Gool, L. (2008). 3d urban scene modeling integrating recognition and reconstruction. International Journal of Computer Vision, 78(2), 121–141. · doi:10.1007/s11263-007-0081-9
[19] Dellaert, F., Seitz, S., Thorpe, C., & Thrun, S. (2000). Structure from motion without correspondence. In Proceedings of CVPR. · Zbl 1033.68083
[20] Desouza, G., & Kak, A. (2002). Vision for mobile robot navigation: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 237–267. · doi:10.1109/34.982903
[21] Feldman, D., & Weinshall, D. (2005). Realtime ibr with omnidirectional crossed-slits projection. In Proceedings of ICCV, pp. 839–845.
[22] Fischler, M., & Bolles, R. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24, 381–395. · doi:10.1145/358669.358692
[23] Fisher, R. (2007). Registration and fusion of range images. Cvonline, http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/FISHER/REGIS/regis.html .
[24] Frahm, J.-M., Fite-Georgel, P., Gallup, D., Johnson, T., Raguram, R., Wu, C., Jen, Y.-H., Dunn, E., Clipp, B., Lazebnik, S., & Pollefeys, M. (2010). Building Rome on a cloudless day. In Proceedings of ECCV, pp. 368–381.
[25] Furukawa, Y., Curless, B., Seitz, S., & Szeliski, R. (2009). Manhattan-world stereo. In Proceedings of CVPR.
[26] Furukawa, Y., Curless, B., Seitz, S., & Szeliski, R. (2010). Towards internet-scale multi-view stereo. In Proceedings of CVPR.
[27] Furukawa, Y., & Ponce, J. (2010). Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8), 1362–1376. · doi:10.1109/TPAMI.2009.161
[28] Gargallo, P., & Sturm, P. (2005). Bayesian 3d modeling from images using multiple depth maps. In Proceedings of CVPR, pp. 885–891.
[29] Geman, S., & McClure, D. (1985). Bayesian image analysis: An application to single photon emission tomography. In Proceedings of the Statistical Computing Section, pp. 12–18.
[30] Goesele, M., Snavely, N., Curless, B., Hoppe, H., & Seitz, S.M. (2007). Multi-view stereo for community photo collections. In Proceedings of ICCV, pp. 368–381.
[31] Granger, S., Pennec, X., & Roche, A. (2001). Rigid point-surface registration using oriented points and an em variant of icp for computer guided oral implantology. In Proceedings of MICCAI, pp. 752–761. · Zbl 1041.68615
[32] Haala, N., & Kada, M. (2005). Panoramic scenes for texture mapping of 3d city models. In Proceedings of PanoPhot.
[33] Hilton, A. (2005). Scene modelling from sparse 3d data. Image and Vision Computing, 23(10), 900–920. · doi:10.1016/j.imavis.2005.05.018
[34] Hilton, A., Stoddart, A., Illingworth, J., & Windeatt, T. (1998). Implicit surface based geometric fusion. Computer Vision and Image Understanding, 69(3), 273–291. · doi:10.1006/cviu.1998.0664
[35] Hirschmüller, H., & Scharstein, D. (2009). Evaluation of stereo matching costs on images with radiometric differences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9), 1582–1599.
[36] Ince, S., & Konrad, J. (2008). Occlusion-aware optical flow estimation. IEEE Transactions on Image Processing, 17(8), 1443–1451.
[37] Johnson, C. (1988). Numerical solution of partial differential equations by the finite element method. Cambridge: Cambridge University Press.
[38] Kang, S., & Szeliski, R. (1997). 3-d scene data recovery using omnidirectional multibaseline stereo. International Journal of Computer Vision, 25(2), 167–183. · doi:10.1023/A:1007971901577
[39] Kazhdan, M., Bolitho, M., & Hoppe, H. (2006). Poisson surface reconstruction. In Proceedings of SGP, pp. 61–70.
[40] Kim, H., & Hilton, A. (2009). 3d environment modelling using spherical stereo imaging. In Proceedings of 3DIM.
[41] Kim, H., & Hilton, A. (2010). 3d modelling of static environments using multiple spherical stereo. In Proceedings of RMLE workshop in ECCV.
[42] Kim, H., & Sohn, K. (2003a). Hierarchical depth estimation for image synthesis in mixed reality. In Proceedings of SPIE Electronic Imaging, pp. 544–553.
[43] Kim, H., Sohn, K. (2003b). Hierarchical disparity estimation with energy-based regularization. In Proceedings of ICIP, pp. 373–376.
[44] Klaus, A., Sormann, M., & Karner, K. (2006). Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In Proceedings of ICPR.
[45] Kolmogorov, V., & Zabih, R. (2001). Computing visual correspondence with occlusions using graph cuts. In Proceedings of ICCV.
[46] Lemmens, M. (2007). Airborne lidar sensor. GIM International, 21(2), 13–17.
[47] Lhuillier, M. (2008). Automatic scene structure and camera motion using a catadioptric system. Computer Vision and Image Understanding, 109(2), 186–203. · doi:10.1016/j.cviu.2007.05.004
[48] Li, S. (2006). Real-time spherical stereo. In Proceedings of ICPR, pp. 1046–1049.
[49] Mathias, M., Martinovic, A., Weissenberg, J., & Gool, L. J. V. (2011). Procedural 3d building reconstruction using shape grammars and detectors. In Proceedings of 3DIMPVT, pp. 304–311.
[50] Merrell, P., Akbarzadeh, A., Wang, L., Mordohai, P., Frahm, J.-M., Yang, R., et al. (2007). Real-time visibility-based fusion of depth maps. In Proceedings of ICCV.
[51] Micusik, B., & Kosecka, J. (2009). Piecewise planar city 3d modeling from street view panoramic sequences. In Proceedings of CVPR, pp. 2906–2912.
[52] Micusik, B., Martinec, D., & Pajdla, T. (2004). 3d metric reconstruction from uncalibrated omnidirectional images. In Proceedings of ACCV.
[53] Min, D., & Sohn, K. (2008). Cost aggregation and occlusion handling with wls in stereo matching. IEEE Transactions on Image Processing, 17(8), 1431–1442. · doi:10.1109/TIP.2008.925372
[54] Nagel, H., & Enkelmann, W. (1986). An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8, 565–593. · doi:10.1109/TPAMI.1986.4767833
[55] Nayar, S.K., & Karmarkar, A. (2000). 360 x 360 mosaics. In Proceedings of CVPR, pp. 2388–2388.
[56] Pollefeys, M., Koch, R., Vergauwen, M., & Gool, L. (2000). Automated reconstruction of 3d scenes from sequences of images. ISPRS Journal of Photogrammetry and Remote Sensing, 55(4), 251–267. · doi:10.1016/S0924-2716(00)00023-X
[57] Pollefeys, M., Nistér, D., Frahm, J., Akbarzadeh, A., Mordohai, P., Clipp, B., et al. (2008). Detailed real-time urban 3d reconstruction from video. International Journal of Computer Vision, 78(2), 143–167.
[58] Rusinkiewicz, S., & Levoy, M. (2001). Efficient variants of the icp algorithm. In Proceedings of 3DIM, pp. 145–152.
[59] Salman, N., & Yvinec, M. (2009). Surface reconstruction from multi-view stereo. In Proceedings of ACCV.
[60] Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1), 7–42. · Zbl 1012.68731 · doi:10.1023/A:1014573219977
[61] Simon, L., Teboul, O., Koutsourakis, P., & Paragios, N. (2011). Random exploration of the procedural space for single-view 3d modeling of buildings. International Journal of Computer Vision, 93(2), 253–271. · Zbl 1235.68288
[62] Sizintsev, M. (2008). Hierarchical stereo with thin structures and transparency. In Proceedings of CRV, pp. 97–104.
[63] Slesareva, N., Bruhn, A., & Weickert, J. (2005). Optic flow goes stereo: A variational method for estimating discontinuity- preserving dense disparity maps. In Proceedings of DAGM, pp. 33–40.
[64] Snavely, N., Seitz, S., & Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3d. In Proceedings of ACM SIGGRAPH, pp. 835–846.
[65] Snavely, N., Seitz, S., & Szeliski, R. (2008). Modeling the world from internet photo collections. International Journal of Computer Vision, 80(2), 189–210. · doi:10.1007/s11263-007-0107-3
[66] Soucy, M., & Laurendeau, D. (1995). A general surface approach to the integration of a set of range views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(4), 344–358.
[67] Strecha, C., Fransens, R., & Gool, L. J. V. (2004). Wide-baseline stereo from multiple views: A probabilistic account. In Proceedings of CVPR, pp. 552–559.
[68] Strecha, C., Hansen, W., Gool, L., Fua, P., & Thoennessen, U. (2008). On benchmarking camera calibration and multi-view stereo for high resolution imagery. In Proceedings of CVPR, pp. 1–8.
[69] Sun, D., Roth, S., Lewis, J., & Black, M. (2008). Learning optical flow. In Proceedings of ECCV, pp. 83–97.
[70] Sun, J., Zheng, N., & Shum, H. (2003). Stereo matching using belief propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7), 787–800. · Zbl 1039.68730 · doi:10.1109/TPAMI.2003.1206509
[71] Szeliski, R., & Scharstein, D. (2004). Sampling the disparity space image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3), 419–425. · doi:10.1109/TPAMI.2004.1262341
[72] Teller, S., Antone, M., Bodnar, Z., Bosse, M., Coorg, S., Jethwa, M., et al. (2003). Calibrated, registered images of an extended urban area. International Journal of Computer Vision, 53(1), 93–107. · doi:10.1023/A:1023035826052
[73] Tighe, J., Feldman, J., & Lazebnik, S. (2010). SuperParsing: Scalable nonparametric image parsing with superpixels. In Proceedings of ECCV.
[74] Turk, G., & Levoy, M. (1994). Zippered polygon meshes from range images. In Proceedings of SIGGRAPH, pp. 311–318.
[75] Vergauwen, M., & Gool, L. (2006). Web-based 3d reconstruction service. Machine Vision and Applications, 17, 411–426. · doi:10.1007/s00138-006-0027-1
[76] Vu, H., Keriven, R., Labatut, P., & Pons, J. (2009). Towards high-resolution large-scale multi-view stereo. In Proceedings of CVPR, pp. 1430–1437.
[77] Weickert, J. (1997). A review of nonlinear diffusion filtering. Lecture Notes in Computer Science, 1252, 3–28.
[78] Williams, J., & Bennamoun, M. (2001). Simultaneous registration of multiple corresponding point sets. Computer Vision and Image Understanding, 81(1), 117–142. · Zbl 1011.68541 · doi:10.1006/cviu.2000.0884
[79] Yang, Q., Wang, L., Yang, R., Stewénius, H., & Nistér, D. (2008). Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(3), 492–504. · doi:10.1109/TPAMI.2008.99
[80] Yuille, A., & Poggio, T. (1984). A generalized ordering constraint for stereo correspondence. MIT A.I. Memo 777.
[81] Zimmer, H., Bruhn, A., Valgaerts, L., Breuß, M., Weickert, J., Rosenhahn, B., & Seidel, H. (2008). Pde-based anisotropic disparity-driven stereo vision. In Proceedings of VMV, pp. 263–272.
[82] Zomet, A., Feldman, D., Peleg, S., & Weinshall, D. (2003). Mosaicing new views: The crossed-slits projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(6), 741–754. · doi:10.1109/TPAMI.2003.1201823
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.