Is Geometry Enough for Matching in Visual Localization?

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13670)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In this paper, we propose to go beyond the well-established approach to vision-based localization that relies on visual descriptor matching between a query image and a 3D point cloud. While matching keypoints via visual descriptors makes localization highly accurate, it has significant storage demands, raises privacy concerns, and requires the descriptors to be updated in the long term. To address these practical challenges for large-scale localization, we present GoMatch, an alternative to visual-based matching that relies solely on geometric information for matching image keypoints to maps, represented as sets of bearing vectors. Our novel bearing-vector representation of 3D points significantly relieves the cross-modal challenge in geometric-based matching that prevented prior work from tackling localization in a realistic environment. With additional careful architecture design, GoMatch improves over prior geometric-based matching work with a reduction of ($10.67\,\text{m}$, $95.7^{\circ}$) and ($1.43\,\text{m}$, $34.7^{\circ}$) in average median pose errors on Cambridge Landmarks and 7-Scenes, respectively, while requiring as little as $1.5/1.7\%$ of the storage capacity of the best visual-based matching methods. This confirms its potential and feasibility for real-world localization and opens the door to future efforts in advancing city-scale visual localization methods that do not require storing visual descriptors.
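
To make the bearing-vector idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with a known intrinsic matrix `K` and a known reference camera pose (`R`, `t`) mapping world coordinates to that camera's frame; the function names and the example values are hypothetical. It shows how 2D keypoints and 3D map points can both be expressed as unit direction vectors, so that the two modalities live in the same space before any matching is attempted.

```python
import numpy as np


def keypoints_to_bearings(keypoints_px, K):
    """Unproject N x 2 pixel keypoints into N x 3 unit bearing vectors.

    Assumes a pinhole camera with intrinsic matrix K (3 x 3).
    """
    kp = np.asarray(keypoints_px, dtype=float)
    # Homogeneous pixel coordinates (u, v, 1).
    homog = np.hstack([kp, np.ones((kp.shape[0], 1))])
    # Rays in the camera frame: K^{-1} [u, v, 1]^T for every keypoint.
    rays = homog @ np.linalg.inv(K).T
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)


def points_to_bearings(points_world, R, t):
    """Express N x 3 world points as unit bearing vectors in a camera frame.

    Assumes (R, t) maps world coordinates to the camera frame: X_c = R X_w + t.
    """
    pts = np.asarray(points_world, dtype=float)
    cam = pts @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    return cam / np.linalg.norm(cam, axis=1, keepdims=True)


# Hypothetical intrinsics and an identity reference pose, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

points = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
keypoints = np.array([[320.0, 240.0], [570.0, 240.0]])  # projections of `points`

b3d = points_to_bearings(points, R, t)
b2d = keypoints_to_bearings(keypoints, K)
```

In this toy setup the keypoints are the exact projections of the 3D points, so `b2d` and `b3d` coincide. In practice the query pose is unknown and the two sets differ by the relative pose between the query and reference cameras, which is what a geometry-only matcher has to cope with.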

Q. Zhou and S. Agostinho—Equal contribution.

Notes

1. Storage as in non-volatile preservation of data, in contrast to volatile memory.

Acknowledgments

This research was partially funded by the Humboldt Foundation through the Sofja Kovalevskaya Award.

Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Qunjie Zhou, Aljoša Ošep & Laura Leal-Taixé
Universidade de Lisboa, Lisbon, Portugal
Sérgio Agostinho

Corresponding author

Correspondence to Qunjie Zhou.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Zhou, Q., Agostinho, S., Ošep, A., Leal-Taixé, L. (2022). Is Geometry Enough for Matching in Visual Localization? In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13670. Springer, Cham. https://doi.org/10.1007/978-3-031-20080-9_24

DOI: https://doi.org/10.1007/978-3-031-20080-9_24
Published: 03 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20079-3
Online ISBN: 978-3-031-20080-9
eBook Packages: Computer Science, Computer Science (R0)
