ParseMVS: Learning Primitive-aware Surface Representations for Sparse Multi-view Stereopsis

Published: 10 October 2022


Multi-view stereopsis (MVS) recovers 3D surfaces by finding dense photo-consistent correspondences from densely sampled images. In this paper, we tackle the challenging task of MVS from sparsely sampled views (up to an order of magnitude fewer images), which is more practical and cost-efficient in applications. The major challenge comes from the significant correspondence ambiguity introduced by severe occlusions and highly skewed patches. Such ambiguity can, however, be resolved by incorporating geometric cues from the global structure. In light of this, we propose ParseMVS, which boosts sparse MVS by learning a Primitive-AwaRe Surface rEpresentation. In addition to being aware of global structure, our novel representation preserves fine details, including geometry, texture, and visibility. More specifically, the whole scene is parsed into multiple geometric primitives. On each primitive, the geometry is defined as the displacement along the primitive's normal direction, together with the texture and visibility along each view direction. An unsupervised neural network is trained to learn these factors by progressively increasing the photo-consistency and render-consistency among all input images. Since the surface properties are changed locally in the 2D space of each primitive, ParseMVS can preserve global primitive structures while optimizing local details, addressing both the 'incompleteness' and the 'inaccuracy' problems. We experimentally demonstrate that ParseMVS consistently outperforms the state-of-the-art surface reconstruction method in both completeness and overall score under varying sampling sparsity, especially under the extreme sparse-MVS settings. Beyond that, ParseMVS also shows great potential in compression, robustness, and efficiency.
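The idea of defining geometry as a displacement along a primitive's normal can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the simplest possible primitive (a plane with orthonormal in-plane axes), and the names `PlanarPrimitive`, `surface_point`, and `displacement` are hypothetical, chosen only for illustration. In the paper the displacement is learned by a neural network; here it is a plain Python function of the primitive's 2D coordinates.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


@dataclass
class PlanarPrimitive:
    """Hypothetical planar primitive; axis_u and axis_v are assumed orthonormal."""
    origin: Vec3
    axis_u: Vec3
    axis_v: Vec3

    @property
    def normal(self) -> Vec3:
        # Unit length only because the two axes are assumed orthonormal.
        return cross(self.axis_u, self.axis_v)

    def surface_point(
        self, u: float, v: float, displacement: Callable[[float, float], float]
    ) -> Vec3:
        # Point on the primitive at 2D coordinates (u, v).
        base = add(self.origin, add(scale(self.axis_u, u), scale(self.axis_v, v)))
        # Offset along the normal: the local detail lives in this scalar field.
        return add(base, scale(self.normal, displacement(u, v)))


# Usage: a plane in z = 0 with a stand-in displacement field d(u, v) = 0.5 * u.
prim = PlanarPrimitive(origin=(0.0, 0.0, 0.0), axis_u=(1.0, 0.0, 0.0), axis_v=(0.0, 1.0, 0.0))
point = prim.surface_point(2.0, 3.0, lambda u, v: 0.5 * u)
```

Because the displacement only moves points along the normal, changing it alters local detail while the primitive's global placement (origin and axes) stays fixed, which is the property the abstract relies on.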

MP4 File (MM22-fp0726.mp4)
Presentation video. The video presents ParseMVS, our approach to the sparse MVS task. It combines a primitive-based neural surface representation with an unsupervised learning pipeline, enabling both complete and accurate reconstruction from sparse observations. Experiments on a large-scale dataset demonstrate the effectiveness of our method.





    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages


    Author Tags

    1. multi-view stereopsis
    2. primitive
    3. sparse views


    • Shanghai Biren Technology Co., Ltd.
    • Ministry of Science and Technology of China
    • Natural Science Foundation of China (NSFC)
    • Shenzhen Key Laboratory of next generation interactive media innovative technology




