
Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution

Authors: Fajwel Fogel, Yohann Perron, Nikola Besic, Laurent Saint-André, Agnès Pellissier-Tanon, Martin Schwartz, Thomas Boudras, Ibrahim Fayad, Alexandre d'Aspremont, Loic Landrieu, Philippe Ciais

Abstract: Estimating canopy height and canopy height change at meter resolution from satellite imagery has numerous applications, such as monitoring forest health, logging activities, wood resources, and carbon stocks. However, many existing forest datasets are based on commercial or closed data sources, restricting the reproducibility and evaluation of new approaches. To address this gap, we introduce Open-Canopy, the first open-access and country-scale benchmark for very high resolution (1.5 m) canopy height estimation. Covering more than 87,000 km$^2$ across France, Open-Canopy combines SPOT satellite imagery with high resolution aerial LiDAR data. We also propose Open-Canopy-$\Delta$, the first benchmark for canopy height change detection between two images taken in different years, a particularly challenging task even for recent models. To establish a robust foundation for these benchmarks, we evaluate a comprehensive list of state-of-the-art computer vision models for canopy height estimation. The dataset and associated code can be accessed at https://github.com/fajwel/Open-Canopy.
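As a rough illustration of what the change task in Open-Canopy-$\Delta$ involves, the sketch below differences two co-registered canopy height rasters and flags pixels where canopy height dropped sharply. The function name, the 10 m drop threshold, and the toy values are hypothetical choices for illustration, not the benchmark's definition of change; only the 1.5 m pixel size comes from the abstract.

```python
import numpy as np

PIXEL_SIZE_M = 1.5  # Open-Canopy resolution stated in the abstract


def canopy_height_change(height_t0, height_t1, drop_threshold_m=10.0):
    """Compare two co-registered canopy height rasters (values in metres).

    drop_threshold_m is a hypothetical threshold, not part of the benchmark.
    Returns the signed per-pixel change, a mask of sharp height losses,
    and the area of that mask in square metres.
    """
    h0 = np.asarray(height_t0, dtype=float)
    h1 = np.asarray(height_t1, dtype=float)
    if h0.shape != h1.shape:
        raise ValueError("height maps must be co-registered and share a shape")
    delta = h1 - h0                          # signed change per pixel
    loss_mask = delta <= -drop_threshold_m   # e.g. felled or fallen trees
    lost_area_m2 = loss_mask.sum() * PIXEL_SIZE_M ** 2
    return delta, loss_mask, float(lost_area_m2)


# Toy 2x2 rasters for two acquisition years (values are made up).
before = np.array([[20.0, 15.0], [3.0, 25.0]])
after = np.array([[21.0, 2.0], [3.5, 12.0]])
delta, mask, area = canopy_height_change(before, after)
print(delta.tolist())  # [[1.0, -13.0], [0.5, -13.0]]
print(mask.tolist())   # [[False, True], [False, True]]
print(area)            # 4.5
```

A per-pixel threshold is the simplest possible change rule; it ignores registration error and LiDAR noise, which is part of why the abstract calls this task challenging.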

Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

Comments: 22 pages, 8 figures, Submitted to NeurIPS 2024 Datasets and Benchmarks Track

arXiv:2404.18873 [pdf, other]

OpenStreetView-5M: The Many Roads to Global Visual Geolocation

Authors: Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis, Constantin Aronssohn, Nacim Bouia, Stephanie Fu, Romain Loiseau, Van Nguyen Nguyen, Charles Raude, Elliot Vincent, Lintao XU, Hongyu Zhou, Loic Landrieu

Abstract: Determining the location of an image anywhere on Earth is a complex visual task, which makes it particularly relevant for evaluating computer vision algorithms. Yet, the absence of standard, large-scale, open-access datasets with reliably localizable images has limited its potential. To address this issue, we introduce OpenStreetView-5M, a large-scale, open-access dataset comprising over 5.1 million geo-referenced street view images, covering 225 countries and territories. In contrast to existing benchmarks, we enforce a strict train/test separation, allowing us to evaluate the relevance of learned geographical features beyond mere memorization. To demonstrate the utility of our dataset, we conduct an extensive benchmark of various state-of-the-art image encoders, spatial representations, and training strategies. All associated code and models can be found at https://github.com/gastruc/osv5m.

Submitted 29 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.08351 [pdf, other]

OmniSat: Self-Supervised Modality Fusion for Earth Observation

Authors: Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu

Abstract: The diversity and complementarity of sensors available for Earth Observation (EO) call for developing bespoke self-supervised multimodal learning approaches. However, current multimodal EO datasets and models typically focus on a single data type, either mono-date images or time series, which limits their impact. To address this issue, we introduce OmniSat, a novel architecture able to merge diverse EO modalities into expressive features without labels by exploiting their alignment. To demonstrate the advantages of our approach, we create two new multimodal datasets by augmenting existing ones with new modalities. As demonstrated for three downstream tasks (forestry, land cover classification, and crop mapping), OmniSat can learn rich representations without supervision, leading to state-of-the-art performance in semi- and fully supervised settings. Furthermore, our multimodal pretraining scheme improves performance even when only one modality is available for inference. The code and dataset are available at https://github.com/gastruc/OmniSat.

Submitted 17 July, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Journal ref: ECCV 2024

arXiv:2403.20142 [pdf, other]

StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

Authors: Sidi Wu, Yizi Chen, Samuel Mermet, Lorenz Hurni, Konrad Schindler, Nicolas Gonthier, Loic Landrieu

Abstract: Most image-to-image translation models postulate that a unique correspondence exists between the semantic classes of the source and target domains. However, this assumption does not always hold in real-world scenarios due to divergent distributions, different class sets, and asymmetrical information representation. As conventional GANs attempt to generate images that match the distribution of the target domain, they may hallucinate spurious instances of classes absent from the source domain, thereby diminishing the usefulness and reliability of translated images. CycleGAN-based methods are also known to hide the mismatched information in the generated images to bypass cycle consistency objectives, a process known as steganography. In response to the challenge of non-bijective image translation, we introduce StegoGAN, a novel model that leverages steganography to prevent spurious features in generated images. Our approach enhances the semantic consistency of the translated images without requiring additional postprocessing or supervision. Our experimental evaluations demonstrate that StegoGAN outperforms existing GAN-based models across various non-bijective image-to-image translation tasks, both qualitatively and quantitatively. Our code and pretrained models are accessible at https://github.com/sian-wusidi/StegoGAN.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2401.06704 [pdf, other]

Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering

Authors: Damien Robert, Hugo Raguet, Loic Landrieu

Abstract: We introduce a highly efficient method for panoptic segmentation of large 3D point clouds by redefining this task as a scalable graph clustering problem. This approach can be trained using only local auxiliary tasks, thereby eliminating the resource-intensive instance-matching step during training. Moreover, our formulation can easily be adapted to the superpoint paradigm, further increasing its efficiency. This allows our model to process scenes with millions of points and thousands of objects in a single inference. Our method, called SuperCluster, achieves new state-of-the-art panoptic segmentation performance on two indoor scanning datasets: $50.1$ PQ ($+7.8$) for S3DIS Area 5, and $58.7$ PQ ($+25.2$) for ScanNetV2. We also establish the first state-of-the-art results for two large-scale mobile mapping benchmarks: KITTI-360 and DALES. With only $209$k parameters, our model is over $30$ times smaller than the best-competing method and trains up to $15$ times faster. Our code and pretrained models are available at https://github.com/drprojects/superpoint_transformer.
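The PQ scores quoted above use panoptic quality. The sketch below computes the standard single-class PQ from point-index sets: a predicted and a ground-truth object match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by $|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|$. Benchmarks typically average this value over classes; the function name and toy segments are illustrative and are not taken from the SuperCluster code.

```python
def panoptic_quality(pred_segments, gt_segments):
    """Single-class panoptic quality over segments given as sets of point indices.

    With non-overlapping segments, a match with IoU > 0.5 is unique,
    so a simple pairwise scan is enough to find all true positives.
    """
    matched_pred, matched_gt = set(), set()
    iou_sum = 0.0
    for i, p in enumerate(pred_segments):
        for j, g in enumerate(gt_segments):
            if j in matched_gt:
                continue
            union = len(p | g)
            iou = len(p & g) / union if union else 0.0
            if iou > 0.5:
                matched_pred.add(i)
                matched_gt.add(j)
                iou_sum += iou
                break
    tp = len(matched_pred)
    fp = len(pred_segments) - tp          # unmatched predictions
    fn = len(gt_segments) - len(matched_gt)  # missed ground-truth objects
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


gt = [{0, 1, 2, 3}, {4, 5, 6}]
pred = [{0, 1, 2}, {7, 8}]
print(panoptic_quality(pred, gt))  # 0.375 (one match with IoU 0.75, one FP, one FN)
```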

Submitted 7 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted at 3DV 2024, Oral presentation

arXiv:2310.13336 [pdf, other]

FLAIR: a Country-Scale Land Cover Semantic Segmentation Dataset From Multi-Source Optical Imagery

Authors: Anatol Garioud, Nicolas Gonthier, Loic Landrieu, Apolline De Wit, Marion Valette, Marc Poupée, Sébastien Giordano, Boris Wattrelos

Abstract: We introduce the French Land cover from Aerospace ImageRy (FLAIR), an extensive dataset from the French National Institute of Geographical and Forest Information (IGN) that provides a unique and rich resource for large-scale geospatial analysis. FLAIR contains high-resolution aerial imagery with a ground sample distance of 20 cm and over 20 billion individually labeled pixels for precise land-cover classification. The dataset also integrates temporal and spectral data from optical satellite time series. FLAIR thus combines data with varying spatial, spectral, and temporal resolutions across over 817 km$^2$ of acquisitions representing the full landscape diversity of France. This diversity makes FLAIR a valuable resource for the development and evaluation of novel methods for large-scale land-cover semantic segmentation and raises significant challenges in terms of computer vision, data fusion, and geospatial analysis. We also provide powerful uni- and multi-sensor baseline models that can be employed to assess algorithm performance and for downstream applications. Through its extent and the quality of its annotation, FLAIR aims to spur improvements in monitoring and understanding key anthropogenic development indicators such as urban growth, deforestation, and soil artificialization. The dataset and code can be accessed at https://ignf.github.io/FLAIR/

Submitted 20 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 - Datasets & Benchmarks Track

arXiv:2306.08045 [pdf, other]

Efficient 3D Semantic Segmentation with Superpoint Transformer

Authors: Damien Robert, Hugo Raguet, Loic Landrieu

Abstract: We introduce a novel superpoint-based transformer architecture for efficient semantic segmentation of large-scale 3D scenes. Our method incorporates a fast algorithm to partition point clouds into a hierarchical superpoint structure, which makes our preprocessing 7 times faster than existing superpoint-based approaches. Additionally, we leverage a self-attention mechanism to capture the relationships between superpoints at multiple scales, leading to state-of-the-art performance on three challenging benchmark datasets: S3DIS (76.0% mIoU 6-fold validation), KITTI-360 (63.5% on Val), and DALES (79.6%). With only 212k parameters, our approach is up to 200 times more compact than other state-of-the-art models while maintaining similar performance. Furthermore, our model can be trained on a single GPU in 3 hours for a fold of the S3DIS dataset, which is 7x to 70x fewer GPU-hours than the best-performing methods. Our code and models are accessible at github.com/drprojects/superpoint_transformer.

Submitted 12 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: Accepted at ICCV 2023. Camera-ready version with Appendix. Code available at github.com/drprojects/superpoint_transformer

arXiv:2304.09704 [pdf, other]

Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

Authors: Romain Loiseau, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Abstract: We propose an unsupervised method for parsing large 3D scans of real-world scenes with easily interpretable shapes. This work aims to provide a practical tool for analyzing 3D scenes in the context of aerial surveying and mapping, without the need for user annotations. Our approach is based on a probabilistic reconstruction model that decomposes an input 3D point cloud into a small set of learned prototypical 3D shapes. The resulting reconstruction is visually interpretable and can be used to perform unsupervised instance and low-shot semantic segmentation of complex scenes. We demonstrate the usefulness of our model on a novel dataset of seven large aerial LiDAR scans from diverse real-world scenarios. Our approach outperforms state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. Our code and dataset are available at https://romainloiseau.fr/learnable-earth-parser/

Submitted 28 March, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

arXiv:2301.13656 [pdf, other]

A Survey and Benchmark of Automatic Surface Reconstruction from Point Clouds

Authors: Raphael Sulzer, Renaud Marlet, Bruno Vallet, Loic Landrieu

Abstract: We present a comprehensive survey and benchmark of both traditional and learning-based methods for surface reconstruction from point clouds. This task is particularly challenging for real-world acquisitions due to factors like noise, outliers, non-uniform sampling, and missing data. Traditional approaches often simplify the problem by imposing handcrafted priors on either the input point clouds or the resulting surface, a process that can necessitate tedious hyperparameter tuning. Conversely, deep learning models can directly learn the properties of input point clouds and desired surfaces from data. We study the influence of these handcrafted and learned priors on the precision and robustness of surface reconstruction techniques. We evaluate various time-tested and contemporary methods in a standardized manner. When both trained and evaluated on point clouds with identical characteristics, the learning-based models consistently produce superior surfaces compared to their traditional counterparts, even in scenarios involving novel shape categories. However, traditional methods demonstrate greater resilience to the diverse array of point cloud anomalies commonly found in real-world 3D acquisitions. For the benefit of the research community, we make our code and datasets available, inviting further enhancements to learning-based surface reconstruction. They can be accessed at https://github.com/raphaelsulzer/dsr-benchmark.

Submitted 16 April, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: 20 pages

arXiv:2208.03311 [pdf, other]

A Model You Can Hear: Audio Identification with Playable Prototypes

Authors: Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Abstract: Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
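To make the idea of identification by transformed prototypes concrete, the sketch below assigns a spectrogram to the prototype that reconstructs it with the lowest squared error. As a simplifying assumption, the paper's learned transformation networks are replaced by a single non-negative gain solved in closed form by least squares; the function name and toy arrays are hypothetical.

```python
import numpy as np


def classify_with_prototypes(spectrogram, prototypes):
    """Return the index of the best-fitting prototype and all reconstruction errors.

    The only transformation considered is a global gain g >= 0, whose
    least-squares value for prototype p is <x, p> / <p, p>.
    """
    x = np.asarray(spectrogram, dtype=float).ravel()
    errors = []
    for proto in prototypes:
        p = np.asarray(proto, dtype=float).ravel()
        gain = max(float(x @ p) / float(p @ p), 0.0)  # clip: magnitude spectra are non-negative
        errors.append(float(np.sum((x - gain * p) ** 2)))
    return int(np.argmin(errors)), errors


# Two toy 2x2 "spectral prototypes" and an input that is a louder copy of the second.
prototypes = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 1.0]])]
label, errors = classify_with_prototypes(3.0 * prototypes[1], prototypes)
print(label, errors)  # 1 [9.0, 0.0]
```

Because each decision is a reconstruction error against an explicit prototype, the reason for a classification can be inspected directly, which is the interpretability property the abstract emphasizes.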

Submitted 5 August, 2022; originally announced August 2022.

arXiv:2206.08194 [pdf, other]

Online Segmentation of LiDAR Sequences: Dataset and Algorithm

Authors: Romain Loiseau, Mathieu Aubry, Loïc Landrieu

Abstract: Roof-mounted spinning LiDAR sensors are widely used by autonomous vehicles. However, most semantic datasets and algorithms used for LiDAR sequence segmentation operate on $360^\circ$ frames, causing an acquisition latency incompatible with real-time applications. To address this issue, we first introduce HelixNet, a $10$ billion point dataset with fine-grained labels, timestamps, and sensor rotation information necessary to accurately assess the real-time readiness of segmentation algorithms. Second, we propose Helix4D, a compact and efficient spatio-temporal transformer architecture specifically designed for rotating LiDAR sequences. Helix4D operates on acquisition slices corresponding to a fraction of a full sensor rotation, significantly reducing the total latency. Helix4D reaches accuracy on par with the best segmentation algorithms on HelixNet and SemanticKITTI with a reduction of over $5\times$ in terms of latency and $50\times$ in model size. The code and data are available at: https://romainloiseau.fr/helixnet
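The latency argument rests on processing a rotation in slices rather than waiting for a full $360^\circ$ frame. The sketch below groups the points of one rotation into equal angular sectors by azimuth, so that each sector could be handed to a model as soon as the sensor has swept it. Slicing by azimuth and the default of five slices are assumptions for illustration (HelixNet also provides timestamps, which could define slices instead), and the function is not part of the Helix4D code.

```python
import numpy as np


def split_into_slices(points, n_slices=5):
    """Group the points of one sensor rotation into equal azimuthal slices.

    points: (N, 3+) array whose first two columns are x and y in the sensor frame.
    Returns a list of n_slices arrays, ordered by increasing azimuth.
    """
    x, y = points[:, 0], points[:, 1]
    azimuth = np.mod(np.arctan2(y, x), 2 * np.pi)           # angle in [0, 2*pi)
    slice_id = np.floor(azimuth / (2 * np.pi / n_slices)).astype(int)
    slice_id = np.minimum(slice_id, n_slices - 1)           # guard against rounding near 2*pi
    return [points[slice_id == k] for k in range(n_slices)]


pts = np.array([[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0], [1.0, -1.0, 0.0]])
slices = split_into_slices(pts, n_slices=4)
print([len(s) for s in slices])  # [1, 1, 1, 1]
```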

Submitted 21 July, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: Code and data are available at: https://romainloiseau.fr/helixnet

arXiv:2204.11620 [pdf, other]

Multi-Layer Modeling of Dense Vegetation from Aerial LiDAR Scans

Authors: Ekaterina Kalinicheva, Loic Landrieu, Clément Mallet, Nesrine Chehata

Abstract: The analysis of the multi-layer structure of wild forests is an important challenge of automated large-scale forestry. While modern aerial LiDARs offer geometric information across all vegetation layers, most datasets and methods focus only on the segmentation and reconstruction of the top of canopy. We release WildForest3D, which consists of 29 study plots and over 2000 individual trees across 47,000 m$^2$ with dense 3D annotation, along with occupancy and height maps for 3 vegetation layers: ground vegetation, understory, and overstory. We propose a 3D deep network architecture predicting for the first time both 3D point-wise labels and high-resolution layer occupancy rasters simultaneously. This allows us to produce a precise estimation of the thickness of each vegetation layer as well as the corresponding watertight meshes, therefore meeting most forestry purposes. Both the dataset and the model are released in open access: https://github.com/ekalinicheva/multi_layer_vegetation.

Submitted 25 April, 2022; originally announced April 2022.

Comments: Earth Vision Workshop, CVPR 2022

arXiv:2204.07548 [pdf, other]

Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation

Authors: Damien Robert, Bruno Vallet, Loic Landrieu

Abstract: Recent works on 3D semantic segmentation propose to exploit the synergy between images and point clouds by processing each modality with a dedicated network and projecting learned 2D features onto 3D points. Merging large-scale point clouds and images raises several challenges, such as constructing a mapping between points and pixels, and aggregating features between multiple views. Current methods require mesh reconstruction or specialized sensors to recover occlusions, and use heuristics to select and aggregate available images. In contrast, we propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions. Our method can combine standard 2D and 3D networks and outperforms both 3D models operating on colorized point clouds and hybrid 2D/3D networks without requiring colorization, meshing, or true depth maps. We set a new state-of-the-art for large-scale indoor/outdoor semantic segmentation on S3DIS (74.7 mIoU 6-Fold) and on KITTI-360 (58.3 mIoU). Our full pipeline is accessible at https://github.com/drprojects/DeepViewAgg, and only requires raw 3D scans and a set of images and poses.
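The core of the aggregation step is to weight each image's contribution to a 3D point according to how that point was seen. The sketch below is a simplified stand-in, assuming a linear score on per-view viewing-condition descriptors (for example distance or viewing angle) followed by a softmax over the views that actually see the point. In the paper these weights come from a model trained end to end; the function name, shapes, and fixed weight vector here are hypothetical.

```python
import numpy as np


def aggregate_views(view_features, viewing_conditions, visible, w):
    """Fuse per-view 2D features into one feature per 3D point.

    view_features:      (P, V, C) features projected from V images onto P points
    viewing_conditions: (P, V, D) per-view descriptors, e.g. distance or angle
    visible:            (P, V) boolean mask, False where a view does not see the point
    w:                  (D,) scoring weights (learned in practice, fixed here)
    """
    if not visible.any(axis=1).all():
        raise ValueError("every point needs at least one visible view")
    scores = viewing_conditions @ w                        # (P, V)
    scores = np.where(visible, scores, -np.inf)            # drop occluded views
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over views
    return np.einsum("pv,pvc->pc", weights, view_features)  # (P, C)


feats = np.array([[[1.0, 0.0], [0.0, 1.0]]])   # one point seen by two views
conds = np.zeros((1, 2, 1))                    # identical viewing conditions
w = np.array([1.0])
print(aggregate_views(feats, conds, np.array([[True, True]]), w))   # [[0.5 0.5]]
print(aggregate_views(feats, conds, np.array([[True, False]]), w))  # [[1. 0.]]
```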

Submitted 7 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: Accepted to CVPR 2022 with an Oral presentation and Best Paper candidate; camera ready version. 17 pages, 11 figures. Code and data available at https://github.com/drprojects/DeepViewAgg

Journal ref: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5575-5584. 2022

arXiv:2202.01810 [pdf, other]

Deep Surface Reconstruction from Point Clouds with Visibility Information

Authors: Raphael Sulzer, Loic Landrieu, Alexandre Boulch, Renaud Marlet, Bruno Vallet

Abstract: Most current neural networks for reconstructing surfaces from point clouds ignore sensor poses and only operate on raw point locations. Sensor visibility, however, holds meaningful information regarding space occupancy and surface orientation. In this paper, we present two simple ways to augment raw point clouds with visibility information, so it can directly be leveraged by surface reconstruction networks with minimal adaptation. Our proposed modifications consistently improve the accuracy of generated surfaces as well as the generalization ability of the networks to unseen shape domains. Our code and data are available at https://github.com/raphaelsulzer/dsrv-data.
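One simple way to attach visibility information to a point cloud is to append, to each point, the unit vector pointing from that point toward the sensor position it was captured from. The sketch below assumes per-point sensor positions are available; it illustrates the general idea and is not necessarily either of the two augmentations the paper proposes.

```python
import numpy as np


def add_sensor_directions(points, sensor_positions):
    """Append the unit point-to-sensor vector to each 3D point.

    points, sensor_positions: (N, 3) arrays; row i of sensor_positions is
    the sensor position that observed point i. Returns an (N, 6) array.
    """
    v = sensor_positions - points
    norm = np.linalg.norm(v, axis=1, keepdims=True)
    return np.hstack([points, v / np.maximum(norm, 1e-12)])  # avoid division by zero


pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
sensors = np.array([[0.0, 0.0, 2.0], [1.0, 3.0, 0.0]])
augmented = add_sensor_directions(pts, sensors)
print(augmented)
```

The extra three channels tell the network on which side of the surface the sensor was, which is the kind of occupancy and orientation cue the abstract refers to.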

Submitted 3 February, 2022; originally announced February 2022.

Comments: 13 pages

arXiv:2201.08051 [pdf, other]

Predicting Vegetation Stratum Occupancy from Airborne LiDAR Data with Deep Learning

Authors: Ekaterina Kalinicheva, Loic Landrieu, Clément Mallet, Nesrine Chehata

Abstract: We propose a new deep learning-based method for estimating the occupancy of vegetation strata from airborne 3D LiDAR point clouds. Our model predicts rasterized occupancy maps for three vegetation strata corresponding to lower, medium, and higher cover. Our weakly-supervised training scheme allows our network to only be supervised with vegetation occupancy values aggregated over cylindrical plots containing thousands of points. Such ground truth is easier to produce than pixel-wise or point-wise annotations. Our method outperforms handcrafted and deep learning baselines in terms of precision by up to 30%, while simultaneously providing visual and interpretable predictions. We provide an open-source implementation along with a dataset of 199 agricultural plots to train and evaluate weakly supervised occupancy regression algorithms.
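The weak supervision described above can be sketched as follows: the network outputs a dense occupancy raster per stratum, the pixels inside the plot's circular footprint (a vertical cylinder seen from above) are averaged, and the averages are compared with the plot-level targets. The mean aggregation, the L1 loss, and all names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np


def plot_level_loss(occupancy_raster, center_rc, radius_px, plot_targets):
    """Weak supervision from one cylindrical plot.

    occupancy_raster: (S, H, W) predicted occupancy in [0, 1] for S strata
    center_rc:        (row, col) of the plot centre in pixel coordinates
    radius_px:        plot radius in pixels (the cylinder seen from above is a disc)
    plot_targets:     (S,) plot-level occupancy values, one per stratum
    """
    _, h, w = occupancy_raster.shape
    rows, cols = np.mgrid[0:h, 0:w]
    disc = (rows - center_rc[0]) ** 2 + (cols - center_rc[1]) ** 2 <= radius_px ** 2
    predicted = occupancy_raster[:, disc].mean(axis=1)      # (S,) mean over the disc
    return float(np.abs(predicted - np.asarray(plot_targets)).mean())


raster = np.zeros((3, 5, 5))
raster[0] = 1.0  # lower stratum fully occupied everywhere
print(plot_level_loss(raster, (2, 2), 1, [0.5, 0.0, 0.5]))  # about 0.333
```

Only the plot average is constrained, so the per-pixel raster remains free to vary inside the plot; this is what lets plot-level labels train a dense predictor.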

Submitted 20 January, 2022; originally announced January 2022.

arXiv:2112.13583 [pdf]

Vegetation Stratum Occupancy Prediction from Airborne LiDAR 3D Point Clouds

Authors: Ekaterina Kalinicheva, Loic Landrieu, Clément Mallet, Nesrine Chehata

Abstract: We propose a new deep learning-based method for estimating the occupancy of vegetation strata from 3D point clouds captured from an aerial platform. Our model predicts rasterized occupancy maps for three vegetation strata: lower, medium, and higher strata. Our training scheme allows our network to be supervised only with values aggregated over cylindrical plots, which are easier to produce than pixel-wise or point-wise annotations. Our method outperforms handcrafted and deep learning baselines in terms of precision while simultaneously providing visual and interpretable predictions. We provide an open-source implementation of our method along with a dataset of 199 agricultural plots to train and evaluate occupancy regression algorithms.

Submitted 27 December, 2021; originally announced December 2021.

Journal ref: SilviLaser 2021 Conference

arXiv:2112.07558 [pdf, other]

Multi-Modal Temporal Attention Models for Crop Mapping from Satellite Time Series

Authors: Vivien Sainte Fare Garnot, Loic Landrieu, Nesrine Chehata

Abstract: Optical and radar satellite time series are synergetic: optical images contain rich spectral information, while C-band radar captures useful geometrical information and is immune to cloud cover. Motivated by the recent success of temporal attention-based methods across multiple crop mapping tasks, we propose to investigate how these models can be adapted to operate on several modalities. We implement and evaluate multiple fusion schemes, including a novel approach and simple adjustments to the training procedure, significantly improving performance and efficiency with little added complexity. We show that most fusion schemes have advantages and drawbacks, making them relevant for specific settings. We then evaluate the benefit of multimodality across several tasks: parcel classification, pixel-based segmentation, and panoptic parcel segmentation. We show that by leveraging both optical and radar time series, multimodal temporal attention-based models can outmatch single-modality models in terms of performance and resilience to cloud cover. To conduct these experiments, we augment the PASTIS dataset with spatially aligned radar image time series. The resulting dataset, PASTIS-R, constitutes the first large-scale, multimodal, and open-access satellite time series dataset with semantic and instance annotations.

Submitted 14 December, 2021; originally announced December 2021.

Comments: Under review

arXiv:2110.08187 [pdf, other]

Crop Rotation Modeling for Deep Learning-Based Parcel Classification from Satellite Time Series

Authors: Félix Quinton, Loic Landrieu

Abstract: While annual crop rotations play a crucial role in agricultural optimization, they have been largely ignored for automated crop type mapping. In this paper, we take advantage of the increasing quantity of annotated satellite data to propose the first deep learning approach that simultaneously models the inter- and intra-annual agricultural dynamics of parcel classification. Along with simple training adjustments, our model provides an improvement of over 6.3 mIoU points over the current state of the art in crop classification. Furthermore, we release the first large-scale multi-year agricultural dataset with over 300,000 annotated parcels.

Submitted 16 November, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Published in Remote Sensing

ACM Class: I.2.10

arXiv:2109.01605

Representing Shape Collections with Alignment-Aware Linear Models

Authors: Romain Loiseau, Tom Monnier, Mathieu Aubry, Loïc Landrieu

Abstract: In this paper, we revisit the classical representation of 3D point clouds as linear shape models. Our key insight is to leverage deep learning to represent a collection of shapes as affine transformations of low-dimensional linear shape models. Each linear model is characterized by a shape prototype, a low-dimensional shape basis and two neural networks. The networks take as input a point cloud and predict the coordinates of a shape in the linear basis and the affine transformation that best approximate the input. Both linear models and neural networks are learned end-to-end using a single reconstruction loss. The main advantage of our approach is that, in contrast to many recent deep approaches which learn feature-based complex shape representations, our model is explicit and every operation occurs in 3D space. As a result, our linear shape models can be easily visualized and annotated, and failure cases can be visually understood. While our main goal is to introduce a compact and interpretable representation of shape collections, we show it leads to state-of-the-art results for few-shot segmentation.

Submitted 17 December, 2021; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: Accepted to 3DV 2021. 17 pages, 10 figures. Code and data are available at: https://romainloiseau.github.io/deep-linear-shapes
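To make the representation concrete, here is a minimal Python sketch of how one linear shape model could reconstruct a point cloud: a prototype is deformed within a low-dimensional basis and then mapped by an affine transformation. It assumes a fixed point ordering shared by the prototype and the basis; the function `reconstruct` and the toy values are hypothetical, and the two neural networks that would predict the coefficients and the affine map, as well as the end-to-end reconstruction loss, are not shown.

```python
from typing import List, Sequence

Point = List[float]  # a 3D point [x, y, z]


def reconstruct(
    prototype: Sequence[Point],
    basis: Sequence[Sequence[Point]],
    coeffs: Sequence[float],
    linear: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """Hypothetical helper: A @ (prototype + sum_k coeffs[k] * basis[k]) + translation.

    prototype: N points; basis: K displacement fields of N points each;
    coeffs: K coordinates in the linear basis; linear (3x3) and translation (3)
    form the affine transformation. In the paper, networks predict these values.
    """
    if len(basis) != len(coeffs):
        raise ValueError("expected one coefficient per basis element")
    shape: List[Point] = []
    for i, p in enumerate(prototype):
        # Deform the prototype point inside the low-dimensional linear basis.
        q = [p[d] + sum(c * b[i][d] for c, b in zip(coeffs, basis)) for d in range(3)]
        # Apply the affine transformation; every operation happens in 3D space.
        shape.append([sum(linear[r][d] * q[d] for d in range(3)) + translation[r] for r in range(3)])
    return shape


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
prototype = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
basis = [[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]]  # K = 1: move every point along y
print(reconstruct(prototype, basis, [0.5], identity, [0.0, 0.0, 2.0]))
# → [[0.0, 0.5, 2.0], [1.0, 0.5, 2.0]]
```

Because the output is an explicit set of 3D points, each coefficient can be inspected directly, which is the interpretability argument made in the abstract.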

arXiv:2107.07933

Panoptic Segmentation of Satellite Image Time Series with Convolutional Temporal Attention Networks

Authors: Vivien Sainte Fare Garnot, Loic Landrieu

Abstract: Unprecedented access to multi-temporal satellite imagery has opened new perspectives for a variety of Earth observation tasks. Among them, pixel-precise panoptic segmentation of agricultural parcels has major economic and environmental implications. While researchers have explored this problem for single images, we argue that the complex temporal patterns of crop phenology are better addressed with temporal sequences of images. In this paper, we present the first end-to-end, single-stage method for panoptic segmentation of Satellite Image Time Series (SITS). This module can be combined with our novel image sequence encoding network, which relies on temporal self-attention to extract rich and adaptive multi-scale spatio-temporal features. We also introduce PASTIS, the first open-access SITS dataset with panoptic annotations. We demonstrate the superiority of our encoder for semantic segmentation against multiple competing architectures, and set up the first state of the art for panoptic segmentation of SITS. Our implementation and PASTIS are publicly available.

Submitted 27 June, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

Comments: Accepted at ICCV2021, PASTIS Dataset available at https://github.com/VSainteuf/pastis-benchmark, PyTorch implementation at https://github.com/VSainteuf/utae-paps

MSC Class: 68T45; 68T07 ACM Class: I.4.6; I.2.6; J.2

arXiv:2107.06130

doi 10.1111/cgf.14364

Scalable Surface Reconstruction with Delaunay-Graph Neural Networks

Authors: Raphael Sulzer, Loic Landrieu, Renaud Marlet, Bruno Vallet

Abstract: We introduce a novel learning-based, visibility-aware, surface reconstruction method for large-scale, defect-laden point clouds. Our approach can cope with the scale and variety of point cloud defects encountered in real-life Multi-View Stereo (MVS) acquisitions. Our method relies on a 3D Delaunay tetrahedralization whose cells are classified as inside or outside the surface by a graph neural network and an energy model solvable with a graph cut. Our model, making use of both local geometric attributes and line-of-sight visibility information, is able to learn a visibility model from a small amount of synthetic training data and generalizes to real-life acquisitions. Combining the efficiency of deep learning methods and the scalability of energy-based models, our approach outperforms both learning-based and non-learning-based reconstruction algorithms on two publicly available reconstruction benchmarks. Our code and data are available at https://github.com/raphaelsulzer/dgnn.

Submitted 1 February, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: The presentation of this work at SGP 2021 is available at https://youtu.be/KIrCDGhS10o

Report number: 40-Issue 5

Journal ref: Computer Graphics Forum 2021

arXiv:2010.04642

Torch-Points3D: A Modular Multi-Task Framework for Reproducible Deep Learning on 3D Point Clouds

Authors: Thomas Chaton, Nicolas Chaulet, Sofiane Horache, Loic Landrieu

Abstract: We introduce Torch-Points3D, an open-source framework designed to facilitate the use of deep networks on 3D data. Its modular design, efficient implementation, and user-friendly interfaces make it a relevant tool for research and productization alike. Beyond multiple quality-of-life features, our goal is to standardize a higher level of transparency and reproducibility in 3D deep learning research, and to lower its barrier to entry. In this paper, we present the design principles of Torch-Points3D, as well as extensive benchmarks of multiple state-of-the-art algorithms and inference schemes across several datasets and tasks. The modularity of Torch-Points3D allows us to design fair and rigorous experimental protocols in which all methods are evaluated under the same conditions. The Torch-Points3D repository: https://github.com/nicolas-chaulet/torch-points3d

Submitted 9 October, 2020; originally announced October 2020.

MSC Class: 68T07; 68T45 ACM Class: I.4.8; I.4.6; I.2.6; I.2.10

arXiv:2007.03047

Leveraging Class Hierarchies with Metric-Guided Prototype Learning

Authors: Vivien Sainte Fare Garnot, Loic Landrieu

Abstract: In many classification tasks, the set of target classes can be organized into a hierarchy. This structure induces a semantic distance between classes, and can be summarised in the form of a cost matrix, which defines a finite metric on the class set. In this paper, we propose to model the hierarchical class structure by integrating this metric in the supervision of a prototypical network. Our method relies on jointly learning a feature-extracting network and a set of class prototypes whose relative arrangement in the embedding space follows a hierarchical metric. We show that this approach allows for a consistent improvement of the error rate weighted by the cost matrix when compared to traditional methods and other prototype-based strategies. Furthermore, when the induced metric contains insight into the data structure, our method improves the overall precision as well. Experiments on four different public datasets, from agricultural time series classification to depth image semantic segmentation, validate our approach.

Submitted 29 November, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: Published at BMVC2021
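As an illustration of how a class hierarchy induces a finite metric and a cost-weighted error rate, the sketch below assumes that the cost between two classes is the number of edges on the tree path joining them; this is one common choice and not necessarily the one used in the paper. The helper names and the toy crop hierarchy are hypothetical, and the prototype learning itself is not shown.

```python
from typing import Dict, List, Sequence


def tree_cost_matrix(parent: Dict[str, str], classes: Sequence[str]) -> List[List[int]]:
    """Cost D[i][j] = number of tree edges between leaf classes i and j (a finite metric)."""

    def ancestors(node: str) -> List[str]:
        chain = [node]
        while node in parent:  # the root has no parent entry
            node = parent[node]
            chain.append(node)
        return chain

    paths = [ancestors(c) for c in classes]
    cost = [[0] * len(classes) for _ in classes]
    for i, path_a in enumerate(paths):
        for j, path_b in enumerate(paths):
            # Lowest common ancestor: first ancestor of class i that is also an ancestor of class j.
            lca = next(node for node in path_a if node in path_b)
            cost[i][j] = path_a.index(lca) + path_b.index(lca)
    return cost


def average_cost(cost: Sequence[Sequence[int]], y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Error rate weighted by the cost matrix: mean of cost[true][pred] over samples."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy hierarchy: two cereals under one parent, one legume under another.
parent = {"wheat": "cereal", "barley": "cereal", "pea": "legume", "cereal": "root", "legume": "root"}
classes = ["wheat", "barley", "pea"]
D = tree_cost_matrix(parent, classes)
print(D)                                      # → [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
print(average_cost(D, [0, 0, 2], [1, 2, 2]))  # → 2.0
```

Under this cost, confusing wheat with barley costs half as much as confusing wheat with pea, a distinction that the plain error rate (2/3 in both toy cases) ignores.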

arXiv:2007.00586

Lightweight Temporal Self-Attention for Classifying Satellite Image Time Series

Authors: Vivien Sainte Fare Garnot, Loic Landrieu

Abstract: The increasing accessibility and precision of Earth observation satellite data offer considerable opportunities for industrial and state actors alike. However, this calls for efficient methods able to process time series on a global scale. Building on recent work employing multi-headed self-attention mechanisms to classify remote sensing time sequences, we propose a modification of the Temporal Attention Encoder. In our network, the channels of the temporal inputs are distributed among several compact attention heads operating in parallel. Each head extracts highly specialized temporal features, which are in turn concatenated into a single representation. Our approach outperforms other state-of-the-art time series classification algorithms on an open-access satellite image dataset, while using significantly fewer parameters and with reduced computational complexity.

Submitted 8 July, 2020; v1 submitted 1 July, 2020; originally announced July 2020.
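The channel-splitting idea can be sketched as follows, under simplifying assumptions: each head owns a contiguous group of C/H input channels and a single learned query, computes keys with its own linear projection of that group, and pools the group over time with softmax attention; the head outputs are then concatenated. This is a hypothetical, dependency-free illustration in the spirit of the abstract, not the authors' layer, which may differ in how keys are computed, in normalization, and in any output layers.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lightweight_temporal_attention(
    x: Sequence[Sequence[float]],                   # T time steps x C channels
    queries: Sequence[Sequence[float]],             # one learned query per head, length d_k
    key_proj: Sequence[Sequence[Sequence[float]]],  # per head: d_k x (C / H) projection
) -> List[float]:
    """Hypothetical sketch: split C channels among H heads, attend over time, concatenate."""
    n_heads, n_channels = len(queries), len(x[0])
    if n_channels % n_heads != 0:
        raise ValueError("channels must divide evenly among heads")
    g = n_channels // n_heads
    out: List[float] = []
    for h in range(n_heads):
        group = [list(row[h * g:(h + 1) * g]) for row in x]  # this head's T x g channel slice
        d_k = len(queries[h])
        # Keys: per-time-step linear projection of the head's channel group.
        keys = [[sum(w * v for w, v in zip(key_proj[h][k], step)) for k in range(d_k)] for step in group]
        scores = [sum(q * kv for q, kv in zip(queries[h], key)) / math.sqrt(d_k) for key in keys]
        attn = softmax(scores)
        # Values are the channel group itself, pooled over time with the attention weights.
        out.extend(sum(a * step[c] for a, step in zip(attn, group)) for c in range(g))
    return out  # concatenation of the H head outputs, length C


x = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]  # T = 2, C = 4
zero_proj = [[[0.0, 0.0]], [[0.0, 0.0]]]          # H = 2 heads, d_k = 1
print(lightweight_temporal_attention(x, [[1.0], [1.0]], zero_proj))  # → [2.0, 3.0, 4.0, 5.0]
```

With zero key projections every time step gets the same weight, so each head returns the temporal mean of its channels; learned projections let each small head specialize on different dates.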

arXiv:1911.07757

Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention

Authors: Vivien Sainte Fare Garnot, Loic Landrieu, Sebastien Giordano, Nesrine Chehata

Abstract: Satellite image time series, bolstered by their growing availability, are at the forefront of an extensive effort towards automated Earth monitoring by international institutions. In particular, large-scale control of agricultural parcels is an issue of major political and economic importance. In this regard, hybrid convolutional-recurrent neural architectures have shown promising results for the automated classification of satellite image time series. We propose an alternative approach in which the convolutional layers are advantageously replaced with encoders operating on unordered sets of pixels, to exploit the typically coarse resolution of publicly available satellite images. We also propose to extract temporal features using a bespoke neural architecture based on self-attention instead of recurrent networks. We demonstrate experimentally that our method not only outperforms previous state-of-the-art approaches in terms of precision, but also significantly decreases processing time and memory requirements. Lastly, we release a large open-access annotated dataset as a benchmark for future work on satellite image time series.

Submitted 18 November, 2019; originally announced November 2019.
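A minimal sketch of the pixel-set idea: a shared layer is applied to each pixel independently, and the results are pooled with order-independent statistics, so the embedding does not depend on the order or spatial arrangement of the sampled pixels. The single linear+ReLU layer, the mean/std pooling, and all names here are assumptions for illustration; any additional features or layers of the authors' encoder are omitted.

```python
import math
import random
from typing import List, Sequence


def shared_layer(pixel: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """One linear layer + ReLU, applied identically to every pixel of the set."""
    return [max(0.0, sum(w * x for w, x in zip(row, pixel)) + bias) for row, bias in zip(W, b)]


def pixel_set_encode(pixels: Sequence[Sequence[float]], W, b) -> List[float]:
    """Embed an unordered set of pixel spectra: shared layer, then mean and std pooling."""
    feats = [shared_layer(p, W, b) for p in pixels]
    n, d = len(feats), len(feats[0])
    mean = [sum(f[j] for f in feats) / n for j in range(d)]
    std = [math.sqrt(sum((f[j] - mean[j]) ** 2 for f in feats) / n) for j in range(d)]
    return mean + std  # pooled statistics do not depend on pixel order


# Hypothetical parcel: 4 pixels with 2 spectral bands each, and a 3-unit layer.
pixels = [[1.0, 2.0], [3.0, 0.0], [0.5, 0.5], [2.0, 1.0]]
W = [[1.0, -1.0], [0.5, 0.5], [0.0, 1.0]]
b = [0.0, 0.1, -0.2]
shuffled = pixels[:]
random.Random(0).shuffle(shuffled)
a, c = pixel_set_encode(pixels, W, b), pixel_set_encode(shuffled, W, b)
print(all(abs(x - y) < 1e-12 for x, y in zip(a, c)))  # → True
```

The comparison uses a tolerance because summing floats in a different order can change the last bits; mathematically the pooled embedding is permutation-invariant.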

arXiv:1905.04014

Supervized Segmentation with Graph-Structured Deep Metric Learning

Authors: Loic Landrieu, Mohamed Boussaha

Abstract: We present a fully supervised method for learning to segment data structured by an adjacency graph. We introduce the graph-structured contrastive loss, a loss function structured by a ground truth segmentation. It promotes learning vertex embeddings that are homogeneous within desired segments and have high contrast at their interfaces. Thus, computing a piecewise-constant approximation of such embeddings produces a graph partition close to the objective segmentation. This loss is fully backpropagable, which allows us to learn vertex embeddings with deep learning algorithms. We evaluate our method on a 3D point cloud oversegmentation task, defining a new state of the art by a large margin. These results are based on the published work of Landrieu and Boussaha 2019.

Submitted 24 June, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1904.02113
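One way to write a loss with the properties described above is sketched below: for every edge of the adjacency graph, intra-segment pairs are penalized by their squared embedding distance and inter-segment pairs by a squared hinge on a margin. This generic contrastive form, the `margin` parameter, and the function name are assumptions; the paper's exact penalty functions may differ.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def graph_contrastive_loss(
    emb: Dict[Hashable, Sequence[float]],
    edges: Sequence[Tuple[Hashable, Hashable]],
    segment: Dict[Hashable, int],
    margin: float = 1.0,
) -> float:
    """Average over adjacency edges: pull intra-segment pairs together, push inter-segment pairs apart."""
    total = 0.0
    for u, v in edges:
        dist = math.dist(emb[u], emb[v])
        if segment[u] == segment[v]:
            total += dist ** 2                     # homogeneity inside a segment
        else:
            total += max(0.0, margin - dist) ** 2  # contrast at segment interfaces
    return total / len(edges)


# Hypothetical path graph 0-1-2-3 split into two ground-truth segments.
edges = [(0, 1), (1, 2), (2, 3)]
seg = {0: 0, 1: 0, 2: 1, 3: 1}
good = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [2.0, 0.0], 3: [2.0, 0.0]}
bad = {i: [0.0, 0.0] for i in range(4)}
print(graph_contrastive_loss(good, edges, seg))  # → 0.0
print(graph_contrastive_loss(bad, edges, seg))   # 1/3: only the interface edge is penalized
```

Only edges of the adjacency graph contribute, so the loss is driven by the segment boundaries that the subsequent piecewise-constant approximation is meant to recover.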

arXiv:1905.02316

Parallel Cut Pursuit For Minimization of the Graph Total Variation

Authors: Hugo Raguet, Loic Landrieu

Abstract: We present a parallel version of the cut-pursuit algorithm for minimizing functionals involving the graph total variation. We show that the decomposition of the iterate into constant connected components, which is at the center of this method, allows for the seamless parallelization of the otherwise costly graph-cut-based refinement stage. We demonstrate experimentally the efficiency of our method in a wide variety of settings, from simple denoising on huge graphs to more complex inverse problems with nondifferentiable penalties. We argue that our approach combines the efficiency of graph-cut-based optimizers with the versatility and ease of parallelization of traditional proximal

Submitted 7 May, 2019; originally announced May 2019.
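For reference, the graph total variation of a vertex signal x with edge weights w_uv is the sum over edges of w_uv |x_u - x_v|. The sketch below computes it and also extracts the constant connected components of an iterate (vertices linked by edges along which x does not change), which is the decomposition the abstract credits with enabling parallel refinement. Function names are hypothetical, the cut-pursuit refinement itself is not shown, and exact float equality is used on the assumption that vertices in one component share exactly the same value.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def graph_total_variation(x: Sequence[float], edges: Sequence[Edge]) -> float:
    """TV(x) = sum over edges of w_uv * |x_u - x_v|."""
    return sum(w * abs(x[u] - x[v]) for u, v, w in edges)


def constant_components(x: Sequence[float], edges: Sequence[Edge]) -> List[int]:
    """Label connected components of the subgraph that keeps only edges with x_u == x_v."""
    parent = list(range(len(x)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for u, v, _ in edges:
        if x[u] == x[v]:
            parent[find(u)] = find(v)
    return [find(i) for i in range(len(x))]


x = [1.0, 1.0, 3.0, 3.0]                             # a piecewise-constant iterate on a chain
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
print(graph_total_variation(x, edges))  # → 2.0
print(constant_components(x, edges))    # → [1, 1, 3, 3]
```

Only the edge between the two components contributes to the total variation, and each component could then be handed to a separate worker.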

arXiv:1904.02113

Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning

Authors: Loic Landrieu, Mohamed Boussaha

Abstract: We propose a new supervised learning framework for oversegmenting 3D point clouds into superpoints. We cast this problem as learning deep embeddings of the local geometry and radiometry of 3D points, such that object borders present high contrast. The embeddings are computed using a lightweight neural network operating on the points' local neighborhood. Finally, we formulate point cloud oversegmentation as a graph partition problem with respect to the learned embeddings. This new approach allows us to set a new state of the art in point cloud oversegmentation by a significant margin, on a dense indoor dataset (S3DIS) and a sparse outdoor one (vKITTI). Our best solution requires over five times fewer superpoints than previously published methods to reach similar performance on S3DIS. Furthermore, we show that our framework can be used to improve superpoint-based semantic segmentation algorithms, setting a new state of the art for this task as well.

Submitted 3 April, 2019; originally announced April 2019.

Comments: CVPR2019
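The final partition step can be illustrated with a deliberately simplified stand-in: adjacent points are merged into the same superpoint when their learned embeddings are close, so high-contrast object borders become cuts. The distance threshold `tau`, the flood-fill strategy, and the function name are assumptions; the paper formulates this step as a graph partition problem with respect to the embeddings rather than as a simple threshold.

```python
import math
from collections import deque
from typing import Dict, List, Sequence


def partition_by_contrast(
    emb: Sequence[Sequence[float]],
    neighbors: Dict[int, List[int]],
    tau: float,
) -> List[int]:
    """Flood-fill superpoints: adjacent points join the same superpoint when their
    embeddings are closer than tau, so high-contrast borders become cuts."""
    label = [-1] * len(emb)
    current = 0
    for seed in range(len(emb)):
        if label[seed] != -1:
            continue  # already assigned to a superpoint
        label[seed] = current
        queue = deque([seed])
        while queue:
            u = queue.popleft()
            for v in neighbors.get(u, []):
                if label[v] == -1 and math.dist(emb[u], emb[v]) < tau:
                    label[v] = current
                    queue.append(v)
        current += 1
    return label


# Hypothetical 1D embeddings for 4 points on a line: a sharp jump between points 1 and 2.
emb = [[0.0], [0.1], [5.0], [5.1]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(partition_by_contrast(emb, neighbors, tau=1.0))  # → [0, 0, 1, 1]
```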

arXiv:1901.10503

Time-Space tradeoff in deep learning models for crop classification on satellite multi-spectral image time series

Authors: Vivien Sainte Fare Garnot, Loic Landrieu, Sebastien Giordano, Nesrine Chehata

Abstract: In this article, we investigate several structured deep learning models for crop type classification on multi-spectral time series. In particular, our aim is to assess the respective importance of spatial and temporal structures in such data. With this objective, we consider several designs of convolutional, recurrent, and hybrid neural networks, and assess their performance on a large dataset of freely available Sentinel-2 imagery. We find that the best-performing approaches are hybrid configurations for which most of the parameters (up to 90%) are allocated to modeling the temporal structure of the data. Our results thus constitute a set of guidelines for the design of bespoke deep learning models for crop type classification.

Submitted 29 January, 2019; originally announced January 2019.

Comments: Currently under review

Journal ref: International Geoscience and Remote Sensing Symposium 2019

arXiv:1802.04383

Cut-Pursuit Algorithm for Regularizing Nonsmooth Functionals with Graph Total Variation

Authors: Hugo Raguet, Loïc Landrieu

Abstract: We present an extension of the cut-pursuit algorithm, introduced by Landrieu and Obozinski (2017), to the graph total-variation regularization of functions with a separable nondifferentiable part. We propose a modified algorithmic scheme as well as adapted proofs of convergence. We also present a heuristic approach for handling the cases in which the values associated with each vertex of the graph are multidimensional. The performance of our algorithm, which we demonstrate on difficult, ill-conditioned large-scale inverse and learning problems, is such that it may in practice extend the scope of application of total-variation regularization.

Submitted 19 May, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

MSC Class: 90C25; 90C06; 94A08; 68T45
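In assumed notation (a graph G = (V, E) with nonnegative edge weights w_uv, a differentiable term f, and separable, possibly nondifferentiable per-vertex terms g_v), the class of problems described above can be written as follows; the precise hypotheses on f and g_v are those stated in the paper, not here.

```latex
\min_{x \in \mathbb{R}^{V}} \; F(x) \;=\; f(x) \;+\; \sum_{v \in V} g_v(x_v) \;+\; \sum_{(u,v) \in E} w_{uv}\,\lvert x_u - x_v \rvert
```

For instance, g_v(x_v) = λ|x_v| adds an ℓ1 penalty; for multidimensional vertex values, the absolute difference would typically be replaced by a norm of x_u - x_v.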

arXiv:1711.09869

Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs

Authors: Loic Landrieu, Martin Simonovsky

Abstract: We propose a novel deep learning-based framework to tackle the challenge of semantic segmentation of large-scale point clouds with millions of points. We argue that the organization of 3D point clouds can be efficiently captured by a structure called superpoint graph (SPG), derived from a partition of the scanned scene into geometrically homogeneous elements. SPGs offer a compact yet rich representation of contextual relationships between object parts, which is then exploited by a graph convolutional network. Our framework sets a new state of the art for segmenting outdoor LiDAR scans (+11.9 and +8.8 mIoU points on the two Semantic3D test sets), as well as indoor scans (+12.4 mIoU points on the S3DIS dataset).

Submitted 28 March, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

Comments: Accepted to CVPR 2018; camera ready version. Major updates to [v1]: Improved performance on S3DIS (from +5.8 to +12.4 mIoU) and extended ablation study in Appendix
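Given a partition into superpoints, a superpoint graph can be sketched by connecting every pair of superpoints that contain adjacent points. The sketch below assumes the point adjacency is given as an edge list and records, for each superedge, how many point pairs it spans; the actual SPG construction and its edge features are described in the paper and are not reproduced here. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def superpoint_graph(
    labels: Sequence[int],
    point_edges: Sequence[Tuple[int, int]],
) -> Dict[Tuple[int, int], int]:
    """Superedges between touching superpoints, with the number of adjacent point pairs."""
    superedges = Counter()
    for u, v in point_edges:
        a, b = labels[u], labels[v]
        if a != b:  # edges inside a superpoint do not create superedges
            superedges[(min(a, b), max(a, b))] += 1
    return dict(superedges)


# Hypothetical scene: 5 points partitioned into 3 geometrically homogeneous superpoints.
labels = [0, 0, 1, 1, 2]
point_edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
print(superpoint_graph(labels, point_edges))  # → {(0, 1): 2, (1, 2): 1}
```

The resulting graph has one node per superpoint rather than per point, which is what keeps the subsequent graph convolution tractable on scans with millions of points.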

Showing 1–31 of 31 results for author: Landrieu, L