subscribe to arXiv mailings

Research Experience of an Undergraduate Student in Computer Vision and Robotics

Authors: Ayush V. Gowda, Juan D. Yepes, Daniel Raviv

Abstract: This paper focuses on the educational journey of a computer engineering undergraduate student venturing into the domain of computer vision and robotics. It explores how optical flow and its applications can be used to detect moving objects when a camera undergoes translational motion, highlighting the challenges encountered and the strategies used to overcome them. Furthermore, the paper discusses… ▽ More This paper focuses on the educational journey of a computer engineering undergraduate student venturing into the domain of computer vision and robotics. It explores how optical flow and its applications can be used to detect moving objects when a camera undergoes translational motion, highlighting the challenges encountered and the strategies used to overcome them. Furthermore, the paper discusses not only the technical skills acquired by the student but also interpersonal skills as related to teamwork and diversity. In this paper, we detail the learning process, including the acquisition of technical and problem-solving skills, as well as out-of-the-box thinking. △ Less

Submitted 13 July, 2024; originally announced July 2024.

Comments: 20 Pages

arXiv:2407.03794 [pdf, other]

CardioSpectrum: Comprehensive Myocardium Motion Analysis with 3D Deep Learning and Geometric Insights

Authors: Shahar Zuler, Shai Tejman-Yarden, Dan Raviv

Abstract: The ability to map left ventricle (LV) myocardial motion using computed tomography angiography (CTA) is essential to diagnosing cardiovascular conditions and guiding interventional procedures. Due to their inherent locality, conventional neural networks typically have difficulty predicting subtle tangential movements, which considerably lessens the level of precision at which myocardium three-dime… ▽ More The ability to map left ventricle (LV) myocardial motion using computed tomography angiography (CTA) is essential to diagnosing cardiovascular conditions and guiding interventional procedures. Due to their inherent locality, conventional neural networks typically have difficulty predicting subtle tangential movements, which considerably lessens the level of precision at which myocardium three-dimensional (3D) mapping can be performed. Using 3D optical flow techniques and Functional Maps (FMs), we present a comprehensive approach to address this problem. FMs are known for their capacity to capture global geometric features, thus providing a fuller understanding of 3D geometry. As an alternative to traditional segmentation-based priors, we employ surface-based two-dimensional (2D) constraints derived from spectral correspondence methods. Our 3D deep learning architecture, based on the ARFlow model, is optimized to handle complex 3D motion analysis tasks. By incorporating FMs, we can capture the subtle tangential movements of the myocardium surface precisely, hence significantly improving the accuracy of 3D mapping of the myocardium. The experimental results confirm the effectiveness of this method in enhancing myocardium motion analysis. This approach can contribute to improving cardiovascular diagnosis and treatment. Our code and additional resources are available at: https://shaharzuler.github.io/CardioSpectrumPage △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: This paper has been early accepted to MICCAI 2024

ACM Class: I.2.10; I.4.9

arXiv:2406.01040 [pdf, other]

Synthetic Data Generation for 3D Myocardium Deformation Analysis

Authors: Shahar Zuler, Dan Raviv

Abstract: Accurate analysis of 3D myocardium deformation using high-resolution computerized tomography (CT) datasets with ground truth (GT) annotations is crucial for advancing cardiovascular imaging research. However, the scarcity of such datasets poses a significant challenge for developing robust myocardium deformation analysis models. To address this, we propose a novel approach to synthetic data genera… ▽ More Accurate analysis of 3D myocardium deformation using high-resolution computerized tomography (CT) datasets with ground truth (GT) annotations is crucial for advancing cardiovascular imaging research. However, the scarcity of such datasets poses a significant challenge for developing robust myocardium deformation analysis models. To address this, we propose a novel approach to synthetic data generation for enriching cardiovascular imaging datasets. We introduce a synthetic data generation method, enriched with crucial GT 3D optical flow annotations. We outline the data preparation from a cardiac four-dimensional (4D) CT scan, selection of parameters, and the subsequent creation of synthetic data from the same or other sources of 3D cardiac CT data for training. Our work contributes to overcoming the limitations imposed by the scarcity of high-resolution CT datasets with precise annotations, thereby facilitating the development of accurate and reliable myocardium deformation analysis algorithms for clinical applications and diagnostics. Our code is available at: http://www.github.com/shaharzuler/cardio_volume_skewer △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2311.11130 [pdf, other]

Invariant-based Mapping of Space During General Motion of an Observer

Authors: Juan D. Yepes, Daniel Raviv

Abstract: This paper explores visual motion-based invariants, resulting in a new instantaneous domain where: a) the stationary environment is perceived as unchanged, even as the 2D images undergo continuous changes due to camera motion, b) obstacles can be detected and potentially avoided in specific subspaces, and c) moving objects can potentially be detected. To achieve this, we make use of nonlinear func… ▽ More This paper explores visual motion-based invariants, resulting in a new instantaneous domain where: a) the stationary environment is perceived as unchanged, even as the 2D images undergo continuous changes due to camera motion, b) obstacles can be detected and potentially avoided in specific subspaces, and c) moving objects can potentially be detected. To achieve this, we make use of nonlinear functions derived from measurable optical flow, which are linked to geometric 3D invariants. We present simulations involving a camera that translates and rotates relative to a 3D object, capturing snapshots of the camera projected images. We show that the object appears unchanged in the new domain over time. We process real data from the KITTI dataset and demonstrate how to segment space to identify free navigational regions and detect obstacles within a predetermined subspace. Additionally, we present preliminary results, based on the KITTI dataset, on the identification and segmentation of moving objects, as well as the visualization of shape constancy. This representation is straightforward, relying on functions for the simple de-rotation of optical flow. This representation only requires a single camera, it is pixel-based, making it suitable for parallel processing, and it eliminates the necessity for 3D reconstruction techniques. △ Less

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2310.09632 [pdf, other]

Time-based Mapping of Space Using Visual Motion Invariants

Authors: Juan D. Yepes, Daniel Raviv

Abstract: This paper focuses on visual motion-based invariants that result in a representation of 3D points in which the stationary environment remains invariant, ensuring shape constancy. This is achieved even as the images undergo constant change due to camera motion. Nonlinear functions of measurable optical flow, which are related to geometric 3D invariants, are utilized to create a novel representation… ▽ More This paper focuses on visual motion-based invariants that result in a representation of 3D points in which the stationary environment remains invariant, ensuring shape constancy. This is achieved even as the images undergo constant change due to camera motion. Nonlinear functions of measurable optical flow, which are related to geometric 3D invariants, are utilized to create a novel representation. We refer to the resulting optical flow-based invariants as 'Time-Clearance' and the well-known 'Time-to-Contact' (TTC). Since these invariants remain constant over time, it becomes straightforward to detect moving points that do not adhere to the expected constancy. We present simulations of a camera moving relative to a 3D object, snapshots of its projected images captured by a rectilinearly moving camera, and the object as it appears unchanged in the new domain over time. In addition, Unity-based simulations demonstrate color-coded transformations of a projected 3D scene, illustrating how moving objects can be readily identified. This representation is straightforward, relying on simple optical flow functions. It requires only one camera, and there is no need to determine the magnitude of the camera's velocity vector. Furthermore, the representation is pixel-based, making it suitable for parallel processing. △ Less

Submitted 14 October, 2023; originally announced October 2023.

Comments: 3 pages

arXiv:2310.09627 [pdf, other]

Detecting Moving Objects Using a Novel Optical-Flow-Based Range-Independent Invariant

Authors: Daniel Raviv, Juan D. Yepes, Ayush Gowda

Abstract: This paper focuses on a novel approach for detecting moving objects during camera motion. We present an optical-flow-based transformation that yields a consistent 2D invariant image output regardless of time instants, range of points in 3D, and the speed of the camera. In other words, this transformation generates a lookup image that remains invariant despite the changing projection of the 3D scen… ▽ More This paper focuses on a novel approach for detecting moving objects during camera motion. We present an optical-flow-based transformation that yields a consistent 2D invariant image output regardless of time instants, range of points in 3D, and the speed of the camera. In other words, this transformation generates a lookup image that remains invariant despite the changing projection of the 3D scene and camera motion. In the new domain, projections of 3D points that deviate from the values of the predefined lookup image can be clearly identified as moving relative to the stationary 3D environment, making them seamlessly detectable. The method does not require prior knowledge of the direction of motion or speed of the camera, nor does it necessitate 3D point range information. It is well-suited for real-time parallel processing, rendering it highly practical for implementation. We have validated the effectiveness of the new domain through simulations and experiments, demonstrating its robustness in scenarios involving rectilinear camera motion, both in simulations and with real-world data. This approach introduces new ways for moving objects detection during camera motion, and also lays the foundation for future research in the context of moving object detection during six-degrees-of-freedom camera motion. △ Less

Submitted 14 October, 2023; originally announced October 2023.

Comments: 3 pages

arXiv:2211.14198 [pdf, other]

Temporal Super-Resolution using Multi-Channel Illumination Source

Authors: Khen Cohen, Dan Raviv, David Mendlovic

Abstract: While sensing in high temporal resolution is necessary for wide range of application, it is still limited nowadays due to cameras sampling rate. In this work we try to increase the temporal resolution beyond the Nyquist frequency, which is limited by the sampling rate of the sensor. This work establishes a novel approach for Temporal-Super-Resolution that uses the object reflecting properties from… ▽ More While sensing in high temporal resolution is necessary for wide range of application, it is still limited nowadays due to cameras sampling rate. In this work we try to increase the temporal resolution beyond the Nyquist frequency, which is limited by the sampling rate of the sensor. This work establishes a novel approach for Temporal-Super-Resolution that uses the object reflecting properties from an active illumination source to go beyond this limit. Following theoretical derivation, we demonstrate how we can increase the temporal spectral detected range by a factor of 6 and possibly even more. Our method is supported by simulations and experiments and we demonstrate as an application, how we use our method to improve in about factor two the accuracy of object motion estimation. △ Less

Submitted 25 November, 2022; originally announced November 2022.

arXiv:2211.06695 [pdf, other]

Illumination-Based Color Reconstruction for the Dynamic Vision Sensor

Authors: Khen Cohen, Omer Hershko, Homer Levy, David Mendlovic, Dan Raviv

Abstract: This work demonstrates a novel, state of the art method to reconstruct colored images via the Dynamic Vision Sensor (DVS). The DVS is an image sensor that indicates only a binary change in brightness, with no information about the captured wavelength (color), or intensity level. We present a novel method to reconstruct a full spatial resolution colored image with the DVS and an active colored ligh… ▽ More This work demonstrates a novel, state of the art method to reconstruct colored images via the Dynamic Vision Sensor (DVS). The DVS is an image sensor that indicates only a binary change in brightness, with no information about the captured wavelength (color), or intensity level. We present a novel method to reconstruct a full spatial resolution colored image with the DVS and an active colored light source. We analyze the DVS response and present two reconstruction algorithms: Linear based and Convolutional Neural Network Based. In addition, we demonstrate our algorithm robustness to changes in environmental conditions such as illumination and distance. Finally, comparing with previous works, we show how we reach the state of the art results. △ Less

Submitted 12 November, 2022; originally announced November 2022.

arXiv:2210.04108 [pdf, other]

Visual Looming from Motion Field and Surface Normals

Authors: Juan Yepes, Daniel Raviv

Abstract: Looming, traditionally defined as the relative expansion of objects in the observer's retina, is a fundamental visual cue for perception of threat and can be used to accomplish collision free navigation. In this paper we derive novel solutions for obtaining visual looming quantitatively from the 2D motion field resulting from a six-degree-of-freedom motion of an observer relative to a local surfac… ▽ More Looming, traditionally defined as the relative expansion of objects in the observer's retina, is a fundamental visual cue for perception of threat and can be used to accomplish collision free navigation. In this paper we derive novel solutions for obtaining visual looming quantitatively from the 2D motion field resulting from a six-degree-of-freedom motion of an observer relative to a local surface in 3D. We also show the relationship between visual looming and surface normals. We present novel methods to estimate visual looming from spatial derivatives of optical flow without the need for knowing range. Simulation results show that estimations of looming are very close to ground truth looming under some assumptions of surface orientations. In addition, we present results of visual looming using real data from the KITTI dataset. Advantages and limitations of the methods are discussed as well. △ Less

Submitted 20 October, 2022; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: Includes supplemental material

arXiv:2202.10972 [pdf, other]

Estimation of Looming from LiDAR

Authors: Juan D. Yepes, Daniel Raviv

Abstract: Looming, traditionally defined as the relative expansion of objects in the observer's retina, is a fundamental visual cue for perception of threat and can be used to accomplish collision free navigation. The measurement of the looming cue is not only limited to vision, and can also be obtained from range sensors like LiDAR (Light Detection and Ranging). In this article we present two methods that… ▽ More Looming, traditionally defined as the relative expansion of objects in the observer's retina, is a fundamental visual cue for perception of threat and can be used to accomplish collision free navigation. The measurement of the looming cue is not only limited to vision, and can also be obtained from range sensors like LiDAR (Light Detection and Ranging). In this article we present two methods that process raw LiDAR data to estimate the looming cue. Using looming values we show how to obtain threat zones for collision avoidance tasks. The methods are general enough to be suitable for any six-degree-of-freedom motion and can be implemented in real-time without the need for fine matching, point-cloud registration, object classification or object segmentation. Quantitative results using the KITTI dataset shows advantages and limitations of the methods. △ Less

Submitted 15 March, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

arXiv:2201.11379 [pdf, other]

Deep Confidence Guided Distance for 3D Partial Shape Registration

Authors: Dvir Ginzburg, Dan Raviv

Abstract: We present a novel non-iterative learnable method for partial-to-partial 3D shape registration. The partial alignment task is extremely complex, as it jointly tries to match between points and identify which points do not appear in the corresponding shape, causing the solution to be non-unique and ill-posed in most cases. Until now, two principal methodologies have been suggested to solve this p… ▽ More We present a novel non-iterative learnable method for partial-to-partial 3D shape registration. The partial alignment task is extremely complex, as it jointly tries to match between points and identify which points do not appear in the corresponding shape, causing the solution to be non-unique and ill-posed in most cases. Until now, two principal methodologies have been suggested to solve this problem: sample a subset of points that are likely to have correspondences or perform soft alignment between the point clouds and try to avoid a match to an occluded part. These heuristics work when the partiality is mild or when the transformation is small but fails for severe occlusions or when outliers are present. We present a unique approach named Confidence Guided Distance Network (CGD-net), where we fuse learnable similarity between point embeddings and spatial distance between point clouds, inducing an optimized solution for the overlapping points while ignoring parts that only appear in one of the shapes. The point feature generation is done by a self-supervised architecture that repels far points to have different embeddings, therefore succeeds to align partial views of shapes, even with excessive internal symmetries or acute rotations. We compare our network to recently presented learning-based and axiomatic methods and report a fundamental boost in performance. △ Less

Submitted 27 January, 2022; originally announced January 2022.

arXiv:2201.09693 [pdf, other]

Shape-consistent Generative Adversarial Networks for multi-modal Medical segmentation maps

Authors: Leo Segre, Or Hirschorn, Dvir Ginzburg, Dan Raviv

Abstract: Image translation across domains for unpaired datasets has gained interest and great improvement lately. In medical imaging, there are multiple imaging modalities, with very different characteristics. Our goal is to use cross-modality adaptation between CT and MRI whole cardiac scans for semantic segmentation. We present a segmentation network using synthesised cardiac volumes for extremely limite… ▽ More Image translation across domains for unpaired datasets has gained interest and great improvement lately. In medical imaging, there are multiple imaging modalities, with very different characteristics. Our goal is to use cross-modality adaptation between CT and MRI whole cardiac scans for semantic segmentation. We present a segmentation network using synthesised cardiac volumes for extremely limited datasets. Our solution is based on a 3D cross-modality generative adversarial network to share information between modalities and generate synthesized data using unpaired datasets. Our network utilizes semantic segmentation to improve generator shape consistency, thus creating more realistic synthesised volumes to be used when re-training the segmentation network. We show that improved segmentation can be achieved on small datasets when using spatial augmentations to improve a generative adversarial network. These augmentations improve the generator capabilities, thus enhancing the performance of the Segmentor. Using only 16 CT and 16 MRI cardiovascular volumes, improved results are shown over other segmentation methods while using the suggested architecture. △ Less

Submitted 4 February, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2112.04489 [pdf, other]

Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning

Authors: Alessa Hering, Lasse Hansen, Tony C. W. Mok, Albert C. S. Chung, Hanna Siebert, Stephanie Häger, Annkristin Lange, Sven Kuckertz, Stefan Heldmann, Wei Shao, Sulaiman Vesal, Mirabela Rusu, Geoffrey Sonn, Théo Estienne, Maria Vakalopoulou, Luyi Han, Yunzhi Huang, Pew-Thian Yap, Mikael Brudfors, Yaël Balbastre, Samuel Joutard, Marc Modat, Gal Lifshitz, Dan Raviv, Jinxin Lv , et al. (28 additional authors not shown)

Abstract: Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing… ▽ More Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing approaches. The Learn2Reg challenge addresses these limitations by providing a multi-task medical image registration data set for comprehensive characterisation of deformable registration algorithms. A continuous evaluation will be possible at https://learn2reg.grand-challenge.org. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation. We established an easily accessible framework for training and validation of 3D registration methods, which enabled the compilation of results of over 65 individual method submissions from more than 20 unique teams. We used a complementary set of metrics, including robustness, accuracy, plausibility, and runtime, enabling unique insight into the current state-of-the-art of medical image registration. This paper describes datasets, tasks, evaluation methods and results of the challenge, as well as results of further analysis of transferability to new datasets, the importance of label supervision, and resulting bias. While no single approach worked best across all tasks, many methodological aspects could be identified that push the performance of medical image registration to new state-of-the-art performance. Furthermore, we demystified the common belief that conventional registration methods have to be much slower than deep-learning-based methods. △ Less

Submitted 7 October, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

arXiv:2110.08636 [pdf, other]

doi 10.1109/3DV53792.2021.00141

DPC: Unsupervised Deep Point Correspondence via Cross and Self Construction

Authors: Itai Lang, Dvir Ginzburg, Shai Avidan, Dan Raviv

Abstract: We present a new method for real-time non-rigid dense correspondence between point clouds based on structured shape construction. Our method, termed Deep Point Correspondence (DPC), requires a fraction of the training data compared to previous techniques and presents better generalization capabilities. Until now, two main approaches have been suggested for the dense correspondence problem. The fir… ▽ More We present a new method for real-time non-rigid dense correspondence between point clouds based on structured shape construction. Our method, termed Deep Point Correspondence (DPC), requires a fraction of the training data compared to previous techniques and presents better generalization capabilities. Until now, two main approaches have been suggested for the dense correspondence problem. The first is a spectral-based approach that obtains great results on synthetic datasets but requires mesh connectivity of the shapes and long inference processing time while being unstable in real-world scenarios. The second is a spatial approach that uses an encoder-decoder framework to regress an ordered point cloud for the matching alignment from an irregular input. Unfortunately, the decoder brings considerable disadvantages, as it requires a large amount of training data and struggles to generalize well in cross-dataset evaluations. DPC's novelty lies in its lack of a decoder component. Instead, we use latent similarity and the input coordinates themselves to construct the point cloud and determine correspondence, replacing the coordinate regression done by the decoder. Extensive experiments show that our construction scheme leads to a performance boost in comparison to recent state-of-the-art correspondence methods. Our code is publicly available at https://github.com/dvirginz/DPC. △ Less

Submitted 16 October, 2021; originally announced October 2021.

Comments: 3DV 2021

arXiv:2105.02714 [pdf, other]

Deep Weighted Consensus: Dense correspondence confidence maps for 3D shape registration

Authors: Dvir Ginzburg, Dan Raviv

Abstract: We present a new paradigm for rigid alignment between point clouds based on learnable weighted consensus which is robust to noise as well as the full spectrum of the rotation group. Current models, learnable or axiomatic, work well for constrained orientations and limited noise levels, usually by an end-to-end learner or an iterative scheme. However, real-world tasks require us to deal with larg… ▽ More We present a new paradigm for rigid alignment between point clouds based on learnable weighted consensus which is robust to noise as well as the full spectrum of the rotation group. Current models, learnable or axiomatic, work well for constrained orientations and limited noise levels, usually by an end-to-end learner or an iterative scheme. However, real-world tasks require us to deal with large rotations as well as outliers and all known models fail to deliver. Here we present a different direction. We claim that we can align point clouds out of sampled matched points according to confidence level derived from a dense, soft alignment map. The pipeline is differentiable, and converges under large rotations in the full spectrum of SO(3), even with high noise levels. We compared the network to recently presented methods such as DCP, PointNetLK, RPM-Net, PRnet, and axiomatic methods such as ICP and Go-ICP. We report here a fundamental boost in performance. △ Less

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2104.04724 [pdf, other]

Occlusion Guided Self-supervised Scene Flow Estimation on 3D Point Clouds

Authors: Bojun Ouyang, Dan Raviv

Abstract: Understanding the flow in 3D space of sparsely sampled points between two consecutive time frames is the core stone of modern geometric-driven systems such as VR/AR, Robotics, and Autonomous driving. The lack of real, non-simulated, labeled data for this task emphasizes the importance of self- or un-supervised deep architectures. This work presents a new self-supervised training method and an arch… ▽ More Understanding the flow in 3D space of sparsely sampled points between two consecutive time frames is the core stone of modern geometric-driven systems such as VR/AR, Robotics, and Autonomous driving. The lack of real, non-simulated, labeled data for this task emphasizes the importance of self- or un-supervised deep architectures. This work presents a new self-supervised training method and an architecture for the 3D scene flow estimation under occlusions. Here we show that smart multi-layer fusion between flow prediction and occlusion detection outperforms traditional architectures by a large margin for occluded and non-occluded scenarios. We report state-of-the-art results on Flyingthings3D and KITTI datasets for both the supervised and self-supervised training. △ Less

Submitted 16 October, 2021; v1 submitted 10 April, 2021; originally announced April 2021.

Comments: Accepted at 3DV 2021 (Poster)

arXiv:2012.10685 [pdf, other]

Unsupervised Scale-Invariant Multispectral Shape Matching

Authors: Idan Pazi, Dvir Ginzburg, Dan Raviv

Abstract: Alignment between non-rigid stretchable structures is one of the most challenging tasks in computer vision, as the invariant properties are hard to define, and there is no labeled data for real datasets. We present unsupervised neural network architecture based upon the spectral domain of scale-invariant geometry. We build on top of the functional maps architecture, but show that learning local fe… ▽ More Alignment between non-rigid stretchable structures is one of the most challenging tasks in computer vision, as the invariant properties are hard to define, and there is no labeled data for real datasets. We present unsupervised neural network architecture based upon the spectral domain of scale-invariant geometry. We build on top of the functional maps architecture, but show that learning local features, as done until now, is not enough once the isometry assumption breaks. We demonstrate the use of multiple scale-invariant geometries for solving this problem. Our method is agnostic to local-scale deformations and shows superior performance for matching shapes from different domains when compared to existing spectral state-of-the-art solutions. △ Less

Submitted 28 August, 2022; v1 submitted 19 December, 2020; originally announced December 2020.

arXiv:2012.08248 [pdf, other]

Geometry Enhancements from Visual Content: Going Beyond Ground Truth

Authors: Liran Azaria, Dan Raviv

Abstract: This work presents a new cyclic architecture that extracts high-frequency patterns from images and re-insert them as geometric features. This procedure allows us to enhance the resolution of low-cost depth sensors capturing fine details on the one hand and being loyal to the scanned ground truth on the other. We present state-of-the-art results for depth super-resolution tasks and as well as visua… ▽ More This work presents a new cyclic architecture that extracts high-frequency patterns from images and re-insert them as geometric features. This procedure allows us to enhance the resolution of low-cost depth sensors capturing fine details on the one hand and being loyal to the scanned ground truth on the other. We present state-of-the-art results for depth super-resolution tasks and as well as visually attractive, enhanced generated 3D models. △ Less

Submitted 5 March, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2012.03212 [pdf, other]

Skeleon-Based Typing Style Learning For Person Identification

Authors: Lior Gelberg, David Mendlovic, Dan Raviv

Abstract: We present a novel architecture for person identification based on typing-style, constructed of adaptive non-local spatio-temporal graph convolutional network. Since type style dynamics convey meaningful information that can be useful for person identification, we extract the joints positions and then learn their movements' dynamics. Our non-local approach increases our model's robustness to noisy… ▽ More We present a novel architecture for person identification based on typing-style, constructed of adaptive non-local spatio-temporal graph convolutional network. Since type style dynamics convey meaningful information that can be useful for person identification, we extract the joints positions and then learn their movements' dynamics. Our non-local approach increases our model's robustness to noisy input data while analyzing joints locations instead of RGB data provides remarkable robustness to alternating environmental conditions, e.g., lighting, noise, etc. We further present two new datasets for typing style based person identification task and extensive evaluation that displays our model's superior discriminative and generalization abilities, when compared with state-of-the-art skeleton-based models. △ Less

Submitted 6 December, 2020; originally announced December 2020.

Comments: 9 pages

arXiv:2012.03121 [pdf, other]

It's All Around You: Range-Guided Cylindrical Network for 3D Object Detection

Authors: Meytal Rapoport-Lavie, Dan Raviv

Abstract: Modern perception systems in the field of autonomous driving rely on 3D data analysis. LiDAR sensors are frequently used to acquire such data due to their increased resilience to different lighting conditions. Although rotating LiDAR scanners produce ring-shaped patterns in space, most networks analyze their data using an orthogonal voxel sampling strategy. This work presents a novel approach for… ▽ More Modern perception systems in the field of autonomous driving rely on 3D data analysis. LiDAR sensors are frequently used to acquire such data due to their increased resilience to different lighting conditions. Although rotating LiDAR scanners produce ring-shaped patterns in space, most networks analyze their data using an orthogonal voxel sampling strategy. This work presents a novel approach for analyzing 3D data produced by 360-degree depth scanners, utilizing a more suitable coordinate system, which is aligned with the scanning pattern. Furthermore, we introduce a novel notion of range-guided convolutions, adapting the receptive field by distance from the ego vehicle and the object's scale. Our network demonstrates powerful results on the nuScenes challenge, comparable to current state-of-the-art architectures. The backbone architecture introduced in this work can be easily integrated onto other pipelines as well. △ Less

Submitted 5 December, 2020; originally announced December 2020.

arXiv:2011.14880 [pdf, other]

Occlusion Guided Scene Flow Estimation on 3D Point Clouds

Authors: Bojun Ouyang, Dan Raviv

Abstract: 3D scene flow estimation is a vital tool in perceiving our environment given depth or range sensors. Unlike optical flow, the data is usually sparse and in most cases partially occluded in between two temporal samplings. Here we propose a new scene flow architecture called OGSF-Net which tightly couples the learning for both flow and occlusions between frames. Their coupled symbiosis results in a… ▽ More 3D scene flow estimation is a vital tool in perceiving our environment given depth or range sensors. Unlike optical flow, the data is usually sparse and in most cases partially occluded in between two temporal samplings. Here we propose a new scene flow architecture called OGSF-Net which tightly couples the learning for both flow and occlusions between frames. Their coupled symbiosis results in a more accurate prediction of flow in space. Unlike a traditional multi-action network, our unified approach is fused throughout the network, boosting performances for both occlusion detection and flow estimation. Our architecture is the first to gauge the occlusion in 3D scene flow estimation on point clouds. In key datasets such as Flyingthings3D and KITTI, we achieve the state-of-the-art results. △ Less

Submitted 19 April, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: Aaccepted at CVPR 2021 Workshop on Autonomous Driving

arXiv:2011.14814 [pdf, other]

doi 10.1109/TPAMI.2023.3327156

Cost Function Unrolling in Unsupervised Optical Flow

Authors: Gal Lifshitz, Dan Raviv

Abstract: Steepest descent algorithms, which are commonly used in deep learning, use the gradient as the descent direction, either as-is or after a direction shift using preconditioning. In many scenarios calculating the gradient is numerically hard due to complex or non-differentiable cost functions, specifically next to singular points. In this work we focus on the derivation of the Total Variation semi-n… ▽ More Steepest descent algorithms, which are commonly used in deep learning, use the gradient as the descent direction, either as-is or after a direction shift using preconditioning. In many scenarios calculating the gradient is numerically hard due to complex or non-differentiable cost functions, specifically next to singular points. In this work we focus on the derivation of the Total Variation semi-norm commonly used in unsupervised cost functions. Specifically, we derive a differentiable proxy to the hard L1 smoothness constraint in a novel iterative scheme which we refer to as Cost Unrolling. Producing more accurate gradients during training, our method enables finer predictions of a given DNN model through improved convergence, without modifying its architecture or increasing computational complexity. We demonstrate our method in the unsupervised optical flow task. Replacing the L1 smoothness constraint with our unrolled cost during the training of a well known baseline, we report improved results on both MPI Sintel and KITTI 2015 unsupervised optical flow benchmarks. Particularly, we report EPE reduced by up to 15.82% on occluded pixels, where the smoothness constraint is dominant, enabling the detection of much sharper motion edges. △ Less

Submitted 26 May, 2024; v1 submitted 30 November, 2020; originally announced November 2020.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 46, Issue: 2, February 2024)

arXiv:2011.14723 [pdf, other]

doi 10.1109/3DV53792.2021.00141

Dual Geometric Graph Network (DG2N) -- Iterative network for deformable shape alignment

Authors: Dvir Ginzburg, Dan Raviv

Abstract: We provide a novel new approach for aligning geometric models using a dual graph structure where local features are mapping probabilities. Alignment of non-rigid structures is one of the most challenging computer vision tasks due to the high number of unknowns needed to model the correspondence. We have seen a leap forward using DNN models in template alignment and functional maps, but those metho… ▽ More We provide a novel new approach for aligning geometric models using a dual graph structure where local features are mapping probabilities. Alignment of non-rigid structures is one of the most challenging computer vision tasks due to the high number of unknowns needed to model the correspondence. We have seen a leap forward using DNN models in template alignment and functional maps, but those methods fail for inter-class alignment where nonisometric deformations exist. Here we propose to rethink this task and use unrolling concepts on a dual graph structure - one for a forward map and one for a backward map, where the features are pulled back matching probabilities from the target into the source. We report state of the art results on stretchable domains alignment in a rapid and stable solution for meshes and cloud of points. △ Less

Submitted 27 March, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

arXiv:2011.10147 [pdf, other]

FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation

Authors: Yair Kittenplon, Yonina C. Eldar, Dan Raviv

Abstract: Estimating the 3D motion of points in a scene, known as scene flow, is a core problem in computer vision. Traditional learning-based methods designed to learn end-to-end 3D flow often suffer from poor generalization. Here we present a recurrent architecture that learns a single step of an unrolled iterative alignment procedure for refining scene flow predictions. Inspired by classical algorithms,… ▽ More Estimating the 3D motion of points in a scene, known as scene flow, is a core problem in computer vision. Traditional learning-based methods designed to learn end-to-end 3D flow often suffer from poor generalization. Here we present a recurrent architecture that learns a single step of an unrolled iterative alignment procedure for refining scene flow predictions. Inspired by classical algorithms, we demonstrate iterative convergence toward the solution using strong regularization. The proposed method can handle sizeable temporal deformations and suggests a slimmer architecture than competitive all-to-all correlation approaches. Trained on FlyingThings3D synthetic data only, our network successfully generalizes to real scans, outperforming all existing methods by a large margin on the KITTI self-supervised benchmark. △ Less

Submitted 4 April, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

Comments: CVPR 2021

arXiv:2009.05489 [pdf, other]

doi 10.1007/s10032-021-00384-2

MRZ code extraction from visa and passport documents using convolutional neural networks

Authors: Yichuan Liu, Hailey James, Otkrist Gupta, Dan Raviv

Abstract: Detecting and extracting information from Machine-Readable Zone (MRZ) on passports and visas is becoming increasingly important for verifying document authenticity. However, computer vision methods for performing similar tasks, such as optical character recognition (OCR), fail to extract the MRZ given digital images of passports with reasonable accuracy. We present a specially designed model based… ▽ More Detecting and extracting information from Machine-Readable Zone (MRZ) on passports and visas is becoming increasingly important for verifying document authenticity. However, computer vision methods for performing similar tasks, such as optical character recognition (OCR), fail to extract the MRZ given digital images of passports with reasonable accuracy. We present a specially designed model based on convolutional neural networks that is able to successfully extract MRZ information from digital images of passports of arbitrary orientation and size. Our model achieved 100% MRZ detection rate and 98.36% character recognition macro-f1 score on a passport and visa dataset. △ Less

Submitted 20 July, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

Comments: This is a preprint of https://doi.org/10.1007/s10032-021-00384-2

arXiv:2009.05158 [pdf, other]

OCR Graph Features for Manipulation Detection in Documents

Authors: Hailey Joren, Otkrist Gupta, Dan Raviv

Abstract: Detecting manipulations in digital documents is becoming increasingly important for information verification purposes. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all approaches in this domain rely on a procedural approach, using carefully generated features and a hand-tuned scoring system, rather than a data-driv… ▽ More Detecting manipulations in digital documents is becoming increasingly important for information verification purposes. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all approaches in this domain rely on a procedural approach, using carefully generated features and a hand-tuned scoring system, rather than a data-driven and generalizable approach. We frame this issue as a graph comparison problem using the character bounding boxes, and propose a model that leverages graph features using OCR (Optical Character Recognition). Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on dataset constructed from real business documents with slight forgery imperfections. Our proposed model dramatically outperforms the most closely-related document manipulation detection model on this task. △ Less

Submitted 14 September, 2020; v1 submitted 10 September, 2020; originally announced September 2020.

arXiv:2005.02160 [pdf, other]

Printing and Scanning Attack for Image Counter Forensics

Authors: Hailey Joren, Otkrist Gupta, Dan Raviv

Abstract: Examining the authenticity of images has become increasingly important as manipulation tools become more accessible and advanced. Recent work has shown that while CNN-based image manipulation detectors can successfully identify manipulations, they are also vulnerable to adversarial attacks, ranging from simple double JPEG compression to advanced pixel-based perturbation. In this paper we explore… ▽ More Examining the authenticity of images has become increasingly important as manipulation tools become more accessible and advanced. Recent work has shown that while CNN-based image manipulation detectors can successfully identify manipulations, they are also vulnerable to adversarial attacks, ranging from simple double JPEG compression to advanced pixel-based perturbation. In this paper we explore another method of highly plausible attack: printing and scanning. We demonstrate the vulnerability of two state-of-the-art models to this type of attack. We also propose a new machine learning model that performs comparably to these state-of-the-art models when trained and validated on printed and scanned images. Of the three models, our proposed model outperforms the others when trained and validated on images from a single printer. To facilitate this exploration, we create a dataset of over 6,000 printed and scanned image blocks. Further analysis suggests that variation between images produced from different printers is significant, large enough that good validation accuracy on images from one printer does not imply similar validation accuracy on identical images from a different printer. △ Less

Submitted 24 June, 2020; v1 submitted 26 April, 2020; originally announced May 2020.

Comments: 10 pages, 5 figures, 7 tables

arXiv:1912.01249 [pdf, other]

doi 10.1007/978-3-030-58558-7_3

Cyclic Functional Mapping: Self-supervised correspondence between non-isometric deformable shapes

Authors: Dvir Ginzburg, Dan Raviv

Abstract: We present the first utterly self-supervised network for dense correspondence mapping between non-isometric shapes. The task of alignment in non-Euclidean domains is one of the most fundamental and crucial problems in computer vision. As 3D scanners can generate highly complex and dense models, the mission of finding dense mappings between those models is vital. The novelty of our solution is base… ▽ More We present the first utterly self-supervised network for dense correspondence mapping between non-isometric shapes. The task of alignment in non-Euclidean domains is one of the most fundamental and crucial problems in computer vision. As 3D scanners can generate highly complex and dense models, the mission of finding dense mappings between those models is vital. The novelty of our solution is based on a cyclic mapping between metric spaces, where the distance between a pair of points should remain invariant after the full cycle. As the same learnable rules that generate the point-wise descriptors apply in both directions, the network learns invariant structures without any labels while coping with non-isometric deformations. We show here state-of-the-art-results by a large margin for a variety of tasks compared to known self-supervised and supervised methods. △ Less

Submitted 3 December, 2019; originally announced December 2019.

arXiv:1702.01293 [pdf, other]

Latent Hinge-Minimax Risk Minimization for Inference from a Small Number of Training Samples

Authors: Dolev Raviv, Margarita Osadchy

Abstract: Deep Learning (DL) methods show very good performance when trained on large, balanced data sets. However, many practical problems involve imbalanced data sets, or/and classes with a small number of training samples. The performance of DL methods as well as more traditional classifiers drops significantly in such settings. Most of the existing solutions for imbalanced problems focus on customizing… ▽ More Deep Learning (DL) methods show very good performance when trained on large, balanced data sets. However, many practical problems involve imbalanced data sets, or/and classes with a small number of training samples. The performance of DL methods as well as more traditional classifiers drops significantly in such settings. Most of the existing solutions for imbalanced problems focus on customizing the data for training. A more principled solution is to use mixed Hinge-Minimax risk [19] specifically designed to solve binary problems with imbalanced training sets. Here we propose a Latent Hinge Minimax (LHM) risk and a training algorithm that generalizes this paradigm to an ensemble of hyperplanes that can form arbitrary complex, piecewise linear boundaries. To extract good features, we combine LHM model with CNN via transfer learning. To solve multi-class problem we map pre-trained category-specific LHM classifiers to a multi-class neural network and adjust the weights with very fast tuning. LHM classifier enables the use of unlabeled data in its training and the mapping allows for multi-class inference, resulting in a classifier that performs better than alternatives when trained on a small number of training samples. △ Less

Submitted 4 February, 2017; originally announced February 2017.

arXiv:1603.06829 [pdf, other]

Multi-velocity neural networks for gesture recognition in videos

Authors: Otkrist Gupta, Dan Raviv, Ramesh Raskar

Abstract: We present a new action recognition deep neural network which adaptively learns the best action velocities in addition to the classification. While deep neural networks have reached maturity for image understanding tasks, we are still exploring network topologies and features to handle the richer environment of video clips. Here, we tackle the problem of multiple velocities in action recognition,… ▽ More We present a new action recognition deep neural network which adaptively learns the best action velocities in addition to the classification. While deep neural networks have reached maturity for image understanding tasks, we are still exploring network topologies and features to handle the richer environment of video clips. Here, we tackle the problem of multiple velocities in action recognition, and provide state-of-the-art results for gesture recognition, on known and new collected datasets. We further provide the training steps for our semi-supervised network, suited to learn from huge unlabeled datasets with only a fraction of labeled examples. △ Less

Submitted 22 March, 2016; originally announced March 2016.

arXiv:1603.06531 [pdf, other]

Deep video gesture recognition using illumination invariants

Authors: Otkrist Gupta, Dan Raviv, Ramesh Raskar

Abstract: In this paper we present architectures based on deep neural nets for gesture recognition in videos, which are invariant to local scaling. We amalgamate autoencoder and predictor architectures using an adaptive weighting scheme coping with a reduced size labeled dataset, while enriching our models from enormous unlabeled sets. We further improve robustness to lighting conditions by introducing a ne… ▽ More In this paper we present architectures based on deep neural nets for gesture recognition in videos, which are invariant to local scaling. We amalgamate autoencoder and predictor architectures using an adaptive weighting scheme coping with a reduced size labeled dataset, while enriching our models from enormous unlabeled sets. We further improve robustness to lighting conditions by introducing a new adaptive filer based on temporal local scale normalization. We provide superior results over known methods, including recent reported approaches based on neural nets. △ Less

Submitted 21 March, 2016; originally announced March 2016.

arXiv:1511.06147 [pdf, other]

Coreset-Based Adaptive Tracking

Authors: Abhimanyu Dubey, Nikhil Naik, Dan Raviv, Rahul Sukthankar, Ramesh Raskar

Abstract: We propose a method for learning from streaming visual data using a compact, constant size representation of all the data that was seen until a given moment. Specifically, we construct a 'coreset' representation of streaming data using a parallelized algorithm, which is an approximation of a set with relation to the squared distances between this set and all other points in its ambient space. We l… ▽ More We propose a method for learning from streaming visual data using a compact, constant size representation of all the data that was seen until a given moment. Specifically, we construct a 'coreset' representation of streaming data using a parallelized algorithm, which is an approximation of a set with relation to the squared distances between this set and all other points in its ambient space. We learn an adaptive object appearance model from the coreset tree in constant time and logarithmic space and use it for object tracking by detection. Our method obtains excellent results for object tracking on three standard datasets over more than 100 videos. The ability to summarize data efficiently makes our method ideally suited for tracking in long videos in presence of space and time constraints. We demonstrate this ability by outperforming a variety of algorithms on the TLD dataset with 2685 frames on average. This coreset based learning approach can be applied for both real-time learning of small, varied data and fast learning of big data. △ Less

Submitted 19 November, 2015; originally announced November 2015.

Comments: 8 pages, 5 figures, In submission to IEEE TPAMI (Transactions on Pattern Analysis and Machine Intelligence)

arXiv:1012.5936 [pdf, ps, other]

Affine-invariant geodesic geometry of deformable 3D shapes

Authors: Dan Raviv, Alexander M. Bronstein, Michael M. Bronstein, Ron Kimmel, Nir Sochen

Abstract: Natural objects can be subject to various transformations yet still preserve properties that we refer to as invariants. Here, we use definitions of affine invariant arclength for surfaces in R^3 in order to extend the set of existing non-rigid shape analysis tools. In fact, we show that by re-defining the surface metric as its equi-affine version, the surface with its modified metric tensor can be… ▽ More Natural objects can be subject to various transformations yet still preserve properties that we refer to as invariants. Here, we use definitions of affine invariant arclength for surfaces in R^3 in order to extend the set of existing non-rigid shape analysis tools. In fact, we show that by re-defining the surface metric as its equi-affine version, the surface with its modified metric tensor can be treated as a canonical Euclidean object on which most classical Euclidean processing and analysis tools can be applied. The new definition of a metric is used to extend the fast marching method technique for computing geodesic distances on surfaces, where now, the distances are defined with respect to an affine invariant arclength. Applications of the proposed framework demonstrate its invariance, efficiency, and accuracy in shape analysis. △ Less

Submitted 29 December, 2010; originally announced December 2010.

arXiv:1012.5933 [pdf, ps, other]

Affine-invariant diffusion geometry for the analysis of deformable 3D shapes

Authors: Dan Raviv, Alexander M. Bronstein, Michael M. Bronstein, Ron Kimmel, Nir Sochen

Abstract: We introduce an (equi-)affine invariant diffusion geometry by which surfaces that go through squeeze and shear transformations can still be properly analyzed. The definition of an affine invariant metric enables us to construct an invariant Laplacian from which local and global geometric structures are extracted. Applications of the proposed framework demonstrate its power in generalizing and enri… ▽ More We introduce an (equi-)affine invariant diffusion geometry by which surfaces that go through squeeze and shear transformations can still be properly analyzed. The definition of an affine invariant metric enables us to construct an invariant Laplacian from which local and global geometric structures are extracted. Applications of the proposed framework demonstrate its power in generalizing and enriching the existing set of tools for shape analysis. △ Less

Submitted 29 December, 2010; originally announced December 2010.

Showing 1–34 of 34 results for author: Raviv, D