Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods

Andrew J. Kramer
Amazon, LLC
1988kramer@gmail.com
&Christoffer Heckman
University of Colorado, Boulder
christoffer.heckman@colorado.edu
Abstract

For autonomous ground vehicles (AGVs) deployed in suburban neighborhoods and other human-centric environments the problem of localization remains a fundamental challenge. There are well established methods for localization with GPS, lidar, and cameras. But even in ideal conditions these have limitations. GPS is not always available and is often not accurate enough on its own, visual methods have difficulty coping with appearance changes due to weather and other factors, and lidar methods are prone to defective solutions due to ambiguous scene geometry. Radar on the other hand is not highly susceptible to these problems, owing in part to its longer range. Further, radar is also robust to challenging conditions that interfere with vision and lidar including fog, smoke, rain, and darkness. We present a radar-based localization system that includes a novel method for highly-accurate radar odometry for smooth, high-frequency relative pose estimation and a novel method for radar-based place recognition and relocalization. We present experiments demonstrating our methods’ accuracy and reliability, which are comparable with other methods’ published results for radar localization and we find outperform a similar method as ours applied to lidar measurements. Further, we show our methods are lightweight enough to run on common low-power embedded hardware with ample headroom for other autonomy functions.

1 Introduction

Deploying autonomous ground vehicles (AGVs) in human-centric environments like sidewalks and multi-use paths remains a significant challenge. High accuracy localization is essential because the difference between the robot driving safely on the sidewalk and drifting into traffic is often just tens of centimeters. Unfortunately, GPS does not meet this accuracy requirement when it is available, and of course one can’t assume GPS will be available at all times. Many methods have been published addressing the problem of global localization using cameras and lidar. However, even in ideal conditions these sensors have their weaknesses. Visual odometry is susceptible to low light and motion blur while visual place recognition still has significant problems with appearance changes due to weather, lighting changes, and seasonal changes. Lidar-based methods, on the other hand, are prone to defective solutions when presented with ambiguous scene geometry. These problems may be dismissed as edge cases, but even in a modest-sized fleet of AGVs these edge cases occur quite frequently.

Millimeter-wave radar has the potential to sidestep many of the issues encountered other localization methods. It is not affected by visual challenges like darkness, fog, or smoke. Further, radar is not affected by ambiguous scene geometry because it is able to directly measure motion and sense material properties. Despite these advantages, though, radar has seen limited adoption in the robotic localization literature. Many of the existing methods use the NavTech scanning radar sensor including [Cen and Newman, 2018, Cen and Newman, 2019, Barnes and Posner, 2020]. [Cen and Newman, 2018, Cen and Newman, 2019] focus on the odometry problem, [Hong et al., 2020] proposes a SLAM method, and [Barnes and Posner, 2020] adds place recognition and relocalization. Similar works focusing on scanning radars for place recognition [Kim et al., 2020, Kim et al., 2021] have shown remarkable performance, approaching that of lidar-based place recognition. Unfortunately, the NavTech sensor all of these methods depend on is large, power-hungry, and expensive.

Automotive-grade millimeter-wave radar sensors are small, low-power, and inexpensive, making them a better option for AGVs. [Schuster et al., 2016b] uses automotive radar sensors to create maps by using stream clustering to identify landmarks in radar scans. They can then reliably align radar scans to those maps for global localization. [Schuster et al., 2016a] aggregates radar detections into pseudo-images and uses the BASD feature detector [Rapp et al., 2016] to detect and match features in those pseudo images. [Kramer et al., 2020] fuses radar and IMU measurements to estimate a vehicle’s velocity and orientation in 3D, and [Kramer and Heckman, 2021] extends the method by adding a radar-based occupancy mapping method. The method of [Kramer and Heckman, 2021] is similar to [Schuster et al., 2016a] in that they aggregate radar detections into discretized maps, but [Kramer and Heckman, 2021] uses a learned noise filter based on PointNet [Qi et al., 2016] to remove spurious radar detections and they use a 3D octomap [Hornung et al., 2013] rather than 2D dense pseudo images. However, [Kramer and Heckman, 2021] only uses their radar-based map for obstacle detection and does not attempt to localize against it. [Doer and Trommer, 2020b] and [Doer and Trommer, 2020a] present a similar radar-inertial odometry (RIO) method using a Doppler-based radar constraint. A fault with all of these Doppler-based methods is they only use gyroscope measurements to constrain the robot’s heading estimate, so their heading estimate drifts severely over longer trajectories. [Doer and Trommer, 2021] propose to get around this issue by making the restrictive assumption that the robot is operating in a Manhattan (grid) world. Other techniques [Kung et al., 2021, Lu et al., 2020] have demonstrated impressive performance, but the former does not include an IMU which would likely improve performance, while the latter treats the IMU with a neural network. There are many ongoing research directions related to the use of automotive radar being developed continuously [Harlow et al., 2023], some of which also include loop closure detection with federated back-ends [Zhang et al., 2023] and are resistant to dynamic obstacles and map elements [Chen et al., 2023]. To the author’s knowledge, so far no method has solved the problem of a general heading constraint in RIO using low-cost automotive sensors.

Our contributions beyond these works are as follows:

  1. 1.

    A method for RIO that includes a novel method for dynamic outlier rejection and a unique radar-based heading constraint.

  2. 2.

    A method for creating radar-based maps that effectively removes noise and preserves repeatable environmental features.

  3. 3.

    A method for estimating the robot’s global pose by registering local-to-global radar maps with accuracy and reliability comparable to lidar-based methods.

2 Radar-Inertial Odometry

Our RIO method takes inputs from an arbitrary array of synchronized radar sensors and a single IMU. It estimates the robot’s state in 6DoF using a nonlinear optimization over a sliding window of recent states and measurements implemented with iSAM2 [Kaess et al., 2011]. The overall structure of the problem is shown as a factor graph in Figure 1. In subsequent sections we give further detail on the constraints used in the optimization problem including our dynamic object detection method and our Doppler, IMU, and heading constraints. However, we’ll first give a brief overview of the notation used in subsequent sections and defined the estimated states and sensor measurements.

Refer to caption
Figure 1: Factor graph used for the RIO estimator, with increasing time going left. The Doppler and Landmark errors edsubscript𝑒𝑑e_{d}italic_e start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT and elsubscript𝑒𝑙e_{l}italic_e start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT provide constraints on the state that results in a nonlinear minimization problem over these errors by adjusting the state variables xisuperscript𝑥𝑖x^{i}italic_x start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT. The states are connected through binary constraints eisubscript𝑒𝑖e_{i}italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT induced by IMU measurements. A prior is formed from marginalized states prior to x0superscript𝑥0x^{0}italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT which have fallen off of the sliding window, thereby maintaining a constant-time algorithm.

2.1 Notation, State, and Measurements

We note scalars in lower case as in r𝑟ritalic_r, vectors in lower case bold as in 𝐯𝐯\mathbf{v}bold_v, and matrices in upper case bold as in 𝐑𝐑\mathbf{R}bold_R. The frame in which a quantity is expressed is noted as a pre-superscript. For instance, 𝐩Isuperscript𝐩𝐼\prescript{I}{}{\mathbf{p}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p denotes vector 𝐩𝐩\mathbf{p}bold_p expressed in frame I𝐼Iitalic_I. For velocities, the pre-superscript notes both the frame for and in which velocity is being expressed; note that velocities are always relative to a global reference frame. As an example, 𝐯Rsuperscript𝐯𝑅\prescript{R}{}{\mathbf{v}}start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_v is the velocity of frame R𝑅Ritalic_R relative to the global reference frame expressed in frame R𝑅Ritalic_R. We represent rotations as either 3D rotation matrices 𝐑𝐑\mathbf{R}bold_R or quaternions 𝐪𝐪\mathbf{q}bold_q. Rotation frames are noted as pre-scripts. So 𝐪RIsuperscriptsubscript𝐪𝑅𝐼\prescript{I}{R}{\mathbf{q}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT bold_q is the quaternion describing the rotation from frame R𝑅Ritalic_R to frame I𝐼Iitalic_I. The frames used in this work include O𝑂Oitalic_O, the global reference frame; I𝐼Iitalic_I, the IMU reference frame; and R𝑅Ritalic_R the frame of a radar sensor. Lastly, estimated quanities are noted with a ^^absent\hat{}over^ start_ARG end_ARG.

The proposed system optimizes a sliding window of N𝑁Nitalic_N robot states 𝐗=[𝐱0,,𝐱N]𝐗superscript𝐱0superscript𝐱𝑁\mathbf{X}=[\mathbf{x}^{0},\dots,\mathbf{x}^{N}]bold_X = [ bold_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , … , bold_x start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ] where the state at timestep k𝑘kitalic_k is 𝐱k=[𝐯TO,𝐪TIO,𝐛aT,𝐛gT]Tsuperscript𝐱𝑘superscriptsuperscriptsuperscript𝐯𝑇𝑂superscriptsubscriptsuperscript𝐪𝑇𝐼𝑂superscriptsubscript𝐛𝑎𝑇superscriptsubscript𝐛𝑔𝑇𝑇\mathbf{x}^{k}=[\prescript{O}{}{\mathbf{v}}^{T},\prescript{O}{I}{\mathbf{q}}^{% T},\mathbf{b}_{a}^{T},\mathbf{b}_{g}^{T}]^{T}bold_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT = [ start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT bold_v start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_q start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , bold_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , bold_b start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT, the robot’s velocity, orientation, and IMU biases. The measurement at a single timestep from a radar sensor is referred to as a scan, and a scan consists of a set of detections 𝒟𝒟\mathcal{D}caligraphic_D. Each detection represents a target detected by the CFAR [Scharf and Demeure, 1991] detector running onboard the radar sensor and detections are parameterized as 𝐝=[𝐩TR,r˙R]T𝐝superscriptsuperscriptsuperscript𝐩𝑇𝑅superscript˙𝑟𝑅𝑇\mathbf{d}=[\prescript{R}{}{\mathbf{p}}^{T},\prescript{R}{}{\dot{r}}]^{T}bold_d = [ start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_p start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT over˙ start_ARG italic_r end_ARG ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT, where 𝐩3𝐩superscript3\mathbf{p}\in\mathbb{R}^{3}bold_p ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT refers to the 3-dimensional vector between the radar sensor and the detection, and r˙˙𝑟\dot{r}\in\mathbb{R}over˙ start_ARG italic_r end_ARG ∈ blackboard_R refers to the Doppler measurement along the vector (i.e., the velocity of the detection along the ray cast between the sensor and the target). A measurement from the IMU is parameterized as 𝐳=[𝐚TI,𝝎TI]T𝐳superscriptsuperscriptsuperscript𝐚𝑇𝐼superscriptsuperscript𝝎𝑇𝐼𝑇\mathbf{z}=[\prescript{I}{}{\mathbf{a}}^{T},\prescript{I}{}{\boldsymbol{\omega% }}^{T}]^{T}bold_z = [ start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_a start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_italic_ω start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT where 𝐚Isuperscript𝐚𝐼\prescript{I}{}{\mathbf{a}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_a and 𝝎Isuperscript𝝎𝐼\prescript{I}{}{\boldsymbol{\omega}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_italic_ω are the acceleration and angular velocity in the IMU frame respectively.

2.2 Dynamic Object Rejection

A crucial part of our RIO system is our method for detecting and discarding radar detections which represent dynamic objects like cars, cyclists, and pedestrians. A common way to do this across many sensor types is random sample consensus (RANSAC) [Fischler and Bolles, 1981]. There is a simple linear relationship between a static detection’s range rate to the sensor’s velocity, r˙R=𝐯R𝐩R𝐩Rsuperscript˙𝑟𝑅superscript𝐯𝑅superscript𝐩𝑅normsuperscript𝐩𝑅\prescript{R}{}{\dot{r}}=\prescript{R}{}{\mathbf{v}}\cdot\frac{\prescript{R}{}% {\mathbf{p}}}{\|\prescript{R}{}{\mathbf{p}}\|}start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT over˙ start_ARG italic_r end_ARG = start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_v ⋅ divide start_ARG start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_p end_ARG start_ARG ∥ start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_p ∥ end_ARG and it is easily incorporated into a RANSAC model to solve the problem:

𝐯^R=argmin𝐯R(𝐝𝒟r˙𝐝R𝐯Rp𝐝Rp𝐝R)2superscript^𝐯𝑅superscript𝐯𝑅argminsuperscriptsubscript𝐝𝒟superscriptsubscript˙𝑟𝐝𝑅superscript𝐯𝑅superscriptsubscriptp𝐝𝑅normsuperscriptsubscriptp𝐝𝑅2\prescript{R}{}{\hat{\mathbf{v}}}=\underset{\prescript{R}{}{\mathbf{v}}}{\text% {argmin}}\Bigg{(}\sum_{\mathbf{d}\in\mathcal{D}}\prescript{R}{}{\dot{r}}_{% \mathbf{d}}-\prescript{R}{}{\mathbf{v}}\cdot\frac{\prescript{R}{}{\textbf{p}}_% {\mathbf{d}}}{\|\prescript{R}{}{\textbf{p}}_{\mathbf{d}}\|}\Bigg{)}^{2}start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT over^ start_ARG bold_v end_ARG = start_UNDERACCENT start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_v end_UNDERACCENT start_ARG argmin end_ARG ( ∑ start_POSTSUBSCRIPT bold_d ∈ caligraphic_D end_POSTSUBSCRIPT start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT over˙ start_ARG italic_r end_ARG start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT - start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_v ⋅ divide start_ARG start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT end_ARG start_ARG ∥ start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT ∥ end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (1)

The above formulation has been used in other radar odometry methods including [Kramer et al., 2020]. It can reliably differentiate between static and dynamic objects in a scan from a single sensor and provide a good initial guess at that sensor’s velocity. Using this model with multiple synchronized sensors would require running a separate RANSAC problem for each sensor, but this presents a problem. If a scan from one sensor does not contain a sufficient number of inliers then that scan must be discarded entirely, even if there is a sufficient number of inliers from all the sensors combined. Further, this method results in different estimates of the local frame velocities for each sensor 𝐯Rssuperscript𝐯subscript𝑅𝑠\prescript{R_{s}}{}{\mathbf{v}}start_FLOATSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_FLOATSUPERSCRIPT bold_v but it would be more useful to estimate a single IMU frame velocity 𝐯Isuperscript𝐯𝐼\prescript{I}{}{\mathbf{v}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_v.

To address these issues we consider all detections from each sensor in the same RANSAC problem. To do this we first transform the detection’s location in the radar frame to the IMU frame as 𝐩I=𝐑RsI𝐩Rs+𝐭RsIsuperscript𝐩𝐼superscriptsubscript𝐑subscript𝑅𝑠𝐼superscript𝐩subscript𝑅𝑠superscriptsubscript𝐭subscript𝑅𝑠𝐼\prescript{I}{}{\mathbf{p}}=\prescript{I}{R_{s}}{\mathbf{R}}\prescript{R_{s}}{% }{\mathbf{p}}+\prescript{I}{R_{s}}{\mathbf{t}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p = start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_R start_FLOATSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_FLOATSUPERSCRIPT bold_p + start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_t where 𝐑RsIsuperscriptsubscript𝐑subscript𝑅𝑠𝐼\prescript{I}{R_{s}}{\mathbf{R}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_R is the rotation from the frame of radar sensor s𝑠sitalic_s to the IMU frame and 𝐭RsIsuperscriptsubscript𝐭subscript𝑅𝑠𝐼\prescript{I}{R_{s}}{\mathbf{t}}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_t is the translation. We then offset the range rate measurement with the velocity induced by the lever-arm between the radar sensor and IMU as

r˙I=r˙Rs+((𝐭IRs)×𝐑IRs𝝎I)𝐩Rs𝐩Rssuperscript˙𝑟𝐼superscript˙𝑟subscript𝑅𝑠superscriptsuperscriptsubscript𝐭𝐼subscript𝑅𝑠superscriptsubscript𝐑𝐼subscript𝑅𝑠superscript𝝎𝐼superscript𝐩subscript𝑅𝑠normsuperscript𝐩subscript𝑅𝑠\prescript{I}{}{\dot{r}}=\prescript{R_{s}}{}{\dot{r}}+\Big{(}\big{(}\prescript% {R_{s}}{I}{\mathbf{t}}\big{)}^{\times}\prescript{R_{s}}{I}{\mathbf{R}}% \prescript{I}{}{\boldsymbol{\omega}}\Big{)}\cdot\frac{\prescript{R_{s}}{}{% \mathbf{p}}}{\|\prescript{R_{s}}{}{\mathbf{p}}\|}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT over˙ start_ARG italic_r end_ARG = start_FLOATSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_FLOATSUPERSCRIPT over˙ start_ARG italic_r end_ARG + ( ( start_FLOATSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_t ) start_POSTSUPERSCRIPT × end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_R start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_italic_ω ) ⋅ divide start_ARG start_FLOATSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_FLOATSUPERSCRIPT bold_p end_ARG start_ARG ∥ start_FLOATSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_FLOATSUPERSCRIPT bold_p ∥ end_ARG (2)

where ()×superscript(\cdot)^{\times}( ⋅ ) start_POSTSUPERSCRIPT × end_POSTSUPERSCRIPT returns the skew-symmetric matrix corresponding to a given 3D vector. This allows us to process all of the detections from each radar sensor in the same RANSAC problem as

𝐯^I=argmin𝐯I(s𝒮𝐝𝒟sr˙𝐝I𝐯I𝐩𝐝I𝐩𝐝I)2superscript^𝐯𝐼superscript𝐯𝐼argminsuperscriptsubscript𝑠𝒮subscript𝐝subscript𝒟𝑠superscriptsubscript˙𝑟𝐝𝐼superscript𝐯𝐼superscriptsubscript𝐩𝐝𝐼normsuperscriptsubscript𝐩𝐝𝐼2\prescript{I}{}{\hat{\mathbf{v}}}=\underset{\prescript{I}{}{\mathbf{v}}}{\text% {argmin}}\Bigg{(}\sum_{s\in\mathcal{S}}\sum_{\mathbf{d}\in\mathcal{D}_{s}}% \prescript{I}{}{\dot{r}}_{\mathbf{d}}-\prescript{I}{}{\mathbf{v}}\cdot\frac{% \prescript{I}{}{\mathbf{p}}_{\mathbf{d}}}{\|\prescript{I}{}{\mathbf{p}}_{% \mathbf{d}}\|}\Bigg{)}^{2}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT over^ start_ARG bold_v end_ARG = start_UNDERACCENT start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_v end_UNDERACCENT start_ARG argmin end_ARG ( ∑ start_POSTSUBSCRIPT italic_s ∈ caligraphic_S end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT bold_d ∈ caligraphic_D start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT over˙ start_ARG italic_r end_ARG start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT - start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_v ⋅ divide start_ARG start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT end_ARG start_ARG ∥ start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT ∥ end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (3)

where 𝒮𝒮\mathcal{S}caligraphic_S is the set of radar scans and 𝒟ssubscript𝒟𝑠\mathcal{D}_{s}caligraphic_D start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT is the set of detections from radar sensor s𝑠sitalic_s. Outliers identified in solving this RANSAC problem are assumed to represent dynamic objects and are discarded.

2.3 Doppler Constraint

After discarding detections identified as dynamic in the RANSAC step, radar detections are used to create constraints on the odom frame velocity 𝐯Osuperscript𝐯𝑂\prescript{O}{}{\mathbf{v}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT bold_v. The Doppler error function is

ed(𝐱^k,𝐝,𝐳k)=r˙R(𝐑^kOR𝐯^kO+(𝐭IR)×𝐑IR(𝝎kI𝐛^gk))𝐩R𝐩Rsubscript𝑒𝑑superscript^𝐱𝑘𝐝superscript𝐳𝑘superscript˙𝑟𝑅superscriptsubscriptsuperscript^𝐑𝑘𝑂𝑅superscriptsuperscript^𝐯𝑘𝑂superscriptsuperscriptsubscript𝐭𝐼𝑅superscriptsubscript𝐑𝐼𝑅superscriptsuperscript𝝎𝑘𝐼superscriptsubscript^𝐛𝑔𝑘superscript𝐩𝑅normsuperscript𝐩𝑅\begin{split}&e_{d}(\hat{\mathbf{x}}^{k},\mathbf{d},\mathbf{z}^{k})=\\ \prescript{R}{}{\dot{r}}-&\Big{(}\prescript{R}{O}{\hat{\mathbf{R}}}^{k}% \prescript{O}{}{\hat{\mathbf{v}}}^{k}+\big{(}\prescript{R}{I}{\mathbf{t}}\big{% )}^{\times}\prescript{R}{I}{\mathbf{R}}\big{(}\prescript{I}{}{\boldsymbol{% \omega}}^{k}-\hat{\mathbf{b}}_{g}^{k}\big{)}\Big{)}\cdot\frac{\prescript{R}{}{% \mathbf{p}}}{\|\prescript{R}{}{\mathbf{p}}\|}\end{split}start_ROW start_CELL end_CELL start_CELL italic_e start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( over^ start_ARG bold_x end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , bold_d , bold_z start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) = end_CELL end_ROW start_ROW start_CELL start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT over˙ start_ARG italic_r end_ARG - end_CELL start_CELL ( start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT over^ start_ARG bold_R end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT over^ start_ARG bold_v end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT + ( start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_t ) start_POSTSUPERSCRIPT × end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_R ( start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_italic_ω start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT - over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) ) ⋅ divide start_ARG start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_p end_ARG start_ARG ∥ start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_p ∥ end_ARG end_CELL end_ROW (4)

where 𝐑^kOR=𝐑IR𝐑^kOIsuperscriptsubscriptsuperscript^𝐑𝑘𝑂𝑅superscriptsubscript𝐑𝐼𝑅superscriptsubscriptsuperscript^𝐑𝑘𝑂𝐼\prescript{R}{O}{\hat{\mathbf{R}}}^{k}=\prescript{R}{I}{\mathbf{R}}\prescript{% I}{O}{\hat{\mathbf{R}}}^{k}start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT over^ start_ARG bold_R end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT = start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_R start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT over^ start_ARG bold_R end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT and 𝐑^kOIsuperscriptsubscriptsuperscript^𝐑𝑘𝑂𝐼\prescript{I}{O}{\hat{\mathbf{R}}}^{k}start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT over^ start_ARG bold_R end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT is the global frame to IMU frame rotation, which is estimated as part of the state vector; and 𝐑IRsuperscriptsubscript𝐑𝐼𝑅\prescript{R}{I}{\mathbf{R}}start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_R is the IMU frame to radar frame rotation, which is constant. An IMU measurement interpolated to match the timestamp of the radar measurement 𝐳ksuperscript𝐳𝑘\mathbf{z}^{k}bold_z start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT is also required to estimate the effect of the sensor’s angular rate on the Doppler measurement.

2.4 IMU Constraint

In our system, IMU measurements are used to measure the system’s change in global-frame velocity and orientation between radar measurements. Our IMU model is very similar to that provided in GTSAM [Forster et al., 2015], except we do not use the IMU to measure the change in the system’s full pose, only its velocity and orientation.

The IMU provides measurements at many times the rate of the radar sensors and the IMU and radar measurements are not synchronized. To address this we first interpolate between the IMU measurements immediately before and after the radar measurements to obtain estimated IMU readings that align temporally with the radar measurements at timesteps k𝑘kitalic_k and k+1𝑘1k+1italic_k + 1. We then use the method for on-manifold IMU pre-integration presented in [Forster et al., 2015] to combine the set of IMU measurements between times k𝑘kitalic_k and k+1𝑘1k+1italic_k + 1 into a single relative constraint which is used to predict the state 𝐱k+1superscript𝐱𝑘1\mathbf{x}^{k+1}bold_x start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT from 𝐱^ksuperscript^𝐱𝑘\hat{\mathbf{x}}^{k}over^ start_ARG bold_x end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT. The IMU error is then defined as the difference between the estimated state 𝐱^k+1superscript^𝐱𝑘1\hat{\mathbf{x}}^{k+1}over^ start_ARG bold_x end_ARG start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT and the IMU propagated state

𝐞i(𝐱^k,𝐱^k+1,𝐳k:k+1)=[2[𝐪^k+1IO(𝐪k+1IO)1]1:3𝐯^k+1O𝐯k+1O𝐛^g𝐛g𝐛^a𝐛a]subscript𝐞𝑖superscript^𝐱𝑘superscript^𝐱𝑘1superscript𝐳:𝑘𝑘1matrix2subscriptdelimited-[]tensor-productsuperscriptsubscriptsuperscript^𝐪𝑘1𝐼𝑂superscriptsuperscriptsubscriptsuperscript𝐪𝑘1𝐼𝑂1:13superscriptsuperscript^𝐯𝑘1𝑂superscriptsuperscript𝐯𝑘1𝑂subscript^𝐛𝑔subscript𝐛𝑔subscript^𝐛𝑎subscript𝐛𝑎\mathbf{e}_{i}(\hat{\mathbf{x}}^{k},\hat{\mathbf{x}}^{k+1},\mathbf{z}^{k:k+1})% =\begin{bmatrix}2\Big{[}\prescript{O}{I}{\mathbf{\hat{q}}}^{k+1}\otimes(% \prescript{O}{I}{\mathbf{q}}^{k+1})^{-1}\Big{]}_{1:3}\\ \prescript{O}{}{\mathbf{\hat{v}}}^{k+1}-\prescript{O}{}{\mathbf{v}}^{k+1}\\ \mathbf{\hat{b}}_{g}-\mathbf{b}_{g}\\ \mathbf{\hat{b}}_{a}-\mathbf{b}_{a}\end{bmatrix}bold_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG bold_x end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , over^ start_ARG bold_x end_ARG start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT , bold_z start_POSTSUPERSCRIPT italic_k : italic_k + 1 end_POSTSUPERSCRIPT ) = [ start_ARG start_ROW start_CELL 2 [ start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT over^ start_ARG bold_q end_ARG start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT ⊗ ( start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_q start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ] start_POSTSUBSCRIPT 1 : 3 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT over^ start_ARG bold_v end_ARG start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT - start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT bold_v start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT - bold_b start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT - bold_b start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] (5)

where the tensor-product\otimes operator is as defined in [Barfoot et al., 2011].

2.5 Heading Constraint

With only the Doppler and IMU constraints our state estimation problem is underdetermined. While the gyroscope biases in pitch and roll are constrained by the gravity direction, there is no constraint on the gyroscope bias in yaw. This will cause the system’s heading estimate to drift over longer trajectories. To address this we implement a novel means of constraining the system’s heading estimate by identifying and tracking persistent, repeatable landmarks in radar scans. This consists of two components: a unique method for associating new radar detections to existing landmarks, and a series of checks to determine if the confidence in a given landmark is high enough to be used to create constraints in the estimator.

To identify candidate matches between the set of new radar detections 𝒟𝒟\mathcal{D}caligraphic_D and tracked landmarks from the last timestep \mathcal{L}caligraphic_L we first construct a dense matrix of the distances 𝐒|𝒟|×||𝐒superscript𝒟\mathbf{S}\in\mathbb{R}^{|\mathcal{D}|\times|\mathcal{L}|}bold_S ∈ blackboard_R start_POSTSUPERSCRIPT | caligraphic_D | × | caligraphic_L | end_POSTSUPERSCRIPT between each detection 𝐝𝐝\mathbf{d}bold_d and landmark 𝐥𝐥\mathbf{l}bold_l. We use a unique weighted polar distance metric first proposed in [Minguez et al., 2006]:

ϕ(𝐩)= atan2(𝐩y,𝐩x)𝐒𝐝,𝐥=(L2(ϕ(𝐩𝐝I)ϕ(𝐩𝐥I))2+(𝐩𝐝I𝐩𝐥I)2)12italic-ϕ𝐩 atan2superscript𝐩𝑦superscript𝐩𝑥subscript𝐒𝐝𝐥superscriptsuperscript𝐿2superscriptitalic-ϕsuperscriptsubscript𝐩𝐝𝐼italic-ϕsuperscriptsubscript𝐩𝐥𝐼2superscriptdelimited-∥∥superscriptsubscript𝐩𝐝𝐼delimited-∥∥superscriptsubscript𝐩𝐥𝐼212\begin{split}\phi(\mathbf{p})=&\text{ atan2}(\mathbf{p}^{y},\mathbf{p}^{x})\\ \mathbf{S}_{\mathbf{d},\mathbf{l}}=&\Big{(}L^{2}\big{(}\phi(\prescript{I}{}{% \mathbf{p}}_{\mathbf{d}})-\phi(\prescript{I}{}{\mathbf{p}}_{\mathbf{l}})\big{)% }^{2}\\ &\ +\big{(}\|\prescript{I}{}{\mathbf{p}_{\mathbf{d}}}\|-\|\prescript{I}{}{% \mathbf{p}_{\mathbf{l}}}\|\big{)}^{2}\Big{)}^{\frac{1}{2}}\end{split}start_ROW start_CELL italic_ϕ ( bold_p ) = end_CELL start_CELL atan2 ( bold_p start_POSTSUPERSCRIPT italic_y end_POSTSUPERSCRIPT , bold_p start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT ) end_CELL end_ROW start_ROW start_CELL bold_S start_POSTSUBSCRIPT bold_d , bold_l end_POSTSUBSCRIPT = end_CELL start_CELL ( italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_ϕ ( start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT ) - italic_ϕ ( start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_l end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ( ∥ start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT ∥ - ∥ start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_l end_POSTSUBSCRIPT ∥ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT end_CELL end_ROW (6)

This distance metric is calculated in polar coordinates and it allows us to control the relative importance of the azimuth and range dimensions by setting the coefficient L𝐿Litalic_L. This allows us to more accurately model the noise distribution of our radar sensors when matching new detections to existing landmarks.

This is the only appearance of the position of the Doppler targets 𝐩𝐝subscript𝐩𝐝\mathbf{p_{d}}bold_p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT in our algorithm. This is because typical frame-to-frame tracking of these targets is generally not reliable over long durations due to the nature of noise affecting radar measurements. Landmarks are generally not consistently trackable for long traces in time, however they are very consistent through rotations. In fact, through rotation in yaw, radar targets are remarkably consistent, likely as a result of the radar cross section of the target not undergoing significant change compared with other types of motion with respect to hypothetical targets. We leverage this fact to support localization through rotation in yaw specifically.

At this point 𝐒𝐒\mathbf{S}bold_S can be viewed as a weighted bipartite graph and a set of potential matches between detections and landmarks is found with the Hungarian algorithm [Munkres, 1957]. A potential match is validated if the distance between the detection and landmark found by equation 6 is below a user-defined threshold. New detections for which no valid match is found are initialized as new landmarks. Tracked landmarks are discarded if the time elapsed since the last time the landmark was successfully matched to a new detection is above a threshold value.

Each valid match between a landmark and detection is not necessarily used to create a new constraint in the odometry estimator. We track two quantities for each landmark to determine if a new match should be used to create a new constraint. The number of times the landmark was previously observed must be above a threshold value and the maximum error in the obervations of the landmark must be below a threshold value. If the matched landmark meets these criteria the matched pair [𝐝,𝐥]𝐝𝐥[\mathbf{d},\mathbf{l}][ bold_d , bold_l ] is added to the set of active matches 𝒮𝒮\mathcal{S}caligraphic_S. For each active match an error term of the form:

el(𝐱^k,𝐝,𝐥)=ϕ(𝐑RI𝐩𝐝R+𝐭RI)ϕ(𝐑^kOI𝐩𝐥O+𝐭OI)subscript𝑒𝑙superscript^𝐱𝑘𝐝𝐥italic-ϕsuperscriptsubscript𝐑𝑅𝐼superscriptsubscript𝐩𝐝𝑅superscriptsubscript𝐭𝑅𝐼italic-ϕsuperscriptsubscriptsuperscript^𝐑𝑘𝑂𝐼superscriptsubscript𝐩𝐥𝑂superscriptsubscript𝐭𝑂𝐼e_{l}(\hat{\mathbf{x}}^{k},\mathbf{d},\mathbf{l})=\phi(\prescript{I}{R}{% \mathbf{R}}\prescript{R}{}{\mathbf{p}}_{\mathbf{d}}+\prescript{I}{R}{\mathbf{t% }})-\phi(\prescript{I}{O}{\hat{\mathbf{R}}}^{k}\prescript{O}{}{\mathbf{p}}_{% \mathbf{l}}+\prescript{I}{O}{\mathbf{t}})italic_e start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( over^ start_ARG bold_x end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , bold_d , bold_l ) = italic_ϕ ( start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT bold_R start_FLOATSUPERSCRIPT italic_R end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_d end_POSTSUBSCRIPT + start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT bold_t ) - italic_ϕ ( start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT over^ start_ARG bold_R end_ARG start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_l end_POSTSUBSCRIPT + start_FLOATSUPERSCRIPT italic_I end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT bold_t ) (7)

is added to the estimator. Note the landmark position 𝐩𝐥Osuperscriptsubscript𝐩𝐥𝑂\prescript{O}{}{\mathbf{p}}_{\mathbf{l}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_l end_POSTSUBSCRIPT and the IMU frame to global frame translation 𝐭IOsuperscriptsubscript𝐭𝐼𝑂\prescript{O}{I}{\mathbf{t}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_t are not estimated, they are treated as constants. 𝐩𝐥Osuperscriptsubscript𝐩𝐥𝑂\prescript{O}{}{\mathbf{p}}_{\mathbf{l}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_l end_POSTSUBSCRIPT is set to the first observed position of the landmark and 𝐭IOsuperscriptsubscript𝐭𝐼𝑂\prescript{O}{I}{\mathbf{t}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_t is found by integrating the estimated velocities 𝐯^Osuperscript^𝐯𝑂\prescript{O}{}{\hat{\mathbf{v}}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT over^ start_ARG bold_v end_ARG. This greatly simplifies the optimization and ensures the landmark error terms only influence the system’s orientation estimate 𝐪^IOsuperscriptsubscript^𝐪𝐼𝑂\prescript{O}{I}{\hat{\mathbf{q}}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT over^ start_ARG bold_q end_ARG. Empirical testing showed estimating 𝐩𝐥Osuperscriptsubscript𝐩𝐥𝑂\prescript{O}{}{\mathbf{p}}_{\mathbf{l}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT bold_p start_POSTSUBSCRIPT bold_l end_POSTSUBSCRIPT and 𝐭IOsuperscriptsubscript𝐭𝐼𝑂\prescript{O}{I}{\mathbf{t}}start_FLOATSUPERSCRIPT italic_O end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT bold_t as part of the system’s state actually has negative effects on the accuracy of the system’s odometry estimate. Since this method is inevitably paired with a technique that corresponds current radar measurements with a prior map, long-term longitudinal tracking introduces error that creates tension with the prior map tracking, and is therefore not utilized.

3 Radar Mapping and Matching

Our radar-inertial odometry method provides high-frequency, smooth relative motion estimates but does not include a method to constrain drift or localize the robot in a global frame. In the following sections we describe our method for solving these problems. This consists of two components: a method for constructing maps from radar detections described in Section 3.1 and a method for estimating the robot’s pose by registering local radar maps to global radar maps described in Section 3.2.

Refer to caption
(a) Robot’s path through a neighborhood.
Refer to caption
(b) Radar map generated from that mission.
Figure 2: A neighborhood trajectory and single global map generated using the volumetric mapping method.

3.1 Radar Mapping

The principal challenge in creating useful maps from radar data is in distinguishing spurious detections from those that represent real features of the environment. Spurious detections are very common and they arise from many sources including multipath reflections, reflections from dynamic objects, reflections from the ground, ringing from signal processing, and others. This results in a very low repeatability between radar scans. This is demonstrated in Figure 3a, which shows the raw detections logged over the course of a short mission transformed to a common reference frame. The image shows very little identifiable structure in the aligned detections, which seem to be distributed evenly over the whole environment.

Refer to caption
(a) Raw radar detections from a short mission transformed into a global reference frame.
Refer to caption
(b) The detections in Figure 3a binned into a spatial histogram colored by the number of detections per voxel.
Figure 3: Example of binning raw radar detections into spatial histograms to elucidate the structure of the environment.

An important characteristic of radar detections is not apparent from Figure 3a, however. As the robot moves through the environment true detections tend to appear in the same places while spurious detections tend to be spatially sparse. This means the structure of the environment becomes more apparent if the detections are binned into a spatial histogram where each voxel stores the number of detections that occur within it. Figure 3b shows one such histogram colored by the number of detections per voxel, where detections are summed into the spatial histogram at the frequency of the sensor. One can see the structure of the environment is much more apparent in that image, with curved sidewalks and some primitive geometry of the neighborhood becoming clearer.

We take advantage of this property in our mapping method by first adding new radar detections to an occupancy grid map (OGM), which is a voxelized grid of 3D space parametrized by a probability from 0 to 1 which represents the probability that a specific voxel is occupied [Hornung et al., 2013]. The overall OGM then has a tunable set of parameters, namely the likelihood threshold to consider a certain voxel “occupied,” and the probability of occupancy P(n|zt)𝑃conditional𝑛subscript𝑧𝑡P(n|z_{t})italic_P ( italic_n | italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) where n𝑛nitalic_n is a given voxel and ztsubscript𝑧𝑡z_{t}italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the current measurement, either a detection or non-detection. A probabilistic fusion is then applied using the posterior P(n|z1:t1)𝑃conditional𝑛subscript𝑧:1𝑡1P(n|z_{1:t-1})italic_P ( italic_n | italic_z start_POSTSUBSCRIPT 1 : italic_t - 1 end_POSTSUBSCRIPT ) and an environmental prior P(n)𝑃𝑛P(n)italic_P ( italic_n ) which is commonly assumed to be 0.5 (a uniform prior probability); our approach to this fusion is identical to the method of [Moravec and Elfes, 1985, Hornung et al., 2013]. We have found that through this method we can generate richly detailed radar maps that we have observed to remove map errors generally resulting from multipath reflections, ground reflections, and other radar-specific noise, while retaining the structure of the environment. An example OGM of a large area can be seen in Figure 2b. Figure 4 shows detail on a specific area of a radar OGM along with an aerial view of the same area to show the correspondence between features in the map and the real world.

Refer to caption
(a) A small subsection of radar OGM with observable features annotated.
Refer to caption
(b) An aerial view of the same area represented in Figure 4a with corresponding features annotated.
Figure 4: Detailed view of a radar-based OGM showing correspondences to an aerial view of the same area.

Estimating the robot’s pose involves registering a small local map of the robot’s immediate surroundings, referred to as a query map, to a much larger map covering the whole area in which the robot is operating, referred to as a global map. The requirements for these two types of maps are different and the process for creating them differs as well. Creating a global map requires very high metric accuracy and global consistency over an area spanning whatever may be reasonably loaded onto the robot. In many applications, memory and runtime limitations drive this requirement, which may mean the map covers an area of 1–10 km2. However, such a large map might contain millions of points, resulting in high memory overhead at runtime. Therefore, we further subdivide the neighborhood maps into chunks, which tessellate the global map and cover a smaller area, approximately 250m2. The area to be mapped is traversed by a special mapping robot as shown in Figure 2b. The radar sensor arrangement on the mapping robot is identical to that on the production robots, but the mapping robot’s state estimation may differ; it may for instance be equipped with a high-accuracy GPS+INS system whose outputs are batch-optimized offline, or it may have a full data stream over a mission from which a fully bundle-adjusted trajectory may be resolved. In either case, as in many autonomous vehicle operations where prior maps are used for localization, the trajectory of the mapping robot must be centimeter-accurate in a global frame. Over the course of a mapping robot mission the detections from the robot’s three radar sensors are also logged. After the mission the optimized trajectory is used to transform the logged detections to a common frame of reference. These transformed detections are then consumed into an OGM; the global map is represented as the point centers of the occupied cells of the OGM.

Query maps on the other hand must rely on only information available at hand to construct the OGMs, namely robot odometry, and therefore are subject to drift over time. Relative pose estimates from the RIO method described in 2 have sufficient accuracy for transforming radar detections to the frame of the query map over a limited scale. Since many automotive-grade radars have accuracy on the order of 10cm at 30m range, query maps are only collected over an axis-aligned bounding box of x,y𝑥𝑦absentx,y\initalic_x , italic_y ∈ 40m ×\times×40m centered at the robot’s current position. This clipping of data accomplishes two goals: first it limits the amount of odometric error that can infiltrate map generation, and second it bounds map size, since all voxels in the query map outside of this bounding box are cleared from memory. Note that this full node-traversal is required regardless, as the full data structure containing the map must be traversed in order to populate a point cloud for later consumption by map matching.

3.2 Radar Map Matching

The process for registering a query map to a global map proceeds in two stages during a mission. First, a “Full” comparison is performed, where coarse alignment is performed via a discretized brute-force search of possible alignments between the query and global maps. Then the best candidate alignment is refined using a smaller fine alignment, which also relies on brute-force searching, but terminates using the iterative closest point (ICP) method [Recherche et al., 1992]. Second, once the “Full” search has concluded successfully, subsequent matching for later query maps may be performed using a “Tracking” comparison, where only the fine alignment and ICP steps are invoked. Since Tracking relies on a smaller number of comparisons and over a much smaller spatial scale, it concludes much more rapidly than the Full comparison, which generally is not required unless Tracking results in a failure to find a match.

During the coarse alignment phase, a discretized space of possible translations and rotations between the query map and global map are evaluated via brute-force search. For each possible alignment the points in the query map 𝐩qsubscript𝐩𝑞\mathbf{p}_{q}bold_p start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT are associated to their nearest neighbor in the global map 𝐩gsubscript𝐩𝑔\mathbf{p}_{g}bold_p start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT to create the set of matched points 𝒫𝒫\mathcal{P}caligraphic_P. A score s𝑠sitalic_s for each alignment is calculated as the the number of points in the query map that are within d𝑑ditalic_d of the global map:

s=pq,pg𝒫{1,if pqpg<d0,otherwise 𝑠subscriptsubscriptp𝑞subscriptp𝑔𝒫cases1if normsubscriptp𝑞subscriptp𝑔𝑑0otherwise s=\sum_{\textbf{p}_{q},\textbf{p}_{g}\in\mathcal{P}}\begin{cases}1,&\text{if }% \|\textbf{p}_{q}-\textbf{p}_{g}\|<d\\ 0,&\text{otherwise }\end{cases}italic_s = ∑ start_POSTSUBSCRIPT p start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT , p start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT ∈ caligraphic_P end_POSTSUBSCRIPT { start_ROW start_CELL 1 , end_CELL start_CELL if ∥ p start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT - p start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT ∥ < italic_d end_CELL end_ROW start_ROW start_CELL 0 , end_CELL start_CELL otherwise end_CELL end_ROW (8)

where d𝑑ditalic_d is a threshold for a point to be an inlier.

Spatial pyramiding [Olson, 2015] is employed to make the brute-force search process more impervious to defective matches resulting from local minima of the cost surface induced by Eq. (8). The top K𝐾Kitalic_K scoring alignments from pyramid level n𝑛nitalic_n are refined in pyramid level n+1𝑛1n+1italic_n + 1 and the process is repeated until the highest-resolution pyramid level is reached. In our methodology we search within ±π/2plus-or-minus𝜋2\pm\pi/2± italic_π / 2 in yaw and ±5mplus-or-minus5𝑚\pm 5m± 5 italic_m in translation for the “Full” search, and in “Tracking” limit our search to ±π/4plus-or-minus𝜋4\pm\pi/4± italic_π / 4 and ±1mplus-or-minus1𝑚\pm 1m± 1 italic_m which we find to yield few losses of tracking, generally conditioned on the environment and the presence of map features to be matched.

The highest scoring alignment from the brute-force search process are then refined using ICP with the standard point-to-point metric. A score is calculated for the refined alignment using equation 8. If this score is equal to or above a threshold value, the alignment is considered to have been successful.

4 Experiments

We conducted a series of experiments evaluating the accuracy, reliability, and computational requirements of our odometry and relocalization methods. Both experiments used the same autonomous ground vehicle which was equipped with 3 automotive radars, a Lord Microstrain 3DM-GX5 IMU, and a high-accuracy Novatel GPS+INS system for groundtruth. The vehicle and sensor locations are detailed in Figure 5 and specifications of the radar sensors are detailed in Table 1. We acknowledge the existence of some excellent datasets for radar [Choi et al., 2023, Kramer et al., 2021, Barnes et al., 2020], each relying on their own radar sensor(s) to encourage the development of novel capabilities. In our case, our radar was a proprietary blend of hardware and signal processing techniques, limiting the capability of our testing on different datasets.

Refer to caption
Figure 5: Fields of view of the robot’s radar sensors. Above a side view of the robot shows the rear sensor’s vertical field of view and below a top-down view shows all three sensors’ horizontal fields of view.
max range 80m80m80\text{m}80 m
range resolution 0.3m0.3m0.3\text{m}0.3 m
range accuracy 0.03m0.03m0.03\text{m}0.03 m
Doppler min/max ±30m/splus-or-minus30m/s\pm 30\text{m/s}± 30 m/s
Doppler resolution 0.4m/s0.4m/s0.4\text{m/s}0.4 m/s
Doppler accuracy 0.04m/s0.04m/s0.04\text{m/s}0.04 m/s
azimuth FOV ±75plus-or-minussuperscript75\pm 75^{\circ}± 75 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
azimuth resolution 2superscript22^{\circ}2 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
azimuth accuracy 0.2superscript0.20.2^{\circ}0.2 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
elevation FOV ±15plus-or-minussuperscript15\pm 15^{\circ}± 15 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
elevation resolution 2.5superscript2.52.5^{\circ}2.5 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
elevation accuracy 0.25superscript0.250.25^{\circ}0.25 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT
Update Rate 20202020Hz
Frequency 7677767776-7776 - 77GHz
Table 1: Specifications of the radar sensors used in field experiments.

4.1 Radar-Inertial Odometry Evaluation

To evaluate the RIO method we use four missions carried out in residential neighborhoods. Collectively, the missions cover approximately 12km in length and 4 hours in duration. We report statistics on the error accumulated by our RIO method over non-overlapping 10m subsegments of each trajectory using the GPS+INS system for groundtruth. We report the median (p50), 95th percentile (p95), 99th percentile (p99), and maximum drift for both translation and heading for each mission in Table 2.

Translation Drift (m/m) Heading Drift (/m{}^{\circ}/\text{m}start_FLOATSUPERSCRIPT ∘ end_FLOATSUPERSCRIPT / m)
Mission 1 p50 0.013 0.021
p95 0.027 0.084
p99 0.052 0.254
max 0.275 0.483
Mission 2 p50 0.017 0.037
p95 0.043 0.123
p99 0.064 0.194
max 0.127 0.224
Mission 3 p50 0.015 0.033
p95 0.035 0.105
p99 0.042 0.162
max 0.078 0.99
Mission 4 p50 0.017 0.034
p95 0.041 0.119
p99 0.050 0.176
max 0.072 0.248
Table 2: Translation and heading drift statistics for each mission.

With median translational drift as low as 0.0130.0130.0130.013m/m and heading drift as low as 0.021/msuperscript0.021m0.021^{\circ}/\text{m}0.021 start_POSTSUPERSCRIPT ∘ end_POSTSUPERSCRIPT / m, our RIO method is impressively accurate. It is very lightweight as well. When running on an Intel Core i7-10870H processor our RIO method’s mean update time is just 0.86ms. On an ARM Cortex-A57 (a common low-power embedded processor) its mean update time is still only 4ms. Our radar sensor’s update period is 50ms, so even on low-power embedded hardware our RIO method leaves plenty of headroom for other processes.

4.1.1 Ablation Study

To evaluate the importance of different components of the proposed RIO system we re-ran the system offline with different components de-activated. To evaluate the benefit of using multiple synchronized radar sensors we include an experiment in which only the front-left sensor is used. To evaluate the benefit of the heading constraint described in Section 2.5 we include an experiment in which the heading constraint is not used. Table 3 shows the results of these experiments.

Translation Drift (m/m) Heading Drift (/m{}^{\circ}/\text{m}start_FLOATSUPERSCRIPT ∘ end_FLOATSUPERSCRIPT / m)
Three Radar p50 0.017 0.034
p95 0.041 0.119
p99 0.050 0.176
max 0.072 0.248
Single Radar p50 0.017 0.059
p95 0.070 0.195
p99 0.141 0.309
max 0.314 0.493
No Heading Constraint p50 0.049 0.526
p95 0.257 2.258
p99 0.322 4.714
max 0.392 15.59
Table 3: Translation and heading drift statistics for mission 4 with different components of the odometry system de-activated.

We can see from Table 3 that using only one radar still performs reasonably well. The median and even the 95th percentile drift statistics for the single radar version are fairly close to the three radar version. The benefit to using multiple radar sensors seems to be in the tails of the error distribution because the 99th percentile and maximum drift statistics for the single radar version are far higher than the three radar version. The version with the heading constraint de-activated does not perform nearly as well, however. The translation drift statistics for the no-heading-constraint version are 3 to 5 times those for the three radar version and the heading drift statistics are 16 to 60 times higher. Clearly the heading constraint is crucial for accurate odometry estimation.

4.2 Radar Map Matching Evaluation

To evaluate our radar mapping and matching method we created a global map of a neighborhood shown in Figure 2b using our mapping robot. Then the robot traverses the same path twice continuously creating local query maps and attempting to register them to the global map. The mapping robot’s Novatel GPS+INS system is used both to create the global map, and to evaluate the accuracy of alignments between the query and global maps. The global map covers a square area roughly 220m on a side, the query maps are roughly 20m on a side, and both the global and query maps use a 0.2m resolution in their OGMs. Within this global map region, the robot traverses a trajectory of 1.6km.

We evaluate the quality of our map matching method by comparing the transforms calculated from successful map registrations to groundtruth transforms from the Novatel GPS+INS system. We also note the availability of map matches as the number of successful matches over the total number of attempted matches. We compare the results for radar map matching to a our own lidar map matching method, which uses LOAM-generated submaps [Zhang and Singh, 2014] as the query map (rather than an OGM) but otherwise uses the same coarse-to-fine Full and Tracking comparison methods as our radar map matching method.

Table 4 shows the errors in translation estimates for radar and lidar map matching for both missions. One can see the median errors for radar and lidar map matching are similar, although lidar map matching gives lower median error. Radar map matching clearly outstrips lidar at the tails of the error distribution though. Radar’s 90th and 99th percentile errors are far lower than those for lidar.

Radar Translation Error (m) Lidar Translation Error (m)
x𝑥xitalic_x y𝑦yitalic_y x𝑥xitalic_x y𝑦yitalic_y
Mission 1 p50 0.1757 0.199 0.1791 0.2111
p90 0.5056 0.4903 1.8316 1.3199
p99 0.7995 0.9167 8.324 4.9173
max 1.6315 2.9356 10.5059 6.8023
Mission 2 p50 0.2205 0.2161 0.1675 0.1932
p90 0.6081 0.54 1.3718 1.932
p99 0.8813 0.7492 4.6913 6.5304
max 1.0016 0.8195 5.808 7.5731
Table 4: Translation error for radar and lidar map matching techniques.

The advantage of radar map matching in accuracy is somewhat compensated by its lower availability, however. Out of 1715 attempted matches, our radar map matching method succeeds 1389 times for a total availability of 0.81. Lidar, on the other hand matches 1652 out of 1826 for a total availability of 0.90. This is demonstrated in Figure 6, which shows the poses estimated by lidar map matching in red, radar map matching in black, and a groundtruth pose for each radar pose in green. One can see from this image that lidar map matching successfully finds matches more often, but its pose estimates can also be less accurate.

Refer to caption
Figure 6: Comparison of the poses estimated by our radar map matching method with lidar map matching with radar pose estimates in black, lidar pose estimates in red, and groundtruth poses for each estimated radar pose in green.

5 Conclusion

In this paper we presented novel radar-based methods for odometry and global localization. For our radar-inertial odometry (RIO) method we combine an established Doppler velocity constraint with a novel method for dynamic outlier rejection and a unique heading constraint based on radar landmarks. We demonstrate the combination of these elements results in unprecedented relative pose estimation accuracy for a method using automotive sensors. Our RIO method’s accuracy is comparable to that of some methods based on high-resolution NavTech sensors. For example, our RIO method achieves median translation drift of 1.7% in the worst case while Under The Radar achieves 2.1% [Barnes and Posner, 2020] and Masking By Moving achieves 1.6% [Barnes et al., 2019] according to their evaluation. We note however that a direct comparison on their data is not possible given the unique experimental platform for which our method was developed. Further, our proposed RIO method is shown to be impressively lightweight; it can be used on low-power embedded hardware with ample headroom for other processes.

Our radar-based mapping method is able to aggregate radar detections into a representation of the robot’s environment that is repeatable and highly descriptive. This is demonstrated by our method for registering radar maps, which is able to achieve relative pose estimation accuracy similar to lidar-based methods. Unlike lidar, radar is sensitive to the properties of materials in addition to their geometry. This means radar maps can represent changes in material properties where there is no corresponding change in geometry as in the cases of access hole covers, storm drain grates, etc. This makes radar map registration less susceptible to geometric ambiguity, as demonstrated by the much lower 99th percentile and maximum pose estimation errors of radar map matching relative to lidar. Our primary finding is that by balancing the IMU, Doppler, and landmark factors in RIO, as well as utilizing an incrementally-built OGM for map matching, we are able to achieve localization performance acceptable for suburban neighborhood navigation.

Together, our RIO and radar map matching methods provide a highly robust localization system for autonomous ground vehicles (AGVs) using only radar and inertial measurements. Radar map matching is able to globally localize the vehicle with respect to a pre-built map, and RIO can provide smooth and continuous pose updates between map matches. These developments could allow for the operation of AGVs in far more diverse and challenging environments from complete darkness to rain, snow, smoke and fog.

Acknowledgments

This work was done for and supported by Amazon’s Scout project.

References

  • Barfoot et al., 2011 Barfoot, T., R. Forbes, J., and Furgale, P. (2011). Pose estimation using linearized rotations and quaternion algebra. Acta Astronautica, 68:101–112.
  • Barnes et al., 2020 Barnes, D., Gadd, M., Murcutt, P., Newman, P., and Posner, I. (2020). The oxford radar robotcar dataset: A radar extension to the oxford robotcar dataset. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 6433–6438. IEEE.
  • Barnes and Posner, 2020 Barnes, D. and Posner, I. (2020). Under the radar: Learning to predict robust keypoints for odometry estimation and metric localisation in radar. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris.
  • Barnes et al., 2019 Barnes, D., Weston, R., and Posner, I. (2019). Masking by moving: Learning distraction-free radar odometry from pose information. CoRR, abs/1909.03752.
  • Cen and Newman, 2018 Cen, S. H. and Newman, P. (2018). Precise Ego-Motion Estimation with Millimeter-Wave Radar Under Diverse and Challenging Conditions. In 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD. IEEE.
  • Cen and Newman, 2019 Cen, S. H. and Newman, P. (2019). Radar-only ego-motion estimation in difficult settings via graph matching. arXiv:1904.11476 [cs].
  • Chen et al., 2023 Chen, H., Liu, Y., and Cheng, Y. (2023). Drio: Robust radar-inertial odometry in dynamic environments. IEEE Robotics and Automation Letters, 8(9):5918–5925.
  • Choi et al., 2023 Choi, M., Yang, S., Han, S., Lee, Y., Lee, M., Choi, K. H., and Kim, K.-S. (2023). Msc-rad4r: Ros-based automotive dataset with 4d radar. IEEE Robotics and Automation Letters.
  • Doer and Trommer, 2021 Doer, C. and Trommer, G. (2021). x-rio: Radar inertial odometry with multiple radar sensors and yaw aiding. Gyroscopy and Navigation, 12:329–339.
  • Doer and Trommer, 2020a Doer, C. and Trommer, G. F. (2020a). An ekf based approach to radar inertial odometry. In 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pages 152–159.
  • Doer and Trommer, 2020b Doer, C. and Trommer, G. F. (2020b). Radar inertial odometry with online calibration. In 2020 European Navigation Conference (ENC), pages 1–10.
  • Fischler and Bolles, 1981 Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24:381–395.
  • Forster et al., 2015 Forster, C., Carlone, L., Dellaert, F., and Scaramuzza, D. (2015). Imu preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation. In Robotics: Science and Systems.
  • Harlow et al., 2023 Harlow, K., Jang, H., Barfoot, T. D., Kim, A., and Heckman, C. (2023). A new wave in robotics: Survey on recent mmwave radar applications in robotics.
  • Hong et al., 2020 Hong, Z., Petillot, Y., and Wang, S. (2020). Radarslam: Radar based large-scale slam in all weathers. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5164–5170.
  • Hornung et al., 2013 Hornung, A., Wurm, K. M., Bennewitz, M., Stachniss, C., and Burgard, W. (2013). Octomap: An efficient probabilistic 3d mapping framework based on octrees. Autonomous robots, 34(3):189–206.
  • Kaess et al., 2011 Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J., and Dellaert, F. (2011). isam2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering. In 2011 IEEE International Conference on Robotics and Automation, pages 3281–3288.
  • Kim et al., 2021 Kim, G., Choi, S., and Kim, A. (2021). Scan context++: Structural place recognition robust to rotation and lateral variations in urban environments. IEEE Transactions on Robotics.
  • Kim et al., 2020 Kim, G., Park, Y. S., Cho, Y., Jeong, J., and Kim, A. (2020). Mulran: Multimodal range dataset for urban place recognition. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 6246–6253. IEEE.
  • Kramer et al., 2021 Kramer, A., Harlow, K., Williams, C., and Heckman, C. (2021). Coloradar: The direct 3d millimeter wave radar dataset.
  • Kramer and Heckman, 2021 Kramer, A. and Heckman, C. (2021). Radar-inertial state estimation and obstacle detection for micro-aerial vehicles in dense fog. In Siciliano, B., Laschi, C., and Khatib, O., editors, Experimental Robotics, pages 3–16, Cham. Springer International Publishing.
  • Kramer et al., 2020 Kramer, A., Stahoviak, C., Santamaria-Navarro, A., Agha-mohammadi, A.-a., and Heckman, C. (2020). Radar-inertial ego-velocity estimation for visually degraded environments. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 5739–5746.
  • Kung et al., 2021 Kung, P.-C., Wang, C.-C., and Lin, W.-C. (2021). A normal distribution transform-based radar odometry designed for scanning and automotive radars. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 14417–14423. IEEE.
  • Lu et al., 2020 Lu, C. X., Saputra, M. R. U., Zhao, P., Almalioglu, Y., De Gusmao, P. P., Chen, C., Sun, K., Trigoni, N., and Markham, A. (2020). milliego: single-chip mmwave radar aided egomotion estimation via deep sensor fusion. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems, pages 109–122.
  • Minguez et al., 2006 Minguez, J., Montesano, L., and Lamiraux, F. (2006). Metric-based iterative closest point scan matching for sensor displacement estimation. IEEE Transactions on Robotics, 22(5):1047–1054.
  • Moravec and Elfes, 1985 Moravec, H. and Elfes, A. (1985). High resolution maps from wide angle sonar. In Proceedings. 1985 IEEE international conference on robotics and automation, volume 2, pages 116–121. IEEE.
  • Munkres, 1957 Munkres, J. R. (1957). Algorithms for the assignment and transportation problems. Journal of The Society for Industrial and Applied Mathematics, 10:196–210.
  • Olson, 2015 Olson, E. (2015). M3rsm: Many-to-many multi-resolution scan matching. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 5815–5821. IEEE.
  • Qi et al., 2016 Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2016). Pointnet: Deep learning on point sets for 3d classification and segmentation. CoRR, abs/1612.00593.
  • Rapp et al., 2016 Rapp, M., Dietmayer, K., Hahn, M., Schuster, F., Lombacher, J., and Dickmann, J. (2016). Fscd and basd: Robust landmark detection and description on radar-based grids. In 2016 IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM), pages 1–4.
  • Recherche et al., 1992 Recherche, E., Automatique, E., Antipolis, S., and Zhang, Z. (1992). Iterative point matching for registration of free-form curves. Int. J. Comput. Vision, 13.
  • Scharf and Demeure, 1991 Scharf, L. L. and Demeure, C. (1991). Statistical signal processing: detection, estimation, and time series analysis. Addison-Wesley Publishing Company.
  • Schuster et al., 2016a Schuster, F., Keller, C. G., Rapp, M., Haueis, M., and Curio, C. (2016a). Landmark based radar slam using graph optimization. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 2559–2564.
  • Schuster et al., 2016b Schuster, F., Wörner, M., Keller, C. G., Haueis, M., and Curio, C. (2016b). Robust localization based on radar signal clustering. pages 839–844.
  • Zhang and Singh, 2014 Zhang, J. and Singh, S. (2014). Loam: Lidar odometry and mapping in real-time. In Robotics: Science and Systems, volume 2, pages 1–9. Berkeley, CA.
  • Zhang et al., 2023 Zhang, J., Zhuge, H., Wu, Z., Peng, G., Wen, M., Liu, Y., and Wang, D. (2023). 4dradarslam: A 4d imaging radar slam system for large-scale environments based on pose graph optimization. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 8333–8340.