
1 Introduction

Image registration is the process of aligning two or more images of the same scene taken by different sensors, at different times, or from different viewpoints, and finding their corresponding geometric relationships [1]. As a coherent imaging system, Synthetic Aperture Radar (SAR) can produce high-resolution images and operate day and night under all weather conditions [2]. SAR image registration plays an important role in the integrated application of SAR data and the comprehensive representation of geographic information. It is an indispensable procedure for many applications, such as information fusion, change detection, and three-dimensional reconstruction.

Image registration methods are usually divided into two categories: intensity-based methods and feature-based methods [3]. Intensity-based methods detect no features; the alignment between two images is determined by the similarity between pixel intensities, with mutual information and cross correlation being the most commonly used similarity measures. Due to monotonous textures and complex geometric deformations in SAR images, intensity-based methods usually suffer from the high computational complexity of global optimization and often get trapped in local extrema when estimating the correspondence between the registered images. In contrast, feature-based methods extract and match significant features from the two images, and the correlation between those features is used to determine the alignment. The extracted features typically include points, edges, and the centroids of specific areas. Feature-based methods are computationally efficient and strongly robust to radiation differences and noise. Hence, feature-based methods are the mainstream methods for SAR image registration [4]. Among them, the scale-invariant feature transform (SIFT) [5] is a representative algorithm. It has been widely used in image registration for its invariance to image rotation and scaling and its partial invariance to changes in camera viewpoint and illumination [6]. Keypoints detected by SIFT are usually stable and sufficient in number. Several SIFT-based improvements have been proposed for SAR image registration. Chen et al. [2] redefined the gradient computation with the ROEWA operator and reduced the dimension of the feature descriptors, which improved computational efficiency. Schwind et al. [7] proposed SIFT-OCT, in which the performance of the feature detector is analyzed to improve the robustness of the algorithm.
Improved SIFT algorithms based on anisotropic scale space [6, 8] effectively preserve fine details and suppress the speckle noise in SAR images, which improves the correct match rate. All of these improvements mainly aim to raise computational efficiency or reduce the adverse effect of speckle noise. However, the number of correct matches is often insufficient to ensure registration accuracy because of the significant nonlinear intensity difference and geometric distortion between SAR images, mainly caused by different imaging conditions such as sensor posture, imaging time, band, and polarization [9]. Since SIFT is not fully invariant to intensity and affine changes, performance degradation may occur when SIFT is applied directly to SAR images with poor intensity correlation or large geometric distortion. To overcome this problem, Li et al. [10] proposed R-SIFT, which refines the gradient orientation of each pixel and assigns more main orientations to each keypoint. In the feature matching step, Wang et al. [11] analyzed the property of dominant orientation consistency and adopted it to improve matching stability.

In this letter, a novel SAR image registration method based on improved SIFT is proposed. The Sobel operator is introduced to redefine the gradient computation, and the gradient location-orientation histogram (GLOH) is improved to form a new feature descriptor, which overcomes the nonlinear intensity difference between SAR images. An improved bivariate histogram analysis is utilized to eliminate false correspondences from the initial matching results. In addition, a reliable iterative selection method based on affine transformation error analysis of adjacent features is proposed to effectively increase the number of correct matches. Experiments on various SAR images show that our method finds sufficient feature matches and obtains accurate registration results.

2 Description of the Proposed Method

The traditional SIFT algorithm consists of three major stages. First, a Gaussian scale space is constructed by convolving the original image with Gaussian kernels at different scales, and a series of difference-of-Gaussian (DoG) images are obtained by subtracting adjacent Gaussian images. Extrema of the DoG images are detected as candidate features. Second, the dominant orientation of each keypoint is calculated, and a 128-element feature descriptor is constructed from the gradients in the local image patch aligned to that dominant orientation. Finally, feature points are matched using the nearest neighbor distance ratio (NNDR). More details about SIFT can be found in [5].
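The scale-space construction described above can be sketched as follows; the base scale `sigma0` and the per-level multiplier `k` are illustrative choices, not values prescribed by this letter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_images(image, sigma0=1.6, levels=5):
    """Build one octave of Gaussian scale space and the DoG images
    obtained by subtracting adjacent Gaussian-blurred images."""
    k = 2 ** (1.0 / (levels - 2))  # multiplier between adjacent scales
    gaussians = [gaussian_filter(image.astype(float), sigma0 * k ** i)
                 for i in range(levels)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs
```

Candidate keypoints are then the local extrema of each DoG image compared with its 26 neighbors in scale space.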

The traditional SIFT algorithm usually fails to provide favorable results when directly applied to SAR images. Nonlinear intensity difference between the reference image and the same areas in the sensed image produces large gradient differences between correspondences that should be correctly matched, which reduces the robustness and distinctiveness of the feature descriptor generated from gradient magnitudes and orientations. Meanwhile, local geometric deformation can cause false correspondences that degrade the accuracy of the transformation parameters.

2.1 New Gradient Computation

The extraction of the SIFT feature descriptor depends on gradient histograms around the keypoint location. The premise of correct matching is that descriptors are strongly distinctive: eigenvectors corresponding to different image areas should differ clearly, while those of the same areas should be similar. Significant nonlinear intensity difference leads to different gradient magnitudes and orientations in the same image areas, so rotation invariance cannot be ensured and features cannot be accurately matched by minimum Euclidean distance.

In the traditional SIFT algorithm, the gradient magnitudes and orientations of keypoints are computed with the finite difference operator

$$ m(x,y) = \sqrt {(L(x + 1,y) - L(x - 1,y))^{2} + (L(x,y + 1) - L(x,y - 1))^{2} } $$
(1)
$$ \theta (x,y) = \arctan \left( {\frac{L(x,y + 1) - L(x,y - 1)}{L(x + 1,y) - L(x - 1,y)}} \right) $$
(2)

where \( m(x,y) \) and \( \theta (x,y) \) represent the gradient magnitude and gradient orientation at the keypoint, respectively. The finite difference is an efficient operator with good convergence, but it is not robust to the nonlinear intensity difference between SAR images. In addition, finite differences are easily disturbed by speckle noise, which causes many false features to be detected.

Sobel [12] is a gradient detection operator commonly used in edge detection. The gradient is approximated by convolving the image with a spatial convolution template. Sobel is only slightly affected by intensity differences and smooths speckle noise to a certain extent. The proposed method therefore adopts Sobel to compute the gradient magnitudes and orientations of the keypoints, and the newly defined gradients are used in orientation assignment and descriptor extraction.

First, the gradient magnitude of the Gaussian scale-space image is calculated with Sobel as

$$ R_{\sigma }^{1} = \sqrt {(R_{x,\sigma }^{1} )^{2} + (R_{y,\sigma }^{1} )^{2} } $$
(3)

where \( \sigma \) is the scale parameter of the Gaussian scale space, and \( R_{x,\sigma }^{1} \) and \( R_{y,\sigma }^{1} \) denote the horizontal and vertical derivatives of the Gaussian scale-space image, respectively.

Then, the new gradient magnitude and gradient orientation are defined as

$$ R_{\sigma }^{2} = \sqrt {(R_{x,\sigma }^{2} )^{2} + (R_{y,\sigma }^{2} )^{2} } $$
(4)
$$ G_{\sigma }^{2} = \arctan \left( {R_{y,\sigma }^{2} /R_{x,\sigma }^{2} } \right) $$
(5)

where \( R_{x,\sigma }^{2} \) and \( R_{y,\sigma }^{2} \) denote the horizontal and vertical derivatives of the gradient magnitude image \( R_{\sigma }^{1} \) in Gaussian scale space, respectively. All the derivatives are edge responses along the horizontal and vertical directions, computed by convolving the original image \( \varvec{A} \) with the horizontal template \( \varvec{B}_{\mathbf{x}} \) and the vertical template \( \varvec{B}_{{\mathbf{y}}} \)

$$ {\varvec{R}}_{{\mathbf{x}}} = {\varvec{B}}_{{\mathbf{x}}} * {\varvec{A}} = \left[ {\begin{array}{*{20}c} { - 1} & 0 & { + 1} \\ { - 2} & 0 & { + 2} \\ { - 1} & 0 & { + 1} \\ \end{array} } \right] * {\varvec{A}},\quad {\varvec{R}}_{{\mathbf{y}}} = {\varvec{B}}_{{\mathbf{y}}} * {\varvec{A}} = \left[ {\begin{array}{*{20}c} { - 1} & { - 2} & { - 1} \\ 0 & 0 & 0 \\ { + 1} & { + 2} & { + 1} \\ \end{array} } \right] * {\varvec{A}} $$
(6)
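A minimal sketch of this two-pass Sobel gradient computation, following Eqs. (3)–(6) with `scipy.ndimage.convolve`; note that `arctan2` is used instead of the plain `arctan` of Eq. (5) to keep the full orientation range, which is a small deviation of our own.

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel templates B_x and B_y from Eq. (6)
BX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
BY = BX.T  # [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradients(L):
    """Compute the new gradient magnitude and orientation of a
    Gaussian scale-space image L via two Sobel passes."""
    # First pass (Eq. 3): gradient magnitude of the scale-space image
    r1 = np.hypot(convolve(L, BX), convolve(L, BY))
    # Second pass (Eqs. 4-5): Sobel applied to the magnitude image itself
    rx2, ry2 = convolve(r1, BX), convolve(r1, BY)
    magnitude = np.hypot(rx2, ry2)
    orientation = np.arctan2(ry2, rx2)
    return magnitude, orientation
```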

2.2 Descriptor Extraction with Improved GLOH

In the traditional SIFT algorithm, invariance to image rotation and scaling is achieved through main orientation rotation and descriptor normalization. However, the descriptor is not fully affine invariant, which causes many unstable keypoints and weakens the adaptability to local deformation in SAR images. The gradient location-orientation histogram (GLOH) descriptor [13] is an extension of the SIFT descriptor designed to improve its distinctiveness and robustness. Its main idea is to replace the grid pattern of the SIFT descriptor with affine-like log-polar concentric circles, and then reduce the dimension of the newly formed descriptor with principal component analysis (PCA). GLOH has been shown to have good affine invariance [13], so we adapt it into a more practical version to extract a new, robust feature descriptor. The main improvements of GLOH and the steps of the new descriptor extraction are as follows.

(1) Radius adaptively set according to image scale

In the standard GLOH feature, the radii of the three concentric circles are empirically set to 6, 11, and 15. Since the size of the image continuously changes with increasing scale parameter \( \sigma \), it is essential to adapt the circular neighborhood to an appropriate size. Inspired by [14], we set the maximum radius \( R_{1} \) of the bins to \( 12\sigma \), while the second largest radius and the radius of the center circle are set to \( 0.73R_{1} \) and \( 0.25R_{1} \), respectively.

(2) Dimensionality reduction with re-quantized gradient orientations

Reducing the dimension of the feature descriptor with PCA requires extensive sample training in advance, and the dimensionality reduction itself is time consuming. We therefore quantize the gradient orientations into 8 bins instead of 16 in all 17 location bins, reducing the dimension of the descriptor eigenvector from 272 to 136. Experiments show that the efficiency of the algorithm is greatly improved while the robustness and effectiveness of the new descriptor are well maintained.

Note that the gradients computed by (4) and (5) are used in the orientation assignment and descriptor extraction. Figures 1 and 2 illustrate the SIFT feature descriptor and our improved GLOH descriptor with 8 gradient orientations in each location bin; the difference between the two descriptors can be observed intuitively.
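To make the log-polar layout concrete, the sketch below maps a pixel offset to one of the 17 location bins (1 center bin plus two rings of 8 angular sectors each), using the adaptive radii \( R_{1} = 12\sigma \), \( 0.73R_{1} \), and \( 0.25R_{1} \) from above; the sector numbering is our own illustrative convention.

```python
import numpy as np

def gloh_location_bin(dx, dy, r1):
    """Map a pixel offset (dx, dy) to one of 17 log-polar location bins:
    bin 0 = center circle, bins 1-8 = middle ring, bins 9-16 = outer ring.
    With 8 orientation bins per location, 17 * 8 = 136 descriptor elements."""
    r = np.hypot(dx, dy)
    if r > r1:
        return None              # outside the descriptor support
    if r <= 0.25 * r1:
        return 0                 # center bin, no angular subdivision
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    sector = int(angle / (2 * np.pi) * 8) % 8
    ring = 1 if r <= 0.73 * r1 else 2
    return (ring - 1) * 8 + 1 + sector
```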

Fig. 1.

SIFT descriptor extraction

Fig. 2.

Improved GLOH descriptor

2.3 Removal of False Matches Using Optimized Bivariate Histogram

When there are repeated patterns in the image, many keypoints in the reference image are matched to the same keypoint in the sensed image under the SIFT matching strategy (distance ratio). Therefore, we use the dual-matching strategy [6] to find initial correspondences, which improves the rate of correct matching. The fast sample consensus (FSC) algorithm [15] is adopted to refine the matching results and calculate the transformation model parameters. FSC is an optimization of the classical random sample consensus (RANSAC) algorithm [16] that obtains more correct matches with fewer iterations. But like RANSAC, if the proportion of outliers in the input matching set is high, FSC fails to provide sufficient correct matches within a given number of iterations.

Even after identifying matching candidates with the dual-matching strategy, SAR images can still produce unreliable tie points that lead to many false matches. Due to the geometric deformation and randomly distributed speckle noise in SAR images, most false matches are randomly distributed, while the correct matches correspond to a denser region in the 2-D space of horizontal (\( \Delta x \)) and vertical (\( \Delta y \)) distances between matched points. Based on this principle, we improve the bivariate histogram [17] to further exclude false matches before using them as the input set for FSC. The bivariate histogram is formed from the scatter plot of the horizontal (\( \Delta x \)) and vertical (\( \Delta y \)) distances between the matches. The number of bins \( B \) in the bivariate histogram is computed as \( 1 + 3.322\log_{10} P \), where \( P \) is the number of matched features. The correct matches produce a dominant peak in a particular bin of the histogram, and the features falling in that bin are the refined matches. False matches are iteratively excluded when matching candidates fall in a histogram bin whose absolute frequency is less than \( 10\% \) of the maximum absolute frequency.

Note that in [17], the iterative procedure stops when the registration accuracy measure \( RMSE \) is below a threshold \( \varepsilon \) or the maximum number of iterations \( Q \) is reached. However, we find that this procedure often incurs a high computational cost, which reduces the overall efficiency of feature matching. Fortunately, experimental results show that a maximum of eight iterations (\( Q^{{\prime }} \le 8 \)) is sufficient for the procedure to converge to a subset of correct matches. The number of iterations is therefore set as the termination condition in our improved bivariate histogram analysis.

2.4 Increasing the Number of Correct Matches

An affine transformation is selected as the transformation model in our method. Let its four parameters be represented in vector form as \( \psi = [s,\theta ,t_{x} ,t_{y} ]^{T} \), where \( s \) is the scale, \( \theta \) is the rotation, and \( t_{x} \) and \( t_{y} \) are the translation parameters. The values of \( \psi \) are preliminarily calculated by FSC after false matches are removed with the optimized bivariate histogram. Many methods have been proposed to reduce the negative effect of false matches on matching accuracy, but few aim at increasing the number of correct matches. In fact, more correct matches yield more accurate transformation parameters and, in turn, a more accurate registration result.

In the traditional SIFT algorithm, only the nearest neighbor of a feature point is considered as its best matching candidate, but complex geometric differences between SAR images may slightly displace some feature points from their true positions. In this case, many correct matches cannot be found because the correct corresponding points are not the nearest neighbors; a correct correspondence therefore cannot be recovered by considering only the first nearest neighbor.

In this letter, we propose a novel method, called transformation error analysis of adjacent features, to increase the number of correct matches. We consider that the correct correspondence of a keypoint may be one of its first four nearest neighbors. These nearest neighbors are iteratively judged with the affine transformation parameters \( \psi \) obtained above, in order of descriptor distance from the nearest to the farthest. The main steps of the process are as follows.

  • Step 1: Let \( \left\{ {p^{1} , \ldots ,p^{m} } \right\} \) and \( \left\{ {\hat{p}^{1} , \ldots ,\hat{p}^{n} } \right\} \) represent the feature points detected from the reference image and the sensed image, respectively, and let \( p^{i} \) be an element of \( \left\{ {p^{1} , \ldots ,p^{m} } \right\} \). Find the four nearest neighbors \( \left\{ {\hat{p}_{{^{i} }}^{1} ,\hat{p}_{{^{i} }}^{2} ,\hat{p}_{{^{i} }}^{3} ,\hat{p}_{{^{i} }}^{4} } \right\} \) of \( p^{i} \) from \( \left\{ {\hat{p}^{1} , \ldots ,\hat{p}^{n} } \right\} \). These neighbors are sorted by the distance between the SIFT descriptors of \( \left\{ {\hat{p}_{{^{i} }}^{1} ,\hat{p}_{{^{i} }}^{2} ,\hat{p}_{{^{i} }}^{3} ,\hat{p}_{{^{i} }}^{4} } \right\} \) and that of \( p^{i} \): \( \hat{p}_{{^{i} }}^{1} \) is the nearest neighbor of \( p^{i} \), and \( \hat{p}_{{^{i} }}^{4} \) is the farthest.

  • Step 2: Calculate the transformation errors \( \varepsilon_{1} ,\varepsilon_{2} ,\varepsilon_{3} \) and \( \varepsilon_{4} \) of \( \left\{ {p^{i} ,\hat{p}_{{^{i} }}^{1} } \right\} \), \( \left\{ {p^{i} ,\hat{p}_{{^{i} }}^{2} } \right\} \), \( \left\{ {p^{i} ,\hat{p}_{{^{i} }}^{3} } \right\} \) and \( \left\{ {p^{i} ,\hat{p}_{{^{i} }}^{4} } \right\} \), respectively, using the transformation parameters \( \psi \). The transformation error of a pair \( \left\{ {p^{i} ,\hat{p}_{{^{i} }}^{{}} } \right\} \) is defined as \( \varepsilon \left( {\left\{ {p^{i} ,\hat{p}_{{^{i} }}^{{}} } \right\},\psi } \right) = \left\| {(\hat{x}_{i} ,\hat{y}_{i} ) - T((x_{i} ,y_{i} ),\psi )} \right\| \).

  • Step 3: In order from \( \varepsilon_{1} \) to \( \varepsilon_{4} \), check one by one whether the error is less than the error threshold \( \varepsilon \) (1 pixel). If so, consider the corresponding feature and \( p^{i} \) a correct match and stop; otherwise, check the next transformation error. If all four transformation errors (\( \varepsilon_{1} ,\varepsilon_{2} ,\varepsilon_{3} \) and \( \varepsilon_{4} \)) fail the 1-pixel criterion, the feature \( p^{i} \) is judged to have no correct match.

  • Step 4: Check every element of \( \left\{ {p^{1} , \ldots ,p^{m} } \right\} \) for a correct match in \( \left\{ {\hat{p}^{1} , \ldots ,\hat{p}^{n} } \right\} \) according to Steps 1 to 3; the final feature matches are then confirmed.

Note that the final feature matches obtained above are used to recalculate the affine transformation parameters, which may greatly improve the registration accuracy.
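Steps 1 to 4 can be sketched as follows. The descriptor-space neighbor search uses a k-d tree, and `transform` stands for any function that applies the estimated parameters \( \psi \) to reference coordinates; the function signature and data layout are our own assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def recover_matches(desc_ref, pts_ref, desc_sen, pts_sen, psi,
                    transform, eps=1.0):
    """For each reference feature, test its 4 nearest descriptor
    neighbors (nearest first) against the estimated transform; accept
    the first one with reprojection error below eps pixels."""
    tree = cKDTree(desc_sen)
    _, nn = tree.query(desc_ref, k=4)   # Step 1: 4 NNs in descriptor space
    proj = transform(pts_ref, psi)      # Step 2: project reference points
    matches = []
    for i in range(len(desc_ref)):
        for j in nn[i]:                 # Step 3: nearest to farthest
            if np.linalg.norm(pts_sen[j] - proj[i]) < eps:
                matches.append((i, j))
                break                   # first acceptable neighbor wins
    return matches                      # Step 4: unmatched features dropped
```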

3 Experimental Results and Analysis

To verify the adaptability to intensity difference and the overall registration performance of the proposed method, two pairs of SAR images with significant intensity difference are tested. These image pairs are shown in Figs. 3 and 4, respectively. The first pair consists of two multi-band images of the same region. One image with a size of \( 500 \times 500 \), from C-band taken by Radarsat-2, is selected as the reference image, and the other with a size of \( 375 \times 375 \), from X-band taken by TerraSAR, is selected as the sensed image. There are obvious scale differences and geometric distortion between these two images. The second pair consists of two \( 512 \times 512 \) multi-polarization images, with the reference image obtained in HV mode and the sensed image of the same scene obtained in VV mode. To increase the difficulty of the test, the sensed image is simulated with a rotation of \( 30^{ \circ } \).

Fig. 3.

The first image pair

Fig. 4.

The second image pair

3.1 Intensity Difference Adaptability Verification

Much in the spirit of a Hough-like voting scheme, a registration procedure for remotely sensed images named mode seeking (MS) was proposed in [18]. The basic idea of MS is that the histograms of the scale ratios, main orientation differences, horizontal shifts, and vertical shifts of the correct matches should each exhibit a single mode (i.e., the proportion of one column in the histogram is much larger than those of the other columns). In this section, the standard SIFT algorithm and our method are each used to extract feature points from the first image pair. Features are then coarsely matched with the distance ratio \( (d_{ratio} = 0.8) \), and the histograms of the scale ratios \( S_{ref} /S_{sensed} \), main orientation differences \( \Delta \theta \), horizontal shifts \( \Delta X \), and vertical shifts \( \Delta Y \) of the matched feature points are calculated (Figs. 5 and 6). The adaptability of the improved gradient computation and new descriptor extraction to intensity difference is verified based on the modes of the histograms.
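The single-mode criterion can be checked numerically as below; the `dominance` threshold of 0.5 is our own illustrative choice, since [18] only requires that one histogram column clearly dominate the others.

```python
import numpy as np

def has_single_mode(values, bins=20, dominance=0.5):
    """True if one histogram bin holds a clearly dominant share of all
    samples (scale ratios, orientation differences, or shifts)."""
    hist, _ = np.histogram(values, bins=bins)
    return hist.max() >= dominance * hist.sum()
```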

Fig. 5.

Histograms obtained by SIFT

Fig. 6.

Histograms obtained by proposed method

It can be seen that none of the four histograms obtained by standard SIFT exhibits a single mode, which is due to the intensity difference between the SAR images. Such intensity difference results in different main orientation differences among correspondences that should be correctly matched. The rotation invariance of the feature descriptor cannot be ensured, and in this case features can hardly be matched correctly by the minimum Euclidean distance between descriptors formed from gradient orientations and magnitudes.

In contrast, all histograms obtained by our proposed method exhibit a single mode. The results show that the improved gradient computation and descriptor extraction enhance the robustness of the algorithm to nonlinear intensity difference, and improve the rate of correct matching.

3.2 SAR Image Registration Performance Verification

To evaluate the registration performance of the proposed method, we experimentally validate it on the two SAR image pairs (Figs. 3 and 4). Registration comparisons between the proposed method, traditional SIFT, and SAR-SIFT [14] are carried out to demonstrate the superiority of our method. The SAR-SIFT algorithm is specifically designed for speckle noise in SAR images and can suppress the influence of noise on feature detection and descriptor extraction. In our experiments, the threshold for dual-matching is set to 0.8.

The number of correct matches, the correct match rate \( (CMR) \) and the root mean squared error \( (RMSE) \) are adopted to quantitatively evaluate the registration performances. Figures 7 and 8 show feature matches of the three methods and registration results of our proposed method, and the quantitative evaluation results for each method are listed in Table 1.
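For reference, the two quantitative measures can be computed as sketched below, where a match counts as correct when its reprojection error under the estimated transform is below `eps` pixels; this threshold-based definition is a common convention and an assumption here, as the letter does not state its exact formulas.

```python
import numpy as np

def cmr_and_rmse(pts_ref, pts_sen, psi, transform, eps=1.0):
    """CMR = correct matches / total matches; RMSE is computed over the
    reprojection errors of the correct matches."""
    err = np.linalg.norm(pts_sen - transform(pts_ref, psi), axis=1)
    correct = err < eps
    cmr = correct.mean()
    rmse = np.sqrt(np.mean(err[correct] ** 2)) if correct.any() else np.inf
    return cmr, rmse
```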

Fig. 7.

Registration result of image pair 1

Fig. 8.

Registration result of image pair 2

Table 1. Quantitative comparison of the proposed method with SIFT and SAR-SIFT.

It can be seen that, of the three methods compared, our proposed method gives the best performance. Compared with standard SIFT, our approach not only obtains more correct matches but also achieves much higher matching accuracy. Thanks to the Sobel operator and the improved GLOH, the robustness to intensity difference and the affine invariance of the new feature descriptor outperform those of standard SIFT. The consistency of the scale, orientation, and location of the correspondences is enhanced, which contributes to obtaining more feature matches. Meanwhile, the bivariate histogram is improved into a more practical form for eliminating mismatches, and more features are checked for an accurate correspondence, which improves the correct match rate and the final registration precision.

Although SAR-SIFT obtains more correct matches than standard SIFT in both experiments, and even more than our approach on image pair 2, it yields a relatively low \( CMR \) and a large \( RMSE \), which is unsatisfactory. The main reason is that SAR-SIFT uses the ROEWA operator to calculate the gradient orientation of each keypoint, which is mainly suited to images with heavy speckle noise. That is to say, it has limited effect on images with nonlinear intensity difference.

In addition, our approach inherits the superiority of the standard SIFT. An accurate registration result can still be achieved under conditions of scale and rotation changes.

4 Conclusion

In this letter, a novel SAR image registration method based on improved SIFT is proposed. The Sobel operator is used for the gradient computation of the keypoints, and GLOH is improved for feature descriptor extraction, which makes the algorithm more robust to the complex nonlinear intensity difference between images. The bivariate histogram is improved for eliminating mismatches after dual-matching, strengthening the robustness of the algorithm to geometric changes. In addition, a reliable method based on transformation error analysis of adjacent features is proposed to increase the number of correct matches. Experimental results on SAR images show that our method achieves better registration accuracy than state-of-the-art methods in the tested cases.