### Abstract

We present an accurate, robust and fast method for registration of 3D scans.Our motion estimation optimizes a robust cost function on the intrinsicrepresentation of rigid motions, i.e., the Special Euclidean group$\mathbb{SE}(3)$. We exploit the geometric properties of Lie groups as well asthe robustness afforded by an iteratively reweighted least squaresoptimization. We also generalize our approach to a joint multiview method thatsimultaneously solves for the registration of a set of scans. We demonstratethe efficacy of our approach by thorough experimental validation. Our approachsignificantly outperforms the state-of-the-art robust 3D registration methodbased on a line process in terms of both speed and accuracy. We also show thatthis line process method is a special case of our principled geometricsolution. Finally, we also present scenarios where global registration based onfeature correspondences fails but multiview ICP based on our robust motionestimation is successful.

### Quick Read (beta)

# Efficient and Robust Registration on the 3D Special Euclidean Group

###### Abstract

We present an accurate, robust and fast method for registration of 3D scans. Our motion estimation optimizes a robust cost function on the intrinsic representation of rigid motions, i.e., the Special Euclidean group $\mathbb{S}\mathbb{E}(3)$. We exploit the geometric properties of Lie groups as well as the robustness afforded by an iteratively reweighted least squares optimization. We also generalize our approach to a joint multiview method that simultaneously solves for the registration of a set of scans. We demonstrate the efficacy of our approach by thorough experimental validation. Our approach significantly outperforms the state-of-the-art robust 3D registration method based on a line process in terms of both speed and accuracy. We also show that this line process method is a special case of our principled geometric solution. Finally, we also present scenarios where global registration based on feature correspondences fails but multiview ICP based on our robust motion estimation is successful.

Efficient and Robust Registration on the 3D Special Euclidean Group

Uttaran Bhattacharya Department of Computer Science University of Maryland College Park [email protected] Venu Madhav Govindu Department of Electrical Engineering Indian Institute of Science [email protected]

June 23, 2019

## 1 Introduction

The availability of consumer depth cameras has allowed us to acquire reliable 3D scans of a scene or an object [1, 2, 3]. To build a geometrically consistent 3D reconstruction, we need to solve the key problem of aligning or registering all scans in a global frame of reference [4, 5, 6, 7]. The resulting solution can be used in a diverse range of contexts including human-computer interaction, archiving of cultural objects, industrial inspection etc. There is a wide variety of solutions to the 3D registration problem in the literature [8, 9, 10, 11, 12]. All of these methods involve a trade-off between speed and accuracy. Recently, [12] has presented a method for fast global registration (henceforth denoted FGR) of 3D scans based on the robustness of a line process. This method has been shown to outperform existing methods in terms of both speed and accuracy.

As in [12], given a set of 3D feature correspondences, we pose the registration problem as one of solving for the rigid Euclidean motion that minimizes a robust cost function. However, unlike their approach, our solution systematically utilises the rich geometric structure of the underlying Lie group representation for 3D motion, i.e., the Special Euclidean group $\mathbb{S}\mathbb{E}(3)$. In this context, we achieve robustness via the iteratively reweighted least squares (IRLS) method. The key observation of our paper is that our specific combination of geometry of rigid Euclidean motion and the robustness of IRLS allows our method to significantly outperform the fast global registration method of [12] without having to take recourse to tuning of parameters. Additionally, we show that the solution proposed in [12] is a special case of our more general (and more accurate and faster) solution. In the process, we demonstrate that we can gain both theoretical insight as well as improved performance by utilizing the rich geometric structure of the underlying geometric representation of rigid Euclidean motions.

Furthermore, our work also addresses two important considerations. Firstly, in order to achieve robustness, some loss functions used for optimization have parameters that need to be either provided a priori or estimated in situ. This is the case with [12] which uses the Geman-McClure loss function. We argue that for the problem of 3D registration the statistical efficiency (i.e. accuracy) trade-off inherent to all robust estimators can be easily optimized using loss functions that are parameter free. This obviates the need to estimate any parameter during the course of optimization. Secondly, we argue that while fast and accurate 3D registration can be achieved using 3D point feature correspondences, this approach has certain limitations. While such feature correspondences can be reliably obtained when the camera motion is small (equivalently there is significant overlap between scans), there are certain scenarios where the feature correspondences break down. For such cases, we demonstrate that accurate joint registration of multiple 3D scans can be achieved by incorporating our robust motion estimation method into local methods such as the ICP.

## 2 Literature Survey

Although the literature of 3D registration is extensive, in the following we only focus on aspects that are directly relevant to our method. A large number of methods for registering 3D scans using point correspondences have two key aspects, (a) a method for establishing point correspondences between two 3D scans and (b) a method for estimating motion given an adequate number of such correspondences. We may further classify methods according to whether they use a fixed set of point correspondences [11, 12, 13, 14] or update them [15, 16, 17, 7, 18]. In the former instance, we use invariant feature representations to find point correspondences across a pair of scans which are then used for motion estimation. In the latter approach, we alternately update point correspondences using nearest neighbors and motion estimation till convergence, such as in the classical approach of ICP and its variants (see [15, 18, 19, 20] and references therein). Independent of the method for establishing correspondences, we need a method for robust motion estimation method given a set of correspondences. This problem of robust motion estimation is the main focus of this paper.

The least squares solution to the motion estimation problem is the classical method of Umeyama [21]. However, the least squares solution breaks down in the presence of outliers. The requisite robustness is often achieved using variants of RANSAC [16, 22, 23, 24, 7, 10] or motion clustering [17, 7, 25]. Other classes of approaches are based on the branch-and-bound framework [8, 26, 27, 28, 29]. However, all of these approaches often require expensive iterations and are slow to converge. Other solutions based on expectation-maximization using Gaussian mixture model representations of the scans [30, 31, 32] are similarly slow. Another approach that is relatively efficient is to perform robust averaging of pairwise motions between scans [33, 34, 35], or perform some variants of pose-graph optimization [11], to produce tightly registered scans. Yet another approach is to use the well-known IRLS method [36] to efficiently optimize robust cost functions [37]. The recent fast global registration method of [12] made use of the duality between robust estimators and line processes [38] to develop a fast approach for global registration. This method has been shown to produce the best results till date in terms of speed as well as accuracy.

## 3 Lie Group Structure of Euclidean Motions

Our method utilizes the geometric structure of Lie groups, specifically that of the Special Euclidean group $\mathbb{S}\mathbb{E}(3)$. In this section, we provide a brief description of the group structure and relevant Lie group properties. The Euclidean motion between two 3D scans consists of a rotation and a translation. While rotations can be represented in a variety of ways, we represent them as $3\times 3$ orthonormal matrices $\mathbf{R}$, i.e., ${\mathrm{\mathbf{R}\mathbf{R}}}^{\top}={\mathbf{I}}_{3}$ and $|\mathbf{R}|=+1$ (here and for the rest of this paper, we use ${\mathbf{I}}_{n}$ to denote the $n\times n$ identity matrix). A rigid Euclidean motion $(\mathbf{R},\mathbf{t})$ $(\mathbf{t}\in {\mathbb{R}}^{3})$ can then be compactly represented by a $4\times 4$ matrix

$$\mathbf{M}=\left[\begin{array}{ccc}\hfill \mathbf{R}\hfill & \hfill |\hfill & \hfill \mathbf{t}\hfill \\ \hfill \mathrm{\U0001d7ce}\hfill & \hfill |\hfill & \hfill 1\hfill \end{array}\right].$$ | (1) |

The matrices $\mathbf{R}$ and $\mathbf{M}$ satisfy the properties of a matrix group and also form smooth, differential manifolds, i.e., they are Lie groups. Thus, $\mathbf{R}\in \mathbb{S}\mathbb{O}(3)$ and $\mathbf{M}\in \mathbb{S}\mathbb{E}(3)$ where $\mathbb{S}\mathbb{O}(3)$ and $\mathbb{S}\mathbb{E}(3)$ are the Special Orthogonal and the Special Euclidean groups respectively. We also note that $\mathbf{R}$ and $\mathbf{M}$ have $3$ and $6$ degrees of freedom respectively.

Lie Groups: The Lie group structure of $\mathbb{S}\mathbb{E}(3)$ plays an important role in our formulation. Detailed expositions on the geometric properties of this representation are presented in [39, 40]. For our purposes, it will suffice to note that for finite dimensional Lie groups (matrix groups) the product and inverse operations are differentiable mappings. Every point in a Lie group has a local neighborhood (tangent space) called its Lie algebra which has the properties of a vector space. For $\mathbf{R}\in \mathbb{S}\mathbb{O}(3)$ and $\mathbf{M}\in \mathbb{S}\mathbb{E}(3)$, the corresponding Lie algebra are denoted as ${[\bm{\omega}]}_{\times}\in \U0001d530\U0001d52c(3)$ and $\U0001d52a\in \U0001d530\U0001d522(3)$ respectively. Here ${[\bm{\omega}]}_{\times}\in \U0001d530\U0001d52c(3)$ is the skew-symmetric form of the axis-angle rotation representation $\bm{\omega}$. In this representation, $\mathbf{R}$ represents a rotation by angle $||\bm{\omega}||$ about the axis $\frac{\bm{\omega}}{||\bm{\omega}||}$. Further, we can move from a Lie group to its Lie algebra and vice-versa using the logarithm and the exponential mappings respectively. Thus, $\mathbf{R}=\mathrm{exp}({[\bm{\omega}]}_{\times})$ and ${[\bm{\omega}]}_{\times}=\mathrm{log}(\mathbf{R})$. Similarly, we have $\mathbf{M}=\mathrm{exp}(\U0001d52a)$ and $\U0001d52a=\mathrm{log}(\mathbf{M})$ with the forms

$$\begin{array}{c}\hfill \U0001d52a=\mathrm{log}(\mathbf{M})=\left[\begin{array}{ccc}\hfill {[\bm{\omega}]}_{\times}\hfill & \hfill |\hfill & \hfill \mathbf{u}\hfill \\ \hfill \mathrm{\U0001d7ce}\hfill & \hfill |\hfill & \hfill 0\hfill \end{array}\right];\\ \hfill \mathbf{M}=\mathrm{exp}(\U0001d52a)=\sum _{k=0}^{\mathrm{\infty}}\frac{{\U0001d52a}^{k}}{k!}=\left[\begin{array}{ccc}\hfill \mathbf{R}\hfill & \hfill |\hfill & \hfill \mathbf{t}\hfill \\ \hfill \mathrm{\U0001d7ce}\hfill & \hfill |\hfill & \hfill 1\hfill \end{array}\right].\end{array}$$ | (2) |

Further, we note that the $\mathrm{exp}(\cdot )$ and $\mathrm{log}(\cdot )$ mappings for $\mathbb{S}\mathbb{O}(3)$ and $\mathbb{S}\mathbb{E}(3)$ have closed form expressions that can be efficiently implemented. Specifically,

$$\mathbf{R}=\mathrm{exp}({[\bm{\omega}]}_{\times})={\mathbf{I}}_{3}+\mathrm{sin}\theta {[\bm{\omega}]}_{\times}+(1-\mathrm{cos}\theta ){[\bm{\omega}]}_{\times}^{2}$$ | (3) |

$$\mathbf{t}=\mathrm{\mathbf{P}\mathbf{u}}\text{for}\mathbf{P}={\mathbf{I}}_{3}+\frac{(1-\mathrm{cos}\theta )}{{\theta}^{2}}{[\bm{\omega}]}_{\times}+\frac{(\theta -\mathrm{sin}\theta )}{{\theta}^{3}}{[\bm{\omega}]}_{\times}^{2}$$ | (4) |

where $\theta =||\bm{\omega}||$. We also note that the special properties of the group structure of $\mathbb{S}\mathbb{E}(3)$ are discussed in [41].

## 4 Robust Motion Estimation

Given $\mathcal{S}\ge 3$ pairs of point correspondences $\{({\mathbf{p}}^{s},{\mathbf{q}}^{s})|1\le s\le \mathcal{S}\}$ between a pair of scans, we can solve for the motion $\mathbf{M}$ required for aligning the pair of scans. In the case of global approaches, where the motion can potentially be large, such point correspondences are obtained by matching geometric features such as FPFH [23] whereas in iterative schemes like ICP, the correspondence of points is obtained by finding the nearest neighbor match on the second scan for a point on the first one. Due to a variety of sources of error, the correspondence pairs are not in perfect alignment or could be grossly incorrect, i.e., ${e}^{s}=\parallel {\mathbf{p}}^{s}-{\mathrm{\mathbf{M}\mathbf{q}}}^{s}\parallel \ne 0$, where ${e}^{s}$ denotes the norm of the discrepancy in registration for the $s$-th correspondence pair $({\mathbf{p}}^{s},{\mathbf{q}}^{s})$. When we assume that the individual errors have a zero-mean, iid Gaussian distribution, the optimal estimate for $\mathbf{M}$ is obtained by a least squares minimization and has a closed form [21]. However, since correspondences could be highly incorrect in practice, instead of a least squares formulation, we pose motion estimation as the optimization of a robust cost function

$$\underset{\mathbf{M}\in \mathbb{S}\mathbb{E}(3)}{\mathrm{min}}\mathcal{C}(\mathbf{M})=\sum _{s=1}^{\mathcal{S}}\rho \left(\parallel {\mathbf{p}}^{s}-{\mathrm{\mathbf{M}\mathbf{q}}}^{s}\parallel \right)=\sum _{s=1}^{\mathcal{S}}\rho ({e}^{s}(\mathbf{M}))$$ | (5) |

where $\rho (\cdot )$ is a robust loss function and we also note that the individual error terms ${e}^{s}(\mathbf{M})$ are a function of the motion $\mathbf{M}$. The use of robust estimators is well studied in statistics as an M-estimation problem and has been successfully used in a variety of vision problems [42]. However, in addition to robustness, in Eqn. 5 we require our solution to satisfy the geometric constraints for $\mathbf{M}\in \mathbb{S}\mathbb{E}(3)$. These requirements of robust estimation under geometric constraints are satisfied by our solution.

### 4.1 Proposed Solution for Pairwise Registration

We propose to minimize the cost function $\mathcal{C}(\mathbf{M})$ in Eqn. 5 in an iterative fashion. Let the estimate for $\mathbf{M}$ at the $(k-1)$-th iteration be denoted $\mathbf{M}(k-1)$. In the $k$-th iteration, let us update the motion matrix by $\mathrm{\Delta}\mathbf{M}(k)$, i.e., $\mathbf{M}(k)=\mathrm{\Delta}\mathbf{M}(k)\mathbf{M}(k-1)$. Noting that here the matrix multiplication is not commutative, our formulation uses a left-invariant metric on $\mathbb{S}\mathbb{E}(3)$ [41, 43]. Using a first-order approximation for the motion update matrix $\mathrm{\Delta}\mathbf{M}(k)$, we have

$\mathrm{\Delta}\mathbf{M}(k)$ | $\approx {\mathbf{I}}_{4}+\mathrm{\Delta}\U0001d52a(k)$ | (6) | ||

$\Rightarrow \mathcal{C}(\mathbf{M}(k))$ | $={\displaystyle \sum _{s=1}^{\mathcal{S}}}\rho \left(\parallel {\mathbf{p}}^{s}-\mathbf{M}(k){\mathbf{q}}^{s}\parallel \right)$ | (7) |

$$=\sum _{s=1}^{\mathcal{S}}\rho \left(\parallel {\mathbf{p}}^{s}-({\mathbf{I}}_{4}+\mathrm{\Delta}\U0001d52a(k))\mathbf{M}(k-1){\mathbf{q}}^{s}\parallel \right).$$ | (8) |

Here we note that the Lie algebra matrix $\mathrm{\Delta}\U0001d52a(k)$ encodes the $6$ parameters that we need to estimate for the update $\mathrm{\Delta}\mathbf{M}(k)$. We can obtain the vector representation for these $6$ parameters using the ‘vee’ operator, i.e., $\U0001d533=\mathrm{\Delta}{\U0001d52a}^{\vee}={\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{u}\hfill \end{array}\right]}^{\top}$. Since we use a first-order approximation in Eqn. 6, the cost $\mathcal{C}(\mathbf{M}(k))$ is linear in $\mathrm{\Delta}\U0001d52a(k)$. We equivalently note that it is also linear in $\U0001d533(k)$. Thus, we rewrite the individual error terms ${e}^{s}(\mathbf{M}(k))$ as

$\begin{array}{cc}\hfill {e}^{s}(\mathbf{M}(k))& =\parallel {\mathbf{p}}^{s}-({\mathbf{I}}_{4}+\mathrm{\Delta}\U0001d52a(k))\mathbf{M}(k-1){\mathbf{q}}^{s}\parallel \hfill \\ & =\parallel {\mathbf{A}}^{s}\U0001d533-{\mathbf{b}}^{s}\parallel \hfill \end{array}$ | (9) |

where ${\mathbf{A}}^{s}$ and ${\mathbf{b}}^{s}$ are the appropriate matrices. The derivation of the explicit forms of ${\mathbf{A}}^{s}$ and ${\mathbf{b}}^{s}$ are given in the appendix. To obtain the update in the $k$-th iteration, we now optimize the cost $\mathcal{C}(\mathbf{M}(k))={\sum}_{s=1}^{\mathcal{S}}\rho ({e}^{s}(\U0001d533))$ w.r.t. $\U0001d533$, and get,

$\frac{{\rho}^{\prime}({e}^{s})}{{e}^{s}}}{({\mathbf{A}}^{s})}^{\top}{\mathbf{A}}^{s}\U0001d533$ | $={\displaystyle \frac{{\rho}^{\prime}({e}^{s})}{{e}^{s}}}{({\mathbf{A}}^{s})}^{\top}{\mathbf{b}}^{s}$ | (10) |

for each summand indexed by $s$, where ${\rho}^{\prime}(\cdot )=\frac{\partial \rho}{\partial e}$ is the influence function of the robust loss $\rho (\cdot )$. We may further denote ${w}^{s}=\frac{{\rho}^{\prime}({e}^{s})}{{e}^{s}}$, which is the relative weight accorded to the $s$-th equation in Eqn. 10. Collecting all such relationships obtained for each pair of correspondences $({\mathbf{p}}^{s},{\mathbf{q}}^{s})$ into a single system of equations we have

$${\mathbf{A}}^{\top}\mathrm{\mathbf{W}\mathbf{A}}\U0001d533={\mathbf{A}}^{\top}\mathrm{\mathbf{W}\mathbf{b}}$$ | (11) |

where $\mathbf{A}=\left[\begin{array}{c}\hfill {\mathbf{A}}^{1}\hfill \\ \hfill \mathrm{\dots}\hfill \\ \hfill {\mathbf{A}}^{\mathcal{S}}\hfill \end{array}\right]$, $\mathbf{b}=\left[\begin{array}{c}\hfill {\mathbf{b}}^{1}\hfill \\ \hfill \mathrm{\dots}\hfill \\ \hfill {\mathbf{b}}^{\mathcal{S}}\hfill \end{array}\right]$, and $\mathbf{W}=\left[\begin{array}{ccc}\hfill {w}^{1}{\mathbf{I}}_{3}\hfill & & \\ & \hfill \mathrm{\ddots}\hfill & \\ & & \hfill {w}^{\mathcal{S}}{\mathbf{I}}_{3}\hfill \end{array}\right]$.

Eqn. 11 is a weighted linear system of equations with the solution $\U0001d533={({\mathbf{A}}^{\top}\mathrm{\mathbf{W}\mathbf{A}})}^{-1}{\mathbf{A}}^{\top}\mathrm{\mathbf{W}\mathbf{b}}$. However, it should be noted that each individual weight ${w}^{s}$ is a function of the error ${e}^{s}$ which, in turn, is dependent on $\U0001d533$, since ${e}^{s}=\parallel {\mathbf{A}}^{s}\U0001d533-{\mathbf{b}}^{s}\parallel $. Thus, in Eqn. 11, the equivalent relationship is ${\mathbf{A}}^{\top}\mathbf{W}(\U0001d533)\mathbf{A}\U0001d533={\mathbf{A}}^{\top}\mathbf{W}(\U0001d533)\mathbf{b}$. The solution for this system of equations is the well-known iteratively reweighted least squares (IRLS) method [36, 44]. In the IRLS method, in each iteration the weights ${w}^{s}$ are estimated based on the current estimate of $\U0001d533$. Given these weights, $\U0001d533$ is re-estimated. This process is repeated till convergence.

Given a solution for $\U0001d533$, we can estimate $\mathrm{\Delta}\mathbf{M}(k)=\mathrm{exp}(\widehat{\U0001d533}(k))$ where the ‘hat’ operator $\widehat{\U0001d533}$ converts the estimated Lie algebra parameters $\U0001d533$ into its equivalent matrix representation $\mathrm{\Delta}\U0001d52a(k)$. We emphasize here that although in Eqn. 6, we assumed a first-order approximation, we map the estimated $\U0001d533(k)$ into an intrinsic estimate of the motion update, i.e., $\mathrm{\Delta}\mathbf{M}(k)=\mathrm{exp}(\widehat{\U0001d533}(k))\in \mathbb{S}\mathbb{E}(3)$. In other words, a first-order approximation in an intermediate step does not mean that the actual cost function is approximated. The mapping $\mathrm{\Delta}\U0001d52a(k)=\mathrm{exp}(\widehat{\U0001d533}(k))$ ensures that the estimated motion $\mathbf{M}(k)$ is always a valid member of the $\mathbb{S}\mathbb{E}(3)$ group. We may now state our solution for robust motion estimation as given in Algorithm 4.1.

[h]
$\{({\mathbf{p}}^{1},{\mathbf{q}}^{1})\mathrm{\cdots}({\mathbf{p}}^{\mathcal{S}},{\mathbf{q}}^{\mathcal{S}})\}$ ($\mathcal{S}$ correspondences across a pairs of scans)

Output: $\mathbf{M}\in \mathbb{S}\mathbb{E}(3)$ (Robust estimate of motion between scans)

Initialization: $\mathbf{M}$ is set to $4\times 4$ identity matrix

[l] \While$||\U0001d533||>\u03f5$ \State1. Compute $\{({\mathbf{A}}^{s},{\mathbf{b}}^{s})|\forall s\in [1\mathrm{\cdots}\mathcal{S}]\}$ using Eqn. 9 \State2. Compute weights ${w}^{s}=\frac{{\rho}^{\prime}({e}^{s})}{{e}^{s}}$ as defined by Eqn. 10 \State3. Estimate $\U0001d533$ as IRLS solution for Eqn. 11 \State4. Update $\mathbf{M}\leftarrow \mathrm{exp}(\widehat{\U0001d533})\mathbf{M}$ \EndWhile Algorithm 4.1 is an iterative algorithm with a nested iteration. The outer loop is defined by the while statement and we denote its number of iterations as ${K}_{outer}$. The inner loop consists of the IRLS step of line 3 in Algorithm 4.1 since IRLS is itself an iterative method. We denote the number of iterations of the IRLS step as ${K}_{IRLS}$.

### 4.2 Extension to Joint Multiview Registration

In Algorithm 4.1 we presented the registration solution for two scans. This approach can be extended to the simultaneous or joint multiview registration of a set of scans. Towards this end, we define a viewgraph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ where ${v}_{i}\in \mathcal{V}$ represents the Euclidean motion of the $i$-th scan (equivalently camera) and an edge $(i,j)\in \mathcal{E}$ signifies that the relative Euclidean motion between the $i$-th and $j$-th scan can be determined from available matches. Further, we denote the number of scans as $N=|\mathcal{V}|$. We may now define the cost function to be optimized for joint multiview registration as follows

$\begin{array}{cc}\hfill \mathcal{C}(\mathbb{M})& ={\displaystyle \sum _{(i,j)\in \mathcal{E}}}{\displaystyle \sum _{s=1}^{{\mathcal{S}}_{ij}}}\rho \left(\parallel {\mathbf{M}}_{i}{\mathbf{p}}_{i}^{s}-{\mathbf{M}}_{j}{\mathbf{p}}_{j}^{s}\parallel \right)\hfill \\ & ={\displaystyle \sum _{(i,j)\in \mathcal{E}}}{\displaystyle \sum _{s=1}^{{\mathcal{S}}_{ij}}}\rho ({e}_{ij}^{s}(\mathbb{M}))\hfill \end{array}$ | (12) |

where $\mathbb{M}=\left\{{\mathbf{M}}_{1}\mathrm{\cdots}{\mathbf{M}}_{N}\right\}$ denotes the set of absolute motions of each of the $N$ scans w.r.t. to a global frame of reference and ${\mathcal{S}}_{ij}$ is the number of correspondences between the $i$-th and $j$-th scan. We again use an iterative approach to minimize the cost function $\mathcal{C}(\mathbb{M})$ in Eqn. 12 w.r.t. $\mathbb{M}$. In the $k$-th iteration, we update each motion matrix ${\mathbf{M}}_{i}$ by $\mathrm{\Delta}{\mathbf{M}}_{i}(k)$, i.e., ${\mathbf{M}}_{i}(k)=\mathrm{\Delta}{\mathbf{M}}_{i}(k){\mathbf{M}}_{i}(k-1)$ $\forall i\in [1\mathrm{\cdots}N]$. Using a first-order approximation for each update matrix $\mathrm{\Delta}{\mathbf{M}}_{i}(k)$, we have

$\begin{array}{cc}\hfill {e}_{ij}^{s}(\mathbb{M}(k))& =\parallel ({\mathbf{I}}_{4}+\mathrm{\Delta}{\U0001d52a}_{i}(k)){\mathbf{M}}_{i}(k-1){\mathbf{p}}_{i}^{s}\hfill \\ & -({\mathbf{I}}_{4}+\mathrm{\Delta}{\U0001d52a}_{j}(k)){\mathbf{M}}_{j}(k-1){\mathbf{p}}_{j}^{s}\parallel \hfill \end{array}$ | (13) | |||

$=\parallel {\mathbf{A}}_{ij}^{s}\mathbb{v}-{\mathbf{b}}_{ij}^{s}\parallel $ | (14) |

where $\mathbb{v}={\left[\begin{array}{ccc}\hfill {\U0001d533}_{1}\hfill & \hfill \mathrm{\cdots}\hfill & \hfill {\U0001d533}_{N}\hfill \end{array}\right]}^{\top}$ collates the corresponding vector representations of each of the Lie algebra matrices $\mathrm{\Delta}{\U0001d52a}_{i}(k)$, and ${\mathbf{A}}_{ij}^{s}$ and ${\mathbf{b}}_{ij}^{s}$ are constructed analogous to Eqn. 9. The subsequent update in the $k$-th iteration is then analogously obtained from the relation

$${\mathbb{A}}^{\top}\mathbb{W}(\mathbb{v})\mathbb{A}\mathbb{v}={\mathbb{A}}^{\top}\mathbb{W}(\mathbb{v})\mathbb{b}$$ | (15) |

where $\mathbb{A}$, $\mathbb{b}$ and $\mathbb{W}(\mathbb{v})$ correspondingly collate all ${\mathbf{A}}_{ij}^{s}$, ${\mathbf{b}}_{ij}^{s}$ and ${\mathbf{w}}_{ij}^{s}=\frac{{\rho}^{\prime}({e}_{ij}^{s})}{{e}_{ij}^{s}}$. As earlier, the solution for the system of equations in Eqn. 15 is the IRLS method, where we estimate the weights ${w}_{ij}^{s}$ and the collated vector representation $\mathbb{v}$ in alternating iterations till convergence.

Given a solution for $\mathbb{v}$, we can estimate, for each $i\in \left[1\mathrm{\cdots}N\right]$, $\mathrm{\Delta}{\mathbf{M}}_{i}(k)=\mathrm{exp}(\widehat{{\U0001d533}_{i}}(k))$ and thereby update each member of $\mathbb{M}(k)$ as ${\mathbf{M}}_{i}(k)\leftarrow \mathrm{\Delta}{\mathbf{M}}_{i}(k){\mathbf{M}}_{i}(k-1)$. It should also be noted that in multiview registration, the choice of the global frame of reference for the set $\mathbb{M}$ is arbitrary. For our implementation, we fix it to the first scan without loss of generality, i.e., we set ${\mathbf{M}}_{1}={\mathbf{I}}_{4}$ and do not update ${\mathbf{M}}_{1}$ throughout course of optimization. The detailed algorithm is presented in the appendix.

## 5 Results

In recent literature, the fast global registration (FGR) method of [12] has been shown to outperform other global registration and local refinement algorithms in terms of speed and registration accuracy. Therefore, owing to space constraints, we confine the comparison of our results mostly to those of FGR as this will suffice to show where our performance stands in terms of the other registration algorithms. Furthermore, we test our algorithm on 3 different choices of the loss function $\rho (\cdot )$, namely ${L}_{\frac{1}{2}}$: $\rho (x)=\sqrt{|x|}$, ${L}_{1}$: $\rho (x)=|x|$, and scaled Geman-McClure: $\rho (x)=\frac{\mu {x}^{2}}{\mu +{x}^{2}}$, where $\mu $ is the scale factor. The FGR approach of [12] uses only the scaled Geman-McClure loss function and anneals $\mu $ in fixed steps per iteration. To enable comparison, in our tests with the scaled Geman-McClure loss function we anneal $\mu $ in an identical manner. We present results on both pairwise and multiview registration tests. For these pairwise and multiview tests we terminate using $\u03f5={10}^{-5}$ and $\u03f5={10}^{-7}$ respectively. All our reported running times are measured on a single thread on an Intel Xeon(R) E5-2650 v3 processor clocked at 2.30 GHz.

### 5.1 Pairwise Registration

We present the performance of our proposed pairwise registration on the synthetic range data provided by [12] and on the 4 indoor sequences in the Augmented ICL-NUIM dataset provided by Choi et al. [11]. In all our pairwise tests, we use ${K}_{IRLS}=2$. We compare the registration errors given by the 3 versions of our method with the following prior methods: Super4PCS [7], GoICP [29], Choi et al. [11], FGR [12] and DARE [45] (using the hyperparameters suggested by the authors). Registration errors comprise of statistical measures on the distances between the ground-truth point correspondences between pairs of scans post alignment. We also report the trajectory errors, which include statistical measures on both the rotation errors and the translation norm errors between the pairs of cameras (corresponding to the given scans) w.r.t. the ground-truth camera pair, for all the methods.

Synthetic range dataset: We perform the set of controlled experiments described in [12] on each of their 5 synthetic datasets, at the given Gaussian noise levels $\sigma =0.0025$ and $\sigma =0.005$ (for each model, $\sigma $ is in units of the surface diameter). Note that adding Gaussian noise to a pair of scans introduces outliers in the point correspondences that are computed between that pair. Since depth cameras produce noisy scans in practice, this is a realistic way of increasing the outlier percentage in the point correspondences for synthetic scans. Table 1 lists for each method and for each noise level, the mean and maximal RMSE on the aligned ground truth correspondences. For both noise levels, our method with the ${L}_{\frac{1}{2}}$ loss function attains the lowest registration error together with FGR. Table 2 reports the mean running time of the motion step of each method for each of the 5 models in the dataset. The ${L}_{\frac{1}{2}}$ of our method is more $3\times $ faster than all prior methods on the average, and the ${L}_{1}$ version is more than $5\times $ faster.

Method | $\sigma =0.0025$ | $\sigma =0.0050$ | ||||||
---|---|---|---|---|---|---|---|---|

Md.RAE | Md.TNE | Mn.RMSE | Mx.RMSE | Md.RAE | Md.TNE | Mn.RMSE | Mx.RMSE | |

Super4PCS [7] | 0.864 | 0.008 | 0.014 | 0.029 | 1.468 | 0.012 | 0.017 | 0.095 |

GoICP [29] | 1.207 | 0.011 | 0.032 | 0.133 | 1.736 | 0.019 | 0.037 | 0.127 |

Choi et al. [11] | 0.778 | 0.006 | 0.008 | 0.022 | 1.533 | 0.015 | 0.035 | 0.274 |

FGR [12] | 0.749 | 0.005 | 0.004 | 0.011 | 1.146 | 0.008 | 0.006 | 0.017 |

DARE [45] | 1.851 | 0.013 | 0.035 | 0.176 | 2.005 | 0.025 | 0.054 | 0.312 |

Our ${L}_{\frac{1}{2}}$ | 0.545 | 0.004 | 0.004 | 0.011 | 0.959 | 0.008 | 0.006 | 0.017 |

Our ${L}_{1}$ | 0.566 | 0.004 | 0.004 | 0.011 | 1.516 | 0.011 | 0.007 | 0.017 |

Our GM | 0.725 | 0.005 | 0.004 | 0.011 | 1.146 | 0.008 | 0.006 | 0.017 |

Dataset | Mean # points | Super 4PCS [7] | GoICP [29] | Choi et al. [11] | FGR [12] | DARE [45] | Our ${L}_{\frac{1}{2}}$ | Our
${L}_{1}$ |
Our GM |

Bimba | 9,416 | 16,230 | 1,550 | 650 | 11.9 | 920 | 3.9 | 2.5 | 5.5 |

Child’n | 11,148 | 18,410 | 1,620 | 890 | 16.8 | 960 | 4.8 | 3.3 | 8.0 |

Dragon | 11,232 | 20,520 | 1,840 | 970 | 17.6 | 1,090 | 5.0 | 3.4 | 8.2 |

Angel | 12,072 | 29,640 | 3,000 | 1,090 | 19.1 | 1,770 | 5.3 | 3.8 | 8.9 |

Bunny | 13,357 | 38,470 | 5,530 | 1,170 | 21.6 | 3,310 | 7.4 | 5.1 | 9.9 |

Mean | 11,445 | 24,650 | 2,710 | 960 | 17.4 | 1,610 | 5.3 | 3.6 | 8.1 |

Augmented ICL-NUIM dataset: Each of the 4 sequences in the Augmented ICL-NUIM dataset provided by Choi et al. [11] consist of 2350 to 2870 scans of an indoor scene. Moreover, the given scans are provided in a smooth temporal sequence, i.e., pairs of scans with proximity in timestamp also have sufficient view overlap with each other. This, in turn, leads to reliable FPFH feature matches between such pairs. We therefore tested the performance of all the methods for all pairs of scans $(i,j)$ in each sequence such that $|i-j|\le 10$. Table 3 lists the results on each dataset for the various methods under consideration. For each dataset and corresponding method, we list the median rotation angle error and the median translation norm error of the recovered pairwise camera motions, as well as the mean computation time on the pairs. The ${L}_{\frac{1}{2}}$ version of our method performs the best in terms of the trajectory error statistics. It can also be seen to be significantly faster than the FGR method of [12]. More such pairwise results on other datasets are given in the appendix.

Method | livingroom 1 | livingroom 2 | office 1 | office 2 | ||||||||

Md.RAE | Md.TNE | Mn.Time | Md.RAE | Md.TNE | Md.Time | Md.RAE | Md.TNE | Mn.Time | Md.RAE | Md.TNE | Mn.Time | |

Super4PCS [7] | 1.104 | 0.039 | 368,030 | 0.616 | 0.033 | 344,720 | 0.932 | 0.038 | 367,980 | 0.844 | 0.027 | 345,460 |

GoICP [29] | 1.336 | 0.071 | 35,110 | 0.992 | 0.058 | 33,420 | 1.365 | 0.066 | 34,450 | 1.104 | 0.047 | 32,530 |

Choi et al. [11] | 0.941 | 0.041 | 14,740 | 0.551 | 0.031 | 13,850 | 0.811 | 0.036 | 14,720 | 0.765 | 0.029 | 13,990 |

FGR [12] | 0.793 | 0.029 | 272 | 0.482 | 0.021 | 181 | 0.707 | 0.020 | 272 | 0.669 | 0.016 | 177 |

DARE [45] | 1.305 | 0.044 | 21,500 | 1.172 | 0.059 | 20,320 | 1.716 | 0.037 | 21,110 | 1.286 | 0.068 | 20,920 |

Our ${L}_{\frac{1}{2}}$ | 0.595 | 0.023 | 61 | 0.380 | 0.017 | 50 | 0.474 | 0.014 | 59 | 0.437 | 0.011 | 45 |

Our ${L}_{1}$ | 0.964 | 0.025 | 33 | 0.419 | 0.019 | 27 | 0.569 | 0.017 | 33 | 0.524 | 0.013 | 25 |

Our GM | 0.793 | 0.029 | 118 | 0.482 | 0.021 | 87 | 0.707 | 0.020 | 118 | 0.669 | 0.016 | 89 |

### 5.2 Joint Multiview Registration

We present the performance of our joint multiview registration algorithm on the 4 sequences in the Augmented ICL-NUIM dataset, specifically, on the 47 to 57 scene fragments that were provided for each sequence by Choi et al. [11]. We use ${K}_{IRLS}=3$ for multiview registration, as the joint optimization variable and its corresponding search space are both large ($\mathbb{v}\in {\mathbb{R}}^{6(N-1)}$ in Eqn. 15 for $N$ cameras). First, we compute pairwise motion estimates between fragments followed by a robust motion averaging step on $\mathbb{S}\mathbb{E}(3)$ that is similar to that used for rotation averaging in [46]. The output of this two-stage approach is used to initialize the joint multiview optimization in Eqn. 12. The main drawback of only using the two-step approach for global registration is that it is not a true global method. It only considers local point correspondences, and then averages out the errors made by the pairwise motion estimates. Conversely, the joint multiview approach deals with point correspondences in the global setting and solves for the global cost function. The relative improvement in reconstruction error gained by using the joint multiview approach on top of the two-stage approach is shown in a table in the appendix.

We compare our results with those of Choi et al. [11] and FGR [12]. While we are aware of other approaches to multiview registration including closed form approaches, we omit them from our comparisons because they failed to provide a solution in the large-scale dataset for multiview registration we have used. For example, Bartoli et al. [47] and Bergström et al. [48], among others, compute transformations using a closed form SVD solution that do not scale to large scale data. Other alternates, such as that of Fitzgibbon et al. [49], use an LM-based approach, which are slow and do not exploit the geometry of the problem.

Table 4 lists the mean registration error from the ground truth surface achieved by each compared method on each sequence as well as the time taken to complete execution. For estimating the mean registration error, we use the CloudCompare software available at http://www.cloudcompare.org.

Once again, the ${L}_{\frac{1}{2}}$ version of our method performs the best overall in terms of registration error and is significantly faster than the FGR method of [12]. Also our ${L}_{1}$ method is the fastest amongst all methods with a slight drop in accuracy compared to our ${L}_{\frac{1}{2}}$ method. Finally, Figure 1 shows a complete reconstruction of the sequence livingroom 2 from the Augmented ICL-NUIM dataset produced by the ${L}_{\frac{1}{2}}$ version of our multiview method. Reconstructions of other sequences are given in the appendix.

Method | livingroom 1 | livingroom 2 | office 1 | office 2 | ||||

MRE | Time | MRE | Time | MRE | Time | MRE | Time | |

Choi et al. [11] | 0.04 | 8,940 | 0.07 | 3,360 | 0.03 | 4,500 | 0.04 | 4,080 |

FGR [12] | 0.05 | 131 | 0.06 | 81 | 0.03 | 69 | 0.05 | 48 |

Our ${L}_{\frac{1}{2}}$ | 0.04 | 71 | 0.05 | 49 | 0.03 | 42 | 0.04 | 32 |

Our ${L}_{1}$ | 0.07 | 62 | 0.09 | 40 | 0.04 | 36 | 0.06 | 28 |

Our GM | 0.05 | 88 | 0.06 | 70 | 0.03 | 55 | 0.05 | 41 |

## 6 Discussion

As we demonstrated in Section 5, our motion estimation method in Algorithm 4.1 is both fast and accurate. More specifically, our method outperforms the state-of-the-art FGR method of [12] in terms of both speed and accuracy. Given their strong similarities, in Section 6.1 we examine the relationship of the method of [12] and our approach in Algorithm 4.1. Following that, we discuss the limitations of using FPFH for feature matching and our approach to overcome those in Section 6.2.

### 6.1 Comparison with FGR [12]

In [12], the cost function to be minimized is the same as that of Eqn. 5. However, for minimizing this cost function, they use a line process optimization. Originally developed in [38] for modelling discontinuities, a line process optimization can be shown to be equivalent to optimizing a robust estimator. Recalling that ${e}^{s}=\parallel {\mathbf{p}}^{s}-{\mathrm{\mathbf{M}\mathbf{q}}}^{s}\parallel $, we define a cost function

$$E(\mathbf{M},\mathbf{L})=\sum _{s=1}^{\mathcal{S}}{({e}^{s})}^{2}{l}^{s}+\psi ({l}^{s})$$ | (16) |

where $\mathbf{L}=[{l}^{1}\mathrm{\cdots}{l}^{\mathcal{S}}]$ is the collection of line processes ${l}^{s}$ for each correspondence pair $({\mathbf{p}}^{s},{\mathbf{q}}^{s})$ and $\psi (\cdot )$ is a penalty term for each line process. Here $\psi (l)$ is a monotonically decreasing function designed such that when $l=0$, $\psi (l)$ is a fixed non-negative constant and when $l=1$, $\psi (l)=0$. Thus, varying the line process $l$ in the interval $[0,1]$ allows us to move between a least-squares and a robust regime. In [38] it has been shown that for every choice of loss function $\rho (\cdot )$, there is an equivalent $\psi (\cdot )$ such that minimization in Eqn. 16 yields the same solution as that of Eqn. 5. The FGR method of [12] utilizes the robustness of the line process to estimate the desired $\mathbf{M}\in \mathbb{S}\mathbb{E}(3)$. Using a first-order approximation of the motion update, [12] arrives at a Gauss-Newton solution (Eqn. 8 of [12]). It can be easily shown that this solution is identical to solving the system of equations in Eqn. 11 in our notation. In other words, while solving for the update step $\U0001d533(k)$, the FGR method of [12] implicitly carries out a single iteration of our IRLS step in line 3 of Algorithm 4.1. In contrast, in our approach we carry out ${K}_{IRLS}>1$ iterations of the IRLS step to achieve better convergence.

Although the difference between our method and that of [12] is only in the number of iterations of the inner IRLS step, its implication is both subtle and profound. If we were solving a single IRLS problem in a vector space setting, this difference would have been immaterial. However, we note that in both of our approaches, the linear solution for the updates $\U0001d533(k)$ are interleaved with the non-linear motion updates $\mathbf{M}\leftarrow \mathrm{exp}(\widehat{\U0001d533})\mathbf{M}$, resulting in significantly different trajectories of the solution $\mathbf{M}(k)$ on the $\mathbb{S}\mathbb{E}(3)$ group. Specifically, in our case, by iterating the IRLS step to convergence we obtain the best possible estimate of $\U0001d533(k)$ in each intermediate step which, in turn, results in the best improvement of the estimate $\mathbf{M}(k)$ (equivalently the most reduction in the cost function $\mathcal{C}(\cdot )$ in the $k$-th step).

Another noteworthy difference between the two methods is the choice of the parametrization of the motion representation. We use the geometrically correct form of $\mathrm{\Delta}\U0001d52a(k)$ in Eqn. 6, i.e., $\U0001d533={\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{u}\hfill \end{array}\right]}^{\top}$ for our update step. However, for their update step, the authors of [12] use an extrinsic form of motion parametrization, i.e., ${\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{t}\hfill \end{array}\right]}^{\top}$. While our parametrization is geometrically consistent with Lie group theory, we can recognize from Eqn. 4 that the choice in [12] is approximately close to our representation for small motion, i.e., $\mathbf{u}\to \mathbf{t}$ if and only if $\theta \to 0$. Conversely, for sufficiently large $\theta $, the approximate representation ${\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{t}\hfill \end{array}\right]}^{\top}$ is notably different from the exact representation ${\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{u}\hfill \end{array}\right]}^{\top}$ of the $\U0001d530\U0001d522(3)$ form. Therefore the improvement per iteration in the cost function for the method of [12] is lower compared to our single iteration IRLS form. The result is that the line process-based method of [12] has slower convergence.

We highlight both these differences for a pair of scans from the Augmented ICL-NUIM dataset in Figure 3. For the purpose of illustration, we consider the ${L}_{\frac{1}{2}}$ loss function for the line process optimization routine proposed in [12] as well as for our optimization routine. The reason we do not consider the Geman-McClure loss function is that it has a scale factor, which, in practice, is initialized at a large value and has to be progressively annealed during optimization. This annealing introduces artifacts that unnecessarily clutter the illustration of the underlying differences. In other words, we use ${L}_{\frac{1}{2}}$ because it does not alter the fundamental properties of the two optimization routines, and at the same time lends itself to a clean illustration of our argument. We also note that in Figure 3, we represent the cost $\mathcal{C}(\mathbf{M})$ as a function of the number of iterations ${K}_{outer}$, and that it is depicted on a ${\mathrm{log}}_{10}$ scale. For ease of visualization, we only show the plot for the iteration range $[2,10]$. Alongside the figure, Table 3 reports the number of iterations ${K}_{outer}$ taken by each method to reach the different convergence criteria specified by $\u03f5$.

Firstly we observe from Table 3 that even though the line process is conceptually equivalent to our procedure with a single IRLS step, the proposed optimization routine of [12] takes more iterations than our actual procedure with a single IRLS step to converge to the same convergence criterion. This is because of the extrinsic (approximate) parametrization ${\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{t}\hfill \end{array}\right]}^{\top}$ used in [12] as opposed to the correct Lie algebraic representation ${\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{u}\hfill \end{array}\right]}^{\top}$. Secondly, it is clear from both Figure 3 and Table 3 that the cost function converges in progressively fewer iterations as we increase the number of IRLS steps. However, it should be noted that increasing the number of IRLS steps makes each iteration of our optimization routine slower as well. Therefore, a balance has to be struck between the speed of each iteration and the number of iterations required for convergence. In practice, we have found that using 2 IRLS steps per iteration for pairwise registration and 3 IRLS steps per iteration for joint multiview registration yield our desired results. In any event, the key observation is that the single iteration of the FGR is insufficient and yields poorer convergence properties compared to our formulation.

Choice of Loss Function $\rho \mathbf{}\mathrm{(}\mathrm{\cdot}\mathrm{)}$: The choice of the loss function $\rho (.)$ to be used is a critical factor in our estimation procedure. All loss functions achieve robustness to outliers by a trade-off with statistical efficiency (i.e. accuracy). In practice, the accuracy achievable by a chosen loss function depends on the empirical nature of the error distribution. As discussed earlier, in [12] the authors use the Geman-McClure loss function $\rho (x)=\frac{\mu {x}^{2}}{\mu +{x}^{2}}$. Here the performance critically depends on chosing a good value for $\mu $ that reflects the outlier distribution inherent to the data used. In [12] the authors start with a large $\mu $ and progressively reduce it in fixed amounts with every iteration. However, if the nature of the data varies significantly, such a fixed annealing procedure may not produce the best possible results. To overcome this limitation of the Geman-McClure method, as we have shown in Section 5, we have also tested for ${L}_{1}$ and ${L}_{\frac{1}{2}}$ and demonstrated that the latter provides the best performance. Apart from improved performance, an important desirable property of using ${L}_{\frac{1}{2}}$ is that it is entirely parameter free, hence we do not need to follow any additional annealing procedure. We could conceivably further improve performance by determing the optimal exponent $p$ in ${L}_{p}(.)=\parallel .{\parallel}_{2}^{p}$ for $$. However, for a broad range of possible error distributions, we find that ${L}_{\frac{1}{2}}$ is adequate.

### 6.2 Limitation of FPFH Feature-Matching Based Registration

In Section 5, we have demonstrated the potency of the FPFH feature-matching based registration paradigm. However, we note that these experiments, derived from those presented in [12], have special data characteristics. Specifically, in these examples, either the datasets are small or the motions between adjacent scans are small in magnitude and exhibit temporal smoothness. However, we note here that registration based on feature correspondences cannot work in all possible contexts. For example, when scans in the input dataset have sufficiently large depth differences or large motions between them in both rotation and translation, FPFH feature-based matches become few and unreliable. Consequently, the reconstructions given by both FGR [12] and our proposed algorithms become incorrect. In such a scenario, we need to take recourse to using a robust ICP-based multiview registration method (albeit with a greater computational cost) which converges to the correct solution.

In this alternative approach, following [33], we again consider the camera viewgraph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$. For each edge available in the set $\mathcal{E}$, we estimate the pairwise motion using a robust version of ICP. Specifically, for the motion estimation step in our robust ICP, we use the motion obtained using the ${L}_{\frac{1}{2}}$ loss optimization for pairwise registration method as described in Algorithm 4.1. After all pairwise motions represented by the edges $\mathcal{E}$ are estimated, we estimate the absolute motion of each camera w.r.t. a global frame of reference using robust motion averaging. Here, our solution for robust motion averaging for rigid Euclidean motions is similar to the robust rotation averaging method of [46]. We find that typically, we achieve the desired convergence in 3 full iterations of this procedure. To illustrate our argument, we show the qualities of reconstructions achieved for a life-size statue of Mahatma Gandhi, where the input set of scans is small and mostly have low overlap. Moreover, overlapping scans have significant depth differences between them, leading to significantly different FPFH features and consequently, a high percentage of incorrect matches. A visual representation of this scenario is given in the appendix for better understanding. As we can see in Figure 4, the joint multiview registration using FPFH features-based matches fails to correctly align some of the scans, whereas the robust multiview-based ICP routine successfully produces a correct reconstruction.

## 7 Conclusion

We have presented a fast and efficient 3D point registration method that optimizes on the $\mathbb{S}\mathbb{E}(3)$ group and generalizes the line process based optimization method. In addition, we have shown that in scenarios where feature correspondences cannot be reliably established, our robust motion estimation can make a multiview ICP method robust and effective.

## References

- [1] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, “KinectFusion: Real-Time Dense Surface Mapping and Tracking,” in Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality, 2011.
- [2] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A Benchmark for the Evaluation of RGB-D SLAM Systems,” in International Conference on Intelligent Robots and Systems (IROS), IEEE/RSJ, pp. 573–580, IEEE, 2012.
- [3] Z. Zhang, “Microsoft Kinect Sensor and its Effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.
- [4] J. Salvi, C. Matabosch, D. Fofi, and J. Forest, “A Review of Recent Range Image Registration Methods with Accuracy Evaluation,” Image and Vision Computing, vol. 25, no. 5, pp. 578–596, 2007.
- [5] G. K. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin, “Registration of 3D Point Clouds and Meshes: A Survey from Rigid to Nonrigid,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 7, pp. 1199–1217, 2013.
- [6] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan, “3D Object Recognition in Cluttered Scenes with Local Surface Features: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2270–2287, 2014.
- [7] N. Mellado, D. Aiger, and N. J. Mitra, “Super 4PCS Fast Global Pointcloud Registration via Smart Indexing,” in Computer Graphics Forum, vol. 33 (5), pp. 205–215, Wiley Online Library, 2014.
- [8] N. Gelfand, N. J. Mitra, L. J. Guibas, and H. Pottmann, “Robust Global Registration,” in Symposium on Geometry Processing, vol. 2 (3), p. 5, 2005.
- [9] A. Makadia, A. Patterson, and K. Danilidis, “Fully Automatic Registration of 3D point Clouds,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 1297–1304, IEEE, 2006.
- [10] D. Holz, A. E. Ichim, F. Tombari, R. B. Rusu, and S. Behnke, “Registration with the Point Cloud Library: A Modular Framework for Aligning in 3D,” IEEE Robotics & Automation Magazine, vol. 22, no. 4, pp. 110–124, 2015.
- [11] S. Choi, Q.-Y. Zhou, and V. Koltun, “Robust Reconstruction of Indoor Scenes,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 5556–5565, IEEE, 2015.
- [12] Q.-Y. Zhou, J. Park, and V. Koltun, “Fast Global Registration,” in European Conference on Computer Vision, pp. 766–782, Springer, 2016.
- [13] F. Tombari, S. Salti, and L. Di Stefano, “Performance Evaluation of 3D Keypoint Detectors,” International Journal of Computer Vision, vol. 102, no. 1-3, pp. 198–220, 2013.
- [14] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and N. M. Kwok, “A Comprehensive Performance Evaluation of 3D Local Feature Descriptors,” International Journal of Computer Vision, vol. 116, no. 1, pp. 66–89, 2016.
- [15] S. Rusinkiewicz and M. Levoy, “Efficient Variants of the ICP Algorithm,” in Proceedings of the Third International Conference on 3D Digital Imaging and Modeling, pp. 145–152, IEEE, 2001.
- [16] D. Aiger, N. J. Mitra, and D. Cohen-Or, “4-Points Congruent Sets for Robust Pairwise Surface Registration,” in ACM Transactions on Graphics (ToG), vol. 27 (3), p. 85, ACM, 2008.
- [17] B. Drost, M. Ulrich, N. Navab, and S. Ilic, “Model Globally, Match Locally: Efficient and Robust 3D Object Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 998–1005, IEEE, 2010.
- [18] S. Bouaziz, A. Tagliasacchi, and M. Pauly, “Sparse Iterative Closest Point,” in Proceedings of the Eleventh Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 113–123, Eurographics Association, 2013.
- [19] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, “The Trimmed Iterative Closest Point Algorithm,” in Proceedings of the 16th International Conference on Pattern Recognition, vol. 3, pp. 545–548, IEEE, 2002.
- [20] J. Yang, H. Li, and Y. Jia, “Go-ICP: Solving 3D Registration Efficiently and Globally Optimally,” in IEEE International Conference on Computer Vision, pp. 1457–1464, IEEE, 2013.
- [21] S. Umeyama, “Least-Squares Estimation of Transformation Parameters Between Two Point Patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 376–380, 1991.
- [22] R. Raguram, J.-M. Frahm, and M. Pollefeys, “A Comparative Analysis of RANSAC Techniques Leading to Adaptive Real-Time Random Sample Consensus,” in European Conference on Computer Vision, pp. 500–513, Springer, 2008.
- [23] R. B. Rusu, N. Blodow, and M. Beetz, “Fast Point Feature Histograms (FPFH) for 3D Registration,” in IEEE International Conference on Robotics and Automation, pp. 3212–3217, IEEE, 2009.
- [24] C. Papazov, S. Haddadin, S. Parusel, K. Krieger, and D. Burschka, “Rigid 3D Geometry Matching for Grasping of Known Objects in Cluttered Scenes,” The International Journal of Robotics Research, vol. 31, no. 4, pp. 538–553, 2012.
- [25] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison, “SLAM++: Simultaneous Localisation and Mapping at the Level of Objects,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1352–1359, IEEE, 2013.
- [26] R. I. Hartley and F. Kahl, “Global Optimization Through Searching Rotation Space and Optimal Estimation of the Essential Matrix,” in IEEE 11th International Conference on Computer Vision, pp. 1–8, IEEE, 2007.
- [27] H. Li and R. Hartley, “The 3D-3D Registration Problem Revisited,” in IEEE 11th International Conference on Computer Vision, pp. 1–8, IEEE, 2007.
- [28] O. Enqvist, K. Josephson, and F. Kahl, “Optimal Correspondences from Pairwise Constraints,” in IEEE 12th International Conference on Computer Vision, pp. 1295–1302, IEEE, 2009.
- [29] J. Yang, H. Li, D. Campbell, and Y. Jia, “Go-ICP: a Globally Optimal Solution to 3D ICP Point-Set Registration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241–2254, 2016.
- [30] B. Jian and B. C. Vemuri, “Robust Point Set Registration Using Gaussian Mixture Models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 1633–1645, Aug 2011.
- [31] G. D. Evangelidis and R. Horaud, “Joint Alignment of Multiple Point Sets with Batch and Incremental Expectation-Maximization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 1397–1410, June 2018.
- [32] B. Eckart, K. Kim, and J. Kautz, “HGMR: Hierarchical Gaussian Mixtures for Adaptive 3D Registration,” in The European Conference on Computer Vision (ECCV), September 2018.
- [33] V. M. Govindu and A. Pooja, “On Averaging Multiview Relations for 3D Scan Registration,” IEEE Transactions on Image Processing, vol. 23, pp. 1289–1302, March 2014.
- [34] A. Torsello and A. Albarelli, “Multi-View Registration via Graph Diffusion of Dual Quaternions,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2441–2448, 2011.
- [35] F. Arrigoni, B. Rossi, and A. Fusiello, “Global Registration of 3D Point Sets via LRS Decomposition,” in European Conference on Computer Vision, pp. 489–504, 2016.
- [36] P. W. Holland and R. E. Welsch, “Robust Regression Using Iteratively Reweighted Least-Squares,” Communications in Statistics - Theory and Methods, vol. 6, no. 9, pp. 813–827, 1977.
- [37] K. Aftab and R. Hartley, “Convergence of Iteratively Re-weighted Least Squares to Robust M-estimators,” in Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pp. 480–487, IEEE, 2015.
- [38] M. J. Black and A. Rangarajan, “On the Unification of Line Processes, Outlier Rejection, and Robust Statistics with Applications in Early Vision,” International Journal of Computer Vision, vol. 19, no. 1, pp. 57–91, 1996.
- [39] G. Chirikjian, Stochastic Models, Information Theory, and Lie Groups, Volume 2: Analytic Methods and Modern Applications. Birkhauser Boston, 2011.
- [40] J. Selig, Geometrical Methods in Robotics. Springer New York, 2013.
- [41] F. C. Park, “Distance Metrics on the Rigid-Body Motions with Applications to Mechanism Design,” Journal of Mechanical Design, vol. 117, no. 1, pp. 48–54, 1995.
- [42] P. Meer, D. Mintz, A. Rosenfeld, and D. Y. Kim, “Robust Regression Methods for Computer Vision: A Review,” International Journal of Computer Vision, vol. 6, no. 1, pp. 59–70, 1991.
- [43] E. Zacur, M. Bossa, and S. Olmos, “Left-Invariant Riemannian Geodesics on Spatial Transformation Groups,” SIAM Journal on Imaging Sciences, vol. 7, no. 3, pp. 1503–1557, 2014.
- [44] R. Szeliski, Computer Vision: Algorithms and Applications. Springer-Verlag New York, Inc., 2010.
- [45] F. J. Lawin, M. Danelljan, F. Khan, P.-E. Forssén, and M. Felsberg, “Density Adaptive Point Set Registration,” in IEEE Conference on Computer Vision and Pattern Recognition, Computer Vision Foundation, June 2018.
- [46] A. Chatterjee and V. M. Govindu, “Robust Relative Rotation Averaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 958–972, April 2018.
- [47] A. Bartoli, D. Pizarro, and M. Loog, “Stratified Generalized Procrustes Analysis,” International journal of computer vision, vol. 101, no. 2, pp. 227–253, 2013.
- [48] P. Bergström and O. Edlund, “Robust Registration of Point Sets Using Iteratively Reweighted Least Squares,” Computational Optimization and Applications, vol. 58, no. 3, pp. 543–561, 2014.
- [49] A. W. Fitzgibbon, “Robust Registration of 2D and 3D Point Sets,” Image and Vision Computing, vol. 21, no. 13-14, pp. 1145–1153, 2003.
- [50] A. S. Mian, M. Bennamoun, and R. Owens, “Three-Dimensional Model-Based Object Recognition and Segmentation in Cluttered Scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1584–1601, 2006.

## Appendix A Linear Form of the Error Term in Eqn. 9

Let the given set of correspondences be:

$$\begin{array}{c}\hfill \{({\mathbf{p}}^{s},{\mathbf{q}}^{s})|\mathrm{\hspace{0.17em}1}\le s\le \mathcal{S};{\mathbf{p}}^{s}={\left[\begin{array}{cc}\hfill {({\mathbf{p}}^{s})}^{\prime}\hfill & \hfill 1\hfill \end{array}\right]}^{\top},\\ \hfill {\mathbf{q}}^{s}={\left[\begin{array}{cc}\hfill {({\mathbf{q}}^{s})}^{\prime}\hfill & \hfill 1\hfill \end{array}\right]}^{\top}\text{for}\{{({\mathbf{p}}^{s})}^{\prime},{({\mathbf{q}}^{s})}^{\prime}\}\in {\mathbb{R}}^{3}\}.\end{array}$$ |

Then we can write the error term in Eqn. 9 as

$\begin{array}{cc}\hfill {e}^{s}(\mathbf{M}(k))& =\parallel \left[\begin{array}{c}\hfill {({\mathbf{p}}^{s})}^{\prime}\hfill \\ \hfill 1\hfill \end{array}\right]-\hfill \\ & ({\mathbf{I}}_{4}+\mathrm{\Delta}\U0001d52a(k))\mathbf{M}(k-1)\left[\begin{array}{c}\hfill {({\mathbf{q}}^{s})}^{\prime}\hfill \\ \hfill 1\hfill \end{array}\right]\parallel \hfill \end{array}$ | (17) | |||

$\begin{array}{cc}\hfill \Rightarrow {e}^{s}(\mathbf{M}(k))& =\parallel \left[\begin{array}{c}\hfill {({\mathbf{p}}^{s})}^{\prime}-{({\mathbf{q}}^{s})}^{\prime}\hfill \\ \hfill 0\hfill \end{array}\right]-\hfill \\ & \left[\begin{array}{ccc}\hfill {[\bm{\omega}]}_{\times}\hfill & \hfill |\hfill & \hfill \mathbf{u}\hfill \\ \hfill \mathrm{\U0001d7ce}\hfill & \hfill |\hfill & \hfill 0\hfill \end{array}\right]\mathbf{M}(k-1)\left[\begin{array}{c}\hfill {({\mathbf{q}}^{s})}^{\prime}\hfill \\ \hfill 1\hfill \end{array}\right]\parallel \hfill \end{array}$ | (18) |

where we get $\mathrm{\Delta}\U0001d52a(k)=\left[\begin{array}{ccc}\hfill {[\bm{\omega}]}_{\times}\hfill & \hfill |\hfill & \hfill \mathbf{u}\hfill \\ \hfill \mathrm{\U0001d7ce}\hfill & \hfill |\hfill & \hfill 0\hfill \end{array}\right]$ from Eqn. 2. Rewriting Eqn. 18 with $\U0001d533={\left[\begin{array}{cc}\hfill \bm{\omega}\hfill & \hfill \mathbf{u}\hfill \end{array}\right]}^{\top}$, $\mathbf{M}(k-1)=\left[\begin{array}{ccc}\hfill \mathbf{R}(k-1)\hfill & \hfill |\hfill & \hfill \mathbf{t}(k-1)\hfill \\ \hfill \mathrm{\U0001d7ce}\hfill & \hfill |\hfill & \hfill 1\hfill \end{array}\right]$ (from Eqn. 1), and by dropping the trailing $0$ for ease of representation, we get,

$$\begin{array}{c}\hfill {e}^{s}(\mathbf{M}(k))=\parallel \left[\begin{array}{c}\hfill -{\left[\mathbf{R}(k-1){({\mathbf{q}}^{s})}^{\prime}+\mathbf{t}(k-1)\right]}_{\times}|{\mathbf{I}}_{3}\hfill \end{array}\right]\U0001d533-\\ \hfill ({({\mathbf{p}}^{s})}^{\prime}-{({\mathbf{q}}^{s})}^{\prime})\parallel .\end{array}$$ | (19) |

Thus, we have in Eqn. 9,

${\mathbf{A}}^{s}$ | $=\left[\begin{array}{c}\hfill -{\left[\mathbf{R}(k-1){({\mathbf{q}}^{s})}^{\prime}+\mathbf{t}(k-1)\right]}_{\times}|{\mathbf{I}}_{3}\hfill \end{array}\right],$ | (20) | ||

${\mathbf{b}}^{s}$ | $={({\mathbf{p}}^{s})}^{\prime}-{({\mathbf{q}}^{s})}^{\prime}.$ | (21) |

## Appendix B Algorithm for Joint Multiview Registration

Similar to our algorithm for robust motion estimation between a pair of 3D scans, we state our solution for the robust motion estimation of a set of $N\phantom{\rule{veryverythickmathspace}{0ex}}(\ge 2)$ 3D scans as given in Algorithm B.

[h]
$\{\left\{({\mathbf{p}}_{i}^{1},{\mathbf{p}}_{j}^{1})\mathrm{\cdots}({\mathbf{p}}_{i}^{{\mathcal{S}}_{ij}},{\mathbf{p}}_{j}^{{\mathcal{S}}_{ij}})\right\}|(i,j)\in \mathcal{E}\}$ (according to the viewgraph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$)

Output: $\mathbb{M}=\{{\mathbf{M}}_{i}|{\mathbf{M}}_{i}\in \mathbb{S}\mathbb{E}(3)\forall i\in [1\mathrm{\cdots}N]\}$ (Robust estimate of motion of the set of $N=|\mathcal{V}|$ scans)

Initialization: $\mathbb{M}$ is set to initial guess ${\mathbb{M}}_{initial}$

[l] \While$||\mathbb{v}||>N\u03f5$ \Comment$\mathbb{v}={\left[\begin{array}{ccc}\hfill {\U0001d533}_{1}\hfill & \hfill \mathrm{\cdots}\hfill & \hfill {\U0001d533}_{N}\hfill \end{array}\right]}^{\top}$ \State1. Compute $\{({\mathbf{A}}_{ij}^{s},{\mathbf{b}}_{ij}^{s})|\forall s\in [1\mathrm{\cdots}{\mathcal{S}}_{ij}];(i,j)\in \mathcal{E}\}$ using Eqn. 13 \State2. Compute weights ${w}_{ij}^{s}=\frac{{\rho}^{\prime}({e}_{ij}^{s})}{{e}_{ij}^{s}}$ as used in Eqn. 15 \State3. Estimate $\mathbb{v}$ as IRLS solution for Eqn. 15 \State4. Update ${\mathbf{M}}_{i}\leftarrow \mathrm{exp}(\widehat{{\U0001d533}_{\U0001d526}}){\mathbf{M}}_{i}$ $\forall i\in [1\mathrm{\cdots}N]$ \EndWhile

## Appendix C More Results

UWA dataset: Table 5 reports the performace of the motion step of the Fast Global Registration (FGR) method of [12] as well as all the 3 versions of our method on the UWA dataset [50]. This dataset consists of 5 objects and 50 scenes, with each scene consisting of a combination of these objects. The task is to align individual objects to the scenes, given that each scene contains substantial clutter and occlusion. A total of 188 such alignment tasks are provided in the dataset. As we can see from Table 5, the ${L}_{\frac{1}{2}}$ version of our method produces the lowest median rotation angle and median translation norm errors. It is also significantly faster than FGR [12].

Method | Median RAE | Median TNE | Mean Time |
---|---|---|---|

FGR [12] | 1.276 | 0.152 | 32.7 |

Our ${L}_{\frac{1}{2}}$ | 0.319 | 0.034 | 10.1 |

Our ${L}_{1}$ | 0.824 | 0.108 | 7.0 |

Our GM | 1.276 | 0.152 | 14.5 |

Relative improvement of joint multiview approach over two-stage motion averaging approach: As described in Section 5.2, we show in Table 6 that the reconstruction error of the scenes in the Augmented ICL-NUIM dataset [11] decreases when we use our joint multiview estimation procedure on top of the two-stage motion averaged approach. We show the improvement achieved using the ${L}_{\frac{1}{2}}$ loss, which is our best-performing version.

livingroom 1 | livingroom 2 | office 1 | office 2 | |
---|---|---|---|---|

Before MV | 0.07 | 0.07 | 0.06 | 0.07 |

After MV | 0.04 | 0.05 | 0.03 | 0.04 |

## Appendix D An Illustration of the Limitation of FPFH Feature-Matching Based Registration

As discussed in Section 6.2, we have presented a scenario where the FPFH feature-matching based registration technique breaks down due to unreliability of the feature matches themselves. In this particular scenario, we have 23 scans of a life-size statue of Mahatma Gandhi collected with a standard commercial depth camera. Figure 8 shows the plan view of a schematic of the cameras (represented as balls) around the statue, as recovered by our ICP-based multiview approach. Recall that these cameras are the nodes of the viewgraph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$. We also display a schematic of the edges in the viewgraph $\mathcal{G}$ (using sticks). The thickness of each edge is proportional to the number of FPFH feature matches found between the correponding camera (or equivalently scan) pair.

We can observe from Figure 8 that the thinnest edges are found between pairs of cameras at different depths, implying that there are extremely few FPFH feature matches between these cameras. Compounding this observation with the fact that FPFH features are noisy to begin with, the resultant motions between these cameras, even with our robust cost function, are grossly incorrect. In contrast, our ICP-based multiview approach can, albeit at a higher computational cost, align these cameras correctly and produce the desired reconstruction.