Efficient and Robust Registration on the 3D Special Euclidean Group

  • 2019-04-11 03:52:42
  • Uttaran Bhattacharya, Venu Madhav Govindu
  • 25

Abstract

We present an accurate, robust and fast method for registration of 3D scans.Our motion estimation optimizes a robust cost function on the intrinsicrepresentation of rigid motions, i.e., the Special Euclidean group$\mathbb{SE}(3)$. We exploit the geometric properties of Lie groups as well asthe robustness afforded by an iteratively reweighted least squaresoptimization. We also generalize our approach to a joint multiview method thatsimultaneously solves for the registration of a set of scans. We demonstratethe efficacy of our approach by thorough experimental validation. Our approachsignificantly outperforms the state-of-the-art robust 3D registration methodbased on a line process in terms of both speed and accuracy. We also show thatthis line process method is a special case of our principled geometricsolution. Finally, we also present scenarios where global registration based onfeature correspondences fails but multiview ICP based on our robust motionestimation is successful.

 

Quick Read (beta)

Efficient and Robust Registration on the 3D Special Euclidean Group

Uttaran Bhattacharya
Department of Computer Science
University of Maryland College Park
[email protected]
&Venu Madhav Govindu
Department of Electrical Engineering
Indian Institute of Science
[email protected]
Abstract

We present an accurate, robust and fast method for registration of 3D scans. Our motion estimation optimizes a robust cost function on the intrinsic representation of rigid motions, i.e., the Special Euclidean group 𝕊𝔼(3). We exploit the geometric properties of Lie groups as well as the robustness afforded by an iteratively reweighted least squares optimization. We also generalize our approach to a joint multiview method that simultaneously solves for the registration of a set of scans. We demonstrate the efficacy of our approach by thorough experimental validation. Our approach significantly outperforms the state-of-the-art robust 3D registration method based on a line process in terms of both speed and accuracy. We also show that this line process method is a special case of our principled geometric solution. Finally, we also present scenarios where global registration based on feature correspondences fails but multiview ICP based on our robust motion estimation is successful.

 

Efficient and Robust Registration on the 3D Special Euclidean Group


  Uttaran Bhattacharya Department of Computer Science University of Maryland College Park [email protected] Venu Madhav Govindu Department of Electrical Engineering Indian Institute of Science [email protected]

June 23, 2019

1 Introduction

The availability of consumer depth cameras has allowed us to acquire reliable 3D scans of a scene or an object [1, 2, 3]. To build a geometrically consistent 3D reconstruction, we need to solve the key problem of aligning or registering all scans in a global frame of reference [4, 5, 6, 7]. The resulting solution can be used in a diverse range of contexts including human-computer interaction, archiving of cultural objects, industrial inspection etc. There is a wide variety of solutions to the 3D registration problem in the literature [8, 9, 10, 11, 12]. All of these methods involve a trade-off between speed and accuracy. Recently, [12] has presented a method for fast global registration (henceforth denoted FGR) of 3D scans based on the robustness of a line process. This method has been shown to outperform existing methods in terms of both speed and accuracy.
As in [12], given a set of 3D feature correspondences, we pose the registration problem as one of solving for the rigid Euclidean motion that minimizes a robust cost function. However, unlike their approach, our solution systematically utilises the rich geometric structure of the underlying Lie group representation for 3D motion, i.e., the Special Euclidean group 𝕊𝔼(3). In this context, we achieve robustness via the iteratively reweighted least squares (IRLS) method. The key observation of our paper is that our specific combination of geometry of rigid Euclidean motion and the robustness of IRLS allows our method to significantly outperform the fast global registration method of [12] without having to take recourse to tuning of parameters. Additionally, we show that the solution proposed in [12] is a special case of our more general (and more accurate and faster) solution. In the process, we demonstrate that we can gain both theoretical insight as well as improved performance by utilizing the rich geometric structure of the underlying geometric representation of rigid Euclidean motions.
Furthermore, our work also addresses two important considerations. Firstly, in order to achieve robustness, some loss functions used for optimization have parameters that need to be either provided a priori or estimated in situ. This is the case with [12] which uses the Geman-McClure loss function. We argue that for the problem of 3D registration the statistical efficiency (i.e. accuracy) trade-off inherent to all robust estimators can be easily optimized using loss functions that are parameter free. This obviates the need to estimate any parameter during the course of optimization. Secondly, we argue that while fast and accurate 3D registration can be achieved using 3D point feature correspondences, this approach has certain limitations. While such feature correspondences can be reliably obtained when the camera motion is small (equivalently there is significant overlap between scans), there are certain scenarios where the feature correspondences break down. For such cases, we demonstrate that accurate joint registration of multiple 3D scans can be achieved by incorporating our robust motion estimation method into local methods such as the ICP.

2 Literature Survey

Although the literature of 3D registration is extensive, in the following we only focus on aspects that are directly relevant to our method. A large number of methods for registering 3D scans using point correspondences have two key aspects, (a) a method for establishing point correspondences between two 3D scans and (b) a method for estimating motion given an adequate number of such correspondences. We may further classify methods according to whether they use a fixed set of point correspondences [11, 12, 13, 14] or update them [15, 16, 17, 7, 18]. In the former instance, we use invariant feature representations to find point correspondences across a pair of scans which are then used for motion estimation. In the latter approach, we alternately update point correspondences using nearest neighbors and motion estimation till convergence, such as in the classical approach of ICP and its variants (see [15, 18, 19, 20] and references therein). Independent of the method for establishing correspondences, we need a method for robust motion estimation method given a set of correspondences. This problem of robust motion estimation is the main focus of this paper.
The least squares solution to the motion estimation problem is the classical method of Umeyama [21]. However, the least squares solution breaks down in the presence of outliers. The requisite robustness is often achieved using variants of RANSAC [16, 22, 23, 24, 7, 10] or motion clustering [17, 7, 25]. Other classes of approaches are based on the branch-and-bound framework [8, 26, 27, 28, 29]. However, all of these approaches often require expensive iterations and are slow to converge. Other solutions based on expectation-maximization using Gaussian mixture model representations of the scans [30, 31, 32] are similarly slow. Another approach that is relatively efficient is to perform robust averaging of pairwise motions between scans [33, 34, 35], or perform some variants of pose-graph optimization [11], to produce tightly registered scans. Yet another approach is to use the well-known IRLS method [36] to efficiently optimize robust cost functions [37]. The recent fast global registration method of [12] made use of the duality between robust estimators and line processes [38] to develop a fast approach for global registration. This method has been shown to produce the best results till date in terms of speed as well as accuracy.

3 Lie Group Structure of Euclidean Motions

Our method utilizes the geometric structure of Lie groups, specifically that of the Special Euclidean group 𝕊𝔼(3). In this section, we provide a brief description of the group structure and relevant Lie group properties. The Euclidean motion between two 3D scans consists of a rotation and a translation. While rotations can be represented in a variety of ways, we represent them as 3×3 orthonormal matrices 𝐑, i.e., 𝐑𝐑=𝐈3 and |𝐑|=+1 (here and for the rest of this paper, we use 𝐈n to denote the n×n identity matrix). A rigid Euclidean motion (𝐑,𝐭) (𝐭3) can then be compactly represented by a 4×4 matrix

𝐌=[𝐑|𝐭𝟎|1]. (1)

The matrices 𝐑 and 𝐌 satisfy the properties of a matrix group and also form smooth, differential manifolds, i.e., they are Lie groups. Thus, 𝐑𝕊𝕆(3) and 𝐌𝕊𝔼(3) where 𝕊𝕆(3) and 𝕊𝔼(3) are the Special Orthogonal and the Special Euclidean groups respectively. We also note that 𝐑 and 𝐌 have 3 and 6 degrees of freedom respectively.

Lie Groups: The Lie group structure of 𝕊𝔼(3) plays an important role in our formulation. Detailed expositions on the geometric properties of this representation are presented in  [39, 40]. For our purposes, it will suffice to note that for finite dimensional Lie groups (matrix groups) the product and inverse operations are differentiable mappings. Every point in a Lie group has a local neighborhood (tangent space) called its Lie algebra which has the properties of a vector space. For 𝐑𝕊𝕆(3) and 𝐌𝕊𝔼(3), the corresponding Lie algebra are denoted as [𝝎]×𝔰𝔬(3) and 𝔪𝔰𝔢(3) respectively. Here [𝝎]×𝔰𝔬(3) is the skew-symmetric form of the axis-angle rotation representation 𝝎. In this representation, 𝐑 represents a rotation by angle ||𝝎|| about the axis 𝝎||𝝎||. Further, we can move from a Lie group to its Lie algebra and vice-versa using the logarithm and the exponential mappings respectively. Thus, 𝐑=exp([𝝎]×) and [𝝎]×=log(𝐑). Similarly, we have 𝐌=exp(𝔪) and 𝔪=log(𝐌) with the forms

𝔪=log(𝐌)=[[𝝎]×|𝐮𝟎|0];𝐌=exp(𝔪)=k=0𝔪kk!=[𝐑|𝐭𝟎|1]. (2)

Further, we note that the exp() and log() mappings for 𝕊𝕆(3) and 𝕊𝔼(3) have closed form expressions that can be efficiently implemented. Specifically,

𝐑=exp([𝝎]×)=𝐈3+sinθ[𝝎]×+(1-cosθ)[𝝎]×2 (3)
𝐭=𝐏𝐮 for 𝐏=𝐈3+(1-cosθ)θ2[𝝎]×+(θ-sinθ)θ3[𝝎]×2 (4)

where θ=||𝝎||. We also note that the special properties of the group structure of 𝕊𝔼(3) are discussed in [41].

4 Robust Motion Estimation

Given 𝒮3 pairs of point correspondences {(𝐩s,𝐪s)|1s𝒮} between a pair of scans, we can solve for the motion 𝐌 required for aligning the pair of scans. In the case of global approaches, where the motion can potentially be large, such point correspondences are obtained by matching geometric features such as FPFH [23] whereas in iterative schemes like ICP, the correspondence of points is obtained by finding the nearest neighbor match on the second scan for a point on the first one. Due to a variety of sources of error, the correspondence pairs are not in perfect alignment or could be grossly incorrect, i.e., es=𝐩s-𝐌𝐪s0, where es denotes the norm of the discrepancy in registration for the s-th correspondence pair (𝐩s,𝐪s). When we assume that the individual errors have a zero-mean, iid Gaussian distribution, the optimal estimate for 𝐌 is obtained by a least squares minimization and has a closed form [21]. However, since correspondences could be highly incorrect in practice, instead of a least squares formulation, we pose motion estimation as the optimization of a robust cost function

min𝐌𝕊𝔼(3)𝒞(𝐌)=s=1𝒮ρ(𝐩s-𝐌𝐪s)=s=1𝒮ρ(es(𝐌)) (5)

where ρ() is a robust loss function and we also note that the individual error terms es(𝐌) are a function of the motion 𝐌. The use of robust estimators is well studied in statistics as an M-estimation problem and has been successfully used in a variety of vision problems [42]. However, in addition to robustness, in Eqn. 5 we require our solution to satisfy the geometric constraints for 𝐌𝕊𝔼(3). These requirements of robust estimation under geometric constraints are satisfied by our solution.

4.1 Proposed Solution for Pairwise Registration

We propose to minimize the cost function 𝒞(𝐌) in Eqn. 5 in an iterative fashion. Let the estimate for 𝐌 at the (k-1)-th iteration be denoted 𝐌(k-1). In the k-th iteration, let us update the motion matrix by Δ𝐌(k), i.e., 𝐌(k)=Δ𝐌(k)𝐌(k-1). Noting that here the matrix multiplication is not commutative, our formulation uses a left-invariant metric on 𝕊𝔼(3) [41, 43]. Using a first-order approximation for the motion update matrix Δ𝐌(k), we have

Δ𝐌(k) 𝐈4+Δ𝔪(k) (6)
𝒞(𝐌(k)) =s=1𝒮ρ(𝐩s-𝐌(k)𝐪s) (7)
=s=1𝒮ρ(𝐩s-(𝐈4+Δ𝔪(k))𝐌(k-1)𝐪s). (8)

Here we note that the Lie algebra matrix Δ𝔪(k) encodes the 6 parameters that we need to estimate for the update Δ𝐌(k). We can obtain the vector representation for these 6 parameters using the ‘vee’ operator, i.e., 𝔳=Δ𝔪=[𝝎𝐮]. Since we use a first-order approximation in Eqn. 6, the cost 𝒞(𝐌(k)) is linear in Δ𝔪(k). We equivalently note that it is also linear in 𝔳(k). Thus, we rewrite the individual error terms es(𝐌(k)) as

es(𝐌(k))=𝐩s-(𝐈4+Δ𝔪(k))𝐌(k-1)𝐪s=𝐀s𝔳-𝐛s (9)

where 𝐀s and 𝐛s are the appropriate matrices. The derivation of the explicit forms of 𝐀s and 𝐛s are given in the appendix. To obtain the update in the k-th iteration, we now optimize the cost 𝒞(𝐌(k))=s=1𝒮ρ(es(𝔳)) w.r.t. 𝔳, and get,

ρ(es)es(𝐀s)𝐀s𝔳 =ρ(es)es(𝐀s)𝐛s (10)

for each summand indexed by s, where ρ()=ρe is the influence function of the robust loss ρ(). We may further denote ws=ρ(es)es, which is the relative weight accorded to the s-th equation in Eqn. 10. Collecting all such relationships obtained for each pair of correspondences (𝐩s,𝐪s) into a single system of equations we have

𝐀𝐖𝐀𝔳=𝐀𝐖𝐛 (11)

where 𝐀=[𝐀1𝐀𝒮], 𝐛=[𝐛1𝐛𝒮], and 𝐖=[w1𝐈3w𝒮𝐈3].
Eqn. 11 is a weighted linear system of equations with the solution 𝔳=(𝐀𝐖𝐀)-1𝐀𝐖𝐛. However, it should be noted that each individual weight ws is a function of the error es which, in turn, is dependent on 𝔳, since es=𝐀s𝔳-𝐛s. Thus, in Eqn. 11, the equivalent relationship is 𝐀𝐖(𝔳)𝐀𝔳=𝐀𝐖(𝔳)𝐛. The solution for this system of equations is the well-known iteratively reweighted least squares (IRLS) method [36, 44]. In the IRLS method, in each iteration the weights ws are estimated based on the current estimate of 𝔳. Given these weights, 𝔳 is re-estimated. This process is repeated till convergence.
Given a solution for 𝔳, we can estimate Δ𝐌(k)=exp(𝔳^(k)) where the ‘hat’ operator 𝔳^ converts the estimated Lie algebra parameters 𝔳 into its equivalent matrix representation Δ𝔪(k). We emphasize here that although in Eqn. 6, we assumed a first-order approximation, we map the estimated 𝔳(k) into an intrinsic estimate of the motion update, i.e., Δ𝐌(k)=exp(𝔳^(k))𝕊𝔼(3). In other words, a first-order approximation in an intermediate step does not mean that the actual cost function is approximated. The mapping Δ𝔪(k)=exp(𝔳^(k)) ensures that the estimated motion 𝐌(k) is always a valid member of the 𝕊𝔼(3) group. We may now state our solution for robust motion estimation as given in Algorithm 4.1.

{algorithm}

[h] IRLS estimation of pairwise 3D registration Input: {(𝐩1,𝐪1)(𝐩𝒮,𝐪𝒮)} (𝒮 correspondences across a pairs of scans)
Output: 𝐌𝕊𝔼(3) (Robust estimate of motion between scans)
Initialization: 𝐌 is set to 4×4 identity matrix

\[email protected]@algorithmic

[l] \While||𝔳||>ϵ \State1. Compute {(𝐀s,𝐛s)|s[1𝒮]} using Eqn. 9 \State2. Compute weights ws=ρ(es)es as defined by Eqn. 10 \State3. Estimate 𝔳 as IRLS solution for Eqn. 11 \State4. Update 𝐌exp(𝔳^)𝐌 \EndWhile Algorithm 4.1 is an iterative algorithm with a nested iteration. The outer loop is defined by the while statement and we denote its number of iterations as Kouter. The inner loop consists of the IRLS step of line 3 in Algorithm 4.1 since IRLS is itself an iterative method. We denote the number of iterations of the IRLS step as KIRLS.

4.2 Extension to Joint Multiview Registration

In Algorithm 4.1 we presented the registration solution for two scans. This approach can be extended to the simultaneous or joint multiview registration of a set of scans. Towards this end, we define a viewgraph 𝒢={𝒱,} where vi𝒱 represents the Euclidean motion of the i-th scan (equivalently camera) and an edge (i,j) signifies that the relative Euclidean motion between the i-th and j-th scan can be determined from available matches. Further, we denote the number of scans as N=|𝒱|. We may now define the cost function to be optimized for joint multiview registration as follows

𝒞(𝕄)=(i,j)s=1𝒮ijρ(𝐌i𝐩is-𝐌j𝐩js)=(i,j)s=1𝒮ijρ(eijs(𝕄)) (12)

where 𝕄={𝐌1𝐌N} denotes the set of absolute motions of each of the N scans w.r.t. to a global frame of reference and 𝒮ij is the number of correspondences between the i-th and j-th scan. We again use an iterative approach to minimize the cost function 𝒞(𝕄) in Eqn. 12 w.r.t. 𝕄. In the k-th iteration, we update each motion matrix 𝐌i by Δ𝐌i(k), i.e., 𝐌i(k)=Δ𝐌i(k)𝐌i(k-1) i[1N]. Using a first-order approximation for each update matrix Δ𝐌i(k), we have

eijs(𝕄(k))=(𝐈4+Δ𝔪i(k))𝐌i(k-1)𝐩is-(𝐈4+Δ𝔪j(k))𝐌j(k-1)𝐩js (13)
=𝐀ijs𝕧-𝐛ijs (14)

where 𝕧=[𝔳1𝔳N] collates the corresponding vector representations of each of the Lie algebra matrices Δ𝔪i(k), and 𝐀ijs and 𝐛ijs are constructed analogous to Eqn. 9. The subsequent update in the k-th iteration is then analogously obtained from the relation

𝔸𝕎(𝕧)𝔸𝕧=𝔸𝕎(𝕧)𝕓 (15)

where 𝔸, 𝕓 and 𝕎(𝕧) correspondingly collate all 𝐀ijs, 𝐛ijs and 𝐰ijs=ρ(eijs)eijs. As earlier, the solution for the system of equations in Eqn. 15 is the IRLS method, where we estimate the weights wijs and the collated vector representation 𝕧 in alternating iterations till convergence.
Given a solution for 𝕧, we can estimate, for each i[1N], Δ𝐌i(k)=exp(𝔳i^(k)) and thereby update each member of 𝕄(k) as 𝐌i(k)Δ𝐌i(k)𝐌i(k-1). It should also be noted that in multiview registration, the choice of the global frame of reference for the set 𝕄 is arbitrary. For our implementation, we fix it to the first scan without loss of generality, i.e., we set 𝐌1=𝐈4 and do not update 𝐌1 throughout course of optimization. The detailed algorithm is presented in the appendix.

5 Results

In recent literature, the fast global registration (FGR) method of  [12] has been shown to outperform other global registration and local refinement algorithms in terms of speed and registration accuracy. Therefore, owing to space constraints, we confine the comparison of our results mostly to those of FGR as this will suffice to show where our performance stands in terms of the other registration algorithms. Furthermore, we test our algorithm on 3 different choices of the loss function ρ(), namely L12: ρ(x)=|x|, L1: ρ(x)=|x|, and scaled Geman-McClure: ρ(x)=μx2μ+x2, where μ is the scale factor. The FGR approach of [12] uses only the scaled Geman-McClure loss function and anneals μ in fixed steps per iteration. To enable comparison, in our tests with the scaled Geman-McClure loss function we anneal μ in an identical manner. We present results on both pairwise and multiview registration tests. For these pairwise and multiview tests we terminate using ϵ=10-5 and ϵ=10-7 respectively. All our reported running times are measured on a single thread on an Intel Xeon(R) E5-2650 v3 processor clocked at 2.30 GHz.

5.1 Pairwise Registration

We present the performance of our proposed pairwise registration on the synthetic range data provided by [12] and on the 4 indoor sequences in the Augmented ICL-NUIM dataset provided by Choi et al. [11]. In all our pairwise tests, we use KIRLS=2. We compare the registration errors given by the 3 versions of our method with the following prior methods: Super4PCS [7], GoICP [29], Choi et al. [11], FGR [12] and DARE [45] (using the hyperparameters suggested by the authors). Registration errors comprise of statistical measures on the distances between the ground-truth point correspondences between pairs of scans post alignment. We also report the trajectory errors, which include statistical measures on both the rotation errors and the translation norm errors between the pairs of cameras (corresponding to the given scans) w.r.t. the ground-truth camera pair, for all the methods.
Synthetic range dataset: We perform the set of controlled experiments described in [12] on each of their 5 synthetic datasets, at the given Gaussian noise levels σ=0.0025 and σ=0.005 (for each model, σ is in units of the surface diameter). Note that adding Gaussian noise to a pair of scans introduces outliers in the point correspondences that are computed between that pair. Since depth cameras produce noisy scans in practice, this is a realistic way of increasing the outlier percentage in the point correspondences for synthetic scans. Table 1 lists for each method and for each noise level, the mean and maximal RMSE on the aligned ground truth correspondences. For both noise levels, our method with the L12 loss function attains the lowest registration error together with FGR. Table 2 reports the mean running time of the motion step of each method for each of the 5 models in the dataset. The L12 of our method is more 3× faster than all prior methods on the average, and the L1 version is more than 5× faster.

Method σ=0.0025 σ=0.0050
Md.RAE Md.TNE Mn.RMSE Mx.RMSE Md.RAE Md.TNE Mn.RMSE Mx.RMSE
Super4PCS [7] 0.864 0.008 0.014 0.029 1.468 0.012 0.017 0.095
GoICP [29] 1.207 0.011 0.032 0.133 1.736 0.019 0.037 0.127
Choi et al. [11] 0.778 0.006 0.008 0.022 1.533 0.015 0.035 0.274
FGR [12] 0.749 0.005 0.004 0.011 1.146 0.008 0.006 0.017
DARE [45] 1.851 0.013 0.035 0.176 2.005 0.025 0.054 0.312
Our L12 0.545 0.004 0.004 0.011 0.959 0.008 0.006 0.017
Our L1 0.566 0.004 0.004 0.011 1.516 0.011 0.007 0.017
Our GM 0.725 0.005 0.004 0.011 1.146 0.008 0.006 0.017
Table 1: Median rotation angle error (Md.RAE) (in degrees), median translation norm error (Md.TNE), mean RMSE (Mn.RMSE) and maximal RMSE (Mx.RMSE) (all in units of the surface diameters) achieved by each method for each noise level σ on the synthetic range datasets.
Dataset Mean # points Super 4PCS [7] GoICP [29] Choi et al. [11] FGR [12] DARE [45] Our L12 Our
L1
Our GM
Bimba 9,416 16,230 1,550 650 11.9 920 3.9 2.5 5.5
Child’n 11,148 18,410 1,620 890 16.8 960 4.8 3.3 8.0
Dragon 11,232 20,520 1,840 970 17.6 1,090 5.0 3.4 8.2
Angel 12,072 29,640 3,000 1,090 19.1 1,770 5.3 3.8 8.9
Bunny 13,357 38,470 5,530 1,170 21.6 3,310 7.4 5.1 9.9
Mean 11,445 24,650 2,710 960 17.4 1,610 5.3 3.6 8.1
Table 2: Running time (in milliseconds) of the motion step of each method for each model in the synthetic range dataset.

Augmented ICL-NUIM dataset: Each of the 4 sequences in the Augmented ICL-NUIM dataset provided by Choi et al. [11] consist of 2350 to 2870 scans of an indoor scene. Moreover, the given scans are provided in a smooth temporal sequence, i.e., pairs of scans with proximity in timestamp also have sufficient view overlap with each other. This, in turn, leads to reliable FPFH feature matches between such pairs. We therefore tested the performance of all the methods for all pairs of scans (i,j) in each sequence such that |i-j|10. Table 3 lists the results on each dataset for the various methods under consideration. For each dataset and corresponding method, we list the median rotation angle error and the median translation norm error of the recovered pairwise camera motions, as well as the mean computation time on the pairs. The L12 version of our method performs the best in terms of the trajectory error statistics. It can also be seen to be significantly faster than the FGR method of [12]. More such pairwise results on other datasets are given in the appendix.

Method livingroom 1 livingroom 2 office 1 office 2
Md.RAE Md.TNE Mn.Time Md.RAE Md.TNE Md.Time Md.RAE Md.TNE Mn.Time Md.RAE Md.TNE Mn.Time
Super4PCS [7] 1.104 0.039 368,030 0.616 0.033 344,720 0.932 0.038 367,980 0.844 0.027 345,460
GoICP [29] 1.336 0.071 35,110 0.992 0.058 33,420 1.365 0.066 34,450 1.104 0.047 32,530
Choi et al. [11] 0.941 0.041 14,740 0.551 0.031 13,850 0.811 0.036 14,720 0.765 0.029 13,990
FGR [12] 0.793 0.029 272 0.482 0.021 181 0.707 0.020 272 0.669 0.016 177
DARE [45] 1.305 0.044 21,500 1.172 0.059 20,320 1.716 0.037 21,110 1.286 0.068 20,920
Our L12 0.595 0.023 61 0.380 0.017 50 0.474 0.014 59 0.437 0.011 45
Our L1 0.964 0.025 33 0.419 0.019 27 0.569 0.017 33 0.524 0.013 25
Our GM 0.793 0.029 118 0.482 0.021 87 0.707 0.020 118 0.669 0.016 89
Table 3: Median rotation angle error (RAE) (in degrees), median translation norm error (TNE) (in meters) and mean running time (in milliseconds) of the motion step of each method for each sequence in the Augmented ICL-NUIM dataset.

5.2 Joint Multiview Registration

We present the performance of our joint multiview registration algorithm on the 4 sequences in the Augmented ICL-NUIM dataset, specifically, on the 47 to 57 scene fragments that were provided for each sequence by Choi et al. [11]. We use KIRLS=3 for multiview registration, as the joint optimization variable and its corresponding search space are both large (𝕧6(N-1) in Eqn. 15 for N cameras). First, we compute pairwise motion estimates between fragments followed by a robust motion averaging step on 𝕊𝔼(3) that is similar to that used for rotation averaging in [46]. The output of this two-stage approach is used to initialize the joint multiview optimization in Eqn. 12. The main drawback of only using the two-step approach for global registration is that it is not a true global method. It only considers local point correspondences, and then averages out the errors made by the pairwise motion estimates. Conversely, the joint multiview approach deals with point correspondences in the global setting and solves for the global cost function. The relative improvement in reconstruction error gained by using the joint multiview approach on top of the two-stage approach is shown in a table in the appendix.
We compare our results with those of Choi et al. [11] and FGR [12]. While we are aware of other approaches to multiview registration including closed form approaches, we omit them from our comparisons because they failed to provide a solution in the large-scale dataset for multiview registration we have used. For example, Bartoli et al. [47] and Bergström et al. [48], among others, compute transformations using a closed form SVD solution that do not scale to large scale data. Other alternates, such as that of Fitzgibbon et al. [49], use an LM-based approach, which are slow and do not exploit the geometry of the problem.
Table 4 lists the mean registration error from the ground truth surface achieved by each compared method on each sequence as well as the time taken to complete execution. For estimating the mean registration error, we use the CloudCompare software available at http://www.cloudcompare.org.
Once again, the L12 version of our method performs the best overall in terms of registration error and is significantly faster than the FGR method of [12]. Also our L1 method is the fastest amongst all methods with a slight drop in accuracy compared to our L12 method. Finally, Figure 1 shows a complete reconstruction of the sequence livingroom 2 from the Augmented ICL-NUIM dataset produced by the L12 version of our multiview method. Reconstructions of other sequences are given in the appendix.

Method livingroom 1 livingroom 2 office 1 office 2
MRE Time MRE Time MRE Time MRE Time
Choi et al. [11] 0.04 8,940 0.07 3,360 0.03 4,500 0.04 4,080
FGR [12] 0.05 131 0.06 81 0.03 69 0.05 48
Our L12 0.04 71 0.05 49 0.03 42 0.04 32
Our L1 0.07 62 0.09 40 0.04 36 0.06 28
Our GM 0.05 88 0.06 70 0.03 55 0.05 41
Table 4: Mean registration error (MRE) (in meters) and running time (in seconds) for each method for full reconstruction from the fragments of each sequence in the Augmented ICL-NUIM dataset.
Figure 1: Reconstruction of the livingroom 2 sequence from the Augmented ICL-NUIM dataset, as given by our method with the L12 loss function

6 Discussion

As we demonstrated in Section 5, our motion estimation method in Algorithm 4.1 is both fast and accurate. More specifically, our method outperforms the state-of-the-art FGR method of [12] in terms of both speed and accuracy. Given their strong similarities, in Section 6.1 we examine the relationship of the method of [12] and our approach in Algorithm 4.1. Following that, we discuss the limitations of using FPFH for feature matching and our approach to overcome those in Section 6.2.

6.1 Comparison with FGR [12]

In [12], the cost function to be minimized is the same as that of Eqn. 5. However, for minimizing this cost function, they use a line process optimization. Originally developed in [38] for modelling discontinuities, a line process optimization can be shown to be equivalent to optimizing a robust estimator. Recalling that es=𝐩s-𝐌𝐪s, we define a cost function

E(𝐌,𝐋)=s=1𝒮(es)2ls+ψ(ls) (16)

where 𝐋=[l1l𝒮] is the collection of line processes ls for each correspondence pair (𝐩s,𝐪s) and ψ() is a penalty term for each line process. Here ψ(l) is a monotonically decreasing function designed such that when l=0, ψ(l) is a fixed non-negative constant and when l=1, ψ(l)=0. Thus, varying the line process l in the interval [0,1] allows us to move between a least-squares and a robust regime. In [38] it has been shown that for every choice of loss function ρ(), there is an equivalent ψ() such that minimization in Eqn. 16 yields the same solution as that of Eqn. 5. The FGR method of [12] utilizes the robustness of the line process to estimate the desired 𝐌𝕊𝔼(3). Using a first-order approximation of the motion update, [12] arrives at a Gauss-Newton solution (Eqn. 8 of [12]). It can be easily shown that this solution is identical to solving the system of equations in Eqn. 11 in our notation. In other words, while solving for the update step 𝔳(k), the FGR method of [12] implicitly carries out a single iteration of our IRLS step in line 3 of Algorithm 4.1. In contrast, in our approach we carry out KIRLS>1 iterations of the IRLS step to achieve better convergence.
Although the difference between our method and that of [12] is only in the number of iterations of the inner IRLS step, its implication is both subtle and profound. If we were solving a single IRLS problem in a vector space setting, this difference would have been immaterial. However, we note that in both of our approaches, the linear solution for the updates 𝔳(k) are interleaved with the non-linear motion updates 𝐌exp(𝔳^)𝐌, resulting in significantly different trajectories of the solution 𝐌(k) on the 𝕊𝔼(3) group. Specifically, in our case, by iterating the IRLS step to convergence we obtain the best possible estimate of 𝔳(k) in each intermediate step which, in turn, results in the best improvement of the estimate 𝐌(k) (equivalently the most reduction in the cost function 𝒞() in the k-th step).
Another noteworthy difference between the two methods is the choice of the parametrization of the motion representation. We use the geometrically correct form of Δ𝔪(k) in Eqn. 6, i.e., 𝔳=[𝝎𝐮] for our update step. However, for their update step, the authors of [12] use an extrinsic form of motion parametrization, i.e., [𝝎𝐭]. While our parametrization is geometrically consistent with Lie group theory, we can recognize from Eqn. 4 that the choice in [12] is approximately close to our representation for small motion, i.e., 𝐮𝐭 if and only if θ0. Conversely, for sufficiently large θ, the approximate representation [𝝎𝐭] is notably different from the exact representation [𝝎𝐮] of the 𝔰𝔢(3) form. Therefore the improvement per iteration in the cost function for the method of [12] is lower compared to our single iteration IRLS form. The result is that the line process-based method of [12] has slower convergence.
We highlight both these differences for a pair of scans from the Augmented ICL-NUIM dataset in Figure 3. For the purpose of illustration, we consider the L12 loss function for the line process optimization routine proposed in [12] as well as for our optimization routine. The reason we do not consider the Geman-McClure loss function is that it has a scale factor, which, in practice, is initialized at a large value and has to be progressively annealed during optimization. This annealing introduces artifacts that unnecessarily clutter the illustration of the underlying differences. In other words, we use L12 because it does not alter the fundamental properties of the two optimization routines, and at the same time lends itself to a clean illustration of our argument. We also note that in Figure 3, we represent the cost 𝒞(𝐌) as a function of the number of iterations Kouter, and that it is depicted on a log10 scale. For ease of visualization, we only show the plot for the iteration range [2,10]. Alongside the figure, Table 3 reports the number of iterations Kouter taken by each method to reach the different convergence criteria specified by ϵ.

Figure 2: Comparison of the line process solution with our method for different KIRLS. For ease of visualization, we show performance only for the iterations between 2 and 10.
ϵ LP Ours, KIRLS=  [12] 1 2 3 10-1 2 2 2 2 10-2 5 5 4 4 10-3 13 13 8 6 10-4 21 18 11 8 10-5 27 24 13 10 10-6 33 32 19 14 10-7 63 46 25 18
Figure 3: Number of iterations Kouter taken by each method to reach each convergence criterion ϵ

Firstly we observe from Table 3 that even though the line process is conceptually equivalent to our procedure with a single IRLS step, the proposed optimization routine of [12] takes more iterations than our actual procedure with a single IRLS step to converge to the same convergence criterion. This is because of the extrinsic (approximate) parametrization [𝝎𝐭] used in [12] as opposed to the correct Lie algebraic representation [𝝎𝐮]. Secondly, it is clear from both Figure 3 and Table 3 that the cost function converges in progressively fewer iterations as we increase the number of IRLS steps. However, it should be noted that increasing the number of IRLS steps makes each iteration of our optimization routine slower as well. Therefore, a balance has to be struck between the speed of each iteration and the number of iterations required for convergence. In practice, we have found that using 2 IRLS steps per iteration for pairwise registration and 3 IRLS steps per iteration for joint multiview registration yield our desired results. In any event, the key observation is that the single iteration of the FGR is insufficient and yields poorer convergence properties compared to our formulation.
Choice of Loss Function ρ(): The choice of the loss function ρ(.) to be used is a critical factor in our estimation procedure. All loss functions achieve robustness to outliers by a trade-off with statistical efficiency (i.e. accuracy). In practice, the accuracy achievable by a chosen loss function depends on the empirical nature of the error distribution. As discussed earlier, in [12] the authors use the Geman-McClure loss function ρ(x)=μx2μ+x2. Here the performance critically depends on chosing a good value for μ that reflects the outlier distribution inherent to the data used. In [12] the authors start with a large μ and progressively reduce it in fixed amounts with every iteration. However, if the nature of the data varies significantly, such a fixed annealing procedure may not produce the best possible results. To overcome this limitation of the Geman-McClure method, as we have shown in Section 5, we have also tested for L1 and L12 and demonstrated that the latter provides the best performance. Apart from improved performance, an important desirable property of using L12 is that it is entirely parameter free, hence we do not need to follow any additional annealing procedure. We could conceivably further improve performance by determing the optimal exponent p in Lp(.)=.2p for 0<p<1. However, for a broad range of possible error distributions, we find that L12 is adequate.

6.2 Limitation of FPFH Feature-Matching Based Registration

In Section 5, we have demonstrated the potency of the FPFH feature-matching based registration paradigm. However, we note that these experiments, derived from those presented in [12], have special data characteristics. Specifically, in these examples, either the datasets are small or the motions between adjacent scans are small in magnitude and exhibit temporal smoothness. However, we note here that registration based on feature correspondences cannot work in all possible contexts. For example, when scans in the input dataset have sufficiently large depth differences or large motions between them in both rotation and translation, FPFH feature-based matches become few and unreliable. Consequently, the reconstructions given by both FGR [12] and our proposed algorithms become incorrect. In such a scenario, we need to take recourse to using a robust ICP-based multiview registration method (albeit with a greater computational cost) which converges to the correct solution.
In this alternative approach, following [33], we again consider the camera viewgraph 𝒢={𝒱,}. For each edge available in the set , we estimate the pairwise motion using a robust version of ICP. Specifically, for the motion estimation step in our robust ICP, we use the motion obtained using the L12 loss optimization for pairwise registration method as described in Algorithm 4.1. After all pairwise motions represented by the edges are estimated, we estimate the absolute motion of each camera w.r.t. a global frame of reference using robust motion averaging. Here, our solution for robust motion averaging for rigid Euclidean motions is similar to the robust rotation averaging method of [46]. We find that typically, we achieve the desired convergence in 3 full iterations of this procedure. To illustrate our argument, we show the qualities of reconstructions achieved for a life-size statue of Mahatma Gandhi, where the input set of scans is small and mostly have low overlap. Moreover, overlapping scans have significant depth differences between them, leading to significantly different FPFH features and consequently, a high percentage of incorrect matches. A visual representation of this scenario is given in the appendix for better understanding. As we can see in Figure 4, the joint multiview registration using FPFH features-based matches fails to correctly align some of the scans, whereas the robust multiview-based ICP routine successfully produces a correct reconstruction.

Figure 4: 3D reconstruction of a statue of Mahatma Gandhi. The close-up on the top left shows that joint multiview registration using FPFH features fails whereas the close-up on the bottom left shows successful registration using our robust pairwise motion estimation within a multiview ICP routine. The full reconstruction is shown on the right.

7 Conclusion

We have presented a fast and efficient 3D point registration method that optimizes on the 𝕊𝔼(3) group and generalizes the line process based optimization method. In addition, we have shown that in scenarios where feature correspondences cannot be reliably established, our robust motion estimation can make a multiview ICP method robust and effective.

References

  • [1] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, “KinectFusion: Real-Time Dense Surface Mapping and Tracking,” in Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality, 2011.
  • [2] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A Benchmark for the Evaluation of RGB-D SLAM Systems,” in International Conference on Intelligent Robots and Systems (IROS), IEEE/RSJ, pp. 573–580, IEEE, 2012.
  • [3] Z. Zhang, “Microsoft Kinect Sensor and its Effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.
  • [4] J. Salvi, C. Matabosch, D. Fofi, and J. Forest, “A Review of Recent Range Image Registration Methods with Accuracy Evaluation,” Image and Vision Computing, vol. 25, no. 5, pp. 578–596, 2007.
  • [5] G. K. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin, “Registration of 3D Point Clouds and Meshes: A Survey from Rigid to Nonrigid,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 7, pp. 1199–1217, 2013.
  • [6] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan, “3D Object Recognition in Cluttered Scenes with Local Surface Features: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2270–2287, 2014.
  • [7] N. Mellado, D. Aiger, and N. J. Mitra, “Super 4PCS Fast Global Pointcloud Registration via Smart Indexing,” in Computer Graphics Forum, vol. 33 (5), pp. 205–215, Wiley Online Library, 2014.
  • [8] N. Gelfand, N. J. Mitra, L. J. Guibas, and H. Pottmann, “Robust Global Registration,” in Symposium on Geometry Processing, vol. 2 (3), p. 5, 2005.
  • [9] A. Makadia, A. Patterson, and K. Danilidis, “Fully Automatic Registration of 3D point Clouds,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 1297–1304, IEEE, 2006.
  • [10] D. Holz, A. E. Ichim, F. Tombari, R. B. Rusu, and S. Behnke, “Registration with the Point Cloud Library: A Modular Framework for Aligning in 3D,” IEEE Robotics & Automation Magazine, vol. 22, no. 4, pp. 110–124, 2015.
  • [11] S. Choi, Q.-Y. Zhou, and V. Koltun, “Robust Reconstruction of Indoor Scenes,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 5556–5565, IEEE, 2015.
  • [12] Q.-Y. Zhou, J. Park, and V. Koltun, “Fast Global Registration,” in European Conference on Computer Vision, pp. 766–782, Springer, 2016.
  • [13] F. Tombari, S. Salti, and L. Di Stefano, “Performance Evaluation of 3D Keypoint Detectors,” International Journal of Computer Vision, vol. 102, no. 1-3, pp. 198–220, 2013.
  • [14] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and N. M. Kwok, “A Comprehensive Performance Evaluation of 3D Local Feature Descriptors,” International Journal of Computer Vision, vol. 116, no. 1, pp. 66–89, 2016.
  • [15] S. Rusinkiewicz and M. Levoy, “Efficient Variants of the ICP Algorithm,” in Proceedings of the Third International Conference on 3D Digital Imaging and Modeling, pp. 145–152, IEEE, 2001.
  • [16] D. Aiger, N. J. Mitra, and D. Cohen-Or, “4-Points Congruent Sets for Robust Pairwise Surface Registration,” in ACM Transactions on Graphics (ToG), vol. 27 (3), p. 85, ACM, 2008.
  • [17] B. Drost, M. Ulrich, N. Navab, and S. Ilic, “Model Globally, Match Locally: Efficient and Robust 3D Object Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 998–1005, IEEE, 2010.
  • [18] S. Bouaziz, A. Tagliasacchi, and M. Pauly, “Sparse Iterative Closest Point,” in Proceedings of the Eleventh Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 113–123, Eurographics Association, 2013.
  • [19] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, “The Trimmed Iterative Closest Point Algorithm,” in Proceedings of the 16th International Conference on Pattern Recognition, vol. 3, pp. 545–548, IEEE, 2002.
  • [20] J. Yang, H. Li, and Y. Jia, “Go-ICP: Solving 3D Registration Efficiently and Globally Optimally,” in IEEE International Conference on Computer Vision, pp. 1457–1464, IEEE, 2013.
  • [21] S. Umeyama, “Least-Squares Estimation of Transformation Parameters Between Two Point Patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 376–380, 1991.
  • [22] R. Raguram, J.-M. Frahm, and M. Pollefeys, “A Comparative Analysis of RANSAC Techniques Leading to Adaptive Real-Time Random Sample Consensus,” in European Conference on Computer Vision, pp. 500–513, Springer, 2008.
  • [23] R. B. Rusu, N. Blodow, and M. Beetz, “Fast Point Feature Histograms (FPFH) for 3D Registration,” in IEEE International Conference on Robotics and Automation, pp. 3212–3217, IEEE, 2009.
  • [24] C. Papazov, S. Haddadin, S. Parusel, K. Krieger, and D. Burschka, “Rigid 3D Geometry Matching for Grasping of Known Objects in Cluttered Scenes,” The International Journal of Robotics Research, vol. 31, no. 4, pp. 538–553, 2012.
  • [25] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison, “SLAM++: Simultaneous Localisation and Mapping at the Level of Objects,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1352–1359, IEEE, 2013.
  • [26] R. I. Hartley and F. Kahl, “Global Optimization Through Searching Rotation Space and Optimal Estimation of the Essential Matrix,” in IEEE 11th International Conference on Computer Vision, pp. 1–8, IEEE, 2007.
  • [27] H. Li and R. Hartley, “The 3D-3D Registration Problem Revisited,” in IEEE 11th International Conference on Computer Vision, pp. 1–8, IEEE, 2007.
  • [28] O. Enqvist, K. Josephson, and F. Kahl, “Optimal Correspondences from Pairwise Constraints,” in IEEE 12th International Conference on Computer Vision, pp. 1295–1302, IEEE, 2009.
  • [29] J. Yang, H. Li, D. Campbell, and Y. Jia, “Go-ICP: a Globally Optimal Solution to 3D ICP Point-Set Registration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241–2254, 2016.
  • [30] B. Jian and B. C. Vemuri, “Robust Point Set Registration Using Gaussian Mixture Models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 1633–1645, Aug 2011.
  • [31] G. D. Evangelidis and R. Horaud, “Joint Alignment of Multiple Point Sets with Batch and Incremental Expectation-Maximization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 1397–1410, June 2018.
  • [32] B. Eckart, K. Kim, and J. Kautz, “HGMR: Hierarchical Gaussian Mixtures for Adaptive 3D Registration,” in The European Conference on Computer Vision (ECCV), September 2018.
  • [33] V. M. Govindu and A. Pooja, “On Averaging Multiview Relations for 3D Scan Registration,” IEEE Transactions on Image Processing, vol. 23, pp. 1289–1302, March 2014.
  • [34] A. Torsello and A. Albarelli, “Multi-View Registration via Graph Diffusion of Dual Quaternions,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2441–2448, 2011.
  • [35] F. Arrigoni, B. Rossi, and A. Fusiello, “Global Registration of 3D Point Sets via LRS Decomposition,” in European Conference on Computer Vision, pp. 489–504, 2016.
  • [36] P. W. Holland and R. E. Welsch, “Robust Regression Using Iteratively Reweighted Least-Squares,” Communications in Statistics - Theory and Methods, vol. 6, no. 9, pp. 813–827, 1977.
  • [37] K. Aftab and R. Hartley, “Convergence of Iteratively Re-weighted Least Squares to Robust M-estimators,” in Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pp. 480–487, IEEE, 2015.
  • [38] M. J. Black and A. Rangarajan, “On the Unification of Line Processes, Outlier Rejection, and Robust Statistics with Applications in Early Vision,” International Journal of Computer Vision, vol. 19, no. 1, pp. 57–91, 1996.
  • [39] G. Chirikjian, Stochastic Models, Information Theory, and Lie Groups, Volume 2: Analytic Methods and Modern Applications. Birkhauser Boston, 2011.
  • [40] J. Selig, Geometrical Methods in Robotics. Springer New York, 2013.
  • [41] F. C. Park, “Distance Metrics on the Rigid-Body Motions with Applications to Mechanism Design,” Journal of Mechanical Design, vol. 117, no. 1, pp. 48–54, 1995.
  • [42] P. Meer, D. Mintz, A. Rosenfeld, and D. Y. Kim, “Robust Regression Methods for Computer Vision: A Review,” International Journal of Computer Vision, vol. 6, no. 1, pp. 59–70, 1991.
  • [43] E. Zacur, M. Bossa, and S. Olmos, “Left-Invariant Riemannian Geodesics on Spatial Transformation Groups,” SIAM Journal on Imaging Sciences, vol. 7, no. 3, pp. 1503–1557, 2014.
  • [44] R. Szeliski, Computer Vision: Algorithms and Applications. Springer-Verlag New York, Inc., 2010.
  • [45] F. J. Lawin, M. Danelljan, F. Khan, P.-E. Forssén, and M. Felsberg, “Density Adaptive Point Set Registration,” in IEEE Conference on Computer Vision and Pattern Recognition, Computer Vision Foundation, June 2018.
  • [46] A. Chatterjee and V. M. Govindu, “Robust Relative Rotation Averaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 958–972, April 2018.
  • [47] A. Bartoli, D. Pizarro, and M. Loog, “Stratified Generalized Procrustes Analysis,” International journal of computer vision, vol. 101, no. 2, pp. 227–253, 2013.
  • [48] P. Bergström and O. Edlund, “Robust Registration of Point Sets Using Iteratively Reweighted Least Squares,” Computational Optimization and Applications, vol. 58, no. 3, pp. 543–561, 2014.
  • [49] A. W. Fitzgibbon, “Robust Registration of 2D and 3D Point Sets,” Image and Vision Computing, vol. 21, no. 13-14, pp. 1145–1153, 2003.
  • [50] A. S. Mian, M. Bennamoun, and R. Owens, “Three-Dimensional Model-Based Object Recognition and Segmentation in Cluttered Scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1584–1601, 2006.

Appendix A Linear Form of the Error Term in Eqn. 9

Let the given set of correspondences be:

{(𝐩s,𝐪s)| 1s𝒮;𝐩s=[(𝐩s)1],𝐪s=[(𝐪s)1] for {(𝐩s),(𝐪s)}3}.

Then we can write the error term in Eqn. 9 as

es(𝐌(k))=[(𝐩s)1]-(𝐈4+Δ𝔪(k))𝐌(k-1)[(𝐪s)1] (17)
es(𝐌(k))=[(𝐩s)-(𝐪s)0]-[[𝝎]×|𝐮𝟎|0]𝐌(k-1)[(𝐪s)1] (18)

where we get Δ𝔪(k)=[[𝝎]×|𝐮𝟎|0] from Eqn. 2. Rewriting Eqn. 18 with 𝔳=[𝝎𝐮], 𝐌(k-1)=[𝐑(k-1)|𝐭(k-1)𝟎|1] (from Eqn. 1), and by dropping the trailing 0 for ease of representation, we get,

es(𝐌(k))=[-[𝐑(k-1)(𝐪s)+𝐭(k-1)]×|𝐈3]𝔳-((𝐩s)-(𝐪s)). (19)

Thus, we have in Eqn. 9,

𝐀s =[-[𝐑(k-1)(𝐪s)+𝐭(k-1)]×|𝐈3], (20)
𝐛s =(𝐩s)-(𝐪s). (21)

Appendix B Algorithm for Joint Multiview Registration

Similar to our algorithm for robust motion estimation between a pair of 3D scans, we state our solution for the robust motion estimation of a set of N(2) 3D scans as given in Algorithm B.

{algorithm}

[h] IRLS estimation of joint multiview 3D registration Input: {{(𝐩i1,𝐩j1)(𝐩i𝒮ij,𝐩j𝒮ij)}|(i,j)} (according to the viewgraph 𝒢={𝒱,})
Output: 𝕄={𝐌i|𝐌i𝕊𝔼(3)i[1N]} (Robust estimate of motion of the set of N=|𝒱| scans)
Initialization: 𝕄 is set to initial guess 𝕄initial

\[email protected]@algorithmic

[l] \While||𝕧||>Nϵ \Comment𝕧=[𝔳1𝔳N] \State1. Compute {(𝐀ijs,𝐛ijs)|s[1𝒮ij];(i,j)} using Eqn. 13 \State2. Compute weights wijs=ρ(eijs)eijs as used in Eqn. 15 \State3. Estimate 𝕧 as IRLS solution for Eqn. 15 \State4. Update 𝐌iexp(𝔳𝔦^)𝐌i i[1N] \EndWhile

Appendix C More Results

UWA dataset: Table 5 reports the performace of the motion step of the Fast Global Registration (FGR) method of  [12] as well as all the 3 versions of our method on the UWA dataset [50]. This dataset consists of 5 objects and 50 scenes, with each scene consisting of a combination of these objects. The task is to align individual objects to the scenes, given that each scene contains substantial clutter and occlusion. A total of 188 such alignment tasks are provided in the dataset. As we can see from Table 5, the L12 version of our method produces the lowest median rotation angle and median translation norm errors. It is also significantly faster than FGR [12].

Table 5: Median rotation angle error (RAE) (in degrees), median translation norm error (TNE) (in units of the mean scene diameter) and mean running time (in milliseconds) of the motion step of each method for each sequence in the UWA dataset.
Method Median RAE Median TNE Mean Time
FGR [12] 1.276 0.152 32.7
Our L12 0.319 0.034 10.1
Our L1 0.824 0.108 7.0
Our GM 1.276 0.152 14.5

Relative improvement of joint multiview approach over two-stage motion averaging approach: As described in Section 5.2, we show in Table 6 that the reconstruction error of the scenes in the Augmented ICL-NUIM dataset [11] decreases when we use our joint multiview estimation procedure on top of the two-stage motion averaged approach. We show the improvement achieved using the L12 loss, which is our best-performing version.

Table 6: Mean registration error (in meters) before and after applying the joint multiview (MV) procedure with L12 loss on the motion averaged estimate, for full reconstruction from the fragments of each sequence in the Augmented ICL-NUIM dataset.
livingroom 1 livingroom 2 office 1 office 2
Before MV 0.07 0.07 0.06 0.07
After MV 0.04 0.05 0.03 0.04

Augmented ICL-NUIM dataset: We also show full reconstructions of the livingroom 1, office 1 and office 2 sequences from the Augmented ICL-NUIM dataset [11] in Figures 56 and 7 respectively, as provided by the L12 version of our method.

Figure 5: Reconstruction of the livingroom 1 sequence from the Augmented ICL-NUIM dataset, as given by our method with the L12 loss function
Figure 6: Reconstruction of the office 1 sequence from the Augmented ICL-NUIM dataset, as given by our method with the L12 loss function
Figure 7: Reconstruction of the office 2 sequence from the Augmented ICL-NUIM dataset, as given by our method with the L12 loss function

Appendix D An Illustration of the Limitation of FPFH Feature-Matching Based Registration

As discussed in Section 6.2, we have presented a scenario where the FPFH feature-matching based registration technique breaks down due to unreliability of the feature matches themselves. In this particular scenario, we have 23 scans of a life-size statue of Mahatma Gandhi collected with a standard commercial depth camera. Figure 8 shows the plan view of a schematic of the cameras (represented as balls) around the statue, as recovered by our ICP-based multiview approach. Recall that these cameras are the nodes of the viewgraph 𝒢={𝒱,}. We also display a schematic of the edges in the viewgraph 𝒢 (using sticks). The thickness of each edge is proportional to the number of FPFH feature matches found between the correponding camera (or equivalently scan) pair.
We can observe from Figure 8 that the thinnest edges are found between pairs of cameras at different depths, implying that there are extremely few FPFH feature matches between these cameras. Compounding this observation with the fact that FPFH features are noisy to begin with, the resultant motions between these cameras, even with our robust cost function, are grossly incorrect. In contrast, our ICP-based multiview approach can, albeit at a higher computational cost, align these cameras correctly and produce the desired reconstruction.

Figure 8: Plan view of a schematic representation of the viewgraph used for reconstruction of the statue of Mahatma Gandhi. See text for details.