Abstract
In this work, we adapt a method based on multiple hypothesis tracking (MHT)that has been shown to give stateoftheart vessel segmentation results ininteractive settings, for the purpose of extracting trees. Regularly spacedtubular templates are fit to image data forming local hypotheses. These localhypotheses are used to construct the MHT tree, which is then traversed to makesegmentation decisions. However, some critical parameters in this method arescaledependent and have an adverse effect when tracking structures of varyingdimensions. We propose to use statistical ranking of local hypotheses inconstructing the MHT tree, which yields a probabilistic interpretation ofscores across scales and helps alleviate the scaledependence of MHTparameters. This enables our method to track trees starting from a single seedpoint. Our method is evaluated on chest CT data to extract airway trees andcoronary arteries. In both cases, we show that our method performssignificantly better than the original MHT method.
Quick Read (beta)
Abstract
Purpose: In this work, we adapt a method based on multiple hypothesis tracking (MHT) that has been shown to give stateoftheart vessel segmentation results in interactive settings, for the
purpose of extracting trees.
Methods: Regularly spaced tubular templates are fit to image data forming local hypotheses. These local hypotheses are then used to construct the MHT tree, which is then traversed to make segmentation decisions. Some critical parameters in the method, we base ours on, are scaledependent and have an adverse effect when tracking structures of varying dimensions. We propose to use statistical ranking of local hypotheses in constructing the MHT tree which yields a probabilistic interpretation of scores across scales and helps alleviate the scaledependence of MHT parameters. This enables our method to track trees starting from a single seed point.
Results: The proposed method is evaluated on chest CT data to extract airway trees and coronary arteries and compared to relevant baselines. In both cases, we show that our method performs significantly better than the Original MHT method in semiautomatic setting.
Conclusions: The statistical ranking of local hypotheses introduced allows the MHT method to be used in noninteractive settings yielding competitive results for segmenting treestructures.
Keywords: multiple hypothesis tracking, tree segmentation, CT, airways, vessels
]
Extracting Treestructures in CT data by Tracking Multiple Statistically Ranked Hypotheses
Raghavendra Selvan${}^{\mathrm{1}}$, Jens Petersen${}^{\mathrm{1}}$, Jesper H Pedersen${}^{\mathrm{2}}$, Marleen de Bruijne${}^{\mathrm{1}\mathrm{,}\mathrm{3}}$
${}^{1}$Department of Computer Science, University of Copenhagen, Denmark
${}^{2}$Department of Thoracic Surgery, Rigshospitalet, University of Copenhagen, Denmark
${}^{3}$Biomedical Imaging Group Rotterdam, Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
July 11, 2019
Author email for correspondence: [email protected]
I. Introduction
In medical image analysis, reliable methods to extract airways, blood vessels and neuron tracks can have important clinical usage. For instance, airway tree extraction is used to study the morphology of airways, which is useful to derive biomarkers for diseases such as chronic obstructive pulmonary disease (COPD) ^{1, 2} and cystic fibrosis ^{3}; and segmentation of coronary vessels is useful in prognosis of cardiovascular diseases ^{4}.
Several methods have been proposed to address tree segmentation tasks encountered in medical images. One common approach is to model appearance of local structures in the tree and extract the tree itself by formulating a global connectivity model. Multiscale vesselness filtering is a widely employed technique to enhance local tubular structures ^{5, 6, 7} like vessels and airways. Another approach is to detect local structures using multiscale blob detectors and tracking these blobs to extract branches ^{8}. A comprehensive survey of vessel segmentation methods is presented in Lesage et al. ^{14} and the class of methods we focus on in this contribution are addressed as templatematching based centerline tracking methods in this survey. A similar comparative study for airway segmentation methods was published in the EXACT’09 study ^{16}, wherein several methods were evaluated on chest CT data. One important takeaway from this study was that there was scope for improvement in extracting missing airway branches. Another interesting point from the study was that more than half of the competing methods used some form of region growing in their segmentation procedure ^{17, 18, 19, 20}.
In this work, we investigate the feasibility of using multiple hypothesis tracking (MHT) ^{21, 22} – a well known decision making paradigm in multitarget tracking – for segmenting trees. In methods based on region growing ^{23} and conventional object tracking ^{24}, segmentation decisions are made instantaneously rendering them susceptibe to local anomalies due to pathologies, acquisition noise, interfering structures and other sources of noise. The primary contrasting feature of using MHT is that it is based on deferring decisions to a future step. From any given tracking step, all possible states corresponding to different solutions (local hypotheses) are maintained up to a predefined search depth (global hypotheses) in the form of a hypothesis tree. Decisions about which of the hypotheses to propagate and which ones to prune away at each step are made by tracing back the best hypothesis from the end of the hypothesis tree. Such deferred decisions, invariably, take more information into account and make the decisions at any given step more robust than instantaneous ones. The concept of instantaneous and deferred decisions are illustrated in Figure 1.
In the work of Friman et al. ^{25}, MHT was used along with template matching for segmenting small and bright tubular structures from a dark background. This method when used in an interactive setting along with minimal paths has remained the best performing method in a coronary vessel segmentation challenge ^{26}. However, to the best of our knowledge, it has not been applied to other tree extraction problems. Critical parameters used to maintain the MHT tree in this method are scale dependent; this has an adverse effect when tracking structures of varying dimensions, such as airway trees. We believe this method has potential to solve a wider range of problems if the limitations with scale dependence can be alleviated.
We introduce statistical ranking of local hypotheses as a means to make local and global hypothesis scores independent of scales. This makes tuning of two crucial MHT parameters easier, allowing us to apply the method to extract trees and/or structures with branches of varying dimensions without user interaction. The work presented here is based on Friman et al. ^{25} and an extension of our previously published work ^{27}. Compared to ^{27}, we present the proposed modifications formally and perform more comprehensive evaluation. We show airway extraction experiments on lowdose CT data from the Danish lung cancer screening trial ^{28} and segmentation of coronary arteries from CT angiography data from the coronary challenge ^{26}.
II. Method
In this section, we present an overview of the method in Friman et al. ^{25}, describe its limitations and present modifications that enable our proposed method to overcome these limitations. For convenience, the method in Friman et al. ^{25} will be addressed as the Original MHT method and ours, with the modifications, as the Modified MHT method.
II.A. The Original MHT method ^{25}
The primary objective formulated in the Original MHT method is to segment bright tubular structures from darker background based on their relative contrast. The bright, elongated structures of interest are modeled as sequences of tubular segments, as depicted in Figure 2. Segmentation is performed using a tracking approach, by analysing a multiple hypothesis tree comprising templatematched local hypotheses. The MHT method takes up a predictupdateaggregate procedure to build the hypothesis tree.
Predict: The prediction step is carried out from each tracking step where several local hypotheses are formulated at regular step lengths controlled by the step length factor (used as a scaling of the local radius to obtain actual step length). The search span and number of local hypotheses at each step are controlled by the search angle and number of angles parameters, respectively. Local hypotheses are modeled using a parameterised template function which maps crosssectional intensity variations of a tubular segment, of radius $r$ centered at ${\mathbf{x}}_{0}$ along the direction $\widehat{\mathbf{v}}$, to a profile function with values between $0$ and $1$, given as,
$$T(\mathbf{x};{\mathbf{x}}_{0},\widehat{\mathbf{v}},r)=\frac{{r}^{\gamma}}{{({d}_{E}^{2}(\mathbf{x};{\mathbf{x}}_{0},\widehat{\mathbf{v}}))}^{\gamma /2}+{r}^{\gamma}}.$$  (1) 
Steepness of the profile function is controlled by $\gamma $ and ${d}_{E}^{2}(\mathbf{x};{\mathbf{x}}_{0},\widehat{\mathbf{v}})$ is the squared Euclidean distance between any point $\mathbf{x}\in {\mathbb{R}}^{3}$ and axis of the tubular template along $\widehat{\mathbf{v}}$. We retain $\gamma =8$ from Friman et al. ^{25} and the corresponding profile function is shown in Figure 2 (right).
Update: The update step consists of performing the templatematching with the image data. This requires a model of the image neighbourhood, $\mathbf{I}(\mathbf{x})$ and is given as
$$\mathbf{I}(\mathbf{x})=kT(\mathbf{x};{\mathbf{x}}_{0},\widehat{\mathbf{v}},r)+m+\u03f5(\mathbf{x}),$$  (2) 
where $k$ is the contrast and $m$ is the mean intensity and $\u03f5(\mathbf{x})$ captures the noise due to modeling uncertainties, interfering structures, image artifacts or acquisition noise. Predicted local hypotheses are fit to the image data by solving the following weighted leastsquares problem:
$$\underset{k,m,r,{\mathbf{x}}_{0},\widehat{\mathbf{v}}}{\mathrm{arg}\mathrm{min}}{\mathbf{W}(r,{\mathbf{x}}_{0},\widehat{\mathbf{v}})[kT({\mathbf{x}}_{0},\widehat{\mathbf{v}},r)+m{\mathrm{\U0001d7cf}}_{n}\mathbf{I}]}^{2},$$  (3) 
where $\mathbf{W}(r,{\mathbf{x}}_{0},\widehat{\mathbf{v}})$ is a weighting function with diagonal entries corresponding to an asymmetric Gaussian centered at ${\mathbf{x}}_{0}$. It is used to localise the fitting procedure and $\mathbf{I}$ is the image data with nonzero weights under this weighting function. The width of the weighting function is controlled by the weight window parameter.
The result of the template fitting procedure is an updated local hypothesis with estimates for the mean intensity and contrast values, which are then used to quantify the fitness of the local hypotheses. A local hypothesis score is introduced to capture the fitness of local hypotheses:
$${l}_{i}\triangleq \frac{km}{\text{std}(k)},$$  (4) 
which can be seen as a measure of how different the bright tubular structure is from its background; this can be interpreted as the contrast signaltonoise ratio (SNR), used to compare different local hypotheses. Local hypotheses below a threshold, referred to as the local threshold, are discarded to control the number of candidate hypotheses.
Aggregate: An MHT tree of a predetermined search depth, $d$, is constructed using local hypotheses from each step as illustrated in Figure 1. A sequence of local hypotheses of length $d$ forms a global hypothesis with an average global score,
$${s}_{g}=\frac{{\sum}_{i=1}^{d}{l}_{i}}{d}.$$  (5) 
This score is compared with the global threshold and all global hypotheses that do not exceed this threshold are discarded. Finally, the global hypothesis with the highest score is retained at the end of the tracking and output as the segmented result.
In summary, two categories of parameters are used to tune the Original MHT method. First category of parameters are related to generating local hypotheses: minimum and maximum radii, step length factor between successive tracking steps, window of the weighting function, maximum search angle and number of local hypotheses at each step. Second category of parameters are used to control the MHT behaviour: search depth, local and global thresholds.
II.B. Modifications to the Original MHT method
The Original MHT method was devised as an interactive method to segment bright, elongated structures. It is not immediately applicable for the automatic extraction of trees with branches of varying dimensions. In this section, some limitations of the Original MHT method in this regard are elaborated and modifications are proposed to overcome them.
II.B.1. Dealing with scaledependence
An important factor to consider when tracking treestructures with branches of varying dimensions is the range of their radius. With airway trees for instance, the radius of visible airways in CT typically ranges from $1$mm to $10$mm. When using the Original MHT method, such variations make it hard to obtain optimal parameter settings across scales. This behaviour is demonstrated for local hypothesis scores, computed using Eq. (4), in Figure 3. Specifically, local hypothesis scores tend to decrease with scale, making it challenging to obtain a single optimal local threshold that works well across scales. When the local threshold is tuned to extract smaller branches, the chance of adding large and suboptimal branches to the hypothesis tree is higher resulting in oversegmentation. Similarly, when the score is tuned for larger branches even the best candidate hypothesis for smaller branches might not qualify as valid candidates resulting in undersegmentation.
To alleviate the aforementioned scale dependence, we devise a strategy of statistically ranking the local hypotheses based on their scores. That is, instead of using the scores in Eq. (4) directly, we sort them in decreasing order to derive a ranking based score at each step in the range $(0,1]$ . If $N$ local hypotheses are sampled at any given step, then the scores of the ranked local hypotheses are obtained as,
$${l}_{i}^{\prime}=\frac{1}{{R}_{i}}\mathit{\hspace{1em}}\forall i\in 1\mathrm{\dots}N,$$  (6) 
where ${R}_{i}\in \{1,\mathrm{\dots},N\}$ is the rank of the $N$ local hypotheses. By this scheme of relative scoring of hypotheses, where the best local hypothesis is assigned $1$ at each tracking step, the problem of scale dependence of hypothesis scores is immediately alleviated. This is demonstrated in the table to the right in Figure 3. This procedure also provides a probabilistic interpretation of the hypothesis scores which is more useful in maintaining the MHT tree when local hypotheses are aggregated in Eq. (5).
II.B.2. Handling branching
Another factor to consider when tracking trees is bifurcations. In the Original MHT method, bifurcation detection is performed by clustering hypotheses based on their spatial location at every step using a spectral clustering algorithm ^{29}. However, there are no further details as to how the MHT tree is maintained after bifurcations. Once a bifurcation is detected, there are several ways to proceed. As the tracking happens per branch, new seed points can be added from the detected points of bifurcation and tracking can be restarted. This entails rebuilding of the MHT tree from each of the new seed points. The implementation of the Original MHT method, available as a MeVisLab^{*}^{*} * http://mevislab.de/ module, follows this strategy. This has a negative consequence of discarding the information aggregated until the step before branching.
Another strategy that does not rely on rebuilding the MHT tree is to save the state of MHT tree at branching and resume tracking from each of the newly detected branching points separately. By resuming tracking from both branching points with the history of the parent branch we do not throw away information. In our Modified MHT method, this strategy is used to fully benefit from the MHT procedure.
III. Experiments and Results
We evaluate the Modified MHT method for two applications: extraction of airway trees and segmentation of coronary arteries. We used the MeVisLab implementation of the Original MHT method for performing the Original MHT experiments and a customised module with our modifications on MeVisLab for the Modified MHT method. All experiments were performed on a laptop with $8$ cores and $32$ GB memory running Debian operating system. Details of preprocessing of data, experiments, error measures and results are presented next.
III.A. Airway tree extraction from chest CT
Experiments were performed on a subset of 32 lowdose CT scans, from different subjects, chosen randomly from the Danish Lung Cancer Screening Trial ^{28} dataset. Participants of the study were either current or former smokers between the age range 50 and 70 years, and had a history of smoking at least 20 pack years. The chosen subset of 32 scans has a distribution, that is similar to the DLCST study, with 19 COPD cases and 13 nonCOPD cases. The 3D images in this dataset have a slice spacing of 1 mm and inplane resolution varying between $0.72$ to $0.78$ mm. The images were randomly split into training and test sets comprising of 16 images each. Performance of the method was compared with reference segmentations composed of union of two previous methods which were corrected by an expert user: the first method uses an appearance model based on a voxel classifier to distinguish airway voxels from background and uses region growing along with a vessel similarity measure to extract airways ^{23}, and the second method uses a similar voxel classifier but extracts airways by continually extending locally optimal paths ^{30}. The reference segmentations were constructed by the expert user – a PhD student and one of the main organziers of the EXACT’09 Challenge ^{16} – using a dedicated airway segmentation evaluation tool developed for the EXACT’09 Challenge. An airway branch was categorized as correct if there were no voxels beyond the airway wall, and wrong otherwise. Final corrected reference segmentations were obtained from the union of all branches classified as correct.
III.A.1. Preprocessing of data
All CT images were preprocessed and converted into probability images using a kNNbased voxel classifier trained to distinguish airway voxels from background ^{23}. Thus obtained probability images have probability close to $1$ in regions that are classified to be inside the airways and close to 0 outside. These images match the profile function described in Section II.A., wherein, the structure of interest is bright (high probability) in a dark background. An example probability image is shown in Figure 4. Noise in the image is due to several factors including acquisition, interfering vessels, ribs and lung tissue.
III.A.2. Error measure
Performance of all four methods is evaluated based on accuracy of the extracted centerlines to give equal importance to large and small branches. This in in contrast to using voxel based measures such as dice overlap coefficient which are biased towards large branches. To standardise the evaluation procedure, centerlines were extracted from the binary segmentations from all methods using a 3D thinning algorithm ^{31}. The error measure is a symmetric distance between centerlines of the predicted and the reference segmentations, given as,
${d}_{err}$  $=w{\displaystyle \frac{{\sum}_{i=1}^{{n}_{op}}\mathrm{min}{d}_{E}({c}_{i},{C}_{ref})}{{n}_{op}}}$  
$+(1w){\displaystyle \frac{{\sum}_{j=1}^{{n}_{ref}}\mathrm{min}{d}_{E}({c}_{j},{C}_{op})}{{n}_{ref}}}.$  (7) 
In the above equation, ${C}_{ref}$, ${C}_{op}$ are set of equidistant points on the centerlines of the reference and output segmentation results, respectively, comprising of ${n}_{ref}$ and ${n}_{op}$ number of points. ${c}_{i},{c}_{j}\in {\mathbb{R}}^{n}$ are individual points on the centerlines, ${d}_{E}$ is the Euclidean distance and $w$ is a weight, such that $0\le w\le 1$. Notice that the first term in Eq. (7) captures the distance between the two centerlines due to false positives, whereas the second term captures the distance due to false negatives. In this work we use $w=0.5$. Depending on the application the weight can be modified to obtain a desirable measure that reflects the sensitivity or specificity needs.
III.A.3. Experiment setup and parameter tuning
Parameter/Method  Search Range  Org.  Mod. 
Min. radius* (mm)  1  1  1 
Max. radius* (mm)  10  10  10 
Step length  [1.0, 1.1, …, 2.0]  1.5  1.1 
Weight window${}^{+}$  [3, 4, 5] / [1, 2, …, 5]  3.0  1.0 
Search depth  6  6  6 
Search angle (deg.)  [30, 40, …, 70]  60  70 
Number of angles  [1, 2, …, 5]  3  2 
Local thres.(${T}_{\text{loc}}$)  [1.0, 2.0, …, 5.0]  2.0  – 
Global thres.${}^{+}$  $2{T}_{\text{loc}}$ / [0.5, 0.6, …, 0.9 ]  4.0  0.7 

The Modified MHT method was compared to the Original MHT method and with region growing applied to both the probability and intensity images. Parameters of all the methods were tuned on the training set comprising 16 images and tested on an independent test set consisting of 16 images. Both the MHT methods have tunable parameters; the first set of parameters is related to the tubular template: minimum and maximum allowable radii of the templates, scaling factors for step length and weight window. A second category of parameters is related to multiple hypothesis tracking: search depth, search angle, number of angles, local and global hypothesis thresholds. Tuning both categories of parameters for the training set is cumbersome, so the minimum and maximum radii were fixed to $1$mm and $10$mm, respectively, based on prior knowledge about the morphology of airway trees. Increasing search depth can improve quality of segmentation decisions but it is constrained by the exponential increase in computation of additional hypotheses. With initial experiments, on two of the training scans, we tried out the largest search depth (6) for the chest CT data that resulted in a training time of about $8$ hours to try out the different parameter settings reported in Table 1. Only the remaining parameters were tuned to minimise the training error defined in Eq. (7) using grid search. The range of parameters searched over, and the optimal set of parameters obtained, for both MHT methods are summarised in Table 1. Due to the scale independence introduced by the statistical ranking of local hypotheses, the Modified MHT method does not have the local threshold as a parameter. The range of parameters for global threshold shown is for the Modified MHT method, as the global threshold for the Original MHT method was always set to be twice the optimal local threshold based on Friman et al. ^{25}. Both region growing methods (RG) only have the threshold parameter to be tuned. The optimal threshold for region growing on intensity of CT images was found to be $995$ HU which is a low threshold, due to the leakage for three images in the training set for any higher thresholds. The optimal threshold for the case of probability images based on the training set performance was found to be $0.5$. All four methods require an initial seed point and the same seed point in the trachea, automatically extracted using the procedure described in Lo et al. ^{23}, was provided to all.
We use the MHT module in MeVisLab provided by the authors in Friman et al. ^{25} to perform the Original MHT experiments. In this implementation, minimum value for the weight window factor is constrained to be $3$. However, for the Modified MHT method we tried a larger range $[1,2,\mathrm{\dots},5]$, and found the training error to be slightly lower (by $0.2$mm) when compared to using the smaller range $[3,4,5]$. We adhere to using this larger range for Modified MHT method, and a smaller range for Original MHT method, also for the Coronary Artery Extraction experiments.
III.A.4. Results
Method  COPD  NoCOPD  All 

RG (intensity)  $3.269\pm 0.373$  $3.446\pm 1.051$  $3.368\pm 0.808$ 
RG (probability)  $2.147\pm 0.211$  $2.001\pm 0.575$  $2.064\pm 0.447$ 
Original MHT  $3.770\pm 0.771$  $4.109\pm 1.756$  $3.961\pm 1.384$ 
Modified MHT  $1.916\pm 0.426$  $1.940\pm 0.567$  $1.929\pm 0.494$ 
Performance comparison was based on computing the average error measure in Eq. (7) and results for the test set are presented in Table 2 and visualised in Figure 5. Differences between methods were tested for statistical significance using KruskalWallis tests . The Modified MHT method clearly shows an improvement when compared to the Original MHT method ($$) and region growing on intensity images ($$), but was not significantly better than region growing on probability images ($p=0.1$). We also report the performance of each method for the COPD and nonCOPD cases in Table 2 and find no significant difference in their performance between COPD and nonCOPD cases. Six of the test set results for the Modified MHT method are visualised in Figure 6 along with the reference segmentations.
III.B. Coronary Artery Extraction
^{†}^{†}footnotetext: Related to R#1: Comment 2We next describe experiments on 3D CT angiography (CTA) images for segmenting coronary arteries evaluated on the Coronary Artery Challenge data ^{*}^{*} * http://coronary.bigr.nl/centerlines/index.php. The challenge organisers allow methods to compete in three categories based on the extent of user interaction per vessel: automatic (no seed points), semiautomatic (one seed point) and interactive (more than one seed point). We evaluate the original and Modified MHT methods as standalone, semiautomatic methods and compare to the results of Friman et al. ^{25}.
III.B.1. Data and preprocessing
The dataset consists of 32 CTA images, split into 8 images for training and the remaining 24 for testing. These 3D volumes have a mean voxel size of $0.32\times 0.32\times 0.4$ mm${}^{3}$. The objective of the challenge is to segment four vessels per dataset. To this end, four seed points per vessel are provided, of which one must be selected for semiautomatic methods:

•
Point S: start point of centerline

•
Point E: end point of centerline

•
Point B: a point about 3cm distal of the start point along the centerline

•
Point A: a point inside distal part of the vessel.
In all images, three of the four vessels to be segmented were of the same type: right coronary artery (RCA), left anterior descending artery (LAD) and left circumflex artery (LCX). The fourth vessel, however, varied between images and was one of the large side branches (LSB). The reference standard for the training set was provided as manually drawn centerlines.
We closely adhere to the preprocessing performed in the Original MHT method ^{32, 25}. Unsigned integers are used to represent the voxel intensities as gray value (GV), under a simple transformation of the corresponding Hounsfield Unit (HU): $HU(\mathbf{x})=GV(\mathbf{x})1024$. To enable improved vessel template matching, voxel intensity of the lung tissue is raised to that of myocardial tissue and vessel calcifications are eliminated with the following thresholds:
$$  (8) 
with ${t}_{myo}=950$ and ${t}_{calc}=1700$.
III.B.2. Error measure
We use two commonly reported measures on the Coronary Challenge website in our evaluations, that quantify completeness and accuracy of the segmented vessels, defined as OV (overlap) and AI (average distance inside vessel) in Schaap et al. ^{26}, respectively. These error measures are computed based on location of centerline points and the corresponding local radius information obtained from the segmentations.

•
Overlap (OV): It is similar to Dice overlap coefficient and measures completeness of the tracked vessels. It is defined as OV in Schaap et.al ^{26} as:
$$\text{OV}=\frac{\text{TPM}+\text{TPR}}{\text{TPM}+\text{TPR}+\text{FN}+\text{FP}}$$ (9) where points in the reference centerlines are marked TPR (true positive in reference) if the distance to at least one of the points on the output segmentation is less than the annotated radius, and FN (false negative) otherwise. Points of the output segmentation are marked TPM (true positive in method) if at least one point of the reference segmentation is at a distance less than the radius at that reference point, and FP (false positive) otherwise. $\cdot $ denotes the cardinality of the set of points.

•
Accuracy (AI): It is a measure of accuracy of the tracked centerline inside the vessel, obtained by computing the average minimum distance between each of the centerline points of the predicted and reference segmentations, if the distance is less than the annotated radius at the reference point. It is defined as AI in Schaap et.al ^{26}.
We also report overlap until first error (OF) and overlap with the clinically part of the vessel (OT) to enable better comparison with other methods in the challenge, computed similar to OV in Equation (9). Further, the challenge organizers provide a score to measure agreement with observers for each of the measures which is also included in the evaluation. A score of 100 indicates that the result of the method is perfect, a score of 50 indicates performance similar to the observers and a score of 0 indicates complete failure.
III.B.3. Experiment setup and parameter tuning
Parameter/Method  Search range  Org.  Mod. 
Min. radius* (mm)  1  1  1 
Max. radius* (mm)  3  3  3 
Search depth  4  4  4 
Weight Window${}^{+}$  [3, 4, 5] / [1, 2, …, 5]  3  1 
Step Length  [1.0, 1.1, …, 2.0]  1.5  1.5 
Search Angle (deg.)  [30, 40, …, 70]  60  60 
Number of angles  [1, 2, …, 5]  3  2 
Local Thres. (${T}_{\text{loc}}$)  [1.0, 2.0, …, 5.0]  4.0  – 
Global Thres.${}^{+}$  $2{T}_{\text{loc}}$ / [0.55, 0.65,.., 0.95]  8.0  0.95 

As with the airway tree extraction experiments in Section III.A.3., most of the parameters are tuned for both MHT methods, while some are fixed based on prior knowledge. The minimum and maximum radii parameters are set to 1mm and 3mm, respectively. Search depth was fixed to be $4$ which is the same as the one used in Original MHT work for fair comparison ^{25}. Remaining parameters are tuned to maximise overlap, OV, on the training set and are summarised in Table 3. Further, as with the airway experiments, the starting value of weight window factor is $3$, and the global threshold is twice the local threshold, for the Original MHT method. Due to the varying quality of images across the training set, tracking from different seed points yields varying results. Based on the training starting tracking from Point B and tracking in both directions turned out to be most useful for both MHT methods. The final choice of seed point was based on estimating the overlap scores in the training set, starting from each of the four seed points and choosing the seed point which resulted in the highest overlap score (completeness).
III.B.4. Results
Methods  AI  OV  OF  OT  

(mm)  Sc.  (%)  Sc.  (%)  Sc.  (%)  Sc.  
MHT (int.)  0.230  47.9  98.521  84.0  67.612  72.8  91.783  84.5 
MHT (semi.)  0.388  29.8  83.078  40.6  44.078  31.7  70.992  44.1 
Mod. MHT  0.368  29.3  98.681  63.6  70.034  53.9  93.372  67.7 
We compare the Modified MHT method with the Original MHT method, evaluated as semiautomatic methods in the ongoing Coronary Artery Challenge evaluation. Results for these two submissions were processed by the challenge organisers. A version of the Original MHT method, used along with a minimal path algorithm and user interaction ^{25} has remained the best performing method in the challenge. The Modified MHT method scored $91.8\%$ overlap and accuracy of $0.38$mm, whereas the Original MHT method obtained $67.6\%$ overlap and accuracy of $0.39$mm. There is significant improvement in overlap when compared to the Original MHT method $$ as indicated by the KruskalWallis test. With these results our method stands as the third best performing method amongst all semiautomatic methods (requiring one seed point) reported on the challenge website.
Test set measures for each of the four vessels, along with the average score per dataset, are shown in Figure 7 for the Modified MHT method. The first plot in Figure 7 shows the overlap performance and it is clear that RCA, LAD and LCX vessels score high consistently and have few outliers. LSB, on the other hand, has four cases scoring less than $70\%$ in OV. One possible reason is that the vessel chosen as LSB was different across the dataset and the tuned parameters could not generalise to these four cases. The second plot in Figure 7 depicts the accuracy (AI) for the test set across the four vessels and again the average error in LSB is slightly higher than the rest. The improvement in this measure is not significant when compared to the original MHT method $(p=0.091)$, indicating that the Original MHT method accurately segments incomplete vessels (OV $=67.6\%$). In Table 4 we summarise the results for two versions of the Original MHT method and the Modified MHT method.
IV. Discussion and Conclusions
In this work, we extended the templatematching based method for extraction of bright, elongated structures using multiple hypothesis tracking in Friman et al. ^{25}. We introduced statistical ranking of local hypotheses as a possible means to overcome scale dependence of important MHT parameters.
A useful interpretation of the proposed ranking scheme is to see it as a measure of the relative significance of local hypotheses at each step. When aggregated over the depth of the hypothesis tree, it can be interpreted as the likelihood of the corresponding global hypothesis. Consider an example when the search depth is 10 and the global hypothesis threshold is $0.6$. For a global hypothesis (comprising of a sequence of 10 local hypotheses) to score above $0.6$ it would require its component local hypotheses to have been best locally at least in 6 of the 10 tracking steps, on average. This relative significance provides a probabilistic global hypothesis score, making it independent of the scale and variations of structures and removes the need to perform two level pruning of hypotheses – using local and global hypothesis thresholds – as done in the Original MHT method. Instead of explicitly pruning local hypotheses, we propagate all local hypotheses with their ranked scores from Eq. (6) and rely on the deferred decisions of MHT algorithm to yield valid branches.
One possible concern that arises when the scores in Eq. (4) are discarded for relative ranking in Eq. (6) is the chance of poor candidate hypotheses performing well when ranked amongst equally poor counterparts. A strength of the deferred decision of MHT is that with adequate depth of the MHT tree such hypotheses are unlikely to be accepted, as they would have to perform well across multiple steps. Increasing search depth increases the computational expense as the number of hypotheses can grow exponentially. With both our experiments we observed that search depth of upto $6$ is feasible to run on limited computational resources, yielding promising results. Apart from occlusions due to noise in the data, the main factor contributing to missing branches in airways and vessels are small branches (as they are more susceptible to noise). And in most cases, a hypothesis tree of search depth between $46$ can maintain adequate branch history allowing for a tradeoff between undersegmentation of the structures and longer run time.
Due to the proposed changes the Modified MHT method requires only one seed point per tree. In most cases, even the one required seed point can also be automatically provided, as in the case of airway experiments in Section III.A., and thus making the method fully automatic. We also presented an improvement to the bifurcation handling strategy by maintaining the history of the MHT tree across bifurcations instead of restarting tracking at bifurcation points.
We presented evaluations in Section III., where we first focused on extracting airway trees from chest CT data. To the extent of our knowledge, this is the first application of an MHTbased method to the task of airway extraction. The Modified MHT method was compared with region growing on intensity images, region growing on probability images and the Original MHT method. The Modified MHT method shows significant improvement in average centerline distance when compared to the Original MHT method $$. The Modified MHT method required $15$ seconds on average, per CT scan, to extract airway trees starting from a single seed point. The original MHT method when used in an automatic setting also uses $15$ seconds on average, whereas the region growing methods take less than $5$ seconds on average per scan.
We also presented evaluations on the Coronary Challenge ^{26} and compared the Modified MHT method with the Original MHT method used as a semiautomatic method. The interactive version of the Original MHT method, which is a combination of MHT, minimal paths and extensive user interaction (on average 1.5 clicks per vessel), is still the best performing method in the Coronary Artery Challenge. The Modified MHT method, however, shows significant improvement in the overlap measure (OV) when compared to the semiautomatic Original MHT method $$. The accuracy of the extracted centerlines (AI) from both semiautomatic versions of the MHT method are close, indicating that the Original MHT method (used semiautomatically) was accurate in the vessels that were segmented, but it undersegments the vessels indicated by the low OV score. This is likely due to dependence of the parameters on scale, which is overcome with the statistical ranking introduced in the Modified MHT method. In this work we evaluated the Modified MHT method using one seed point per vessel for the gains in some of the cases, while still competing in the semiautomatic category of the challenge. However, the Modified MHT method could also have been used with fewer than one seed point per vessel (one seed point per tree) and yielding similar performance, as we were able to extract substantial portions of the coronary tree starting from a single seed point. The computation time required for the Modified MHT method per vessel is $3$ seconds on average; thus not more than $15$ seconds per CTA image in the dataset. This makes it more appealing for clinical usage than the Original MHT method that similar computation time per time plus the time used for user interaction.
The original MHT method used in an interactive setting takes up to 5 minutes per scan ^{25}.
There are two recent semiautomatic methods based on diffusion tensor imaging techniques ^{33, 34} that perform better than the Modified MHT method in the OV measures as reported in the challenge website with a score of $97.3\%$ and $96.4\%$, respectively. However, in the OF measure these methods score $69.9\%$ and $77.4\%$, respectively, while the Modified MHT method obtained $77.9\%$. Also, the accuracy measure is comparable for all three methods ($\approx 0.35$ mm). Three other methods in the fully automatic category have better OV measure by a small margin, between $12\%$, when compared to the Modified MHT method ^{7, 35, 6}. Again, the Modified MHT method obtains $3\%$ higher OF score when compared to these methods. One common feature across all these better performing methods is the preprocessing performed on the CT images to enhance vessellike structures. For instance, in the currently best performing fully automatic method an improved version of Frangi’s vesselness is used before centerline extraction ^{7}, and similarly in Bauer at al. ^{6} a tube detection filter is used and a pretrained vessel detection filter is used in Zheng et al. ^{35}. In the Modified MHT method we use a simple templatematching step and, use of more sophisticated strategies as described in the other methods in the challenge should further improve the performance. For instance, we expect that the use of probability images obtained from a voxel classifier to distinguish vessel and nonvessel voxels instead of intensity images, as in the case of the airway experiments, could have been beneficial.
In conclusion, we proposed modifications to the well established Original MHT method that yield significantly better results. Statistical ranking of local hypotheses yields a common interpretation of the global hypotheses of the MHT tree at all scales. The improvements demonstrated with the modified MHT method in the reported measures translates into extraction of more complete airways and vessels, with fewer false positives. In the airway extraction experiments, the reduction in the average centerline distance has two contributing factors: fewer false positives and more complete airway trees. The latter of which – detecting more complete airways in a COPD cohort – can be of use clinically, as it is the detection of small airways that is harder and can be more useful in diagnosis of several pulmonary diseases. Further, the proposed modifications to the Original MHT method render it automatic and this can ease the clinical workflow.
Acknowledgements
This work was funded by the Independent Research Fund Denmark (DFF) and Netherlands Organisation for Scientific Research (NWO). We thank the anonymous reviewers for their constructive feedback that has improved the manuscript substantially.
The authors have no conflicts to disclose.
References
 1 Y. Nakano et al., Computed tomographic measurements of airway dimensions and emphysema in smokers: correlation with lung function, American journal of respiratory and critical care medicine 162, 1102–1108 (2000).
 2 M. Hasegawa, Y. Nasuhara, Y. Onodera, H. Makita, K. Nagai, S. Fuke, Y. Ito, T. Betsuyaku, and M. Nishimura, Airflow limitation and airway dimensions in chronic obstructive pulmonary disease, American journal of respiratory and critical care medicine 173 (2006).
 3 W. Kuo, M. de Bruijne, J. Petersen, K. Nasserinejad, H. Ozturk, Y. Chen, A. PerezRovira, and H. A. Tiddens, Diagnosis of bronchiectasis and airway wall thickening in children with cystic fibrosis: Objective airwayartery quantification, European Radiology , 1–10 (2017).
 4 W. Rosamond et al., Heart disease and stroke statistics—2008 update, Circulation 117, e25–e146 (2008).
 5 A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever, Multiscale vessel enhancement filtering, in International Conference on Medical Image Computing and ComputerAssisted Intervention, pages 130–137, Springer, 1998.
 6 C. Bauer and H. Bischof, Edge based tube detection for coronary artery centerline extraction, The Insight Journal (2008).
 7 G. Yang, P. Kitslaar, M. Frenay, A. Broersen, M. J. Boogers, J. J. Bax, J. H. Reiber, and J. Dijkstra, Automatic centerline extraction of coronary arteries in coronary computed tomographic angiography, The international journal of cardiovascular imaging 28 (2012).
 8 R. Selvan, J. Petersen, J. H. Pedersen, and M. de Bruijne, Extraction of Airways with Probabilistic Statespace Models and Bayesian Smoothing, in Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics, pages 53–63, Springer, 2017.
 9 O. Wink, W. J. Niessen, and M. A. Viergever, Multiscale vessel tracking, IEEE Transactions on Medical Imaging 23, 130–133 (2004).
 10 E. W. Dijkstra, A note on two problems in connexion with graphs, Numerische mathematik 1, 269–271 (1959).
 11 F. Benmansour and L. D. Cohen, Tubular structure segmentation based on minimal path method and anisotropic enhancement, International Journal of Computer Vision 92, 192–210 (2011).
 12 C. Florin, N. Paragios, and J. Williams, Particle filters, a quasimonte carlo solution for segmentation of coronaries, in International Conference on Medical Image Computing and ComputerAssisted Intervention, pages 246–253, Springer, 2005.
 13 D. Lesage, E. D. Angelini, G. FunkaLea, and I. Bloch, Adaptive particle filtering for coronary artery segmentation from 3D CT angiograms, Computer Vision and Image Understanding 151, 29–46 (2016).
 14 D. Lesage, E. D. Angelini, I. Bloch, and G. FunkaLea, A review of 3D vessel lumen segmentation techniques: Models, features and extraction schemes, Medical image analysis 13, 819–845 (2009).
 15 Q. Meng, H. R. Roth, T. Kitasaka, M. Oda, J. Ueno, and K. Mori, Tracking and Segmentation of the Airways in Chest CT Using a Fully Convolutional Network, in International Conference on Medical Image Computing and ComputerAssisted Intervention, Springer, 2017.
 16 P. Lo et al., Extraction of airways from CT (EXACT’09), IEEE Transactions on Medical Imaging 31, 2093–2107 (2012).
 17 M. Feuerstein, T. Kitasaka, and K. Mori, Adaptive branch tracing and image sharpening for airway tree extraction in 3D chest CT, in Proc. Second International Workshop on Pulmonary Image Analysis, 2009.
 18 C. Bauer, T. Pock, H. Bischof, and R. Beichel, Airway tree reconstruction based on tube detection, in Proc. of Second International Workshop on Pulmonary Image Analysis, pages 203–213, 2009.
 19 R. Wiemker, T. Bülow, and C. Lorenz, A simple centricitybased region growing algorithm for the extraction of airways, in Proc. Second International Workshop on Pulmonary Image Analysis (MICCAI), pages 309–314, Citeseer, 2009.
 20 J. Lee and A. P. Reeves, Segmentation of the airway tree from chest CT using local volume of interest, in Proc. of Second International Workshop on Pulmonary Image Analysis, pages 273–284, 2009.
 21 D. Reid, An algorithm for tracking multiple targets, IEEE transactions on Automatic Control 24, 843–854 (1979).
 22 S. S. Blackman, Multiple hypothesis tracking for multiple target tracking, IEEE Aerospace and Electronic Systems Magazine 19, 5–18 (2004).
 23 P. Lo, J. Sporring, H. Ashraf, J. J. Pedersen, and M. de Bruijne, Vesselguided airway tree segmentation: A voxel classification approach, Medical image analysis 14, 527–538 (2010).
 24 T. Yedidya and R. Hartley, Tracking of blood vessels in retinal images using Kalman filter, in Computing: Techniques and Applications, 2008. DICTA’08. Digital Image, pages 52–58, IEEE, 2008.
 25 O. Friman, M. Hindennach, C. Kühnel, and H.O. Peitgen, Multiple hypothesis template tracking of small 3D vessel structures, Medical image analysis 14, 160–171 (2010).
 26 M. Schaap et al., Standardized evaluation methodology and reference database for evaluating coronary artery centerline extraction algorithms, Medical image analysis 13, 701–714 (2009).
 27 R. Selvan, J. Petersen, J. H. Pedersen, and M. de Bruijne, Extraction of airway trees using multiple hypothesis tracking and template matching, in 6th International Workshop on Pulmonary Image Analysis, 2016.
 28 J. H. Pedersen et al., The Danish randomized lung cancer CT screening trial—overall design and results of the prevalence round, Journal of Thoracic Oncology 4, 608–614 (2009).
 29 F. R. Chung, Spectral graph theory, Number 92, American Mathematical Soc., 1997.
 30 P. Lo, J. Sporring, J. J. H. Pedersen, and M. de Bruijne, Airway tree extraction with locally optimal paths, in International Conference on Medical Image Computing and ComputerAssisted Intervention, pages 51–58, Springer, 2009.
 31 T.C. Lee, R. L. Kashyap, and C.N. Chu, Building skeleton models via 3D medial surface axis thinning algorithms, CVGIP: Graphical Models and Image Processing 56, 462–478 (1994).
 32 O. Friman, C. Kühnel, and H.O. Peitgen, Coronary centerline extraction using multiple hypothesis tracking and minimal paths, in International Conference on Medical Image Computing and ComputerAssisted Intervention, volume 42, 2008.
 33 S. Cetin, A. Demir, A. Yezzi, M. Degertekin, and G. Unal, Vessel tractography using an intensity based tensor model with branch detection, IEEE transactions on medical imaging 32, 348–363 (2013).
 34 S. Cetin and G. Unal, A higherorder tensor vessel tractography for segmentation of vascular structures, IEEE transactions on medical imaging 34, 2172–2185 (2015).
 35 Y. Zheng, H. Tek, and G. FunkaLea, Robust and accurate coronary artery centerline extraction in CTA by combining modeldriven and datadriven approaches, in International Conference on Medical Image Computing and ComputerAssisted Intervention, Springer, 2013.