My main research concerns the development and application of machine learning methods combined with statistical methods in the domain of medical image processing. This involves, inter alia, proven ensemble methods like random forests, novel neural-network-based methods as well as probabilistic graphical modeling.
I focused on the detection and localization of spatially correlated key points in 2D as well as 3D medical images, with potential follow-up tasks such as object segmentation and disease classification.
In the following, you can find a list of all my peer-reviewed publications, starting with the most recent one.
If you are interested in any of the papers or the corresponding auxiliary material (i.e., slides and posters), send an email to firstname.lastname@example.org containing the relevant title(s), and I’ll happily forward the requested resources to you.
“Automatic Localization of Spatially Correlated Key Points in Medical Images”
Abstract: The task of object localization in medical images is a cornerstone of automatic image processing and a prerequisite for other medical imaging tasks. In this thesis, we present a general framework for the automatic detection and localization of spatially correlated key points in medical images based on a conditional random field (CRF). The problem of selecting suitable potential functions (knowledge sources) and defining a reasonable graph topology w.r.t. the dataset is automated by our proposed data-driven CRF optimization.
We show how our fairly simple setup can be applied to different medical datasets involving different image dimensionalities (i.e., 2D and 3D), image modalities (i.e., X-ray, CT, MRI) and target objects ranging from 2 to 102 distinct key points by automatically adapting the CRF to the dataset. While the general “default” configuration we use represents an easy-to-transfer setup, it already outperforms other state-of-the-art methods on three out of four datasets. By slightly gearing the proposed approach to the fourth dataset, we further illustrate that the approach is capable of reaching the state-of-the-art performance of highly sophisticated and data-specific deep-learning-based approaches.
Additionally, we suggest and evaluate solutions for common problems of graph-based approaches, such as the reduced search space and thus the potential exclusion of the correct solution, better handling of spatial outliers using latent variables, and the incorporation of invariant higher-order potential functions. Each extension is evaluated in detail and the whole method is additionally compared to a rivaling convolutional-neural-network-based approach on a hard problem (i.e., the localization of many locally similar repetitive target key points) in terms of exploiting the spatial correlation. Finally, we illustrate how follow-up tasks (segmentation in this case) may benefit from a correct localization by reaching state-of-the-art performance using off-the-shelf methods in combination with our proposed method.
This dissertation was honored with the Fokusfinderpreis 2021 (1000 €).
“Assessing Attribution Maps for Explaining CNN-Based Vertebral Fracture Classifiers”
Abstract: Automated evaluation of vertebral fracture status on computed tomography (CT) scans acquired for various purposes (opportunistic CT) may substantially enhance vertebral fracture detection rate. Convolutional neural networks (CNNs) have shown promising performance in numerous tasks but their black box nature may hinder acceptance by physicians. We aim (a) to evaluate CNN architectures for osteoporotic fracture discrimination as part of a pipeline localizing and classifying vertebrae in CT images and (b) to evaluate the benefit of using attribution maps to explain a network’s decision. Training different model architectures on 3D patches containing vertebrae, we show that CNNs permit highly accurate discrimination of the fracture status of individual vertebrae. Explanations were computed using selected attribution methods: Gradient, Gradient * Input, Guided BackProp, and SmoothGrad algorithms. Quantitative and visual tests were conducted to evaluate the meaningfulness of the explanations (sanity checks). The explanations were found to depend on the model architecture, the realization of the parameters, and the precise position of the target object of interest.
“A General Framework for Localizing and Locally Segmenting Correlated Objects: A Case Study on Intervertebral Discs in Multi-Modality MR Images”
Abstract: Low back pain is a leading cause of disability that has been associated with intervertebral disc (IVD) degeneration by various clinical studies. With MRI being the imaging technique of choice for IVDs due to its excellent soft tissue contrast, we propose a fully automatic approach for localizing and locally segmenting spatially correlated objects (tailored to cope with a limited set of training data while making very few domain assumptions) and apply it to lumbar IVDs in multi-modality MR images. Regression tree ensembles spatially regularized by a conditional random field are used to find the IVD centroids, which allows cutting fixed-size sections around each IVD so that the segmentation can be performed efficiently on the sub-volumes. Exploiting the similar imaging characteristics of IVD tissue, we build an IVD-agnostic V-Net to perform the segmentation and train it on all IVDs (instead of a specific one). In particular, we compare the usage of binary (i.e., pairwise) CRF potentials combined with a latent scaling variable to tackle spine size variability with scaling-invariant ternary potentials. Evaluating our approach on a public challenge data set consisting of 16 cases from 8 subjects with 4 modalities each, we achieve an average Dice coefficient of 0.904, an average absolute surface distance of 0.423 mm and an average center distance of 0.59 mm.
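For readers unfamiliar with the Dice coefficient reported above, the following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), for two binary masks. It is not the evaluation code used in the paper: the function name `dice_coefficient`, the flat 0/1 list representation of the masks, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    pred = [bool(v) for v in pred]
    truth = [bool(v) for v in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Number of voxels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, truth))
    size_sum = sum(pred) + sum(truth)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Toy example: 2 shared foreground voxels, 3 foreground voxels in each mask.
print(dice_coefficient([1, 1, 1, 0], [0, 1, 1, 1]))
```

A value of 1 means the predicted and reference masks coincide, and 0 means they do not overlap at all.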
“Automatically Localizing a Large Set of Spatially Correlated Key Points: A Case Study in Spine Imaging”
Abstract: The fully automatic localization of key points in medical images is an important and active area in applied machine learning, with very large sets of key points still being an open problem. To this end, we extend two general state-of-the-art localization approaches to operate on large numbers of key points and evaluate both approaches on a CT spine data set featuring 102 key points. First, we adapt the multi-stage convolutional pose machines neural network architecture to 3D image data with some architectural changes to cope with the large amount of data and the large number of key points. Imprecise localizations caused by the inherent downsampling of the network are countered by quadratic interpolation. Second, we extend a common approach (regression tree ensembles spatially regularized by a conditional random field) by a latent scaling variable to explicitly model spinal size variability. Both approaches are evaluated in detail in a 5-fold cross-validation setup in terms of localization accuracy and test time on 157 spine CT images. The best configuration achieves a mean localization error of 4.21 mm over all 102 key points.
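The quadratic interpolation mentioned above can be illustrated with a small sketch. It shows the common parabola fit through a peak sample and its two neighbours along one axis, which yields a sub-sample peak position on a coarse score map. Whether the paper uses exactly this variant is an assumption; the function name `refine_peak_1d` and the 1D setting are hypothetical and chosen for brevity.

```python
def refine_peak_1d(scores, index):
    """Refine an integer peak position by fitting a parabola to 3 samples."""
    # At the border there are not enough neighbours for a fit.
    if index <= 0 or index >= len(scores) - 1:
        return float(index)
    left, center, right = scores[index - 1], scores[index], scores[index + 1]
    curvature = left - 2.0 * center + right
    # A non-negative curvature means the sample is not a strict local maximum.
    if curvature >= 0:
        return float(index)
    # Vertex of the parabola through the three samples, relative to index.
    return index + 0.5 * (left - right) / curvature


# Samples of a parabola whose true maximum lies at x = 2.3.
scores = [-(x - 2.3) ** 2 for x in range(6)]
peak = max(range(len(scores)), key=scores.__getitem__)
print(refine_peak_1d(scores, peak))
```

For 3D heat maps, one simple option (again an assumption, not taken from the paper) is to apply the same fit independently along each axis.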
“Detection and localization of spatially correlated point landmarks in medical images using an automatically learned conditional random field”
Abstract: The automatic detection and accurate localization of landmarks is a crucial task in medical imaging. It is necessary for tasks like diagnosis, surgical planning, and post-operative assessment. A common approach to localize multiple landmarks is to combine multiple independent localizers for individual landmarks with a spatial regularizer, e.g., a conditional random field (CRF). Its configuration, e.g., the CRF topology and potential functions, often has to be manually specified w.r.t. the application. In this paper, we present a general framework to automatically learn the optimal configuration of a CRF for localizing multiple landmarks. Furthermore, we introduce a novel “missing” label for each landmark (node in the CRF). The key idea is to define a pool of potentials and optimize their CRF weights and the potential values for missing landmarks in a learning framework. Potentials with a low weight are removed, thus optimizing the graph topology. This makes it easy to transfer our framework to new applications and to integrate different localizers. Further advantages of our algorithm are its low test runtime, its low training data requirements, and its interpretability. We illustrate its feasibility in a detailed evaluation on three medical datasets featuring high degrees of pathologies and outliers.
“Localization and Labeling of Posterior Ribs in Chest Radiographs Using a CRF-regularized FCN with Local Refinement”
Abstract: Localization and labeling of posterior ribs in radiographs is an important task and a prerequisite for, e.g., quality assessment, image registration, and automated diagnosis. In this paper, we propose an automatic, general approach for localizing spatially correlated landmarks using a fully convolutional network (FCN) regularized by a conditional random field (CRF) and apply it to rib localization. A reduced CRF state space in the form of localization hypotheses (generated by the FCN) is used to make CRF inference feasible, at the risk of missing correct locations. Thus, we propose a second CRF inference step searching for additional locations. To this end, we introduce a novel “refine” label in the first inference step. For “refine”-labeled nodes, small subgraphs are extracted and a second inference is performed on all image pixels. The approach is thoroughly evaluated on 642 images of the public Indiana chest X-ray collection, achieving a landmark localization rate of 94.6%.
“A Novel Approach to Handle Inference in Discrete Markov Networks with Large Label Sets”
Abstract: MAP inference over discrete Markov networks with large label sets is often applied, e.g., in localizing multiple key points in the image domain. Often, approximate or domain-specific methods are used to make the problem feasible. An alternative method is to preselect a limited (much smaller) set of suitable labels, which bears the risk of excluding the correct solution. To solve the latter problem, we propose a two-step approach: First, the reduced label sets are extended by a novel “refine” label, which, when chosen during inference, marks nodes where the label set is insufficient. The energies for this additional label are learned in conjunction with the network’s potential weights. Second, for all nodes marked with the “refine” label, additional local inference steps over the full label set are performed. This greedy refinement becomes feasible by extracting small subgraphs around the marked nodes and fixing all other nodes. We thoroughly evaluate and analyze our approach by solving the problem of localizing and identifying 16 posterior ribs in 2D chest radiographs.
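The two-step idea described above can be sketched on a toy problem. The example below is not the paper's implementation and makes several simplifying assumptions: key points live on a 1D line, the graph is a chain, step one uses exhaustive search instead of a real MAP solver, the “refine” energy is a hand-set constant rather than a learned value, and pairwise terms involving a “refine” node are simply skipped. All names (`REFINE`, `energy`, `two_step_map`) are hypothetical.

```python
from itertools import product

REFINE = None  # sentinel standing in for the "refine" label


def energy(labels, unary, expected_gap, refine_cost):
    """Total energy of a chain labeling; lower is better."""
    total = 0.0
    for i, x in enumerate(labels):
        total += refine_cost if x is REFINE else unary[i](x)
    for a, b in zip(labels, labels[1:]):
        # Assumption: pairwise terms touching a REFINE node contribute nothing.
        if a is not REFINE and b is not REFINE:
            total += abs((b - a) - expected_gap)
    return total


def two_step_map(candidates, full_range, unary, expected_gap, refine_cost):
    # Step 1: exhaustive search over the reduced label sets plus REFINE.
    spaces = [list(c) + [REFINE] for c in candidates]
    best = min(product(*spaces),
               key=lambda l: energy(l, unary, expected_gap, refine_cost))
    labels = list(best)
    # Step 2: greedy local search over the full label set for marked nodes,
    # keeping all other nodes fixed.
    for i, x in enumerate(best):
        if x is REFINE:
            labels[i] = min(
                full_range,
                key=lambda v, i=i: energy(labels[:i] + [v] + labels[i + 1:],
                                          unary, expected_gap, refine_cost))
    return labels


# Toy setup: three key points near 5, 15 and 25 with an expected spacing of 10.
unary = [lambda x, t=t: abs(x - t) for t in (5, 15, 25)]
# The candidates of the middle node miss its correct position entirely.
candidates = [[5, 7], [2, 28], [25, 24]]
print(two_step_map(candidates, range(30), unary, expected_gap=10, refine_cost=3))
```

In this toy setup the middle node is marked in step one because both of its candidates are far more expensive than the constant “refine” energy, and step two then recovers its position from the full range.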
“Detection and Localization of Landmarks in the Lower Extremities Using an Automatically Learned Conditional Random Field”
Abstract: The detection and localization of single or multiple landmarks is a crucial task in medical imaging. It is often required as initialization for other tasks like segmentation or registration. A common approach to localize multiple landmarks is to exploit their spatial correlations, e.g., by using a conditional random field (CRF) to incorporate geometric information between landmark pairs. This CRF is usually applied to resolve ambiguities of a localizer, e.g., a random forest or a deep neural network. In this paper, we apply a random forest/CRF combination to the task of jointly detecting and localizing 6 landmarks in the lower extremities, taken from a dataset of 660 X-ray images. The dataset is challenging since a significant number of images do not show all the landmarks. Furthermore, 11.3% of the target landmarks are altered by prostheses or pathologies.
To account for this, we introduce a “missing” label for each landmark (represented by a node in the CRF). Moreover, instead of manually specifying the CRF model by selecting suitable potential functions and the graph topology, we propose automatically optimizing both in a learning framework. Specifically, we define a pool of potential functions and learn their CRF weights (relative contributions), in addition to the potential values in case of missing landmarks. Potentials with a low weight are removed, thus optimizing the graph topology. Detailed evaluations on our database show the feasibility of our approach. Our algorithm removed on average 23 of the initial 51 CRF potentials, and correctly detected and localized (within 10 mm tolerance) on average 92.8% of the landmarks, with individual rates ranging from 90.0% to 97.4%.
This paper was honored with the Best Paper Award (300 €).
“Efficient Epiphyses Localization Using Regression Tree Ensembles and a Conditional Random Field”
Abstract: Accurate localization of sets of anatomical landmarks is a challenging task, yet often required in automatic analysis of medical images. Several groups (e.g., Donner et al.) have shown that it is beneficial to incorporate geometrical relations of landmarks into detection procedures for complex anatomical structures. In this paper, we present a two-step approach (compared to three steps as suggested by Donner et al.) combining regression tree ensembles with a Conditional Random Field (CRF), modeling spatial relations. The comparably simple combination achieves a localization rate of 99.6% on a challenging hand radiograph dataset showing high age-related variability, which is slightly superior to the state-of-the-art results achieved by Hahmann et al.
“Using Web Images as Additional Training Resource for the Discriminative Generalized Hough Transform”
Abstract: Many algorithms in computer vision, e.g., for object localization, are supervised and need annotated training data. One approach for object localization is the Discriminative Generalized Hough Transform (DGHT). It achieves state-of-the-art performance in applications like iris and epiphysis localization, provided that the amount and quality of training data are sufficient. This motivates techniques for extending the training corpus with limited manual effort. In this paper, we propose an active learning scheme to extend the training corpus by automatically and efficiently harvesting and selecting suitable Web images. We aim at improving localization performance, while reducing the manual supervision to a minimum. Our key idea is to estimate the benefit of a particular candidate Web image by analyzing its Hough space generated using an initial DGHT model. We show that our method performs similarly to a manual selection of Web images as well as a computationally intensive state-of-the-art approach.