Ultrasound is well-established as an imaging modality for diagnostic and interventional purposes. However, the image quality varies with operator skills as acquiring and interpreting ultrasound images requires extensive training due to the imaging artefacts, the range of acquisition parameters and the variability of patient anatomies. Automating the image acquisition task could improve acquisition reproducibility and quality but training such an algorithm requires large amounts of navigation data, not saved in routine examinations. Thus, we propose a method to generate large amounts of ultrasound images from other modalities and from arbitrary positions, such that this pipeline can later be used by learning algorithms for navigation. We present a novel simulation pipeline which uses segmentations from other modalities, an optimized volumetric data representation and GPU-accelerated Monte Carlo path tracing to generate view-dependent and patient-specific ultrasound images. We extensively validate the correctness of our pipeline with a phantom experiment, where structures' sizes, contrast and speckle noise properties are assessed. Furthermore, we demonstrate its usability to train neural networks for navigation in an echocardiography view classification experiment by generating synthetic images from more than 1000 patients. Networks pre-trained with our simulations achieve significantly superior performance in settings where large real datasets are not available, especially for under-represented classes. The proposed approach allows for fast and accurate patient-specific ultrasound image generation, and its usability for training networks for navigation-related tasks is demonstrated.
Numerous dual-energy CT (DECT) techniques have been developed in the past few decades. DECT statistical iterative reconstruction (SIR) has demonstrated its potential for reducing noise and increasing accuracy. Our lab proposed a joint statistical DECT algorithm for stopping power estimation and showed that it outperforms competing image-based material-decomposition methods. However, due to its slow convergence and the high computational cost of projections, the elapsed time of 3D DECT SIR is often not clinically acceptable. Therefore, to improve its convergence, we have embedded DECT SIR into a deep learning model-based unrolled network for 3D DECT reconstruction (MB-DECTNet) that can be trained in an end-to-end fashion. This deep learning-based method is trained to learn the shortcuts between the initial conditions and the stationary points of iterative algorithms while preserving the unbiased estimation property of model-based algorithms. MB-DECTNet is formed by stacking multiple update blocks, each of which consists of a data consistency (DC) layer and a spatial mixer layer, where the spatial mixer layer is a shrunken U-Net and the DC layer is a one-step update of an arbitrary traditional iterative method. Although the proposed network can be combined with numerous iterative DECT algorithms, we demonstrate its performance with the dual-energy alternating minimization (DEAM) algorithm. The qualitative results show that MB-DECTNet with DEAM significantly reduces noise while increasing the resolution of the test image. The quantitative results show that MB-DECTNet has the potential to estimate attenuation coefficients as accurately as traditional statistical algorithms but at a much lower computational cost.
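The stacked-block structure described above (a data consistency step followed by a spatial mixer, repeated several times) can be illustrated with a minimal sketch. This is not the MB-DECTNet implementation: as loudly labeled assumptions, a plain gradient step on a least-squares objective stands in for the one-step DEAM update, a fixed neighbor-averaging filter stands in for the trained shrunken U-Net, and all function names and parameters (`data_consistency`, `spatial_mixer`, `unrolled_forward`, `step`, `weight`) are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def data_consistency(x, A, y, step):
    """One gradient step on 0.5 * ||A x - y||^2.

    Assumption: this is a stand-in for a one-step update of an
    iterative reconstruction method such as DEAM.
    """
    residual = [p - q for p, q in zip(matvec(A, x), y)]
    grad = matvec(transpose(A), residual)
    return [xi - step * g for xi, g in zip(x, grad)]


def spatial_mixer(x, weight):
    """Blend each sample with the mean of its two neighbors.

    Assumption: this fixed filter is a stand-in for a trained
    spatial mixer network (a shrunken U-Net in the abstract).
    """
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]
        right = x[min(i + 1, n - 1)]
        out.append((1 - weight) * x[i] + weight * 0.5 * (left + right))
    return out


def unrolled_forward(x0, A, y, num_blocks=5, step=0.5, weight=0.1):
    """Apply num_blocks update blocks: DC layer, then spatial mixer."""
    x = list(x0)
    for _ in range(num_blocks):
        x = data_consistency(x, A, y, step)
        x = spatial_mixer(x, weight)
    return x


# Toy usage: identity forward model, constant measurements.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
estimate = unrolled_forward([0.0, 0.0, 0.0], identity, [1.0, 1.0, 1.0])
```

In a trained unrolled network the mixer weights would be learned end to end so that a few blocks reach what the plain iteration needs many steps for; the sketch only shows the alternation pattern.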
CT images have been used to generate radiation therapy treatment plans for more than two decades. Dual-energy CT (DECT) has shown high accuracy in estimating electronic density or proton stopping-power maps used in treatment planning. However, the presence of metal implants introduces severe streaking artifacts in the reconstructed images, affecting the diagnostic accuracy and treatment performance. In order to reduce the metal artifacts in DECT, we introduce a metal-artifact reduction scheme for iterative DECT algorithms. An estimate is substituted for the corrupt data in each iteration. We utilize normalized metal-artifact reduction (NMAR) composed with image-domain decomposition to initialize the algorithm and speed up the convergence. A fully 3D joint statistical DECT algorithm, dual-energy alternating minimization (DEAM), with the proposed scheme is tested on experimental and clinical helical data acquired on a Philips Brilliance Big Bore scanner. We compared DEAM with the proposed method to the original DEAM and vendor reconstructions with and without metal-artifact reduction for orthopedic implants (O-MAR). The visualization and quantitative analysis show that DEAM with the proposed method has the best performance in reducing streaking artifacts caused by metallic objects.
Dual-energy CT (DECT) has been widely investigated over the past decades as a way to generate more informative and more accurate images. For example, the Dual-Energy Alternating Minimization (DEAM) algorithm achieves sub-percentage uncertainty in estimating proton stopping-power maps from experimental 3-mm collimated phantom data. However, the elapsed time of iterative DECT algorithms is not clinically acceptable, due to their low convergence rate and the large geometry of modern helical CT scanners. A CNN-based initialization method is introduced to reduce the computational time of iterative DECT algorithms. DEAM is used as an example of iterative DECT algorithms in this work. The simulation results show that our method generates denoised images with greatly improved estimation accuracy for adipose, tonsils, and muscle tissue. Also, it reduces the elapsed time for DEAM to reach the same objective function value by approximately 5-fold for both simulated and real data.
We propose a fully unsupervised multi-modal deformable image registration method (UMDIR), which does not require any ground truth deformation fields or any aligned multi-modal image pairs during training. Multi-modal registration is a key problem in many medical image analysis applications. It is very challenging due to complicated and unknown relationships between different modalities. In this paper, we propose an unsupervised learning approach to reduce the multi-modal registration problem to a mono-modal one through image disentangling. In particular, we decompose images of both modalities into a common latent shape space and separate latent appearance spaces via an unsupervised multi-modal image-to-image translation approach. The proposed registration approach is then built on the factorized latent shape code, with the assumption that the intrinsic shape deformation existing in the original image domain is preserved in this latent space. Specifically, two metrics are proposed for training the network: a latent similarity metric defined in the common shape space and a learning-based image similarity metric based on an adversarial loss. We examined different variations of our proposed approach and compared them with conventional state-of-the-art multi-modal registration methods. Results show that our proposed methods achieve competitive performance against other methods at substantially reduced computation time.
Automatic parsing of anatomical objects in X-ray images is critical to many clinical applications, in particular image-guided intervention and workflow automation. Existing deep network models require a large amount of labeled data. However, obtaining accurate pixel-wise labeling in X-ray images relies heavily on skilled clinicians due to the large overlaps of anatomy and the complex texture patterns. On the other hand, organs in 3D CT scans preserve clearer structures as well as sharper boundaries and thus can be easily delineated. In this paper, we propose a novel model framework for learning automatic X-ray image parsing from labeled CT scans. Specifically, a Dense Image-to-Image network (DI2I) for multi-organ segmentation is first trained on X-ray-like Digitally Reconstructed Radiographs (DRRs) rendered from 3D CT volumes. Then we introduce a Task Driven Generative Adversarial Network (TD-GAN) architecture to achieve simultaneous style transfer and parsing for unseen real X-ray images. TD-GAN consists of a modified cycle-GAN substructure for pixel-to-pixel translation between DRRs and X-ray images and an added module leveraging the pre-trained DI2I to enforce segmentation consistency. The TD-GAN framework is general and can be easily adapted to other learning tasks. In the numerical experiments, we validate the proposed model on 815 DRRs and 153 topograms. While the vanilla DI2I without any adaptation fails completely on segmenting the topograms, the proposed model does not require any topogram labels and is able to provide a promising average Dice score of 85%, which is at a similar level of accuracy to supervised training (88%).
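The segmentation results above are reported as an average Dice score. As a reminder of what that metric measures, the following is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks given as flat lists. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks of equal length.

    Masks are flat sequences of 0/1 values (a 2D mask can be flattened
    row by row). Returns a value between 0.0 and 1.0.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Count pixels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels in each mask.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / size_sum


# Toy usage: the prediction covers the target plus one extra pixel.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 0, 0]
score = dice_score(predicted_mask, reference_mask)  # 2 * 1 / (2 + 1)
```

An average Dice score over several organs or images would then be the mean of the per-mask values.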
2D/3D image registration to align a 3D volume and 2D X-ray images is a challenging problem due to its ill-posed nature and various artifacts present in 2D X-ray images. In this paper, we propose a multi-agent system with an auto attention mechanism for robust and efficient 2D/3D image registration. Specifically, an individual agent is trained with a dilated Fully Convolutional Network (FCN) to perform registration in a Markov Decision Process (MDP) by observing a local region, and the final action is then taken based on the proposals from multiple agents, weighted by their corresponding confidence levels. The contributions of this paper are threefold. First, we formulate 2D/3D registration as an MDP with observations, actions, and rewards properly defined with respect to X-ray imaging systems. Second, to handle various artifacts in 2D X-ray images, multiple local agents are employed efficiently via FCN-based structures, and an auto attention mechanism is proposed to favor the proposals from regions with more reliable visual cues. Third, a dilated FCN-based training mechanism is proposed to significantly reduce the degrees of freedom in the simulation of the registration environment, and to improve training efficiency by an order of magnitude compared to a standard CNN-based training method. We demonstrate that the proposed method achieves high robustness on both spine cone beam Computed Tomography data with a low signal-to-noise ratio and data from minimally invasive spine surgery where severe image artifacts and occlusions are present due to metal screws and guide wires, outperforming other state-of-the-art methods (single agent-based and optimization-based) by a large margin.
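The idea of taking the final action from several agents' proposals, weighted by confidence, can be sketched as follows. The abstract does not state the exact aggregation rule, so this is only one plausible reading: each agent supplies a score per candidate action, the scores are averaged with normalized confidence weights, and the highest-scoring action is selected. The function name `aggregate_action` and the data layout are hypothetical.

```python
def aggregate_action(proposals, confidences):
    """Combine per-agent action scores into one action choice.

    proposals:   list of score lists, one per agent, one score per action.
    confidences: list of non-negative weights, one per agent.
    Returns (index of the chosen action, combined scores per action).
    """
    if not proposals or len(proposals) != len(confidences):
        raise ValueError("need one confidence value per agent proposal")
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one agent must have positive confidence")
    num_actions = len(proposals[0])
    # Confidence-weighted average of the agents' scores for each action.
    combined = [
        sum(c * scores[a] for scores, c in zip(proposals, confidences)) / total
        for a in range(num_actions)
    ]
    best = max(range(num_actions), key=lambda a: combined[a])
    return best, combined


# Toy usage: one confident agent outweighs two unreliable ones.
agent_scores = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
agent_confidences = [0.8, 0.1, 0.1]
chosen, scores = aggregate_action(agent_scores, agent_confidences)
```

With these toy numbers the single confident agent determines the outcome, which mirrors the stated goal of favoring regions with more reliable visual cues; with equal confidences the majority proposal would win instead.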
3-D image registration, which involves aligning two or more images, is a critical step in a variety of medical applications from diagnosis to therapy. Image registration is commonly performed by optimizing an image matching metric as a cost function. However, this task is challenging due to the non-convex nature of the matching metric over the plausible registration parameter space and the lack of robust optimization approaches. As a result, current approaches are often customized to a specific problem and sensitive to image quality and artifacts. In this paper, we propose a completely different approach to image registration, inspired by how experts perform the task. We first cast the image registration problem as a "strategy learning" process, where the goal is to find the best sequence of motion actions (e.g. up, down, etc.) that yields image alignment. Within this approach, an artificial agent is learned, modeled using deep convolutional neural networks, with 3D raw image data as the input and the next optimal action as the output. To cope with the dimensionality of the problem, we propose a greedy supervised approach for end-to-end training, coupled with an attention-driven hierarchical strategy. The resulting registration approach inherently encodes both a data-driven matching metric and an optimal registration strategy (policy). We demonstrate, on two 3-D/3-D medical image registration examples with drastically different challenges, that the artificial agent outperforms several state-of-the-art registration methods by a large margin in terms of both accuracy and robustness.
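Casting registration as a sequence of discrete motion actions can be illustrated with a toy rollout. The sketch below makes several assumptions that are not stated in the abstract: the transformation has only two integer parameters, the action set is a unit step up or down in each parameter, and the "optimal" action used as a supervision label is the one that brings the current parameters closest to the known ground truth. In the paper a trained network predicts the action from image data; here the ground truth is used directly, as it could be when generating training labels. All names (`ACTIONS`, `greedy_action`, `rollout`) are hypothetical.

```python
# Hypothetical action set: unit steps along each of two parameters.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def apply_action(params, action):
    """Return the parameters after taking one motion action."""
    return tuple(p + a for p, a in zip(params, action))


def distance(a, b):
    """Euclidean distance between two parameter tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def greedy_action(current, target):
    """Index of the action that moves closest to the ground truth.

    Assumption: this greedy rule is what supplies the supervision label
    for the agent at the current state.
    """
    return min(
        range(len(ACTIONS)),
        key=lambda i: distance(apply_action(current, ACTIONS[i]), target),
    )


def rollout(start, target, max_steps=100):
    """Follow greedy actions from start until the target is reached."""
    current = tuple(start)
    target = tuple(target)
    taken = []
    for _ in range(max_steps):
        if current == target:
            break
        index = greedy_action(current, target)
        current = apply_action(current, ACTIONS[index])
        taken.append(index)
    return current, taken


# Toy usage: align parameters (0, 0) to the ground truth (2, -1).
final_params, action_sequence = rollout((0, 0), (2, -1))
```

Each (state, greedy action) pair visited by such a rollout could serve as one supervised training sample, which is the sense in which the sketch relates to a greedy supervised approach.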
In this paper, we present a Convolutional Neural Network (CNN) regression approach for real-time 2-D/3-D registration. Unlike optimization-based methods, which iteratively optimize the transformation parameters over a scalar-valued metric function representing the quality of the registration, the proposed method exploits the information embedded in the appearances of the Digitally Reconstructed Radiograph and X-ray images, and employs CNN regressors to directly estimate the transformation parameters. The CNN regressors are trained for local zones and applied in a hierarchical manner to break down the complex regression task into simpler sub-tasks that can be learned separately. Our experimental results demonstrate the advantage of the proposed method in computational efficiency, with negligible degradation of registration accuracy compared to intensity-based methods.