Diffusion models are a family of generative models that yield record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse denoising process, remains a challenge due to slow convergence rates and high computational costs. In this work, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with denoising probabilistic diffusion models, our framework operates with approximately a quarter of the parameters and 30% of the Floating Point Operations (FLOPs) compared to standard U-Nets in Denoising Diffusion Probabilistic Models (DDPMs). Furthermore, our model is up to 70% faster in inference than the baseline models when measured in equal conditions while converging to better quality solutions.
In vivo cardiac diffusion tensor imaging (cDTI) is a promising Magnetic Resonance Imaging (MRI) technique for evaluating the micro-structure of myocardial tissue in the living heart, providing insights into cardiac function and enabling the development of innovative therapeutic strategies. However, the integration of cDTI into routine clinical practice is challenging due to the technical obstacles involved in the acquisition, such as low signal-to-noise ratio and long scanning times. In this paper, we investigate and implement three different types of deep learning-based MRI reconstruction models for cDTI reconstruction. We evaluate the performance of these models based on reconstruction quality assessment and diffusion tensor parameter assessment. Our results indicate that the models we discussed in this study can be applied for clinical use at an acceleration factor (AF) of $\times 2$ and $\times 4$, with the D5C5 model showing superior fidelity for reconstruction and the SwinMR model providing higher perceptual scores. There is no statistical difference with the reference for all diffusion tensor parameters at AF $\times 2$ or most DT parameters at AF $\times 4$, and the quality of most diffusion tensor parameter maps are visually acceptable. SwinMR is recommended as the optimal approach for reconstruction at AF $\times 2$ and AF $\times 4$. However, we believed the models discussed in this studies are not prepared for clinical use at a higher AF. At AF $\times 8$, the performance of all models discussed remains limited, with only half of the diffusion tensor parameters being recovered to a level with no statistical difference from the reference. Some diffusion tensor parameter maps even provide wrong and misleading information.
Traffic flow analysis is revolutionising traffic management. Qualifying traffic flow data, traffic control bureaus could provide drivers with real-time alerts, advising the fastest routes and therefore optimising transportation logistics and reducing congestion. The existing traffic flow datasets have two major limitations. They feature a limited number of classes, usually limited to one type of vehicle, and the scarcity of unlabelled data. In this paper, we introduce a new benchmark traffic flow image dataset called TrafficCAM. Our dataset distinguishes itself by two major highlights. Firstly, TrafficCAM provides both pixel-level and instance-level semantic labelling along with a large range of types of vehicles and pedestrians. It is composed of a large and diverse set of video sequences recorded in streets from eight Indian cities with stationary cameras. Secondly, TrafficCAM aims to establish a new benchmark for developing fully-supervised tasks, and importantly, semi-supervised learning techniques. It is the first dataset that provides a vast amount of unlabelled data, helping to better capture traffic flow qualification under a low cost annotation requirement. More precisely, our dataset has 4,402 image frames with semantic and instance annotations along with 59,944 unlabelled image frames. We validate our new dataset through a large and comprehensive range of experiments on several state-of-the-art approaches under four different settings: fully-supervised semantic and instance segmentation, and semi-supervised semantic and instance segmentation tasks. Our benchmark dataset will be released.
Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data. To better exploit the unlabeled data the latest SSL methods use pseudo-labels predicted from a single discriminative classifier. However, the generated pseudo-labels are inevitably linked to inherent confirmation bias and noise which greatly affects the model performance. In this work we introduce a new framework for SSL named NorMatch. Firstly, we introduce a new uncertainty estimation scheme based on normalizing flows, as an auxiliary classifier, to enforce highly certain pseudo-labels yielding a boost of the discriminative classifiers. Secondly, we introduce a threshold-free sample weighting strategy to exploit better both high and low confidence pseudo-labels. Furthermore, we utilize normalizing flows to model, in an unsupervised fashion, the distribution of unlabeled data. This modelling assumption can further improve the performance of generative classifiers via unlabeled data, and thus, implicitly contributing to training a better discriminative classifier. We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
We study the problem of deconvolution for light-sheet microscopy, where the data is corrupted by spatially varying blur and a combination of Poisson and Gaussian noise. The spatial variation of the point spread function (PSF) of a light-sheet microscope is determined by the interaction between the excitation sheet and the detection objective PSF. First, we introduce a model of the image formation process that incorporates this interaction, therefore capturing the main characteristics of this imaging modality. Then, we formulate a variational model that accounts for the combination of Poisson and Gaussian noise through a data fidelity term consisting of the infimal convolution of the single noise fidelities, first introduced in L. Calatroni et al. "Infimal convolution of data discrepancies for mixed noise removal", SIAM Journal on Imaging Sciences 10.3 (2017), 1196-1233. We establish convergence rates in a Bregman distance under a source condition for the infimal convolution fidelity and a discrepancy principle for choosing the value of the regularisation parameter. The inverse problem is solved by applying the primal-dual hybrid gradient (PDHG) algorithm in a novel way. Finally, numerical experiments performed on both simulated and real data show superior reconstruction results in comparison with other methods.
Superpixels serve as a powerful preprocessing tool in many computer vision tasks. By using superpixel representation, the number of image primitives can be largely reduced by orders of magnitudes. The majority of superpixel methods use handcrafted features, which usually do not translate well into strong adherence to object boundaries. A few recent superpixel methods have introduced deep learning into the superpixel segmentation process. However, none of these methods is able to produce superpixels in near real-time, which is crucial to the applicability of a superpixel method in practice. In this work, we propose a two-stage graph-based framework for superpixel segmentation. In the first stage, we introduce an efficient Deep Affinity Learning (DAL) network that learns pairwise pixel affinities by aggregating multi-scale information. In the second stage, we propose a highly efficient superpixel method called Hierarchical Entropy Rate Segmentation (HERS). Using the learned affinities from the first stage, HERS builds a hierarchical tree structure that can produce any number of highly adaptive superpixels instantaneously. We demonstrate, through visual and numerical experiments, the effectiveness and efficiency of our method compared to various state-of-the-art superpixel methods.
Wasserstein GANs are based on the idea of minimising the Wasserstein distance between a real and a generated distribution. We provide an in-depth mathematical analysis of differences between the theoretical setup and the reality of training Wasserstein GANs. In this work, we gather both theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of the Wasserstein distance. Moreover, we argue that the Wasserstein distance is not even a desirable loss function for deep generative models, and conclude that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance.
We consider the task of classifying when an extremely reduced amount of labelled data is available. This problem is of a great interest, in several real-world problems, as obtaining large amounts of labelled data is expensive and time consuming. We present a novel semi-supervised framework for multi-class classification that is based on the normalised and non-smooth graph 1-Laplacian. Our transductive framework is framed under a novel functional with carefully selected class priors - that enforces a sufficiently smooth solution that strengthens the intrinsic relation between the labelled and unlabelled data. We demonstrate through extensive experimental results on large datasets CIFAR-10 and ChestX-ray14, that our method outperforms classic methods and readily competes with recent deep-learning approaches.
Purpose: To obtain high-quality reconstructions from highly undersampled dynamic MRI data with the goal of reducing the acquisition time and towards improving physicians' outcome in clinical practice in a range of clinical applications. Theory and Methods: In dynamic MRI scans, the interaction between the target structure and the physical motion affects the acquired measurements. We exploit the strong repercussion of motion in MRI by proposing a variational framework - called Compressed Sensing Plus Motion (CS+M) - that links in a single model, simultaneously and explicitly, the computation of the algorithmic MRI reconstruction and the physical motion. Most precisely, we recast the image reconstruction and motion estimation problems as a single optimisation problem that is solved, iteratively, by breaking it up into two more computationally tractable problems. The potentials and generalisation capabilities of our approach are demonstrated in different clinical applications including cardiac cine, cardiac perfusion and brain perfusion imaging. Results: The proposed scheme reduces blurring artefacts and preserves the target shape and fine details whilst observing the lowest reconstruction error under highly undersampling up to 12x. This results in lower residual aliasing artefacts than the compared reconstructions algorithms. Overall, the results coming from our scheme exhibit more stable behaviour and generate a reconstruction closer to the gold-standard. Conclusion: We show that incorporating physical motion to the CS computation yields a significant improvement of the MR image reconstruction, that in fact, is closer to the gold-standard. This translates to higher reconstruction quality whilst requiring less measurements.
The piecewise constant Mumford-Shah (PCMS) model and the Rudin-Osher-Fatemi (ROF) model are two of the most famous variational models in image segmentation and image restoration, respectively. They have ubiquitous applications in image processing. In this paper, we explore the linkage between these two important models. We prove that for two-phase segmentation problem the optimal solution of the PCMS model can be obtained by thresholding the minimizer of the ROF model. This linkage is still valid for multiphase segmentation under mild assumptions. Thus it opens a new segmentation paradigm: image segmentation can be done via image restoration plus thresholding. This new paradigm, which circumvents the innate non-convex property of the PCMS model, therefore improves the segmentation performance in both efficiency (much faster than state-of-the-art methods based on PCMS model, particularly when the phase number is high) and effectiveness (producing segmentation results with better quality) due to the flexibility of the ROF model in tackling degraded images, such as noisy images, blurry images or images with information loss. As a by-product of the new paradigm, we derive a novel segmentation method, coined thresholded-ROF (T-ROF) method, to illustrate the virtue of manipulating image segmentation through image restoration techniques. The convergence of the T-ROF method under certain conditions is proved, and elaborate experimental results and comparisons are presented.