Objective: Cardiovascular magnetic resonance-feature tracking (CMR-FT) represents a group of methods for myocardial strain estimation from cardiac cine MRI images. Established CMR-FT methods are mainly based on optical flow or pairwise registration. However, these methods suffer from either inaccurate estimation of large motion or drift effect caused by accumulative tracking errors. In this work, we propose a deformable groupwise registration method using a locally low-rank (LLR) dissimilarity metric for CMR-FT. Methods: The proposed method (Groupwise-LLR) tracks the feature points by a groupwise registration-based two-step strategy. Unlike the globally low-rank (GLR) dissimilarity metric, the proposed LLR metric imposes low-rankness on local image patches rather than the whole image. We quantitatively compared Groupwise-LLR with the Farneback optical flow, a pairwise registration method, and a GLR-based groupwise registration method on simulated and in vivo datasets. Results: Results from the simulated dataset showed that Groupwise-LLR achieved more accurate tracking and strain estimation compared with the other methods. Results from the in vivo dataset showed that Groupwise-LLR achieved more accurate tracking and elimination of the drift effect in late-diastole. Inter-observer reproducibility of strain estimates was similar between all studied methods. Conclusion: The proposed method estimates myocardial strains more accurately due to the application of a groupwise registration-based tracking strategy and an LLR-based dissimilarity metric. Significance: The proposed CMR-FT method may facilitate more accurate estimation of myocardial strains, especially in diastole, for clinical assessments of cardiac dysfunction.
Due to the lack of efficient mpox diagnostic technology, mpox cases continue to increase. Recently, the great potential of deep learning models in detecting mpox and non-mpox has been proven. However, existing models learn image representations via image classification, which results in they may be easily susceptible to interference from real-world noise, require diverse non-mpox images, and fail to detect abnormal input. These drawbacks make classification models inapplicable in real-world settings. To address these challenges, we propose "Mask, Inpainting, and Measure" (MIM). In MIM's pipeline, a generative adversarial network only learns mpox image representations by inpainting the masked mpox images. Then, MIM determines whether the input belongs to mpox by measuring the similarity between the inpainted image and the original image. The underlying intuition is that since MIM solely models mpox images, it struggles to accurately inpaint non-mpox images in real-world settings. Without utilizing any non-mpox images, MIM cleverly detects mpox and non-mpox and can handle abnormal inputs. We used the recognized mpox dataset (MSLD) and images of eighteen non-mpox skin diseases to verify the effectiveness and robustness of MIM. Experimental results show that the average AUROC of MIM achieves 0.8237. In addition, we demonstrated the drawbacks of classification models and buttressed the potential of MIM through clinical validation. Finally, we developed an online smartphone app to provide free testing to the public in affected areas. This work first employs generative models to improve mpox detection and provides new insights into binary decision-making tasks in medical images.
This paper develops an in-depth treatment concerning the problem of approximating the Gaussian smoothing and Gaussian derivative computations in scale-space theory for application on discrete data. With close connections to previous axiomatic treatments of continuous and discrete scale-space theory, we consider three main ways discretizing these scale-space operations in terms of explicit discrete convolutions, based on either (i) sampling the Gaussian kernels and the Gaussian derivative kernels, (ii) locally integrating the Gaussian kernels and the Gaussian derivative kernels over each pixel support region and (iii) basing the scale-space analysis on the discrete analogue of the Gaussian kernel, and then computing derivative approximations by applying small-support central difference operators to the spatially smoothed image data. We study the properties of these three main discretization methods both theoretically and experimentally, and characterize their performance by quantitative measures, including the results they give rise to with respect to the task of scale selection, investigated for four different use cases, and with emphasis on the behaviour at fine scales. The results show that the sampled Gaussian kernels and derivatives as well as the integrated Gaussian kernels and derivatives perform very poorly at very fine scales. At very fine scales, the discrete analogue of the Gaussian kernel with its corresponding discrete derivative approximations performs substantially better. The sampled Gaussian kernel and the sampled Gaussian derivatives do, on the other hand, lead to numerically very good approximations of the corresponding continuous results, when the scale parameter is sufficiently large, in the experiments presented in the paper, when the scale parameter is greater than a value of about 1, in units of the grid spacing.
The problem of image blurring is one of the most studied topics in the field of image processing. Image blurring is caused by various factors such as hand or camera shake. To restore the blurred image, it is necessary to know information about the point spread function (PSF). And because in the most cases it is not possible to accurately calculate the PSF, we are dealing with an approximate kernel. In this paper, the semi-blind image deblurring problem are studied. Due to the fact that the model of the deblurring problems is an ill-conditioned problem, it is not possible to solve this problem directly. One of the most efficient ways to solve this problem is to use the total variation (TV) method. In the proposed algorithm, by using the framelet transform and fractional calculations, the TV method is improved. The proposed method is used on different types of images and is compared with existing methods with different types of tests.
Most multilingual vision-and-language (V&L) research aims to accomplish multilingual and multimodal capabilities within one model. However, the scarcity of multilingual captions for images has hindered the development. To overcome this obstacle, we propose ICU, Image Caption Understanding, which divides a V&L task into two stages: a V&L model performs image captioning in English, and a multilingual language model (mLM), in turn, takes the caption as the alt text and performs crosslingual language understanding. The burden of multilingual processing is lifted off V&L model and placed on mLM. Since the multilingual text data is relatively of higher abundance and quality, ICU can facilitate the conquering of language barriers for V&L models. In experiments on two tasks across 9 languages in the IGLUE benchmark, we show that ICU can achieve new state-of-the-art results for five languages, and comparable results for the rest.
Recently, generative AI technologies have emerged as a significant advancement in artificial intelligence field, renowned for their language and image generation capabilities. Meantime, space-air-ground integrated network (SAGIN) is an integral part of future B5G/6G for achieving ubiquitous connectivity. Inspired by this, this article explores an integration of generative AI in SAGIN, focusing on potential applications and case study. We first provide a comprehensive review of SAGIN and generative AI models, highlighting their capabilities and opportunities of their integration. Benefiting from generative AI's ability to generate useful data and facilitate advanced decision-making processes, it can be applied to various scenarios of SAGIN. Accordingly, we present a concise survey on their integration, including channel modeling and channel state information (CSI) estimation, joint air-space-ground resource allocation, intelligent network deployment, semantic communications, image extraction and processing, security and privacy enhancement. Next, we propose a framework that utilizes a Generative Diffusion Model (GDM) to construct channel information map to enhance quality of service for SAGIN. Simulation results demonstrate the effectiveness of the proposed framework. Finally, we discuss potential research directions for generative AI-enabled SAGIN.
In the anomaly detection field, the scarcity of anomalous samples has directed the current research emphasis towards unsupervised anomaly detection. While these unsupervised anomaly detection methods offer convenience, they also overlook the crucial prior information embedded within anomalous samples. Moreover, among numerous deep learning methods, supervised methods generally exhibit superior performance compared to unsupervised methods. Considering the reasons mentioned above, we propose a self-supervised anomaly detection approach that combines contrastive learning with 2D-Flow to achieve more precise detection outcomes and expedited inference processes. On one hand, we introduce a novel approach to anomaly synthesis, yielding anomalous samples in accordance with authentic industrial scenarios, alongside their surrogate annotations. On the other hand, having obtained a substantial number of anomalous samples, we enhance the 2D-Flow framework by incorporating contrastive learning, leveraging diverse proxy tasks to fine-tune the network. Our approach enables the network to learn more precise mapping relationships from self-generated labels while retaining the lightweight characteristics of the 2D-Flow. Compared to mainstream unsupervised approaches, our self-supervised method demonstrates superior detection accuracy, fewer additional model parameters, and faster inference speed. Furthermore, the entire training and inference process is end-to-end. Our approach showcases new state-of-the-art results, achieving a performance of 99.6\% in image-level AUROC on the MVTecAD dataset and 96.8\% in image-level AUROC on the BTAD dataset.
This paper presents a Transformer-based image compression system that allows for a variable image quality objective according to the user's preference. Optimizing a learned codec for different quality objectives leads to reconstructed images with varying visual characteristics. Our method provides the user with the flexibility to choose a trade-off between two image quality objectives using a single, shared model. Motivated by the success of prompt-tuning techniques, we introduce prompt tokens to condition our Transformer-based autoencoder. These prompt tokens are generated adaptively based on the user's preference and input image through learning a prompt generation network. Extensive experiments on commonly used quality metrics demonstrate the effectiveness of our method in adapting the encoding and/or decoding processes to a variable quality objective. While offering the additional flexibility, our proposed method performs comparably to the single-objective methods in terms of rate-distortion performance.
Accurately measuring translations between images is essential in many fields, including biology, medicine, geography, and physics. Existing methods, including the popular FFT-based cross-correlation, are not suitable for real-time analysis, which is especially vital in feedback control systems. To fill this gap, we introduce a new algorithm which approaches shot-noise limited displacement detection and a GPU-based implementation for real-time image analysis.
Mis- and disinformation online have become a major societal problem as major sources of online harms of different kinds. One common form of mis- and disinformation is out-of-context (OOC) information, where different pieces of information are falsely associated, e.g., a real image combined with a false textual caption or a misleading textual description. Although some past studies have attempted to defend against OOC mis- and disinformation through external evidence, they tend to disregard the role of different pieces of evidence with different stances. Motivated by the intuition that the stance of evidence represents a bias towards different detection results, we propose a stance extraction network (SEN) that can extract the stances of different pieces of multi-modal evidence in a unified framework. Moreover, we introduce a support-refutation score calculated based on the co-occurrence relations of named entities into the textual SEN. Extensive experiments on a public large-scale dataset demonstrated that our proposed method outperformed the state-of-the-art baselines, with the best model achieving a performance gain of 3.2% in accuracy.