
Jing Qin

Feature-oriented Deep Learning Framework for Pulmonary Cone-beam CT (CBCT) Enhancement with Multi-task Customized Perceptual Loss

Nov 01, 2023
Jiarui Zhu, Werxing Chen, Hongfei Sun, Shaohua Zhi, Jing Qin, Jing Cai, Ge Ren

Cone-beam computed tomography (CBCT) is routinely collected during image-guided radiation therapy (IGRT) to provide updated patient anatomy information for cancer treatments. However, CBCT images often suffer from streaking artifacts and noise caused by under-sampled projections and low-dose exposure, resulting in low clarity and information loss. While recent deep learning-based CBCT enhancement methods have shown promising results in suppressing artifacts, they have limited performance in preserving anatomical details, since conventional pixel-to-pixel loss functions are incapable of describing detailed anatomy. To address this issue, we propose a novel feature-oriented deep learning framework that translates low-quality CBCT images into high-quality CT-like imaging via a multi-task customized feature-to-feature perceptual loss function. The framework comprises two main components: a multi-task learning feature-selection network (MTFS-Net) for customizing the perceptual loss function; and a CBCT-to-CT translation network guided by feature-to-feature perceptual loss, which uses advanced generative models such as U-Net, GAN and CycleGAN. Our experiments showed that the proposed framework can generate synthesized CT (sCT) images of the lung that achieve a high similarity to CT images, with an average SSIM index of 0.9869 and an average PSNR index of 39.9621. The sCT images also achieved visually pleasing performance with effective artifact suppression, noise reduction, and preservation of distinctive anatomical details. Our experimental results indicate that the proposed framework outperforms the state-of-the-art models for pulmonary CBCT enhancement. This framework holds great promise for generating high-quality anatomical imaging from CBCT that is suitable for various clinical applications.
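
As a rough illustration of the feature-to-feature idea (not the authors' released code), the sketch below computes a perceptual loss between a synthesized CT and the planning CT in the feature space of a frozen encoder; the MTFS-Net encoder is stood in for by a generic `feature_extractor`, and the layer weighting is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def perceptual_loss(feature_extractor, sct, ct, layer_weights=None):
    """Feature-to-feature perceptual loss: compare synthesized CT (sct) and planning
    CT (ct) in the feature space of a frozen, task-specific encoder.
    `feature_extractor` is assumed to return a list of intermediate feature maps."""
    feats_sct = feature_extractor(sct)
    feats_ct = feature_extractor(ct)
    if layer_weights is None:
        layer_weights = [1.0] * len(feats_ct)
    loss = 0.0
    for w, f_s, f_c in zip(layer_weights, feats_sct, feats_ct):
        loss = loss + w * F.l1_loss(f_s, f_c)
    return loss

# During translation-network training this term is typically combined with a pixel-wise
# loss, e.g. total = F.l1_loss(sct, ct) + lam * perceptual_loss(encoder, sct, ct)
```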

* 32 pages, 7 figures, journal 

DialogueLLM: Context and Emotion Knowledge-Tuned LLaMA Models for Emotion Recognition in Conversations

Oct 17, 2023
Yazhou Zhang, Mengyao Wang, Prayag Tiwari, Qiuchi Li, Benyou Wang, Jing Qin

Large language models (LLMs) and their variants have shown extraordinary efficacy across numerous downstream natural language processing (NLP) tasks, which has presented a new vision for the development of NLP. Despite their remarkable performance in natural language generation (NLG), LLMs lack a distinct focus on the emotion understanding domain. As a result, using LLMs for emotion recognition may lead to suboptimal and inadequate precision. Another limitation of LLMs is that they are typically trained without leveraging multi-modal information. To overcome these limitations, we propose DialogueLLM, a context and emotion knowledge-tuned LLM obtained by fine-tuning LLaMA models with 13,638 multi-modal (i.e., texts and videos) emotional dialogues. The visual information is treated as supplementary knowledge to construct high-quality instructions. We offer a comprehensive evaluation of our proposed model on three benchmark emotion recognition in conversations (ERC) datasets and compare the results against the SOTA baselines and other SOTA LLMs. Additionally, DialogueLLM-7B can be easily trained using LoRA on a 40GB A100 GPU in 5 hours, facilitating reproducibility for other researchers.
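
For readers interested in the training recipe, the following is a minimal, hedged sketch of LoRA fine-tuning a LLaMA-style model with the Hugging Face transformers and peft libraries; the checkpoint name, target modules and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-7b"  # hypothetical checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, device_map="auto"
)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common LoRA choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapters are updated

# The instruction-formatted ERC dialogues would then be tokenized and passed to a
# standard causal-LM trainer; only the adapter weights need to fit in GPU memory.
```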

CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction

Sep 19, 2023
Chengyan Wang, Jun Lyu, Shuo Wang, Chen Qin, Kunyuan Guo, Xinyu Zhang, Xiaotong Yu, Yan Li, Fanwen Wang, Jianhua Jin, Zhang Shi, Ziqiang Xu, Yapeng Tian, Sha Hua, Zhensen Chen, Meng Liu, Mengting Sun, Xutong Kuang, Kang Wang, Haoran Wang, Hao Li, Yinghua Chu, Guang Yang, Wenjia Bai, Xiahai Zhuang, He Wang, Jing Qin, Xiaobo Qu

Cardiac magnetic resonance imaging (CMR) has emerged as a valuable diagnostic tool for cardiac diseases. However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images. There has been growing interest in deep learning-based CMR imaging algorithms that can reconstruct high-quality images from highly under-sampled k-space data. However, the development of deep learning methods requires large training datasets, which have not been publicly available for CMR. To address this gap, we released a dataset that includes multi-contrast, multi-view, multi-slice and multi-coil CMR imaging data from 300 subjects. Imaging studies include cardiac cine and mapping sequences. Manual segmentations of the myocardium and chambers of all the subjects are also provided within the dataset. Scripts of state-of-the-art reconstruction algorithms are also provided as a point of reference. Our aim is to facilitate the advancement of state-of-the-art CMR image reconstruction by introducing standardized evaluation criteria and making the dataset freely accessible to the research community. Researchers can access the dataset at https://www.synapse.org/#!Synapse:syn51471091/wiki/.
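
As a point of orientation (not one of the reference scripts shipped with the dataset), the sketch below shows the simplest baseline for under-sampled multi-coil k-space: a zero-filled inverse FFT followed by root-sum-of-squares coil combination; the array shapes and mask format are assumptions.

```python
import numpy as np

def zero_filled_rss(kspace, mask):
    """kspace: (coils, ky, kx) complex array; mask: (ky, kx) binary sampling mask."""
    ksp = kspace * mask[None, ...]                       # keep only acquired k-space lines
    coil_imgs = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(ksp, axes=(-2, -1)), axes=(-2, -1)),
        axes=(-2, -1),
    )
    return np.sqrt((np.abs(coil_imgs) ** 2).sum(axis=0))  # root-sum-of-squares magnitude
```

Learned reconstruction methods are usually benchmarked against this zero-filled baseline at the same acceleration factor.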

* 14 pages, 8 figures 

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

Aug 23, 2023
Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He

Non-photorealistic videos are in increasing demand with the wave of the metaverse, but they lack sufficient research attention. This work takes a step forward in understanding how humans perceive non-photorealistic videos through eye fixations (i.e., saliency detection), which is critical for enhancing media production, artistic design, and game user experience. To fill the gap left by the absence of a suitable dataset for this research line, we present NPF-200, the first large-scale multi-modal dataset of purely non-photorealistic videos with eye fixations. Our dataset has three characteristics: 1) it contains soundtracks, which are essential according to vision and psychological studies; 2) it includes diverse semantic content and high-quality videos; 3) it has rich motion across and within videos. We conduct a series of analyses to gain deeper insights into this task and compare several state-of-the-art methods to explore the gap between natural images and non-photorealistic data. Additionally, as the human attention system tends to extract visual and audio features at different frequencies, we propose a universal frequency-aware multi-modal non-photorealistic saliency detection model called NPSNet, which demonstrates state-of-the-art performance on our task. The results uncover the strengths and weaknesses of multi-modal network design and multi-domain training, opening up promising directions for future work. Our dataset and code can be found at https://github.com/Yangziyu/NPF200.
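
For context on how fixation datasets like this are typically used, here is a small sketch of Normalized Scanpath Saliency (NSS), a standard fixation-based evaluation metric; the specific metrics reported in the paper may differ.

```python
import numpy as np

def nss(saliency_map, fixation_map):
    """saliency_map: float array of predicted saliency; fixation_map: binary array
    marking fixated pixels. Returns the mean normalized saliency at fixations."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return s[fixation_map.astype(bool)].mean()
```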

* Accepted by ACM MM 2023 

Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation

Aug 10, 2023
Jun Zhou, Kai Chen, Linlin Xu, Qi Dou, Jing Qin

One critical challenge in 6D object pose estimation from a single RGBD image is efficient integration of two different modalities, i.e., color and depth. In this work, we tackle this problem by a novel Deep Fusion Transformer (DFTr) block that can aggregate cross-modality features for improving pose estimation. Unlike existing fusion methods, the proposed DFTr can better model cross-modality semantic correlation by leveraging their semantic similarity, such that globally enhanced features from different modalities can be better integrated for improved information extraction. Moreover, to further improve robustness and efficiency, we introduce a novel weighted vector-wise voting algorithm that employs a non-iterative global optimization strategy for precise 3D keypoint localization while achieving near real-time inference. Extensive experiments show the effectiveness and strong generalization capability of our proposed 3D keypoint voting algorithm. Results on four widely used benchmarks also demonstrate that our method outperforms the state-of-the-art methods by large margins.
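
To make the voting idea concrete, the sketch below shows one closed-form, non-iterative way to localize a 3D keypoint from weighted per-point votes, under the assumption that each point casts a ray toward the keypoint; it illustrates the general weighted least-squares flavor, not the exact algorithm in the paper.

```python
import numpy as np

def weighted_ray_voting(points, directions, weights):
    """points: (N, 3) ray origins; directions: (N, 3) unit vectors; weights: (N,).
    Returns the 3D point minimizing the weighted sum of squared distances to all rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d, w in zip(points, directions, weights):
        M = np.eye(3) - np.outer(d, d)   # projector onto the plane orthogonal to d
        A += w * M
        b += w * M @ p
    return np.linalg.solve(A, b)         # closed-form solution, no iteration required
```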

* Accepted by ICCV2023 

SVDFormer: Complementing Point Cloud via Self-view Augmentation and Self-structure Dual-generator

Jul 17, 2023
Zhe Zhu, Honghua Chen, Xing He, Weiming Wang, Jing Qin, Mingqiang Wei

In this paper, we propose a novel network, SVDFormer, to tackle two specific challenges in point cloud completion: understanding faithful global shapes from incomplete point clouds and generating high-accuracy local structures. Current methods either perceive shape patterns using only 3D coordinates or import extra images with well-calibrated intrinsic parameters to guide the geometry estimation of the missing parts. However, these approaches do not always fully leverage the cross-modal self-structures available for accurate and high-quality point cloud completion. To this end, we first design a Self-view Fusion Network that leverages multiple-view depth image information to observe incomplete self-shape and generate a compact global shape. To reveal highly detailed structures, we then introduce a refinement module, called Self-structure Dual-generator, in which we incorporate learned shape priors and geometric self-similarities for producing new points. By perceiving the incompleteness of each point, the dual-path design disentangles refinement strategies conditioned on the structural type of each point. SVDFormer absorbs the wisdom of self-structures, avoiding any additional paired information such as color images with precisely calibrated camera intrinsic parameters. Comprehensive experiments indicate that our method achieves state-of-the-art performance on widely-used benchmarks. Code will be available at https://github.com/czvvd/SVDFormer.
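
As a rough illustration of the self-view idea, the sketch below orthographically projects an incomplete point cloud into a depth image with a simple z-buffer; the resolution, normalization and viewing axis are assumptions, and the actual multi-view rendering in SVDFormer may differ.

```python
import numpy as np

def render_depth(points, res=64):
    """points: (N, 3) array normalized to [-1, 1]; returns a (res, res) depth map."""
    depth = np.full((res, res), np.inf)
    u = ((points[:, 0] + 1) / 2 * (res - 1)).astype(int)   # x -> column index
    v = ((points[:, 1] + 1) / 2 * (res - 1)).astype(int)   # y -> row index
    for col, row, z in zip(u, v, points[:, 2]):
        depth[row, col] = min(depth[row, col], z)           # z-buffer: keep nearest point
    depth[np.isinf(depth)] = 0.0                            # empty pixels -> background
    return depth
```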

* Accepted by ICCV2023 

Stochastic Natural Thresholding Algorithms

Jun 07, 2023
Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes stochastic natural thresholding (StoNT) algorithms and discusses their convergence guarantees, extending NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.
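
To fix ideas, here is a minimal sketch of the stochastic thresholded-gradient template that StoNT instantiates, with a plain hard-thresholding operator standing in for the natural thresholding operator analyzed in the paper; the step size, batch size and iteration count are illustrative.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def stochastic_thresholded_gradient(A, y, s, step=0.5, batch=32, iters=500, seed=0):
    """Recover an s-sparse x from linear measurements y = A x using mini-batch
    gradient steps followed by thresholding."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        rows = rng.choice(m, size=batch, replace=False)   # random subset of measurements
        grad = A[rows].T @ (A[rows] @ x - y[rows]) / batch
        x = hard_threshold(x - step * grad, s)
    return x
```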

Single-View View Synthesis with Self-Rectified Pseudo-Stereo

Apr 20, 2023
Yang Zhou, Hanjie Wu, Wenxi Liu, Zheng Xiong, Jing Qin, Shengfeng He

Synthesizing novel views from a single view image is a highly ill-posed problem. We discover an effective solution to reduce the learning ambiguity by expanding the single-view view synthesis problem to a multi-view setting. Specifically, we leverage the reliable and explicit stereo prior to generate a pseudo-stereo viewpoint, which serves as an auxiliary input to construct the 3D space. In this way, the challenging novel view synthesis process is decoupled into two simpler problems of stereo synthesis and 3D reconstruction. In order to synthesize a structurally correct and detail-preserving stereo image, we propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner. Hard-to-train and incorrect warping samples are first discovered by two strategies: 1) pruning the network to reveal low-confidence predictions; and 2) bidirectionally matching between stereo images to allow the discovery of improper mapping. These regions are then inpainted to form the final pseudo-stereo. With the aid of this extra input, a preferable 3D reconstruction can be easily obtained, and our method can work with arbitrary 3D representations. Extensive experiments show that our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.
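
As an illustration of the bidirectional-matching strategy, the sketch below flags pixels whose left and right disparities disagree after warping, a common left-right consistency check; the tensor layout, sign convention and threshold are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """disp_left, disp_right: (H, W) horizontal disparities in pixels.
    Returns a boolean mask that is True where the two views are inconsistent."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    # Look up the right-view disparity at the position each left pixel maps to.
    x_right = np.clip((xs - disp_left).round().astype(int), 0, w - 1)
    disp_right_warped = np.take_along_axis(disp_right, x_right, axis=1)
    return np.abs(disp_left - disp_right_warped) > thresh
```

Inconsistent regions identified this way are candidates for inpainting before the pseudo-stereo pair is used downstream.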

Human Motion Detection Based on Dual-Graph and Weighted Nuclear Norm Regularizations

Apr 10, 2023
Jing Qin, Biyun Xie

Motion detection has been widely used in many applications, such as surveillance and robotics. Due to the presence of the static background, a motion video can be decomposed into a low-rank background and a sparse foreground. Many regularization techniques that preserve the low-rankness of matrices can therefore be imposed on the background. Meanwhile, geometry-based regularizations, such as graph regularizations, can be imposed on the foreground. Recently, weighted regularization techniques, including the weighted nuclear norm regularization, have been proposed in the image processing community to promote adaptive sparsity while achieving efficient performance. In this paper, we propose a robust dual-graph regularized moving object detection model based on a novel weighted nuclear norm regularization and spatiotemporal graph Laplacians. Numerical experiments on realistic human motion data sets have demonstrated the effectiveness and robustness of this approach in separating moving objects from the background, as well as its enormous potential in robotic applications.
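
For intuition, the sketch below shows a weighted singular value thresholding step, the proximal operation at the heart of weighted nuclear norm minimization; the inverse-magnitude weighting used here is a common choice and not necessarily the novel weighting proposed in the paper.

```python
import numpy as np

def weighted_svt(M, c=1.0, eps=1e-6):
    """Weighted singular value thresholding of matrix M: larger singular values receive
    smaller shrinkage, which preserves the dominant (background) structure."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    weights = c / (sigma + eps)                 # inverse-magnitude weights
    sigma_shrunk = np.maximum(sigma - weights, 0.0)
    return U @ np.diag(sigma_shrunk) @ Vt
```

In a background/foreground splitting scheme, this step updates the low-rank background while separate (e.g. graph-regularized) updates handle the sparse foreground.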

* arXiv admin note: substantial text overlap with arXiv:2204.11939 

Homeomorphic Image Registration via Conformal-Invariant Hyperelastic Regularisation

Mar 14, 2023
Jing Zou, Noémie Debroux, Lihao Liu, Jing Qin, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Deformable image registration is a fundamental task in medical image analysis and plays a crucial role in a wide range of clinical applications. Recently, deep learning-based approaches have been widely studied for deformable medical image registration and have achieved promising results. However, existing deep learning image registration techniques do not theoretically guarantee topology-preserving transformations. This is a key property for preserving anatomical structures and achieving plausible transformations that can be used in real clinical settings. We propose a novel framework for deformable image registration. Firstly, we introduce a novel regulariser based on conformal-invariant properties in a nonlinear elasticity setting. Our regulariser enforces the deformation field to be smooth, invertible and orientation-preserving. More importantly, we strictly guarantee topology preservation, yielding a clinically meaningful registration. Secondly, we boost the performance of our regulariser through coordinate MLPs, where one can view the to-be-registered images as continuously differentiable entities. We demonstrate, through numerical and visual experiments, that our framework is able to outperform current techniques for image registration.
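
To illustrate the coordinate-MLP component, the sketch below maps normalized spatial coordinates to displacement vectors so the deformation field is a smooth function of position; the layer sizes are assumptions, and the conformal-invariant regulariser itself is not implemented here.

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Coordinate network: (x, y) -> displacement, giving a continuously
    differentiable deformation field over normalized image coordinates."""
    def __init__(self, dim=2, hidden=128, layers=4):
        super().__init__()
        blocks, in_dim = [], dim
        for _ in range(layers):
            blocks += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        blocks.append(nn.Linear(hidden, dim))   # output: displacement at each coordinate
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):                  # coords: (N, dim) in normalized space
        return self.net(coords)

# Regularization terms (e.g. a conformal-invariant hyperelastic energy) can then be
# evaluated on gradients of this field obtained via automatic differentiation.
```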

* 11 pages, 3 figures 