Chen Chen

Department of Radiology, Zhejiang Cancer Hospital, Hangzhou, 310022, China, Hangzhou Institute of Medicine

Spatial-Temporal Networks for Antibiogram Pattern Prediction

May 02, 2023

Xingbo Fu, Chen Chen, Yushun Dong, Anil Vullikanti, Eili Klein, Gregory Madden, Jundong Li

Abstract: An antibiogram is a periodic summary of antibiotic resistance results of organisms from infected patients to selected antimicrobial drugs. Antibiograms help clinicians understand regional resistance rates and select appropriate antibiotics in prescriptions. In practice, significant combinations of antibiotic resistance may appear in different antibiograms, forming antibiogram patterns. Such patterns may imply the prevalence of some infectious diseases in certain regions. Thus, it is crucial to monitor antibiotic resistance trends and track the spread of multi-drug resistant organisms. In this paper, we propose a novel problem of antibiogram pattern prediction that aims to predict which patterns will appear in the future. Despite its importance, this problem poses a series of challenges and has not yet been explored in the literature. First, antibiogram patterns are not i.i.d., as they may be strongly related to each other due to genomic similarities of the underlying organisms. Second, antibiogram patterns are often temporally dependent on previously detected ones. Third, the spread of antibiotic resistance can be significantly influenced by nearby or similar regions. To address these challenges, we propose a novel Spatial-Temporal Antibiogram Pattern Prediction framework, STAPP, that effectively leverages pattern correlations and exploits temporal and spatial information. We conduct extensive experiments on a real-world dataset with antibiogram reports of patients from 1999 to 2012 for 203 cities in the United States. The experimental results show the superiority of STAPP over several competitive baselines.

* Accepted by the 11th IEEE International Conference on Healthcare Informatics (IEEE ICHI 2023)

Secret Key Generation for IRS-Assisted Multi-Antenna Systems: A Machine Learning-Based Approach

Apr 28, 2023

Chen Chen, Junqing Zhang, Tianyu Lu, Magnus Sandell, Liquan Chen

Abstract: Physical-layer key generation (PKG) based on wireless channels is a lightweight technique to establish secure keys between legitimate communication nodes. Recently, intelligent reflecting surfaces (IRSs) have been leveraged to enhance the performance of PKG in terms of secret key rate (SKR), as they can reconfigure the wireless propagation environment and introduce more channel randomness. In this paper, we investigate an IRS-assisted PKG system, taking into account the channel spatial correlation at both the base station (BS) and the IRS. Based on the considered system model, a closed-form expression of the SKR is derived analytically, considering correlated eavesdropping channels. Aiming to maximise the SKR, a joint design problem of the BS precoding matrix and the IRS phase shift vector is formulated. To address this high-dimensional non-convex optimisation problem, we propose a novel unsupervised deep neural network (DNN)-based algorithm with a simple structure. Unlike most previous works, which adopt iterative optimisation to solve the problem, the proposed DNN-based algorithm directly obtains the BS precoding and IRS phase shifts as the output of the DNN. Simulation results reveal that the proposed DNN-based algorithm outperforms the benchmark methods with regard to SKR.

* This paper has been submitted to IEEE Transactions for possible publication. arXiv admin note: substantial text overlap with arXiv:2301.08179

Edit Everything: A Text-Guided Generative System for Images Editing

Apr 27, 2023

Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin

Abstract: We introduce a new generative system called Edit Everything, which takes image and text inputs and produces image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating the requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of the Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

Apr 23, 2023

Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Abstract: Automatic speech recognition (ASR) has achieved remarkable success thanks to recent advances in deep learning, but it usually degrades significantly under real-world noisy conditions. Recent works introduce speech enhancement (SE) as a front-end to improve speech quality, which has proved effective but may not be optimal for downstream ASR due to the speech distortion problem. Building on this, the latest works combine SE with the currently popular self-supervised learning (SSL) to alleviate distortion and improve noise robustness. Despite their effectiveness, the speech distortion caused by conventional SE still cannot be completely eliminated. In this paper, we propose a self-supervised framework named Wav2code to implement a generalized SE without distortions for noise-robust ASR. First, in the pre-training stage, the clean speech representations from the SSL model are used to look up a discrete codebook via nearest-neighbor feature matching; the resulting code sequence is then used to reconstruct the original clean representations, so that they are stored in the codebook as a prior. Second, during finetuning, we propose a Transformer-based code predictor to accurately predict clean codes by modeling the global dependency of the input noisy representations, which enables the discovery and restoration of high-quality clean representations without distortions. Furthermore, we propose an interactive feature fusion network to combine the original noisy and the restored clean representations to consider both fidelity and quality, resulting in even more informative features for downstream ASR. Finally, experiments on both synthetic and real noisy datasets demonstrate that Wav2code can solve the speech distortion problem and improve ASR performance under various noisy conditions, resulting in stronger robustness.

* 12 pages, 7 figures, Submitted to IEEE/ACM TASLP
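
The codebook lookup mentioned in the Wav2code abstract can be illustrated with a minimal sketch of nearest-neighbor feature matching. This is not the authors' implementation: the function name `codebook_lookup`, the toy 2-D codebook, and the use of squared Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def codebook_lookup(features, codebook):
    """Return, for each feature vector, the index of its nearest codebook
    entry (squared Euclidean distance) and the matched entry itself.

    features: array of shape (T, D), one vector per frame (hypothetical).
    codebook: array of shape (K, D), one vector per discrete code.
    """
    # Pairwise differences via broadcasting, shape (T, K, D).
    diffs = features[:, None, :] - codebook[None, :, :]
    # Squared distances between every frame and every code, shape (T, K).
    dists = np.sum(diffs ** 2, axis=-1)
    codes = np.argmin(dists, axis=1)
    quantized = codebook[codes]
    return codes, quantized


# Toy codebook with four entries in two dimensions (illustrative values).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Three "clean" frames lying close to entries 1, 2, and 0 respectively.
features = np.array([[0.9, 0.1], [0.1, 0.8], [0.05, -0.02]])

codes, quantized = codebook_lookup(features, codebook)
print(codes.tolist())  # [1, 2, 0]
```

The code sequence `codes` is the discrete representation; `quantized` is the restored vector that a downstream model would consume in place of the original frame.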

Med-Tuning: Exploring Parameter-Efficient Transfer Learning for Medical Volumetric Segmentation

Apr 21, 2023

Wenxuan Wang, Jiachen Shen, Chen Chen, Jianbo Jiao, Yan Zhang, Shanshan Song, Jiangyun Li

Abstract: Deep learning-based medical volumetric segmentation methods either train the model from scratch or follow the standard "pre-training then finetuning" paradigm. Although finetuning a well pre-trained model on downstream tasks can harness its representation power, standard full finetuning is costly in terms of computation and memory footprint. In this paper, we present the first study on parameter-efficient transfer learning for medical volumetric segmentation and propose a novel framework named Med-Tuning based on intra-stage feature enhancement and inter-stage feature interaction. Given a large-scale model pre-trained on 2D natural images, our method can exploit both the multi-scale spatial feature representations and the temporal correlations along image slices, which are crucial for accurate medical volumetric segmentation. Extensive experiments on three benchmark datasets (including CT and MRI) show that our method achieves better results than previous state-of-the-art parameter-efficient transfer learning methods and full finetuning on the segmentation task, with far fewer tuned parameters. Compared to full finetuning, our method reduces the finetuned model parameters by up to 4x, with even better segmentation performance.

FreMAE: Fourier Transform Meets Masked Autoencoders for Medical Image Segmentation

Apr 21, 2023

Wenxuan Wang, Jing Wang, Chen Chen, Jianbo Jiao, Lichao Sun, Yuanxiu Cai, Shanshan Song, Jiangyun Li

Abstract: The research community has witnessed the powerful potential of self-supervised Masked Image Modeling (MIM), which enables models to learn visual representations from unlabeled data. In this paper, to incorporate both the crucial global structural information and local details for dense prediction tasks, we shift the perspective to the frequency domain and present a new MIM-based framework named FreMAE for self-supervised pre-training for medical image segmentation. Based on the observations that detailed structural information mainly lies in the high-frequency components and that high-level semantics are abundant in the low-frequency counterparts, we further incorporate multi-stage supervision to guide the representation learning during the pre-training phase. Extensive experiments on three benchmark datasets show the superiority of our proposed FreMAE over previous state-of-the-art MIM methods. Compared with various baselines trained from scratch, FreMAE consistently brings considerable improvements to model performance. To the best of our knowledge, this is the first attempt at MIM with the Fourier Transform in medical image segmentation.
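
The split between high-frequency and low-frequency components that the FreMAE abstract relies on can be sketched with a 2D Fourier transform. The following is only an illustration of that decomposition, not the FreMAE pre-training pipeline: the function name `split_frequencies`, the circular mask, and the `radius` parameter are assumptions.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2D image into low- and high-frequency parts.

    A circular mask of the given radius (in frequency bins, measured from
    the zero-frequency bin) selects the low frequencies; everything outside
    the mask is treated as high frequency.
    """
    # Move the zero-frequency component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    # Invert each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))  # stand-in for an image slice
low, high = split_frequencies(image, radius=3)

# The two masks are complementary, so the parts sum back to the input.
print(np.allclose(low + high, image))
```

Because the two masks partition the spectrum, the decomposition is lossless: any supervision applied separately to `low` and `high` still covers the full signal.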

UDTIRI: An Open-Source Road Pothole Detection Benchmark Suite

Apr 18, 2023

Sicen Guo, Jiahang Li, Shuai Su, Yi Feng, Dacheng Zhou, Chen Chen, Denghuang Zhang, Xingyi Zhu, Qijun Chen, Rui Fan

Abstract: There is enormous potential to leverage powerful deep learning methods in the emerging field of urban digital twins, particularly in intelligent road inspection, where research and data are currently limited. To facilitate progress in this field, we have developed a well-labeled road pothole dataset named the Urban Digital Twins Intelligent Road Inspection (UDTIRI) dataset. We hope this dataset will enable the use of powerful deep learning methods in urban road inspection, providing algorithms with a more comprehensive understanding of the scene and maximizing their potential. Our dataset comprises 1000 images of potholes, captured in various scenarios with different lighting and humidity conditions. Our intention is to employ this dataset for object detection, semantic segmentation, and instance segmentation tasks. Our team has devoted significant effort to conducting a detailed statistical analysis and benchmarking a selection of representative algorithms from recent years. We also provide a multi-task platform for researchers to fully exploit the performance of various algorithms with the support of the UDTIRI dataset.

* Database webpage: https://www.udtiri.com/, Kaggle webpage: https://www.kaggle.com/datasets/jiahangli617/udtiri

A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction

Apr 07, 2023

Xinshun Wang, Shen Zhao, Chen Chen, Mengyuan Liu

Abstract: The past few years have witnessed the dominance of Graph Convolutional Networks (GCNs) in human motion prediction, yet their performance is still far from satisfactory. Recently, MLP-Mixers have shown competitive results while being simpler and more efficient. To extract features, GCNs typically follow an aggregate-and-update paradigm, while Mixers rely on token mixing and channel mixing operations. The two research paths have been established independently in the community. In this paper, we develop a novel perspective by unifying Mixers and GCNs. We show that a mixer layer can be seen as a graph convolutional layer applied to a fully-connected graph with parameterized adjacency. Extending this theoretical finding to the practical side, we propose the Meta-Mixing Network (M$^2$-Net). Assisted by a novel zero aggregation operation, our network is capable of capturing both structure-agnostic and structure-sensitive dependencies in a collaborative manner. Not only is it computationally efficient, but, most importantly, it also achieves state-of-the-art performance. An extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that M$^2$-Net consistently outperforms all other approaches. We hope our work brings the community one step further towards truly predictable human motion. Our code will be publicly available.
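
The claim that a mixer layer can be read as a graph convolution on a fully-connected graph with parameterized adjacency can be checked numerically in a simplified setting. The sketch below assumes a purely linear mixer layer (one linear map for token mixing, one for channel mixing) and omits the nonlinearities, normalisation, and skip connections of a real MLP-Mixer; it is not M$^2$-Net, and all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, in_channels, out_channels = 5, 3, 4

# Input features: one row per token (e.g. per body joint), hypothetical.
X = rng.normal(size=(num_tokens, in_channels))
# Learnable token-mixing weights: a dense (num_tokens x num_tokens) matrix.
W_token = rng.normal(size=(num_tokens, num_tokens))
# Learnable channel-mixing weights.
W_channel = rng.normal(size=(in_channels, out_channels))


def linear_mixer_layer(X, W_token, W_channel):
    """Simplified mixer layer: mix across tokens, then across channels."""
    mixed_tokens = W_token @ X          # token mixing
    return mixed_tokens @ W_channel     # channel mixing


def graph_conv(X, A, W):
    """Basic graph convolution: aggregate with adjacency A, update with W."""
    return A @ X @ W


mixer_out = linear_mixer_layer(X, W_token, W_channel)
# Treat the token-mixing weights as a dense, learnable adjacency matrix.
gcn_out = graph_conv(X, A=W_token, W=W_channel)

print(np.allclose(mixer_out, gcn_out))
```

In this linear setting the token-mixing matrix plays the role of the adjacency: every token is connected to every other token, and the edge weights are learned rather than fixed by a skeleton.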

TopNet: Transformer-based Object Placement Network for Image Compositing

Apr 06, 2023

Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen

Abstract: We investigate the problem of automatically placing an object into a background image for image compositing. Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing. The quality of the composite image depends heavily on the predicted location/scale. Existing works either generate candidate bounding boxes or apply sliding-window search using global representations from background and object images, which fail to model local information in background images. However, local clues in background images are important for determining the compatibility of placing objects at certain locations/scales. In this paper, we propose to learn the correlation between object features and all local background features with a transformer module so that detailed information can be provided on all possible location/scale configurations. A sparse contrastive loss is further proposed to train our model with sparse supervision. Our new formulation generates a 3D heatmap indicating the plausibility of all location/scale combinations in one network forward pass, which is over 10 times faster than the previous sliding-window method. It also supports interactive search when users provide a pre-defined location or scale. The proposed method can be trained with explicit annotation or in a self-supervised manner using an off-the-shelf inpainting model, and it outperforms state-of-the-art methods significantly. The user study shows that the trained model generalizes well to real-world images with diverse challenging scenes and object categories.

* CVPR
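
The idea of scoring every location/scale combination in a single pass and returning a 3D heatmap can be sketched with plain dot products. This is not TopNet: the abstract describes a transformer module, whereas the sketch below substitutes a simple dot-product correlation, and the feature shapes, the per-scale object embeddings, and the function name `placement_heatmap` are assumptions.

```python
import numpy as np


def placement_heatmap(background_feats, object_feats_per_scale):
    """Score every (scale, y, x) placement with a dot product.

    background_feats: (H, W, D) local background features.
    object_feats_per_scale: (S, D), one object embedding per candidate scale.
    Returns a heatmap of shape (S, H, W).
    """
    return np.einsum("hwd,sd->shw", background_feats, object_feats_per_scale)


H, W, D, S = 4, 6, 8, 3
rng = np.random.default_rng(0)
background = rng.normal(size=(H, W, D))   # stand-in for backbone features
obj = rng.normal(size=(S, D))             # stand-in for object embeddings

heatmap = placement_heatmap(background, obj)

# Unconstrained search: best scale and location over the whole heatmap.
best_scale, best_y, best_x = np.unravel_index(np.argmax(heatmap), heatmap.shape)

# Interactive search: the user fixes scale index 1, so search locations only.
fixed_scale = 1
fixed_y, fixed_x = np.unravel_index(np.argmax(heatmap[fixed_scale]), (H, W))
```

Because the whole heatmap is produced by one tensor contraction, fixing a scale or a location afterwards amounts to slicing it, with no further forward passes.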

$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

Apr 06, 2023

Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Abstract: Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database. Conventional methods generally adopt aggregated CNN features for global retrieval and RANSAC-based geometric verification for reranking. However, RANSAC employs only geometric information and ignores other information that could be useful for reranking, e.g., local feature correlations and attention values. In this paper, we propose a unified place recognition framework that handles both retrieval and reranking with a novel transformer model, named $R^{2}$Former. The proposed reranking module takes feature correlation, attention value, and xy coordinates into account, and learns to determine whether the image pair is from the same location. The whole pipeline is end-to-end trainable, and the reranking module alone can also be adopted on other CNN or transformer backbones as a generic component. Remarkably, $R^{2}$Former significantly outperforms state-of-the-art methods on major VPR datasets with much less inference time and memory consumption. It also achieves state-of-the-art results on the hold-out MSLS challenge set and could serve as a simple yet strong solution for real-world large-scale applications. Experiments also show that vision transformer tokens are comparable to, and sometimes better than, CNN local features for local matching. The code is released at https://github.com/Jeff-Zilence/R2Former.

* CVPR
