Yan Wang

Bandwidth-efficient Inference for Neural Image Compression

Sep 07, 2023
Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth to external memory (DRAM) and power constraints become a bottleneck for network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed with a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression that uses symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Combined with existing model quantization methods, the low-level task of image compression achieves up to 19x bandwidth reduction with 6.21x energy saving.

* 9 pages, 6 figures, submitted to ICASSP 2024 
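
For readers unfamiliar with the entropy-coding stage, the sketch below shows two generic building blocks named in the abstract: order-0 signed exponential Golomb coding, in which +k and -k receive codewords of equal length, and the ideal bit cost of an integer symbol under a discretized Gaussian entropy model, which an arithmetic coder approaches in practice. It is a minimal illustration under assumptions, not the authors' implementation: the signed-to-unsigned mapping follows the common H.264-style convention, the helper names are hypothetical, and the exact "symmetric" variant, transform, and entropy-model parameterization used in the paper may differ.

```python
import math


def signed_to_unsigned(v: int) -> int:
    """Interleave signs (H.264-style, assumed): 0->0, 1->1, -1->2, 2->3, -2->4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v


def unsigned_to_signed(u: int) -> int:
    """Inverse of signed_to_unsigned."""
    return (u + 1) // 2 if u % 2 == 1 else -(u // 2)


def exp_golomb_encode(v: int) -> str:
    """Order-0 exp-Golomb codeword for a signed integer, as a bit string."""
    body = bin(signed_to_unsigned(v) + 1)[2:]  # binary of (code number + 1), always >= 1
    return "0" * (len(body) - 1) + body         # prefix of len(body) - 1 zeros


def exp_golomb_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one codeword starting at bits[pos]; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    u = int(bits[pos + zeros:end], 2) - 1
    return unsigned_to_signed(u), end


def encode_all(values: list[int]) -> str:
    return "".join(exp_golomb_encode(v) for v in values)


def decode_all(bits: str, count: int) -> list[int]:
    out, pos = [], 0
    for _ in range(count):
        v, pos = exp_golomb_decode(bits, pos)
        out.append(v)
    return out


def gaussian_bits(q: int, mu: float, sigma: float) -> float:
    """Ideal code length (bits) of integer q under a Gaussian discretized to unit bins."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    p = cdf(q + 0.5) - cdf(q - 0.5)
    return -math.log2(max(p, 1e-12))  # clamp so an extreme tail never hits log(0)
```

For example, `encode_all([0, 1, -1, 2])` yields the 12-bit string `"101001100100"`. The two pieces trade off differently: exp-Golomb needs no probability model and favors values near zero, while the Gaussian model spends fewer bits whenever the per-element mean and scale predicted from the data are accurate.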

Enhancing Event Sequence Modeling with Contrastive Relational Inference

Sep 06, 2023
Yan Wang, Zhixuan Chu, Tao Zhou, Caigao Jiang, Hongyan Hao, Minjie Zhu, Xindong Cai, Qing Cui, Longfei Li, James Y Zhang, Siqiao Xue, Jun Zhou

Neural temporal point processes (TPPs) have shown promise for modeling continuous-time event sequences. However, capturing the interactions between events is challenging yet critical for inference tasks such as forecasting on event sequence data. Existing TPP models focus on parameterizing the conditional distribution of future events but struggle to model event interactions. In this paper, we propose a novel approach that leverages Neural Relational Inference (NRI) to learn a relation graph that infers interactions while simultaneously learning dynamic patterns from observational data. Our approach, the Contrastive Relational Inference-based Hawkes Process (CRIHP), reasons about event interactions under a variational inference framework. It uses intensity-based learning to search for prototype paths for contrasting relationship constraints. Extensive experiments on three real-world datasets demonstrate the effectiveness of our model in capturing event interactions for event sequence modeling tasks.

* 6 pages, 2 figures 
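
To make the role of a relation graph in a Hawkes process concrete, here is a generic multivariate Hawkes intensity with an exponential kernel, lambda_k(t) = mu_k + sum over past events i of A[type_i][k] * alpha * exp(-beta * (t - t_i)), where the adjacency matrix A decides which event types excite which. This is a hedged sketch under assumptions: the graph is a fixed matrix supplied by hand, and alpha, beta, and mu are illustrative scalars. CRIHP instead infers the graph with NRI under a variational framework, and its intensity parameterization may differ.

```python
import math


def hawkes_intensity(
    t: float,
    history: list[tuple[float, int]],
    mu: list[float],
    adjacency: list[list[float]],
    alpha: float,
    beta: float,
) -> list[float]:
    """Conditional intensity lambda_k(t) for every event type k.

    history:   past events as (timestamp, event_type) pairs.
    adjacency: adjacency[j][k] > 0 means events of type j excite type k
               (a hand-written stand-in for a learned relation graph).
    """
    intensity = list(mu)  # start from the base rates
    for t_i, j in history:
        if t_i >= t:  # only strictly earlier events influence time t
            continue
        decay = math.exp(-beta * (t - t_i))  # excitation fades exponentially
        for k in range(len(mu)):
            intensity[k] += adjacency[j][k] * alpha * decay
    return intensity


# Two event types; type 0 excites type 1 but not vice versa.
lam = hawkes_intensity(
    t=1.0, history=[(0.0, 0)], mu=[0.1, 0.2],
    adjacency=[[0.0, 1.0], [0.0, 0.0]], alpha=0.5, beta=1.0,
)
print(lam)  # type 0 stays at its base rate; type 1 is raised by 0.5 * exp(-1)
```

Setting an adjacency entry to zero removes that interaction entirely, which is why learning the graph amounts to learning which events can influence which.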

Gene-induced Multimodal Pre-training for Image-omic Classification

Sep 06, 2023
Ting Jin, Xingran Xie, Renjie Wan, Qingli Li, Yan Wang

Histology analysis of the tumor microenvironment integrated with genomic assays is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work addresses the main challenges of multi-modality image-omic classification: (1) the difficulty of extracting patient-level features from gigapixel WSIs and tens of thousands of genes, and (2) effective fusion that models high-order relevance. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We then design a masked patch modeling (MPM) paradigm to capture the latent pathological characteristics of different tissues; its masking strategy randomly masks a fixed-length contiguous subsequence of the patch embeddings of a WSI. Finally, we combine the classification tokens of paired modalities and propose a triplet learning module to learn high-order relevance and discriminative patient-level information. After pre-training, a simple fine-tuning step yields the classification results. Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% accuracy for image-omic classification. The code is publicly available at https://github.com/huangwudiduan/GIMP.
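
The MPM masking strategy, which hides a fixed-length contiguous run of patch embeddings at a random position, can be sketched as follows. The names `contiguous_mask`, `apply_mask`, and the `"[MASK]"` placeholder are hypothetical; the sketch assumes the patches of a WSI are already arranged in a 1-D sequence and that a masked position is replaced by a mask token, details the abstract does not specify.

```python
import random


def contiguous_mask(num_patches: int, mask_len: int, rng: random.Random) -> list[bool]:
    """Boolean mask hiding one randomly placed run of mask_len consecutive patches."""
    if not 0 < mask_len <= num_patches:
        raise ValueError("mask_len must be in [1, num_patches]")
    start = rng.randrange(num_patches - mask_len + 1)  # every valid start is equally likely
    return [start <= i < start + mask_len for i in range(num_patches)]


def apply_mask(embeddings: list, mask: list[bool], mask_token) -> list:
    """Replace masked patch embeddings with a placeholder token."""
    return [mask_token if m else e for e, m in zip(embeddings, mask)]


rng = random.Random(0)
patches = [f"p{i}" for i in range(10)]  # stand-ins for WSI patch embeddings
mask = contiguous_mask(len(patches), 4, rng)
print(apply_mask(patches, mask, "[MASK]"))
```

One plausible motivation, not stated in the abstract, is that hiding a contiguous run removes neighboring context, so the model cannot simply copy a masked patch from its immediate neighbors.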

Enhancing Psychological Counseling with Large Language Model: A Multifaceted Decision-Support System for Non-Professionals

Aug 29, 2023
Guanghui Fu, Qing Zhao, Jianqiang Li, Dan Luo, Changwei Song, Wei Zhai, Shuo Liu, Fan Wang, Yan Wang, Lijuan Cheng, Juan Zhang, Bing Xiang Yang

In the contemporary landscape of social media, an alarming number of users express negative emotions, some of which manifest as strong suicidal intentions. This situation underscores a profound need for trained psychological counselors who can carry out effective mental health interventions. However, training such professionals is essential but time-consuming. Consequently, mobilizing non-professionals or volunteers in this capacity has become a pressing concern. Artificial intelligence, and in particular the recent advances in large language models, offers a viable solution to this challenge. This paper introduces a novel model built on large language models to fully assist non-professionals in providing psychological interventions in response to online user discourse. This framework makes it feasible to harness non-professional counselors in a meaningful way. A comprehensive study involving ten professional psychological counselors of varying expertise evaluated the system across five critical dimensions. The findings show that our system analyzes patients' issues with relative accuracy and offers professional-level strategy recommendations, thereby enhancing support for non-professionals. This research provides a compelling validation of the application of large language models in the field of psychology and lays the groundwork for a new paradigm of community-based mental health support.

Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models

Aug 24, 2023
Yachao Zhao, Bo Wang, Dongming Zhao, Kun Huang, Yan Wang, Ruifang He, Yuexian Hou

Recent research indicates that pre-trained Large Language Models (LLMs) possess cognitive constructs similar to those observed in humans, prompting researchers to investigate the cognitive aspects of LLMs. This paper focuses on explicit and implicit social bias, a distinctive two-level cognitive construct in psychology. Psychology posits that an individual's explicit social bias, their conscious expression of bias in statements, may differ from their implicit social bias, which represents their unconscious bias. We propose a two-stage approach and discover a parallel phenomenon in LLMs, termed "re-judge inconsistency" in social bias. In the first stage, the LLM is asked to automatically complete statements, potentially incorporating implicit social bias. In the second stage, the same LLM re-judges the biased statement it generated itself and contradicts it. We propose that this re-judge inconsistency may be analogous to the inconsistency between humans' unaware implicit social bias and their aware explicit social bias. Experiments on ChatGPT and GPT-4 concerning common gender biases examined in psychology corroborate the highly stable nature of the re-judge inconsistency. This finding may suggest that diverse cognitive constructs emerge as LLMs' capabilities strengthen. Consequently, psychological theories can provide deeper insights into the mechanisms governing the expression of explicit and implicit constructs in LLMs.

A Unified Framework for 3D Point Cloud Visual Grounding

Aug 23, 2023
Haojia Lin, Yongdong Luo, Xiawu Zheng, Lijiang Li, Fei Chao, Taisong Jin, Donghao Luo, Chengjie Wang, Yan Wang, Liujuan Cao

3D point cloud visual grounding plays a critical role in 3D scene comprehension, encompassing 3D referring expression comprehension (3DREC) and segmentation (3DRES). We argue that 3DREC and 3DRES should be unified in one framework, which is also a natural progression for the community: 3DREC can help 3DRES locate the referent, while 3DRES can in turn facilitate 3DREC through more fine-grained language-visual alignment. To this end, this paper takes the initial step of integrating 3DREC and 3DRES into a unified framework, termed 3D Referring Transformer (3DRefTR). Its key idea is to build upon a mature 3DREC model and reuse its ready query embeddings and visual tokens to construct a dedicated mask branch. Specifically, we propose the Superpoint Mask Branch, which serves a dual purpose: i) by leveraging heterogeneous CPU-GPU parallelism, the CPU produces superpoints while the GPU is busy generating visual tokens, effectively performing the upsampling computation concurrently; ii) by harnessing the inherent association between the superpoints and the point cloud, it eliminates the heavy computational overhead of upsampling high-resolution visual features. This design enables 3DRefTR to deliver strong 3DRES and 3DREC performance with only 6% additional latency compared to the original 3DREC model. Empirical evaluations affirm the superiority of 3DRefTR: on the ScanRefer dataset, 3DRefTR surpasses the state-of-the-art 3DRES method by 12.43% in mIoU and improves upon the SOTA 3DREC method by 0.6% Acc@0.25IoU.
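
Point (ii) can be illustrated generically: instead of upsampling features to every point, pool point features within each superpoint, score each superpoint against a query embedding, and copy each superpoint's decision back to its member points. This is a minimal sketch under stated assumptions (mean pooling, dot-product scoring, a zero threshold, and the helper names are all hypothetical) and not the 3DRefTR implementation; it only shows why the per-point step reduces to an index lookup once the point-to-superpoint assignment is known.

```python
def pool_superpoints(
    point_feats: list[list[float]], assign: list[int], num_superpoints: int
) -> list[list[float]]:
    """Mean-pool per-point features into per-superpoint features.

    assign[i] is the superpoint index of point i (precomputed, e.g. on the CPU).
    """
    dim = len(point_feats[0])
    sums = [[0.0] * dim for _ in range(num_superpoints)]
    counts = [0] * num_superpoints
    for feat, s in zip(point_feats, assign):
        counts[s] += 1
        for d in range(dim):
            sums[s][d] += feat[d]
    return [[x / max(counts[s], 1) for x in sums[s]] for s in range(num_superpoints)]


def point_mask(
    query: list[float],
    point_feats: list[list[float]],
    assign: list[int],
    num_superpoints: int,
    threshold: float = 0.0,
) -> list[bool]:
    """Score superpoints against a query embedding, then broadcast to points."""
    sp_feats = pool_superpoints(point_feats, assign, num_superpoints)
    sp_scores = [sum(q * f for q, f in zip(query, feat)) for feat in sp_feats]
    sp_mask = [score > threshold for score in sp_scores]
    return [sp_mask[s] for s in assign]  # "upsampling" is just an index lookup


points = [[1.0, 0.0], [3.0, 0.0], [0.0, -2.0]]
print(point_mask([1.0, 1.0], points, assign=[0, 0, 1], num_superpoints=2))
# → [True, True, False]
```

The mask is scored once per superpoint rather than once per point, so the cost of the scoring step scales with the number of superpoints.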

Leveraging Large Language Models for Pre-trained Recommender Systems

Aug 21, 2023
Zhixuan Chu, Hongyan Hao, Xin Ouyang, Simeng Wang, Yan Wang, Yue Shen, Jinjie Gu, Qing Cui, Longfei Li, Siqiao Xue, James Y Zhang, Sheng Li

Recent advancements in recommendation systems have shifted towards more comprehensive and personalized recommendations by utilizing large language models (LLMs). However, effectively integrating LLMs' commonsense knowledge and reasoning abilities into recommendation systems remains a challenging problem. In this paper, we propose RecSysLLM, a novel pre-trained recommendation model based on LLMs. RecSysLLM retains LLM reasoning and knowledge while integrating recommendation domain knowledge through unique designs of data, training, and inference. This allows RecSysLLM to leverage LLMs' capabilities for recommendation tasks in an efficient, unified framework. We demonstrate the effectiveness of RecSysLLM on benchmarks and real-world scenarios. RecSysLLM provides a promising approach to developing unified recommendation systems by fully exploiting the power of pre-trained language models.

* 13 pages, 4 figures 

Enhancing Recommender Systems with Large Language Model Reasoning Graphs

Aug 21, 2023
Yan Wang, Zhixuan Chu, Xin Ouyang, Simeng Wang, Hongyan Hao, Yue Shen, Jinjie Gu, Siqiao Xue, James Y Zhang, Qing Cui, Longfei Li, Jun Zhou, Sheng Li

Recommendation systems aim to provide users with relevant suggestions but often lack interpretability and fail to capture higher-level semantic relationships between user behaviors and profiles. In this paper, we propose a novel approach that leverages large language models (LLMs) to construct personalized reasoning graphs. These graphs link a user's profile and behavioral sequences through causal and logical inferences, representing the user's interests in an interpretable way. Our approach, LLM reasoning graphs (LLMRG), has four components: chained graph reasoning, divergent extension, self-verification and scoring, and knowledge base self-improvement. The resulting reasoning graph is encoded with graph neural networks and serves as additional input to improve conventional recommender systems, without requiring extra user or item information. Our approach demonstrates how LLMs can enable more logical and interpretable recommender systems through personalized reasoning graphs. LLMRG allows recommendations to benefit from both engineered recommendation systems and LLM-derived reasoning graphs. We demonstrate the effectiveness of LLMRG on benchmarks and real-world scenarios in enhancing base recommendation models.

* 12 pages, 6 figures 
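
The step in which the reasoning graph is encoded with a graph neural network and fed to a conventional recommender can be illustrated with one mean-aggregation message-passing layer followed by a mean readout. This is a generic sketch, not LLMRG: the node features, the fixed weight matrix (learned in practice), the toy graph, the user embedding, and the choice to concatenate the graph vector with the base model's user representation are all hypothetical assumptions.

```python
def gnn_layer(
    node_feats: list[list[float]],
    edges: list[tuple[int, int]],
    weight: list[list[float]],
) -> list[list[float]]:
    """One message-passing layer: mean over self + neighbors, linear map, ReLU.

    node_feats: n x d features; edges: undirected (u, v) pairs;
    weight: d_out x d matrix (learned in practice, fixed here).
    """
    n, d = len(node_feats), len(node_feats[0])
    agg = [list(f) for f in node_feats]  # start from each node's own features (self-loop)
    deg = [1] * n
    for u, v in edges:
        for k in range(d):
            agg[u][k] += node_feats[v][k]
            agg[v][k] += node_feats[u][k]
        deg[u] += 1
        deg[v] += 1
    out = []
    for i in range(n):
        mean = [x / deg[i] for x in agg[i]]
        out.append([max(0.0, sum(w * m for w, m in zip(row, mean))) for row in weight])
    return out


def readout(node_feats: list[list[float]]) -> list[float]:
    """Mean-pool node embeddings into a single graph embedding."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


# Hypothetical toy reasoning graph: behavior -> inferred intent -> candidate item.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]
graph_vec = readout(gnn_layer(feats, edges, identity))
user_vec = [0.3, -0.2]             # hypothetical base-model user embedding
augmented = user_vec + graph_vec   # concatenated as extra input to the recommender
```

Because the graph vector is simply appended to existing inputs, the base recommender needs no extra user or item information, which matches the role the abstract describes for the encoded graph.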

Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction

Aug 20, 2023
Zeyu Han, Yuhan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen

To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images. One widely adopted technique is generative adversarial networks (GANs); more recently, diffusion probabilistic models (DPMs) have emerged as a compelling alternative due to their improved sample quality and higher log-likelihood scores compared to GANs. Despite this, DPMs suffer from two major drawbacks in real clinical settings: the computationally expensive sampling process and the insufficient preservation of correspondence between the conditioning LPET image and the reconstructed PET (RPET) image. To address these limitations, this paper presents a coarse-to-fine PET reconstruction framework that consists of a coarse prediction module (CPM) and an iterative refinement module (IRM). The CPM generates a coarse PET image via a deterministic process, and the IRM samples the residual iteratively. By delegating most of the computational overhead to the CPM, the overall sampling speed of our method can be significantly improved. Furthermore, two additional strategies, an auxiliary guidance strategy and a contrastive diffusion strategy, are integrated into the reconstruction process to enhance the correspondence between the LPET image and the RPET image, further improving clinical reliability. Extensive experiments on two human brain PET datasets demonstrate that our method outperforms state-of-the-art PET reconstruction methods. The source code is available at https://github.com/Show-han/PET-Reconstruction.

* Accepted and presented at MICCAI 2023. To be published in Proceedings 

Polymerized Feature-based Domain Adaptation for Cervical Cancer Dose Map Prediction

Aug 20, 2023
Jie Zeng, Zeyu Han, Xingchen Peng, Jianghong Xiao, Peng Wang, Yan Wang

Recently, deep learning (DL) has significantly automated and accelerated clinical radiation therapy (RT) planning by predicting accurate dose maps. However, most DL-based dose map prediction methods are data-driven and not applicable to cervical cancer, for which only a small amount of data is available. To address this problem, this paper proposes to transfer the rich knowledge learned from another cancer, rectal cancer, which shares the same scanning area and has more clinically available data, to improve dose map prediction for cervical cancer through domain adaptation. To close the inherent domain gap between the source (rectal cancer) and target (cervical cancer) domains, we develop an effective Transformer-based polymerized feature module (PFM), which generates an optimal polymerized feature distribution to smoothly align the two input distributions. Experimental results on two in-house clinical datasets demonstrate the superiority of the proposed method compared with state-of-the-art methods.

* Accepted and presented at ISBI 2023. To be published in Proceedings 