Guan Wang

State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing, China, Key Laboratory of Photonic Control Technology

CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

Mar 26, 2024

Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao (+1 more)

Abstract: Despite the significant success of deep learning methods in medical image segmentation, researchers still struggle with the computer-aided diagnosis of abdominal lymph nodes because of the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates a conditional diffusion model for lymph node generation with the nnU-Net model for lymph node segmentation, improving abdominal lymph node segmentation performance by synthesizing diverse, realistic abdominal lymph node data. We propose LN-DDPM, a conditional denoising diffusion probabilistic model (DDPM) for lymph node (LN) generation. LN-DDPM uses lymph node masks and anatomical structure masks as model conditions. These conditions operate through two conditioning mechanisms, global structure conditioning and local detail conditioning, to distinguish lymph nodes from their surroundings and better capture lymph node characteristics. The resulting paired abdominal lymph node images and masks are used for the downstream segmentation task. Experimental results on abdominal lymph node datasets demonstrate that LN-DDPM outperforms other generative methods in abdominal lymph node image synthesis and better assists the downstream abdominal lymph node segmentation task.
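
As a rough illustration of the diffusion side of such a pipeline, the sketch below shows the standard closed-form DDPM noising step, x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, on a flattened toy image. The linear beta schedule, the function names, and the idea of passing the two masks to the denoiser as extra channels are assumptions made for illustration; the abstract does not state how LN-DDPM implements its global and local conditioning.

```python
import math


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars = []
    product = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(1, num_steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def noise_sample(x0, noise, alpha_bar):
    """Closed-form DDPM forward step: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]


def denoiser_input(x_t, ln_mask, anatomy_mask):
    """Hypothetical conditioning: stack the noisy image and both masks as channels."""
    return [x_t, ln_mask, anatomy_mask]
```

A denoising network would be trained to predict the injected noise from `denoiser_input(...)` and the step index; that network is omitted here.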

Detecting misinformation through Framing Theory: the Frame Element-based Model

Feb 19, 2024

Guan Wang, Rebecca Frederick, Jinglong Duan, William Wong, Verica Rupar, Weihua Li, Quan Bai

Abstract: In this paper, we delve into the rapidly evolving challenge of misinformation detection, with a specific focus on the nuanced manipulation of narrative frames, an under-explored area within the AI community. The potential for Generative AI models to generate misleading narratives underscores the urgency of this problem. Drawing from communication and framing theories, we posit that the presentation or 'framing' of accurate information can dramatically alter its interpretation, potentially leading to misinformation. We highlight this issue through real-world examples, demonstrating how shifts in narrative frames can transmute fact-based information into misinformation. To tackle this challenge, we propose an innovative approach leveraging the power of pre-trained Large Language Models and deep neural networks to detect misinformation originating from accurate facts portrayed under different frames. These advanced AI techniques offer unprecedented capabilities in identifying complex patterns within unstructured data, which is critical for examining the subtleties of narrative frames. The objective of this paper is to bridge a significant research gap in the AI domain, providing valuable insights and methodologies for tackling framing-induced misinformation, thus contributing to the advancement of responsible and trustworthy AI technologies. Extensive experiments demonstrate the varying impact of the elements of framing theory, supporting the rationale for applying framing theory to improve misinformation detection performance.

* 17 pages, 9 figures, 7 tables

Training A Multi-stage Deep Classifier with Feedback Signals

Nov 12, 2023

Chao Xu, Yu Yang, Rongzhao Wang, Guan Wang, Bojia Lin

Abstract: Multi-Stage Classifier (MSC), in which several classifiers work sequentially in an arranged order and the classification decision is partially made at each step, is widely used in industrial applications because of various resource limitations. The classifiers of a multi-stage process are usually Neural Network (NN) models trained independently or in their inference order, without considering the signals from the later stages. Targeting the two-stage binary classification process, the most common type of MSC, we propose a novel training framework named Feedback Training. The classifiers are trained in the reverse of their actual working order, and the later-stage classifier guides the training of the initial-stage classifier via a sample weighting method. We experimentally show the efficacy of our proposed approach and its clear superiority in the few-shot training scenario.
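
The sample-weighting idea can be sketched as follows, assuming the second-stage classifier is already trained and outputs a probability for the positive class. The particular weighting rule (the second stage's confidence in the true label, with a floor) is a hypothetical choice for illustration; the abstract does not specify the weighting function used in Feedback Training.

```python
import math


def feedback_weights(stage2_probs, labels, floor=0.1):
    """Hypothetical rule: weight each sample by the later stage's confidence
    in the true label, never dropping below `floor`."""
    weights = []
    for p, y in zip(stage2_probs, labels):
        p_true = p if y == 1 else 1.0 - p
        weights.append(max(floor, p_true))
    return weights


def weighted_bce(stage1_probs, labels, weights, eps=1e-12):
    """Weighted binary cross-entropy for the first-stage classifier,
    normalised by the sum of the weights."""
    total = 0.0
    for p, y, w in zip(stage1_probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)
```

In a training loop, `weighted_bce` would replace the unweighted loss of the first-stage model, so that the first stage is fitted after, and informed by, the second stage.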

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Sep 20, 2023

Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu

Abstract: Nowadays, open-source large language models like LLaMA have emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data with mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose the C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat.
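
One way to picture "data sources as coarse-grained reward labels" plus a "class-conditioned policy" is a supervised loss in which each example is tagged with its source and weighted by a per-source reward. The sketch below is an illustration under assumptions, not the OpenChat implementation: the tag format, the reward values, and the function names are all hypothetical, and the per-token log-probabilities are taken as given rather than computed by a model.

```python
# Hypothetical coarse rewards per data source (values are assumptions).
SOURCE_REWARD = {"expert": 1.0, "suboptimal": 0.1}


def condition_prompt(prompt, source):
    """Prefix the prompt with a source tag so the policy is class-conditioned."""
    return f"<{source}> {prompt}"


def weighted_sft_loss(batch):
    """Reward-weighted negative log-likelihood.

    batch: list of (source, token_log_probs) pairs, where token_log_probs are
    the model's log-probabilities of the reference response tokens.
    """
    total = 0.0
    weight_sum = 0.0
    for source, log_probs in batch:
        weight = SOURCE_REWARD[source]
        nll = -sum(log_probs) / len(log_probs)  # mean token NLL for this example
        total += weight * nll
        weight_sum += weight
    return total / weight_sum
```

At inference time, such a model would be prompted with the tag of the high-quality source, for example `condition_prompt(user_message, "expert")`.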

Evolving Connectivity for Recurrent Spiking Neural Networks

May 28, 2023

Guan Wang, Yuhao Sun, Sijie Cheng, Sen Song

Abstract: Recurrent spiking neural networks (RSNNs) hold great potential for advancing artificial general intelligence, as they draw inspiration from the biological nervous system and show promise in modeling complex dynamics. However, the widely used surrogate gradient-based training methods for RSNNs are inherently inaccurate and unfriendly to neuromorphic hardware. To address these limitations, we propose the evolving connectivity (EC) framework, an inference-only method for training RSNNs. The EC framework reformulates weight-tuning as a search over parameterized connection probability distributions, and employs Natural Evolution Strategies (NES) to optimize these distributions. Our EC framework circumvents the need for gradients and features hardware-friendly characteristics, including sparse boolean connections and high scalability. We evaluate EC on a series of standard robotic locomotion tasks, where it achieves performance comparable to deep neural networks and outperforms gradient-trained RSNNs, even solving the complex 17-DoF humanoid task. Additionally, the EC framework demonstrates a two- to three-fold speedup in efficiency compared to directly evolving parameters. By providing a performant and hardware-friendly alternative, the EC framework lays the groundwork for further energy-efficient applications of RSNNs and advances the development of neuromorphic devices.
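
The search over connection probabilities can be sketched on a toy problem. Each boolean connection j is sampled from a Bernoulli distribution with probability p_j; because the Fisher information of a Bernoulli parameter is 1 / (p (1 - p)), the natural-gradient estimate of expected fitness reduces to the average of (fitness - baseline) * (x_j - p_j) over the sampled population. The fitness function here (agreement with a fixed target connectivity pattern), the hyperparameters, and the clipping range are assumptions for illustration; no spiking network is simulated.

```python
import random


def evolve_connectivity(target, iterations=200, population=32, lr=0.1, seed=0):
    """Toy NES over Bernoulli connection probabilities (gradient-free)."""
    rng = random.Random(seed)
    probs = [0.5] * len(target)
    for _ in range(iterations):
        # Sample boolean connectivity patterns from the current distribution.
        samples = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(population)]
        # Toy fitness (assumed): number of connections matching the target.
        fitness = [sum(1 for x, t in zip(s, target) if x == t) for s in samples]
        baseline = sum(fitness) / population
        for j in range(len(probs)):
            # Natural-gradient estimate for a Bernoulli parameter.
            grad = sum((f - baseline) * (s[j] - probs[j])
                       for s, f in zip(samples, fitness)) / population
            # Keep probabilities away from 0 and 1 so sampling stays exploratory.
            probs[j] = min(0.95, max(0.05, probs[j] + lr * grad))
    return probs
```

Only forward evaluations of sampled networks are needed, which is the sense in which such a method is inference-only.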

AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization

May 26, 2023

Guan Wang, Weihua Li, Edmund M-K. Lai, Quan Bai

Abstract: The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services. This makes it difficult and time-consuming for users to process all the available information when making decisions. Text summarization, a Natural Language Processing (NLP) task, has been widely explored to help users quickly retrieve relevant information by generating short and salient content from long or multiple documents. Recent advances in pre-trained language models, such as ChatGPT, have demonstrated the potential of Large Language Models (LLMs) in text generation. However, LLMs require massive amounts of data and resources and are challenging to implement as offline applications. Furthermore, existing text summarization approaches often lack the "adaptive" nature required to capture diverse aspects in opinion summarization, which is particularly detrimental to users with specific requirements or preferences. In this paper, we propose an Aspect-adaptive Knowledge-based Opinion Summarization model for product reviews, which effectively captures the adaptive nature required for opinion summarization. The model generates aspect-oriented summaries given a set of reviews for a particular product, efficiently providing users with useful information on specific aspects they are interested in, ensuring the generated summaries are more personalized and informative. Extensive experiments have been conducted using real-world datasets to evaluate the proposed model. The results demonstrate that our model outperforms state-of-the-art approaches and is adaptive and efficient in generating summaries that focus on particular aspects, enabling users to make well-informed decisions and catering to their diverse interests and preferences.

* 21 pages, 4 figures, 7 tables

PO-VINS: An Efficient Pose-Only LiDAR-Enhanced Visual-Inertial State Estimator

May 22, 2023

Hailiang Tang, Xiaoji Niu, Tisheng Zhang, Liqiang Wang, Guan Wang, Jingnan Liu

Abstract: The pose-only (PO) visual representation has been proven to be equivalent to the classical multiple-view geometry, while significantly improving computational efficiency. However, its applicability for real-world navigation in large-scale complex environments has not yet been demonstrated. In this study, we present an efficient pose-only LiDAR-enhanced visual-inertial navigation system (PO-VINS) to enhance the real-time performance of the state estimator. In the visual-inertial state estimator (VISE), we propose a pose-only visual-reprojection measurement model that only contains the inertial measurement unit (IMU) pose and extrinsic-parameter states. We further integrated the LiDAR-enhanced method to construct a pose-only LiDAR-depth measurement model. Real-world experiments were conducted in large-scale complex environments, demonstrating that the proposed PO-VISE and LiDAR-enhanced PO-VISE reduce computational complexity by more than 50% and over 20%, respectively. Additionally, the PO-VINS yields the same accuracy as conventional methods. These results indicate that the pose-only solution is efficient and applicable for real-time visual-inertial state estimation.

Instance-Variant Loss with Gaussian RBF Kernel for 3D Cross-modal Retriveal

May 07, 2023

Zhitao Liu, Zengyu Liu, Jiwei Wei, Guan Wang, Zhenjiang Du, Ning Xie, Heng Tao Shen

Abstract: 3D cross-modal retrieval is gaining attention in the multimedia community. Central to this topic is learning a joint embedding space to represent data from different modalities, such as images, 3D point clouds, and polygon meshes, to extract modality-invariant and discriminative features. Hence, the performance of cross-modal retrieval methods heavily depends on the representational capacity of this embedding space. Existing methods treat all instances equally, applying the same penalty strength to instances with varying degrees of difficulty, ignoring the differences between instances. This can result in ambiguous convergence or local optima, severely compromising the separability of the feature space. To address this limitation, we propose an Instance-Variant loss to assign different penalty strengths to different instances, improving the space separability. Specifically, we assign different penalty weights to instances positively related to their intra-class distance. Simultaneously, we reduce the cross-modal discrepancy between features by learning a shared weight vector for the same class data from different modalities. By leveraging the Gaussian RBF kernel to evaluate sample similarity, we further propose an Intra-Class loss function that minimizes the intra-class distance among same-class instances. Extensive experiments on three 3D cross-modal datasets show that our proposed method surpasses recent state-of-the-art approaches.
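
For reference, the Gaussian RBF kernel measures similarity as k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), which equals 1 for identical features and decays toward 0 as they move apart. The sketch below pairs that kernel with one plausible intra-class penalty, the mean of 1 - k over same-class pairs. The exact form of the paper's Intra-Class loss is not given in the abstract, so `intra_class_loss` and the `sigma` default are assumptions.

```python
import math


def rbf_similarity(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def intra_class_loss(features, labels, sigma=1.0):
    """Hypothetical intra-class penalty: mean of (1 - k(x_i, x_j)) over all
    pairs of distinct instances that share a label."""
    total = 0.0
    pair_count = 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if labels[i] == labels[j]:
                total += 1.0 - rbf_similarity(features[i], features[j], sigma)
                pair_count += 1
    return total / pair_count if pair_count else 0.0
```

Minimising this quantity pulls same-class embeddings together regardless of which modality produced them, provided all modalities share the embedding space.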

KATSum: Knowledge-aware Abstractive Text Summarization

Dec 06, 2022

Guan Wang, Weihua Li, Edmund Lai, Jianhua Jiang

Abstract: Text summarization is recognised as one of the downstream NLP tasks and has been extensively investigated in recent years. It can help people rapidly perceive information from the Internet, including news articles, social posts, videos, etc. Most existing research attempts to develop summarization models that produce better output. However, limitations of most existing models have become apparent, including unfaithfulness and factual errors. In this paper, we propose a novel model, named Knowledge-aware Abstractive Text Summarization, which leverages the advantages offered by Knowledge Graphs to enhance the standard Seq2Seq model. In addition, Knowledge Graph triplets are extracted from the source text and used to provide keywords with relational information, producing coherent and factually error-free summaries. We conduct extensive experiments using real-world data sets. The results reveal that the proposed framework can effectively utilise the information from the Knowledge Graph and significantly reduce factual errors in the summary.

* Presented at PKAW 2022 (arXiv:2211.03888) Report-no: PKAW/2022/02

PAI3D: Painting Adaptive Instance-Prior for 3D Object Detection

Nov 15, 2022

Hao Liu, Zhuoran Xu, Dan Wang, Baofeng Zhang, Guan Wang, Bo Dong, Xin Wen, Xinyu Xu

Abstract: 3D object detection is a critical task in autonomous driving. Recently, multi-modal fusion-based 3D object detection methods, which combine the complementary advantages of LiDAR and camera, have shown great performance improvements over mono-modal methods. However, so far, no methods have attempted to utilize instance-level contextual image semantics to guide 3D object detection. In this paper, we propose a simple and effective Painting Adaptive Instance-prior for 3D object detection (PAI3D) to fuse instance-level image semantics flexibly with point cloud features. PAI3D is a multi-modal sequential instance-level fusion framework. It first extracts instance-level semantic information from images. The extracted information, including each object's categorical label, point-to-object membership, and object position, is then used to augment each LiDAR point in the subsequent 3D detection network to guide and improve detection performance. PAI3D outperforms the state-of-the-art by a large margin on the nuScenes dataset, achieving 71.4 in mAP and 74.2 in NDS on the test split. Our comprehensive experiments show that instance-level image semantics contribute the most to the performance gain, and PAI3D works well with any good-quality instance segmentation models and any modern point cloud 3D encoders, making it a strong candidate for deployment on autonomous vehicles.
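
The point-augmentation step can be illustrated with a small sketch: each LiDAR point, already projected to pixel coordinates, is extended with a one-hot class label, a membership flag, and the position of the instance it falls in. Everything concrete here is an assumption for illustration: the `Instance` type, the use of 2D boxes as a stand-in for instance masks, the first-match rule, and the feature layout are not taken from the PAI3D paper.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    class_id: int
    box: tuple     # (u_min, v_min, u_max, v_max) in pixels; stand-in for a mask
    center: tuple  # hypothetical estimated 3D object position (x, y, z)


def paint_point(point_xyz, pixel_uv, instances, num_classes):
    """Append instance-level semantics to one LiDAR point.

    Output layout (assumed): xyz + one-hot class + membership flag + object center.
    """
    u, v = pixel_uv
    for inst in instances:
        u_min, v_min, u_max, v_max = inst.box
        if u_min <= u <= u_max and v_min <= v <= v_max:
            one_hot = [0.0] * num_classes
            one_hot[inst.class_id] = 1.0
            return list(point_xyz) + one_hot + [1.0] + list(inst.center)
    # Point belongs to no instance: zero semantics, membership flag 0.
    return list(point_xyz) + [0.0] * num_classes + [0.0] + [0.0, 0.0, 0.0]
```

The painted points would then be passed to a point cloud 3D encoder in place of the raw xyz points.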
