Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiao Chen

Federated Data Model

Mar 13, 2024

Xiao Chen, Shunan Zhang, Eric Z. Chen, Yikang Liu, Lin Zhao, Terrence Chen, Shanhui Sun

Abstract:In artificial intelligence (AI), especially deep learning, data diversity and volume play a pivotal role in model development. However, training a robust deep learning model often faces challenges due to data privacy, regulations, and the difficulty of sharing data between different locations, especially for medical applications. To address this, we developed a method called the Federated Data Model (FDM). This method uses diffusion models to learn the characteristics of data at one site and then creates synthetic data that can be used at another site without sharing the actual data. We tested this approach with a medical image segmentation task, focusing on cardiac magnetic resonance images from different hospitals. Our results show that models trained with this method perform well both on the data they were originally trained on and on data from other sites. This approach offers a promising way to train accurate and privacy-respecting AI models across different locations.

Via

Access Paper or Ask Questions

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Feb 25, 2024

Xiao Chen, Quanyi Li, Tai Wang, Tianfan Xue, Jiangmiao Pang

Figure 1 for GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Figure 2 for GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Figure 3 for GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Figure 4 for GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Abstract:While recent advances in neural radiance field enable realistic digitization for large-scale scenes, the image-capturing process is still time-consuming and labor-intensive. Previous works attempt to automate this process using the Next-Best-View (NBV) policy for active 3D reconstruction. However, the existing NBV policies heavily rely on hand-crafted criteria, limited action space, or per-scene optimized representations. These constraints limit their cross-dataset generalizability. To overcome them, we propose GenNBV, an end-to-end generalizable NBV policy. Our policy adopts a reinforcement learning (RL)-based framework and extends typical limited action space to 5D free space. It empowers our agent drone to scan from any viewpoint, and even interact with unseen geometries during training. To boost the cross-dataset generalizability, we also propose a novel multi-source state embedding, including geometric, semantic, and action representations. We establish a benchmark using the Isaac Gym simulator with the Houses3K and OmniObject3D datasets to evaluate this NBV policy. Experiments demonstrate that our policy achieves a 98.26% and 97.12% coverage ratio on unseen building-scale objects from these datasets, respectively, outperforming prior solutions.

Via

Access Paper or Ask Questions

Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

Feb 07, 2024

Lu Wen, Qihun Zhang, Zhenghao Feng, Yuanyuan Xu, Xiao Chen, Jiliu Zhou, Yan Wang

Figure 1 for Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

Figure 2 for Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

Figure 3 for Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

Figure 4 for Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

Abstract:Radiotherapy is a primary treatment for cancers with the aim of applying sufficient radiation dose to the planning target volume (PTV) while minimizing dose hazards to the organs at risk (OARs). Convolutional neural networks (CNNs) have automated the radiotherapy plan-making by predicting the dose maps. However, current CNN-based methods ignore the remarkable dose difference in the dose map, i.e., high dose value in the interior PTV while low value in the exterior PTV, leading to a suboptimal prediction. In this paper, we propose a triplet-constraint transformer (TCtrans) with multi-scale refinement to predict the high-quality dose distribution. Concretely, a novel PTV-guided triplet constraint is designed to refine dose feature representations in the interior and exterior PTV by utilizing the explicit geometry of PTV. Furthermore, we introduce a multi-scale refinement (MSR) module to effectively fulfill the triplet constraint in different decoding layers with multiple scales. Besides, a transformer encoder is devised to learn the important global dosimetric knowledge. Experiments on a clinical cervical cancer dataset demonstrate the superiority of our method.

* accepted by 2024 IEEE ISBI

Via

Access Paper or Ask Questions

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Dec 26, 2023

Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue(+4 more)

Figure 1 for EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Figure 2 for EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Figure 3 for EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Figure 4 for EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Abstract:In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions. This necessitates the ability to fully understand 3D scenes given their first-person observations and contextualize them into language for interaction. However, traditional research focuses more on scene-level input and output setups from a global view. To address the gap, we introduce EmbodiedScan, a multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. It encompasses over 5k scans encapsulating 1M ego-centric RGB-D views, 1M language prompts, 160k 3D-oriented boxes spanning over 760 categories, some of which partially align with LVIS, and dense semantic occupancy with 80 common categories. Building upon this database, we introduce a baseline framework named Embodied Perceptron. It is capable of processing an arbitrary number of multi-modal inputs and demonstrates remarkable 3D perception capabilities, both within the two series of benchmarks we set up, i.e., fundamental 3D perception tasks and language-grounded tasks, and in the wild. Codes, datasets, and benchmarks will be available at https://github.com/OpenRobotLab/EmbodiedScan.

* A multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. Project page: http://tai-wang.github.io/embodiedscan

Via

Access Paper or Ask Questions

Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models

Dec 09, 2023

Hyuna Kwon, Tim Hsu, Wenyu Sun, Wonseok Jeong, Fikret Aydin, James Chapman, Xiao Chen, Matthew R. Carbone, Deyu Lu, Fei Zhou(+1 more)

Figure 1 for Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models

Figure 2 for Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models

Figure 3 for Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models

Figure 4 for Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models

Abstract:The ability to rapidly develop materials with desired properties has a transformative impact on a broad range of emerging technologies. In this work, we introduce a new framework based on the diffusion model, a recent generative machine learning method to predict 3D structures of disordered materials from a target property. For demonstration, we apply the model to identify the atomic structures of amorphous carbons ($a$-C) as a representative material system from the target X-ray absorption near edge structure (XANES) spectra--a common experimental technique to probe atomic structures of materials. We show that conditional generation guided by XANES spectra reproduces key features of the target structures. Furthermore, we show that our model can steer the generative process to tailor atomic arrangements for a specific XANES spectrum. Finally, our generative model exhibits a remarkable scale-agnostic property, thereby enabling generation of realistic, large-scale structures through learning from a small-scale dataset (i.e., with small unit cells). Our work represents a significant stride in bridging the gap between materials characterization and atomic structure determination; in addition, it can be leveraged for materials discovery in exploring various material properties as targeted.

Via

Access Paper or Ask Questions

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Oct 24, 2023

Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

Figure 1 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Figure 2 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Figure 3 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Figure 4 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Abstract:Training a high performance end-to-end speech (E2E) processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive for collection, compared to textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. We train a latent synthesizer to convert textual data into an intermediate latent representation of a pre-trained speech model. These pseudo acoustic representations of textual data augment acoustic data for model training. We evaluate LaSyn on low-resource automatic speech recognition (ASR) and spoken language understanding (SLU) tasks. For ASR, LaSyn improves an E2E baseline trained on LibriSpeech train-clean-100, with relative word error rate reductions over 22.3% on different test sets. For SLU, LaSyn improves our E2E baseline by absolute 4.1% for intent classification accuracy and 3.8% for slot filling SLU-F1 on SLURP, and absolute 4.49% and 2.25% for exact match (EM) and EM-Tree accuracies on STOP respectively. With fewer parameters, the results of LaSyn are competitive to published state-of-the-art works. The results demonstrate the quality of the augmented training data.

* 15 pages, 8 figures, 8 tables, Accepted to EMNLP 2023 Findings

Via

Access Paper or Ask Questions

CAMEL2: Enhancing weakly supervised learning for histopathology images by incorporating the significance ratio

Oct 09, 2023

Gang Xu, Shuhao Wang, Lingyu Zhao, Xiao Chen, Tongwei Wang, Lang Wang, Zhenwei Luo, Dahan Wang, Zewen Zhang, Aijun Liu(+5 more)

Abstract:Histopathology image analysis plays a crucial role in cancer diagnosis. However, training a clinically applicable segmentation algorithm requires pathologists to engage in labour-intensive labelling. In contrast, weakly supervised learning methods, which only require coarse-grained labels at the image level, can significantly reduce the labeling efforts. Unfortunately, while these methods perform reasonably well in slide-level prediction, their ability to locate cancerous regions, which is essential for many clinical applications, remains unsatisfactory. Previously, we proposed CAMEL, which achieves comparable results to those of fully supervised baselines in pixel-level segmentation. However, CAMEL requires 1,280x1,280 image-level binary annotations for positive WSIs. Here, we present CAMEL2, by introducing a threshold of the cancerous ratio for positive bags, it allows us to better utilize the information, consequently enabling us to scale up the image-level setting from 1,280x1,280 to 5,120x5,120 while maintaining the accuracy. Our results with various datasets, demonstrate that CAMEL2, with the help of 5,120x5,120 image-level binary annotations, which are easy to annotate, achieves comparable performance to that of a fully supervised baseline in both instance- and slide-level classifications.

* 41 pages, 13 figures

Via

Access Paper or Ask Questions

Enhancing Cardiac MRI Segmentation via Classifier-Guided Two-Stage Network and All-Slice Information Fusion Transformer

Sep 02, 2023

Zihao Chen, Xiao Chen, Yikang Liu, Eric Z. Chen, Terrence Chen, Shanhui Sun

Abstract:Cardiac Magnetic Resonance imaging (CMR) is the gold standard for assessing cardiac function. Segmenting the left ventricle (LV), right ventricle (RV), and LV myocardium (MYO) in CMR images is crucial but time-consuming. Deep learning-based segmentation methods have emerged as effective tools for automating this process. However, CMR images present additional challenges due to irregular and varying heart shapes, particularly in basal and apical slices. In this study, we propose a classifier-guided two-stage network with an all-slice fusion transformer to enhance CMR segmentation accuracy, particularly in basal and apical slices. Our method was evaluated on extensive clinical datasets and demonstrated better performance in terms of Dice score compared to previous CNN-based and transformer-based models. Moreover, our method produces visually appealing segmentation shapes resembling human annotations and avoids common issues like holes or fragments in other models' segmentations.

* Accepted by 2023 MICCAI AMAI workshop

Via

Access Paper or Ask Questions

DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation

Aug 02, 2023

Jingfan Chen, Yuxi Wang, Pengfei Wang, Xiao Chen, Zhaoxiang Zhang, Zhen Lei, Qing Li

Abstract:The Class Incremental Semantic Segmentation (CISS) extends the traditional segmentation task by incrementally learning newly added classes. Previous work has introduced generative replay, which involves replaying old class samples generated from a pre-trained GAN, to address the issues of catastrophic forgetting and privacy concerns. However, the generated images lack semantic precision and exhibit out-of-distribution characteristics, resulting in inaccurate masks that further degrade the segmentation performance. To tackle these challenges, we propose DiffusePast, a novel framework featuring a diffusion-based generative replay module that generates semantically accurate images with more reliable masks guided by different instructions (e.g., text prompts or edge maps). Specifically, DiffusePast introduces a dual-generator paradigm, which focuses on generating old class images that align with the distribution of downstream datasets while preserving the structure and layout of the original images, enabling more precise masks. To adapt to the novel visual concepts of newly added classes continuously, we incorporate class-wise token embedding when updating the dual-generator. Moreover, we assign adequate pseudo-labels of old classes to the background pixels in the new step images, further mitigating the forgetting of previously learned knowledge. Through comprehensive experiments, our method demonstrates competitive performance across mainstream benchmarks, striking a better balance between the performance of old and novel classes.

* e.g.: 13 pages, 7 figures

Via

Access Paper or Ask Questions

Data-Driven Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications

Jun 26, 2023

Xuanfeng Li, Shengcai Liu, Jin Wang, Xiao Chen, Yew-Soon Ong, Ke Tang

Abstract:The multiple-choice knapsack problem (MCKP) is a classic NP-hard combinatorial optimization problem. Motivated by several significant practical applications, this work investigates a novel variant of MCKP called data-driven chance-constrained multiple-choice knapsack problem (DDCCMCKP), where the item weight is a random variable with unknown probability distribution. We first present the problem formulation of DDCCMCKP, and then establish two benchmark sets. The first set contains synthetic instances, and the second set is devised to simulate a real-world application scenario of a certain telecommunication company. To solve DDCCMCKP, we propose a data-driven adaptive local search (DDALS) algorithm. The main merit of DDALS lies in evaluating solutions with chance constraints by data-driven methods, under the condition of unknown distributions and only historical sample data being available. The experimental results demonstrate the effectiveness of the proposed algorithm and show that it is superior to other baselines. Additionally, ablation experiments confirm the necessity of each component in the algorithm. Our proposed algorithm can serve as the baseline for future research, and the code and benchmark sets will be open-sourced to further promote research on this challenging problem.

Via

Access Paper or Ask Questions