Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Chen

Univ. California, Santa Barbara

Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

Jul 11, 2024

Juren Li, Fanzhe Fu, Ran Wei, Yifei Sun, Zeyu Lai, Ning Song, Xin Chen, Yang Yang

Figure 1 for Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

Figure 2 for Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

Figure 3 for Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

Figure 4 for Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

Abstract:Pathogenic chromosome abnormalities are very common among the general population. While numerical chromosome abnormalities can be quickly and precisely detected, structural chromosome abnormalities are far more complex and typically require considerable efforts by human experts for identification. This paper focuses on investigating the modeling of chromosome features and the identification of chromosomes with structural abnormalities. Most existing data-driven methods concentrate on a single chromosome and consider each chromosome independently, overlooking the crucial aspect of homologous chromosomes. In normal cases, homologous chromosomes share identical structures, with the exception that one of them is abnormal. Therefore, we propose an adaptive method to align homologous chromosomes and diagnose structural abnormalities through homologous similarity. Inspired by the process of human expert diagnosis, we incorporate information from multiple pairs of homologous chromosomes simultaneously, aiming to reduce noise disturbance and improve prediction performance. Extensive experiments on real-world datasets validate the effectiveness of our model compared to baselines.

Via

Access Paper or Ask Questions

Depth-Guided Semi-Supervised Instance Segmentation

Jun 25, 2024

Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

Figure 1 for Depth-Guided Semi-Supervised Instance Segmentation

Figure 2 for Depth-Guided Semi-Supervised Instance Segmentation

Figure 3 for Depth-Guided Semi-Supervised Instance Segmentation

Figure 4 for Depth-Guided Semi-Supervised Instance Segmentation

Abstract:Semi-Supervised Instance Segmentation (SSIS) aims to leverage an amount of unlabeled data during training. Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels. However, such a mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framework. This framework uses depth maps extracted from input images, which represent individual instances with closely associated distance values, offering precise contours for distinct instances. Unlike RGB data, depth maps provide a unique perspective, making their integration into the SSIS process complex. To this end, we propose Depth Feature Fusion, which integrates features extracted from depth estimation. This integration allows the model to understand depth information better and ensure its effective utilization. Additionally, to manage the variability of depth images during training, we introduce the Depth Controller. This component enables adaptive adjustments of the depth map, enhancing convergence speed and dynamically balancing the loss weights between RGB and depth maps. Extensive experiments conducted on the COCO and Cityscapes datasets validate the efficacy of our proposed method. Our approach establishes a new benchmark for SSIS, outperforming previous methods. Specifically, our DG achieves 22.29%, 31.47%, and 35.14% mAP for 1%, 5%, and 10% labeled data on the COCO dataset, respectively.

* 12 pages, 6 figures, 4 tables

Via

Access Paper or Ask Questions

In Tree Structure Should Sentence Be Generated

Jun 20, 2024

Yaguang Li, Xin Chen

Abstract:Generative models reliant on sequential autoregression have been at the forefront of language generation for an extensive period, particularly following the introduction of widely acclaimed transformers. Despite its excellent performance, there are always some issues that we face today. For example, problems such as hallucinations and getting trapped in a logic loop may occur. To enhance the performance of existing systems, this paper introduces a new method for generating sequences in natural language, which involves generating the targeted sentence in a tree-traversing order. The paper includes an illustration of the theoretical basis and validity of the approach, as well as a comparison of its fundamentals with the diffusion model in graphic generation. Finally, a module called SenTree is introduced for generating an approximating binary tree. It is already available at https://github.com/arklyg/sentree. Additionally, a joint training framework based on this approach is proposed, incorporating the intrinsics of generative adversarial networks.

Via

Access Paper or Ask Questions

A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

Jun 18, 2024

Lijie Hu, Liang Liu, Shu Yang, Xin Chen, Hongru Xiao, Mengdi Li, Pan Zhou, Muhammad Asif Ali, Di Wang

Figure 1 for A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

Figure 2 for A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

Figure 3 for A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

Figure 4 for A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

Abstract:Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models (LLMs). While some studies focus on improving CoT accuracy through methods like retrieval enhancement, yet a rigorous explanation for why CoT achieves such success remains unclear. In this paper, we analyze CoT methods under two different settings by asking the following questions: (1) For zero-shot CoT, why does prompting the model with "let's think step by step" significantly impact its outputs? (2) For few-shot CoT, why does providing examples before questioning the model could substantially improve its reasoning ability? To answer these questions, we conduct a top-down explainable analysis from the Hopfieldian view and propose a Read-and-Control approach for controlling the accuracy of CoT. Through extensive experiments on seven datasets for three different tasks, we demonstrate that our framework can decipher the inner workings of CoT, provide reasoning error localization, and control to come up with the correct reasoning path.

* 21 pages

Via

Access Paper or Ask Questions

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Jun 14, 2024

Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu(+2 more)

Figure 1 for MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Figure 2 for MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Figure 3 for MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Figure 4 for MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Abstract:Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality. To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry. The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.

* Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

Via

Access Paper or Ask Questions

SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

Jun 02, 2024

Benjamin Towle, Xin Chen, Ke Zhou

Figure 1 for SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

Figure 2 for SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

Figure 3 for SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

Abstract:The recently released Segment Anything Model (SAM) has shown powerful zero-shot segmentation capabilities through a semi-automatic annotation setup in which the user can provide a prompt in the form of clicks or bounding boxes. There is growing interest around applying this to medical imaging, where the cost of obtaining expert annotations is high, privacy restrictions may limit sharing of patient data, and model generalisation is often poor. However, there are large amounts of inherent uncertainty in medical images, due to unclear object boundaries, low-contrast media, and differences in expert labelling style. Currently, SAM is known to struggle in a zero-shot setting to adequately annotate the contours of the structure of interest in medical images, where the uncertainty is often greatest, thus requiring significant manual correction. To mitigate this, we introduce \textbf{Sim}ulated Interaction for \textbf{S}egment \textbf{A}nything \textbf{M}odel (\textsc{\textbf{SimSAM}}), an approach that leverages simulated user interaction to generate an arbitrary number of candidate masks, and uses a novel aggregation approach to output the most compatible mask. Crucially, our method can be used during inference directly on top of SAM, without any additional training requirement. Quantitatively, we evaluate our method across three publicly available medical imaging datasets, and find that our approach leads to up to a 15.5\% improvement in contour segmentation accuracy compared to zero-shot SAM. Our code is available at \url{https://github.com/BenjaminTowle/SimSAM}.

* Published at ISBI 2024. Awarded Top 12 Oral Presentation

Via

Access Paper or Ask Questions

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

May 31, 2024

Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang(+4 more)

Figure 1 for MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

Figure 2 for MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

Figure 3 for MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

Figure 4 for MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

Abstract:The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be seamlessly treated as an auto-regressive problem. In this paper, we validate the Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective representation for large-scale sequential mesh modeling. After that, we present MeshXL, a family of generative pre-trained auto-regressive models, which addresses the process of 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.

Via

Access Paper or Ask Questions

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

May 24, 2024

Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, Sijia Liu

Figure 1 for Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

Figure 2 for Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

Figure 3 for Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

Figure 4 for Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

Abstract:Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks, such as the potential generation of harmful content and copyright violations. The techniques of machine unlearning, also known as concept erasing, have been developed to address these risks. However, these techniques remain vulnerable to adversarial prompt attacks, which can prompt DMs post-unlearning to regenerate undesired images containing concepts (such as nudity) meant to be erased. This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning, resulting in the robust unlearning framework referred to as AdvUnlearn. However, achieving this effectively and efficiently is highly nontrivial. First, we find that a straightforward implementation of AT compromises DMs' image generation quality post-unlearning. To address this, we develop a utility-retaining regularization on an additional retain set, optimizing the trade-off between concept erasure robustness and model utility in AdvUnlearn. Moreover, we identify the text encoder as a more suitable module for robustification compared to UNet, ensuring unlearning effectiveness. And the acquired text encoder can serve as a plug-and-play robust unlearner for various DM types. Empirically, we perform extensive experiments to demonstrate the robustness advantage of AdvUnlearn across various DM unlearning scenarios, including the erasure of nudity, objects, and style concepts. In addition to robustness, AdvUnlearn also achieves a balanced tradeoff with model utility. To our knowledge, this is the first work to systematically explore robust DM unlearning through AT, setting it apart from existing methods that overlook robustness in concept erasing. Codes are available at: https://github.com/OPTML-Group/AdvUnlearn

* Codes are available at https://github.com/OPTML-Group/AdvUnlearn

Via

Access Paper or Ask Questions

LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

May 24, 2024

Jing Li, Zhijie Sun, Dachao Lin, Xuan He, Yi Lin, Binfan Zheng, Li Zeng, Rongqian Zhao, Xin Chen

Abstract:Mixture-of-Experts (MoE) architectures have recently gained increasing popularity within the domain of large language models (LLMs) due to their ability to significantly reduce training and inference overhead. However, MoE architectures face challenges, such as significant disparities in the number of tokens assigned to each expert and a tendency toward homogenization among experts, which adversely affects the model's semantic generation capabilities. In this paper, we introduce LocMoE+, a refined version of the low-overhead LocMoE, incorporating the following enhancements: (1) Quantification and definition of the affinity between experts and tokens. (2) Implementation of a global-level adaptive routing strategy to rearrange tokens based on their affinity scores. (3) Reestimation of the lower bound for expert capacity, which has been shown to progressively decrease as the token feature distribution evolves. Experimental results demonstrate that, without compromising model convergence or efficacy, the number of tokens each expert processes can be reduced by over 60%. Combined with communication optimizations, this leads to an average improvement in training efficiency ranging from 5.4% to 46.6%. After fine-tuning, LocMoE+ exhibits a performance improvement of 9.7% to 14.1% across the GDAD, C-Eval, and TeleQnA datasets.

Via

Access Paper or Ask Questions

MrRegNet: Multi-resolution Mask Guided Convolutional Neural Network for Medical Image Registration with Large Deformations

May 16, 2024

Ruizhe Li, Grazziela Figueredo, Dorothee Auer, Christian Wagner, Xin Chen

Figure 1 for MrRegNet: Multi-resolution Mask Guided Convolutional Neural Network for Medical Image Registration with Large Deformations

Figure 2 for MrRegNet: Multi-resolution Mask Guided Convolutional Neural Network for Medical Image Registration with Large Deformations

Figure 3 for MrRegNet: Multi-resolution Mask Guided Convolutional Neural Network for Medical Image Registration with Large Deformations

Figure 4 for MrRegNet: Multi-resolution Mask Guided Convolutional Neural Network for Medical Image Registration with Large Deformations

Abstract:Deformable image registration (alignment) is highly sought after in numerous clinical applications, such as computer aided diagnosis and disease progression analysis. Deep Convolutional Neural Network (DCNN)-based image registration methods have demonstrated advantages in terms of registration accuracy and computational speed. However, while most methods excel at global alignment, they often perform worse in aligning local regions. To address this challenge, this paper proposes a mask-guided encoder-decoder DCNN-based image registration method, named as MrRegNet. This approach employs a multi-resolution encoder for feature extraction and subsequently estimates multi-resolution displacement fields in the decoder to handle the substantial deformation of images. Furthermore, segmentation masks are employed to direct the model's attention toward aligning local regions. The results show that the proposed method outperforms traditional methods like Demons and a well-known deep learning method, VoxelMorph, on a public 3D brain MRI dataset (OASIS) and a local 2D brain MRI dataset with large deformations. Importantly, the image alignment accuracies are significantly improved at local regions guided by segmentation masks. Github link:https://github.com/ruizhe-l/MrRegNet.

* Accepted for publication at IEEE International Symposium on Biomedical Imaging (ISBI) 2024

Via

Access Paper or Ask Questions