Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rui Xu

PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

Aug 13, 2024

Xiaomin Wu, Rui Xu, Pengchen Wei, Wenkang Qin, Peixiang Huang, Ziheng Li, Lin Luo

Figure 1 for PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

Figure 2 for PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

Figure 3 for PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

Figure 4 for PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

Abstract:Pathological diagnosis remains the definitive standard for identifying tumors. The rise of multimodal large models has simplified the process of integrating image analysis with textual descriptions. Despite this advancement, the substantial costs associated with training and deploying these complex multimodal models, together with a scarcity of high-quality training datasets, create a significant divide between cutting-edge technology and its application in the clinical setting. We had meticulously compiled a dataset of approximately 45,000 cases, covering over 6 different tasks, including the classification of organ tissues, generating pathology report descriptions, and addressing pathology-related questions and answers. We have fine-tuned multimodal large models, specifically LLaVA, Qwen-VL, InternLM, with this dataset to enhance instruction-based performance. We conducted a qualitative assessment of the capabilities of the base model and the fine-tuned model in performing image captioning and classification tasks on the specific dataset. The evaluation results demonstrate that the fine-tuned model exhibits proficiency in addressing typical pathological questions. We hope that by making both our models and datasets publicly available, they can be valuable to the medical and research communities.

* 10 pages, 2 figures

Via

Access Paper or Ask Questions

Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

Jul 25, 2024

Jiacheng Wang, Hao Li, Dewei Hu, Rui Xu, Xing Yao, Yuankai K. Tao, Ipek Oguz

Figure 1 for Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

Figure 2 for Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

Figure 3 for Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

Figure 4 for Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

Abstract:We propose a novel framework for retinal feature point alignment, designed for learning cross-modality features to enhance matching and registration across multi-modality retinal images. Our model draws on the success of previous learning-based feature detection and description methods. To better leverage unlabeled data and constrain the model to reproduce relevant keypoints, we integrate a keypoint-based segmentation task. It is trained in a self-supervised manner by enforcing segmentation consistency between different augmentations of the same image. By incorporating a keypoint augmented self-supervised layer, we achieve robust feature extraction across modalities. Extensive evaluation on two public datasets and one in-house dataset demonstrates significant improvements in performance for modality-agnostic retinal feature alignment. Our code and model weights are publicly available at \url{https://github.com/MedICL-VU/RetinaIPA}.

Via

Access Paper or Ask Questions

MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Jul 07, 2024

Rui Xu, Dakuan Lu, Xiaoyu Tan, Xintao Wang, Siyu Yuan, Jiangjie Chen, Wei Chu, Xu Yinghui

Figure 1 for MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Figure 2 for MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Figure 3 for MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Figure 4 for MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Abstract:Large language models~(LLMs) have demonstrated impressive performance in various applications, among which role-playing language agents (RPLAs) have engaged a broad user base. Now, there is a growing demand for RPLAs that represent Key Opinion Leaders (KOLs), \ie, Internet celebrities who shape the trends and opinions in their domains. However, research in this line remains underexplored. In this paper, we hence introduce MINDECHO, a comprehensive framework for the development and evaluation of KOL RPLAs. MINDECHO collects KOL data from Internet video transcripts in various professional fields, and synthesizes their conversations leveraging GPT-4. Then, the conversations and the transcripts are used for individualized model training and inference-time retrieval, respectively. Our evaluation covers both general dimensions (\ie, knowledge and tones) and fan-centric dimensions for KOLs. Extensive experiments validate the effectiveness of MINDECHO in developing and evaluating KOL RPLAs.

Via

Access Paper or Ask Questions

Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data

Jun 27, 2024

Yiting Ran, Xintao Wang, Rui Xu, Xinfeng Yuan, Jiaqing Liang, Yanghua Xiao, Deqing Yang

Abstract:Role-playing agents (RPA) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia.While existing RPAs well portray the characters' knowledge and tones, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indicative data. Specifically, we leverage questions from psychological scales and distill advanced RPAs to generate dialogues that grasp the minds of characters. Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations. Code and data are available at \href{https://github.com/alienet1109/RolePersonality}{this URL}.

* 10pages

Via

Access Paper or Ask Questions

PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking

May 23, 2024

Chongwei Liu, Haojie Li, Zhihui Wang, Rui Xu

Figure 1 for PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking

Figure 2 for PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking

Figure 3 for PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking

Figure 4 for PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking

Abstract:Recent advances in Multi-Object Tracking (MOT) have achieved remarkable success in short-term association within the decoupled tracking-by-detection online paradigm. However, long-term tracking still remains a challenging task. Although graph-based approaches can address this issue by modeling trajectories as a graph in the decoupled manner, their non-online nature poses obstacles for real-time applications. In this paper, we demonstrate that the trajectory graph is a directed acyclic graph, which can be represented by an object sequence arranged by frame and a binary adjacency matrix. It is a coincidence that the binary matrix matches the attention mask in the Transformer, and the object sequence serves exactly as a natural input sequence. Intuitively, we propose that a pure Transformer can naturally unify short- and long-term associations in a decoupled and online manner. Our experiments show that a classic Transformer architecture naturally suits the association problem and achieves a strong baseline compared to existing foundational methods across four datasets: DanceTrack, SportsMOT, MOT17, and MOT20, as well as superior generalizability in domain shift. Moreover, the decoupled property also enables efficient training and inference. This work pioneers a promising Transformer-based approach for the MOT task, and provides code to facilitate further research. https://github.com/chongweiliu/PuTR

Via

Access Paper or Ask Questions

NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation

May 22, 2024

Qiujie Dong, Huibiao Wen, Rui Xu, Xiaokang Yu, Jiaran Zhou, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang

Figure 1 for NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation

Figure 2 for NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation

Figure 3 for NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation

Figure 4 for NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation

Abstract:Quadrilateral mesh generation plays a crucial role in numerical simulations within Computer-Aided Design and Engineering (CAD/E). The quality of the cross field is essential for generating a quadrilateral mesh. In this paper, we propose a self-supervised neural representation of the cross field, named NeurCross, comprising two modules: one to fit the signed distance function (SDF) and another to predict the cross field. Unlike most existing approaches that operate directly on the given polygonal surface, NeurCross takes the SDF as a bridge to allow for SDF overfitting and the prediction of the cross field to proceed simultaneously. By utilizing a neural SDF, we achieve a smooth representation of the base surface, minimizing the impact of piecewise planar discretization and minor surface variations. Moreover, the principal curvatures and directions are fully encoded by the Hessian of the SDF, enabling the regularization of the overall cross field through minor adjustments to the SDF. Compared to state-of-the-art methods, NeurCross significantly improves the placement of singular points and the approximation accuracy between the input triangular surface and the output quad mesh, as demonstrated in the teaser figure.

Via

Access Paper or Ask Questions

AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field

May 22, 2024

Rong Liu, Rui Xu, Yue Hu, Meida Chen, Andrew Feng

Abstract:3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost of adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. The Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. The strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we proposed a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods. More interactive demos can be found in our website (https://rongliu-leo.github.io/AtomGS/).

Via

Access Paper or Ask Questions

ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

May 22, 2024

Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang

Figure 1 for ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

Figure 2 for ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

Figure 3 for ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

Figure 4 for ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

Abstract:In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, there are additional attributes which are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training scheme of diffusion generative models, causing degraded test time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test time generation which uses insynchronized time steps for different dimensions and attributes, thus allowing for varying degrees of control over them.

Via

Access Paper or Ask Questions

A Survey on Vision Mamba: Models, Applications and Challenges

Apr 29, 2024

Rui Xu, Shu Yang, Yihui Wang, Bo Du, Hao Chen

Abstract:Mamba, a recent selective structured state space model, performs excellently on long sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional neural networks and offers advanced modeling capabilities similar to those of Transformers, through global receptive fields and dynamic weighting. Crucially, it achieves this without incurring the quadratic computational complexity typically associated with Transformers. Due to its advantages over the former two mainstream foundation models, Mamba exhibits great potential to be a visual foundation model. Researchers are actively applying Mamba to various computer vision tasks, leading to numerous emerging works. To help keep pace with the rapid advancements in computer vision, this paper aims to provide a comprehensive review of visual Mamba approaches. This paper begins by delineating the formulation of the original Mamba model. Subsequently, our review of visual Mamba delves into several representative backbone networks to elucidate the core insights of the visual Mamba. We then categorize related works using different modalities, including image, video, point cloud, multi-modal, and others. Specifically, for image applications, we further organize them into distinct tasks to facilitate a more structured discussion. Finally, we discuss the challenges and future research directions for visual Mamba, providing insights for future research in this quickly evolving area. A comprehensive list of visual Mamba models reviewed in this work is available at https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models.

Via

Access Paper or Ask Questions

From Persona to Personalization: A Survey on Role-Playing Language Agents

Apr 28, 2024

Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, Wei Shi, Jian Xie, Shuang Li, Ruihan Yang, Tinghui Zhu(+8 more)

Figure 1 for From Persona to Personalization: A Survey on Role-Playing Language Agents

Figure 2 for From Persona to Personalization: A Survey on Role-Playing Language Agents

Figure 3 for From Persona to Personalization: A Survey on Role-Playing Language Agents

Figure 4 for From Persona to Personalization: A Survey on Role-Playing Language Agents

Abstract:Recent advancements in large language models (LLMs) have significantly boosted the rise of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate assigned personas. By harnessing multiple advanced abilities of LLMs, including in-context learning, instruction following, and social intelligence, RPLAs achieve a remarkable sense of human likeness and vivid role-playing performance. RPLAs can mimic a wide range of personas, ranging from historical figures and fictional characters to real-life individuals. Consequently, they have catalyzed numerous AI applications, such as emotional companions, interactive video games, personalized assistants and copilots, and digital clones. In this paper, we conduct a comprehensive survey of this field, illustrating the evolution and recent progress in RPLAs integrating with cutting-edge LLM technologies. We categorize personas into three types: 1) Demographic Persona, which leverages statistical stereotypes; 2) Character Persona, focused on well-established figures; and 3) Individualized Persona, customized through ongoing user interactions for personalized services. We begin by presenting a comprehensive overview of current methodologies for RPLAs, followed by the details for each persona type, covering corresponding data sourcing, agent construction, and evaluation. Afterward, we discuss the fundamental risks, existing limitations, and future prospects of RPLAs. Additionally, we provide a brief review of RPLAs in AI applications, which reflects practical user demands that shape and drive RPLA research. Through this work, we aim to establish a clear taxonomy of RPLA research and applications, and facilitate future research in this critical and ever-evolving field, and pave the way for a future where humans and RPLAs coexist in harmony.

* Preprint

Via

Access Paper or Ask Questions