Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuxuan Sun

C-MASS: Combinatorial Mobility-Aware Sensor Scheduling for Collaborative Perception with Second-Order Topology Approximation

Jun 29, 2024

Yukuan Jia, Yuxuan Sun, Ruiqing Mao, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

Figure 1 for C-MASS: Combinatorial Mobility-Aware Sensor Scheduling for Collaborative Perception with Second-Order Topology Approximation

Figure 2 for C-MASS: Combinatorial Mobility-Aware Sensor Scheduling for Collaborative Perception with Second-Order Topology Approximation

Figure 3 for C-MASS: Combinatorial Mobility-Aware Sensor Scheduling for Collaborative Perception with Second-Order Topology Approximation

Figure 4 for C-MASS: Combinatorial Mobility-Aware Sensor Scheduling for Collaborative Perception with Second-Order Topology Approximation

Abstract:Collaborative Perception (CP) has been a promising solution to address occlusions in the traffic environment by sharing sensor data among collaborative vehicles (CoV) via vehicle-to-everything (V2X) network. With limited wireless bandwidth, CP necessitates task-oriented and receiver-aware sensor scheduling to prioritize important and complementary sensor data. However, due to vehicular mobility, it is challenging and costly to obtain the up-to-date perception topology, i.e., whether a combination of CoVs can jointly detect an object. In this paper, we propose a combinatorial mobility-aware sensor scheduling (C-MASS) framework for CP with minimal communication overhead. Specifically, detections are replayed with sensor data from individual CoVs and pairs of CoVs to maintain an empirical perception topology up to the second order, which approximately represents the complete perception topology. A hybrid greedy algorithm is then proposed to solve a variant of the budgeted maximum coverage problem with a worst-case performance guarantee. The C-MASS scheduling algorithm adapts the greedy algorithm by incorporating the topological uncertainty and the unexplored time of CoVs to balance exploration and exploitation, addressing the mobility challenge. Extensive numerical experiments demonstrate the near-optimality of the proposed C-MASS framework in both edge-assisted and distributed CP configurations. The weighted recall improvements over object-level CP are 5.8% and 4.2%, respectively. Compared to distance-based and area-based greedy heuristics, the gaps to the offline optimal solutions are reduced by up to 75% and 71%, respectively.

* 14 pages, 10 figures

Via

Access Paper or Ask Questions

PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Jun 28, 2024

Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Zhongyi Shui, Kai Zhang, Jingxiong Li, Xingheng Lyu, Tao Lin, Lin Yang

Figure 1 for PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Figure 2 for PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Figure 3 for PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Figure 4 for PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Abstract:Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology image-text pairs from platforms like PubMed, YouTube, and Twitter, which provide limited, unscalable data with generally suboptimal image quality. In this work, we leverage large-scale WSI datasets like TCGA to extract numerous high-quality image patches. We then train a large multimodal model to generate captions for these images, creating PathGen-1.6M, a dataset containing 1.6 million high-quality image-caption pairs. Our approach involves multiple agent models collaborating to extract representative WSI patches, generating and refining captions to obtain high-quality image-text pairs. Extensive experiments show that integrating these generated pairs with existing datasets to train a pathology-specific CLIP model, PathGen-CLIP, significantly enhances its ability to analyze pathological images, with substantial improvements across nine pathology-related zero-shot image classification tasks and three whole-slide image tasks. Furthermore, we construct 200K instruction-tuning data based on PathGen-1.6M and integrate PathGen-CLIP with the Vicuna LLM to create more powerful multimodal models through instruction tuning. Overall, we provide a scalable pathway for high-quality data generation in pathology, paving the way for next-generation general pathology models.

* 13 pages, 3 figures

Via

Access Paper or Ask Questions

Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

Jun 25, 2024

Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

Figure 1 for Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

Figure 2 for Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

Figure 3 for Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

Figure 4 for Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

Abstract:Leveraging the computing and sensing capabilities of vehicles, vehicular federated learning (VFL) has been applied to edge training for connected vehicles. The dynamic and interconnected nature of vehicular networks presents unique opportunities to harness direct vehicle-to-vehicle (V2V) communications, enhancing VFL training efficiency. In this paper, we formulate a stochastic optimization problem to optimize the VFL training performance, considering the energy constraints and mobility of vehicles, and propose a V2V-enhanced dynamic scheduling (VEDS) algorithm to solve it. The model aggregation requirements of VFL and the limited transmission time due to mobility result in a stepwise objective function, which presents challenges in solving the problem. We thus propose a derivative-based drift-plus-penalty method to convert the long-term stochastic optimization problem to an online mixed integer nonlinear programming (MINLP) problem, and provide a theoretical analysis to bound the performance gap between the online solution and the offline optimal solution. Further analysis of the scheduling priority reduces the original problem into a set of convex optimization problems, which are efficiently solved using the interior-point method. Experimental results demonstrate that compared with the state-of-the-art benchmarks, the proposed algorithm enhances the image classification accuracy on the CIFAR-10 dataset by 3.18% and reduces the average displacement errors on the Argoverse trajectory prediction dataset by 10.21%.

* Submitted to IEEE for possible publication

Via

Access Paper or Ask Questions

Task-Oriented Wireless Communications for Collaborative Perception in Intelligent Unmanned Systems

Jun 05, 2024

Sheng Zhou, Yukuan Jia, Ruiqing Mao, Zhaojun Nan, Yuxuan Sun, Zhisheng Niu

Figure 1 for Task-Oriented Wireless Communications for Collaborative Perception in Intelligent Unmanned Systems

Figure 2 for Task-Oriented Wireless Communications for Collaborative Perception in Intelligent Unmanned Systems

Figure 3 for Task-Oriented Wireless Communications for Collaborative Perception in Intelligent Unmanned Systems

Figure 4 for Task-Oriented Wireless Communications for Collaborative Perception in Intelligent Unmanned Systems

Abstract:Collaborative Perception (CP) has shown great potential to achieve more holistic and reliable environmental perception in intelligent unmanned systems (IUSs). However, implementing CP still faces key challenges due to the characteristics of the CP task and the dynamics of wireless channels. In this article, a task-oriented wireless communication framework is proposed to jointly optimize the communication scheme and the CP procedure. We first propose channel-adaptive compression and robust fusion approaches to extract and exploit the most valuable semantic information under wireless communication constraints. We then propose a task-oriented distributed scheduling algorithm to identify the best collaborators for CP under dynamic environments. The main idea is learning while scheduling, where the collaboration utility is effectively learned with low computation and communication overhead. Case studies are carried out in connected autonomous driving scenarios to verify the proposed framework. Finally, we identify several future research directions.

* Accepted by IEEE Network Magazine

Via

Access Paper or Ask Questions

PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

Feb 05, 2024

Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li(+3 more)

Figure 1 for PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

Figure 2 for PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

Figure 3 for PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

Figure 4 for PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

Abstract:The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmark impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for LMMs. It comprises 33,573 multimodal multi-choice questions and 21,599 images from various sources, and an explanation for the correct answer accompanies each question. The construction of PathMMU capitalizes on the robust capabilities of GPT-4V, utilizing approximately 30,000 gathered image-caption pairs to generate Q\&As. Significantly, to maximize PathMMU's authority, we invite six pathologists to scrutinize each question under strict standards in PathMMU's validation and test sets, while simultaneously setting an expert-level performance benchmark for PathMMU. We conduct extensive evaluations, including zero-shot assessments of 14 open-sourced and three closed-sourced LMMs and their robustness to image corruption. We also fine-tune representative LMMs to assess their adaptability to PathMMU. The empirical findings indicate that advanced LMMs struggle with the challenging PathMMU benchmark, with the top-performing LMM, GPT-4V, achieving only a 51.7\% zero-shot performance, significantly lower than the 71.4\% demonstrated by human pathologists. After fine-tuning, even open-sourced LMMs can surpass GPT-4V with a performance of over 60\%, but still fall short of the expertise shown by pathologists. We hope that the PathMMU will offer valuable insights and foster the development of more specialized, next-generation LLMs for pathology.

* make source and method updates before resubmission

Via

Access Paper or Ask Questions

Mobility Accelerates Learning: Convergence Analysis on Hierarchical Federated Learning in Vehicular Networks

Jan 18, 2024

Tan Chen, Jintao Yan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu

Abstract:Hierarchical federated learning (HFL) enables distributed training of models across multiple devices with the help of several edge servers and a cloud edge server in a privacy-preserving manner. In this paper, we consider HFL with highly mobile devices, mainly targeting at vehicular networks. Through convergence analysis, we show that mobility influences the convergence speed by both fusing the edge data and shuffling the edge models. While mobility is usually considered as a challenge from the perspective of communication, we prove that it increases the convergence speed of HFL with edge-level heterogeneous data, since more diverse data can be incorporated. Furthermore, we demonstrate that a higher speed leads to faster convergence, since it accelerates the fusion of data. Simulation results show that mobility increases the model accuracy of HFL by up to 15.1% when training a convolutional neural network on the CIFAR-10 dataset.

* Submitted to IEEE for possible publication

Via

Access Paper or Ask Questions

Benchmarking PathCLIP for Pathology Image Analysis

Jan 05, 2024

Sunyi Zheng, Xiaonan Cui, Yuxuan Sun, Jingxiong Li, Honglin Li, Yunlong Zhang, Pingyi Chen, Xueping Jing, Zhaoxiang Ye, Lin Yang

Figure 1 for Benchmarking PathCLIP for Pathology Image Analysis

Figure 2 for Benchmarking PathCLIP for Pathology Image Analysis

Figure 3 for Benchmarking PathCLIP for Pathology Image Analysis

Figure 4 for Benchmarking PathCLIP for Pathology Image Analysis

Abstract:Accurate image classification and retrieval are of importance for clinical diagnosis and treatment decision-making. The recent contrastive language-image pretraining (CLIP) model has shown remarkable proficiency in understanding natural images. Drawing inspiration from CLIP, PathCLIP is specifically designed for pathology image analysis, utilizing over 200,000 image and text pairs in training. While the performance the PathCLIP is impressive, its robustness under a wide range of image corruptions remains unknown. Therefore, we conduct an extensive evaluation to analyze the performance of PathCLIP on various corrupted images from the datasets of Osteosarcoma and WSSS4LUAD. In our experiments, we introduce seven corruption types including brightness, contrast, Gaussian blur, resolution, saturation, hue, and markup at four severity levels. Through experiments, we find that PathCLIP is relatively robustness to image corruptions and surpasses OpenAI-CLIP and PLIP in zero-shot classification. Among the seven corruptions, blur and resolution can cause server performance degradation of the PathCLIP. This indicates that ensuring the quality of images is crucial before conducting a clinical test. Additionally, we assess the robustness of PathCLIP in the task of image-image retrieval, revealing that PathCLIP performs less effectively than PLIP on Osteosarcoma but performs better on WSSS4LUAD under diverse corruptions. Overall, PathCLIP presents impressive zero-shot classification and retrieval performance for pathology images, but appropriate care needs to be taken when using it. We hope this study provides a qualitative impression of PathCLIP and helps understand its differences from other CLIP models.

Via

Access Paper or Ask Questions

MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following

Dec 05, 2023

Renze Lou, Kai Zhang, Jian Xie, Yuxuan Sun, Janice Ahn, Hanzi Xu, Yu Su, Wenpeng Yin

Figure 1 for MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following

Figure 2 for MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following

Figure 3 for MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following

Figure 4 for MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following

Abstract:In the realm of large language models (LLMs), enhancing instruction-following capability often involves curating expansive training data. This is achieved through two primary schemes: i) Scaling-Inputs: Amplifying (input, output) pairs per task instruction, aiming for better instruction adherence. ii) Scaling Input-Free Tasks: Enlarging tasks, each composed of an (instruction, output) pair (without requiring a separate input anymore). However, LLMs under Scaling-Inputs tend to be overly sensitive to inputs, leading to misinterpretation or non-compliance with instructions. Conversely, Scaling Input-Free Tasks demands a substantial number of tasks but is less effective in instruction following when dealing with instances in Scaling-Inputs. This work introduces MUFFIN, a new scheme of instruction-following dataset curation. Specifically, we automatically Scale Tasks per Input by diversifying these tasks with various input facets. Experimental results across four zero-shot benchmarks, spanning both Scaling-Inputs and Scaling Input-Free Tasks schemes, reveal that LLMs, at various scales, trained on MUFFIN generally demonstrate superior instruction-following capabilities compared to those trained on the two aforementioned schemes.

* Website: https://renzelou.github.io/Muffin/

Via

Access Paper or Ask Questions

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Nov 27, 2023

Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun(+12 more)

Figure 1 for MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Figure 2 for MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Figure 3 for MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Figure 4 for MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Abstract:We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. Our evaluation of 14 open-source LMMs and the proprietary GPT-4V(ision) highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V only achieves a 56% accuracy, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.

* 115 pages, 99 figures

Via

Access Paper or Ask Questions

Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Nov 27, 2023

Zhongyi Shui, Yunlong Zhang, Kai Yao, Chenglu Zhu, Yuxuan Sun, Lin Yang

Abstract:Nuclear instance segmentation in histology images is crucial for a broad spectrum of clinical applications. Current prevailing nuclear instance segmentation algorithms rely on regression of nuclei contours, distance maps, watershed markers or a proxy nuclear representation of star-convex polygons. Consequently, these methods necessitate sophisticated post-processing operations to distinguish nuclei instances, which are commonly acknowledged to be error-prone and parameter-sensitive. Recently, the segment anything model (SAM) has earned attracted huge attention within the domain of medical image segmentation due to its impressive generalization ability and promptable property. Nevertheless, its potential on nuclear instance segmentation remains largely underexplored. In this paper, we present a novel prompt-driven framework that consists of a point prompter and a SAM for automatic nuclei instance segmentation. Specifically, the prompter learns to generate a unique point prompt for each nucleus while the SAM is fine tuned to output the corresponding mask of the cued nucleus. Furthermore, we propose to add adjacent nuclei as negative prompts to promote the model's ability to recognize overlapping nuclei. Without bells and whistles, our proposed method sets a new state-of-the-art performance on three challenging benchmarks. Our code is available at \textcolor{magenta}{\url{https://github.com/windygoo/PromptNucSeg}} .

Via

Access Paper or Ask Questions