Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yong Liu

Renmin University of China

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Oct 29, 2024

Yang Zhou, Tan Li Hui Faith, Yanyu Xu, Sicong Leng, Xinxing Xu, Yong Liu, Rick Siow Mong Goh

Figure 1 for BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Figure 2 for BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Figure 3 for BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Figure 4 for BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Abstract:Medical Vision-Language Pretraining (MedVLP) shows promise in learning generalizable and transferable visual representations from paired and unpaired medical images and reports. MedVLP can provide useful features to downstream tasks and facilitate adapting task-specific models to new setups using fewer examples. However, existing MedVLP methods often differ in terms of datasets, preprocessing, and finetuning implementations. This pose great challenges in evaluating how well a MedVLP method generalizes to various clinically-relevant tasks due to the lack of unified, standardized, and comprehensive benchmark. To fill this gap, we propose BenchX, a unified benchmark framework that enables head-to-head comparison and systematical analysis between MedVLP methods using public chest X-ray datasets. Specifically, BenchX is composed of three components: 1) Comprehensive datasets covering nine datasets and four medical tasks; 2) Benchmark suites to standardize data preprocessing, train-test splits, and parameter selection; 3) Unified finetuning protocols that accommodate heterogeneous MedVLP methods for consistent task adaptation in classification, segmentation, and report generation, respectively. Utilizing BenchX, we establish baselines for nine state-of-the-art MedVLP methods and found that the performance of some early MedVLP methods can be enhanced to surpass more recent ones, prompting a revisiting of the developments and conclusions from prior works in MedVLP. Our code are available at https://github.com/yangzhou12/BenchX.

* Accepted to NeurIPS24 Datasets and Benchmarks Track

Via

Access Paper or Ask Questions

Enhancing CTR Prediction in Recommendation Domain with Search Query Representation

Oct 28, 2024

Yuening Wang, Man Chen, Yaochen Hu, Wei Guo, Yingxue Zhang, Huifeng Guo, Yong Liu, Mark Coates

Figure 1 for Enhancing CTR Prediction in Recommendation Domain with Search Query Representation

Figure 2 for Enhancing CTR Prediction in Recommendation Domain with Search Query Representation

Figure 3 for Enhancing CTR Prediction in Recommendation Domain with Search Query Representation

Figure 4 for Enhancing CTR Prediction in Recommendation Domain with Search Query Representation

Abstract:Many platforms, such as e-commerce websites, offer both search and recommendation services simultaneously to better meet users' diverse needs. Recommendation services suggest items based on user preferences, while search services allow users to search for items before providing recommendations. Since users and items are often shared between the search and recommendation domains, there is a valuable opportunity to enhance the recommendation domain by leveraging user preferences extracted from the search domain. Existing approaches either overlook the shift in user intention between these domains or fail to capture the significant impact of learning from users' search queries on understanding their interests. In this paper, we propose a framework that learns from user search query embeddings within the context of user preferences in the recommendation domain. Specifically, user search query sequences from the search domain are used to predict the items users will click at the next time point in the recommendation domain. Additionally, the relationship between queries and items is explored through contrastive learning. To address issues of data sparsity, the diffusion model is incorporated to infer positive items the user will select after searching with certain queries in a denoising manner, which is particularly effective in preventing false positives. Effectively extracting this information, the queries are integrated into click-through rate prediction in the recommendation domain. Experimental analysis demonstrates that our model outperforms state-of-the-art models in the recommendation domain.

* CIKM (2024) 2462-2471
* Accepted by CIKM 2024 Full Research Track

Via

Access Paper or Ask Questions

Enhancing Community Vision Screening -- AI Driven Retinal Photography for Early Disease Detection and Patient Trust

Oct 27, 2024

Xiaofeng Lei, Yih-Chung Tham, Jocelyn Hui Lin Goh, Yangqin Feng, Yang Bai, Zhi Da Soh, Rick Siow Mong Goh, Xinxing Xu, Yong Liu, Ching-Yu Cheng

Abstract:Community vision screening plays a crucial role in identifying individuals with vision loss and preventing avoidable blindness, particularly in rural communities where access to eye care services is limited. Currently, there is a pressing need for a simple and efficient process to screen and refer individuals with significant eye disease-related vision loss to tertiary eye care centers for further care. An ideal solution should seamlessly and readily integrate with existing workflows, providing comprehensive initial screening results to service providers, thereby enabling precise patient referrals for timely treatment. This paper introduces the Enhancing Community Vision Screening (ECVS) solution, which addresses the aforementioned concerns with a novel and feasible solution based on simple, non-invasive retinal photography for the detection of pathology-based visual impairment. Our study employs four distinct deep learning models: RETinal photo Quality Assessment (RETQA), Pathology Visual Impairment detection (PVI), Eye Disease Diagnosis (EDD) and Visualization of Lesion Regions of the eye (VLR). We conducted experiments on over 10 datasets, totaling more than 80,000 fundus photos collected from various sources. The models integrated into ECVS achieved impressive AUC scores of 0.98 for RETQA, 0.95 for PVI, and 0.90 for EDD, along with a DICE coefficient of 0.48 for VLR. These results underscore the promising capabilities of ECVS as a straightforward and scalable method for community-based vision screening.

* 11 pages, 4 figures, published in MICCAI2024 OMIA XI workshop

Via

Access Paper or Ask Questions

3D Distance-color-coded Assessment of PCI Stent Apposition via Deep-learning-based Three-dimensional Multi-object Segmentation

Oct 26, 2024

Xiaoyang Qin, Hao Huang, Shuaichen Lin, Xinhao Zeng, Kaizhi Cao, Renxiong Wu, Yuming Huang, Junqing Yang, Yong Liu, Gang Li(+1 more)

Abstract:Coronary artery disease poses a significant global health challenge, often necessitating percutaneous coronary intervention (PCI) with stent implantation. Assessing stent apposition holds pivotal importance in averting and identifying PCI complications that lead to in-stent restenosis. Here we proposed a novel three-dimensional (3D) distance-color-coded assessment (DccA)for PCI stent apposition via deep-learning-based 3D multi-object segmentation in intravascular optical coherence tomography (IV-OCT). Our proposed 3D DccA accurately segments 3D vessel lumens and stents in IV-OCT images, using a spatial matching network and dual-layer training with style transfer. It quantifies and maps stent-lumen distances into a 3D color space, facilitating 3D visual assessment of PCI stent apposition. Achieving over 95% segmentation precision, our proposed DccA enhances clinical evaluation of PCI stent deployment and supports personalized treatment planning.

Via

Access Paper or Ask Questions

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Oct 24, 2024

Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong Liu, Feng Tian, Guang Dai, Jingdong Wang, Qianying Wang

Figure 1 for Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Figure 2 for Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Figure 3 for Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Figure 4 for Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Abstract:Text-guided diffusion models have significantly advanced image editing, enabling high-quality and diverse modifications driven by text prompts. However, effective editing requires inverting the source image into a latent space, a process often hindered by prediction errors inherent in DDIM inversion. These errors accumulate during the diffusion process, resulting in inferior content preservation and edit fidelity, especially with conditional inputs. We address these challenges by investigating the primary contributors to error accumulation in DDIM inversion and identify the singularity problem in traditional noise schedules as a key issue. To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing. This schedule reduces noise prediction errors, enabling more faithful editing that preserves the original content of the source image. Our approach requires no additional retraining and is compatible with various existing editing methods. Experiments across eight editing tasks demonstrate the Logistic Schedule's superior performance in content preservation and edit fidelity compared to traditional noise schedules, highlighting its adaptability and effectiveness.

* Accepted in NeurIPS 2024

Via

Access Paper or Ask Questions

DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models

Oct 22, 2024

Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao

Figure 1 for DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models

Figure 2 for DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models

Figure 3 for DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models

Figure 4 for DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models

Abstract:Ensuring awareness of fairness and privacy in Large Language Models (LLMs) is critical. Interestingly, we discover a counter-intuitive trade-off phenomenon that enhancing an LLM's privacy awareness through Supervised Fine-Tuning (SFT) methods significantly decreases its fairness awareness with thousands of samples. To address this issue, inspired by the information theory, we introduce a training-free method to \textbf{DEA}ctivate the fairness and privacy coupled \textbf{N}eurons (\textbf{DEAN}), which theoretically and empirically decrease the mutual information between fairness and privacy awareness. Extensive experimental results demonstrate that DEAN eliminates the trade-off phenomenon and significantly improves LLMs' fairness and privacy awareness simultaneously, \eg improving Qwen-2-7B-Instruct's fairness awareness by 12.2\% and privacy awareness by 14.0\%. More crucially, DEAN remains robust and effective with limited annotated data or even when only malicious fine-tuning data is available, whereas SFT methods may fail to perform properly in such scenarios. We hope this study provides valuable insights into concurrently addressing fairness and privacy concerns in LLMs and can be integrated into comprehensive frameworks to develop more ethical and responsible AI systems. Our code is available at \url{https://github.com/ChnQ/DEAN}.

Via

Access Paper or Ask Questions

Hierarchical Search-Based Cooperative Motion Planning

Oct 21, 2024

Yuchen Wu, Yifan Yang, Gang Xu, Junjie Cao, Yansong Chen, Licheng Wen, Yong Liu

Figure 1 for Hierarchical Search-Based Cooperative Motion Planning

Figure 2 for Hierarchical Search-Based Cooperative Motion Planning

Figure 3 for Hierarchical Search-Based Cooperative Motion Planning

Figure 4 for Hierarchical Search-Based Cooperative Motion Planning

Abstract:Cooperative path planning, a crucial aspect of multi-agent systems research, serves a variety of sectors, including military, agriculture, and industry. Many existing algorithms, however, come with certain limitations, such as simplified kinematic models and inadequate support for multiple group scenarios. Focusing on the planning problem associated with a nonholonomic Ackermann model for Unmanned Ground Vehicles (UGV), we propose a leaderless, hierarchical Search-Based Cooperative Motion Planning (SCMP) method. The high-level utilizes a binary conflict search tree to minimize runtime, while the low-level fabricates kinematically feasible, collision-free paths that are shape-constrained. Our algorithm can adapt to scenarios featuring multiple groups with different shapes, outlier agents, and elaborate obstacles. We conduct algorithm comparisons, performance testing, simulation, and real-world testing, verifying the effectiveness and applicability of our algorithm. The implementation of our method will be open-sourced at https://github.com/WYCUniverStar/SCMP.

Via

Access Paper or Ask Questions

REEF: Representation Encoding Fingerprints for Large Language Models

Oct 18, 2024

Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao

Figure 1 for REEF: Representation Encoding Fingerprints for Large Language Models

Figure 2 for REEF: Representation Encoding Fingerprints for Large Language Models

Figure 3 for REEF: Representation Encoding Fingerprints for Large Language Models

Figure 4 for REEF: Representation Encoding Fingerprints for Large Language Models

Abstract:Protecting the intellectual property of open-source Large Language Models (LLMs) is very important, because training LLMs costs extensive computational resources and data. Therefore, model owners and third parties need to identify whether a suspect model is a subsequent development of the victim model. To this end, we propose a training-free REEF to identify the relationship between the suspect and victim models from the perspective of LLMs' feature representations. Specifically, REEF computes and compares the centered kernel alignment similarity between the representations of a suspect model and a victim model on the same samples. This training-free REEF does not impair the model's general capabilities and is robust to sequential fine-tuning, pruning, model merging, and permutations. In this way, REEF provides a simple and effective way for third parties and models' owners to protect LLMs' intellectual property together. The code is available at https://github.com/tmylla/REEF.

Via

Access Paper or Ask Questions

Stability and Sharper Risk Bounds with Convergence Rate $O(1/n^2)$

Oct 13, 2024

Bowei Zhu, Shaojie Li, Yong Liu

Abstract:The sharpest known high probability excess risk bounds are up to $O\left( 1/n \right)$ for empirical risk minimization and projected gradient descent via algorithmic stability (Klochkov \& Zhivotovskiy, 2021). In this paper, we show that high probability excess risk bounds of order up to $O\left( 1/n^2 \right)$ are possible. We discuss how high probability excess risk bounds reach $O\left( 1/n^2 \right)$ under strongly convexity, smoothness and Lipschitz continuity assumptions for empirical risk minimization, projected gradient descent and stochastic gradient descent. Besides, to the best of our knowledge, our high probability results on the generalization gap measured by gradients for nonconvex problems are also the sharpest.

Via

Access Paper or Ask Questions

MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Oct 12, 2024

Xi Jiang, Jian Li, Hanqiu Deng, Yong Liu, Bin-Bin Gao, Yifeng Zhou, Jialin Li, Chengjie Wang, Feng Zheng

Figure 1 for MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Figure 2 for MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Figure 3 for MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Figure 4 for MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Abstract:In the field of industrial inspection, Multimodal Large Language Models (MLLMs) have a high potential to renew the paradigms in practical applications due to their robust language capabilities and generalization abilities. However, despite their impressive problem-solving skills in many domains, MLLMs' ability in industrial anomaly detection has not been systematically studied. To bridge this gap, we present MMAD, the first-ever full-spectrum MLLMs benchmark in industrial Anomaly Detection. We defined seven key subtasks of MLLMs in industrial inspection and designed a novel pipeline to generate the MMAD dataset with 39,672 questions for 8,366 industrial images. With MMAD, we have conducted a comprehensive, quantitative evaluation of various state-of-the-art MLLMs. The commercial models performed the best, with the average accuracy of GPT-4o models reaching 74.9%. However, this result falls far short of industrial requirements. Our analysis reveals that current MLLMs still have significant room for improvement in answering questions related to industrial anomalies and defects. We further explore two training-free performance enhancement strategies to help models improve in industrial scenarios, highlighting their promising potential for future research.

* The code and data are available at https://github.com/jam-cc/MMAD

Via

Access Paper or Ask Questions