Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yun Li

SAM-Swin: SAM-Driven Dual-Swin Transformers with Adaptive Lesion Enhancement for Laryngo-Pharyngeal Tumor Detection

Oct 29, 2024

Jia Wei, Yun Li, Xiaomao Fan, Wenjun Ma, Meiyu Qiu, Hongyu Chen, Wenbin Lei

Abstract:Laryngo-pharyngeal cancer (LPC) is a highly lethal malignancy in the head and neck region. Recent advancements in tumor detection, particularly through dual-branch network architectures, have significantly improved diagnostic accuracy by integrating global and local feature extraction. However, challenges remain in accurately localizing lesions and fully capitalizing on the complementary nature of features within these branches. To address these issues, we propose SAM-Swin, an innovative SAM-driven Dual-Swin Transformer for laryngo-pharyngeal tumor detection. This model leverages the robust segmentation capabilities of the Segment Anything Model 2 (SAM2) to achieve precise lesion segmentation. Meanwhile, we present a multi-scale lesion-aware enhancement module (MS-LAEM) designed to adaptively enhance the learning of nuanced complementary features across various scales, improving the quality of feature extraction and representation. Furthermore, we implement a multi-scale class-aware guidance (CAG) loss that delivers multi-scale targeted supervision, thereby enhancing the model's capacity to extract class-specific features. To validate our approach, we compiled three LPC datasets from the First Affiliated Hospital (FAHSYSU), the Sixth Affiliated Hospital (SAHSYSU) of Sun Yat-sen University, and Nanfang Hospital of Southern Medical University (NHSMU). The FAHSYSU dataset is utilized for internal training, while the SAHSYSU and NHSMU datasets serve for external evaluation. Extensive experiments demonstrate that SAM-Swin outperforms state-of-the-art methods, showcasing its potential for advancing LPC detection and improving patient outcomes. The source code of SAM-Swin is available at the URL of \href{https://github.com/VVJia/SAM-Swin}{https://github.com/VVJia/SAM-Swin}.

Via

Access Paper or Ask Questions

Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

Oct 15, 2024

Jinyu Liu, Hongye Guo, Yun Li, Qinghu Tang, Fuquan Huang, Tunan Chen, Haiwang Zhong, Qixin Chen

Figure 1 for Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

Figure 2 for Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

Figure 3 for Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

Figure 4 for Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

Abstract:Over the past decade, bidding in power markets has attracted widespread attention. Reinforcement Learning (RL) has been widely used for power market bidding as a powerful AI tool to make decisions under real-world uncertainties. However, current RL methods mostly employ low dimensional bids, which significantly diverge from the N price-power pairs commonly used in the current power markets. The N-pair bidding format is denoted as High Dimensional Bids (HDBs), which has not been fully integrated into the existing RL-based bidding methods. The loss of flexibility in current RL bidding methods could greatly limit the bidding profits and make it difficult to tackle the rising uncertainties brought by renewable energy generations. In this paper, we intend to propose a framework to fully utilize HDBs for RL-based bidding methods. First, we employ a special type of neural network called Neural Network Supply Functions (NNSFs) to generate HDBs in the form of N price-power pairs. Second, we embed the NNSF into a Markov Decision Process (MDP) to make it compatible with most existing RL methods. Finally, experiments on Energy Storage Systems (ESSs) in the PJM Real-Time (RT) power market show that the proposed bidding method with HDBs can significantly improve bidding flexibility, thereby improving the profit of the state-of-the-art RL bidding methods.

Via

Access Paper or Ask Questions

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Oct 09, 2024

Qi Guo, Zhen Tian, Minghao Yao, Yong Qi, Saiyu Qi, Yun Li, Jin Song Dong

Figure 1 for Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Figure 2 for Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Figure 3 for Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Figure 4 for Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Abstract:Federated Unlearning (FU) enables clients to selectively remove the influence of specific data from a trained federated learning model, addressing privacy concerns and regulatory requirements. However, existing FU methods often struggle to balance effective erasure with model utility preservation, especially for class-level unlearning in non-IID settings. We propose Federated Unlearning via Class-aware Representation Transformation (FUCRT), a novel method that achieves unlearning through class-aware representation transformation. FUCRT employs two key components: (1) a transformation class selection strategy to identify optimal forgetting directions, and (2) a transformation alignment technique using dual class-aware contrastive learning to ensure consistent transformations across clients. Extensive experiments on four datasets demonstrate FUCRT's superior performance in terms of erasure guarantee, model utility preservation, and efficiency. FUCRT achieves complete (100\%) erasure of unlearning classes while maintaining or improving performance on remaining classes, outperforming state-of-the-art baselines across both IID and Non-IID settings. Analysis of the representation space reveals FUCRT's ability to effectively merge unlearning class representations with the transformation class from remaining classes, closely mimicking the model retrained from scratch.

Via

Access Paper or Ask Questions

3D-LSPTM: An Automatic Framework with 3D-Large-Scale Pretrained Model for Laryngeal Cancer Detection Using Laryngoscopic Videos

Sep 02, 2024

Meiyu Qiu, Yun Li, Wenjun Huang, Haoyun Zhang, Weiping Zheng, Wenbin Lei, Xiaomao Fan

Abstract:Laryngeal cancer is a malignant disease with a high morality rate in otorhinolaryngology, posing an significant threat to human health. Traditionally larygologists manually visual-inspect laryngeal cancer in laryngoscopic videos, which is quite time-consuming and subjective. In this study, we propose a novel automatic framework via 3D-large-scale pretrained models termed 3D-LSPTM for laryngeal cancer detection. Firstly, we collect 1,109 laryngoscopic videos from the First Affiliated Hospital Sun Yat-sen University with the approval of the Ethics Committee. Then we utilize the 3D-large-scale pretrained models of C3D, TimeSformer, and Video-Swin-Transformer, with the merit of advanced featuring videos, for laryngeal cancer detection with fine-tuning techniques. Extensive experiments show that our proposed 3D-LSPTM can achieve promising performance on the task of laryngeal cancer detection. Particularly, 3D-LSPTM with the backbone of Video-Swin-Transformer can achieve 92.4% accuracy, 95.6% sensitivity, 94.1% precision, and 94.8% F_1.

Via

Access Paper or Ask Questions

SAM-FNet: SAM-Guided Fusion Network for Laryngo-Pharyngeal Tumor Detection

Aug 15, 2024

Jia Wei, Yun Li, Meiyu Qiu, Hongyu Chen, Xiaomao Fan, Wenbin Lei

Abstract:Laryngo-pharyngeal cancer (LPC) is a highly fatal malignant disease affecting the head and neck region. Previous studies on endoscopic tumor detection, particularly those leveraging dual-branch network architectures, have shown significant advancements in tumor detection. These studies highlight the potential of dual-branch networks in improving diagnostic accuracy by effectively integrating global and local (lesion) feature extraction. However, they are still limited in their capabilities to accurately locate the lesion region and capture the discriminative feature information between the global and local branches. To address these issues, we propose a novel SAM-guided fusion network (SAM-FNet), a dual-branch network for laryngo-pharyngeal tumor detection. By leveraging the powerful object segmentation capabilities of the Segment Anything Model (SAM), we introduce the SAM into the SAM-FNet to accurately segment the lesion region. Furthermore, we propose a GAN-like feature optimization (GFO) module to capture the discriminative features between the global and local branches, enhancing the fusion feature complementarity. Additionally, we collect two LPC datasets from the First Affiliated Hospital (FAHSYSU) and the Sixth Affiliated Hospital (SAHSYSU) of Sun Yat-sen University. The FAHSYSU dataset is used as the internal dataset for training the model, while the SAHSYSU dataset is used as the external dataset for evaluating the model's performance. Extensive experiments on both datasets of FAHSYSU and SAHSYSU demonstrate that the SAM-FNet can achieve competitive results, outperforming the state-of-the-art counterparts. The source code of SAM-FNet is available at the URL of https://github.com/VVJia/SAM-FNet.

Via

Access Paper or Ask Questions

Parallel Speculative Decoding with Adaptive Draft Length

Aug 13, 2024

Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu

Figure 1 for Parallel Speculative Decoding with Adaptive Draft Length

Figure 2 for Parallel Speculative Decoding with Adaptive Draft Length

Figure 3 for Parallel Speculative Decoding with Adaptive Draft Length

Figure 4 for Parallel Speculative Decoding with Adaptive Draft Length

Abstract:Speculative decoding (SD), where an extra draft model is employed to provide multiple \textit{draft} tokens first and then the original target model verifies these tokens in parallel, has shown great power for LLM inference acceleration. However, existing SD methods suffer from the mutual waiting problem, i.e., the target model gets stuck when the draft model is \textit{guessing} tokens, and vice versa. This problem is directly incurred by the asynchronous execution of the draft model and the target model, and is exacerbated due to the fixed draft length in speculative decoding. To address these challenges, we propose a conceptually simple, flexible, and general framework to boost speculative decoding, namely \textbf{P}arallel sp\textbf{E}culative decoding with \textbf{A}daptive d\textbf{R}aft \textbf{L}ength (PEARL). Specifically, PEARL proposes \textit{pre-verify} to verify the first draft token in advance during the drafting phase, and \textit{post-verify} to generate more draft tokens during the verification phase. PEARL parallels the drafting phase and the verification phase via applying the two strategies, and achieves adaptive draft length for different scenarios, which effectively alleviates the mutual waiting problem. Moreover, we theoretically demonstrate that the mean accepted tokens of PEARL is more than existing \textit{draft-then-verify} works. Experiments on various text generation benchmarks demonstrate the effectiveness of our \name, leading to a superior speedup performance up to \textbf{3.79$\times$} and \textbf{1.52$\times$}, compared to auto-regressive decoding and vanilla speculative decoding, respectively.

Via

Access Paper or Ask Questions

Large Language Models for Human-like Autonomous Driving: A Survey

Jul 27, 2024

Yun Li, Kai Katsumata, Ehsan Javanmardi, Manabu Tsukada

Figure 1 for Large Language Models for Human-like Autonomous Driving: A Survey

Figure 2 for Large Language Models for Human-like Autonomous Driving: A Survey

Figure 3 for Large Language Models for Human-like Autonomous Driving: A Survey

Figure 4 for Large Language Models for Human-like Autonomous Driving: A Survey

Abstract:Large Language Models (LLMs), AI models trained on massive text corpora with remarkable language understanding and generation capabilities, are transforming the field of Autonomous Driving (AD). As AD systems evolve from rule-based and optimization-based methods to learning-based techniques like deep reinforcement learning, they are now poised to embrace a third and more advanced category: knowledge-based AD empowered by LLMs. This shift promises to bring AD closer to human-like AD. However, integrating LLMs into AD systems poses challenges in real-time inference, safety assurance, and deployment costs. This survey provides a comprehensive and critical review of recent progress in leveraging LLMs for AD, focusing on their applications in modular AD pipelines and end-to-end AD systems. We highlight key advancements, identify pressing challenges, and propose promising research directions to bridge the gap between LLMs and AD, thereby facilitating the development of more human-like AD systems. The survey first introduces LLMs' key features and common training schemes, then delves into their applications in modular AD pipelines and end-to-end AD, respectively, followed by discussions on open challenges and future directions. Through this in-depth analysis, we aim to provide insights and inspiration for researchers and practitioners working at the intersection of AI and autonomous vehicles, ultimately contributing to safer, smarter, and more human-centric AD technologies.

* 8 pages, 2 figures, accepted at IEEE Intelligent Transportation Systems Conference (ITSC) 2024

Via

Access Paper or Ask Questions

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Jul 06, 2024

Peng Xia, Kangyu Zhu, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, Huaxiu Yao

Figure 1 for RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Figure 2 for RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Figure 3 for RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Figure 4 for RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Abstract:The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts for generation. We demonstrate the effectiveness of RULE on three medical VQA datasets, achieving an average improvement of 20.8% in factual accuracy. We publicly release our benchmark and code in https://github.com/richard-peng-xia/RULE.

Via

Access Paper or Ask Questions

Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey

Jul 05, 2024

Han Wang, Yuman Nie, Yun Li, Hongjie Liu, Min Liu, Wen Cheng, Yaoxiong Wang

Figure 1 for Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey

Figure 2 for Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey

Figure 3 for Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey

Figure 4 for Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey

Abstract:Event-based cameras, inspired by the biological retina, have evolved into cutting-edge sensors distinguished by their minimal power requirements, negligible latency, superior temporal resolution, and expansive dynamic range. At present, cameras used for pedestrian detection are mainly frame-based imaging sensors, which have suffered from lethargic response times and hefty data redundancy. In contrast, event-based cameras address these limitations by eschewing extraneous data transmissions and obviating motion blur in high-speed imaging scenarios. On pedestrian detection via event-based cameras, this paper offers an exhaustive review of research and applications particularly in the autonomous driving context. Through methodically scrutinizing relevant literature, the paper outlines the foundational principles, developmental trajectory, and the comparative merits and demerits of eventbased detection relative to traditional frame-based methodologies. This review conducts thorough analyses of various event stream inputs and their corresponding network models to evaluate their applicability across diverse operational environments. It also delves into pivotal elements such as crucial datasets and data acquisition techniques essential for advancing this technology, as well as advanced algorithms for processing event stream data. Culminating with a synthesis of the extant landscape, the review accentuates the unique advantages and persistent challenges inherent in event-based pedestrian detection, offering a prognostic view on potential future developments in this fast-progressing field.

Via

Access Paper or Ask Questions

PolygonGNN: Representation Learning for Polygonal Geometries with Heterogeneous Visibility Graph

Jun 30, 2024

Dazhou Yu, Yuntong Hu, Yun Li, Liang Zhao

Figure 1 for PolygonGNN: Representation Learning for Polygonal Geometries with Heterogeneous Visibility Graph

Figure 2 for PolygonGNN: Representation Learning for Polygonal Geometries with Heterogeneous Visibility Graph

Figure 3 for PolygonGNN: Representation Learning for Polygonal Geometries with Heterogeneous Visibility Graph

Figure 4 for PolygonGNN: Representation Learning for Polygonal Geometries with Heterogeneous Visibility Graph

Abstract:Polygon representation learning is essential for diverse applications, encompassing tasks such as shape coding, building pattern classification, and geographic question answering. While recent years have seen considerable advancements in this field, much of the focus has been on single polygons, overlooking the intricate inner- and inter-polygonal relationships inherent in multipolygons. To address this gap, our study introduces a comprehensive framework specifically designed for learning representations of polygonal geometries, particularly multipolygons. Central to our approach is the incorporation of a heterogeneous visibility graph, which seamlessly integrates both inner- and inter-polygonal relationships. To enhance computational efficiency and minimize graph redundancy, we implement a heterogeneous spanning tree sampling method. Additionally, we devise a rotation-translation invariant geometric representation, ensuring broader applicability across diverse scenarios. Finally, we introduce Multipolygon-GNN, a novel model tailored to leverage the spatial and semantic heterogeneity inherent in the visibility graph. Experiments on five real-world and synthetic datasets demonstrate its ability to capture informative representations for polygonal geometries.

Via

Access Paper or Ask Questions