Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yi Pan

Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

Nov 16, 2024

Shaochen Xu, Yifan Zhou, Zhengliang Liu, Zihao Wu, Tianyang Zhong, Huaqin Zhao, Yiwei Li, Hanqi Jiang, Yi Pan, Junhao Chen(+12 more)

Figure 1 for Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

Figure 2 for Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

Figure 3 for Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

Figure 4 for Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

Abstract:Artificial Intelligence (AI) has become essential in modern healthcare, with large language models (LLMs) offering promising advances in clinical decision-making. Traditional model-based approaches, including those leveraging in-context demonstrations and those with specialized medical fine-tuning, have demonstrated strong performance in medical language processing but struggle with real-time adaptability, multi-step reasoning, and handling complex medical tasks. Agent-based AI systems address these limitations by incorporating reasoning traces, tool selection based on context, knowledge retrieval, and both short- and long-term memory. These additional features enable the medical AI agent to handle complex medical scenarios where decision-making should be built on real-time interaction with the environment. Therefore, unlike conventional model-based approaches that treat medical queries as isolated questions, medical AI agents approach them as complex tasks and behave more like human doctors. In this paper, we study the choice of the backbone LLM for medical AI agents, which is the foundation for the agent's overall reasoning and action generation. In particular, we consider the emergent o1 model and examine its impact on agents' reasoning, tool-use adaptability, and real-time information retrieval across diverse clinical scenarios, including high-stakes settings such as intensive care units (ICUs). Our findings demonstrate o1's ability to enhance diagnostic accuracy and consistency, paving the way for smarter, more responsive AI tools that support better patient outcomes and decision-making efficacy in clinical practice.

Via

Access Paper or Ask Questions

Large Language Models for Manufacturing

Oct 28, 2024

Yiwei Li, Huaqin Zhao, Hanqi Jiang, Yi Pan, Zhengliang Liu, Zihao Wu, Peng Shu, Jie Tian, Tianze Yang, Shaochen Xu(+14 more)

Figure 1 for Large Language Models for Manufacturing

Figure 2 for Large Language Models for Manufacturing

Figure 3 for Large Language Models for Manufacturing

Figure 4 for Large Language Models for Manufacturing

Abstract:The rapid advances in Large Language Models (LLMs) have the potential to transform manufacturing industry, offering new opportunities to optimize processes, improve efficiency, and drive innovation. This paper provides a comprehensive exploration of the integration of LLMs into the manufacturing domain, focusing on their potential to automate and enhance various aspects of manufacturing, from product design and development to quality control, supply chain optimization, and talent management. Through extensive evaluations across multiple manufacturing tasks, we demonstrate the remarkable capabilities of state-of-the-art LLMs, such as GPT-4V, in understanding and executing complex instructions, extracting valuable insights from vast amounts of data, and facilitating knowledge sharing. We also delve into the transformative potential of LLMs in reshaping manufacturing education, automating coding processes, enhancing robot control systems, and enabling the creation of immersive, data-rich virtual environments through the industrial metaverse. By highlighting the practical applications and emerging use cases of LLMs in manufacturing, this paper aims to provide a valuable resource for professionals, researchers, and decision-makers seeking to harness the power of these technologies to address real-world challenges, drive operational excellence, and unlock sustainable growth in an increasingly competitive landscape.

Via

Access Paper or Ask Questions

A Survey of Hallucination in Large Visual Language Models

Oct 20, 2024

Wei Lan, Wenyi Chen, Qingfeng Chen, Shirui Pan, Huiyu Zhou, Yi Pan

Figure 1 for A Survey of Hallucination in Large Visual Language Models

Figure 2 for A Survey of Hallucination in Large Visual Language Models

Figure 3 for A Survey of Hallucination in Large Visual Language Models

Figure 4 for A Survey of Hallucination in Large Visual Language Models

Abstract:The Large Visual Language Models (LVLMs) enhances user interaction and enriches user experience by integrating visual modality on the basis of the Large Language Models (LLMs). It has demonstrated their powerful information processing and generation capabilities. However, the existence of hallucinations has limited the potential and practical effectiveness of LVLM in various fields. Although lots of work has been devoted to the issue of hallucination mitigation and correction, there are few reviews to summary this issue. In this survey, we first introduce the background of LVLMs and hallucinations. Then, the structure of LVLMs and main causes of hallucination generation are introduced. Further, we summary recent works on hallucination correction and mitigation. In addition, the available hallucination evaluation benchmarks for LVLMs are presented from judgmental and generative perspectives. Finally, we suggest some future research directions to enhance the dependability and utility of LVLMs.

Via

Access Paper or Ask Questions

EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Oct 12, 2024

Yi Pan, Hanqi Jiang, Junhao Chen, Yiwei Li, Huaqin Zhao, Yifan Zhou, Peng Shu, Zihao Wu, Zhengliang Liu, Dajiang Zhu(+3 more)

Figure 1 for EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Figure 2 for EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Figure 3 for EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Figure 4 for EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Abstract:Neuromorphic computing has emerged as a promising energy-efficient alternative to traditional artificial intelligence, predominantly utilizing spiking neural networks (SNNs) implemented on neuromorphic hardware. Significant advancements have been made in SNN-based convolutional neural networks (CNNs) and Transformer architectures. However, their applications in the medical imaging domain remain underexplored. In this study, we introduce EG-SpikeFormer, an SNN architecture designed for clinical tasks that integrates eye-gaze data to guide the model's focus on diagnostically relevant regions in medical images. This approach effectively addresses shortcut learning issues commonly observed in conventional models, especially in scenarios with limited clinical data and high demands for model reliability, generalizability, and transparency. Our EG-SpikeFormer not only demonstrates superior energy efficiency and performance in medical image classification tasks but also enhances clinical relevance. By incorporating eye-gaze data, the model improves interpretability and generalization, opening new directions for the application of neuromorphic computing in healthcare.

Via

Access Paper or Ask Questions

ECHOPulse: ECG controlled echocardio-grams video generation

Oct 04, 2024

Yiwei Li, Sekeun Kim, Zihao Wu, Hanqi Jiang, Yi Pan, Pengfei Jin, Sifan Song, Yucheng Shi, Tianze Yang, Tianming Liu(+2 more)

Figure 1 for ECHOPulse: ECG controlled echocardio-grams video generation

Figure 2 for ECHOPulse: ECG controlled echocardio-grams video generation

Figure 3 for ECHOPulse: ECG controlled echocardio-grams video generation

Figure 4 for ECHOPulse: ECG controlled echocardio-grams video generation

Abstract:Echocardiography (ECHO) is essential for cardiac assessments, but its video quality and interpretation heavily relies on manual expertise, leading to inconsistent results from clinical and portable devices. ECHO video generation offers a solution by improving automated monitoring through synthetic data and generating high-quality videos from routine health data. However, existing models often face high computational costs, slow inference, and rely on complex conditional prompts that require experts' annotations. To address these challenges, we propose ECHOPULSE, an ECG-conditioned ECHO video generation model. ECHOPULSE introduces two key advancements: (1) it accelerates ECHO video generation by leveraging VQ-VAE tokenization and masked visual token modeling for fast decoding, and (2) it conditions on readily accessible ECG signals, which are highly coherent with ECHO videos, bypassing complex conditional prompts. To the best of our knowledge, this is the first work to use time-series prompts like ECG signals for ECHO video generation. ECHOPULSE not only enables controllable synthetic ECHO data generation but also provides updated cardiac function information for disease monitoring and prediction beyond ECG alone. Evaluations on three public and private datasets demonstrate state-of-the-art performance in ECHO video generation across both qualitative and quantitative measures. Additionally, ECHOPULSE can be easily generalized to other modality generation tasks, such as cardiac MRI, fMRI, and 3D CT generation. Demo can seen from \url{https://github.com/levyisthebest/ECHOPulse_Prelease}.

Via

Access Paper or Ask Questions

Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity

Oct 01, 2024

Hanqi Jiang, Xixuan Hao, Yuzhou Huang, Chong Ma, Jiaxun Zhang, Yi Pan, Ruimao Zhang

Figure 1 for Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity

Figure 2 for Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity

Figure 3 for Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity

Figure 4 for Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity

Abstract:This paper introduces an innovative approach to Medical Vision-Language Pre-training (Med-VLP) area in the specialized context of radiograph representation learning. While conventional methods frequently merge textual annotations into unified reports, we acknowledge the intrinsic hierarchical relationship between the findings and impression section in radiograph datasets. To establish a targeted correspondence between images and texts, we propose a novel HybridMED framework to align global-level visual representations with impression and token-level visual representations with findings. Moreover, our framework incorporates a generation decoder that employs two proxy tasks, responsible for generating the impression from (1) images, via a captioning branch, and (2) findings, through a summarization branch. Additionally, knowledge distillation is leveraged to facilitate the training process. Experiments on the MIMIC-CXR dataset reveal that our summarization branch effectively distills knowledge to the captioning branch, enhancing model performance without significantly increasing parameter requirements due to the shared self-attention and feed-forward architecture.

* ECCV 2024 Workshop
* 18 pages

Via

Access Paper or Ask Questions

Evaluation of OpenAI o1: Opportunities and Challenges of AGI

Sep 27, 2024

Tianyang Zhong, Zhengliang Liu, Yi Pan, Yutong Zhang, Yifan Zhou, Shizhe Liang, Zihao Wu, Yanjun Lyu, Peng Shu, Xiaowei Yu(+68 more)

Figure 1 for Evaluation of OpenAI o1: Opportunities and Challenges of AGI

Figure 2 for Evaluation of OpenAI o1: Opportunities and Challenges of AGI

Figure 3 for Evaluation of OpenAI o1: Opportunities and Challenges of AGI

Figure 4 for Evaluation of OpenAI o1: Opportunities and Challenges of AGI

Abstract:This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. Key findings include: -83.3% success rate in solving complex competitive programming problems, surpassing many human experts. -Superior ability in generating coherent and accurate radiology reports, outperforming other evaluated models. -100% accuracy in high school-level mathematical reasoning tasks, providing detailed step-by-step solutions. -Advanced natural language inference capabilities across general and specialized domains like medicine. -Impressive performance in chip design tasks, outperforming specialized models in areas such as EDA script generation and bug analysis. -Remarkable proficiency in anthropology and geology, demonstrating deep understanding and reasoning in these specialized fields. -Strong capabilities in quantitative investing. O1 has comprehensive financial knowledge and statistical modeling skills. -Effective performance in social media analysis, including sentiment analysis and emotion recognition. The model excelled particularly in tasks requiring intricate reasoning and knowledge integration across various fields. While some limitations were observed, including occasional errors on simpler problems and challenges with certain highly specialized concepts, the overall results indicate significant progress towards artificial general intelligence.

Via

Access Paper or Ask Questions

Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

Aug 06, 2024

Xiaoqing Zhang, Qiushi Nie, Zunjie Xiao, Jilu Zhao, Xiao Wu, Pengxin Guo, Runzhi Li, Jin Liu, Yanjie Wei, Yi Pan

Figure 1 for Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

Figure 2 for Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

Figure 3 for Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

Figure 4 for Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

Abstract:Spatial pooling (SP) and cross-channel pooling (CCP) operators have been applied to aggregate spatial features and pixel-wise features from feature maps in deep neural networks (DNNs), respectively. Their main goal is to reduce computation and memory overhead without visibly weakening the performance of DNNs. However, SP often faces the problem of losing the subtle feature representations, while CCP has a high possibility of ignoring salient feature representations, which may lead to both miscalibration of confidence issues and suboptimal medical classification results. To address these problems, we propose a novel dual-view framework, the first to systematically investigate the relative roles of SP and CCP by analyzing the difference between spatial features and pixel-wise features. Based on this framework, we propose a new pooling method, termed dual-view pyramid pooling (DVPP), to aggregate multi-scale dual-view features. DVPP aims to boost both medical image classification and confidence calibration performance by fully leveraging the merits of SP and CCP operators from a dual-axis perspective. Additionally, we discuss how to fulfill DVPP with five parameter-free implementations. Extensive experiments on six 2D/3D medical image classification tasks show that our DVPP surpasses state-of-the-art pooling methods in terms of medical image classification results and confidence calibration across different DNNs.

* 27

Via

Access Paper or Ask Questions

A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

Aug 02, 2024

Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, Yunpeng Cai

Figure 1 for A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

Figure 2 for A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

Figure 3 for A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

Figure 4 for A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

Abstract:Machine-based brain tumor segmentation can help doctors make better diagnoses. However, the complex structure of brain tumors and expensive pixel-level annotations present challenges for automatic tumor segmentation. In this paper, we propose a counterfactual generation framework that not only achieves exceptional brain tumor segmentation performance without the need for pixel-level annotations, but also provides explainability. Our framework effectively separates class-related features from class-unrelated features of the samples, and generate new samples that preserve identity features while altering class attributes by embedding different class-related features. We perform topological data analysis on the extracted class-related features and obtain a globally explainable manifold, and for each abnormal sample to be segmented, a meaningful normal sample could be effectively generated with the guidance of the rule-based paths designed within the manifold for comparison for identifying the tumor regions. We evaluate our proposed method on two datasets, which demonstrates superior performance of brain tumor segmentation. The code is available at https://github.com/xrt11/tumor-segmentation.

* 2024 IEEE International Conference on Multimedia and Expo

Via

Access Paper or Ask Questions

A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

Aug 02, 2024

Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li(+14 more)

Figure 1 for A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

Abstract:In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types-including text, images, videos, audio, and physiological sequences-MLLMs address the complexities of real-world applications far beyond the capabilities of single-modality systems. In this paper, we systematically sort out the applications of MLLM in multimodal tasks such as natural language, vision, and audio. We also provide a comparative analysis of the focus of different MLLMs in the tasks, and provide insights into the shortcomings of current MLLMs, and suggest potential directions for future research. Through these discussions, this paper hopes to provide valuable insights for the further development and application of MLLM.

Via

Access Paper or Ask Questions