Yanming Zhang

Zhejiang Provincial People's Hospital Affiliated People's Hospital, Hangzhou Medical College, Hangzhou, 314408, China, Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, 314408, China

Diagnostic Performance of Universal-Learning Ultrasound AI Across Multiple Organs and Tasks: the UUSIC25 Challenge

Dec 19, 2025

Zehui Lin, Luyi Han, Xin Wang, Ying Zhou, Yanming Zhang, Tianyu Zhang, Lingyun Bao, Shandong Wu, Dong Xu, Tao Tan (+1 more)

Abstract:
IMPORTANCE: Current ultrasound AI remains fragmented into single-task tools, limiting its clinical utility compared with versatile modern ultrasound systems.
OBJECTIVE: To evaluate the diagnostic accuracy and efficiency of single general-purpose deep learning models for multi-organ classification and segmentation.
DESIGN: The Universal UltraSound Image Challenge 2025 (UUSIC25) involved developing algorithms on 11,644 images (public/private). Evaluation used an independent, multi-center test set of 2,479 images, including data from a center completely unseen during training to assess generalization.
OUTCOMES: Diagnostic performance (Dice Similarity Coefficient [DSC]; Area Under the Receiver Operating Characteristic Curve [AUC]) and computational efficiency (inference time, GPU memory).
RESULTS: Of 15 valid algorithms, the top model (SMART) achieved a macro-averaged DSC of 0.854 across 5 segmentation tasks and an AUC of 0.766 for binary classification. Models showed high capability in segmentation (e.g., fetal head DSC: 0.942) but variability in complex tasks subject to domain shift. Notably, in breast cancer molecular subtyping, the top model's performance dropped from AUC 0.571 (internal) to 0.508 (unseen external center), highlighting generalization challenges.
CONCLUSIONS: General-purpose AI models achieve high accuracy and efficiency across multiple tasks using a single architecture. However, performance degradation on unseen data suggests that domain generalization is critical for future clinical deployment.
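The UUSIC25 results above are reported as a macro-averaged DSC over segmentation tasks and an AUC for classification. The following is a minimal pure-Python sketch of those two metrics, assuming masks are flattened 0/1 lists, that "macro-averaged" means an unweighted mean over tasks, and that two empty masks count as a perfect match; the challenge's official evaluation code may handle edge cases differently.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice Similarity Coefficient, 2|A∩B| / (|A| + |B|), for flattened 0/1 masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def macro_average(per_task: Sequence[float]) -> float:
    """Unweighted mean over tasks, so each task counts equally regardless of its size."""
    return sum(per_task) / len(per_task)


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))          # 0.5
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(macro_average([0.5, 1.0]))                 # 0.75
```

The pairwise AUC loop is quadratic in the number of cases; it is chosen here for clarity, and a rank-based computation gives the same value faster.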

* 8 pages, 2 figures. Summary of the UUSIC25 Challenge held at MICCAI 2025. Extensive Supplementary Material (containing original team reports) is available in the "ancillary files" section

What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking

Sep 05, 2025

Yuan Sui, Yanming Zhang, Yi Liao, Yu Gu, Guohua Tang, Zhongqian Sun, Wei Yang, Bryan Hooi

Abstract: Large language models (LLMs) excel at processing information reactively but lack the ability to systematically explore hypothetical futures. They cannot ask, "What if we take this action? How will it affect the final outcome?" and forecast its potential consequences before acting. This critical gap limits their utility in dynamic, high-stakes scenarios such as strategic planning, risk assessment, and real-time decision making. To bridge this gap, we propose WiA-LLM, a new paradigm that equips LLMs with proactive thinking capabilities. Our approach integrates What-If Analysis (WIA), a systematic approach for evaluating hypothetical scenarios by changing input variables. By leveraging environmental feedback via reinforcement learning, WiA-LLM moves beyond reactive thinking. It dynamically simulates the outcome of each potential action, enabling the model to anticipate future states rather than merely react to present conditions. We validate WiA-LLM in Honor of Kings (HoK), a complex multiplayer game environment characterized by rapid state changes and intricate interactions. The game's real-time state changes require precise multi-step consequence prediction, making it an ideal testbed for our approach. Experimental results demonstrate that WiA-LLM achieves 74.2% accuracy in forecasting game-state changes (up to a twofold gain over baselines). The model shows particularly significant gains in high-difficulty scenarios where accurate foresight is critical. To our knowledge, this is the first work to formally explore and integrate what-if analysis capabilities within LLMs. WiA-LLM represents a fundamental advance toward proactive reasoning in LLMs, providing a scalable framework for robust decision-making in dynamic environments with broad implications for strategic applications.
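What-if analysis, as described above, means changing one input variable (the candidate action), forecasting the resulting state, and only then choosing what to do. The toy sketch below shows that loop with a hand-written transition model. Everything in it is hypothetical: `State`, `ACTIONS`, and `score` are illustrative stand-ins, not the Honor of Kings state space, and in WiA-LLM the forecast would come from an LLM trained with reinforcement learning against environment feedback rather than from fixed rules.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical toy game state; not the Honor of Kings state representation."""
    my_hp: int
    enemy_hp: int


# Hypothetical transition model: each action maps a state to a predicted next state.
ACTIONS: Dict[str, Callable[[State], State]] = {
    "attack": lambda s: replace(s, my_hp=max(0, s.my_hp - 20), enemy_hp=max(0, s.enemy_hp - 30)),
    "retreat": lambda s: replace(s, my_hp=min(100, s.my_hp + 10)),
    "wait": lambda s: replace(s, my_hp=max(0, s.my_hp - 5)),
}


def score(s: State) -> int:
    # Toy objective: stay healthy, but weigh enemy health twice as heavily.
    return s.my_hp - 2 * s.enemy_hp


def what_if(state: State, horizon: int = 1) -> Tuple[str, Dict[str, State]]:
    """For each candidate action, ask "what if?", simulate it, then pick the best outcome."""
    outcomes: Dict[str, State] = {}
    for name, step in ACTIONS.items():
        predicted = state
        # Simplification: repeat the same action for `horizon` steps.
        for _ in range(horizon):
            predicted = step(predicted)
        outcomes[name] = predicted
    best = max(outcomes, key=lambda a: score(outcomes[a]))
    return best, outcomes


best, outcomes = what_if(State(my_hp=80, enemy_hp=25))
print(best)                 # attack
print(outcomes["retreat"])  # State(my_hp=90, enemy_hp=25)
```

The key design point is that the decision is made on forecast states, not on the current one; the quality of the decision therefore depends entirely on how accurate the forecast is, which is what the 74.2% figure above measures for WiA-LLM.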

* arXiv admin note: text overlap with arXiv:2508.21365

V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes

Mar 13, 2025

Yanming Zhang, Jun-Kun Chen, Jipeng Lyu, Yu-Xiong Wang

Abstract: This paper introduces V²Edit, a novel training-free framework for instruction-guided video and 3D scene editing. Addressing the critical challenge of balancing original content preservation with editing task fulfillment, our approach employs a progressive strategy that decomposes complex editing tasks into a sequence of simpler subtasks. Each subtask is controlled through three key synergistic mechanisms: the initial noise, the noise added at each denoising step, and the cross-attention maps between text prompts and video content. This ensures robust preservation of original video elements while effectively applying the desired edits. Beyond its native video editing capability, we extend V²Edit to 3D scene editing via a "render-edit-reconstruct" process, enabling high-quality, 3D-consistent edits even for tasks involving substantial geometric changes such as object insertion. Extensive experiments demonstrate that V²Edit achieves high-quality and successful edits across various challenging video editing tasks and complex 3D scene editing tasks, thereby establishing state-of-the-art performance in both domains.

* Project Website: https://immortalco.github.io/V2Edit/

Power-Efficient Deceptive Wireless Beamforming Against Eavesdroppers

Mar 06, 2025

Georgios Chrysanidis, Antonios Argyriou, Le-Nam Tran, Yanming Zhang, Yanwei Liu

Abstract: Eavesdroppers of wireless signals want to infer as much as possible about the transmitter (Tx). Popular methods to minimize information leakage to the eavesdropper include covert communication, directional modulation, and beamforming with nulling. In this paper, we do not attempt to prevent information leakage to the eavesdropper as these previous methods do. Instead, we propose to beamform the wireless signal at the Tx so that it incorporates deceptive information. The beamformed orthogonal frequency division multiplexing (OFDM) signal includes a deceptive value for the Doppler (velocity) and range of the Tx. To design the optimal baseband waveform with these characteristics, we define and solve an optimization problem for power-efficient deceptive wireless beamforming (DWB). The relaxed convex Quadratic Program (QP) is solved using a heuristic algorithm. Our simulation results indicate that our DWB scheme can successfully inject deceptive information with low power consumption while preserving the shape of the created beam.
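To make "a deceptive value for the Doppler (velocity) and range" concrete, the sketch below uses the standard OFDM-radar signal model, which is an assumption here and not the paper's QP-based beamformer: a delay τ appears as a linear phase ramp across subcarriers, and a Doppler shift f_D appears as a phase rotation from one OFDM symbol to the next. The sketch also assumes one-way propagation (τ = R/c, f_D = v·f_c/c) and illustrative parameters (3.5 GHz carrier, 15 kHz subcarrier spacing). It builds the phase grid for a fake range and velocity, then recovers them the way a naive eavesdropper would, from adjacent-element phase steps.

```python
import cmath
import math
from typing import List, Tuple

C = 299_792_458.0  # speed of light, m/s


def deceptive_phases(range_m: float, velocity_mps: float, fc_hz: float,
                     delta_f_hz: float, symbol_time_s: float,
                     n_subcarriers: int, n_symbols: int) -> List[List[complex]]:
    """Phase factors exp(j2π f_D m T_s) · exp(-j2π k Δf τ) for symbol m, subcarrier k."""
    tau = range_m / C                 # assumed one-way delay
    f_d = velocity_mps * fc_hz / C    # assumed one-way Doppler shift
    return [[cmath.exp(2j * math.pi * (f_d * m * symbol_time_s - k * delta_f_hz * tau))
             for k in range(n_subcarriers)]
            for m in range(n_symbols)]


def estimate(grid: List[List[complex]], fc_hz: float, delta_f_hz: float,
             symbol_time_s: float) -> Tuple[float, float]:
    """Recover (range, velocity) from phase steps; unambiguous only if |Δf·τ| and |f_D·T_s| < 0.5."""
    tau = -cmath.phase(grid[0][1] / grid[0][0]) / (2 * math.pi * delta_f_hz)
    f_d = cmath.phase(grid[1][0] / grid[0][0]) / (2 * math.pi * symbol_time_s)
    return tau * C, f_d * C / fc_hz


params = dict(fc_hz=3.5e9, delta_f_hz=15e3, symbol_time_s=71.4e-6)
grid = deceptive_phases(600.0, 30.0, n_subcarriers=8, n_symbols=4, **params)
r, v = estimate(grid, **params)
print(round(r, 3), round(v, 3))  # 600.0 30.0
```

In a real transmitter these factors would multiply the data symbols on each resource element; the paper's contribution is choosing the beamforming weights so that this deception is achieved at low power while keeping the beam shape, which this sketch does not attempt.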

* IEEE Wireless Communications and Networking Conference (WCNC) 2025

A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery

Nov 16, 2024

Grace Sng, Yanming Zhang, Klaus Mueller

Abstract: The increasing use of large language models (LLMs) in causal discovery as a substitute for human domain experts highlights the need for optimal model selection. This paper presents the first hallucination survey of popular LLMs for causal discovery. We show that hallucinations exist when using LLMs in causal discovery, so the choice of LLM is important. We propose using Retrieval Augmented Generation (RAG) to reduce hallucinations when quality data is available. Additionally, we introduce a novel method employing multiple LLMs with an arbiter in a debate to audit edges in causal graphs, achieving a reduction in hallucinations comparable to RAG.
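The debate-based audit above has several LLMs judge each causal edge and an arbiter settle the outcome. The sketch below shows only the control flow, under stated assumptions: the debaters are hypothetical stub callables with canned opinions instead of LLMs, the verdict vocabulary ("keep", "reverse", "remove") is invented for illustration, and the arbiter is replaced by a simple strict-majority rule, whereas in the paper the arbiter is itself a model.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]
# A debater judges one edge and returns a verdict string such as "keep", "reverse", or "remove".
# In the paper the debaters are LLMs; here they are plain callables (hypothetical stand-ins).
Debater = Callable[[Edge], str]


def arbitrate(edge: Edge, debaters: List[Debater]) -> str:
    """Stand-in arbiter: accept a verdict only if a strict majority agrees."""
    votes = Counter(debater(edge) for debater in debaters)
    verdict, count = votes.most_common(1)[0]
    return verdict if count > len(debaters) / 2 else "needs_review"


def audit_graph(edges: List[Edge], debaters: List[Debater]) -> Dict[Edge, str]:
    return {edge: arbitrate(edge, debaters) for edge in edges}


# Hypothetical debaters with canned opinions, for illustration only.
KNOWN = {("smoking", "lung_cancer"): "keep", ("lung_cancer", "age"): "reverse"}


def agreeable(edge: Edge) -> str:
    return "keep"


def informed(edge: Edge) -> str:
    return KNOWN.get(edge, "remove")


def cautious(edge: Edge) -> str:
    return KNOWN.get(edge, "unsure")


edges = [("smoking", "lung_cancer"), ("lung_cancer", "age"), ("ice_cream", "drowning")]
print(audit_graph(edges, [agreeable, informed, cautious]))
```

The point of the majority requirement is that a single hallucinating debater ("agreeable" here) cannot push an edge through on its own; edges without consensus are surfaced for review instead of being silently accepted.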

CausalChat: Interactive Causal Model Development and Refinement Using Large Language Models

Oct 18, 2024

Yanming Zhang, Akshith Kota, Eric Papenhausen, Klaus Mueller

Abstract: Causal networks are widely used in many fields to model the complex relationships between variables. A recent approach has sought to construct causal networks by leveraging the wisdom of crowds through the collective participation of humans. While this can yield detailed causal networks that model the underlying phenomena quite well, it requires a large number of individuals with domain understanding. We adopt a different approach: leveraging the causal knowledge that large language models, such as OpenAI's GPT-4, have learned by ingesting massive amounts of literature. Within a dedicated visual analytics interface, called CausalChat, users explore single variables or variable pairs recursively to identify causal relations, latent variables, confounders, and mediators, constructing detailed causal networks through conversation. Each probing interaction is translated into a tailored GPT-4 prompt, and the response is conveyed through visual representations that are linked to the generated text for explanations. We demonstrate the functionality of CausalChat across diverse data contexts and conduct user studies involving both domain experts and laypersons.
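CausalChat translates each probe (one variable, or a pair of variables) into a tailored prompt. The sketch below shows one way such a translation could look; the function name `build_probe_prompt` and the template wording are hypothetical and are not CausalChat's actual prompts. It only illustrates that the two probe types ask different questions: a single variable asks for causes and effects, while a pair asks about direction, confounders, and mediators.

```python
from typing import Optional


def build_probe_prompt(var_a: str, var_b: Optional[str] = None, context: str = "") -> str:
    """Turn a single-variable or variable-pair probe into a text prompt (hypothetical template)."""
    header = f"Domain context: {context}\n" if context else ""
    if var_b is None:
        # Single-variable probe: ask for plausible causes and effects of one variable.
        task = (f"List variables that plausibly cause '{var_a}' and variables that "
                f"'{var_a}' plausibly affects. Give a one-sentence reason for each.")
    else:
        # Pair probe: ask about causal direction, confounders, and mediators.
        task = (f"Does '{var_a}' cause '{var_b}', does '{var_b}' cause '{var_a}', "
                f"or neither? Name likely confounders and mediators between them.")
    return header + task


print(build_probe_prompt("exercise", "body_weight", context="public health survey"))
```

Keeping the templates in one function makes the recursive exploration easy: each answer that names a new variable can be fed straight back in as the next single-variable or pair probe.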

AI-Generated Content Enhanced Computer-Aided Diagnosis Model for Thyroid Nodules: A ChatGPT-Style Assistant

Feb 04, 2024

Jincao Yao, Yunpeng Wang, Zhikai Lei, Kai Wang, Xiaoxian Li, Jianhua Zhou, Xiang Hao, Jiafei Shen, Zhenping Wang, Rongrong Ru (+6 more)

Abstract: An artificial intelligence-generated content-enhanced computer-aided diagnosis (AIGC-CAD) model, designated ThyGPT, has been developed. This model, inspired by the architecture of ChatGPT, can assist radiologists in assessing the risk of thyroid nodules through semantic-level human-machine interaction. A dataset comprising 19,165 thyroid nodule ultrasound cases from Zhejiang Cancer Hospital was assembled to train and validate the model. After training, ThyGPT can automatically evaluate thyroid nodules and communicate effectively with physicians through human-computer interaction. The performance of ThyGPT was quantified using established metrics: the receiver operating characteristic (ROC) curve, the area under the curve (AUC), sensitivity, and specificity. The empirical findings revealed that radiologists assisted by ThyGPT markedly outperformed both their peers using traditional methods and the model on its own. These findings suggest that AIGC-CAD systems, exemplified by ThyGPT, hold the promise to fundamentally transform the diagnostic workflows of radiologists in the coming years.
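The ThyGPT evaluation relies on the ROC curve, sensitivity, and specificity. The sketch below shows how these relate, assuming binary labels (1 = malignant, 0 = benign), a continuous risk score per nodule, and that both classes are present: sensitivity and specificity are computed at one threshold, and the ROC curve is the set of (1 − specificity, sensitivity) points obtained by sweeping that threshold over the observed scores. The data in the example are made up.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP) at a given threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif not predicted_positive:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs from sweeping the threshold downward."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(labels, scores, t)
        points.append((1 - spec, sens))
    return points


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(sensitivity_specificity(labels, scores))  # (0.5, 0.5)
print(roc_points(labels, scores))  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

A single (sensitivity, specificity) pair depends on the chosen threshold, which is why the ROC curve and its AUC are reported alongside it: they summarize performance over all thresholds.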

An Explainable AI Approach to Large Language Model Assisted Causal Model Auditing and Development

Dec 23, 2023

Yanming Zhang, Brette Fitzgibbon, Dino Garofolo, Akshith Kota, Eric Papenhausen, Klaus Mueller

Abstract: Causal networks are widely used in many fields, including epidemiology, social science, medicine, and engineering, to model the complex relationships between variables. While it can be convenient to algorithmically infer these models directly from observational data, the resulting networks are often plagued with erroneous edges. Auditing and correcting these networks may require domain expertise frequently unavailable to the analyst. We propose the use of large language models such as ChatGPT as an auditor for causal networks. Our method presents ChatGPT with a causal network, one edge at a time, to produce insights about edge directionality, possible confounders, and mediating variables. We ask ChatGPT to reflect on various aspects of each causal link, and we then produce visualizations that summarize these viewpoints for the human analyst to direct the edge, gather more data, or test further hypotheses. We envision a system where large language models, automated causal inference, and the human analyst and domain expert work hand in hand as a team to derive holistic and comprehensive causal models for any given case scenario. This paper presents first results obtained with an emerging prototype.
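The auditing method above presents the network one edge at a time and summarizes the responses for the analyst. The sketch below mirrors that loop under stated assumptions: `EdgeReport`, its fields, the direction labels, and `stub_auditor` are all hypothetical, with the stub standing in for a ChatGPT call. The output is a list of edges the analyst should revisit, either because the auditor disputes the direction or because it suggests a confounder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]


@dataclass
class EdgeReport:
    """What the auditor says about one edge (fields are illustrative assumptions)."""
    edge: Edge
    direction: str  # "agree", "reverse", or "unsure"
    confounders: List[str] = field(default_factory=list)
    mediators: List[str] = field(default_factory=list)


def audit(edges: List[Edge], ask_auditor: Callable[[Edge], EdgeReport]) -> Dict[str, List[EdgeReport]]:
    """Present the network to the auditor one edge at a time and group the answers."""
    groups: Dict[str, List[EdgeReport]] = {"agree": [], "reverse": [], "unsure": []}
    for edge in edges:
        report = ask_auditor(edge)
        groups[report.direction].append(report)
    return groups


def needs_attention(groups: Dict[str, List[EdgeReport]]) -> List[Edge]:
    """Edges to revisit: disputed direction, or agreed direction with a suggested confounder."""
    flagged = [r.edge for r in groups["reverse"] + groups["unsure"]]
    flagged += [r.edge for r in groups["agree"] if r.confounders]
    return flagged


# Hypothetical stub standing in for a ChatGPT call.
def stub_auditor(edge: Edge) -> EdgeReport:
    if edge == ("sales", "temperature"):
        return EdgeReport(edge, "reverse")
    if edge == ("ice_cream", "drowning"):
        return EdgeReport(edge, "agree", confounders=["temperature"])
    return EdgeReport(edge, "agree")


edges = [("temperature", "ice_cream"), ("sales", "temperature"), ("ice_cream", "drowning")]
print(needs_attention(audit(edges, stub_auditor)))
# [('sales', 'temperature'), ('ice_cream', 'drowning')]
```

Grouping by verdict rather than acting on it automatically keeps the human analyst in the loop, matching the abstract's intent that the analyst decides whether to direct the edge, gather more data, or test further hypotheses.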

SportsTrack: An Innovative Method for Tracking Athletes in Sports Scenes

Nov 14, 2022

Jie Wang, Yuzhou Peng, Xiaodong Yang, Ting Wang, Yanming Zhang

Abstract: The SportsMOT competition targets multiple object tracking (MOT) of athletes in different sports scenes such as basketball or soccer. The competition is challenging because of the unstable camera view, athletes' complex trajectories, and complicated backgrounds. Previous MOT methods cannot match enough high-quality athlete tracks. To improve MOT performance in sports scenes, we introduce an innovative tracker named SportsTrack, which follows the tracking-by-detection paradigm. We introduce a three-stage matching process to handle motion blur and body overlap in sports scenes. We also present a second innovation: a one-to-many correspondence between detection bounding boxes and crowded tracks to handle overlapping athletes' bodies during sports competitions. Compared with other trackers such as BoT-SORT and ByteTrack, SportsTrack carefully restores edge-lost tracks that those trackers ignore. Finally, we achieved the top tracking score (76.264 HOTA) in the ECCV 2022 DeepAction SportsMOT competition.
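SportsTrack's abstract names tracking by detection and a multi-stage matching process but does not specify what each stage matches. The sketch below shows the generic building blocks under stated assumptions: intersection-over-union (IoU) between boxes, a greedy IoU matcher (a simplification of the Hungarian assignment many trackers use), and a cascade in the spirit of ByteTrack's two-stage association, where tracks left unmatched by one group of detections are offered to the next. The detection groups and thresholds are hypothetical, and the one-to-many correspondence is not shown.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[Box], thresh: float) -> Dict[int, int]:
    """Pair tracks with detections in descending IoU order (greedy, not Hungarian)."""
    pairs = sorted(((iou(tb, db), tid, di)
                    for tid, tb in tracks.items() for di, db in enumerate(dets)), reverse=True)
    used_tracks, used_dets, matches = set(), set(), {}
    for score, tid, di in pairs:
        if score < thresh:
            break
        if tid in used_tracks or di in used_dets:
            continue
        matches[tid] = di
        used_tracks.add(tid)
        used_dets.add(di)
    return matches


def cascade(tracks: Dict[int, Box], det_groups: List[List[Box]], thresholds: List[float]):
    """Run matching stages in order; tracks unmatched in one stage are passed to the next."""
    remaining = dict(tracks)
    result: Dict[int, Tuple[int, int]] = {}  # track id -> (stage, detection index)
    for stage, (dets, thresh) in enumerate(zip(det_groups, thresholds)):
        for tid, di in greedy_match(remaining, dets, thresh).items():
            result[tid] = (stage, di)
            del remaining[tid]
    return result, remaining


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30), 3: (50, 50, 60, 60)}
high_conf = [(1, 1, 11, 11)]      # hypothetical high-confidence detections
low_conf = [(21, 21, 31, 31)]     # hypothetical low-confidence detections
matched, unmatched = cascade(tracks, [high_conf, low_conf], [0.5, 0.3])
print(matched)           # {1: (0, 0), 2: (1, 0)}
print(sorted(unmatched)) # [3]
```

Ordering the stages from most to least reliable detections lets confident matches claim tracks first, so weaker detections (often blurred or occluded athletes) only fill in tracks that would otherwise be lost.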

* 7 pages, 9 figures
