Jiafan Zhuang

MAGIS: Evidence-Based Multi-Agent Reasoning for Interpretable Strabismus Clinical Decision-Making

Jun 08, 2026

Xikai Tang, Yifan Wang, Jiafan Zhuang, Li Luo, Jinming Guo, Xiaoling Xie, Jiacheng Liu, Peiwei Wei, Lihao Zhong, Xiaoli Kang (+5 more)

Abstract: Strabismus is a common ocular disorder that requires fine-grained subtype diagnosis for individualized treatment planning. However, existing deep learning methods mainly provide diagnostic predictions without transparent reasoning, while recent large vision-language models (LVLMs), although promising for joint image understanding and report generation, remain highly prone to hallucination in this evidence-sensitive and rule-driven medical task. To address these challenges, we propose MAGIS, an evidence-based Multi-AGent reasoning for Interpretable Strabismus diagnosis framework. MAGIS transforms black-box end-to-end generation into a structured diagnostic process consisting of candidate hypothesis generation, dual-evidence constrained context, evidence-based corrective verification, and report generation. Specifically, we introduce a Dual-Evidence Constrained Context (DECC) mechanism that jointly organizes visual evidence from the photograph of the nine cardinal positions of gaze and evidence-based clinical diagnostic rules into a constrained context for reliable diagnostic reasoning. We further develop an Evidence-Based Corrective Verification (EBCV) mechanism that verifies whether the current diagnostic hypothesis is supported by visual evidence, heatmap-based visual cues, and evidence-based clinical diagnostic rules. Hypothesis refinement is triggered when inconsistency is detected. Experiments on a fine-grained strabismus benchmark demonstrate that MAGIS not only significantly outperforms other state-of-the-art diagnostic systems, improving the weighted F1 score from 72.0% to 91.3%, but also substantially improves the clinical reliability (consistency, alignment, and completeness) of generated diagnostic reports. These results demonstrate that MAGIS provides an effective solution for building accurate, evidence-based, and clinically interpretable strabismus diagnosis systems.
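
The abstract reports its headline result as a weighted F1 score. As a reminder of what that metric measures, here is a minimal sketch of the standard support-weighted F1 over class labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Illustrative helper (hypothetical name); each class's F1 is weighted
    by the fraction of ground-truth samples that belong to that class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels (hypothetical, not strabismus data): three samples of class "a", one of class "b".
print(weighted_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"]))
```

Because each class is weighted by its support, frequent subtypes dominate the score, which is worth keeping in mind when reading results on imbalanced fine-grained benchmarks.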

Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

Jul 04, 2024

Jiafan Zhuang, Zihao Xia, Gaofei Han

Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, i.e., causal representation disentanglement, which can identify the causal and non-causal factors in representations. We then pass only the causal factors to subsequent policy learning, explicitly eliminating the influence of non-causal factors and thereby improving the generalization ability of DRL models. Experimental results show that our proposed method achieves robust navigation performance and effective collision avoidance, especially in unseen scenarios, and significantly outperforms existing SOTA algorithms.

Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

Jul 04, 2024

Jiafan Zhuang, Gaofei Han

Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning (DRL) show promising performance but suffer from poor generalization abilities, resulting in performance degradation in unseen environments. To address this issue, we investigate the cause of weak generalization ability in DRL and propose a novel causal feature selection module. This module can be integrated into the policy network and effectively filters out non-causal factors in representations, thereby reducing the influence of spurious correlations between non-causal factors and action predictions. Experimental results demonstrate that our proposed method achieves robust navigation performance and effective collision avoidance, especially in scenarios with unseen backgrounds and obstacles, and significantly outperforms existing state-of-the-art algorithms.
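
To make the idea of filtering non-causal factors concrete, the following is a minimal sketch of a channel-wise gate applied to a feature vector before it reaches the policy. It is an illustration only: the function name `select_causal_features`, the sigmoid gate, and the fixed threshold are assumptions, and the abstract does not say how the paper's module is parameterized or trained.

```python
import math


def select_causal_features(features, gate_logits, threshold=0.5):
    """Zero out channels whose gate probability is below `threshold`.

    Hypothetical sketch: `gate_logits` stands in for scores that a learned
    selection module would produce, one per feature channel.
    """
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have the same length")
    selected = []
    for value, logit in zip(features, gate_logits):
        keep_prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid gate
        selected.append(value if keep_prob >= threshold else 0.0)
    return selected


# The second channel has a strongly negative gate score and is dropped.
print(select_causal_features([0.5, -1.2, 3.0], [2.0, -3.0, 0.0]))
```

Only the surviving channels would then be passed on to the policy head, so spurious background cues carried by the dropped channels cannot influence action prediction.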

Why does Stereo Triangulation Not Work in UAV Distance Estimation

Jun 15, 2023

Jiafan Zhuang, Duan Yuan, Rihong Yan, Xiangyu Dong, Yutao Zhou, Weixin Huang, Zhun Fan

Abstract: UAV distance estimation plays an important role in path planning and collision avoidance for UAV swarms. However, the lack of annotated data seriously hinders related studies. In this paper, we build and present UAVDE, a dataset for UAV distance estimation in which the distance between two UAVs is obtained by UWB sensors. During experiments, we surprisingly observe that the commonly used stereo triangulation does not hold up in UAV scenes. The core reason is the position deviation of UAVs caused by long shooting distance and camera vibration, both of which are common in UAV scenes. To tackle this issue, we propose a novel position correction module (PCM), which directly predicts the offset between the image positions of UAVs and their actual positions and compensates for it in the stereo triangulation calculation. In addition, to further boost performance on hard samples, we propose a dynamic iterative correction mechanism composed of multiple stacked PCMs and a gating mechanism that adaptively determines whether further correction is required according to the difficulty of each sample. Consequently, the position deviation issue can be effectively alleviated. We conduct extensive experiments on UAVDE, and our proposed method achieves a 38.84% performance improvement, which demonstrates its effectiveness. The code and dataset will be released.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
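
The sensitivity described in the abstract can be seen from the textbook rectified-stereo relation Z = f * B / d, where d is the disparity in pixels. The sketch below applies that relation and subtracts a per-view position offset before computing the disparity. The function name, the sign convention for the offsets, and the numbers are assumptions for illustration; in the paper the offsets are predicted by the PCM, which is not reproduced here.

```python
def stereo_depth(x_left, x_right, focal_px, baseline_m,
                 offset_left=0.0, offset_right=0.0):
    """Depth from rectified stereo, with optional position compensation.

    Hypothetical sketch: `offset_left` / `offset_right` stand in for the
    predicted deviation (in pixels) between the detected and actual
    horizontal image position of the target in each view.
    """
    disparity = (x_left - offset_left) - (x_right - offset_right)
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Made-up numbers: 1000 px focal length, 0.5 m baseline.
print(stereo_depth(520.0, 500.0, 1000.0, 0.5))             # uncorrected
print(stereo_depth(520.0, 500.0, 1000.0, 0.5, 2.0, -2.0))  # with compensation
```

At long range the disparity is only a few pixels, so a deviation of a pixel or two in either view changes the estimated distance substantially, which is why compensating the positions before triangulation matters.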

Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement

Jan 10, 2023

Jiafan Zhuang, Zilei Wang, Junjie Li

Abstract: Video semantic segmentation aims to generate accurate semantic maps for each video frame. To this end, many works integrate diverse information from consecutive frames to enhance the features used for prediction, which usually requires a feature alignment procedure based on estimated optical flow. However, optical flow inevitably suffers from inaccuracy, which introduces noise into feature fusion and in turn leads to unsatisfactory segmentation results. In this paper, to tackle the misalignment issue, we propose a spatial-temporal fusion (STF) module to model dense pairwise relationships among multi-frame features. Unlike previous methods, STF uniformly and adaptively fuses features at different spatial and temporal positions and avoids error-prone optical flow estimation. In addition, we exploit feature refinement within a single frame and propose a novel memory-augmented refinement (MAR) module to tackle difficult predictions around semantic boundaries. Specifically, MAR stores the boundary features and prototypes extracted from the training samples, which together form a task-specific memory, and then uses them to refine the features during inference. Essentially, MAR moves hard features closer to the most likely category and thus makes them more discriminative. We conduct extensive experiments on Cityscapes and CamVid, and the results show that our proposed methods significantly outperform previous methods and achieve state-of-the-art performance. Code and pretrained models are available at https://github.com/jfzhuang/ST_Memory.
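
The intuition behind moving a hard feature closer to its most likely category can be sketched as pulling the feature toward its nearest class prototype. This is a deliberately simplified illustration with a hard nearest-prototype assignment and a fixed blending factor; the names `cosine`, `refine_toward_prototype`, and `alpha` are assumptions, and the actual MAR module is learned and also uses stored boundary features (see the linked repository for the authors' code).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def refine_toward_prototype(feature, prototypes, alpha=0.5):
    """Blend `feature` with its most similar prototype.

    Hypothetical sketch: `prototypes` plays the role of a per-class memory,
    and `alpha` controls how far the feature is pulled toward the prototype.
    """
    best = max(prototypes, key=lambda proto: cosine(feature, proto))
    return [(1.0 - alpha) * f + alpha * p for f, p in zip(feature, best)]


# An ambiguous feature between two classes is pulled toward the closer one.
print(refine_toward_prototype([0.6, 0.4], [[1.0, 0.0], [0.0, 1.0]]))
```

After refinement the feature lies nearer to one prototype and farther from the other, which is the sense in which it becomes more discriminative.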

5th Place Solution for VSPW 2021 Challenge

Dec 13, 2021

Jiafan Zhuang, Yixin Zhang, Xinyu Hu, Junjie Li, Zilei Wang

Abstract: In this article, we introduce the solution we used in the VSPW 2021 Challenge. Our experiments are based on two baseline models, Swin Transformer and MaskFormer. To further boost performance, we adopt the stochastic weight averaging technique and design a hierarchical ensemble strategy. Without using any external semantic segmentation dataset, our solution ranked 5th on the private leaderboard. In addition, we made some interesting attempts to tackle long-tail recognition and overfitting issues, which achieved improvements on the val subset. Possibly due to distribution differences, these attempts did not work on the test subset. We also describe these attempts in the hope of inspiring other researchers.

* Presented in ICCV'21 Workshop
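
Stochastic weight averaging, mentioned in the abstract, keeps a running mean of model weights taken from several points late in training. A minimal sketch of that running mean over plain lists of floats follows; the function names and the toy checkpoints are assumptions, and the abstract does not describe the schedule or checkpoint selection the authors used.

```python
def swa_update(avg_weights, new_weights, num_averaged):
    """Fold one more checkpoint into a running average of weights.

    `num_averaged` is the number of checkpoints already included in
    `avg_weights` (hypothetical helper for illustration).
    """
    return [a + (w - a) / (num_averaged + 1)
            for a, w in zip(avg_weights, new_weights)]


def average_checkpoints(checkpoints):
    """Average a non-empty list of equally shaped weight vectors."""
    avg = list(checkpoints[0])
    for n, ckpt in enumerate(checkpoints[1:], start=1):
        avg = swa_update(avg, ckpt, n)
    return avg


# Three toy "checkpoints" with two weights each.
print(average_checkpoints([[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]))
```

The incremental form avoids holding every checkpoint in memory at once, which matters for large backbones.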

Video Semantic Segmentation with Distortion-Aware Feature Correction

Jun 18, 2020

Jiafan Zhuang, Zilei Wang, Bingke Wang

Abstract: Video semantic segmentation has been an active topic in recent years, benefiting from the great progress of image semantic segmentation. For this task, per-frame image segmentation is generally unacceptable in practice due to its high computation cost. To tackle this issue, many works use flow-based feature propagation to reuse the features of previous frames. However, optical flow estimation inevitably suffers from inaccuracy, which distorts the propagated features. In this paper, we propose distortion-aware feature correction to alleviate the issue, which improves video segmentation performance by correcting distorted propagated features. Specifically, we first propose to transfer distortion patterns from feature space into image space and conduct effective distortion map prediction. Guided by the distortion maps, we propose a Feature Correction Module (FCM) to rectify propagated features in the distorted areas. Our proposed method can significantly boost the accuracy of video semantic segmentation at a low cost. Extensive experimental results on Cityscapes and CamVid show that our method outperforms recent state-of-the-art methods.
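
One simple way to picture map-guided correction is a per-position blend: where the distortion map is high, trust freshly computed features; elsewhere, keep the cheap propagated ones. The sketch below does this on flat lists of floats. It is an assumption-laden illustration, not the paper's FCM: the function name, the linear blend, and the availability of separate current-frame features are all hypothetical.

```python
def correct_features(propagated, current, distortion_map):
    """Blend propagated and current-frame features using a distortion map.

    Hypothetical sketch: `distortion_map` holds values in [0, 1], where 1
    marks a position whose propagated feature is considered distorted.
    """
    if not (len(propagated) == len(current) == len(distortion_map)):
        raise ValueError("all inputs must have the same length")
    return [m * c + (1.0 - m) * p
            for p, c, m in zip(propagated, current, distortion_map)]


# Position 0 is clean, position 1 is fully distorted, position 2 is uncertain.
print(correct_features([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [0.0, 1.0, 0.5]))
```

The design intent is that expensive computation is spent only where propagation is unreliable, which is how such a scheme can stay cheaper than per-frame segmentation.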
