Jun Yan

WS-DETR: Robust Water Surface Object Detection through Vision-Radar Fusion with Detection Transformer

Apr 10, 2025

Huilin Yin, Pengyu Wang, Senmao Li, Jun Yan, Daniel Watzenig

Abstract:Robust object detection for Unmanned Surface Vehicles (USVs) in complex water environments is essential for reliable navigation and operation. Specifically, water surface object detection faces challenges from blurred edges and diverse object scales. Although vision-radar fusion offers a feasible solution, existing approaches suffer from cross-modal feature conflicts, which negatively affect model robustness. To address this problem, we propose a robust vision-radar fusion model WS-DETR. In particular, we first introduce a Multi-Scale Edge Information Integration (MSEII) module to enhance edge perception and a Hierarchical Feature Aggregator (HiFA) to boost multi-scale object detection in the encoder. Then, we adopt self-moving point representations for continuous convolution and residual connection to efficiently extract irregular features under the scenarios of irregular point cloud data. To further mitigate cross-modal conflicts, an Adaptive Feature Interactive Fusion (AFIF) module is introduced to integrate visual and radar features through geometric alignment and semantic fusion. Extensive experiments on the WaterScenes dataset demonstrate that WS-DETR achieves state-of-the-art (SOTA) performance, maintaining its superiority even under adverse weather and lighting conditions.

A Universal Model Combining Differential Equations and Neural Networks for Ball Trajectory Prediction

Mar 25, 2025

Zhiwei Shi, Chengxi Zhu, Fan Yang, Jun Yan, Zheyun Qin, Songquan Shi, Zhumin Chen

Abstract:This paper presents a data-driven universal ball trajectory prediction method integrated with physics equations. Existing methods are designed for specific ball types and struggle to generalize. This challenge arises from three key factors. First, learning-based models require large datasets but suffer from accuracy drops in unseen scenarios. Second, physics-based models rely on complex formulas and detailed inputs, yet accurately obtaining ball states, such as spin, is often impractical. Third, integrating physical principles with neural networks to achieve high accuracy, fast inference, and strong generalization remains difficult. To address these issues, we propose an innovative approach that incorporates physics-based equations and neural networks. We first derive three generalized physical formulas. Then, using a neural network and observed trajectory points, we infer certain parameters while fitting the remaining ones. These formulas enable precise trajectory prediction with minimal training data: only a few dozen samples. Extensive experiments demonstrate our method's superiority in generalization, real-time performance, and accuracy.

* This submission was made without my advisor's consent, and I mistakenly uploaded an incorrect version of the paper. Additionally, some content in the paper should not be made publicly available at this time, as per my advisor's wishes. I apologize for any inconvenience this may have caused

In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

Mar 11, 2025

Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long T. Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee (+5 more)

Abstract:Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities (utterances, turns, and sessions) into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs' cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.

Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation

Mar 10, 2025

Fan Yin, Zifeng Wang, I-Hung Hsu, Jun Yan, Ke Jiang, Yanfei Chen, Jindong Gu, Long T. Le, Kai-Wei Chang, Chen-Yu Lee (+2 more)

Abstract:Large language models (LLMs) have exhibited the ability to effectively utilize external tools to address user queries. However, their performance may be limited in complex, multi-turn interactions involving users and multiple tools. To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. We model the complicated function interactions in multi-turn cases with a graph and design novel node operations to build reliable signature paths. Motivated by context distillation, when guiding the generation of positive and negative trajectories using a teacher model, we provide reference function call sequences as positive hints in context and contrastive, incorrect function calls as negative hints. Experiments show that, by training on the positive trajectories with supervised fine-tuning and applying preference optimization against the negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery, surpassing the performance of the teacher model Gemini-1.5-pro-002 by a large margin in function calling.

* 12 pages, 3 figures, 4 tables

O-RAN xApps Conflict Management using Graph Convolutional Networks

Mar 05, 2025

Maryam Al Shami, Jun Yan, Emmanuel Thepie Fapi

Abstract:Open Radio Access Network (O-RAN) adopts a flexible, open, and virtualized structure with standardized interfaces, reducing dependency on a single supplier. Conflict management in O-RAN refers to the process of identifying and resolving conflicts between network applications. xApps are applications deployed at the RAN Intelligent Controller (RIC) that leverage advanced AI/ML algorithms to make dynamic decisions for network optimization. The lack of a unified mechanism to coordinate and prioritize the actions of different applications can create three types of conflicts (direct, indirect, and implicit). In this paper, we introduce a novel data-driven method based on Graph Convolutional Networks (GCNs), called the Graph-based xApps Conflict and Root Cause Analysis Engine (GRACE). It detects all three types of conflicts and pinpoints the root causes (xApps). GRACE captures the complex and hidden dependencies among the xApps, the controlled parameters, and the KPIs in O-RAN to detect possible conflicts. Then, it identifies the root causes (xApps) contributing to the detected conflicts. The proposed method was tested on highly imbalanced datasets in which the proportion of conflict instances ranges from 40% down to 10%. The model is tested in a setting that simulates real-world scenarios where conflicts are rare, to assess its performance and generalizability. Experimental results demonstrate exceptional performance, with an F1-score greater than 98% for all the case studies.

* 9 pages, 10 figures

Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms

Feb 05, 2025

Hepeng Li, Yuhong Liu, Jun Yan

Abstract:Artificially intelligent (AI) agents that are capable of autonomous learning and independent decision-making hold great promise for addressing complex challenges across domains like transportation, energy systems, and manufacturing. However, the surge in AI systems' design and deployment driven by various stakeholders with distinct and unaligned objectives introduces a crucial challenge: how can uncoordinated AI systems coexist and evolve harmoniously in shared environments without creating chaos? To address this, we advocate for a fundamental rethinking of existing multi-agent frameworks, such as multi-agent systems and game theory, which are largely limited to predefined rules and static objective structures. We posit that AI agents should be empowered to dynamically adjust their objectives, make compromises, form coalitions, and safely compete or cooperate through evolving relationships and social feedback. Through this paper, we call for a shift toward the emergent, self-organizing, and context-aware nature of these systems.

Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions

Dec 03, 2024

Kai Sun, Siyan Xue, Fuchun Sun, Haoran Sun, Yu Luo, Ling Wang, Siyuan Wang, Na Guo, Lei Liu, Tian Zhao (+5 more)

Abstract:Recent advancements in deep learning have significantly revolutionized the field of clinical diagnosis and treatment, offering novel approaches to improve diagnostic precision and treatment efficacy across diverse clinical domains, thus driving the pursuit of precision medicine. The growing availability of multi-organ and multimodal datasets has accelerated the development of large-scale Medical Multimodal Foundation Models (MMFMs). These models, known for their strong generalization capabilities and rich representational power, are increasingly being adapted to address a wide range of clinical tasks, from early diagnosis to personalized treatment strategies. This review offers a comprehensive analysis of recent developments in MMFMs, focusing on three key aspects: datasets, model architectures, and clinical applications. We also explore the challenges and opportunities in optimizing multimodal representations and discuss how these advancements are shaping the future of healthcare by enabling improved patient outcomes and more efficient clinical workflows.

Rethinking Backdoor Detection Evaluation for Language Models

Aug 31, 2024

Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia

Abstract:Backdoor attacks, in which a model behaves maliciously when given an attacker-specified trigger, pose a major security risk for practitioners who depend on publicly released language models. Backdoor detection methods aim to detect whether a released model contains a backdoor, so that practitioners can avoid such vulnerabilities. While existing backdoor detection methods have high accuracy in detecting backdoored models on standard benchmarks, it is unclear whether they can robustly identify backdoors in the wild. In this paper, we examine the robustness of backdoor detectors by manipulating different factors during backdoor planting. We find that the success of existing methods highly depends on how intensely the model is trained on poisoned data during backdoor planting. Specifically, backdoors planted with either more aggressive or more conservative training are significantly more difficult to detect than the default ones. Our results highlight a lack of robustness of existing backdoor detectors and the limitations in current benchmark construction.

Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving

Aug 19, 2024

Jun Yan, Pengyu Wang, Danni Wang, Weiquan Huang, Daniel Watzenig, Huilin Yin

Abstract:Semantic segmentation is a significant perception task in autonomous driving. It suffers from the risks of adversarial examples. In the past few years, deep learning has gradually transitioned from convolutional neural network (CNN) models with a relatively small number of parameters to foundation models with a huge number of parameters. The segment-anything model (SAM) is a generalized image segmentation framework that is capable of handling various types of images and is able to recognize and segment arbitrary objects in an image without the need to train on a specific object. It is a unified model that can handle diverse downstream tasks, including semantic segmentation, object detection, and tracking. In the task of semantic segmentation for autonomous driving, it is significant to study the zero-shot adversarial robustness of SAM. Therefore, we deliver a systematic empirical study on the robustness of SAM without additional training. Based on the experimental results, the zero-shot adversarial robustness of the SAM under the black-box corruptions and white-box adversarial attacks is acceptable, even without the need for additional training. The finding of this study is insightful in that the gigantic model parameters and huge amounts of training data lead to the phenomenon of emergence, which builds a guarantee of adversarial robustness. SAM is a vision foundation model that can be regarded as an early prototype of an artificial general intelligence (AGI) pipeline. In such a pipeline, a unified model can handle diverse tasks. Therefore, this research not only inspects the impact of vision foundation models on safe autonomous driving but also provides a perspective on developing trustworthy AGI. The code is available at: https://github.com/momo1986/robust_sam_iv.

* Accepted to IAVVC 2024

From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future

Aug 05, 2024

Haolin Jin, Linghan Huang, Haipeng Cai, Jun Yan, Bo Li, Huaming Chen

Abstract:With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation and vulnerability detection. However, they also exhibit numerous limitations and shortcomings. LLM-based agents, a novel technology with the potential for Artificial General Intelligence (AGI), combine LLMs as the core for decision-making and action-taking, addressing some of the inherent limitations of LLMs such as lack of autonomy and self-improvement. Despite numerous studies and surveys exploring the possibility of using LLMs in software engineering, there is still no clear distinction between LLMs and LLM-based agents. Unified standards and benchmarks for qualifying an LLM solution as an LLM-based agent in its domain are still at an early stage. In this survey, we broadly investigate the current practice and solutions for LLMs and LLM-based agents for software engineering. In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We review and differentiate the work of LLMs and LLM-based agents from these six topics, examining their differences and similarities in tasks, benchmarks, and evaluation metrics. Finally, we discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering. We anticipate this work will shed some light on pushing the boundaries of LLM-based agents in software engineering for future research.
