Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ye Yuan

Sam

Automotive Radar Multi-Frame Track-Before-Detect Algorithm Considering Self-Positioning Errors

Apr 24, 2025

Wujun Li, Qing Miao, Ye Yuan, Yunlian Tian, Wei Yi, Kah Chan Teh

Abstract:This paper presents a method for the joint detection and tracking of weak targets in automotive radars using the multi-frame track-before-detect (MF-TBD) procedure. Generally, target tracking in automotive radars is challenging due to radar field of view (FOV) misalignment, nonlinear coordinate conversion, and self-positioning errors of the ego-vehicle, which are caused by platform motion. These issues significantly hinder the implementation of MF-TBD in automotive radars. To address these challenges, a new MF-TBD detection architecture is first proposed. It can adaptively adjust the detection threshold value based on the existence of moving targets within the radar FOV. Since the implementation of MF-TBD necessitates the inclusion of position, velocity, and yaw angle information of the ego-vehicle, each with varying degrees of measurement error, we further propose a multi-frame energy integration strategy for moving-platform radar and accurately derive the target energy integration path functions. The self-positioning errors of the ego-vehicle, which are usually not considered in some previous target tracking approaches, are well addressed. Numerical simulations and experimental results with real radar data demonstrate large detection and tracking gains over standard automotive radar processing in weak target environments.

Via

Access Paper or Ask Questions

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Mar 27, 2025

Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long(+16 more)

Abstract:The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architectural foundations, collaboration mechanisms, and evolutionary pathways. We unify fragmented research threads by revealing fundamental connections between agent design principles and their emergent behaviors in complex environments. Our work provides a unified architectural perspective, examining how agents are constructed, how they collaborate, and how they evolve over time, while also addressing evaluation methodologies, tool applications, practical challenges, and diverse application domains. By surveying the latest developments in this rapidly evolving field, we offer researchers a structured taxonomy for understanding LLM agents and identify promising directions for future research. The collection is available at https://github.com/luo-junyu/Awesome-Agent-Papers.

* 329 papers surveyed, resources are at https://github.com/luo-junyu/Awesome-Agent-Papers

Via

Access Paper or Ask Questions

Optimization of MedSAM model based on bounding box adaptive perturbation algorithm

Mar 25, 2025

Boyi Li, Ye Yuan, Wenjun Tan

Abstract:The MedSAM model, built upon the SAM framework, enhances medical image segmentation through generalizable training but still exhibits notable limitations. First, constraints in the perturbation window settings during training can cause MedSAM to incorrectly segment small tissues or organs together with adjacent structures, leading to segmentation errors. Second, when dealing with medical image targets characterized by irregular shapes and complex structures, segmentation often relies on narrowing the bounding box to refine segmentation intent. However, MedSAM's performance under reduced bounding box prompts remains suboptimal. To address these challenges, this study proposes a bounding box adaptive perturbation algorithm to optimize the training process. The proposed approach aims to reduce segmentation errors for small targets and enhance the model's accuracy when processing reduced bounding box prompts, ultimately improving the robustness and reliability of the MedSAM model for complex medical imaging tasks.

* 6 pages, 6 figures, 3 Tables

Via

Access Paper or Ask Questions

Offline Model-Based Optimization: Comprehensive Review

Mar 21, 2025

Minsu Kim, Jiayao Gu, Ye Yuan, Taeyoung Yun, Zixuan Liu, Yoshua Bengio, Can Chen

Figure 1 for Offline Model-Based Optimization: Comprehensive Review

Figure 2 for Offline Model-Based Optimization: Comprehensive Review

Figure 3 for Offline Model-Based Optimization: Comprehensive Review

Figure 4 for Offline Model-Based Optimization: Comprehensive Review

Abstract:Offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking(reward hacking), exploiting model inaccuracies in unseen regions, or other spurious optimizations that yield misleadingly high performance estimates outside the training distribution. Recent advances in model-based optimization(MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field including safe control of superintelligent systems.

* 29 pages

Via

Access Paper or Ask Questions

Prompt-Driven Continual Graph Learning

Feb 10, 2025

Qi Wang, Tianfei Zhou, Ye Yuan, Rui Mao

Figure 1 for Prompt-Driven Continual Graph Learning

Figure 2 for Prompt-Driven Continual Graph Learning

Figure 3 for Prompt-Driven Continual Graph Learning

Figure 4 for Prompt-Driven Continual Graph Learning

Abstract:Continual Graph Learning (CGL), which aims to accommodate new tasks over evolving graph data without forgetting prior knowledge, is garnering significant research interest. Mainstream solutions adopt the memory replay-based idea, ie, caching representative data from earlier tasks for retraining the graph model. However, this strategy struggles with scalability issues for constantly evolving graphs and raises concerns regarding data privacy. Inspired by recent advancements in the prompt-based learning paradigm, this paper introduces a novel prompt-driven continual graph learning (PROMPTCGL) framework, which learns a separate prompt for each incoming task and maintains the underlying graph neural network model fixed. In this way, PROMPTCGL naturally avoids catastrophic forgetting of knowledge from previous tasks. More specifically, we propose hierarchical prompting to instruct the model from both feature- and topology-level to fully address the variability of task graphs in dynamic continual learning. Additionally, we develop a personalized prompt generator to generate tailored prompts for each graph node while minimizing the number of prompts needed, leading to constant memory consumption regardless of the graph scale. Extensive experiments on four benchmarks show that PROMPTCGL achieves superior performance against existing CGL approaches while significantly reducing memory consumption. Our code is available at https://github.com/QiWang98/PromptCGL.

* 12 pages, 7figures

Via

Access Paper or Ask Questions

SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

Dec 12, 2024

Xueting Li, Ye Yuan, Shalini De Mello, Gilles Daviet, Jonathan Leaf, Miles Macklin, Jan Kautz, Umar Iqbal

Abstract:We introduce SimAvatar, a framework designed to generate simulation-ready clothed 3D human avatars from a text prompt. Current text-driven human avatar generation methods either model hair, clothing, and the human body using a unified geometry or produce hair and garments that are not easily adaptable for simulation within existing simulation pipelines. The primary challenge lies in representing the hair and garment geometry in a way that allows leveraging established prior knowledge from foundational image diffusion models (e.g., Stable Diffusion) while being simulation-ready using either physics or neural simulators. To address this task, we propose a two-stage framework that combines the flexibility of 3D Gaussians with simulation-ready hair strands and garment meshes. Specifically, we first employ three text-conditioned 3D generative models to generate garment mesh, body shape and hair strands from the given text prompt. To leverage prior knowledge from foundational diffusion models, we attach 3D Gaussians to the body mesh, garment mesh, as well as hair strands and learn the avatar appearance through optimization. To drive the avatar given a pose sequence, we first apply physics simulators onto the garment meshes and hair strands. We then transfer the motion onto 3D Gaussians through carefully designed mechanisms for each body part. As a result, our synthesized avatars have vivid texture and realistic dynamic motion. To the best of our knowledge, our method is the first to produce highly realistic, fully simulation-ready 3D avatars, surpassing the capabilities of current approaches.

* Project website: https://nvlabs.github.io/SimAvatar/

Via

Access Paper or Ask Questions

Cluster-Enhanced Federated Graph Neural Network for Recommendation

Dec 11, 2024

Haiyan Wang, Ye Yuan

Abstract:Personal interaction data can be effectively modeled as individual graphs for each user in recommender systems.Graph Neural Networks (GNNs)-based recommendation techniques have become extremely popular since they can capture high-order collaborative signals between users and items by aggregating the individual graph into a global interactive graph.However, this centralized approach inherently poses a threat to user privacy and security. Recently, federated GNN-based recommendation techniques have emerged as a promising solution to mitigate privacy concerns. Nevertheless, current implementations either limit on-device training to an unaccompanied individual graphs or necessitate reliance on an extra third-party server to touch other individual graphs, which also increases the risk of privacy leakage. To address this challenge, we propose a Cluster-enhanced Federated Graph Neural Network framework for Recommendation, named CFedGR, which introduces high-order collaborative signals to augment individual graphs in a privacy preserving manner. Specifically, the server clusters the pretrained user representations to identify high-order collaborative signals. In addition, two efficient strategies are devised to reduce communication between devices and the server. Extensive experiments on three benchmark datasets validate the effectiveness of our proposed methods.

Via

Access Paper or Ask Questions

BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

Dec 11, 2024

Shengze Wang, Jiefeng Li, Tianye Li, Ye Yuan, Henry Fuchs, Koki Nagano, Shalini De Mello, Michael Stengel

Figure 1 for BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

Figure 2 for BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

Figure 3 for BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

Figure 4 for BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation

Abstract:Single-image human mesh recovery is a challenging task due to the ill-posed nature of simultaneous body shape, pose, and camera estimation. Existing estimators work well on images taken from afar, but they break down as the person moves close to the camera. Moreover, current methods fail to achieve both accurate 3D pose and 2D alignment at the same time. Error is mainly introduced by inaccurate perspective projection heuristically derived from orthographic parameters. To resolve this long-standing challenge, we present our method BLADE which accurately recovers perspective parameters from a single image without heuristic assumptions. We start from the inverse relationship between perspective distortion and the person's Z-translation Tz, and we show that Tz can be reliably estimated from the image. We then discuss the important role of Tz for accurate human mesh recovery estimated from close-range images. Finally, we show that, once Tz and the 3D human mesh are estimated, one can accurately recover the focal length and full 3D translation. Extensive experiments on standard benchmarks and real-world close-range images show that our method is the first to accurately recover projection parameters from a single image, and consequently attain state-of-the-art accuracy on 3D pose estimation and 2D alignment for a wide range of images. https://research.nvidia.com/labs/amri/projects/blade/

Via

Access Paper or Ask Questions

ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous Driving

Dec 10, 2024

Rongqing Li, Changsheng Li, Yuhang Li, Hanjie Li, Yi Chen, Dongchun Ren, Ye Yuan, Guoren Wang

Figure 1 for ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous Driving

Figure 2 for ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous Driving

Figure 3 for ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous Driving

Figure 4 for ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous Driving

Abstract:Trajectory prediction of agents is crucial for the safety of autonomous vehicles, whereas previous approaches usually rely on sufficiently long-observed trajectory to predict the future trajectory of the agents. However, in real-world scenarios, it is not realistic to collect adequate observed locations for moving agents, leading to the collapse of most prediction models. For instance, when a moving car suddenly appears and is very close to an autonomous vehicle because of the obstruction, it is quite necessary for the autonomous vehicle to quickly and accurately predict the future trajectories of the car with limited observed trajectory locations. In light of this, we focus on investigating the task of instantaneous trajectory prediction, i.e., two observed locations are available during inference. To this end, we propose a general and plug-and-play instantaneous trajectory prediction approach, called ITPNet. Specifically, we propose a backward forecasting mechanism to reversely predict the latent feature representations of unobserved historical trajectories of the agent based on its two observed locations and then leverage them as complementary information for future trajectory prediction. Meanwhile, due to the inevitable existence of noise and redundancy in the predicted latent feature representations, we further devise a Noise Redundancy Reduction Former, aiming at to filter out noise and redundancy from unobserved trajectories and integrate the filtered features and observed features into a compact query for future trajectory predictions. In essence, ITPNet can be naturally compatible with existing trajectory prediction models, enabling them to gracefully handle the case of instantaneous trajectory prediction. Extensive experiments on the Argoverse and nuScenes datasets demonstrate ITPNet outperforms the baselines, and its efficacy with different trajectory prediction models.

* In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024)

Via

Access Paper or Ask Questions

DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model

Dec 08, 2024

Rongqing Li, Jiaqi Yu, Changsheng Li, Wenhan Luo, Ye Yuan, Guoren Wang

Figure 1 for DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model

Figure 2 for DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model

Figure 3 for DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model

Figure 4 for DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model

Abstract:Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box model can be exposed through a sequence of queries. There is a crucial limitation: these works assume the training dataset of the target model is known beforehand and leverage this dataset for model attribute attack. However, it is difficult to access the training dataset of the target black-box model in reality. Therefore, whether the attributes of a target black-box model could be still revealed in this case is doubtful. In this paper, we investigate a new problem of black-box reverse engineering, without requiring the availability of the target model's training dataset. We put forward a general and principled framework DREAM, by casting this problem as out-of-distribution (OOD) generalization. In this way, we can learn a domain-agnostic meta-model to infer the attributes of the target black-box model with unknown training data. This makes our method one of the kinds that can gracefully apply to an arbitrary domain for model attribute reverse engineering with strong generalization ability. Extensive experimental results demonstrate the superiority of our proposed method over the baselines.

* IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 8009-8022, Dec. 2024
* arXiv admin note: substantial text overlap with arXiv:2307.10997

Via

Access Paper or Ask Questions