Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Boyang Yu

HoloAgent-0: A Unified Embodied Agent Framework with 3D Spatial Memory

Jun 22, 2026

Xiaolin Zhou, Liu Liu, Tingyang Xiao, Wei Feng, Fa Fu, Xinrui Meng, Xinjie Wang, Jialiang Han, Boyang Yu, Yun Du(+2 more)

Abstract:LLM agents follow a practical execution loop in digital environments: they reason over structured states, invoke tools, inspect feedback, and revise actions. Extending this loop to physical robots is difficult because physical execution is continuous, embodiment-dependent, uncertain, and constrained by safety. Existing embodied-AI systems have advanced manipulation, spatial understanding, navigation, and humanoid control, but these capabilities often remain specialized modules or loosely coupled decision loops. In this work, we introduce HoloAgent-0, a unified embodied agent framework for real-world robot deployment. Embodied AgentOS converts language instructions into executable skill graphs, schedules robot resources, monitors execution, and triggers clarification or re-planning from runtime feedback. HoloAgent-0 organizes heterogeneous robot models and controllers through three coupled layers: Embodied AgentOS for closed-loop execution, 3D spatial memory for physical world grounding, and embodied skills for robot action. We deploy HoloAgent-0 on real hardware and evaluate its spatial memory, long-horizon navigation, and closed-loop execution across motion generation, object search, cross-robot coordination, and mobile manipulation.

Via

Access Paper or Ask Questions

MuGen: Multi-Skill Generative Locomotion Controller for Humanoid Robots

May 23, 2026

Yusen Feng, Xiang Wang, Heyuan Yao, Zixi Kang, Xinyu Huo, Boyang Yu, Pengyun Qiu, Ruijie Zhao, Baoquan Chen, Libin Liu

Abstract:This paper presents MuGen, a data-driven framework for learning and deploying multi-skill locomotion on humanoid robots. MuGen enables a robot to perform expressive motions like humans under the guidance of example motion sequences. To achieve this, we employ vector-quantized autoencoders (VQ-VAEs) trained with model-based reinforcement learning, resulting in a generative representation of locomotion that captures key patterns of human motion from hours of heterogeneous human performance data. We employ a teacher-student learning framework and develop a new policy distillation strategy to enable a deployable student policy learning this efficient latent representation. This policy allows the robot to track and mimic unseen human motions and further enables the robot to reuse the learned latent space for other tasks. We demonstrate the effectiveness of our framework through a diverse set of motions and accurate execution.

Via

Access Paper or Ask Questions

3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography

Feb 04, 2025

Weicheng Zhu, Haoxu Huang, Huanze Tang, Rushabh Musthyala, Boyang Yu, Long Chen, Emilio Vega, Thomas O'Donnell, Seena Dehkharghani, Jennifer A. Frontera(+3 more)

Figure 1 for 3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography

Figure 2 for 3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography

Figure 3 for 3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography

Figure 4 for 3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography

Abstract:Head computed tomography (CT) imaging is a widely-used imaging modality with multitudes of medical indications, particularly in assessing pathology of the brain, skull, and cerebrovascular system. It is commonly the first-line imaging in neurologic emergencies given its rapidity of image acquisition, safety, cost, and ubiquity. Deep learning models may facilitate detection of a wide range of diseases. However, the scarcity of high-quality labels and annotations, particularly among less common conditions, significantly hinders the development of powerful models. To address this challenge, we introduce FM-CT: a Foundation Model for Head CT for generalizable disease detection, trained using self-supervised learning. Our approach pre-trains a deep learning model on a large, diverse dataset of 361,663 non-contrast 3D head CT scans without the need for manual annotations, enabling the model to learn robust, generalizable features. To investigate the potential of self-supervised learning in head CT, we employed both discrimination with self-distillation and masked image modeling, and we construct our model in 3D rather than at the slice level (2D) to exploit the structure of head CT scans more comprehensively and efficiently. The model's downstream classification performance is evaluated using internal and three external datasets, encompassing both in-distribution (ID) and out-of-distribution (OOD) data. Our results demonstrate that the self-supervised foundation model significantly improves performance on downstream diagnostic tasks compared to models trained from scratch and previous 3D CT foundation models on scarce annotated datasets. This work highlights the effectiveness of self-supervised learning in medical imaging and sets a new benchmark for head CT image analysis in 3D, enabling broader use of artificial intelligence for head CT-based diagnosis.

* Under Review Preprint

Via

Access Paper or Ask Questions

Benchmarking Large Language Model Volatility

Nov 26, 2023

Boyang Yu

Figure 1 for Benchmarking Large Language Model Volatility

Figure 2 for Benchmarking Large Language Model Volatility

Figure 3 for Benchmarking Large Language Model Volatility

Figure 4 for Benchmarking Large Language Model Volatility

Abstract:The impact of non-deterministic outputs from Large Language Models (LLMs) is not well examined for financial text understanding tasks. Through a compelling case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the innate volatility of LLM outputs. These uncertainties cascade downstream, leading to more significant variations in portfolio construction and return. While tweaking the temperature parameter in the language model decoder presents a potential remedy, it comes at the expense of stifled creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work furnishes practitioners with invaluable insights for adeptly navigating uncertainty in the integration of LLMs into financial decision-making, particularly in scenarios dictated by non-deterministic information.

* 7 pages, 2 figures, Workshop on AI Safety and Robustness In Finance, ICAIF 2023

Via

Access Paper or Ask Questions

Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects

Nov 21, 2023

Boyang Yu, Aakash Kaku, Kangning Liu, Avinash Parnandi, Emily Fokas, Anita Venkatesan, Natasha Pandit, Rajesh Ranganath, Heidi Schambra, Carlos Fernandez-Granda

Figure 1 for Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects

Figure 2 for Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects

Figure 3 for Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects

Figure 4 for Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects

Abstract:Automatic assessment of impairment and disease severity is a key challenge in data-driven medicine. We propose a novel framework to address this challenge, which leverages AI models trained exclusively on healthy individuals. The COnfidence-Based chaRacterization of Anomalies (COBRA) score exploits the decrease in confidence of these models when presented with impaired or diseased patients to quantify their deviation from the healthy population. We applied the COBRA score to address a key limitation of current clinical evaluation of upper-body impairment in stroke patients. The gold-standard Fugl-Meyer Assessment (FMA) requires in-person administration by a trained assessor for 30-45 minutes, which restricts monitoring frequency and precludes physicians from adapting rehabilitation protocols to the progress of each patient. The COBRA score, computed automatically in under one minute, is shown to be strongly correlated with the FMA on an independent test cohort for two different data modalities: wearable sensors ($\rho = 0.845$, 95% CI [0.743,0.908]) and video ($\rho = 0.746$, 95% C.I [0.594, 0.847]). To demonstrate the generalizability of the approach to other conditions, the COBRA score was also applied to quantify severity of knee osteoarthritis from magnetic-resonance imaging scans, again achieving significant correlation with an independent clinical assessment ($\rho = 0.644$, 95% C.I [0.585,0.696]).

* 32 pages, 10 figures

Via

Access Paper or Ask Questions

Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events

Jul 12, 2023

Boyang Yu, Nikolai Hartmann, Luca Schinnerl, Thomas Kuhr

Figure 1 for Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events

Figure 2 for Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events

Figure 3 for Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events

Figure 4 for Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events

Abstract:When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulations are necessary to determine signal efficiencies and background contributions. However, this process demands high computation costs while most of the simulated data, in particular in case of background, are discarded by the event selection. Thus, filters using graph neural networks are introduced at an early stage to save the resources for the detector simulation and reconstruction of events discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods including sampling and reweighting to deal with the biases introduced by the filtering.

* to be published in ACAT 2022 proceedings in the Journal Of Physics: Conference Series

Via

Access Paper or Ask Questions

Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability

Apr 01, 2023

Haoyi Xiong, Xuhong Li, Boyang Yu, Zhanxing Zhu, Dongrui Wu, Dejing Dou

Abstract:Random label noises (or observational noises) widely exist in practical machine learning settings. While previous studies primarily focus on the affects of label noises to the performance of learning, our work intends to investigate the implicit regularization effects of the label noises, under mini-batch sampling settings of stochastic gradient descent (SGD), with assumptions that label noises are unbiased. Specifically, we analyze the learning dynamics of SGD over the quadratic loss with unbiased label noises, where we model the dynamics of SGD as a stochastic differentiable equation (SDE) with two diffusion terms (namely a Doubly Stochastic Model). While the first diffusion term is caused by mini-batch sampling over the (label-noiseless) loss gradients as many other works on SGD, our model investigates the second noise term of SGD dynamics, which is caused by mini-batch sampling over the label noises, as an implicit regularizer. Our theoretical analysis finds such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters (namely inference stability). Though similar phenomenon have been investigated, our work doesn't assume SGD as an Ornstein-Uhlenbeck like process and achieve a more generalizable result with convergence of approximation proved. To validate our analysis, we design two sets of empirical studies to analyze the implicit regularizer of SGD with unbiased random label noises for deep neural networks training and linear regression.

* The complete manuscript of our previous submission to ICLR'21 (https://openreview.net/forum?id=g4szfsQUdy3). This manuscript was major done in 2021. We gave try to some venues but unfortunately haven't made it accepted yet

Via

Access Paper or Ask Questions

4D Facial Expression Diffusion Model

Mar 29, 2023

Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo

Figure 1 for 4D Facial Expression Diffusion Model

Figure 2 for 4D Facial Expression Diffusion Model

Figure 3 for 4D Facial Expression Diffusion Model

Figure 4 for 4D Facial Expression Diffusion Model

Abstract:Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. The challenging task, traditionally having relied heavily on digital craftspersons, remains yet to be explored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) Learning the generative model that is trained over a set of 3D landmark sequences, and (2) Generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks of other domains. While it can be trained unconditionally, its reverse process can still be conditioned by various condition signals. This allows us to efficiently develop several downstream tasks involving various conditional generation, by using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder to apply the geometrical deformation embedded in landmarks on a given facial mesh. Experiments show that our model has learned to generate realistic, quality expressions solely from the dataset of relatively small size, improving over the state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.

Via

Access Paper or Ask Questions

Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning

Nov 24, 2021

Zhining Liu, Pengfei Wei, Zhepei Wei, Boyang Yu, Jing Jiang, Wei Cao, Jiang Bian, Yi Chang

Figure 1 for Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning

Figure 2 for Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning

Figure 3 for Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning

Figure 4 for Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning

Abstract:Imbalanced Learning (IL) is an important problem that widely exists in data mining applications. Typical IL methods utilize intuitive class-wise resampling or reweighting to directly balance the training set. However, some recent research efforts in specific domains show that class-imbalanced learning can be achieved without class-wise manipulation. This prompts us to think about the relationship between the two different IL strategies and the nature of the class imbalance. Fundamentally, they correspond to two essential imbalances that exist in IL: the difference in quantity between examples from different classes as well as between easy and hard examples within a single class, i.e., inter-class and intra-class imbalance. Existing works fail to explicitly take both imbalances into account and thus suffer from suboptimal performance. In light of this, we present Duple-Balanced Ensemble, namely DUBE , a versatile ensemble learning framework. Unlike prevailing methods, DUBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation, which allows it to achieve competitive performance while being computationally efficient. We also present a detailed discussion and analysis about the pros and cons of different inter/intra-class balancing strategies based on DUBE . Extensive experiments validate the effectiveness of the proposed method. Code and examples are available at https://github.com/ICDE2022Sub/duplebalance.

* 14 pages, 4 tables, 11 figures

Via

Access Paper or Ask Questions

IMBENS: Ensemble Class-imbalanced Learning in Python

Nov 24, 2021

Zhining Liu, Zhepei Wei, Erxin Yu, Qiang Huang, Kai Guo, Boyang Yu, Zhaonian Cai, Hangting Ye, Wei Cao, Jiang Bian(+3 more)

Figure 1 for IMBENS: Ensemble Class-imbalanced Learning in Python

Figure 2 for IMBENS: Ensemble Class-imbalanced Learning in Python

Figure 3 for IMBENS: Ensemble Class-imbalanced Learning in Python

Abstract:imbalanced-ensemble, abbreviated as imbens, is an open-source Python toolbox for quick implementing and deploying ensemble learning algorithms on class-imbalanced data. It provides access to multiple state-of-art ensemble imbalanced learning (EIL) methods, visualizer, and utility functions for dealing with the class imbalance problem. These ensemble methods include resampling-based, e.g., under/over-sampling, and reweighting-based ones, e.g., cost-sensitive learning. Beyond the implementation, we also extend conventional binary EIL algorithms with new functionalities like multi-class support and resampling scheduler, thereby enabling them to handle more complex tasks. The package was developed under a simple, well-documented API design follows that of scikit-learn for increased ease of use. imbens is released under the MIT open-source license and can be installed from Python Package Index (PyPI). Source code, binaries, detailed documentation, and usage examples are available at https://github.com/ZhiningLiu1998/imbalanced-ensemble.

* 6 pages, 2 tables, 1 figure

Via

Access Paper or Ask Questions