Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sibo Tian

SELF-VLA: A Skill Enhanced Agentic Vision-Language-Action Framework for Contact-Rich Disassembly

Mar 10, 2026

Chang Liu, Sibo Tian, Xiao Liang, Minghui Zheng

Abstract:Disassembly automation has long been pursued to address the growing demand for efficient and proper recovery of valuable components from the end-of-life (EoL) electronic products. Existing approaches have demonstrated promising and regimented performance by decomposing the disassembly process into different subtasks. However, each subtask typically requires extensive data preparation, model training, and system management. Moreover, these approaches are often task- and component-specific, making them poorly suited to handle the variability and uncertainty of EoL products and limiting their generalization capabilities. All these factors restrict the practical deployment of current robotic disassembly systems and leave them highly reliant on human labor. With the recent development of foundation models in robotics, vision-language-action (VLA) models have shown impressive performance on standard robotic manipulation tasks, but their applicability to complex, contact-rich, and long-horizon industrial practices like disassembly, which requires sequential and precise manipulation, remains limited. To address this challenge, we propose SELF-VLA, an agentic VLA framework that integrates explicit disassembly skills. Experimental studies demonstrate that our framework significantly outperforms current state-of-the-art end-to-end VLA models on two contact-rich disassembly tasks. The video illustration can be found via https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2026/03/IROS-VLA-Video.mp4.

Via

Access Paper or Ask Questions

PrediFlow: A Flow-Based Prediction-Refinement Framework for Real-Time Human Motion Prediction in Human-Robot Collaboration

Dec 15, 2025

Sibo Tian, Minghui Zheng, Xiao Liang

Abstract:Stochastic human motion prediction is critical for safe and effective human-robot collaboration (HRC) in industrial remanufacturing, as it captures human motion uncertainties and multi-modal behaviors that deterministic methods cannot handle. While earlier works emphasize highly diverse predictions, they often generate unrealistic human motions. More recent methods focus on accuracy and real-time performance, yet there remains potential to improve prediction quality further without exceeding time budgets. Additionally, current research on stochastic human motion prediction in HRC typically considers human motion in isolation, neglecting the influence of robot motion on human behavior. To address these research gaps and enable real-time, realistic, and interaction-aware human motion prediction, we propose a novel prediction-refinement framework that integrates both human and robot observed motion to refine the initial predictions produced by a pretrained state-of-the-art predictor. The refinement module employs a Flow Matching structure to account for uncertainty. Experimental studies on the HRC desktop disassembly dataset demonstrate that our method significantly improves prediction accuracy while preserving the uncertainties and multi-modalities of human motion. Moreover, the total inference time of the proposed framework remains within the time budget, highlighting the effectiveness and practicality of our approach.

Via

Access Paper or Ask Questions

Grasping Force Control and Adaptation for a Cable-Driven Robotic Hand

Jul 27, 2024

Eric Mountain, Ean Weise, Sibo Tian, Beiwen Li, Xiao Liang, Minghui Zheng

Abstract:This paper introduces a unique force control and adaptation algorithm for a lightweight and low-complexity five-fingered robotic hand, namely an Integrated-Finger Robotic Hand (IFRH). The force control and adaptation algorithm is intuitive to design, easy to implement, and improves the grasping functionality through feedforward adaptation automatically. Specifically, we have extended Youla-parameterization which is traditionally used in feedback controller design into a feedforward iterative learning control algorithm (ILC). The uniqueness of such an extension is that both the feedback and feedforward controllers are parameterized over one unified design parameter which can be easily customized based on the desired closed-loop performance. While Youla-parameterization and ILC have been explored in the past on various applications, our unique parameterization and computational methods make the design intuitive and easy to implement. This provides both robust and adaptive learning capabilities, and our application rivals the complexity of many robotic hand control systems. Extensive experimental tests have been conducted to validate the effectiveness of our method.

Via

Access Paper or Ask Questions

Integrating Uncertainty-Aware Human Motion Prediction into Graph-Based Manipulator Motion Planning

May 16, 2024

Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

Abstract:There has been a growing utilization of industrial robots as complementary collaborators for human workers in re-manufacturing sites. Such a human-robot collaboration (HRC) aims to assist human workers in improving the flexibility and efficiency of labor-intensive tasks. In this paper, we propose a human-aware motion planning framework for HRC to effectively compute collision-free motions for manipulators when conducting collaborative tasks with humans. We employ a neural human motion prediction model to enable proactive planning for manipulators. Particularly, rather than blindly trusting and utilizing predicted human trajectories in the manipulator planning, we quantify uncertainties of the neural prediction model to further ensure human safety. Moreover, we integrate the uncertainty-aware prediction into a graph that captures key workspace elements and illustrates their interconnections. Then a graph neural network is leveraged to operate on the constructed graph. Consequently, robot motion planning considers both the dependencies among all the elements in the workspace and the potential influence of future movements of human workers. We experimentally validate the proposed planning framework using a 6-degree-of-freedom manipulator in a shared workspace where a human is performing disassembling tasks. The results demonstrate the benefits of our approach in terms of improving the smoothness and safety of HRC. A brief video introduction of this work is available as the supplemental materials.

Via

Access Paper or Ask Questions

KG-Planner: Knowledge-Informed Graph Neural Planning for Collaborative Manipulators

May 13, 2024

Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

Abstract:This paper presents a novel knowledge-informed graph neural planner (KG-Planner) to address the challenge of efficiently planning collision-free motions for robots in high-dimensional spaces, considering both static and dynamic environments involving humans. Unlike traditional motion planners that struggle with finding a balance between efficiency and optimality, the KG-Planner takes a different approach. Instead of relying solely on a neural network or imitating the motions of an oracle planner, our KG-Planner integrates explicit physical knowledge from the workspace. The integration of knowledge has two key aspects: (1) we present an approach to design a graph that can comprehensively model the workspace's compositional structure. The designed graph explicitly incorporates critical elements such as robot joints, obstacles, and their interconnections. This representation allows us to capture the intricate relationships between these elements. (2) We train a Graph Neural Network (GNN) that excels at generating nearly optimal robot motions. In particular, the GNN employs a layer-wise propagation rule to facilitate the exchange and update of information among workspace elements based on their connections. This propagation emphasizes the influence of these elements throughout the planning process. To validate the efficacy and efficiency of our KG-Planner, we conduct extensive experiments in both static and dynamic environments. These experiments include scenarios with and without human workers. The results of our approach are compared against existing methods, showcasing the superior performance of the KG-Planner. A short video introduction of this work is available (video link provided in the paper).

Via

Access Paper or Ask Questions

A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

Feb 20, 2024

Wansong Liu, Sibo Tian, Boyi Hu, Xiao Liang, Minghui Zheng

Figure 1 for A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

Figure 2 for A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

Figure 3 for A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

Figure 4 for A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

Abstract:This paper presents a deep learning enhanced adaptive unscented Kalman filter (UKF) for predicting human arm motion in the context of manufacturing. Unlike previous network-based methods that solely rely on captured human motion data, which is represented as bone vectors in this paper, we incorporate a human arm dynamic model into the motion prediction algorithm and use the UKF to iteratively forecast human arm motions. Specifically, a Lagrangian-mechanics-based physical model is employed to correlate arm motions with associated muscle forces. Then a Recurrent Neural Network (RNN) is integrated into the framework to predict future muscle forces, which are transferred back to future arm motions based on the dynamic model. Given the absence of measurement data for future human motions that can be input into the UKF to update the state, we integrate another RNN to directly predict human future motions and treat the prediction as surrogate measurement data fed into the UKF. A noteworthy aspect of this study involves the quantification of uncertainties associated with both the data-driven and physical models in one unified framework. These quantified uncertainties are used to dynamically adapt the measurement and process noises of the UKF over time. This adaption, driven by the uncertainties of the RNN models, addresses inaccuracies stemming from the data-driven model and mitigates discrepancies between the assumed and true physical models, ultimately enhancing the accuracy and robustness of our predictions. Compared to the traditional RNN-based prediction, our method demonstrates improved accuracy and robustness in extensive experimental validations of various types of human motions.

Via

Access Paper or Ask Questions

TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Jul 30, 2023

Sibo Tian, Minghui Zheng, Xiao Liang

Figure 1 for TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Figure 2 for TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Figure 3 for TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Figure 4 for TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Abstract:Predicting human motion plays a crucial role in ensuring a safe and effective human-robot close collaboration in intelligent remanufacturing systems of the future. Existing works can be categorized into two groups: those focusing on accuracy, predicting a single future motion, and those generating diverse predictions based on observations. The former group fails to address the uncertainty and multi-modal nature of human motion, while the latter group often produces motion sequences that deviate too far from the ground truth or become unrealistic within historical contexts. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction which can generate samples that are more likely to happen while maintaining a certain level of diversity. Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers. Additionally, we employ the discrete cosine transform to model motion sequences in the frequency space, thereby improving performance. In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization to condition the prediction on past observed motion, we treat all inputs, including conditions, as tokens to create a more lightweight model compared to existing approaches. Extensive experimental studies are conducted on benchmark datasets to validate the effectiveness of our human motion prediction model.

Via

Access Paper or Ask Questions

DE-TGN: Uncertainty-Aware Human Motion Forecasting using Deep Ensembles

Jul 07, 2023

Kareem A. Eltouny, Wansong Liu, Sibo Tian, Minghui Zheng, Xiao Liang

Figure 1 for DE-TGN: Uncertainty-Aware Human Motion Forecasting using Deep Ensembles

Figure 2 for DE-TGN: Uncertainty-Aware Human Motion Forecasting using Deep Ensembles

Figure 3 for DE-TGN: Uncertainty-Aware Human Motion Forecasting using Deep Ensembles

Figure 4 for DE-TGN: Uncertainty-Aware Human Motion Forecasting using Deep Ensembles

Abstract:Ensuring the safety of human workers in a collaborative environment with robots is of utmost importance. Although accurate pose prediction models can help prevent collisions between human workers and robots, they are still susceptible to critical errors. In this study, we propose a novel approach called deep ensembles of temporal graph neural networks (DE-TGN) that not only accurately forecast human motion but also provide a measure of prediction uncertainty. By leveraging deep ensembles and employing stochastic Monte-Carlo dropout sampling, we construct a volumetric field representing a range of potential future human poses based on covariance ellipsoids. To validate our framework, we conducted experiments using three motion capture datasets including Human3.6M, and two human-robot interaction scenarios, achieving state-of-the-art prediction error. Moreover, we discovered that deep ensembles not only enable us to quantify uncertainty but also improve the accuracy of our predictions.

Via

Access Paper or Ask Questions