Jie Gu

Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling

Feb 23, 2026

Yirui Sun, Guangyu Zhuge, Keliang Liu, Jie Gu, Zhihao Xia, Qionglin Ren, Chunxu Tian, Zhongxue Ga

Abstract:Realizing dexterous embodied manipulation necessitates the deep integration of heterogeneous multimodal sensory inputs. However, current vision-centric paradigms often overlook the critical force and geometric feedback essential for complex tasks. This paper presents DeMUSE, a Deep Multimodal Unified Sparse Experts framework leveraging a Diffusion Transformer to integrate RGB, depth, and 6-axis force into a unified serialized stream. Adaptive Modality-specific Normalization (AdaMN) is employed to recalibrate modality-aware features, mitigating representation imbalance and harmonizing the heterogeneous distributions of multi-sensory signals. To facilitate efficient scaling, the architecture utilizes a Sparse Mixture-of-Experts (MoE) with shared experts, increasing model capacity for physical priors while maintaining the low inference latency required for real-time control. A joint denoising objective synchronously synthesizes environmental evolution and action sequences to ensure physical consistency. Achieving success rates of 83.2% and 72.5% in simulation and real-world trials, respectively, DeMUSE demonstrates state-of-the-art performance, validating the necessity of deep multi-sensory integration for complex physical interactions.
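
The sparse Mixture-of-Experts idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DeMUSE architecture: the router form (dot-product logits with softmax), the top-k renormalization, and every name here (`moe_forward`, `make_scale_expert`, the toy experts) are assumptions chosen for illustration. It only shows the general pattern of a shared expert that always runs plus a few routed experts selected per input.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_weights, routed_experts, shared_expert, top_k=2):
    """Toy sparse MoE layer (illustrative assumption, not the paper's design).

    x: list of floats (one token's features).
    router_weights: one routing vector per routed expert.
    """
    # Router logits: dot product between the token and each routing vector.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    probs = softmax(logits)
    # Keep only the top-k experts; the rest are never evaluated (sparsity).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    gate_sum = sum(probs[i] for i in top)
    # The shared expert processes every token regardless of routing.
    out = list(shared_expert(x))
    for i in top:
        gate = probs[i] / gate_sum  # renormalize gates over the selected experts
        y = routed_experts[i](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top

def make_scale_expert(c):
    # Hypothetical stand-in for an expert network: scales the input by c.
    return lambda x: [c * v for v in x]

experts = [make_scale_expert(c) for c in (1.0, 2.0, 3.0, 4.0)]
shared = make_scale_expert(0.5)
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
out, chosen = moe_forward([2.0, 1.0], router, experts, shared, top_k=2)
```

Only the selected experts are called, so compute per token grows with `top_k` rather than with the total number of experts, which is the usual argument for adding capacity without a matching increase in inference latency.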

Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos

Feb 14, 2026

Can Li, Jie Gu, Jingmin Chen, Fangzhou Qiu, Lei Sun

Abstract:Understanding dynamic scenes from casual videos is critical for scalable robot learning, yet four-dimensional (4D) reconstruction under strictly monocular settings remains highly ill-posed. To address this challenge, our key insight is that real-world dynamics exhibit multi-scale regularity from the object level to the particle level. To this end, we design a multi-scale dynamics mechanism that factorizes complex motion fields. Within this formulation, we propose Gaussian sequences with multi-scale dynamics, a novel representation for dynamic 3D Gaussians derived through compositions of multi-level motion. This layered structure substantially alleviates reconstruction ambiguity and promotes physically plausible dynamics. We further incorporate multi-modal priors from vision foundation models to establish complementary supervision, constraining the solution space and improving the reconstruction fidelity. Our approach enables accurate and globally consistent 4D reconstruction from monocular casual videos. Experiments on dynamic novel-view synthesis (NVS) with benchmark and real-world manipulation datasets demonstrate considerable improvements over existing methods.

Rhombot: Rhombus-shaped Modular Robots for Stable, Medium-Independent Reconfiguration Motion

Jan 27, 2026

Jie Gu, Yirui Sun, Zhihao Xia, Tin Lun Lam, Chunxu Tian, Dan Zhang

Abstract:In this paper, we present Rhombot, a novel deformable planar lattice modular self-reconfigurable robot (MSRR) with a rhombus-shaped module. Each module consists of a parallelogram skeleton with a single centrally mounted actuator that enables folding and unfolding along its diagonal. The core design philosophy is to achieve essential MSRR functionalities such as morphing, docking, and locomotion with minimal control complexity. This enables a continuous and stable reconfiguration process that is independent of the surrounding medium, allowing the system to reliably form various configurations in diverse environments. To leverage the unique kinematics of Rhombot, we introduce morphpivoting, a novel motion primitive for reconfiguration that differs from advanced MSRR systems, and propose a strategy for its continuous execution. Finally, a series of physical experiments validates the module's stable reconfiguration ability, as well as its positional and docking accuracy.

Self-Reconfiguration Planning for Deformable Quadrilateral Modular Robots

Jan 27, 2026

Jie Gu, Hongrun Gao, Zhihao Xia, Yirun Sun, Chunxu Tian, Dan Zhang

Abstract:For lattice modular self-reconfigurable robots (MSRRs), maintaining stable connections during reconfiguration is crucial for physical feasibility and deployability. This letter presents a novel self-reconfiguration planning algorithm for deformable quadrilateral MSRRs that guarantees stable connection. The method first constructs feasible connect/disconnect actions using a virtual graph representation, and then organizes these actions into a valid execution sequence through a Dependence-based Reverse Tree (DRTree) that resolves interdependencies. We also prove that reconfiguration sequences satisfying motion characteristics exist for any pair of configurations with seven or more modules (excluding linear topologies). Finally, comparisons with a modified BiRRT algorithm highlight the superior efficiency and stability of our approach, while deployment on a physical robotic platform confirms its practical feasibility.

LLA: Enhancing Security and Privacy for Generative Models with Logic-Locked Accelerators

Dec 26, 2025

You Li, Guannan Zhao, Yuhao Ju, Yunqi He, Jie Gu, Hai Zhou

Abstract:We introduce LLA, an effective intellectual property (IP) protection scheme for generative AI models. LLA leverages the synergy between hardware and software to defend against various supply chain threats, including model theft, model corruption, and information leakage. On the software side, it embeds key bits into neurons that can trigger outliers to degrade performance and applies invariance transformations to obscure the key values. On the hardware side, it integrates a lightweight locking module into the AI accelerator while maintaining compatibility with various dataflow patterns and toolchains. An accelerator with a pre-stored secret key acts as a license to access the model services provided by the IP owner. The evaluation results show that LLA can withstand a broad range of oracle-guided key optimization attacks, while incurring a minimal computational overhead of less than 0.1% for 7,168 key bits.

* Accepted by AAAI'26 as a conference paper and selected for oral presentation

Egocentric Instruction-oriented Affordance Prediction via Large Multimodal Model

Aug 25, 2025

Bokai Ji, Jie Gu, Xiaokang Ma, Chu Tang, Jingmin Chen, Guangxia Li

Abstract:Affordance is crucial for intelligent robots in the context of object manipulation. In this paper, we argue that affordance should be task-/instruction-dependent, which is overlooked by many previous works. That is, different instructions can lead to different manipulation regions and directions even for the same object. According to this observation, we present a new dataset comprising fifteen thousand object-instruction-affordance triplets. All scenes in the dataset are from an egocentric viewpoint, designed to approximate the perspective of a human-like robot. Furthermore, we investigate how to enable large multimodal models (LMMs) to serve as affordance predictors by implementing a "search against verifiers" pipeline. An LMM is asked to progressively predict affordances, with the output at each step being verified by itself during the iterative process, imitating a reasoning process. Experiments show that our method not only unlocks new instruction-oriented affordance prediction capabilities, but also achieves outstanding performance broadly.

Stimulating Imagination: Towards General-purpose Object Rearrangement

Aug 03, 2024

Jianyang Wu, Jie Gu, Xiaokang Ma, Chu Tang, Jingmin Chen

Abstract:General-purpose object placement is a fundamental capability of an intelligent generalist robot, i.e., being capable of rearranging objects following human instructions even in novel environments. To achieve this, we break the rearrangement down into three parts, namely object localization, goal imagination and robot control, and propose a framework named SPORT. SPORT leverages pre-trained large vision models for broad semantic reasoning about objects, and learns a diffusion-based 3D pose estimator to ensure physically realistic results. Only object types (to be moved or reference) are communicated between these two parts, which brings two benefits. One is that we can fully leverage the powerful ability of open-set object localization and recognition since no specific fine-tuning is needed for robotic scenarios. The other is that the diffusion-based estimator only needs to "imagine" the poses of the moving and reference objects after the placement, with no need for their semantic information. Thus the training burden is greatly reduced and no massive training is required. The training data for goal pose estimation is collected in simulation and annotated with GPT-4. A set of simulation and real-world experiments demonstrate the potential of our approach to accomplish general-purpose object rearrangement, placing various objects following precise instructions.

* 9 pages

Tackling Missing Values in Probabilistic Wind Power Forecasting: A Generative Approach

Mar 06, 2024

Honglin Wen, Pierre Pinson, Jie Gu, Zhijian Jin

Abstract:Machine learning techniques have been successfully used in probabilistic wind power forecasting. However, the issue of missing values within datasets, caused for instance by sensor failure, has been overlooked for a long time. Although it is natural to consider addressing this issue by imputing missing values before model estimation and forecasting, we suggest treating missing values and forecasting targets identically and predicting all unknown values simultaneously based on observations. In this paper, we offer an efficient probabilistic forecasting approach by estimating the joint distribution of features and targets based on a generative model. It is free of preprocessing, and thus avoids introducing potential errors. Compared with the traditional "impute, then predict" pipeline, the proposed approach achieves better performance in terms of continuous ranked probability score.

* 8 pages, to be presented at Power Systems Computation Conference (PSCC) 2024
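
The continuous ranked probability score (CRPS) named in the abstract above can be estimated from forecast samples using the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast distribution F. The sketch below is a generic estimator, not the evaluation code of the paper; the function name `crps_from_samples` is an assumption.

```python
def crps_from_samples(samples, observation):
    """Sample-based CRPS estimate for a scalar observation (lower is better).

    Uses CRPS = E|X - y| - 0.5 * E|X - X'| with both expectations replaced
    by averages over the given forecast samples.
    """
    n = len(samples)
    # Average absolute error between forecast samples and the observation.
    term_obs = sum(abs(s - observation) for s in samples) / n
    # Average absolute difference over all ordered pairs of forecast samples.
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread

# A degenerate (point) forecast reduces to the absolute error.
point_score = crps_from_samples([1.0, 1.0, 1.0], 3.0)
# A two-sample forecast centred on the observation.
spread_score = crps_from_samples([0.0, 2.0], 1.0)
```

The pairwise term makes this estimator quadratic in the number of samples, which is acceptable for small ensembles; sorted-sample formulations exist for larger ones.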

Continuous and Distribution-free Probabilistic Wind Power Forecasting: A Conditional Normalizing Flow Approach

Jun 06, 2022

Honglin Wen, Pierre Pinson, Jinghuan Ma, Jie Gu, Zhijian Jin

Abstract:We present a data-driven approach for probabilistic wind power forecasting based on conditional normalizing flow (CNF). In contrast with existing approaches, it is distribution-free (like non-parametric and quantile-based approaches) and can directly yield continuous probability densities, hence avoiding quantile crossing. It relies on a base distribution and a set of bijective mappings. Both the shape parameters of the base distribution and the bijective mappings are approximated with neural networks. Spline-based conditional normalizing flow is considered owing to its non-affine characteristics. During the training phase, the model sequentially maps input examples onto samples of the base distribution, given the conditional contexts, where parameters are estimated through maximum likelihood. To issue probabilistic forecasts, one eventually maps samples of the base distribution into samples of a desired distribution. Case studies based on open datasets validate the effectiveness of the proposed model, and allow us to discuss its advantages and caveats with respect to the state of the art.

* The second revision to IEEE Transactions on Sustainable Energy
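
The maximum-likelihood training described in the abstract above rests on the change-of-variables formula for a bijective map. As a sketch for a scalar target, with notation that is assumed here rather than taken from the paper ($y$ the target, $x$ the conditional context, $f_\theta(\cdot\,; x)$ the bijection onto the base variable $z$, and $p_Z$ the base density):

```latex
\log p_Y(y \mid x)
  = \log p_Z\!\bigl(f_\theta(y; x)\bigr)
  + \log \left| \frac{\partial f_\theta(y; x)}{\partial y} \right|,
\qquad
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_Y\!\bigl(y_i \mid x_i\bigr).
```

Sampling runs the map in the other direction: draw $z \sim p_Z$ and return $y = f_\theta^{-1}(z; x)$.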

Automated machine learning for secure key rate in discrete-modulated continuous-variable quantum key distribution

Jan 24, 2022

Zhi-Ping Liu, Min-Gang Zhou, Wen-Bo Liu, Chen-Long Li, Jie Gu, Hua-Lei Yin, Zeng-Bing Chen

Abstract:Continuous-variable quantum key distribution (CV QKD) with discrete modulation has attracted increasing attention due to its experimental simplicity, lower-cost implementation and compatibility with classical optical communication. Correspondingly, some novel numerical methods have been proposed to analyze the security of these protocols against collective attacks, which promotes key rates over one hundred kilometers of fiber distance. However, numerical methods are limited by their calculation time and resource consumption, which prevents them from playing a larger role on mobile platforms in quantum networks. To address this issue, a neural network model predicting key rates in nearly real time has been proposed previously. Here, we go further and present a neural network model combined with Bayesian optimization. This model automatically designs the best neural network architecture for computing key rates in real time. We demonstrate our model with two variants of CV QKD protocols with quaternary modulation. The results show high reliability with secure probability as high as $99.15\%-99.59\%$, considerable tightness and high efficiency with speedup of approximately $10^7$ in both cases. This inspiring model enables the real-time computation of unstructured quantum key distribution protocols' key rate more automatically and efficiently, which meets the growing need to implement QKD protocols on moving platforms.

* 9 pages, 5 figures
