Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhitao Wang

T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates

Jul 10, 2025

Zhitao Wang, Hengyu Man, Wenrui Li, Xingtao Wang, Xiaopeng Fan, Debin Zhao

Abstract:Recent advances in video generation techniques have given rise to an emerging paradigm of generative video coding, aiming to achieve semantically accurate reconstructions in Ultra-Low Bitrate (ULB) scenarios by leveraging strong generative priors. However, most existing methods are limited by domain specificity (e.g., facial or human videos) or an excessive dependence on high-level text guidance, which often fails to capture motion details and results in unrealistic reconstructions. To address these challenges, we propose a Trajectory-Guided Generative Video Coding framework (dubbed T-GVC). T-GVC employs a semantic-aware sparse motion sampling pipeline to effectively bridge low-level motion tracking with high-level semantic understanding by extracting pixel-wise motion as sparse trajectory points based on their semantic importance, not only significantly reducing the bitrate but also preserving critical temporal semantic information. In addition, by incorporating trajectory-aligned loss constraints into diffusion processes, we introduce a training-free latent space guidance mechanism to ensure physically plausible motion patterns without sacrificing the inherent capabilities of generative models. Experimental results demonstrate that our framework outperforms both traditional codecs and state-of-the-art end-to-end video compression methods under ULB conditions. Furthermore, additional experiments confirm that our approach achieves more precise motion control than existing text-guided methods, paving the way for a novel direction of generative video coding guided by geometric motion modeling.

Via

Access Paper or Ask Questions

RL-OGM-Parking: Lidar OGM-Based Hybrid Reinforcement Learning Planner for Autonomous Parking

Feb 26, 2025

Zhitao Wang, Zhe Chen, Mingyang Jiang, Tong Qin, Ming Yang

Abstract:Autonomous parking has become a critical application in automatic driving research and development. Parking operations often suffer from limited space and complex environments, requiring accurate perception and precise maneuvering. Traditional rule-based parking algorithms struggle to adapt to diverse and unpredictable conditions, while learning-based algorithms lack consistent and stable performance in various scenarios. Therefore, a hybrid approach is necessary that combines the stability of rule-based methods and the generalizability of learning-based methods. Recently, reinforcement learning (RL) based policy has shown robust capability in planning tasks. However, the simulation-to-reality (sim-to-real) transfer gap seriously blocks the real-world deployment. To address these problems, we employ a hybrid policy, consisting of a rule-based Reeds-Shepp (RS) planner and a learning-based reinforcement learning (RL) planner. A real-time LiDAR-based Occupancy Grid Map (OGM) representation is adopted to bridge the sim-to-real gap, leading the hybrid policy can be applied to real-world systems seamlessly. We conducted extensive experiments both in the simulation environment and real-world scenarios, and the result demonstrates that the proposed method outperforms pure rule-based and learning-based methods. The real-world experiment further validates the feasibility and efficiency of the proposed method.

Via

Access Paper or Ask Questions

A Field Calibration Approach for Triaxial MEMS Gyroscopes Based on Gravity and Rotation Consistency

Oct 25, 2024

Yaqi Li, Li Wang, Zhitao Wang, Xiangqing Li, Steven W. Su

Figure 1 for A Field Calibration Approach for Triaxial MEMS Gyroscopes Based on Gravity and Rotation Consistency

Figure 2 for A Field Calibration Approach for Triaxial MEMS Gyroscopes Based on Gravity and Rotation Consistency

Figure 3 for A Field Calibration Approach for Triaxial MEMS Gyroscopes Based on Gravity and Rotation Consistency

Figure 4 for A Field Calibration Approach for Triaxial MEMS Gyroscopes Based on Gravity and Rotation Consistency

Abstract:This paper developed an efficient method for calibrating triaxial MEMS gyroscopes, which can be effectively utilized in the field environment. The core strategy is to utilize the criterion that the dot product of the measured gravity and the rotation speed in a fixed frame remains constant. To eliminate the impact of external acceleration, the calibration process involves separate procedures for measuring local gravity and rotation speed. Moreover, unlike existing approaches for auto calibration of triaxial sensors that often result in nonlinear optimization problems, the proposed method simplifies the estimation of the gyroscope scale factor by employing a linear least squares algorithm. Extensive numerical simulations have been conducted to analyze the proposed method's performance in calibrating the six-parameter triaxial gyroscope model, taking into consideration measurements corrupted by simulated noise. Experimental validation was also carried out using two commercially available MEMS inertial measurement units (LSM9DS1) and a servo motor. The experimental results effectively demonstrate the efficacy of the proposed calibration approach.

Via

Access Paper or Ask Questions

On-site scale factor linearity calibration of MEMS triaxial gyroscopes

May 06, 2024

Yaqi Li, Li Wang, Zhitao Wang, Xiangqing Li, Jiaojiao Li, Steven weidong Su

Figure 1 for On-site scale factor linearity calibration of MEMS triaxial gyroscopes

Figure 2 for On-site scale factor linearity calibration of MEMS triaxial gyroscopes

Figure 3 for On-site scale factor linearity calibration of MEMS triaxial gyroscopes

Figure 4 for On-site scale factor linearity calibration of MEMS triaxial gyroscopes

Abstract:The calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation for various wearable health monitoring applications. However, gyroscope calibration poses greater challenges compared to accelerometers and magnetometers. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes via only a servo motor, making it well-suited for field environments. The core strategy of the method involves utilizing the fact that the dot product of the measured gravity and the rotational speed in a fixed frame remains constant. To eliminate the influence of rotating centrifugal force on the accelerometer, the accelerometer data is measured while stationary. The proposed calibration experiment scheme, which allows gyroscopic measurements when operating each axis at a specific rotation speed, making it easier to evaluate the linearity across a related speed range constituted by a series of rotation speeds. Moreover, solely the classical least squares algorithm proves adequate for estimating the scale factor, notably streamlining the analysis of the calibration process. Extensive numerical simulations were conducted to analyze the proposed method's performance in calibrating a triaxial gyroscope model. Experimental validation was also carried out using a commercially available MEMS inertial measurement unit (LSM9DS1 from Arduino nano 33 BLE SENSE) and a servo motor capable of controlling precise speed. The experimental results effectively demonstrate the efficacy of the proposed calibration approach.

Via

Access Paper or Ask Questions

XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

Jan 10, 2024

Zhitao Wang, Wei Wang, Zirao Li, Long Wang, Can Yi, Xinjie Xu, Luyang Cao, Hanjing Su, Shouzhi Chen, Jun Zhou

Abstract:In past years, we have been dedicated to automating user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e, test scripts generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting the automation level of the current system, particularly the stage of test scripts generation. With recent notable successes, large language models (LLMs) demonstrate significant potential in attaining human-like intelligence and there has been a growing research area that employs LLMs as autonomous agents to obtain human-like decision-making capabilities. Inspired by these works, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system mainly consists of three LLM-based agents responsible for action planning, state checking and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with testing device, make human-like decision and generate action command in a collaborative way. The proposed multi-agent system achieves a close effectiveness to human testers in our experimental studies and gains a significant improvement of Pass@1 accuracy compared with single-agent architecture. More importantly, the proposed system has launched in the formal testing environment of WeChat Pay mobile app, which saves a considerable amount of manpower in the daily development work.

Via

Access Paper or Ask Questions

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Dec 22, 2023

Pengxiang Ding, Han Zhao, Zhitao Wang, Zhenyu Wei, Shangke Lyu, Donglin Wang

Figure 1 for QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Figure 2 for QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Figure 3 for QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Figure 4 for QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Abstract:The important manifestation of robot intelligence is the ability to naturally interact and autonomously make decisions. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying system design but limiting the synergy between different information streams. This compartmentalization poses challenges in achieving seamless autonomous reasoning, decision-making, and action execution. To address these limitations, a novel paradigm, named Vision-Language-Action tasks for QUAdruped Robots (QUAR-VLA), has been introduced in this paper. This approach tightly integrates visual information and instructions to generate executable actions, effectively merging perception, planning, and decision-making. The central idea is to elevate the overall intelligence of the robot. Within this framework, a notable challenge lies in aligning fine-grained instructions with visual perception information. This emphasizes the complexity involved in ensuring that the robot accurately interprets and acts upon detailed instructions in harmony with its visual observations. Consequently, we propose QUAdruped Robotic Transformer (QUART), a family of VLA models to integrate visual information and instructions from diverse modalities as input and generates executable actions for real-world robots and present QUAdruped Robot Dataset (QUARD), a large-scale multi-task dataset including navigation, complex terrain locomotion, and whole-body manipulation tasks for training QUART models. Our extensive evaluation (4000 evaluation trials) shows that our approach leads to performant robotic policies and enables QUART to obtain a range of emergent capabilities.

Via

Access Paper or Ask Questions

RotoGBML: Towards Out-of-Distribution Generalization for Gradient-Based Meta-Learning

Mar 12, 2023

Min Zhang, Zifeng Zhuang, Zhitao Wang, Donglin Wang, Wenbin Li

Figure 1 for RotoGBML: Towards Out-of-Distribution Generalization for Gradient-Based Meta-Learning

Figure 2 for RotoGBML: Towards Out-of-Distribution Generalization for Gradient-Based Meta-Learning

Figure 3 for RotoGBML: Towards Out-of-Distribution Generalization for Gradient-Based Meta-Learning

Figure 4 for RotoGBML: Towards Out-of-Distribution Generalization for Gradient-Based Meta-Learning

Abstract:Gradient-based meta-learning (GBML) algorithms are able to fast adapt to new tasks by transferring the learned meta-knowledge, while assuming that all tasks come from the same distribution (in-distribution, ID). However, in the real world, they often suffer from an out-of-distribution (OOD) generalization problem, where tasks come from different distributions. OOD exacerbates inconsistencies in magnitudes and directions of task gradients, which brings challenges for GBML to optimize the meta-knowledge by minimizing the sum of task gradients in each minibatch. To address this problem, we propose RotoGBML, a novel approach to homogenize OOD task gradients. RotoGBML uses reweighted vectors to dynamically balance diverse magnitudes to a common scale and uses rotation matrixes to rotate conflicting directions close to each other. To reduce overhead, we homogenize gradients with the features rather than the network parameters. On this basis, to avoid the intervention of non-causal features (e.g., backgrounds), we also propose an invariant self-information (ISI) module to extract invariant causal features (e.g., the outlines of objects). Finally, task gradients are homogenized based on these invariant causal features. Experiments show that RotoGBML outperforms other state-of-the-art methods on various few-shot image classification benchmarks.

* 13 pages

Via

Access Paper or Ask Questions

Pairwise Learning for Neural Link Prediction

Dec 22, 2021

Zhitao Wang, Yong Zhou, Litao Hong, Yuanhang Zou, Hanjing Su, Shouzhi Chen

Figure 1 for Pairwise Learning for Neural Link Prediction

Figure 2 for Pairwise Learning for Neural Link Prediction

Figure 3 for Pairwise Learning for Neural Link Prediction

Figure 4 for Pairwise Learning for Neural Link Prediction

Abstract:In this paper, we aim at providing an effective Pairwise Learning Neural Link Prediction (PLNLP) framework. The framework treats link prediction as a pairwise learning to rank problem and consists of four main components, i.e., neighborhood encoder, link predictor, negative sampler and objective function. The framework is flexible that any generic graph neural convolution or link prediction specific neural architecture could be employed as neighborhood encoder. For link predictor, we design different scoring functions, which could be selected based on different types of graphs. In negative sampler, we provide several sampling strategies, which are problem specific. As for objective function, we propose to use an effective ranking loss, which approximately maximizes the standard ranking metric AUC. We evaluate the proposed PLNLP framework on 4 link property prediction datasets of Open Graph Benchmark, including ogbl-ddi, ogbl-collab, ogbl-ppa and ogbl-ciation2. PLNLP achieves top 1 performance on ogbl-ddi and ogbl-collab, and top 2 performance on ogbl-ciation2 only with basic neural architecture. The performance demonstrates the effectiveness of PLNLP.

Via

Access Paper or Ask Questions

Reinforcement Learning based Negotiation-aware Motion Planning of Autonomous Vehicles

Jul 08, 2021

Zhitao Wang, Yuzheng Zhuang, Qiang Gu, Dong Chen, Hongbo Zhang, Wulong Liu

Figure 1 for Reinforcement Learning based Negotiation-aware Motion Planning of Autonomous Vehicles

Figure 2 for Reinforcement Learning based Negotiation-aware Motion Planning of Autonomous Vehicles

Figure 3 for Reinforcement Learning based Negotiation-aware Motion Planning of Autonomous Vehicles

Figure 4 for Reinforcement Learning based Negotiation-aware Motion Planning of Autonomous Vehicles

Abstract:For autonomous vehicles integrating onto roadways with human traffic participants, it requires understanding and adapting to the participants' intention and driving styles by responding in predictable ways without explicit communication. This paper proposes a reinforcement learning based negotiation-aware motion planning framework, which adopts RL to adjust the driving style of the planner by dynamically modifying the prediction horizon length of the motion planner in real time adaptively w.r.t the event of a change in environment, typically triggered by traffic participants' switch of intents with different driving styles. The framework models the interaction between the autonomous vehicle and other traffic participants as a Markov Decision Process. A temporal sequence of occupancy grid maps are taken as inputs for RL module to embed an implicit intention reasoning. Curriculum learning is employed to enhance the training efficiency and the robustness of the algorithm. We applied our method to narrow lane navigation in both simulation and real world to demonstrate that the proposed method outperforms the common alternative due to its advantage in alleviating the social dilemma problem with proper negotiation skills.

Via

Access Paper or Ask Questions