Zixing Wang

FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection

Aug 27, 2025

Yuhang Zhao, Zixing Wang

Abstract: End-to-end object detectors offer a promising NMS-free paradigm for real-time applications, yet their high computational cost remains a significant barrier, particularly in complex scenarios such as intersection traffic monitoring. To address this challenge, we propose FlowDet, a high-speed detector featuring a decoupled encoder optimization strategy applied to the DETR architecture. Specifically, FlowDet employs a novel Geometric Deformable Unit (GDU) for traffic-aware geometric modeling and a Scale-Aware Attention (SAA) module to maintain high representational power across extreme scale variations. To rigorously evaluate the model's performance in environments with severe occlusion and high object density, we collected the Intersection-Flow-5k dataset, which provides a new, challenging scenario for this task. On Intersection-Flow-5k, FlowDet establishes a new state of the art. Compared to the strong RT-DETR baseline, it improves AP(test) by 1.5% and AP50(test) by 1.6%, while simultaneously reducing GFLOPs by 63.2% and increasing inference speed by 16.2%. Our work demonstrates a new path towards building highly efficient and accurate detectors for demanding, real-world perception systems. The Intersection-Flow-5k dataset is available at https://github.com/AstronZh/Intersection-Flow-5K.

* Accepted by PRCV 2025. Project page with code and dataset: https://github.com/AstronZh/Intersection-Flow-5K

Dynamic Robot Tool Use with Vision Language Models

May 02, 2025

Noah Trupin, Zixing Wang, Ahmed H. Qureshi

Abstract: Tool use enhances a robot's task capabilities. Recent advances in vision-language models (VLMs) have equipped robots with sophisticated cognitive capabilities for tool-use applications. However, existing methods focus on elementary quasi-static tool manipulation or high-level tool selection while neglecting the critical aspect of task-appropriate tool grasping. To address this limitation, we introduce inverse Tool-Use Planning (iTUP), a novel VLM-driven framework that enables grounded, fine-grained planning for versatile robotic tool use. Through an integrated pipeline of VLM-based tool and contact-point grounding, position-velocity trajectory planning, and physics-informed grasp generation and selection, iTUP demonstrates versatility across (1) quasi-static tool-use tasks and the more challenging (2) dynamic and (3) cluster tool-use tasks. To ensure robust planning, our framework integrates stable and safe task-aware grasping by reasoning over semantic affordances and physical constraints. We evaluate iTUP and baselines on a comprehensive range of realistic tool-use tasks, including precision hammering, object scooping, and cluster sweeping. Experimental results demonstrate that iTUP ensures a thorough grounding of cognition and planning for challenging robot tool use across diverse environments.

* In submission and under review
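
iTUP's pipeline includes position-velocity trajectory planning, which matters for dynamic tasks such as hammering, where the tool must reach a contact point with a prescribed velocity. The abstract does not say how iTUP parameterizes its trajectories, so the sketch below is only a hedged illustration of one standard technique for meeting position and velocity constraints at both ends: a single cubic Hermite segment. The function name `hermite_segment` and the hammering numbers are hypothetical, not taken from the paper.

```python
import numpy as np


def hermite_segment(p0, v0, p1, v1, T, t):
    """Position and velocity of a cubic Hermite segment at times t in [0, T].

    The segment starts at position p0 with velocity v0 and ends at p1 with v1.
    """
    p0, v0, p1, v1 = (np.asarray(a, dtype=float) for a in (p0, v0, p1, v1))
    # Normalized time s in [0, 1]; the trailing axis broadcasts against positions.
    s = np.clip(np.asarray(t, dtype=float) / T, 0.0, 1.0)[..., None]

    # Hermite basis functions and their derivatives with respect to s.
    h00, h10 = 2 * s**3 - 3 * s**2 + 1, s**3 - 2 * s**2 + s
    h01, h11 = -2 * s**3 + 3 * s**2, s**3 - s**2
    d00, d10 = 6 * s**2 - 6 * s, 3 * s**2 - 4 * s + 1
    d01, d11 = -6 * s**2 + 6 * s, 3 * s**2 - 2 * s

    pos = h00 * p0 + h10 * T * v0 + h01 * p1 + h11 * T * v1
    # Chain rule: d/dt = (1 / T) * d/ds.
    vel = (d00 * p0 + d10 * T * v0 + d01 * p1 + d11 * T * v1) / T
    return pos, vel


# Hypothetical hammer strike: start at rest 0.3 m above the nail and reach it
# after 0.5 s while moving straight down at 1.5 m/s.
times = np.linspace(0.0, 0.5, 6)
pos, vel = hermite_segment([0, 0, 0.3], [0, 0, 0], [0, 0, 0.0], [0, 0, -1.5], 0.5, times)
```

A cubic polynomial per axis is the lowest-degree choice that can satisfy all four boundary conditions (start and end position, start and end velocity).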

Implicit Physics-aware Policy for Dynamic Manipulation of Rigid Objects via Soft Body Tools

Feb 08, 2025

Zixing Wang, Ahmed H. Qureshi

Abstract: Recent advancements in robot tool use have enabled novel tasks, yet the predominant focus has been on rigid-body tools, while soft-body tools and their dynamic interaction with rigid bodies remain unexplored. This paper takes a pioneering step towards dynamic, one-shot soft tool use for manipulating rigid objects, a challenging problem due to complex interactions and unobservable physical properties. To address these problems, we propose the Implicit Physics-aware (IPA) policy, designed to facilitate effective soft tool use across various environmental configurations. The IPA policy conducts system identification to implicitly identify physics information and predicts goal-conditioned, one-shot actions accordingly. We validate our approach on a challenging task: transporting rigid objects to distant target positions in a single attempt using soft tools such as ropes, under unknown environment physics parameters. Our experimental results indicate the effectiveness of our method in efficiently identifying physical properties, accurately predicting actions, and smoothly generalizing to real-world environments. The related video is available at: https://youtu.be/4hPrUDTc4Rg?si=WUZrT2vjLMt8qRWA

* ICRA 2025

Passive iFIR filters for data-driven control

Mar 11, 2024

Zixing Wang, Yongkang Huo, Fulvio Forni

Abstract: We consider the design of a new class of passive iFIR controllers given by the parallel action of an integrator and a finite impulse response filter. iFIRs are more expressive than PID controllers but retain their features and simplicity. The paper provides a model-free data-driven design for passive iFIR controllers based on virtual reference feedback tuning. Passivity is enforced through constrained optimization (three different formulations are discussed). The proposed design does not rely on large datasets or accurate plant models.

* 6 pages, 8 figures. Submitted to IEEE Control Systems Letters (L-CSS) with the option to present at the 2024 Conference on Decision and Control (CDC 2024)
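
The iFIR structure and the VRFT idea can be illustrated with a small, hedged sketch. Because an iFIR controller is linear in its parameters theta = [k_i, h_0, ..., h_{N-1}], the VRFT step reduces to an ordinary least-squares fit once the virtual error has been computed from open-loop data. The first-order plant, the reference model M(z) = (1 - a) z^-1 / (1 - a z^-1), the omission of the usual VRFT prefilter, and all function names are assumptions for illustration. The passivity function only checks the positive-real condition on a frequency grid; it is not one of the paper's three constrained formulations. For this particular plant and reference model the ideal controller happens to be an iFIR with k_i = (1 - a)(1 - p)/b and h_0 = (1 - a)p/b, so the noise-free fit should recover k_i = 0.2, h_0 = 0.8, and zero for the remaining taps.

```python
import numpy as np


def simulate_plant(u, p=0.8, b=0.5):
    """Hypothetical first-order plant y(k) = p*y(k-1) + b*u(k-1), starting at rest."""
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = p * y[k - 1] + b * u[k - 1]
    return y


def ifir_regressors(e, n_taps):
    """Columns: integrator state sum_{i<=k} e(i), then e(k), e(k-1), ..., e(k-n_taps+1)."""
    cols = [np.cumsum(e)]
    for j in range(n_taps):
        cols.append(np.concatenate([np.zeros(j), e[: len(e) - j]]))
    return np.column_stack(cols)


def vrft_ifir(u, y, a=0.5, n_taps=4):
    """Unconstrained VRFT fit of theta = [k_i, h_0, ..., h_{n_taps-1}].

    Assumes M(z) = (1-a) z^-1 / (1 - a z^-1); inverting it gives the virtual
    reference r(k) = (y(k+1) - a*y(k)) / (1 - a).
    """
    r = (y[1:] - a * y[:-1]) / (1 - a)
    e = r - y[:-1]  # virtual tracking error
    phi = ifir_regressors(e, n_taps)
    theta, *_ = np.linalg.lstsq(phi, u[:-1], rcond=None)
    return theta


def is_positive_real(theta, n_grid=2048):
    """Grid check of Re C(e^{jw}) = k_i/2 + sum_j h_j cos(j w) >= 0 on (0, pi], with k_i >= 0.

    k_i/2 is the real part of the integrator k_i / (1 - z^-1) on the unit circle.
    """
    k_i, h = theta[0], np.asarray(theta[1:])
    w = np.linspace(1e-3, np.pi, n_grid)
    real_part = k_i / 2 + np.cos(np.outer(w, np.arange(len(h)))) @ h
    return bool(k_i >= 0 and real_part.min() >= 0)


rng = np.random.default_rng(0)
u = rng.standard_normal(300)  # open-loop excitation signal
y = simulate_plant(u)
theta = vrft_ifir(u, y)  # should be close to [0.2, 0.8, 0, 0, 0]
print(theta, is_positive_real(theta))
```

With measurement noise the unconstrained fit can violate the positive-real condition, which is where a constrained formulation becomes necessary.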

AnyPose: Anytime 3D Human Pose Forecasting via Neural Ordinary Differential Equations

Sep 09, 2023

Zixing Wang, Ahmed H. Qureshi

Abstract: Anytime 3D human pose forecasting is crucial to synchronous real-world human-machine interaction, where the term "anytime" refers to predicting human pose at any real-valued time step. However, to the best of our knowledge, all existing methods in human pose forecasting perform predictions at preset, discrete time intervals. Therefore, we introduce AnyPose, a lightweight continuous-time neural architecture that models human behavior dynamics with neural ordinary differential equations. We validate our framework on the Human3.6M, AMASS, and 3DPW datasets and conduct a series of comprehensive analyses, comparing it with existing methods and examining the intersection of human pose and neural ordinary differential equations. Our results demonstrate that AnyPose achieves high accuracy in predicting future poses and requires significantly less computational time than traditional methods in solving anytime prediction tasks.
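
The "anytime" idea can be illustrated by integrating a vector field dx/dt = f(t, x) up to arbitrary real-valued query times instead of stepping on a fixed grid. This is a minimal sketch under stated assumptions: a hand-written fixed-step RK4 solver, and a placeholder dynamics function standing in for a trained network. AnyPose's actual architecture, state encoding, and solver are not described in the abstract, and the names `forecast_at` and `pose_dynamics` are hypothetical.

```python
import numpy as np


def rk4_step(f, t, x, dt):
    """One classical Runge-Kutta (RK4) step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def forecast_at(f, x0, query_times, max_dt=0.01):
    """Return the state at each non-negative, real-valued query time (any order).

    Integration starts from x0 at t = 0 and advances through the sorted queries,
    splitting each interval into equal steps no longer than max_dt.
    """
    query_times = np.asarray(query_times, dtype=float)
    x = np.asarray(x0, dtype=float)
    out = np.empty((len(query_times),) + x.shape)
    t = 0.0
    for idx in np.argsort(query_times):
        tq = query_times[idx]
        n_steps = max(1, int(np.ceil((tq - t) / max_dt)))
        dt = (tq - t) / n_steps
        for _ in range(n_steps):
            x = rk4_step(f, t, x, dt)
            t += dt
        t = tq  # snap to the query time to avoid floating-point drift
        out[idx] = x
    return out


# Hypothetical 9-D pose state (3 joints x 3 coordinates).
REST_POSE = np.linspace(-1.0, 1.0, 9)


def pose_dynamics(t, x):
    """Placeholder vector field: relax toward a fixed rest pose (not a trained model)."""
    return -0.8 * (x - REST_POSE)


poses = forecast_at(pose_dynamics, np.zeros(9), [0.13, 0.5, 1.27])
```

Query times need not be multiples of any step size, which is the property a discrete-interval forecaster lacks; a real model would typically use an adaptive solver and a learned f.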

DeRi-IGP: Manipulating Rigid Objects Using Deformable Objects via Iterative Grasp-Pull

Sep 09, 2023

Zixing Wang, Ahmed H. Qureshi

Abstract: Heterogeneous system manipulation, i.e., manipulating rigid objects via deformable (soft) objects, is an emerging field that remains in the early stages of research. Existing works in this field suffer from limited action and operational space, poor generalization ability, and expensive development. To address these challenges, we propose a universally applicable and effective motion primitive, Iterative Grasp-Pull (IGP), and a sampling-based framework, DeRi-IGP, to solve the heterogeneous system manipulation task. The DeRi-IGP framework uses the robots' local onboard RGB-D sensors to observe the environment, which comprises a soft-rigid body system. It then uses this information to iteratively grasp and pull a soft body (e.g., a rope) to move the attached rigid body to a desired location. We evaluate the effectiveness of our framework in solving various heterogeneous manipulation tasks and compare its performance with several state-of-the-art baselines. The results show that DeRi-IGP outperforms other methods by a significant margin. In addition, we demonstrate the advantage of IGP's large operational space in the long-distance object acquisition task in both simulated and real environments.

Efficient Q-Learning over Visit Frequency Maps for Multi-agent Exploration of Unknown Environments

Jul 30, 2023

Xuyang Chen, Ashvin N. Iyer, Zixing Wang, Ahmed H. Qureshi

Abstract: The robot exploration task has been widely studied, with applications ranging from novel environment mapping to item delivery. For some time-critical tasks, such as disaster rescue, the agent is required to explore as efficiently as possible. Recently, the Visit Frequency Map (VFM), a visit frequency-based map representation, achieved great success in such scenarios by discouraging repetitive visits with a frequency-based penalty. However, its relatively large size and single-agent setting hinder its further development. In this context, we propose the Integrated Visit Frequency Map, which encodes the same information as the VFM in a more compact form, and a visit frequency-based multi-agent information exchange and control scheme that can accommodate both representations. Through tests in diverse settings, the results indicate that our proposed methods achieve performance comparable to VFM with lower bandwidth requirements and generalize well to different multi-agent setups, including real-world environments.

* Accepted by IROS 2023. 8 pages
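
A visit-frequency map with a frequency-based penalty can be pictured as a per-cell count grid whose counts lower the reward for revisiting a cell, paired here with a standard tabular Q-learning update. This is a hedged illustration only: the new-cell bonus, the penalty weight `beta`, the count-summing `merge` used as a stand-in for multi-agent information exchange, and all names are assumptions. The paper's compact Integrated Visit Frequency Map encoding is not reproduced.

```python
import numpy as np


class VisitFrequencyMap:
    """Per-cell visit counts on a 2-D grid (illustrative, not the paper's encoding)."""

    def __init__(self, height, width):
        self.counts = np.zeros((height, width), dtype=np.int64)

    def visit(self, cell):
        self.counts[cell] += 1

    def reward(self, cell, new_cell_bonus=1.0, beta=0.1):
        # Hypothetical shaping: reward unseen cells, penalize revisits by their count.
        c = self.counts[cell]
        return new_cell_bonus if c == 0 else -beta * c

    def merge(self, other):
        # Hypothetical exchange between agents: sum the two count grids.
        merged = VisitFrequencyMap(*self.counts.shape)
        merged.counts = self.counts + other.counts
        return merged


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    target = reward + gamma * np.max(q[next_state])
    q[state, action] += alpha * (target - q[state, action])
    return q


vfm = VisitFrequencyMap(4, 4)
r_first = vfm.reward((1, 2))  # 1.0: the cell has never been visited
vfm.visit((1, 2))
vfm.visit((1, 2))
r_again = vfm.reward((1, 2))  # -0.2 after two visits with beta = 0.1
```

The reward is read before the visit is recorded, so the first entry into a cell still earns the exploration bonus.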

DeRi-Bot: Learning to Collaboratively Manipulate Rigid Objects via Deformable Objects

May 22, 2023

Zixing Wang, Ahmed H. Qureshi

Abstract: Recent research efforts have yielded significant advancements in manipulating objects under homogeneous settings, where the robot is required to manipulate either rigid or deformable (soft) objects. However, manipulation under heterogeneous setups that involve both deformable and rigid objects remains an unexplored area of research. Such setups are common in various scenarios that involve transporting heavy objects via ropes, e.g., on factory floors, at disaster sites, and in forestry. To address this challenge, we introduce DeRi-Bot, the first framework that enables the collaborative manipulation of rigid objects with deformable objects. Our framework comprises an Action Prediction Network (APN) and a Configuration Prediction Network (CPN) to model the complex patterns and stochasticity of soft-rigid body systems. We demonstrate the effectiveness of DeRi-Bot in moving rigid objects to a target position with ropes connected to robotic arms. Furthermore, DeRi-Bot is a distributed method that can accommodate an arbitrary number of robots or human partners without reconfiguration or retraining. We evaluate our framework in both simulated and real-world environments and show that it achieves promising results with strong generalization across different types of objects and multi-agent settings, including human-robot collaboration.

* 8 pages, 7 figures

Fast Estimating Pedestrian Moving State Based on Single 2D Body Pose by Shallow Neural Network

Jul 11, 2019

Zixing Wang, Nikolaos Papanikolopoulos

Abstract: The Crossing or Not-Crossing (C/NC) problem is important for autonomous vehicles (AVs) to interact safely with pedestrians. However, this problem setup ignores pedestrians walking along the direction of the vehicle's movement (LONG). To enhance AVs' awareness of pedestrian behavior, we take the first step towards extending C/NC to a C/NC/LONG problem and recognizing these states from a single body pose. In contrast, previous C/NC state classification work depends on multiple poses or contextual information. Our proposed shallow neural network classifier recognizes these three states within a very short time. We test it on the JAAD dataset and report an average accuracy of 81.23%. To further improve the classifier's performance, we introduce a computationally efficient method, the action momentum optimizer (AMO), which corrects predictions based on crossing behavior patterns. Our experiments show that, with AMO, the classifier performs up to 11.39% better on continuous pose tests. Furthermore, the model can cooperate with different sensors and algorithms that provide 2D pedestrian body poses, so it can work across multiple lighting and weather conditions. In addition, we have created extended pose annotations for the JAAD dataset, which will be publicly released soon.

* 10 pages
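
The two ingredients named in this abstract, a shallow classifier over a single 2D pose and a momentum-style correction across consecutive frames, can be sketched as follows. Everything here is an assumption for illustration: the 18-keypoint layout, the centroid/RMS pose normalization, the untrained random weights (a real model would be trained on JAAD), and the exponential moving average used as a stand-in for AMO, whose exact correction rule the abstract does not give.

```python
import numpy as np

CLASSES = ("C", "NC", "LONG")


def normalize_pose(keypoints):
    """Center 2D keypoints on their centroid and scale to unit RMS radius."""
    kp = np.asarray(keypoints, dtype=float)
    centered = kp - kp.mean(axis=0)
    scale = np.sqrt((centered**2).sum(axis=1).mean())
    return (centered / scale).ravel()


class ShallowPoseClassifier:
    """One-hidden-layer MLP mapping a normalized pose to C/NC/LONG probabilities."""

    def __init__(self, n_keypoints=18, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.1, (hidden, 2 * n_keypoints))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0, 0.1, (len(CLASSES), hidden))
        self.b2 = np.zeros(len(CLASSES))

    def predict_proba(self, keypoints):
        h = np.maximum(0.0, self.w1 @ normalize_pose(keypoints) + self.b1)  # ReLU
        logits = self.w2 @ h + self.b2
        z = np.exp(logits - logits.max())  # numerically stable softmax
        return z / z.sum()


def smooth_labels(prob_seq, momentum=0.8):
    """Exponential moving average over per-frame probabilities (hypothetical AMO stand-in)."""
    m = np.asarray(prob_seq[0], dtype=float)
    labels = [int(np.argmax(m))]
    for p in prob_seq[1:]:
        m = momentum * m + (1 - momentum) * np.asarray(p, dtype=float)
        labels.append(int(np.argmax(m)))
    return labels


clf = ShallowPoseClassifier()
pose = np.random.default_rng(1).uniform(0, 100, (18, 2))  # hypothetical pixel keypoints
probs = clf.predict_proba(pose)

# A single-frame flip to "NC" is absorbed by the moving average.
seq = [[0.9, 0.05, 0.05]] * 3 + [[0.1, 0.8, 0.1]] + [[0.9, 0.05, 0.05]] * 2
print(smooth_labels(seq))  # [0, 0, 0, 0, 0, 0]
```

Normalizing by centroid and scale makes the classifier's input independent of where the pedestrian appears in the image and how large they are, which is one reason a single pose can be enough input for a small network.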

ECO: Egocentric Cognitive Mapping

Dec 02, 2018

Jayant Sharma, Zixing Wang, Alberto Speranzon, Vijay Venkataraman, Hyun Soo Park

Abstract: We present a new method to localize a camera within a previously unseen environment perceived from an egocentric point of view. Although this is, in general, an ill-posed problem, humans can effortlessly and efficiently determine their relative location and orientation and navigate in previously unseen environments, e.g., finding a specific item in a new grocery store. To enable such a capability, we design a new egocentric representation, which we call ECO (Egocentric COgnitive map). ECO is biologically inspired by the cognitive map that enables human navigation, and it encodes the surrounding visual semantics with respect to both distance and orientation. ECO possesses three main properties: (1) reconfigurability: complex semantics and geometry are captured via the synthesis of atomic visual representations (e.g., image patches); (2) robustness: the visual semantics are registered in a geometrically consistent way (e.g., aligning with respect to the gravity vector, frontalizing, and rescaling to canonical depth), thus enabling us to learn meaningful atomic representations; (3) adaptability: a domain adaptation framework is designed to generalize the learned representation without manual calibration. As a proof of concept, we use ECO to localize a camera within real-world scenes (various grocery stores) and demonstrate performance improvements over existing semantic localization approaches.
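
ECO encodes visual semantics with respect to distance and orientation around the camera. A minimal, hedged way to picture that indexing is an egocentric polar histogram in which each observed semantic point is binned by its range and bearing relative to the camera. The bin edges, the label set, and the function name `egocentric_polar_map` are assumptions; ECO itself composes atomic visual representations such as image patches and registers them geometrically, which this sketch does not attempt.

```python
import numpy as np


def egocentric_polar_map(points, labels, n_labels, range_edges, n_bearings=8):
    """Count semantic labels per (range bin, bearing bin) around the camera.

    points: (N, 2) array of (x, y) positions in the camera frame, x forward.
    labels: (N,) integer semantic labels in [0, n_labels).
    Points beyond the last range edge are ignored.
    """
    pts = np.asarray(points, dtype=float)
    dist = np.hypot(pts[:, 0], pts[:, 1])
    bearing = np.arctan2(pts[:, 1], pts[:, 0])  # radians in [-pi, pi]
    r_idx = np.searchsorted(range_edges, dist, side="right") - 1
    b_idx = np.floor((bearing + np.pi) / (2 * np.pi / n_bearings)).astype(int) % n_bearings

    grid = np.zeros((len(range_edges) - 1, n_bearings, n_labels), dtype=np.int64)
    keep = (r_idx >= 0) & (r_idx < len(range_edges) - 1)
    # np.add.at accumulates correctly even when several points share a bin.
    np.add.at(grid, (r_idx[keep], b_idx[keep], np.asarray(labels)[keep]), 1)
    return grid


# Hypothetical observations: two "shelf" points (label 0) straight ahead and
# one "sign" point (label 1) farther away to the left.
pts = [[1.0, 0.0], [1.5, 0.1], [0.5, 3.0]]
grid = egocentric_polar_map(pts, [0, 0, 1], n_labels=2, range_edges=[0.0, 2.0, 4.0])
# grid[0, 4, 0] == 2 and grid[1, 5, 1] == 1
```

Because the bins are defined relative to the camera rather than a world frame, the same scene layout produces the same histogram wherever the observer stands, which is the egocentric property the representation relies on.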
