Abstract: Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and states, whether the robot itself is moving or stationary. This setting is more flexible and practical than conventional human action recognition. However, existing benchmarks designed for traditional action recognition fail to capture the unique complexities of N-HRI because of their limited data, modalities, task categories, and diversity of subjects and environments. To address these challenges, we introduce ACTIVE (Action from Robotic View), a large-scale dataset tailored to the perception-centric viewpoints of mobile service robots. ACTIVE comprises 30 composite action categories, 80 participants, and 46,868 annotated video instances, covering both RGB and point cloud modalities. Participants performed a variety of actions in diverse environments at distances ranging from 3 m to 50 m, while the camera platform was itself mobile, simulating real-world robot perception in which the camera height varies over uneven ground. This comprehensive and challenging benchmark aims to advance action and attribute recognition research in N-HRI. Furthermore, we propose ACTIVE-PC, a method that accurately perceives human actions at long distances using Multilevel Neighborhood Sampling, Layered Recognizers, and Elastic Ellipse Query, together with precise decoupling of kinematic interference from human actions. Experimental results demonstrate the effectiveness of ACTIVE-PC. Our code is available at: https://github.com/wangzy01/ACTIVE-Action-from-Robotic-View.
Abstract: Grasp generation aims to create complex hand-object interactions with a specified object. Traditional approaches to hand generation have primarily focused on visibility and diversity under scene constraints, but they tend to overlook fine-grained hand-object interactions such as contacts, which results in inaccurate and undesired grasps. To address this, we propose a controllable grasp generation task and introduce ClickDiff, a controllable conditional generation model that leverages a fine-grained Semantic Contact Map (SCM). When synthesizing interactive grasps, the method enables precise control of grasp synthesis through either a user-specified or an algorithmically predicted SCM. Specifically, to make full use of contact supervision constraints and to accurately model the complex physical structure of hands, we propose a Dual Generation Framework. Within this framework, the Semantic Conditional Module generates plausible contact maps from fine-grained contact information, while the Contact Conditional Module uses these contact maps together with object point clouds to generate realistic grasps. We also establish evaluation criteria applicable to controllable grasp generation. Both unimanual and bimanual generation experiments on the GRAB and ARCTIC datasets verify the validity of the proposed method, demonstrating the efficacy and robustness of ClickDiff, even on previously unseen objects. Our code is available at https://github.com/adventurer-w/ClickDiff.
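The abstract does not define how a Semantic Contact Map is computed. A common way to obtain a per-point object contact map in grasp synthesis is to threshold the distance from each object point to its nearest hand vertex. The sketch below does that and, as a hypothetical "semantic" extension, also records which hand part (e.g., which finger) that nearest vertex belongs to. The function name `semantic_contact_map`, the 0.01 threshold (assumed to be in metres), and the part-label convention are illustrative assumptions, not ClickDiff's actual SCM definition.

```python
import numpy as np


def semantic_contact_map(obj_points, hand_verts, hand_part_ids, thresh=0.01):
    """Hypothetical per-point contact map for an object point cloud (illustration only).

    obj_points:    (N, 3) object point cloud
    hand_verts:    (M, 3) hand mesh vertices
    hand_part_ids: (M,) integer part label per hand vertex (assumed, e.g. 0 = palm, 1..5 = fingers)
    thresh:        contact distance threshold (assumed metres)

    Returns (contact, part): a boolean contact flag per object point, and the part label
    of the nearest hand vertex for contacted points (-1 where there is no contact).
    """
    # Pairwise Euclidean distances between object points and hand vertices: shape (N, M).
    d = np.linalg.norm(obj_points[:, None, :] - hand_verts[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)  # index of the closest hand vertex for each object point
    contact = d[np.arange(len(obj_points)), nearest] < thresh
    part = np.where(contact, hand_part_ids[nearest], -1)
    return contact, part


# Usage with a toy two-point "object" and a two-vertex "hand".
obj = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
hand = np.array([[0.0, 0.0, 0.005], [0.5, 0.5, 0.5]])
parts = np.array([2, 0])
contact, part = semantic_contact_map(obj, hand, parts)
# expected: contact = [True, False], part = [2, -1]
```

In a controllable setting, a map of this shape is what a user could edit by hand (marking which object points a given finger should touch) before it is passed, together with the object point cloud, to a grasp generator.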
Abstract: The channel knowledge map (CKM) has recently emerged as a means to facilitate placement and trajectory optimization for unmanned aerial vehicle (UAV) communications. This paper investigates a CKM-assisted multi-UAV wireless network, focusing on the construction and utilization of CKMs for multi-UAV placement optimization. First, we consider the CKM construction problem when measurements are available at only a limited number of points. To this end, we exploit a data-driven interpolation technique to construct CKMs that characterize the signal propagation environment. Next, we study the multi-UAV placement optimization problem using the constructed CKMs, in which the UAVs optimize their placement locations to maximize the weighted sum rate with their respective associated ground base stations (GBSs). However, the CKM-based rate function is generally non-differentiable. To tackle this issue, we propose a novel iterative algorithm based on derivative-free optimization, in which a series of quadratic functions is iteratively constructed to approximate the objective function under a set of interpolation conditions, and the UAVs' placement locations are then updated by maximizing the approximate function subject to a trust-region constraint. Finally, numerical results validate that the proposed design achieves near-optimal performance with much lower implementation complexity.
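The derivative-free update described above can be sketched as a basic trust-region method built on quadratic interpolation models. This is a minimal illustration under simplifying assumptions, not the paper's algorithm. It draws a fresh interpolation set of (n+1)(n+2)/2 random points around the current location in every iteration (methods such as Powell's NEWUOA instead reuse and update their interpolation points). It also solves the trust-region subproblem only approximately, using projected gradient ascent, and it uses a smooth toy objective in place of a CKM-based weighted sum rate. All function names and parameter values (`dfo_trust_region`, the 0.1/0.75 acceptance thresholds, and the expansion and shrink factors) are hypothetical choices.

```python
import numpy as np


def quad_features(S):
    """Monomials [1, s_i, s_i*s_j for i <= j] evaluated at each row of S."""
    m, n = S.shape
    cols = [np.ones(m)]
    cols += [S[:, i] for i in range(n)]
    cols += [S[:, i] * S[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack(cols)


def fit_quadratic(S, fvals):
    """Solve the interpolation conditions m(s_k) = f_k for
    m(s) = c + g^T s + 0.5 s^T H s, where s_k are offsets from the current center."""
    n = S.shape[1]
    coef, *_ = np.linalg.lstsq(quad_features(S), fvals, rcond=None)
    g = coef[1:n + 1]
    H = np.zeros((n, n))
    k = n + 1
    for i in range(n):
        for j in range(i, n):
            if i == j:
                H[i, i] = 2.0 * coef[k]      # s_i^2 carries 0.5 * H_ii
            else:
                H[i, j] = H[j, i] = coef[k]  # s_i s_j carries H_ij
            k += 1
    return g, H


def maximize_model_in_ball(g, H, delta, iters=200):
    """Approximately maximize g^T s + 0.5 s^T H s subject to ||s|| <= delta
    with projected gradient ascent (a simple stand-in for an exact subproblem solver)."""
    L = max(np.abs(np.linalg.eigvalsh(H)).max(), 1e-12)  # Lipschitz constant of the model gradient
    s = np.zeros_like(g)
    for _ in range(iters):
        s = s + (g + H @ s) / L
        norm = np.linalg.norm(s)
        if norm > delta:
            s *= delta / norm  # project back onto the trust region
    return s


def dfo_trust_region(f, x0, delta=1.0, delta_min=1e-6, max_iter=200, seed=0):
    """Hypothetical derivative-free trust-region maximization of f with quadratic models."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = x.size
    p = (n + 1) * (n + 2) // 2  # number of coefficients in a full quadratic model
    fx = f(x)
    for _ in range(max_iter):
        if delta < delta_min:
            break
        # Interpolation set: the current center plus p - 1 random points in a box of half-width delta.
        S = np.vstack([np.zeros(n), rng.uniform(-delta, delta, size=(p - 1, n))])
        fvals = np.array([fx] + [f(x + s) for s in S[1:]])
        g, H = fit_quadratic(S, fvals)
        s = maximize_model_in_ball(g, H, delta)
        predicted = g @ s + 0.5 * s @ H @ s  # model increase m(s) - m(0)
        f_new = f(x + s)
        rho = (f_new - fx) / predicted if predicted > 0 else -1.0
        if rho > 0.1:          # sufficient actual increase: accept the step
            x, fx = x + s, f_new
            if rho > 0.75:
                delta *= 2.0   # model agrees well: enlarge the trust region
        else:
            delta *= 0.5       # poor agreement: shrink the trust region
    return x, fx


# Toy stand-in objective (NOT a CKM-based rate) with its maximum at (3, -1).
x_best, f_best = dfo_trust_region(lambda x: -(x[0] - 3) ** 2 - (x[1] + 1) ** 2, [0.0, 0.0])
print(x_best, f_best)
```

For n optimization variables (for example, 2K horizontal coordinates if K UAVs fly at a fixed altitude, which is an assumption here), each iteration of this sketch costs (n+1)(n+2)/2 objective evaluations. Reusing interpolation points across iterations, as practical derivative-free solvers do, reduces that cost considerably.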