Recovering a whole-body mesh by inferring abstract pose and shape parameters from visual content yields 3D bodies with realistic structures. However, the inference process is highly non-linear and suffers from image-mesh misalignment, resulting in inaccurate reconstruction. In contrast, 3D keypoint estimation methods use a volumetric representation to achieve pixel-level accuracy but may predict unrealistic body structures. To address these issues, this paper presents a novel hybrid inverse kinematics solution, HybrIK, that integrates the merits of 3D keypoint estimation and body mesh recovery in a unified framework. HybrIK directly transforms accurate 3D joints into body-part rotations via twist-and-swing decomposition. The swing rotations are solved analytically from the 3D joints, while the twist rotations are derived from visual cues through neural networks. To capture comprehensive whole-body details, we further develop a holistic framework, HybrIK-X, which enhances HybrIK with articulated hands and an expressive face. HybrIK-X is fast and accurate because it solves the whole-body pose with a one-stage model. Experiments demonstrate that HybrIK and HybrIK-X preserve both the accuracy of 3D joints and the realistic structure of the parametric human model, leading to pixel-aligned whole-body mesh recovery. The proposed method significantly surpasses state-of-the-art methods on various benchmarks for body-only, hand-only, and whole-body scenarios. Code and results can be found at https://jeffli.site/HybrIK-X/
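The swing/twist split can be illustrated with a small sketch. Assuming a template bone direction `t` (rest pose) and a target bone direction `p` (taken from estimated 3D joints), the swing is the minimal rotation taking `t` onto `p`, computed in closed form with the Rodrigues formula, and the twist is a rotation about `t` by an angle that HybrIK would predict with a network; here that angle is simply passed in. The composition `R = R_swing @ R_twist` and all helper names are illustrative assumptions, not the HybrIK implementation.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def axis_angle(axis, theta):
    """Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2."""
    x, y, z = normalize(axis)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]

def swing_twist_rotation(t, p, twist_angle):
    """Rotation mapping direction t onto direction p, with a given twist about t."""
    t_hat, p_hat = normalize(t), normalize(p)
    axis = cross(t_hat, p_hat)
    sin_sw, cos_sw = math.sqrt(dot(axis, axis)), dot(t_hat, p_hat)
    if sin_sw < 1e-9:
        if cos_sw > 0:
            # already aligned: zero swing (identity)
            r_swing = axis_angle([1.0, 0.0, 0.0], 0.0)
        else:
            # anti-parallel: rotate by pi about any axis perpendicular to t
            helper = [1.0, 0.0, 0.0] if abs(t_hat[0]) < 0.9 else [0.0, 1.0, 0.0]
            r_swing = axis_angle(cross(t_hat, helper), math.pi)
    else:
        # swing solved analytically from the two bone directions
        r_swing = axis_angle(axis, math.atan2(sin_sw, cos_sw))
    # twist about the template bone axis; it leaves t unchanged
    r_twist = axis_angle(t_hat, twist_angle)
    return matmul(r_swing, r_twist)
```

Because the twist leaves `t` fixed, any `twist_angle` still maps the bone onto the keypoint direction; the twist only resolves the rotation about the bone axis, which 3D joint positions alone cannot determine.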
Most existing blind image Super-Resolution (SR) methods assume that blur kernels are space-invariant. However, the blur in real applications is usually space-variant due to object motion, defocus, etc., which causes a severe performance drop in advanced SR methods. To address this problem, we first introduce two new datasets with out-of-focus blur, i.e., NYUv2-BSR and Cityscapes-BSR, to support further research on blind SR with space-variant blur. Based on these datasets, we design a novel Cross-MOdal fuSion network (CMOS) that estimates both blur and semantics simultaneously, which leads to improved SR results. It includes a feature Grouping Interactive Attention (GIA) module that makes the two modalities interact more effectively and avoids inconsistency. Owing to its general structure, GIA can also be used for the interaction of other features. Qualitative and quantitative experiments against state-of-the-art methods on the above datasets and on real-world images demonstrate the superiority of our method; e.g., it improves PSNR/SSIM by +1.91/+0.0048 over MANet on NYUv2-BSR.
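To make the space-invariant versus space-variant distinction concrete, the sketch below blurs a 1D signal with a Gaussian kernel whose width varies per sample, as out-of-focus blur varies with scene depth. The per-sample `sigmas` map, the Gaussian kernel shape, the kernel radius, and the border handling are illustrative assumptions; this is not the degradation pipeline used to build NYUv2-BSR or Cityscapes-BSR. A space-invariant model would use a single `sigma` everywhere.

```python
import math

def gaussian_weights(sigma, radius):
    """Normalized 1D Gaussian kernel; sigma <= 0 means in focus (delta kernel)."""
    if sigma <= 0:
        return [1.0 if k == 0 else 0.0 for k in range(-radius, radius + 1)]
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]

def space_variant_blur(signal, sigmas, radius=3):
    """Blur each sample i with its own kernel width sigmas[i]."""
    n = len(signal)
    out = []
    for i in range(n):
        w = gaussian_weights(sigmas[i], radius)
        acc = 0.0
        for k, wk in zip(range(-radius, radius + 1), w):
            j = min(max(i + k, 0), n - 1)  # replicate the border samples
            acc += wk * signal[j]
        out.append(acc)
    return out
```

Only the sample whose `sigma` is nonzero gets smeared, which is why a single global kernel estimate cannot describe such an image.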
This letter presents a novel retractable ring-shaped quadrotor, called Ring-Rotor, that can adjust the vehicle's length and width simultaneously. Unlike other morphing quadrotors, which suffer from high platform complexity and poor controllability, Ring-Rotor uses only one servo motor for morphing while reducing the largest dimension of the vehicle by approximately 31.4\%. Its compact form ensures passability through narrow spaces, and its standard form saves energy. Moreover, the vehicle departs from the cross configuration of typical quadrotors, in which four arms connect to a central body, and instead adopts a ring-shaped mechanical structure that leaves the central space free. Based on this structure, a whole-body aerial grasping and transportation scheme is designed to carry objects of various shapes without an external manipulator. In addition, we employ a nonlinear model predictive control (NMPC) strategy that uses a time-variant physical parameter model to adapt to the quadrotor's morphology. The above applications are demonstrated in real-world experiments to show the system's high versatility.
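A time-variant physical parameter model can be sketched as a function of the servo state that an NMPC would re-evaluate at each prediction step. The sketch assumes a linear length/width interpolation driven by one servo state `s`, a rectangular rotor layout, point-mass rotors, and made-up dimensions; none of these come from the Ring-Rotor design, yaw drag torque is ignored, and the NMPC solver itself is omitted.

```python
from dataclasses import dataclass

@dataclass
class MorphParams:
    rotor_xy: list      # rotor (x, y) positions in the body frame [m]
    inertia_zz: float   # yaw moment of inertia [kg m^2]

def morph_params(s, std_dims, compact_dims, rotor_mass, base_izz):
    """Physical parameters at morphing state s (0 = standard, 1 = compact).

    Hypothetical model: length and width interpolate linearly with the
    single servo state, and each rotor is treated as a point mass.
    """
    s = min(max(s, 0.0), 1.0)
    length = (1.0 - s) * std_dims[0] + s * compact_dims[0]
    width = (1.0 - s) * std_dims[1] + s * compact_dims[1]
    xy = [(length / 2, width / 2), (-length / 2, width / 2),
          (-length / 2, -width / 2), (length / 2, -width / 2)]
    izz = base_izz + rotor_mass * sum(x * x + y * y for x, y in xy)
    return MorphParams(xy, izz)

def roll_pitch_torques(params, thrusts):
    """Torque r x F for thrusts f_i along body +z: (sum y_i f_i, -sum x_i f_i)."""
    tau_x = sum(y * f for (x, y), f in zip(params.rotor_xy, thrusts))
    tau_y = sum(-x * f for (x, y), f in zip(params.rotor_xy, thrusts))
    return tau_x, tau_y
```

Inside an NMPC, `morph_params` would be evaluated at each predicted time step from the planned servo trajectory, so the prediction model follows the changing geometry instead of assuming fixed arm lengths.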
This paper proposes a client selection (CS) method that tackles the communication bottleneck of federated learning (FL) while also coping with FL's data heterogeneity. Specifically, we first analyze the effect of CS in FL and show that FL training can be accelerated by choosing participants that diversify the training data in each round. Based on this, we leverage data profiling and determinantal point process (DPP) sampling techniques to develop an algorithm termed Federated Learning with DPP-based Participant Selection (FL-DP$^3$S). This algorithm effectively diversifies the participants' datasets in each round of training while preserving their data privacy. We conduct extensive experiments to examine the efficacy of our proposed method. The results show that our scheme attains a faster convergence rate and a smaller communication overhead than several baselines.
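The diversification idea can be illustrated with a DPP over client data profiles. The sketch below assumes each client exposes a hypothetical profile vector (for example, a normalized label histogram), builds an RBF similarity kernel `L`, and selects `k` clients by greedy MAP inference, repeatedly adding the client that most increases $\det(L_S)$. Greedy MAP is a deterministic stand-in for the DPP sampling used in FL-DP$^3$S, and nothing here models its privacy-preserving profiling.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def rbf_kernel(profiles, h=0.5):
    """L[i][j] = exp(-||p_i - p_j||^2 / (2 h^2)): similar clients score near 1."""
    def sq(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return [[math.exp(-sq(u, v) / (2 * h * h)) for v in profiles] for u in profiles]

def greedy_dpp_select(profiles, k, h=0.5):
    """Greedy MAP for a DPP: add the client maximizing det of the kernel submatrix."""
    L = rbf_kernel(profiles, h)
    chosen = []
    for _ in range(min(k, len(profiles))):
        best, best_det = None, -1.0
        for i in range(len(profiles)):
            if i in chosen:
                continue
            idx = chosen + [i]
            d = det([[L[a][b] for b in idx] for a in idx])
            if d > best_det:
                best, best_det = i, d
        chosen.append(best)
    return chosen
```

Two clients with identical profiles make the kernel submatrix singular, so the determinant drives the selection away from picking both; that is the diversity effect the method relies on.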
When the objective function is strongly convex, vanilla fractional order gradient descent may oscillate and converge to a region around the global minimum instead of the exact minimum point, or may even diverge. To address this problem, this paper proposes a novel adaptive fractional order gradient descent (AFOGD) method and a novel adaptive fractional order accelerated gradient descent (AFOAGD) method. Inspired by quadratic constraints and Lyapunov stability analysis from robust control theory, we establish a linear matrix inequality to analyse the convergence of the proposed algorithms. We prove that the proposed algorithms achieve R-linear convergence when the objective function is $L$-smooth and $m$-strongly convex. Numerical simulations verify the effectiveness and superiority of the proposed algorithms.
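For intuition about fractional-order updates, the sketch below implements one fractional gradient scheme from the literature: the gradient is scaled by $(|x_k - x_{k-1}| + \epsilon)^{1-\alpha}/\Gamma(2-\alpha)$, the first term of a Caputo derivative whose lower limit is moved to the previous iterate. The small `eps` that keeps the step from vanishing, the step size `mu`, and the scalar test function are assumptions; this is not the AFOGD or AFOAGD update proposed in the paper.

```python
import math

def fractional_gd(grad, x0, x1, alpha=0.9, mu=0.1, eps=1e-8, iters=2000):
    """Scalar fractional-order GD with a variable (previous-iterate) lower limit."""
    x_prev, x = x0, x1
    scale = 1.0 / math.gamma(2.0 - alpha)
    for _ in range(iters):
        # first-term Caputo approximation: f'(x) * |x - c|^(1 - alpha) / Gamma(2 - alpha), c = x_prev
        step = grad(x) * (abs(x - x_prev) + eps) ** (1.0 - alpha) * scale
        x_prev, x = x, x - mu * step
    return x
```

With `alpha = 1.0` the scale factor becomes 1 and the loop reduces to ordinary gradient descent, which makes the fractional order an explicit knob on the effective step size.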
Optimization-based trajectory generation methods are widely used in whole-body planning for robots. However, existing work either oversimplifies the robot's geometry and the environment representation, resulting in conservative trajectories, or incurs a huge overhead in maintaining additional information such as a Signed Distance Field (SDF). To bridge this gap, we represent the robot as an implicit function, with its surface boundary given by the zero-level set of its SDF. Based on this, we further employ another implicit function to lazily compute the signed distance to the swept volume generated by the robot moving along its trajectory. The computation is made efficient by exploiting continuity in space-time, and the implicit function guarantees precise and continuous collision evaluation even for nonconvex robots with complex surfaces. Furthermore, we propose a trajectory optimization pipeline that works with the implicit SDF. Simulations and real-world experiments validate the high performance of our approach for trajectory optimization of arbitrarily shaped robots.
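The swept-volume distance can be sketched as a minimum over time of the robot's SDF evaluated in the robot frame at each pose: $d_{\text{swept}}(p) = \min_{t} \mathrm{SDF}_{\text{robot}}(T(t)^{-1} p)$. The sketch assumes a 2D box-shaped robot, a hypothetical `trajectory(t)` returning `(x, y, heading)` for `t` in [0, 1], and brute-force sampling of `t`; the paper instead exploits space-time continuity to evaluate this lazily, so the sampling loop is only a placeholder.

```python
import math

def box_sdf(px, py, hx, hy):
    """Exact 2D SDF of an origin-centred box with half-extents (hx, hy)."""
    dx, dy = abs(px) - hx, abs(py) - hy
    outside = math.hypot(max(dx, 0.0), max(dy, 0.0))
    inside = min(max(dx, dy), 0.0)
    return outside + inside

def robot_sdf_at(p, pose, hx, hy):
    """Robot SDF at pose (cx, cy, theta): map p into the robot frame, R(-theta)(p - c)."""
    cx, cy, th = pose
    x, y = p[0] - cx, p[1] - cy
    c, s = math.cos(th), math.sin(th)
    return box_sdf(c * x + s * y, -s * x + c * y, hx, hy)

def swept_sdf(p, trajectory, hx, hy, samples=100):
    """Signed distance to the swept volume and the time of closest approach."""
    best_d, best_t = math.inf, 0.0
    for k in range(samples + 1):
        t = k / samples
        d = robot_sdf_at(p, trajectory(t), hx, hy)
        if d < best_d:
            best_d, best_t = d, t
    return best_d, best_t
```

A negative value means the query point is covered by the robot at some time; the returned `best_t` marks where along the trajectory that closest approach happens.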
Mobile edge computing (MEC) is a promising paradigm for meeting the quality of service (QoS) requirements of latency-sensitive IoT applications. However, attackers may eavesdrop on offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns, giving rise to the pattern privacy (PP) issue. We therefore propose an offloading strategy that jointly minimizes latency, the ES's energy consumption, and the task dropping rate while preserving PP. First, we formulate the dynamic computation offloading procedure as a Markov decision process (MDP). Next, we develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm that solves this problem and addresses the PP issue by injecting noise into the generated offloading decisions. This is achieved by modifying the deep Q-network (DQN) with a Function-output Gaussian process mechanism. We provide a theoretical privacy guarantee and a utility guarantee (learning error bound) for the DP-DQO algorithm. Finally, we conduct simulations that evaluate the performance of the proposed algorithm against greedy and DQN-based algorithms.
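As a simplified illustration of output perturbation, the sketch below adds Gaussian noise to a vector of Q-values before taking the greedy offloading action, calibrated with the classical Gaussian mechanism $\sigma = \sqrt{2\ln(1.25/\delta)}\,\Delta/\varepsilon$ (valid for $0 < \varepsilon < 1$). The sensitivity $\Delta$ of the Q-network outputs is an assumed input, and per-decision noise like this is a stand-in, not the Function-output Gaussian process mechanism used by DP-DQO; privacy loss across many decisions is not accounted for.

```python
import math
import random

def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical (epsilon, delta) Gaussian mechanism."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("classical calibration requires 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon

def private_offload_decision(q_values, sensitivity, epsilon, delta, rng=None):
    """Index of the offloading action with the largest noise-perturbed Q-value."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    noisy = [q + rng.gauss(0.0, sigma) for q in q_values]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

Smaller $\varepsilon$ or $\delta$ raises $\sigma$, so an eavesdropper observing decisions learns less about the underlying Q-values at the cost of noisier, less latency-optimal actions; that is the privacy/utility trade-off the guarantees above quantify.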
With the development of robotics, ground robots are no longer limited to planar motion. Passive height variation caused by complex terrain and active height control provided by special structures on robots call for a more general navigation planning framework beyond 2D. Existing methods rarely consider both simultaneously, which limits the capabilities and applications of ground robots. In this paper, we propose an optimization-based planning framework for ground robots that considers both active and passive height changes along the z-axis. The proposed planner first constructs a penalty field for chassis motion constraints defined in $\mathbb{R}^3$ such that the optimal solution space of the trajectory is continuous, resulting in a high-quality, smooth chassis trajectory. In addition, by constructing custom constraints in the z-axis direction, it can plan trajectories for different types of ground robots that have a z-axis degree of freedom. We performed simulations and real-world experiments to verify the efficiency and trajectory quality of our algorithm.
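A custom z-axis constraint can be sketched as a penalty on trajectory samples that leave a height band above the terrain, together with its gradient for a gradient-based optimizer. The band limits `z_min`/`z_max`, the sampled terrain heights, and the cubic penalty (chosen because it is smooth at the constraint boundary) are assumptions for illustration, not the paper's penalty field.

```python
def z_band_penalty(traj_z, terrain_h, z_min, z_max, weight=1.0):
    """Cubic penalty for samples violating terrain_h + z_min <= z <= terrain_h + z_max."""
    total = 0.0
    for z, h in zip(traj_z, terrain_h):
        below = (h + z_min) - z   # > 0 if the chassis is too low
        above = z - (h + z_max)   # > 0 if the chassis is too high
        for v in (below, above):
            if v > 0.0:
                total += weight * v ** 3
    return total

def z_band_penalty_grad(traj_z, terrain_h, z_min, z_max, weight=1.0):
    """Gradient of z_band_penalty with respect to each sample's z."""
    grads = []
    for z, h in zip(traj_z, terrain_h):
        below, above = (h + z_min) - z, z - (h + z_max)
        g = 0.0
        if below > 0.0:
            g -= 3.0 * weight * below ** 2
        if above > 0.0:
            g += 3.0 * weight * above ** 2
        grads.append(g)
    return grads
```

Setting `z_min == z_max` pins a robot without a z-axis degree of freedom to a fixed ride height, while a wider band lets robots with active height control trade clearance for other costs.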
Mutual localization plays a crucial role in multi-robot systems. In this work, we propose a novel system that estimates the 3D relative pose and targets real-world applications. We design and implement a compact hardware module using active infrared (IR) LEDs, an IR fish-eye camera, an ultra-wideband (UWB) module, and an inertial measurement unit (IMU). By leveraging IR light communication, the system solves data association between visual detections and UWB ranging. Combining range measurements from the UWB with directional information from the camera yields the relative 3D position. Using the mutual relative positions with neighbors together with the gravity constraints provided by the IMUs, we can estimate the 3D relative pose from every single frame of sensor fusion. In addition, we design an estimator based on the error-state Kalman filter (ESKF) to enhance system accuracy and robustness. When multiple neighbors are available, a Pose Graph Optimization (PGO) algorithm is applied to further improve system accuracy. We conduct experiments in various environments, and the results show that our system outperforms state-of-the-art methods in accuracy and robustness, especially in challenging environments.
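Single-frame relative pose recovery can be sketched in two steps: scale the camera's unit bearing by the UWB range to get the neighbor's 3D position, then recover the relative yaw from the two robots' mutual observations. The sketch assumes both positions are already expressed in gravity-aligned frames (roll and pitch removed using the IMUs), that the fish-eye bearing has already been undistorted into a 3D direction, and that the robots are not vertically stacked; the function names are hypothetical and the ESKF and PGO stages are omitted.

```python
import math

def relative_position(range_m, bearing):
    """Neighbor position = UWB range times the unit camera bearing."""
    n = math.sqrt(sum(b * b for b in bearing))
    return [range_m * b / n for b in bearing]

def relative_yaw(p_ab, p_ba):
    """Yaw of frame B relative to frame A from mutual gravity-aligned positions.

    p_ab: position of B in A's frame; p_ba: position of A in B's frame.
    With a yaw-only rotation R_z(psi) from B to A, p_ab = -R_z(psi) p_ba,
    so psi is the horizontal angle of p_ab minus that of -p_ba.
    """
    ang_ab = math.atan2(p_ab[1], p_ab[0])
    ang_neg_ba = math.atan2(-p_ba[1], -p_ba[0])
    psi = ang_ab - ang_neg_ba
    return math.atan2(math.sin(psi), math.cos(psi))  # wrap to (-pi, pi]
```

Together, the relative position and yaw (plus IMU roll and pitch) give a full 3D relative pose from one frame, which a filter can then smooth over time.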
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) that fully explores the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., it boosts nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be made available.
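The mask-guided class center can be sketched with masked average pooling, a standard way to turn a support feature map and its mask into a class prototype, followed by a simple re-weighting of query features by their cosine similarity to that prototype. Both the pooling choice and the clamped-cosine weighting are assumptions for illustration; RefT's dynamic weighting module and its cross-attention over object queries are not reproduced here. Feature maps are flattened to lists of per-pixel channel vectors.

```python
import math

def masked_average_pool(features, mask):
    """Class center: mask-weighted mean of per-pixel support feature vectors."""
    total = sum(mask)
    if total == 0:
        raise ValueError("support mask is empty")
    channels = len(features[0])
    return [sum(m * f[k] for f, m in zip(features, mask)) / total for k in range(channels)]

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv + 1e-12)

def reweight_query(query_features, center):
    """Scale each query pixel feature by its non-negative cosine similarity to the center."""
    out = []
    for f in query_features:
        w = max(cosine(f, center), 0.0)
        out.append([w * x for x in f])
    return out
```

Using the mask restricts the prototype to object pixels, so background clutter in the support image does not dilute the class center used to highlight matching query regions.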