Sampling is widely used in various point cloud tasks as it can effectively reduce resource consumption. Recently, some methods have proposed utilizing neural networks to optimize the sampling process for various task requirements. Currently, deep downsampling methods can be categorized into two main types: generative-based and score-based. Generative-based methods directly generate sampled point clouds using networks, whereas score-based methods assess the importance of points according to specific rules and then select sampled point clouds based on their scores. However, these methods often result in noticeable clustering effects in high-intensity feature areas, compromising their ability to preserve small-scale features and leading to the loss of some structures, thereby affecting the performance of subsequent tasks. In this paper, we propose REPS, a reconstruction-based scoring strategy that evaluates the importance of each vertex by removing and reconstructing them using surrounding vertices. Our reconstruction process comprises point reconstruction and shape reconstruction. The two aforementioned reconstruction methods effectively evaluate the importance of vertices by removing them at different scales for reconstruction. These reconstructions ensure that our method maintains the overall geometric features of the point cloud and avoids disturbing small-scale structures during sampling. Additionally, we propose the Global-Local Fusion Attention (GLFA) module, which aggregates local and global attention features of point clouds, ensuring high-quality reconstruction and sampling effects. Our method outperforms previous approaches in preserving the structural features of the sampled point clouds. Furthermore, abundant experimental results demonstrate the superior performance of our method across various common tasks.
We introduce a novel approach for the reconstruction of tubular shapes from skeletal representations. Our method processes all skeletal points as a whole, eliminating the need for splitting input structure into multiple segments. We represent the tubular shape as a truncated signed distance function (TSDF) in a voxel hashing manner, in which the signed distance between a voxel center and the object is computed through a simple geometric algorithm. Our method does not involve any surface sampling scheme or solving large matrix equations, and therefore is a faster and more elegant solution for tubular shape reconstruction compared to other approaches. Experiments demonstrate the efficiency and effectiveness of the proposed method. Code is avaliable at https://github.com/wlsdzyzl/Dragon.
Most aerial manipulators use serial rigid-link designs, which results in large forces when initiating contacts during manipulation and could cause flight stability difficulty. This limitation could potentially be improved by the compliance of continuum manipulators. To achieve this goal, we present the novel design of a compact, lightweight, and modular cable-driven continuum manipulator for aerial drones. We then derive a complete modeling framework for its kinematics, statics, and stiffness (compliance). The modeling framework can guide the control and design problems to integrate the manipulator to aerial drones. In addition, thanks to the derived stiffness (compliance) matrix, and using a low-cost IMU sensor to capture deformation angles, we present a simple method to estimate manipulation force at the tip of the manipulator. We report preliminary experimental validations of the hardware prototype, providing insights on its manipulation feasibility. We also report preliminary results of the IMU-based force estimation method.
The equilibrium shape of a continuum robot is resulted from both its internal actuation and the external physical interaction with a surrounding environment. A fast and accurate shape estimation method (i) can be used as a feedback to compensate for more accurate motion; and (ii) can reveal rich information about physical interactions (e.g. instrument-anatomy contacts / forces during a surgery). From a prior work that demonstrated an offline calibration of continuum robots, we adopt its shape modal representation and error propagation models that include identification Jacobians. In this work, we present an iterative observer approach to enable online shape estimation. We develop a dual Extended Kalman Filter (EKF) to estimate both the robot state and the shape modal parameters. The dual EKF provides robust estimation on (i) the configuration space variables that are controllable and driven by internal actuation; and (ii) the modal coefficients representing homotopies of shape families that are governed by the physical interactions with the environment. We report results from simulation studies in this work, and plan to investigate methods in the future to use the proposed approach for predicting physical interactions.
Most aerial manipulators use serial rigid-link designs, which results in large forces when initiating contacts during manipulation and could cause flight stability difficulty. This limitation could potentially be improved by the compliance of continuum manipulators. To achieve this goal, we present the novel design of a compact, lightweight, and modular cable-driven continuum manipulator for aerial drones. We then derive a complete modeling framework for its kinematics, statics, and stiffness (compliance). The framework is essential for integrating the manipulator to aerial drones. Finally, we report preliminary experimental validations of the hardware prototype, providing insights on its manipulation feasibility. Future work includes the integration and test of the proposed continuum manipulator with aerial drones.
As a prevailing task in video surveillance and forensics field, person re-identification (re-ID) aims to match person images captured from non-overlapped cameras. In unconstrained scenarios, person images often suffer from the resolution mismatch problem, i.e., \emph{Cross-Resolution Person Re-ID}. To overcome this problem, most existing methods restore low resolution (LR) images to high resolution (HR) by super-resolution (SR). However, they only focus on the HR feature extraction and ignore the valid information from original LR images. In this work, we explore the influence of resolutions on feature extraction and develop a novel method for cross-resolution person re-ID called \emph{\textbf{M}ulti-Resolution \textbf{R}epresentations \textbf{J}oint \textbf{L}earning} (\textbf{MRJL}). Our method consists of a Resolution Reconstruction Network (RRN) and a Dual Feature Fusion Network (DFFN). The RRN uses an input image to construct a HR version and a LR version with an encoder and two decoders, while the DFFN adopts a dual-branch structure to generate person representations from multi-resolution images. Comprehensive experiments on five benchmarks verify the superiority of the proposed MRJL over the relevent state-of-the-art methods.
Person re-identification (re-ID) tackles the problem of matching person images with the same identity from different cameras. In practical applications, due to the differences in camera performance and distance between cameras and persons of interest, captured person images usually have various resolutions. We name this problem as Cross-Resolution Person Re-identification which brings a great challenge for matching correctly. In this paper, we propose a Deep High-Resolution Pseudo-Siamese Framework (PS-HRNet) to solve the above problem. Specifically, in order to restore the resolution of low-resolution images and make reasonable use of different channel information of feature maps, we introduce and innovate VDSR module with channel attention (CA) mechanism, named as VDSR-CA. Then we reform the HRNet by designing a novel representation head to extract discriminating features, named as HRNet-ReID. In addition, a pseudo-siamese framework is constructed to reduce the difference of feature distributions between low-resolution images and high-resolution images. The experimental results on five cross-resolution person datasets verify the effectiveness of our proposed approach. Compared with the state-of-the-art methods, our proposed PS-HRNet improves 3.4\%, 6.2\%, 2.5\%,1.1\% and 4.2\% at Rank-1 on MLR-Market-1501, MLR-CUHK03, MLR-VIPeR, MLR-DukeMTMC-reID, and CAVIAR datasets, respectively. Our code is available at \url{https://github.com/zhguoqing}.
Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description. The significant feature gap between two modalities makes this task very challenging. Many existing methods attempt to utilize local alignment to address this problem in the fine-grained level. However, most relevant methods introduce additional models or complicated training and evaluation strategies, which are hard to use in realistic scenarios. In order to facilitate the practical application, we propose a simple but effective end-to-end learning framework for text-based person search named TIPCB (i.e., Text-Image Part-based Convolutional Baseline). Firstly, a novel dual-path local alignment network structure is proposed to extract visual and textual local representations, in which images are segmented horizontally and texts are aligned adaptively. Then, we propose a multi-stage cross-modal matching strategy, which eliminates the modality gap from three feature levels, including low level, local level and global level. Extensive experiments are conducted on the widely-used benchmark dataset (CUHK-PEDES) and verify that our method outperforms the state-of-the-art methods by 3.69%, 2.95% and 2.31% in terms of Top-1, Top-5 and Top-10. Our code has been released in https://github.com/OrangeYHChen/TIPCB.
Recently, video-based person re-identification (re-ID) has drawn increasing attention in compute vision community because of its practical application prospects. Due to the inaccurate person detections and pose changes, pedestrian misalignment significantly increases the difficulty of feature extraction and matching. To address this problem, in this paper, we propose a \textbf{R}eference-\textbf{A}ided \textbf{P}art-\textbf{A}ligned (\textbf{RAPA}) framework to disentangle robust features of different parts. Firstly, in order to obtain better references between different videos, a pose-based reference feature learning module is introduced. Secondly, an effective relation-based part feature disentangling module is explored to align frames within each video. By means of using both modules, the informative parts of pedestrian in videos are well aligned and more discriminative feature representation is generated. Comprehensive experiments on three widely-used benchmarks, i.e. iLIDS-VID, PRID-2011 and MARS datasets verify the effectiveness of the proposed framework. Our code will be made publicly available.
Extracting effective and discriminative features is very important for addressing the challenging person re-identification (re-ID) task. Prevailing deep convolutional neural networks (CNNs) usually use high-level features for identifying pedestrian. However, some essential spatial information resided in low-level features such as shape, texture and color will be lost when learning the high-level features, due to extensive padding and pooling operations in the training stage. In addition, most existing person re-ID methods are mainly based on hand-craft bounding boxes where images are precisely aligned. It is unrealistic in practical applications, since the exploited object detection algorithms often produce inaccurate bounding boxes. This will inevitably degrade the performance of existing algorithms. To address these problems, we put forward a novel person re-ID model that fuses high- and low-level embeddings to reduce the information loss caused in learning high-level features. Then we divide the fused embedding into several parts and reconnect them to obtain the global feature and more significant local features, so as to alleviate the affect caused by the inaccurate bounding boxes. In addition, we also introduce the spatial and channel attention mechanisms in our model, which aims to mine more discriminative features related to the target. Finally, we reconstruct the feature extractor to ensure that our model can obtain more richer and robust features. Extensive experiments display the superiority of our approach compared with existing approaches. Our code is available at https://github.com/libraflower/MutipleFeature-for-PRID.