Making top-down human pose estimation both accurate and efficient is an appealing goal. Mask R-CNN can greatly improve efficiency by performing person detection and pose estimation in a single framework, since the features provided by the backbone can be shared by the two tasks. However, its performance is not as good as that of traditional two-stage methods. In this paper, we aim to substantially improve the human pose estimation results of Mask R-CNN while retaining its efficiency. Specifically, we improve the whole pose estimation process, which comprises feature extraction and keypoint detection. The feature extraction part is designed to capture sufficient and valuable pose information. Then, we introduce a Global Context Module into the keypoint detection branch to enlarge the receptive field, which is crucial to successful human pose estimation. On the COCO val2017 set, our model using the ResNet-50 backbone achieves an AP of 68.1, which is 2.6 higher than Mask R-CNN (AP of 65.5). Compared to the classic two-stage top-down method SimpleBaseline, our model largely narrows the performance gap (68.1 AP vs. 68.9 AP) with a much faster inference speed (77 ms vs. 168 ms), demonstrating the effectiveness of the proposed method. Code is available at: https://github.com/lingl_space/maskrcnn_keypoint_refined.
Vehicle-to-everything (V2X) technology has recently drawn considerable attention from both academia and industry. However, the openness of the wireless communication system makes it vulnerable to identity impersonation and information tampering. How to employ the powerful radio frequency fingerprint (RFF) identification technology in V2X systems is therefore a vital and challenging task. In this paper, we propose a novel RFF extraction method for Long Term Evolution-V2X (LTE-V2X) systems. To overcome the difficulty of extracting the transmitter RFF in the presence of the wireless channel and receiver noise, we first estimate the wireless channel in a way that excludes the RFF. Then, we remove the impact of the wireless channel based on the channel estimate and obtain initial RFF features. Finally, we conduct RFF denoising to enhance the quality of the initial RFF. Simulation and experimental results both demonstrate that our proposed RFF extraction scheme achieves high identification accuracy. Furthermore, the performance is robust to vehicle speed.
Assembly planning is the core of automating product assembly, maintenance, and recycling for modern industrial manufacturing. Despite its importance and long history of research, planning for mechanical assemblies when given the final assembled state remains a challenging problem. This is due to the complexity of dealing with arbitrary 3D shapes and the highly constrained motion required for real-world assemblies. In this work, we propose a novel method to efficiently plan physically plausible assembly motion and sequences for real-world assemblies. Our method leverages the assembly-by-disassembly principle and physics-based simulation to efficiently explore a reduced search space. To evaluate the generality of our method, we define a large-scale dataset consisting of thousands of physically valid industrial assemblies with a variety of assembly motions required. Our experiments on this new benchmark demonstrate we achieve a state-of-the-art success rate and the highest computational efficiency compared to other baseline algorithms. Our method also generalizes to rotational assemblies (e.g., screws and puzzles) and solves 80-part assemblies within several minutes.
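The assembly-by-disassembly principle mentioned above can be illustrated with a minimal sketch: look for an order in which parts can be removed from the finished assembly, then reverse that order to obtain an assembly sequence. The `can_remove` predicate is a hypothetical stand-in for a physics-based feasibility check, and the toy blocking relation is an assumption made purely for illustration; this is not the paper's planner.

```python
from typing import Callable, FrozenSet, List, Optional


def plan_by_disassembly(
    parts: FrozenSet[str],
    can_remove: Callable[[str, FrozenSet[str]], bool],
) -> Optional[List[str]]:
    """Return an assembly order, or None if greedy disassembly gets stuck."""
    remaining = set(parts)
    removed: List[str] = []
    while remaining:
        # Try parts in sorted order so the result is deterministic.
        for part in sorted(remaining):
            if can_remove(part, frozenset(remaining)):
                remaining.remove(part)
                removed.append(part)
                break
        else:
            # No part could be removed from the current sub-assembly.
            return None
    # Reversing the disassembly order gives an assembly order.
    return removed[::-1]


# Toy example (hypothetical): a part is blocked while any part in its set remains.
BLOCKED_BY = {"base": {"gear", "cover"}, "gear": {"cover"}, "cover": set()}


def toy_can_remove(part: str, remaining: FrozenSet[str]) -> bool:
    return not (BLOCKED_BY[part] & remaining)


order = plan_by_disassembly(frozenset(BLOCKED_BY), toy_can_remove)
print(order)
```

In a real planner the predicate would be replaced by a motion search for each candidate part, which is where the simulation cost is concentrated.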
This paper studies a multi-antenna network integrated sensing and communication (ISAC) system, in which a set of multi-antenna base stations (BSs) employ coordinated transmit beamforming to serve their respectively associated single-antenna communication users (CUs), and at the same time reuse the reflected information signals to perform joint target detection. In particular, we consider two target detection scenarios depending on the time synchronization among BSs. In Scenario I, the BSs are synchronized and can exploit the target-reflected signals over both the direct links (from each BS to the target and back to itself) and the cross links (from each BS to the target and on to other BSs) for joint detection. In Scenario II, the BSs are not synchronized and can only utilize target-reflected signals over the direct links for joint detection. For each scenario, we derive the detection probability under a specific false alarm probability at any given target location. Based on the derivation, we optimize the coordinated transmit beamforming at the BSs to maximize the minimum detection probability over a particular target area, while ensuring the minimum signal-to-interference-plus-noise ratio (SINR) constraints at the CUs, subject to the maximum transmit power constraints at the BSs. We use the semi-definite relaxation (SDR) technique to obtain high-quality solutions to the formulated problems. Numerical results show that for each scenario, the proposed design achieves a higher detection probability than the benchmark scheme based on communication design. It is also shown that time synchronization among BSs is beneficial in enhancing the detection performance, as more reflected signal paths are exploited.
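The SINR constraint at each CU can be made concrete with a small numerical sketch. It assumes the standard narrowband model in which BS j serves CU j with a single beamforming vector, and each CU treats the signals from all other BSs as interference. The names `H`, `W`, and `noise_power` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sinr_per_user(H: np.ndarray, W: np.ndarray, noise_power: float) -> np.ndarray:
    """Compute the SINR at every CU under coordinated beamforming.

    H[j, i] is the channel vector from BS j to CU i (shape: L x L x N).
    W[j] is the beamforming vector used by BS j (shape: L x N).
    """
    # gains[j, i] = |h_{j,i}^H w_j|^2, the power received at CU i from BS j.
    gains = np.abs(np.einsum("jin,jn->ji", H.conj(), W)) ** 2
    signal = np.diag(gains)                     # power from the serving BS
    interference = gains.sum(axis=0) - signal   # power from all other BSs
    return signal / (interference + noise_power)
```

A beamforming design is feasible for the communication side when every entry of the returned vector meets its minimum SINR threshold.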
This paper investigates an intelligent reflecting surface (IRS) enabled multiuser integrated sensing and communication (ISAC) system, which consists of one multi-antenna base station (BS), one IRS, multiple single-antenna communication users (CUs), and one extended target in the non-line-of-sight (NLoS) region of the BS. The IRS is deployed not only to assist the communication from the BS to the CUs, but also to enable the BS's NLoS target sensing based on the echo signals from the BS-IRS-target-IRS-BS link. To provide full degrees of freedom for sensing, we suppose that the BS sends additional dedicated sensing signals combined with the information signals. Accordingly, we consider two types of CU receivers, namely Type-I and Type-II receivers, which respectively lack and possess the capability to cancel the interference from the sensing signals. Under this setup, we jointly optimize the transmit beamforming at the BS and the reflective beamforming at the IRS to minimize the Cramér-Rao bound (CRB) for estimating the target response matrix with respect to the IRS, subject to the minimum signal-to-interference-plus-noise ratio (SINR) constraints at the CUs and the maximum transmit power constraint at the BS. We present efficient algorithms to solve the highly non-convex SINR-constrained CRB minimization problems, using the techniques of alternating optimization and semi-definite relaxation. Numerical results show that the proposed design achieves a lower estimation CRB than other benchmark schemes, and that sensing signal interference pre-cancellation is beneficial when the number of CUs is greater than one.
Multi-view graph clustering (MGC) methods are increasingly being studied due to the rise of multi-view data with graph structural information. The critical point of MGC is to better utilize the view-specific and view-common information in the features and graphs of multiple views. However, existing works have an inherent limitation: they are unable to concurrently utilize the consensus graph information across multiple graphs and the view-specific feature information. To address this issue, we propose Variational Graph Generator for Multi-View Graph Clustering (VGMGC). Specifically, a novel variational graph generator is proposed to infer a reliable variational consensus graph based on an a priori assumption over multiple graphs. Then a simple yet effective graph encoder, in conjunction with the multi-view clustering objective, is presented to learn the desired graph embeddings for clustering, which embeds the consensus and view-specific graphs together with features. Finally, theoretical results illustrate the rationality of VGMGC by analyzing the uncertainty of the inferred consensus graph with the information bottleneck principle. Extensive experiments demonstrate the superior performance of VGMGC over state-of-the-art methods.
Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been applied to a wide range of clustering tasks. Existing surveys of deep clustering mainly focus on single-view settings and network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey of deep clustering from the perspective of data sources. With different data sources and initial conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of deep clustering.
This paper studies a downlink secure integrated sensing and communication (ISAC) system, in which a multi-antenna base station (BS) transmits confidential messages to a single-antenna communication user (CU) while performing sensing on targets that may act as suspicious eavesdroppers. To ensure the quality of target sensing while preventing potential eavesdropping by the targets, the BS combines the confidential information signals with additional dedicated sensing signals, which also serve as artificial noise (AN) that degrades the quality of the eavesdropping channels. Under this setup, we jointly design the transmit information and sensing beamforming, with the objective of minimizing the weighted sum of beampattern matching errors and cross-correlation patterns for sensing subject to secure communication constraints. The robust design takes into account the imperfect channel state information (CSI) of the eavesdroppers in two practical CSI error scenarios. First, we consider the scenario with bounded CSI errors of eavesdroppers, in which the worst-case secrecy rate constraint is adopted to ensure secure communication performance. In this scenario, we present the optimal solution to the worst-case secrecy rate constrained sensing beampattern optimization problem, by adopting the techniques of S-procedure, semi-definite relaxation (SDR), and a one-dimensional (1D) search, for which the tightness of the SDR is rigorously proved. Next, we consider the scenario with Gaussian CSI errors of eavesdroppers, in which the secrecy outage probability constraint is adopted. In this scenario, we present an efficient algorithm to solve the more challenging secrecy outage-constrained sensing beampattern optimization problem, by exploiting the convex restriction technique based on the Bernstein-type inequality, together with the SDR and 1D search.
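The secure communication constraints above are built on the secrecy rate, which can be sketched in a few lines. The sketch assumes the textbook definition, namely the CU's achievable rate minus that of the strongest eavesdropper, floored at zero, with all SINR values known. It does not model the bounded or Gaussian CSI errors treated in the paper, and the function and argument names are illustrative.

```python
import math
from typing import Sequence


def secrecy_rate(sinr_cu: float, sinr_eves: Sequence[float]) -> float:
    """Secrecy rate in bits/s/Hz against the strongest eavesdropper."""
    rate_cu = math.log2(1.0 + sinr_cu)
    # With no eavesdroppers present, the leaked rate is taken to be zero.
    rate_eve = max((math.log2(1.0 + s) for s in sinr_eves), default=0.0)
    return max(rate_cu - rate_eve, 0.0)
```

Dedicated sensing signals acting as artificial noise lower the eavesdroppers' SINR values, which raises this quantity for a fixed CU SINR.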
For retinal image matching (RIM), we propose SuperRetina, the first end-to-end method with a jointly trainable keypoint detector and descriptor. SuperRetina is trained in a novel semi-supervised manner. A small set of (nearly 100) images is incompletely labeled and used to supervise the network to detect keypoints on the vascular tree. To address the incompleteness of manual labeling, we propose Progressive Keypoint Expansion to enrich the keypoint labels at each training epoch. By utilizing a keypoint-based improved triplet loss as its description loss, SuperRetina produces highly discriminative descriptors at full input image size. Extensive experiments on multiple real-world datasets justify the viability of SuperRetina. Even when manual labeling is replaced by auto labeling, making the training process entirely free of manual annotation, SuperRetina compares favorably against a number of strong baselines for two RIM tasks, i.e., image registration and identity verification. SuperRetina will be open source.
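For readers unfamiliar with triplet-based description losses, the following is a minimal sketch of the standard triplet margin loss on descriptor vectors. It is the textbook form, not the keypoint-based improved variant proposed for SuperRetina, and the default margin is an arbitrary assumption.

```python
import numpy as np


def triplet_margin_loss(
    anchor: np.ndarray,
    positive: np.ndarray,
    negative: np.ndarray,
    margin: float = 1.0,
) -> float:
    """Mean triplet margin loss over a batch of descriptors (shape: B x D)."""
    # Euclidean distance from each anchor to its matching descriptor.
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    # Euclidean distance from each anchor to its non-matching descriptor.
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    # Penalize triplets where the negative is not at least `margin` farther away.
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())
```

The loss is zero once every non-matching descriptor is at least `margin` farther from the anchor than the matching one, which is what makes the learned descriptors discriminative.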
This paper investigates intelligent reflecting surface (IRS) enabled non-line-of-sight (NLoS) wireless sensing, in which a dedicated IRS is deployed to help an access point (AP) sense a target in its NLoS region. It is assumed that the AP is equipped with multiple antennas and the IRS is equipped with a uniform linear array. We consider two types of target models, namely the point and extended targets, for which the AP aims to estimate the target's direction-of-arrival (DoA) and the target response matrix with respect to the IRS, respectively, based on the echo signals from the AP-IRS-target-IRS-AP link. Under this setup, we jointly design the transmit beamforming at the AP and the reflective beamforming at the IRS to minimize the Cramér-Rao bound (CRB) on the estimation error. To this end, we first obtain the CRB expressions for the two target models in closed form. It is shown that in the point target case, the CRB for estimating the DoA depends on both the transmit and reflective beamformers, while in the extended target case, the CRB for estimating the target response matrix depends only on the transmit beamformers. Next, for the point target case, we optimize the joint beamforming design to minimize the CRB, via alternating optimization, semi-definite relaxation, and successive convex approximation. For the extended target case, we obtain the optimal transmit beamforming solution to minimize the CRB in closed form. Finally, numerical results show that for both cases, the proposed designs based on CRB minimization achieve improved sensing performance in terms of mean squared error, as compared to other traditional schemes.