Operators from various industries have been pushing the adoption of wireless sensing nodes for industrial monitoring, and such efforts have produced sizeable condition monitoring datasets that can be used to build diagnosis algorithms capable of warning maintenance engineers of impending failure or identifying current system health conditions. However, single operators may not have sufficiently large fleets of systems or component units to collect sufficient data to develop data-driven algorithms. Collecting a satisfactory quantity of fault patterns for safety-critical systems is particularly difficult due to the rarity of faults. Federated learning (FL) has emerged as a promising solution to leverage datasets from multiple operators to train a decentralized asset fault diagnosis model while maintaining data confidentiality. However, there are still considerable obstacles to overcome when it comes to optimizing the federation strategy without leaking sensitive data and addressing the issue of client dataset heterogeneity. This is particularly prevalent in fault diagnosis applications due to the high diversity of operating conditions and system configurations. To address these two challenges, we propose a novel clustering-based FL algorithm where clients are clustered for federating based on dataset similarity. To quantify dataset similarity between clients without explicitly sharing data, each client sets aside a local test dataset and evaluates the other clients' model prediction accuracy and uncertainty on this test dataset. Clients are then clustered for FL based on relative prediction accuracy and uncertainty.
All-in-Focus (AIF) photography is expected to be a commercial selling point for modern smartphones. Standard AIF synthesis requires manual, time-consuming operations such as focal stack compositing, which is unfriendly to ordinary people. To achieve point-and-shoot AIF photography with a smartphone, we expect that an AIF photo can be generated from one shot of the scene, instead of from multiple photos captured by the same camera. Benefiting from the multi-camera module in modern smartphones, we introduce a new task of AIF synthesis from main (wide) and ultra-wide cameras. The goal is to recover sharp details from defocused regions in the main-camera photo with the help of the ultra-wide-camera one. The camera setting poses new challenges such as parallax-induced occlusions and inconsistent color between cameras. To overcome the challenges, we introduce a predict-and-refine network to mitigate occlusions and propose dynamic frequency-domain alignment for color correction. To enable effective training and evaluation, we also build an AIF dataset with 2686 unique scenes. Each scene includes two photos captured by the main camera, one photo captured by the ultrawide camera, and a synthesized AIF photo. Results show that our solution, termed EasyAIF, can produce high-quality AIF photos and outperforms strong baselines quantitatively and qualitatively. For the first time, we demonstrate point-and-shoot AIF photo synthesis successfully from main and ultra-wide cameras.
This article studies the collaborative transportation of a cable-suspended pipe by two quadrotors. A force-coordination control scheme is proposed, where a force-consensus term is introduced to average the load distribution between the quadrotors. Since thrust uncertainty and cable force are coupled together in the acceleration channel, disturbance observer can only obtain the lumped disturbance estimate. Under the quasi-static condition, a disturbance separation strategy is developed to remove the thrust uncertainty estimate for precise cable force estimation. The stability of the overall system is analyzed using Lyapunov theory. Both numerical simulations and indoor experiments using heterogeneous quadrotors validate the effectiveness of thrust uncertainty separation and force-consensus algorithm.
Correspondence pruning aims to search consistent correspondences (inliers) from a set of putative correspondences. It is challenging because of the disorganized spatial distribution of numerous outliers, especially when putative correspondences are largely dominated by outliers. It's more challenging to ensure effectiveness while maintaining efficiency. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Compared with first-order attention that focuses on feature-consistent context, second-order attention dedicates to attention weights itself and provides an additional source to encode consistent context from the attention map. For efficiency, we derive two approximate formulations for the naive implementation of second-order attention to optimize the cubic complexity to linear complexity, such that second-order attention can be used with negligible computational overheads. We further implement our formulations in a second-order context layer and then incorporate the layer in an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining a competitive accuracy.
Point cloud segmentation is a fundamental task in 3D vision that serves a wide range of applications. Although great progresses have been made these years, its practical usability is still limited by the availability of training data. Existing approaches cannot make full use of multiple datasets on hand due to the label mismatch among different datasets. In this paper, we propose a principled approach that supports learning from heterogeneous datasets with different label sets. Our idea is to utilize a pre-trained language model to embed discrete labels to a continuous latent space with the help of their label names. This unifies all labels of different datasets, so that joint training is doable. Meanwhile, classifying points in the continuous 3D space by their vocabulary tokens significantly increase the generalization ability of the model in comparison with existing approaches that have fixed decoder architecture. Besides, we also integrate prompt learning in our framework to alleviate data shifts among different data sources. Extensive experiments demonstrate that our model outperforms the state-of-the-art by a large margin.
Remote photoplethysmography (rPPG) technology has drawn increasing attention in recent years. It can extract Blood Volume Pulse (BVP) from facial videos, making many applications like health monitoring and emotional analysis more accessible. However, as the BVP signal is easily affected by environmental changes, existing methods struggle to generalize well for unseen domains. In this paper, we systematically address the domain shift problem in the rPPG measurement task. We show that most domain generalization methods do not work well in this problem, as domain labels are ambiguous in complicated environmental changes. In light of this, we propose a domain-label-free approach called NEuron STructure modeling (NEST). NEST improves the generalization capacity by maximizing the coverage of feature space during training, which reduces the chance for under-optimized feature activation during inference. Besides, NEST can also enrich and enhance domain invariant features across multi-domain. We create and benchmark a large-scale domain generalization protocol for the rPPG measurement task. Extensive experiments show that our approach outperforms the state-of-the-art methods on both cross-dataset and intra-dataset settings.
Automatic image cropping algorithms aim to recompose images like human-being photographers by generating the cropping boxes with improved composition quality. Cropping box regression approaches learn the beauty of composition from annotated cropping boxes. However, the bias of annotations leads to quasi-trivial recomposing results, which has an obvious tendency to the average location of training samples. The crux of this predicament is that the task is naively treated as a box regression problem, where rare samples might be dominated by normal samples, and the composition patterns of rare samples are not well exploited. Observing that similar composition patterns tend to be shared by the cropping boundaries annotated nearly, we argue to find the beauty of composition from the rare samples by clustering the samples with similar cropping boundary annotations, ie, similar composition patterns. We propose a novel Contrastive Composition Clustering (C2C) to regularize the composition features by contrasting dynamically established similar and dissimilar pairs. In this way, common composition patterns of multiple images can be better summarized, which especially benefits the rare samples and endows our model with better generalizability to render nontrivial results. Extensive experimental results show the superiority of our model compared with prior arts. We also illustrate the philosophy of our design with an interesting analytical visualization.
We study the composition style in deep image matting, a notion that characterizes a data generation flow on how to exploit limited foregrounds and random backgrounds to form a training dataset. Prior art executes this flow in a completely random manner by simply going through the foreground pool or by optionally combining two foregrounds before foreground-background composition. In this work, we first show that naive foreground combination can be problematic and therefore derive an alternative formulation to reasonably combine foregrounds. Our second contribution is an observation that matting performance can benefit from a certain occurrence frequency of combined foregrounds and their associated source foregrounds during training. Inspired by this, we introduce a novel composition style that binds the source and combined foregrounds in a definite triplet. In addition, we also find that different orders of foreground combination lead to different foreground patterns, which further inspires a quadruplet-based composition style. Results under controlled experiments on four matting baselines show that our composition styles outperform existing ones and invite consistent performance improvement on both composited and real-world datasets. Code is available at: https://github.com/coconuthust/composition_styles
Level 5 Autonomous Driving, a technology that a fully automated vehicle (AV) requires no human intervention, has raised serious concerns on safety and stability before widespread use. The capability of understanding and predicting future motion trajectory of road objects can help AV plan a path that is safe and easy to control. In this paper, we propose a network architecture that parallelizes multiple convolutional neural network backbones and fuses features to make multi-mode trajectory prediction. In the 2020 ICRA Nuscene Prediction challenge, our model ranks 15th on the leaderboard across all teams.
We introduce point affiliation into feature upsampling, a notion that describes the affiliation of each upsampled point to a semantic cluster formed by local decoder feature points with semantic similarity. By rethinking point affiliation, we present a generic formulation for generating upsampling kernels. The kernels encourage not only semantic smoothness but also boundary sharpness in the upsampled feature maps. Such properties are particularly useful for some dense prediction tasks such as semantic segmentation. The key idea of our formulation is to generate similarity-aware kernels by comparing the similarity between each encoder feature point and the spatially associated local region of decoder features. In this way, the encoder feature point can function as a cue to inform the semantic cluster of upsampled feature points. To embody the formulation, we further instantiate a lightweight upsampling operator, termed Similarity-Aware Point Affiliation (SAPA), and investigate its variants. SAPA invites consistent performance improvements on a number of dense prediction tasks, including semantic segmentation, object detection, depth estimation, and image matting. Code is available at: https://github.com/poppinace/sapa