Automatic Speech Recognition services (ASRs) inherit deep neural networks' vulnerabilities like crafted adversarial examples. Existing methods often suffer from low efficiency because the target phases are added to the entire audio sample, resulting in high demand for computational resources. This paper proposes a novel scheme named FAAG as an iterative optimization-based method to generate targeted adversarial examples quickly. By injecting the noise over the beginning part of the audio, FAAG generates adversarial audio in high quality with a high success rate timely. Specifically, we use audio's logits output to map each character in the transcription to an approximate position of the audio's frame. Thus, an adversarial example can be generated by FAAG in approximately two minutes using CPUs only and around ten seconds with one GPU while maintaining an average success rate over 85%. Specifically, the FAAG method can speed up around 60% compared with the baseline method during the adversarial example generation process. Furthermore, we found that appending benign audio to any suspicious examples can effectively defend against the targeted adversarial attack. We hope that this work paves the way for inventing new adversarial attacks against speech recognition with computational constraints.
Structural accuracy of segmentation is important for finescale structures in biomedical images. We propose a novel TopologyAttention ConvLSTM Network (TACNet) for 3D image segmentation in order to achieve high structural accuracy for 3D segmentation tasks. Specifically, we propose a Spatial Topology-Attention (STA) module to process a 3D image as a stack of 2D image slices and adopt ConvLSTM to leverage contextual structure information from adjacent slices. In order to effectively transfer topology-critical information across slices, we propose an Iterative-Topology Attention (ITA) module that provides a more stable topology-critical map for segmentation. Quantitative and qualitative results show that our proposed method outperforms various baselines in terms of topology-aware evaluation metrics.
Persistent homology is a widely used theory in topological data analysis. In the context of graph learning, topological features based on persistent homology have been used to capture potentially high-order structural information so as to augment existing graph neural network methods. However, computing extended persistent homology summaries remains slow for large and dense graphs, especially since in learning applications one has to carry out this computation potentially many times. Inspired by recent success in neural algorithmic reasoning, we propose a novel learning method to compute extended persistence diagrams on graphs. The proposed neural network aims to simulate a specific algorithm and learns to compute extended persistence diagrams for new graphs efficiently. Experiments on approximating extended persistence diagrams and several downstream graph representation learning tasks demonstrate the effectiveness of our method. Our method is also efficient; on large and dense graphs, we accelerate the computation by nearly 100 times.
Scene Graph Generation (SGG) aims to build a structured representation of a scene using objects and pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for the scene graph generation. Specifically, RTPB uses a distributed-based prior bias to improve models' detecting ability on less frequent relationships during training, thus improving the model generalizability on tail categories. In addition, to further explore the contextual information of objects and relationships, we design a contextual encoding backbone network, termed as Dual Transformer (DTrans). We perform extensive experiments on a very popular benchmark, VG150, to demonstrate the effectiveness of our method for the unbiased scene graph generation. In specific, our RTPB achieves an improvement of over 10% under the mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all state-of-the-art methods with a large margin.
Compared to traditional rigid robotics, soft robotics has attracted increasing attention due to its advantages as compliance, safety, and low cost. As an essential part of soft robotics, the soft robotic gripper also shows its superior while grasping the objects with irregular shapes. Recent research has been conducted to improve its grasping performance by adjusting the variable effective length (VEL). However, the VEL achieved by multi-chamber design or tunable stiffness shape memory material requires complex pneumatic circuit design or a time-consuming phase-changing process. This work proposes a fold-based soft robotic actuator made from 3D printed filament, NinjaFlex. It is experimentally tested and represented by the hyperelastic model. Mathematic and finite element modelling is conducted to study the bending behaviour of the proposed soft actuator. Besides, an antagonistic constraint mechanism is proposed to achieve the VEL, and the experiments demonstrate that better conformity is achieved. Finally, a two-mode gripper is designed and evaluated to demonstrate the advances of VEL on grasping performance.
Besides per-pixel accuracy, topological correctness is also crucial for the segmentation of images with fine-scale structures, e.g., satellite images and biomedical images. In this paper, by leveraging the theory of digital topology, we identify locations in an image that are critical for topology. By focusing on these critical locations, we propose a new homotopy warping loss to train deep image segmentation networks for better topological accuracy. To efficiently identity these topologically critical locations, we propose a new algorithm exploiting the distance transform. The proposed algorithm, as well as the loss function, naturally generalize to different topological structures in both 2D and 3D settings. The proposed loss function helps deep nets achieve better performance in terms of topology-aware metrics, outperforming state-of-the-art topology-preserving segmentation methods.
Field robotic harvesting is a promising technique in recent development of agricultural industry. It is vital for robots to recognise and localise fruits before the harvesting in natural orchards. However, the workspace of harvesting robots in orchards is complex: many fruits are occluded by branches and leaves. It is important to estimate a proper grasping pose for each fruit before performing the manipulation. In this study, a geometry-aware network, A3N, is proposed to perform end-to-end instance segmentation and grasping estimation using both color and geometry sensory data from a RGB-D camera. Besides, workspace geometry modelling is applied to assist the robotic manipulation. Moreover, we implement a global-to-local scanning strategy, which enables robots to accurately recognise and retrieve fruits in field environments with two consumer-level RGB-D cameras. We also evaluate the accuracy and robustness of proposed network comprehensively in experiments. The experimental results show that A3N achieves 0.873 on instance segmentation accuracy, with an average computation time of 35 ms. The average accuracy of grasping estimation is 0.61 cm and 4.8$^{\circ}$ in centre and orientation, respectively. Overall, the robotic system that utilizes the global-to-local scanning and A3N, achieves success rate of harvesting ranging from 70\% - 85\% in field harvesting experiments.
Most of the existing methods for debaising in click-through rate (CTR) prediction depend on an oversimplified assumption, i.e., the click probability is the product of observation probability and relevance probability. However, since there is a complicated interplay between these two probabilities, these methods cannot be applied to other scenarios, e.g. query auto completion (QAC) and route recommendation. We propose a general debiasing framework without simplifying the relationships between variables, which can handle all scenarios in CTR prediction. Simulation experiments show that: under the simplest scenario, our method maintains a similar AUC with the state-of-the-art methods; in other scenarios, our method achieves considerable improvements compared with existing methods. Meanwhile, in online experiments, the framework also gains significant improvements consistently.
Graph Neural Network (GNN) has achieved state-of-the-art performance in various high-stake prediction tasks, but multiple layers of aggregations on graphs with irregular structures make GNN a less interpretable model. Prior methods use simpler subgraphs to simulate the full model, or counterfactuals to identify the causes of a prediction. The two families of approaches aim at two distinct objectives, "simulatability" and "counterfactual relevance", but it is not clear how the objectives can jointly influence the human understanding of an explanation. We design a user study to investigate such joint effects and use the findings to design a multi-objective optimization (MOO) algorithm to find Pareto optimal explanations that are well-balanced in simulatability and counterfactual. Since the target model can be of any GNN variants and may not be accessible due to privacy concerns, we design a search algorithm using zeroth-order information without accessing the architecture and parameters of the target model. Quantitative experiments on nine graphs from four applications demonstrate that the Pareto efficient explanations dominate single-objective baselines that use first-order continuous optimization or discrete combinatorial search. The explanations are further evaluated in robustness and sensitivity to show their capability of revealing convincing causes while being cautious about the possible confounders. The diverse dominating counterfactuals can certify the feasibility of algorithmic recourse, that can potentially promote algorithmic fairness where humans are participating in the decision-making using GNN.
Federated Learning (FL) enables a group of clients to jointly train a machine learning model with the help of a centralized server. Clients do not need to submit their local data to the server during training, and hence the local training data of clients is protected. In FL, distributed clients collect their local data independently, so the dataset of each client may naturally form a distinct source domain. In practice, the model trained over multiple source domains may have poor generalization performance on unseen target domains. To address this issue, we propose FedADG to equip federated learning with domain generalization capability. FedADG employs the federated adversarial learning approach to measure and align the distributions among different source domains via matching each distribution to a reference distribution. The reference distribution is adaptively generated (by accommodating all source domains) to minimize the domain shift distance during alignment. In FedADG, the alignment is fine-grained since each class is aligned independently. In this way, the learned feature representation is supposed to be universal, so it can generalize well on the unseen domains. Extensive experiments on various datasets demonstrate that FedADG has better performance than most of the previous solutions even if they have an additional advantage that allows centralized data access. To support study reproducibility, the project codes are available in https://github.com/wzml/FedADG