Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications such as online recommendations, studies on disease contagion, organizational studies, etc. Various LPDG methods based on graph embedding and graph neural networks have been recently proposed and achieved state-of-the-art performance. In this paper, we study the vulnerability of LPDG methods and propose the first practical black-box evasion attack. Specifically, given a trained LPDG model, our attack aims to perturb the graph structure, without knowing to model parameters, model architecture, etc., such that the LPDG model makes as many wrong predicted links as possible. We design our attack based on a stochastic policy-based RL algorithm. Moreover, we evaluate our attack on three real-world graph datasets from different application domains. Experimental results show that our attack is both effective and efficient.
Graph neural networks (GNNs) have achieved state-of-the-art performance in many graph-related tasks, e.g., node classification. However, recent works show that GNNs are vulnerable to evasion attacks, i.e., an attacker can slightly perturb the graph structure to fool GNN models. Existing evasion attacks to GNNs have several key drawbacks: 1) they are limited to attack two-layer GNNs; 2) they are not efficient; or/and 3) they need to know GNN model parameters. We address the above drawbacks in this paper and propose an influence-based evasion attack against GNNs. Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively. Then, we build a strong connection between GNNs and LP in terms of influence. Next, we reformulate the evasion attack against GNNs to be related to calculating label influence on LP, which is applicable to multi-layer GNNs and does not need to know the GNN model. We also propose an efficient algorithm to calculate label influence. Finally, we evaluate our influence-based attack on three benchmark graph datasets. Our experimental results show that, compared to state-of-the-art attack, our attack can achieve comparable attack performance, but has a 5-50x speedup when attacking two-layer GNNs. Moreover, our attack is effective to attack multi-layer GNNs.
Graph neural networks (GNNs) have achieved state-of-the-art performance in many graph-related tasks, e.g., node classification. However, recent works show that GNNs are vulnerable to evasion attacks, i.e., an attacker can perturb the graph structure to fool trained GNN models. Existing evasion attacks to GNNs have two key drawbacks. First, perturbing the graph structure to fool GNN models is essentially a binary optimization problem, while it is often solved via approximate algorithms with sub-optimal solutions. Second, existing attacks are only applicable to two-layer GNNs. In this paper, we aim to address the above drawbacks and propose to attack GNNs via influence function, a completely different perspective from existing works. Specifically, we first build the connection between GNNs and label propagation in terms of influence function. Then, instead of solving an approximate algorithm, we reformulate the attack to be related to (label) influence, which is applicable to multi-layer GNNs and whose solution can be calculated directly. We evaluate our attack on various benchmark graph datasets. Experimental results demonstrate that, compared to state-of-the-art attack, our attack can achieve higher attack success rate and has a 10-100x speedup when attacking two-layer GNNs. Moreover, our attack is also very effective to attack multi-layer GNNs.
Query-based moment localization is a new task that localizes the best matched segment in an untrimmed video according to a given sentence query. In this localization task, one should pay more attention to thoroughly mine visual and linguistic information. To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative messages passing over a joint graph. Specifically, the joint graph consists of Cross-Modal interaction Graph (CMG) and Self-Modal relation Graph (SMG), where frames and words are represented as nodes, and the relations between cross- and self-modal node pairs are described by an attention mechanism. Through parametric message passing, CMG highlights relevant instances across video and sentence, and then SMG models the pairwise relation inside each modality for frame (word) correlating. With multiple layers of such a joint graph, our CSMGAN is able to effectively capture high-order interactions between two modalities, thus enabling a further precise localization. Besides, to better comprehend the contextual details in the query, we develop a hierarchical sentence encoder to enhance the query understanding. Extensive experiments on four public datasets demonstrate the effectiveness of our proposed model, and GCSMAN significantly outperforms the state-of-the-arts.
With the development of deep learning technologies, attribute recognition and person re-identification (re-ID) have attracted extensive attention and achieved continuous improvement via executing computing-intensive deep neural networks in cloud datacenters. However, the datacenter deployment cannot meet the real-time requirement of attribute recognition and person re-ID, due to the prohibitive delay of backhaul networks and large data transmissions from cameras to datacenters. A feasible solution thus is to employ mobile edge clouds (MEC) within the proximity of cameras and enable distributed inference. In this paper, we design novel models for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system. We also investigate the problem of distributed inference in the MEC-enabled camera network. To this end, we first propose a novel inference framework with a set of distributed modules, by jointly considering the attribute recognition and person re-ID. We then devise a learning-based algorithm for the distributions of the modules of the proposed distributed inference framework, considering the dynamic MEC-enabled camera network with uncertainties. We finally evaluate the performance of the proposed algorithm by both simulations with real datasets and system implementation in a real testbed. Evaluation results show that the performance of the proposed algorithm with distributed inference framework is promising, by reaching the accuracies of attribute recognition and person identification up to 92.9% and 96.6% respectively, and significantly reducing the inference delay by at least 40.6% compared with existing methods.
Temporal language localization in videos aims to ground one video segment in an untrimmed video based on a given sentence query. To tackle this task, designing an effective model to extract ground-ing information from both visual and textual modalities is crucial. However, most previous attempts in this field only focus on unidirectional interactions from video to query, which emphasizes which words to listen and attends to sentence information via vanilla soft attention, but clues from query-by-video interactions implying where to look are not taken into consideration. In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video in-formation extraction. Specifically, in the iterative attention module, each word in the query is first enhanced by attending to each frame in the video through fine-grained attention, then video iteratively attends to the integrated query. Finally, both video and query information is utilized to provide robust cross-modal representation for further moment localization. In addition, to better predict the target segment, we propose a content-oriented localization strategy instead of applying recent anchor-based localization. We evaluate the proposed method on three challenging public benchmarks: Ac-tivityNet Captions, TACoS, and Charades-STA. FIAN significantly outperforms the state-of-the-art approaches.
Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) can suffer from inferior performance due to unstable training, especially for text generation. We propose a new variational GAN training framework which enjoys superior training stability. Our approach is inspired by a connection of GANs and reinforcement learning under a variational perspective. The connection leads to (1) probability ratio clipping that regularizes generator training to prevent excessively large updates, and (2) a sample re-weighting mechanism that stabilizes discriminator training by downplaying bad-quality fake samples. We provide theoretical analysis on the convergence of our approach. By plugging the training approach in diverse state-of-the-art GAN architectures, we obtain significantly improved performance over a range of tasks, including text generation, text style transfer, and image generation.
Despite its high search efficiency, differential architecture search (DARTS) often selects network architectures with dominated skip connections which lead to performance degradation. However, theoretical understandings on this issue remain absent, hindering the development of more advanced methods in a principled way. In this work, we solve this problem by theoretically analyzing the effects of various types of operations, e.g. convolution, skip connection and zero operation, to the network optimization. We prove that the architectures with more skip connections can converge faster than the other candidates, and thus are selected by DARTS. This result, for the first time, theoretically and explicitly reveals the impact of skip connections to fast network optimization and its competitive advantage over other types of operations in DARTS. Then we propose a theory-inspired path-regularized DARTS that consists of two key modules: (i) a differential group-structured sparse binary gate introduced for each operation to avoid unfair competition among operations, and (ii) a path-depth-wise regularization used to incite search exploration for deep architectures that often converge slower than shallow ones as shown in our theory and are not well explored during the search. Experimental results on image classification tasks validate its advantages. Codes and models will be released.