Recommender systems are often susceptible to well-crafted fake profiles, leading to biased recommendations. The wide application of recommender systems makes studying the defense against attack necessary. Among existing defense methods, data-processing-based methods inevitably exclude normal samples, while model-based methods struggle to enjoy both generalization and robustness. Considering the above limitations, we suggest integrating data processing and robust model and propose a general framework, Triple Cooperative Defense (TCD), which cooperates to improve model robustness through the co-training of three models. Specifically, in each round of training, we sequentially use the high-confidence prediction ratings (consistent ratings) of any two models as auxiliary training data for the remaining model, and the three models cooperatively improve recommendation robustness. Notably, TCD adds pseudo label data instead of deleting abnormal data, which avoids the cleaning of normal data, and the cooperative training of the three models is also beneficial to model generalization. Through extensive experiments with five poisoning attacks on three real-world datasets, the results show that the robustness improvement of TCD significantly outperforms baselines. It is worth mentioning that TCD is also beneficial for model generalizations.
Many data mining tasks rely on graphs to model relational structures among individuals (nodes). Since relational data are often sensitive, there is an urgent need to evaluate the privacy risks in graph data. One famous privacy attack against data analysis models is the model inversion attack, which aims to infer sensitive data in the training dataset and leads to great privacy concerns. Despite its success in grid-like domains, directly applying model inversion attacks on non-grid domains such as graph leads to poor attack performance. This is mainly due to the failure to consider the unique properties of graphs. To bridge this gap, we conduct a systematic study on model inversion attacks against Graph Neural Networks (GNNs), one of the state-of-the-art graph analysis tools in this paper. Firstly, in the white-box setting where the attacker has full access to the target GNN model, we present GraphMI to infer the private training graph data. Specifically, in GraphMI, a projected gradient module is proposed to tackle the discreteness of graph edges and preserve the sparsity and smoothness of graph features; a graph auto-encoder module is used to efficiently exploit graph topology, node attributes, and target model parameters for edge inference; a random sampling module can finally sample discrete edges. Furthermore, in the hard-label black-box setting where the attacker can only query the GNN API and receive the classification results, we propose two methods based on gradient estimation and reinforcement learning (RL-GraphMI). Our experimental results show that such defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
Deep recommender systems jointly leverage the retrieval and ranking operations to generate the recommendation result. The retriever targets selecting a small set of relevant candidates from the entire items with high efficiency; while the ranker, usually more precise but time-consuming, is supposed to identify the best items out of the retrieved candidates with high precision. However, the retriever and ranker are usually trained in poorly-cooperative ways, leading to limited recommendation performances when working as an entirety. In this work, we propose a novel DRS training framework CoRR(short for Cooperative Retriever and Ranker), where the retriever and ranker can be mutually reinforced. On one hand, the retriever is learned from recommendation data and the ranker via knowledge distillation; knowing that the ranker is more precise, the knowledge distillation may provide extra weak-supervision signals for the improvement of retrieval quality. On the other hand, the ranker is trained by learning to discriminate the truth positive items from hard negative candidates sampled from the retriever. With the iteration going on, the ranker may become more precise, which in return gives rise to informative training signals for the retriever; meanwhile, with the improvement of retriever, harder negative candidates can be sampled, which contributes to a higher discriminative capability of the ranker. To facilitate the effective conduct of CoRR, an asymptotic-unbiased approximation of KL divergence is introduced for the knowledge distillation over sampled items; besides, a scalable and adaptive strategy is developed to efficiently sample from the retriever. Comprehensive experimental studies are performed over four large-scale benchmark datasets, where CoRR improves the overall recommendation quality resulting from the cooperation between retriever and ranker.
Affordance-centric Question-driven Task Completion for Egocentric Assistant(AQTC) is a novel task which helps AI assistant learn from instructional videos and scripts and guide the user step-by-step. In this paper, we deal with the AQTC via a two-stage Function-centric approach, which consists of Question2Function Module to ground the question with the related function and Function2Answer Module to predict the action based on the historical steps. We evaluated several possible solutions in each module and obtained significant gains compared to the given baselines. Our code is available at \url{https://github.com/starsholic/LOVEU-CVPR22-AQTC}.
Factorization machines (FMs) are widely used in recommender systems due to their adaptability and ability to learn from sparse data. However, for the ubiquitous non-interactive features in sparse data, existing FMs can only estimate the parameters corresponding to these features via the inner product of their embeddings. Undeniably, they cannot learn the direct interactions of these features, which limits the model's expressive power. To this end, we first present MixFM, inspired by Mixup, to generate auxiliary training data to boost FMs. Unlike existing augmentation strategies that require labor costs and expertise to collect additional information such as position and fields, these extra data generated by MixFM only by the convex combination of the raw ones without any professional knowledge support. More importantly, if the parent samples to be mixed have non-interactive features, MixFM will establish their direct interactions. Second, considering that MixFM may generate redundant or even detrimental instances, we further put forward a novel Factorization Machine powered by Saliency-guided Mixup (denoted as SMFM). Guided by the customized saliency, SMFM can generate more informative neighbor data. Through theoretical analysis, we prove that the proposed methods minimize the upper bound of the generalization error, which hold a beneficial effect on enhancing FMs. Significantly, we give the first generalization bound of FM, implying the generalization requires more data and a smaller embedding size under the sufficient representation capability. Finally, extensive experiments on five datasets confirm that our approaches are superior to baselines. Besides, the results show that "poisoning" mixed data is likewise beneficial to the FM variants.
Recommender retrievers aim to rapidly retrieve a fraction of items from the entire item corpus when a user query requests, with the representative two-tower model trained with the log softmax loss. For efficiently training recommender retrievers on modern hardwares, inbatch sampling, where the items in the mini-batch are shared as negatives to estimate the softmax function, has attained growing interest. However, existing inbatch sampling based strategies just correct the sampling bias of inbatch items with item frequency, being unable to distinguish the user queries within the mini-batch and still incurring significant bias from the softmax. In this paper, we propose a Cache-Augmented Inbatch Importance Resampling (XIR) for training recommender retrievers, which not only offers different negatives to user queries with inbatch items, but also adaptively achieves a more accurate estimation of the softmax distribution. Specifically, XIR resamples items for the given mini-batch training pairs based on certain probabilities, where a cache with more frequently sampled items is adopted to augment the candidate item set, with the purpose of reusing the historical informative samples. XIR enables to sample query-dependent negatives based on inbatch items and to capture dynamic changes of model training, which leads to a better approximation of the softmax and further contributes to better convergence. Finally, we conduct experiments to validate the superior performance of the proposed XIR compared with competitive approaches.
End-to-End Speech Translation (E2E-ST) has received increasing attention due to the potential of its less error propagation, lower latency, and fewer parameters. However, the effectiveness of neural-based approaches to this task is severely limited by the available training corpus, especially for domain adaptation where in-domain triplet training data is scarce or nonexistent. In this paper, we propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system. To this end, we first incorporate an additional encoder into the pre-trained E2E-ST model to realize text translation modelling, and then unify the decoder's output representation for text and speech translation tasks by reducing the correspondent representation mismatch in available triplet training data. During domain adaptation, a k-nearest-neighbor (kNN) classifier is introduced to produce the final translation distribution using the external datastore built by the domain-specific text translation corpus, while the universal output representation is adopted to perform a similarity search. Experiments on the Europarl-ST benchmark demonstrate that when in-domain text translation data is involved only, our proposed approach significantly improves baseline by 12.82 BLEU on average in all translation directions, even outperforming the strong in-domain fine-tuning method.
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain. Existing studies in this task attach more attention to the sequence modeling of sentences while largely ignoring the rich domain-invariant semantics embedded in graph structures (i.e., the part-of-speech tags and dependency relations). As an important aspect of exploring characteristics of language comprehension, adaptive graph representations have played an essential role in recent years. To this end, in the paper, we aim to explore the possibility of learning invariant semantic features from graph-like structures in CDSC. Specifically, we present Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs. More specifically, we first raise a POS-Transformer module to extract sequential semantic features from the word sequences as well as the part-of-speech tags. Then, we design a Hybrid Graph Attention (HGAT) module to generate syntax-based semantic features by considering the transferable dependency relations. Finally, we devise an Integrated aDaptive Strategy (IDS) to guide the joint learning process of both modules. Extensive experiments on four public datasets indicate that GAST achieves comparable effectiveness to a range of state-of-the-art models.
Network-aware cascade size prediction aims to predict the final reposted number of user-generated information via modeling the propagation process in social networks. Estimating the user's reposting probability by social influence, namely state activation plays an important role in the information diffusion process. Therefore, Graph Neural Networks (GNN), which can simulate the information interaction between nodes, has been proved as an effective scheme to handle this prediction task. However, existing studies including GNN-based models usually neglect a vital factor of user's preference which influences the state activation deeply. To that end, we propose a novel framework to promote cascade size prediction by enhancing the user preference modeling according to three stages, i.e., preference topics generation, preference shift modeling, and social influence activation. Our end-to-end method makes the user activating process of information diffusion more adaptive and accurate. Extensive experiments on two large-scale real-world datasets have clearly demonstrated the effectiveness of our proposed model compared to state-of-the-art baselines.