The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQLs. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on Poincar\'e distance metric, and use the induced relations to augment current graph-based parsers for better schema linking. Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences, even when surface forms of mentions and entities differ. Moreover, our probing procedure is entirely unsupervised and requires no additional parameters. Extensive experiments show that our framework sets new state-of-the-art performance on three benchmarks. We empirically verify that our probing procedure can indeed find desired relational structures through qualitative analysis.
Recently, how to protect the Intellectual Property (IP) of deep neural networks (DNN) becomes a major concern for the AI industry. To combat potential model piracy, recent works explore various watermarking strategies to embed secret identity messages into the prediction behaviors or the internals (e.g., weights and neuron activation) of the target model. Sacrificing less functionality and involving more knowledge about the target model, the latter branch of watermarking schemes (i.e., white-box model watermarking) is claimed to be accurate, credible and secure against most known watermark removal attacks, with emerging research efforts and applications in the industry. In this paper, we present the first effective removal attack which cracks almost all the existing white-box watermarking schemes with provably no performance overhead and no required prior knowledge. By analyzing these IP protection mechanisms at the granularity of neurons, we for the first time discover their common dependence on a set of fragile features of a local neuron group, all of which can be arbitrarily tampered by our proposed chain of invariant neuron transforms. On $9$ state-of-the-art white-box watermarking schemes and a broad set of industry-level DNN architectures, our attack for the first time reduces the embedded identity message in the protected models to be almost random. Meanwhile, unlike known removal attacks, our attack requires no prior knowledge on the training data distribution or the adopted watermark algorithms, and leaves model functionality intact.
Temporal action detection (TAD) is extensively studied in the video understanding community by following the object detection pipelines in images. However, complex designs are not uncommon in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, we study a simple, straightforward, yet must-known baseline given the current status of complex design and low efficiency in TAD. In our simple baseline (BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We empirically investigate the existing techniques in each component for this baseline and, more importantly, perform end-to-end training over the entire pipeline thanks to the simplicity in design. Our BasicTAD yields an astounding RGB-Only baseline very close to the state-of-the-art methods with two-stream inputs. In addition, we further improve the BasicTAD by preserving more temporal and spatial information in network representation (termed as BasicTAD Plus). Empirical results demonstrate that our BasicTAD Plus is very efficient and significantly outperforms the previous methods on the datasets of THUMOS14 and FineAction. Our approach can serve as a strong baseline for TAD. The code will be released at https://github.com/MCG-NJU/BasicTAD.
Recently, how to protect the Intellectual Property (IP) of deep neural networks (DNN) becomes a major concern for the AI industry. To combat potential model piracy, recent works explore various watermarking strategies to embed secret identity messages into the prediction behaviors or the internals (e.g., weights and neuron activation) of the target model. Sacrificing less functionality and involving more knowledge about the target model, the latter branch of watermarking schemes (i.e., white-box model watermarking) is claimed to be accurate, credible and secure against most known watermark removal attacks, with emerging research efforts and applications in the industry. In this paper, we present the first effective removal attack which cracks almost all the existing white-box watermarking schemes with provably no performance overhead and no required prior knowledge. By analyzing these IP protection mechanisms at the granularity of neurons, we for the first time discover their common dependence on a set of fragile features of a local neuron group, all of which can be arbitrarily tampered by our proposed chain of invariant neuron transforms. On $9$ state-of-the-art white-box watermarking schemes and a broad set of industry-level DNN architectures, our attack for the first time reduces the embedded identity message in the protected models to be almost random. Meanwhile, unlike known removal attacks, our attack requires no prior knowledge on the training data distribution or the adopted watermark algorithms, and leaves model functionality intact.
The adversarial robustness of a neural network mainly relies on two factors, one is the feature representation capacity of the network, and the other is its resistance ability to perturbations. In this paper, we study the anti-perturbation ability of the network from the feature maps of convolutional layers. Our theoretical analysis discovers that larger convolutional features before average pooling can contribute to better resistance to perturbations, but the conclusion is not true for max pooling. Based on the theoretical findings, we present two feasible ways to improve the robustness of existing neural networks. The proposed approaches are very simple and only require upsampling the inputs or modifying the stride configuration of convolution operators. We test our approaches on several benchmark neural network architectures, including AlexNet, VGG16, RestNet18 and PreActResNet18, and achieve non-trivial improvements on both natural accuracy and robustness under various attacks. Our study brings new insights into the design of robust neural networks. The code is available at \url{https://github.com/MTandHJ/rcm}.
Deep image inpainting research mainly focuses on constructing various neural network architectures or imposing novel optimization objectives. However, on the one hand, building a state-of-the-art deep inpainting model is an extremely complex task, and on the other hand, the resulting performance gains are sometimes very limited. We believe that besides the frameworks of inpainting models, lightweight traditional image processing techniques, which are often overlooked, can actually be helpful to these deep models. In this paper, we enhance the deep image inpainting models with the help of classical image complexity metrics. A knowledge-assisted index composed of missingness complexity and forward loss is presented to guide the batch selection in the training procedure. This index helps find samples that are more conducive to optimization in each iteration and ultimately boost the overall inpainting performance. The proposed approach is simple and can be plugged into many deep inpainting models by changing only a few lines of code. We experimentally demonstrate the improvements for several recently developed image inpainting models on various datasets.
Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.
This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep learning methods, as well as new applications of NLG technology. This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various NLG tasks and datasets, and draw attention to the challenges in NLG evaluation, focusing on different evaluation methods and their relationships; (c) highlight some future emphasis and relatively recent research issues that arise due to the increasing synergy between NLG and other artificial intelligence areas, such as computer vision, text and computational creativity.
Deep neural networks (DNN) have achieved great success in the recommender systems (RS) domain. However, to achieve remarkable performance, DNN-based recommender models often require numerous parameters, which inevitably bring redundant neurons and weights, a phenomenon referred to as over-parameterization. In this paper, we plan to exploit such redundancy phenomena to improve the performance of RS. Specifically, we propose PCRec, a top-N item \underline{rec}ommendation framework that leverages collaborative training of two DNN-based recommender models with the same network structure, termed \underline{p}eer \underline{c}ollaboration. PCRec can reactivate and strengthen the unimportant (redundant) weights during training, which achieves higher prediction accuracy but maintains its original inference efficiency. To realize this, we first introduce two criteria to identify the importance of weights of a given recommender model. Then, we rejuvenate the unimportant weights by transplanting outside information (i.e., weights) from its peer network. After such an operation and retraining, the original recommender model is endowed with more representation capacity by possessing more functional model parameters. To show its generality, we instantiate PCRec by using three well-known recommender models. We conduct extensive experiments on three real-world datasets, and show that PCRec yields significantly better recommendations than its counterpart with the same model (parameter) size.