Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Indiscriminate Poisoning Attacks Are Shortcuts

Nov 01, 2021
Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Indiscriminate data poisoning attacks, which add imperceptible perturbations to training data to maximize the test error of trained models, have become a trendy topic because they are thought to be capable of preventing unauthorized use of data. In this work, we investigate why these perturbations work in principle. We find that the perturbations of advanced poisoning attacks are almost \textbf{linear separable} when assigned with the target labels of the corresponding samples, which hence can work as \emph{shortcuts} for the learning objective. This important population property has not been unveiled before. Moreover, we further verify that linear separability is indeed the workhorse for poisoning attacks. We synthesize linear separable data as perturbations and show that such synthetic perturbations are as powerful as the deliberately crafted attacks. Our finding suggests that the \emph{shortcut learning} problem is more serious than previously believed as deep learning heavily relies on shortcuts even if they are of an imperceptible scale and mixed together with the normal features. This finding also suggests that pre-trained feature extractors would disable these poisoning attacks effectively.

  Access Paper or Ask Questions

Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

Oct 20, 2021
Mun-Hak Lee, Joon-Hyuk Chang

The remarkable performance of the pre-trained language model (LM) using self-supervised learning has led to a major paradigm shift in the study of natural language processing. In line with these changes, leveraging the performance of speech recognition systems with massive deep learning-based LMs is a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, in this paper, we focus on a cross-modal knowledge distillation method that transfers knowledge between two types of deep neural networks with different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained in different units (senones, monophones, and subwords) and reveal the effectiveness of the hierarchical distillation method through an ablation study.

* 4page + 1page for citation + 2 pages for appendix 

  Access Paper or Ask Questions

Formalisation of Action with Durations in Answer Set Programming

Sep 17, 2021
Etienne Tignon

In this paper, I will discuss the work I am currently doing as a Ph.D. student at the University of Potsdam, under the tutoring of T. Schaub. I'm currently looking into action description in ASP. More precisely, my goal is to explore how to represent actions with durations in ASP, in different contexts. Right now, I'm focused on Multi-Agent Path Finding (MAPF), looking at how to represent speeds for different agents and contexts. Before tackling duration, I wanted to explore and compare different representations of action taking in ASP. For this, I started comparing different simple encodings tackling the MAPF problem. Even in simple code, choices and assumptions have been made in their creations. The objective of my work is to present the consequences of those design decisions in terms of performance and knowledge representation. As far as I know, there is no current research on this topic. Besides that, I'm also exploring different ways to represent duration and to solve related problems. I planed to compare them the same way I described before. I also want this to help me find innovative and effective ways to solve problems with duration.

* EPTCS 345, 2021, pp. 305-309 
* In Proceedings ICLP 2021, arXiv:2109.07914 

  Access Paper or Ask Questions

TransClaw U-Net: Claw U-Net with Transformers for Medical Image Segmentation

Jul 12, 2021
Yao Chang, Hu Menghan, Zhai Guangtao, Zhang Xiao-Ping

In recent years, computer-aided diagnosis has become an increasingly popular topic. Methods based on convolutional neural networks have achieved good performance in medical image segmentation and classification. Due to the limitations of the convolution operation, the long-term spatial features are often not accurately obtained. Hence, we propose a TransClaw U-Net network structure, which combines the convolution operation with the transformer operation in the encoding part. The convolution part is applied for extracting the shallow spatial features to facilitate the recovery of the image resolution after upsampling. The transformer part is used to encode the patches, and the self-attention mechanism is used to obtain global information between sequences. The decoding part retains the bottom upsampling structure for better detail segmentation performance. The experimental results on Synapse Multi-organ Segmentation Datasets show that the performance of TransClaw U-Net is better than other network structures. The ablation experiments also prove the generalization performance of TransClaw U-Net.

* 8 page, 3 figures 

  Access Paper or Ask Questions

Dynamical System Parameter Identification using Deep Recurrent Cell Networks

Jul 06, 2021
Erdem Akagündüz, Oguzhan Cifdaloz

In this paper, we investigate the parameter identification problem in dynamical systems through a deep learning approach. Focusing mainly on second-order, linear time-invariant dynamical systems, the topic of damping factor identification is studied. By utilizing a six-layer deep neural network with different recurrent cells, namely GRUs, LSTMs or BiLSTMs; and by feeding input-output sequence pairs captured from a dynamical system simulator, we search for an effective deep recurrent architecture in order to resolve damping factor identification problem. Our study results show that, although previously not utilized for this task in the literature, bidirectional gated recurrent cells (BiLSTMs) provide better parameter identification results when compared to unidirectional gated recurrent memory cells such as GRUs and LSTM. Thus, indicating that an input-output sequence pair of finite length, collected from a dynamical system and when observed anachronistically, may carry information in both time directions for prediction of a dynamical systems parameter.

* Final version published in Journal of Neural Computing and Applications 

  Access Paper or Ask Questions

Optimal Accounting of Differential Privacy via Characteristic Function

Jun 16, 2021
Yuqing Zhu, Jinshuo Dong, Yu-Xiang Wang

Characterizing the privacy degradation over compositions, i.e., privacy accounting, is a fundamental topic in differential privacy (DP) with many applications to differentially private machine learning and federated learning. We propose a unification of recent advances (Renyi DP, privacy profiles, $f$-DP and the PLD formalism) via the characteristic function ($\phi$-function) of a certain ``worst-case'' privacy loss random variable. We show that our approach allows natural adaptive composition like Renyi DP, provides exactly tight privacy accounting like PLD, and can be (often losslessly) converted to privacy profile and $f$-DP, thus providing $(\epsilon,\delta)$-DP guarantees and interpretable tradeoff functions. Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of $\phi$-functions symbolically and uses Gaussian quadrature for numerical computation. On several popular DP mechanisms and their subsampled counterparts, we demonstrate the flexibility and tightness of our approach in theory and experiments.

  Access Paper or Ask Questions

Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment

Jun 03, 2021
Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo

First person action recognition is an increasingly researched topic because of the growing popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic environmental bias. This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods in real settings where trimmed labeled data are not available during training. In this work, we propose to leverage over the intrinsic complementary nature of audio-visual signals to learn a representation that works well on data seen during training, while being able to generalize across different domains. To this end, we introduce an audio-visual loss that aligns the contributions from the two modalities by acting on the magnitude of their feature norm representations. This new loss, plugged into a minimal multi-modal action recognition architecture, leads to strong results in cross-domain first person action recognition, as demonstrated by extensive experiments on the popular EPIC-Kitchens dataset.

* 11 pages, 7 figures 

  Access Paper or Ask Questions

Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors

May 06, 2021
Tao Yu, Zerong Zheng, Kaiwen Guo, Pengpeng Liu, Qionghai Dai, Yebin Liu

Human volumetric capture is a long-standing topic in computer vision and computer graphics. Although high-quality results can be achieved using sophisticated off-line systems, real-time human volumetric capture of complex scenarios, especially using light-weight setups, remains challenging. In this paper, we propose a human volumetric capture method that combines temporal volumetric fusion and deep implicit functions. To achieve high-quality and temporal-continuous reconstruction, we propose dynamic sliding fusion to fuse neighboring depth observations together with topology consistency. Moreover, for detailed and complete surface generation, we propose detail-preserving deep implicit functions for RGBD input which can not only preserve the geometric details on the depth inputs but also generate more plausible texturing results. Results and experiments show that our method outperforms existing methods in terms of view sparsity, generalization capacity, reconstruction quality, and run-time efficiency.

* CVPR 2021 Oral Paper, Project Page:, THuman2.0 dataset available. Youtube: 

  Access Paper or Ask Questions

Serial-parallel Multi-Scale Feature Fusion for Anatomy-Oriented Hand Joint Detection

Feb 19, 2021
Bin Li, Hong Fu, Ruimin Li, Wendi Wang

Accurate hand joints detection from images is a fundamental topic which is essential for many applications in computer vision and human computer interaction. This paper presents a two stage network for hand joints detection from single unmarked image by using serial-parallel multi-scale feature fusion. In stage I, the hand regions are located by a pre-trained network, and the features of each detected hand region are extracted by a shallow spatial hand features representation module. The extracted hand features are then fed into stage II, which consists of serially connected feature extraction modules with similar structures, called "multi-scale feature fusion" (MSFF). A MSFF contains parallel multi-scale feature extraction branches, which generate initial hand joint heatmaps. The initial heatmaps are then mutually reinforced by the anatomic relationship between hand joints. The experimental results on five hand joints datasets show that the proposed network overperforms the state-of-the-art methods.

  Access Paper or Ask Questions

Time to Transfer: Predicting and Evaluating Machine-Human Chatting Handoff

Dec 14, 2020
Jiawei Liu, Zhe Gao, Yangyang Kang, Zhuoren Jiang, Guoxiu He, Changlong Sun, Xiaozhong Liu, Wei Lu

Is chatbot able to completely replace the human agent? The short answer could be - "it depends...". For some challenging cases, e.g., dialogue's topical spectrum spreads beyond the training corpus coverage, the chatbot may malfunction and return unsatisfied utterances. This problem can be addressed by introducing the Machine-Human Chatting Handoff (MHCH), which enables human-algorithm collaboration. To detect the normal/transferable utterances, we propose a Difficulty-Assisted Matching Inference (DAMI) network, utilizing difficulty-assisted encoding to enhance the representations of utterances. Moreover, a matching inference mechanism is introduced to capture the contextual matching features. A new evaluation metric, Golden Transfer within Tolerance (GT-T), is proposed to assess the performance by considering the tolerance property of the MHCH. To provide insights into the task and validate the proposed model, we collect two new datasets. Extensive experimental results are presented and contrasted against a series of baseline models to demonstrate the efficacy of our model on MHCH.

* 9pages, 4 figures, accepted by AAAI 2021 

  Access Paper or Ask Questions