"Topic": models, code, and papers

Utilizing Textual Reviews in Latent Factor Models for Recommender Systems

Nov 16, 2021
Tatev Karen Aslanyan, Flavius Frasincar

Most existing recommender systems rely only on rating data and ignore other sources of information that might improve the quality of recommendations, such as textual reviews or user and item characteristics. Moreover, most of these systems are applicable only to small datasets (with thousands of observations) and cannot handle large datasets (with millions of observations). We propose a recommender algorithm that combines a rating modelling technique (a Latent Factor Model) with a topic modelling method applied to textual reviews (Latent Dirichlet Allocation), and we extend the algorithm so that it can incorporate additional user- and item-specific information. We evaluate the algorithm on datasets of different sizes, corresponding to 23 product categories. Comparing the resulting model with four other models, we found that combining textual reviews with ratings leads to better recommendations. We also found that adding extra user and item features to the model increases its prediction accuracy, especially for medium and large datasets.

* The 36th ACM/SIGAPP Symposium on Applied Computing (SAC '21), March 22–26, 2021, Virtual Event, Republic of Korea 
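The rating side of this approach is a latent factor model. As a rough illustration of that component only, here is a minimal SGD sketch of standard biased matrix factorization, r_ui ≈ mu + b_u + b_i + p_u · q_i. It assumes ratings arrive as (user, item, rating) triples; the function names and hyperparameters are hypothetical, and the LDA topic component and extra user/item features described in the abstract are deliberately left out.

```python
import numpy as np


def train_lfm(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=50, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD over (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))  # global mean rating
    b_u = np.zeros(n_users)                           # user biases
    b_i = np.zeros(n_items)                           # item biases
    P = 0.1 * rng.standard_normal((n_users, k))       # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))       # item latent factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):     # shuffle ratings each epoch
            u, i, r = ratings[idx]
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_u = P[u].copy()                         # keep old p_u for the q_i update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return mu, b_u, b_i, P, Q


def predict(model, u, i):
    """Predicted rating of item i by user u."""
    mu, b_u, b_i, P, Q = model
    return float(mu + b_u[u] + b_i[i] + P[u] @ Q[i])
```

In a combined model of the kind the abstract describes, the item factors q_i would additionally be tied to topic proportions learned from review text; that coupling is not shown here.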

Classifying Human Activities with Inertial Sensors: A Machine Learning Approach

Nov 09, 2021
Hamza Ali Imran, Saad Wazir, Usman Iftikhar, Usama Latif

Human Activity Recognition (HAR) is an active research topic, with applications in medical support, sports, fitness, social networking, human-computer interfaces, senior care, entertainment, surveillance, and more. Traditionally, HAR relied on computer vision methods, which suffer from numerous problems such as privacy concerns, sensitivity to environmental factors, limited mobility, higher running costs, and occlusion. Recently, a new trend has emerged: using sensors, especially inertial sensors. Employing sensor data instead of traditional computer vision algorithms has several advantages. The literature documents many of the limitations of computer vision algorithms, as well as research on Deep Neural Network (DNN) and Machine Learning (ML) approaches to activity classification from sensor data. We examined and analyzed different Machine Learning and Deep Learning approaches for Human Activity Recognition using smartphone inertial sensor data, in order to identify which approach is best suited for this application.
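A typical classical-ML pipeline for inertial HAR segments the accelerometer stream into overlapping windows, extracts simple statistics per window, and feeds them to a classifier. The sketch below is an illustrative assumption, not the authors' setup: the window length (128 samples), 50% overlap, feature set, and the nearest-centroid classifier are all hypothetical choices picked to keep the example short.

```python
import numpy as np


def window_features(signal, window=128, step=64):
    """Slice a (T, 3) accelerometer stream into 50%-overlapping windows and
    return per-axis mean, std, min, max plus the mean magnitude (13 features)."""
    signal = np.asarray(signal, dtype=float)
    feats = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        mag = np.linalg.norm(w, axis=1)  # orientation-independent magnitude
        feats.append(np.concatenate([w.mean(0), w.std(0), w.min(0), w.max(0), [mag.mean()]]))
    return np.array(feats)


def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean feature vector per activity."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(0) for c in classes])


def predict_centroids(model, X):
    """Assign each window to the activity whose centroid is closest."""
    classes, C = model
    dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]
```

Any of the ML or DL models compared in such a study could replace the nearest-centroid step; deep models would typically consume the raw windows instead of hand-crafted features.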

Indiscriminate Poisoning Attacks Are Shortcuts

Nov 01, 2021
Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Indiscriminate data poisoning attacks, which add imperceptible perturbations to training data to maximize the test error of trained models, have become a popular topic because they are thought to be capable of preventing unauthorized use of data. In this work, we investigate why these perturbations work in principle. We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples, and can therefore act as shortcuts for the learning objective. This important population property has not been revealed before. We further verify that linear separability is indeed the workhorse of poisoning attacks: we synthesize linearly separable data as perturbations and show that such synthetic perturbations are as powerful as the deliberately crafted attacks. Our finding suggests that the shortcut learning problem is more serious than previously believed, as deep learning relies heavily on shortcuts even when they are of an imperceptible scale and mixed with normal features. It also suggests that pre-trained feature extractors would effectively disable these poisoning attacks.
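To make "linearly separable perturbations" concrete, the sketch below synthesizes class-wise perturbations (one random direction per class plus small noise), rescales each to an L-infinity radius eps, and then checks that a plain linear classifier built from training perturbations labels held-out perturbations correctly. This is only in the spirit of the abstract's synthetic perturbations; the generation recipe, eps = 8/255, noise level, dimensions, and helper names are assumptions, not the paper's procedure.

```python
import numpy as np


def make_class_centers(n_classes, dim, seed=0):
    """One random direction per class; perturbations cluster around these."""
    return np.random.default_rng(seed).standard_normal((n_classes, dim))


def synthetic_perturbations(labels, centers, eps=8 / 255, noise=0.1, rng=None):
    """Class-wise noise that is (almost) linearly separable by label, rescaled so
    every perturbation has L-infinity norm exactly eps (imperceptible scale)."""
    rng = np.random.default_rng() if rng is None else rng
    delta = centers[labels] + noise * rng.standard_normal((len(labels), centers.shape[1]))
    return eps * delta / np.abs(delta).max(axis=1, keepdims=True)


def fit_class_means(X, y, n_classes):
    """A bias-free linear classifier: one unit weight vector per class."""
    W = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    return W / np.linalg.norm(W, axis=1, keepdims=True)


def predict_linear(W, X):
    """Linear decision rule argmax_c <w_c, x>."""
    return (X @ W.T).argmax(axis=1)
```

A network trained on images plus such perturbations can minimize its loss by reading the perturbation alone, which is the shortcut the abstract refers to.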

Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

Oct 20, 2021
Mun-Hak Lee, Joon-Hyuk Chang

The remarkable performance of pre-trained language models (LMs) based on self-supervised learning has led to a major paradigm shift in natural language processing research. In line with these changes, improving the performance of speech recognition systems with massive deep learning-based LMs is a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, this paper focuses on a cross-modal knowledge distillation method that transfers knowledge between two types of deep neural networks with different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained on different units (senones, monophones, and subwords) and show the effectiveness of the hierarchical distillation method through an ablation study.

* 4 pages + 1 page of references + 2 pages of appendix 
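The label-interpolation baseline mentioned above trains the student (acoustic model) against a mixture of the one-hot ground truth and the teacher LM's distribution, target = (1 - lam) * one_hot + lam * p_teacher. The sketch below shows that loss and a weighted sum over several auxiliary output heads (for example one per unit: senones, monophones, subwords). The head weights, lam, and function names are hypothetical, and the paper's exact multi-task objective may differ.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def interpolated_kd_loss(student_logits, labels, teacher_probs, lam=0.5):
    """Cross-entropy of the student against (1 - lam) * one-hot + lam * teacher."""
    n_classes = student_logits.shape[1]
    one_hot = np.eye(n_classes)[labels]
    target = (1.0 - lam) * one_hot + lam * teacher_probs
    return float(-(target * log_softmax(student_logits)).sum(axis=1).mean())


def multi_head_kd_loss(heads, weights):
    """Weighted sum over auxiliary heads; each head is a tuple
    (student_logits, labels, teacher_probs, lam) with its own unit inventory."""
    return sum(w * interpolated_kd_loss(*h) for w, h in zip(weights, heads))
```

With lam = 0 the loss reduces to ordinary cross-entropy, which is a quick sanity check when wiring up such heads.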

Formalisation of Action with Durations in Answer Set Programming

Sep 17, 2021
Etienne Tignon

In this paper, I discuss the work I am currently doing as a Ph.D. student at the University of Potsdam, under the supervision of T. Schaub. I am currently looking into action description in ASP. More precisely, my goal is to explore how to represent actions with durations in ASP in different contexts. Right now, I am focused on Multi-Agent Path Finding (MAPF), looking at how to represent speeds for different agents and contexts. Before tackling duration, I wanted to explore and compare different representations of action taking in ASP. For this, I started comparing different simple encodings of the MAPF problem. Even these simple encodings embody design choices and assumptions made when they were created. The objective of my work is to present the consequences of those design decisions in terms of performance and knowledge representation. As far as I know, there is no current research on this topic. Besides that, I am also exploring different ways to represent duration and to solve related problems. I plan to compare them in the same way as described above. I also want this to help me find innovative and effective ways to solve problems with duration.

* EPTCS 345, 2021, pp. 305-309 
* In Proceedings ICLP 2021, arXiv:2109.07914 

TransClaw U-Net: Claw U-Net with Transformers for Medical Image Segmentation

Jul 12, 2021
Yao Chang, Hu Menghan, Zhai Guangtao, Zhang Xiao-Ping

In recent years, computer-aided diagnosis has become an increasingly popular topic. Methods based on convolutional neural networks have achieved good performance in medical image segmentation and classification. However, due to the limitations of the convolution operation, long-range spatial features are often not captured accurately. Hence, we propose the TransClaw U-Net network structure, which combines the convolution operation with the transformer operation in the encoding part. The convolution part extracts shallow spatial features to facilitate the recovery of image resolution after upsampling. The transformer part encodes the image patches, and its self-attention mechanism captures global information across the patch sequence. The decoding part retains the bottom upsampling structure for better detail segmentation performance. Experimental results on the Synapse Multi-organ Segmentation dataset show that TransClaw U-Net outperforms other network structures, and ablation experiments further demonstrate its generalization performance.

* 8 pages, 3 figures 
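The transformer branch described above works on a sequence of image patches and mixes them with self-attention. Below is a minimal NumPy sketch of those two steps, patchifying an (H, W, C) image and applying single-head scaled dot-product attention. It is a generic illustration under assumed shapes and random weights, not the TransClaw U-Net code; multi-head attention, positional embeddings, and the convolutional branch are omitted.

```python
import numpy as np


def image_to_patches(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches,
    returned as a (num_patches, p * p * C) sequence in row-major patch order."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by patch size"
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * C)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a (N, d) token sequence.
    Every output token is a weighted mix of all tokens (global context)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)              # row-wise softmax
    return A @ V, A
```

Because each row of the attention matrix spans all patches, a token can draw on information from anywhere in the image, which is the global context the abstract contrasts with the local receptive field of convolutions.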

Dynamical System Parameter Identification using Deep Recurrent Cell Networks

Jul 06, 2021
Erdem Akagündüz, Oguzhan Cifdaloz

In this paper, we investigate the parameter identification problem in dynamical systems using a deep learning approach. Focusing mainly on second-order, linear time-invariant dynamical systems, we study damping factor identification. Using a six-layer deep neural network with different recurrent cells (GRUs, LSTMs, or BiLSTMs) and feeding it input-output sequence pairs captured from a dynamical system simulator, we search for an effective deep recurrent architecture to solve the damping factor identification problem. Our results show that, although not previously used for this task in the literature, bidirectional gated recurrent cells (BiLSTMs) provide better parameter identification results than unidirectional gated recurrent memory cells such as GRUs and LSTMs. This indicates that a finite-length input-output sequence pair collected from a dynamical system, when observed anachronistically, may carry information in both time directions for predicting a dynamical system's parameter.

* Final version published in Journal of Neural Computing and Applications 
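Training data for this kind of study comes from simulating a second-order LTI system, x'' + 2·zeta·wn·x' + wn²·x = wn²·u, with a known damping factor zeta, and recording (input, output) sequences labeled by zeta. The sketch below is a hypothetical simulator of that kind using fixed-step RK4 with the input held constant over each step; wn = 2.0, the zeta range, and the piecewise-constant random inputs are assumptions, not the authors' settings.

```python
import numpy as np


def simulate(zeta, wn, u, dt):
    """Output x[k] of x'' + 2*zeta*wn*x' + wn**2 * x = wn**2 * u, starting at rest.
    RK4 integration; u is held constant within each step (zero-order hold)."""
    def f(x, v, uk):
        return v, wn ** 2 * (uk - x) - 2.0 * zeta * wn * v

    x, v = 0.0, 0.0
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = x
        k1 = f(x, v, uk)
        k2 = f(x + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1], uk)
        k3 = f(x + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1], uk)
        k4 = f(x + dt * k3[0], v + dt * k3[1], uk)
        x += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return y


def make_dataset(n_seq, length, dt, rng, wn=2.0, hold=50):
    """(U, Y, zeta) triples: random piecewise-constant inputs, simulated outputs,
    and the damping factor each pair should be regressed to."""
    U = np.empty((n_seq, length))
    Y = np.empty((n_seq, length))
    Z = rng.uniform(0.05, 1.0, n_seq)
    for s in range(n_seq):
        levels = rng.uniform(-1.0, 1.0, size=length // hold + 1)
        U[s] = np.repeat(levels, hold)[:length]
        Y[s] = simulate(Z[s], wn, U[s], dt)
    return U, Y, Z
```

A recurrent network would then take the stacked (U[s], Y[s]) sequence as input and regress Z[s]; a bidirectional cell reads that sequence in both time directions.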

Optimal Accounting of Differential Privacy via Characteristic Function

Jun 16, 2021
Yuqing Zhu, Jinshuo Dong, Yu-Xiang Wang

Characterizing the privacy degradation over compositions, i.e., privacy accounting, is a fundamental topic in differential privacy (DP) with many applications to differentially private machine learning and federated learning. We propose a unification of recent advances (Renyi DP, privacy profiles, $f$-DP and the PLD formalism) via the characteristic function ($\phi$-function) of a certain "worst-case" privacy loss random variable. We show that our approach allows natural adaptive composition like Renyi DP, provides exactly tight privacy accounting like PLD, and can be (often losslessly) converted to privacy profile and $f$-DP, thus providing $(\epsilon,\delta)$-DP guarantees and interpretable tradeoff functions. Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of $\phi$-functions symbolically and uses Gaussian quadrature for numerical computation. On several popular DP mechanisms and their subsampled counterparts, we demonstrate the flexibility and tightness of our approach in theory and experiments.
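The key algebraic fact is that composing mechanisms multiplies the $\phi$-functions of their privacy loss variables, so their complex logarithms add. For the Gaussian mechanism everything stays closed form: with sensitivity Delta and noise sigma the privacy loss is L ~ N(mu, 2·mu) with mu = Delta²/(2·sigma²), so log phi(t) = i·t·mu - mu·t². The sketch below composes k such mechanisms by adding log-phi values and converts the result to (eps, delta) with the standard closed form for a Gaussian privacy loss. It only covers this special case and is not the paper's Fourier accountant, which handles general and subsampled mechanisms via numerical quadrature; the function names are hypothetical.

```python
import math


def gaussian_plrv_mu(sigma, sensitivity=1.0):
    """Mean mu of the privacy loss L ~ N(mu, 2*mu) of the Gaussian mechanism."""
    return sensitivity ** 2 / (2.0 * sigma ** 2)


def log_phi_gaussian(t, mu):
    """Complex log of the characteristic function E[exp(i t L)], L ~ N(mu, 2*mu)."""
    return 1j * t * mu - mu * t ** 2


def compose_log_phi(t, mus):
    """Composition multiplies phi-functions, so their logarithms simply add."""
    return sum(log_phi_gaussian(t, m) for m in mus)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def delta_from_mu(eps, mu):
    """delta(eps) for a Gaussian privacy loss with mean mu, using g = sqrt(2*mu)."""
    g = math.sqrt(2.0 * mu)
    return std_normal_cdf(g / 2 - eps / g) - math.exp(eps) * std_normal_cdf(-g / 2 - eps / g)


# Four Gaussian mechanisms with sigma = 2 compose to a single one with sigma = 1.
# For this family Im(log phi(1)) = mu, which recovers the composed mean.
mu_total = compose_log_phi(1.0, [gaussian_plrv_mu(2.0)] * 4).imag
delta_at_1 = delta_from_mu(1.0, mu_total)
```

Here mu_total equals 0.5, the same as one Gaussian mechanism with sigma = 1, illustrating that this accounting is exact rather than an upper bound for the Gaussian case.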

Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment

Jun 03, 2021
Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo

First person action recognition is an increasingly researched topic because of the growing popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic environmental bias. This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods in real settings where trimmed labeled data are not available during training. In this work, we propose to leverage the intrinsic complementary nature of audio-visual signals to learn a representation that works well on data seen during training while generalizing across different domains. To this end, we introduce an audio-visual loss that aligns the contributions of the two modalities by acting on the magnitude of their feature norms. This new loss, plugged into a minimal multi-modal action recognition architecture, leads to strong results in cross-domain first person action recognition, as demonstrated by extensive experiments on the popular EPIC-Kitchens dataset.

* 11 pages, 7 figures 
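One natural way to "act on the magnitude of feature norms" is to penalize the ratio of the average visual feature norm to the average audio feature norm whenever it departs from 1. The sketch below implements that assumed form, (mean ||f_v|| / mean ||f_a|| - 1)², in NumPy for a batch of (N, D) features; treat it as an illustration of relative norm alignment, not a verified reproduction of the paper's loss, and the weighting parameter lam in the combined objective is hypothetical.

```python
import numpy as np


def rna_loss(feat_visual, feat_audio):
    """Relative norm alignment (assumed form): squared deviation of the ratio of
    mean L2 feature norms of the two modalities from 1, over a batch of (N, D) features."""
    norm_v = np.linalg.norm(feat_visual, axis=1).mean()
    norm_a = np.linalg.norm(feat_audio, axis=1).mean()
    return float((norm_v / norm_a - 1.0) ** 2)


def total_loss(classification_loss, feat_visual, feat_audio, lam=1.0):
    """Hypothetical combined objective: task loss plus weighted alignment term."""
    return classification_loss + lam * rna_loss(feat_visual, feat_audio)
```

Because the term depends only on a ratio, it balances the two modalities' contributions without forcing either feature norm to a fixed scale.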

Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors

May 06, 2021
Tao Yu, Zerong Zheng, Kaiwen Guo, Pengpeng Liu, Qionghai Dai, Yebin Liu

Human volumetric capture is a long-standing topic in computer vision and computer graphics. Although high-quality results can be achieved using sophisticated off-line systems, real-time human volumetric capture of complex scenarios, especially using light-weight setups, remains challenging. In this paper, we propose a human volumetric capture method that combines temporal volumetric fusion and deep implicit functions. To achieve high-quality and temporally continuous reconstruction, we propose dynamic sliding fusion to fuse neighboring depth observations while maintaining topology consistency. Moreover, for detailed and complete surface generation, we propose detail-preserving deep implicit functions for RGBD input, which not only preserve the geometric details of the depth inputs but also generate more plausible texturing results. Results and experiments show that our method outperforms existing methods in terms of view sparsity, generalization capacity, reconstruction quality, and run-time efficiency.

* CVPR 2021 Oral Paper, Project Page:, THuman2.0 dataset available. Youtube: 
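Temporal volumetric fusion builds on the classic truncated signed distance function (TSDF) running average: each voxel keeps a distance D and weight W, and a new observation d with weight w updates them as D ← (W·D + w·d)/(W + w), W ← W + w. The sketch below shows that classic update along a single camera ray, where two noisy depth readings of 1.0 and 1.2 fuse into a surface at 1.1. It is the textbook update only; the paper's dynamic sliding fusion (fusing neighboring frames with topology consistency) is not reproduced, and tau, max_weight, and the helper names are assumptions.

```python
import numpy as np


def tsdf_from_depth(z, depth, tau):
    """Truncated signed distance of voxels at ray depths z to a surface at `depth`.
    Voxels more than tau behind the surface are unobserved (weight 0)."""
    sdf = depth - z
    return np.clip(sdf / tau, -1.0, 1.0), (sdf > -tau).astype(float)


def fuse(tsdf, weight, new_tsdf, new_weight, max_weight=64.0):
    """Weighted running average of TSDF values; weights are capped at max_weight."""
    valid = new_weight > 0
    total = weight + new_weight
    fused = np.where(valid, (weight * tsdf + new_weight * new_tsdf) / np.maximum(total, 1e-12), tsdf)
    return fused, np.minimum(np.where(valid, total, weight), max_weight)


def zero_crossing(z, tsdf, weight):
    """Surface location: first observed +/- sign change, linearly interpolated."""
    for k in range(len(z) - 1):
        if weight[k] > 0 and weight[k + 1] > 0 and tsdf[k] > 0 >= tsdf[k + 1]:
            t = tsdf[k] / (tsdf[k] - tsdf[k + 1])
            return z[k] + t * (z[k + 1] - z[k])
    return None


z = np.linspace(0.0, 2.0, 201)                 # voxel centers along one ray
tsdf, w = np.zeros_like(z), np.zeros_like(z)
for d in (1.0, 1.2):                            # two noisy depth readings
    obs, obs_w = tsdf_from_depth(z, d, tau=0.5)
    tsdf, w = fuse(tsdf, w, obs, obs_w)
surface = zero_crossing(z, tsdf, w)
```

Averaging in this way suppresses per-frame depth noise, which is why fusing several neighboring observations yields smoother surfaces than any single depth map.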
