"Topic": models, code, and papers

A Random Matrix Perspective on Random Tensors

Aug 02, 2021
José Henrique de Morais Goulart, Romain Couillet, Pierre Comon

Tensor models play an increasingly prominent role in many fields, notably in machine learning. In several applications of such models, such as community detection, topic modeling and Gaussian mixture learning, one must estimate a low-rank signal from a noisy tensor. Hence, understanding the fundamental limits and the attainable performance of estimators of that signal inevitably calls for the study of random tensors. Substantial progress has been achieved on this subject thanks to recent efforts, under the assumption that the tensor dimensions grow large. Yet, some of the most significant among these results--in particular, a precise characterization of the abrupt phase transition (in terms of signal-to-noise ratio) that governs the performance of the maximum likelihood (ML) estimator of a symmetric rank-one model with Gaussian noise--were derived on the basis of statistical physics ideas, which are not easily accessible to non-experts. In this work, we develop a sharply distinct approach, relying instead on standard but powerful tools brought by years of advances in random matrix theory. The key idea is to study the spectra of random matrices arising from contractions of a given random tensor. We show how this gives access to spectral properties of the random tensor itself. In the specific case of a symmetric rank-one model with Gaussian noise, our technique yields a hitherto unknown characterization of the local maximum of the ML problem that is global above the phase transition threshold. This characterization is in terms of a fixed-point equation satisfied by a formula that had only been previously obtained via statistical physics methods. Moreover, our analysis sheds light on certain properties of the landscape of the ML problem in the large-dimensional setting. Our approach is versatile and can be extended to other models, such as asymmetric, non-Gaussian and higher-order ones.
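The contraction idea in the abstract above can be illustrated with a small sketch. It builds a symmetric rank-one order-3 tensor plus symmetric Gaussian noise, contracts it along one index with a unit vector, and estimates the top eigenvalue of the resulting matrix. The dimension, the signal-to-noise value, the 1/sqrt(n) noise scaling, and all function names are assumptions made for illustration; they are not taken from the paper, and power iteration is swapped in for a full eigendecomposition to keep the example dependency-free.

```python
import itertools
import math
import random


def rank_one_tensor(x, snr):
    """Return the order-3 signal tensor snr * x (x) x (x) x as nested lists."""
    n = len(x)
    return [[[snr * x[i] * x[j] * x[k] for k in range(n)]
             for j in range(n)] for i in range(n)]


def symmetric_gaussian_tensor(n, rng):
    """Symmetrize an i.i.d. Gaussian tensor by averaging over index permutations."""
    g = [[[rng.gauss(0.0, 1.0) for _ in range(n)]
          for _ in range(n)] for _ in range(n)]
    perms = list(itertools.permutations(range(3)))

    def entry(*idx):
        return sum(g[idx[a]][idx[b]][idx[c]] for a, b, c in perms) / len(perms)

    return [[[entry(i, j, k) for k in range(n)]
             for j in range(n)] for i in range(n)]


def contract(tensor, u):
    """Contract the last index with u: M[i][j] = sum_k T[i][j][k] * u[k]."""
    return [[sum(t * w for t, w in zip(row, u)) for row in plane]
            for plane in tensor]


def top_eigenvalue(matrix, iterations=200):
    """Estimate the eigenvalue of largest magnitude by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(m * c for m, c in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    mv = [sum(m * c for m, c in zip(row, v)) for row in matrix]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient


rng = random.Random(0)
n, snr = 12, 5.0  # illustrative values, not from the paper

# Unit-norm signal direction.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
scale = math.sqrt(sum(c * c for c in x))
x = [c / scale for c in x]

signal = rank_one_tensor(x, snr)
noise = symmetric_gaussian_tensor(n, rng)
# Assumed normalization: noise entries scaled by 1/sqrt(n).
observed = [[[signal[i][j][k] + noise[i][j][k] / math.sqrt(n)
              for k in range(n)] for j in range(n)] for i in range(n)]

matrix = contract(observed, x)       # n x n symmetric random matrix
clean_matrix = contract(signal, x)   # equals snr * x x^T without noise
print(top_eigenvalue(matrix), top_eigenvalue(clean_matrix))
```

Without noise the contraction along x reduces to snr * x x^T, whose only nonzero eigenvalue is the signal-to-noise value, so the noisy spectrum can be read as a perturbation of that spike.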

A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification

Sep 05, 2020
Haocong Rao, Siqi Wang, Xiping Hu, Mingkui Tan, Yi Guo, Jun Cheng, Bin Hu, Xinwang Liu

Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages. Existing solutions rely on either hand-crafted descriptors or supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences in reverse order, which involves richer high-level semantics to obtain better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that motion's continuity endows adjacent skeletons in one skeleton sequence and temporally consecutive skeleton sequences with higher correlations (referred to as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness at the intra-sequence and inter-sequence levels, respectively, during self-supervised learning. Last, with context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Contrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40% Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods with extra RGB or depth information. Our codes are available at

* Codes are available at This article is an extended version of our conference (IJCAI-2020) paper at arXiv admin note: substantial text overlap with arXiv:2008.09435 

Examination of community sentiment dynamics due to covid-19 pandemic: a case study from Australia

Jun 22, 2020
Jianlong Zhou, Shuiqiao Yang, Chun Xiao, Fang Chen

The outbreak of the novel Coronavirus Disease 2019 (COVID-19) has had an unprecedented impact on people's daily lives around the world. Governments have implemented various measures and policies, such as lock-down and social-distancing, to combat the disease during the pandemic period. These measures and policies, as well as the virus itself, may cause mental health issues such as depression, anxiety, and sadness. In this paper, we exploit the massive text data posted by Twitter users to analyse the sentiment dynamics of people living in the state of New South Wales (NSW) in Australia during the pandemic period. Unlike existing work, which mostly focuses on country-level and static sentiment analysis, we analyse the sentiment dynamics at the fine-grained level of local government areas (LGAs). Based on the analysis of around 94 million tweets posted by around 183 thousand users located in different LGAs in NSW over five months, we found that people in NSW showed an overall positive sentimental polarity and that the COVID-19 pandemic decreased this overall positive polarity during the pandemic period. The fine-grained analysis of sentiment in LGAs found that, despite the dominant positive sentiment on most days of the study period, some LGAs experienced significant sentiment changes from positive to negative. This study also analysed the sentiment dynamics associated with hot topics on Twitter, such as government policies (e.g. Australia's JobKeeper program, lock-down, social-distancing) and prominent social events (e.g. the Ruby Princess Cruise). The results showed that the policies and events did affect people's overall sentiment, and that they affected it differently at different stages.

Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network

Apr 11, 2020
Md. Rezaul Karim, Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae, Michael Cochez

The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices but also enables people to express anti-social behaviour such as online harassment, cyberbullying, and hate speech. Numerous works have proposed using these data for social and anti-social behaviour analysis, document characterization, and sentiment analysis by predicting the contexts, mostly for highly resourced languages such as English. However, some languages are under-resourced, e.g., South Asian languages like Bengali, Tamil, Assamese, and Telugu, which lack computational resources for NLP tasks. In this paper, we provide several classification benchmarks for Bengali, an under-resourced language. We prepared three datasets of expressing hate, commonly used topics, and opinions for hate speech detection, document classification, and sentiment analysis, respectively. We built the largest Bengali word embedding models to date based on 250 million articles, which we call BengFastText. We perform three different experiments, covering document classification, sentiment analysis, and hate speech detection. We incorporate word embeddings into a Multichannel Convolutional-LSTM (MConv-LSTM) network for predicting different types of hate speech, document classification, and sentiment analysis. Experiments demonstrate that BengFastText can capture the semantics of words from respective contexts correctly. Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe, yield up to 92.30%, 82.25%, and 90.45% F1-scores for document classification, sentiment analysis, and hate speech detection, respectively, during 5-fold cross-validation tests.

* This paper is under review in the Journal of Natural Language Engineering 

Taking a Stance on Fake News: Towards Automatic Disinformation Assessment via Deep Bidirectional Transformer Language Models for Stance Detection

Nov 27, 2019
Chris Dulhanty, Jason L. Deglint, Ibrahim Ben Daya, Alexander Wong

The exponential rise of social media and digital news in the past decade has had the unfortunate consequence of escalating what the United Nations has called a global topic of concern: the growing prevalence of disinformation. Given the complexity and time-consuming nature of combating disinformation through human assessment, one is motivated to explore harnessing AI solutions to automatically assess news articles for the presence of disinformation. A valuable first step towards automatic identification of disinformation is stance detection, where given a claim and a news article, the aim is to predict if the article agrees, disagrees, takes no position, or is unrelated to the claim. Existing approaches in literature have largely relied on hand-engineered features or shallow learned representations (e.g., word embeddings) to encode the claim-article pairs, which can limit the level of representational expressiveness needed to tackle the high complexity of disinformation identification. In this work, we explore the notion of harnessing large-scale deep bidirectional transformer language models for encoding claim-article pairs in an effort to construct state-of-the-art stance detection geared for identifying disinformation. Taking advantage of bidirectional cross-attention between claim-article pairs via pair encoding with self-attention, we construct a large-scale language model for stance detection by performing transfer learning on a RoBERTa deep bidirectional transformer language model, and were able to achieve state-of-the-art performance (weighted accuracy of 90.01%) on the Fake News Challenge Stage 1 (FNC-I) benchmark. These promising results serve as motivation for harnessing such large-scale language models as powerful building blocks for creating effective AI solutions to combat disinformation.

* Accepted to the AI for Social Good Workshop at NeurIPS 2019 
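The abstract above reports a weighted accuracy on FNC-I without defining it. The sketch below shows one way such a metric can be computed, assuming the commonly described Fake News Challenge weighting: 0.25 credit for getting the related/unrelated split right, and a further 0.75 for the exact stance among the related classes. The label strings, the mapping of "takes no position" to `discuss`, and the function names are assumptions for illustration, not details confirmed by the source.

```python
# Stance labels assumed for illustration; "discuss" stands for "takes no position".
RELATED = {"agree", "disagree", "discuss"}


def fnc_score(gold, predicted):
    """Sum per-example credit under the assumed FNC-style weighting."""
    score = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            score += 0.25          # exact match on any label
            if g != "unrelated":
                score += 0.50      # extra credit for the exact related stance
        if g in RELATED and p in RELATED:
            score += 0.25          # related/unrelated split is correct
    return score


def weighted_accuracy(gold, predicted):
    """Score relative to the best achievable score on the same gold labels."""
    return fnc_score(gold, predicted) / fnc_score(gold, gold)


gold = ["agree", "disagree", "discuss", "unrelated"]
pred = ["agree", "discuss", "discuss", "unrelated"]
print(round(weighted_accuracy(gold, pred), 4))  # 2.5 / 3.25 -> 0.7692
```

Under this weighting, confusing two related stances still earns partial credit, while a correct `unrelated` prediction is worth only a quarter of a correct related stance, so the metric rewards the harder stance decision more.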

Capsule and convolutional neural network-based SAR ship classification in Sentinel-1 data

Oct 11, 2019
Leonardo De Laurentiis, Andrea Pomente, Fabio Del Frate, Giovanni Schiavon

Synthetic Aperture Radar (SAR) constitutes a fundamental asset for wide-area monitoring with high-resolution requirements. The first SAR sensors gave rise to coarse coastal and maritime monitoring applications, including oil spill, ship and ice floe detection. With the upgrade to very high-resolution sensors in recent years, and with relatively new SAR missions such as Sentinel-1, a great deal of data with richer information content has been released, enabling more refined studies of general target features and thus permitting complex classifications such as ship classification, which has become increasingly relevant given the growing need for coastal surveillance in commercial and military segments. In the last decade, several works focused on this topic have been presented, generally based on radiometric feature processing; furthermore, in very recent years a significant amount of research has focused on emerging deep learning techniques, in particular on Convolutional Neural Networks (CNNs). Recently, Capsule Neural Networks (CapsNets) have been presented, demonstrating a notable improvement in capturing the properties of given entities and improving the use of spatial information, in particular the spatial dependence between features, which is severely lacking in CNNs. In fact, CNN pooling operations have been criticized for losing spatial relations; thus special capsules, along with a new iterative routing-by-agreement mechanism, have been proposed. In this work, a comparison between the potential of Capsule networks and CNNs in the ship classification application domain is shown, leveraging OpenSARShip, a dataset of SAR Sentinel-1 ship chips; in particular, a performance comparison between capsule and various convolutional architectures is built, demonstrating better performance of CapsNet in classifying ships within a small dataset.

* SPIE Remote Sensing 2019: Proceedings Volume 11154, Active and Passive Microwave Remote Sensing for Environmental Monitoring III; 1115405 (2019) 
* Please check out the original SPIE paper for a complete list of figures, tables, references and general content 

Jurassic is (almost) All You Need: Few-Shot Meaning-to-Text Generation for Open-Domain Dialogue

Oct 15, 2021
Lena Reed, Cecilia Li, Angela Ramirez, Liren Wu, Marilyn Walker

One challenge with open-domain dialogue systems is the need to produce high-quality responses on any topic. We aim to improve the quality and coverage of Athena, an Alexa Prize dialogue system. We utilize Athena's response generators (RGs) to create training data for two new neural Meaning-to-Text RGs, Athena-GPT-Neo and Athena-Jurassic, for the movies, music, TV, sports, and video game domains. We conduct few-shot experiments, both within and cross-domain, with different tuning set sizes (2, 3, 10), prompt formats, and meaning representations (MRs) for sets of WikiData KG triples, and dialogue acts with 14 possible attribute combinations. Our evaluation uses BLEURT and human evaluation metrics, and shows that with 10-shot tuning, Athena-Jurassic's performance is significantly better for coherence and semantic accuracy. Experiments with 2-shot tuning on completely novel MRs result in a huge performance drop for Athena-GPT-Neo, whose semantic accuracy falls to 0.41, and whose untrue hallucination rate increases to 12%. Experiments with dialogue acts for video games show that with 10-shot tuning, both models learn to control dialogue acts, but Athena-Jurassic has significantly higher coherence, and only 4% untrue hallucinations. Our results suggest that Athena-Jurassic can reliably produce high-quality outputs for live systems with real users. To our knowledge, these are the first results demonstrating that few-shot tuning on a massive language model can create NLGs that generalize to new domains, and produce high-quality, semantically-controlled, conversational responses directly from MRs and KG triples.

* The 12th International Workshop on Spoken Dialog System Technology, IWSDS 2021 

Multimodal Deception Detection in Videos via Analyzing Emotional State-based Feature

Apr 16, 2021
Jun-Teng Yang, Guei-Ming Liu, Scott C.-H. Huang

Deception detection is an important task that has been a hot research topic due to its potential applications. It can be applied to many areas, from national security (e.g., airport security, jurisprudence, and law enforcement) to real-life applications (e.g., business and computer vision). However, some critical problems still exist and are worth more investigation. One of the major challenges is data scarcity. Until now, only one multimodal benchmark dataset on deception detection has been published, which contains 121 video clips (61 for the deceptive class and 60 for the truthful class). This amount of data is too small to train deep neural network-based methods, so they often suffer from overfitting and poor generalization. Also, the ground truth data contains some unusable frames for several reasons, including faces too small for the facial expression to be recognized, faces covered by text, and file corruption. However, most of the literature did not consider these problems. In this paper, we first design a series of data preprocessing methods to deal with these problems. Then, we propose a multimodal deception detection framework to construct our novel emotional state-based feature, and we use the open toolkit openSMILE to extract features from the audio modality. A voting scheme is also designed to combine the emotional state information obtained from both the visual and audio modalities. Finally, the novel emotion state transformation (EST) feature is determined by our algorithm. The critical analysis and comparison of the proposed methods with the state-of-the-art multimodal method show that the overall performance improves substantially, with accuracy rising from 84.16% to 91.67% and ROC-AUC from 0.9211 to 0.9244.

* 7 pages 

Method and Dataset Entity Mining in Scientific Literature: A CNN + Bi-LSTM Model with Self-attention

Oct 26, 2020
Linlin Hou, Ji Zhang, Ou Wu, Ting Yu, Zhen Wang, Zhao Li, Jianliang Gao, Yingchun Ye, Rujing Yao

Literature analysis helps researchers acquire a good understanding of the development of science and technology. Traditional literature analysis focuses largely on literature metadata such as topics, authors, abstracts, keywords, and references, and pays little attention to the main content of papers. In many scientific domains such as science, computing, and engineering, the methods and datasets involved in published papers carry important information and are quite useful for domain analysis as well as algorithm and dataset recommendation. In this paper, we propose a novel entity recognition model, called MDER, which is able to effectively extract the method and dataset entities from the main textual content of scientific papers. The model utilizes rule embedding and adopts a parallel structure of CNN and Bi-LSTM with the self-attention mechanism. We evaluate the proposed model on datasets constructed from the published papers of four research areas in computer science, i.e., NLP, CV, Data Mining and AI. The experimental results demonstrate that our model performs well in all four areas and has a good learning capacity for cross-area learning and recognition. We also conduct experiments to evaluate the effectiveness of different building modules within our model, which indicate the importance of each module in collectively contributing to the good entity recognition performance as a whole. The data augmentation experiments demonstrate that data augmentation positively contributes to model training, making our model much more robust in scenarios where only a small number of training samples are available. Finally, we apply our model to PAKDD papers published from 2009 to 2019 to mine insightful results from scientific papers published over a longer time span.

BWCFace: Open-set Face Recognition using Body-worn Camera

Sep 24, 2020
Ali Almadan, Anoop Krishnan, Ajita Rattani

With computer vision reaching an inflection point in the past decade, face recognition technology has become pervasive in policing, intelligence gathering, and consumer applications. Recently, face recognition technology has been deployed on body-worn cameras to keep officers safe, enabling situational awareness and providing evidence for trial. However, limited academic research has been conducted on this topic, using traditional techniques on datasets with small sample sizes. This paper aims to bridge the gap in state-of-the-art face recognition using body-worn cameras (BWC). To this end, the contribution of this work is two-fold: (1) collection of a dataset called BWCFace consisting of a total of 178K facial images of 132 subjects captured using a body-worn camera in indoor and daylight conditions, and (2) open-set evaluation of the latest deep-learning-based Convolutional Neural Network (CNN) architectures combined with five different loss functions for face identification on the collected dataset. Experimental results on our BWCFace dataset suggest a maximum of 33.89% Rank-1 accuracy when facial features are extracted using SENet-50 trained on the large-scale VGGFace2 facial image dataset. However, performance improved up to a maximum of 99.00% Rank-1 accuracy when pretrained CNN models were fine-tuned on a subset of identities in our BWCFace dataset. Equivalent performances were obtained across body-worn camera sensor models used in existing face datasets. The collected BWCFace dataset and the pretrained/fine-tuned algorithms are publicly available to promote further research and development in this area. A downloadable link to this dataset and the algorithms is available by contacting the authors.

* 19th IEEE International Conference On Machine Learning And Applications 2020 | Miami, Florida 
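Rank-1 accuracy, the metric quoted in the abstract above, can be sketched as follows: each probe feature is matched against a gallery of enrolled features, and a probe counts as a hit when its most similar gallery entry carries the same identity. The toy feature vectors, the use of cosine similarity, and the function names are assumptions for illustration; the paper's own features come from CNN models and its evaluation protocol may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose nearest gallery entry shares their identity.

    probes and gallery are lists of (identity, feature_vector) pairs.
    """
    hits = 0
    for identity, feature in probes:
        best_identity, _ = max(gallery, key=lambda item: cosine(feature, item[1]))
        hits += best_identity == identity
    return hits / len(probes)


# Hypothetical two-dimensional features for two enrolled subjects.
gallery = [("a", [1.0, 0.0]), ("b", [0.0, 1.0])]
probes = [("a", [0.9, 0.1]), ("b", [0.2, 0.8]), ("a", [0.4, 0.6])]
print(rank1_accuracy(probes, gallery))  # 2 of 3 probes match their identity
```

In an open-set setting the identities used to train or fine-tune the feature extractor are disjoint from those in the gallery and probe sets, so this computation is applied only to subjects the model has not seen during training.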
