
"Topic": models, code, and papers

Risk-Monotonicity via Distributional Robustness

Dec 16, 2020
Zakaria Mhammedi, Hisham Husain

Acquisition of data is a difficult task in most applications of Machine Learning (ML), and it is only natural that one hopes and expects lower population risk (better performance) with increasing data points. It turns out, somewhat surprisingly, that this is not the case even for the most standard algorithms, such as the Empirical Risk Minimizer (ERM). Non-monotonic behaviour of the risk and instability in training have appeared in the popular deep learning paradigm under the description of double descent. These problems not only highlight our lack of understanding of learning algorithms and generalization, but also risk rendering our efforts at data acquisition vain. It is, therefore, crucial to pursue this concern and provide a characterization of such behaviour. In this paper, we derive the first consistent and risk-monotonic algorithms for a general statistical learning setting under weak assumptions, consequently resolving an open problem (Viering et al. 2019) on how to avoid non-monotonic behaviour of risk curves. Our algorithms make use of Distributionally Robust Optimization (DRO) -- a technique that has shown promise in addressing other complications of deep learning, such as adversarial training. Our work makes a significant contribution to the topic of risk-monotonicity, which may be key in resolving empirical phenomena such as double descent.

* There is a mistake in one of the derivations. It can be fixed, but the paper will change significantly 
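To make the DRO ingredient concrete, here is a minimal Python sketch of one standard DRO formulation, the dual of a KL-constrained worst-case risk (a "tilted" loss), applied to a toy location-estimation problem. The KL uncertainty set, the temperature `tau`, the toy Gaussian data, and all names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def kl_dro_loss(losses, tau=1.0):
    """Dual form of KL-ball DRO: tau * log(mean(exp(loss / tau))).
    By Jensen's inequality this always upper-bounds the plain average
    loss; as tau -> infinity it recovers ordinary ERM."""
    shifted = losses / tau
    m = shifted.max()  # log-sum-exp stabilization
    return tau * (m + np.log(np.mean(np.exp(shifted - m))))

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=200)

# Fit a location parameter theta by gradient descent on the DRO objective.
# The gradient is a softmax-weighted average of per-sample gradients,
# which up-weights the currently worst-off samples.
theta = 0.0
for _ in range(500):
    losses = (data - theta) ** 2
    w = np.exp(losses - losses.max())
    w /= w.sum()
    grad = np.sum(w * (-2.0) * (data - theta))
    theta -= 0.01 * grad

erm_theta = data.mean()  # plain ERM solution, for comparison
```

The robust objective dominates the empirical average by construction, which is the mechanism DRO-based procedures exploit to trade a little average-case performance for worst-case guarantees.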


Physical deep learning based on optimal control of dynamical systems

Dec 16, 2020
Genki Furuhata, Tomoaki Niiyama, Satoshi Sunada

A central topic in recent artificial intelligence technologies is deep learning, which can be regarded as a multilayer feedforward neural network. An essence of deep learning is the information propagation through the layers, suggesting a connection between deep neural networks and dynamical systems, in the sense that the information propagation is explicitly modeled by the time evolution of dynamical systems. Here, we present a pattern recognition approach based on optimal control of continuous-time dynamical systems, which is suitable for physical hardware implementation. The learning is based on the adjoint method for optimally controlling dynamical systems, and the deep (virtual) network structures arising from the time evolution of the systems can be used for processing input information. As an example, we apply the dynamics-based recognition approach to an optoelectronic delay system and show that the use of the delay system enables image recognition and nonlinear classification with only a few control signals, in contrast to conventional multilayer neural networks, which require training of a large number of weight parameters. The proposed approach makes it possible to gain insight into the mechanisms of deep network processing within the framework of an optimal control problem, and opens a novel pathway to realizing physical computing hardware.

* 13 pages, 9 figures 
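The adjoint method at the core of such learning schemes can be sketched in a few lines: Euler-discretized dynamics with a terminal loss, and a backward adjoint recursion that yields the gradient with respect to every control signal in one pass. The specific dynamics `tanh(Wx + u)`, the step size, and the dimensions below are illustrative assumptions, not the paper's optoelectronic system.

```python
import numpy as np

def forward(x0, controls, W, dt=0.1):
    """Euler discretization of x' = tanh(W x + u_k); returns the trajectory."""
    xs = [x0]
    for u in controls:
        x = xs[-1]
        xs.append(x + dt * np.tanh(W @ x + u))
    return xs

def adjoint_grad(x0, controls, W, target, dt=0.1):
    """Gradient of 0.5 * ||x_T - target||^2 w.r.t. each control u_k,
    via the discrete adjoint: lam_T = x_T - target, then
    lam_k = lam_{k+1} + dt * W^T (tanh' * lam_{k+1}),
    dL/du_k = dt * tanh' * lam_{k+1}."""
    xs = forward(x0, controls, W, dt)
    lam = xs[-1] - target  # terminal adjoint
    grads = []
    for k in reversed(range(len(controls))):
        d = 1.0 - np.tanh(W @ xs[k] + controls[k]) ** 2  # tanh'
        grads.append(dt * d * lam)
        lam = lam + dt * W.T @ (d * lam)
    return list(reversed(grads))

# Toy instance: two state dimensions, two control steps.
x0 = np.array([0.5, -0.3])
W = np.array([[0.2, -0.1], [0.4, 0.3]])
controls = [np.array([0.1, 0.2]), np.array([-0.1, 0.05])]
grads = adjoint_grad(x0, controls, W, target=np.array([1.0, 0.0]))
```

The backward pass costs one extra trajectory sweep regardless of how many control parameters there are, which is why the adjoint method scales to deep (many-time-step) virtual networks.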


Analysing Social Media Network Data with R: Semi-Automated Screening of Users, Comments and Communication Patterns

Nov 26, 2020
Dennis Klinkhammer

Communication on social media platforms is not only culturally and politically relevant, it is also increasingly widespread across societies. Users not only communicate via social media platforms, but also search specifically for information, disseminate it or post information themselves. However, fake news, hate speech and even radicalizing elements are part of this modern form of communication, sometimes with far-reaching effects on individuals and societies. A basic understanding of these mechanisms and communication patterns could help to counteract negative forms of communication, e.g. bullying among children or extreme political points of view. To this end, a method is presented to break down the underlying communication patterns, to trace individual users and to inspect their comments and reach on social media platforms, or to contrast them later via qualitative research. This approach can identify particularly active users with an accuracy of 100 percent if the surrounding social networks as well as the topics are taken into account. However, methodological as well as counteracting approaches must become even more dynamic and flexible to ensure sensitivity and specificity regarding users who spread hate speech, fake news and radicalizing elements.

* 14 pages, 2 figures 
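The screening step, identifying particularly active users from a comment log, can be sketched with a simple degree-style activity score (shown here in Python rather than R, on an invented toy log; the usernames, the reply structure, and the scoring rule are illustrative assumptions, not the paper's data or method).

```python
from collections import Counter

# Toy comment log: (author, replied_to) pairs from one scraped thread.
comments = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "alice"), ("carol", "alice"), ("eve", "alice"),
    ("dave", "eve"),
]

# Activity score = comments written (out-degree) plus replies received
# (in-degree), a crude proxy for a user's centrality in the thread.
activity = Counter()
for author, addressee in comments:
    activity[author] += 1
    activity[addressee] += 1

most_active = activity.most_common(1)[0][0]
```

On real data this score would be computed per network and per topic, which is what the abstract credits for the high screening accuracy.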


PairRE: Knowledge Graph Embeddings via Paired Relation Vectors

Nov 07, 2020
Linlin Chao, Jianshan He, Taifeng Wang, Wei Chu

Distance-based knowledge graph embedding methods show promising results on the link prediction task, on which two topics have been widely studied: one is the ability to handle complex relations, such as N-to-1, 1-to-N and N-to-N; the other is the ability to encode various relation patterns, such as symmetry/antisymmetry. However, existing methods fail to solve these two problems at the same time, which leads to unsatisfactory results. To mitigate this, we propose PairRE, a model with improved expressiveness and low computational cost. PairRE represents each relation with a pair of vectors that project the two connected entities to relation-specific locations. Beyond solving the two problems above, PairRE is well suited to representing subrelations, as it can effectively capture both the similarities and differences between them. Given simple constraints on relation representations, PairRE is the first model capable of encoding symmetry/antisymmetry, inverse, composition and subrelation relations. Experiments on link prediction benchmarks show that PairRE achieves state-of-the-art or highly competitive performance. In addition, PairRE shows encouraging results for encoding subrelations.
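The paired-vector idea is compact enough to sketch directly. The snippet below assumes the distance-based score -||h∘r^H - t∘r^T||_1 with L2-normalized entity embeddings, and illustrates one of the constraints mentioned above: tying r^H = r^T makes the score symmetric in head and tail, which is how a symmetric relation can be encoded. The embedding values are arbitrary toy numbers.

```python
import numpy as np

def pairre_score(h, r_head, r_tail, t):
    """PairRE plausibility score: -|| h/|h| * r^H  -  t/|t| * r^T ||_1.
    Each relation owns a pair (r^H, r^T) that projects head and tail
    entities separately; higher (less negative) means more plausible."""
    h = h / np.linalg.norm(h)
    t = t / np.linalg.norm(t)
    return -np.sum(np.abs(h * r_head - t * r_tail))

# With r^H = r^T = r, the score reduces to -||r * (h - t)||_1, which is
# unchanged when h and t swap roles: a symmetric relation.
r = np.array([0.5, -0.2, 0.8])
h = np.array([1.0, 2.0, 3.0])
t = np.array([-1.0, 0.5, 2.0])
s_forward = pairre_score(h, r, r, t)
s_backward = pairre_score(t, r, r, h)
```

Analogous elementwise constraints (e.g. relating one relation's pair to another's) yield the inverse, composition and subrelation patterns claimed in the abstract.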


Commands 4 Autonomous Vehicles (C4AV) Workshop Summary

Sep 18, 2020
Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Yu Liu, Luc Van Gool, Matthew Blaschko, Tinne Tuytelaars, Marie-Francine Moens

The task of visual grounding requires locating the most relevant region or object in an image, given a natural language query. So far, progress on this task has mostly been measured on curated datasets, which are not always representative of human spoken language. In this work, we deviate from recent, popular task settings and consider the problem in an autonomous vehicle scenario. In particular, we consider a situation where passengers can give free-form natural language commands to a vehicle, which can be associated with an object in the street scene. To stimulate research on this topic, we have organized the \emph{Commands for Autonomous Vehicles} (C4AV) challenge based on the recent \emph{Talk2Car} dataset. This paper presents the results of the challenge. First, we compare the benchmark used against existing datasets for visual grounding. Second, we identify the aspects that render top-performing models successful and relate them to existing state-of-the-art models for visual grounding, in addition to detecting potential failure cases by evaluating on carefully selected subsets. Finally, we discuss several possibilities for future work.


What is important about the No Free Lunch theorems?

Jul 21, 2020
David H. Wolpert

The No Free Lunch theorems prove that under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. As I discuss in this chapter, the importance of the theorems arises by using them to analyze scenarios involving {non-uniform} distributions, and to compare different algorithms, without any assumption about the distribution over problems at all. In particular, the theorems prove that {anti}-cross-validation (choosing among a set of candidate algorithms based on which has {worst} out-of-sample behavior) performs as well as cross-validation, unless one makes an assumption -- which has never been formalized -- about how the distribution over induction problems, on the one hand, is related to the set of algorithms one is choosing among using (anti-)cross-validation, on the other. In addition, they establish strong caveats concerning the significance of the many results in the literature which establish the strength of a particular algorithm without assuming a particular distribution. They also motivate a ``dictionary'' between supervised learning and blackbox optimization, which allows one to ``translate'' techniques from supervised learning into the domain of blackbox optimization, thereby strengthening blackbox optimization algorithms. In addition to these topics, I also briefly discuss their implications for philosophy of science.

* 15 pages, 11 of main text, to be published in "Black Box Optimization, Machine Learning and No-Free Lunch Theorems", P. Pardalos, V. Rasskazova, M.N. Vrahatis, Ed., Springer 
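The uniform-average claim can be checked exhaustively on a toy domain: averaged over all 2^4 boolean target functions on four points, every learner, however sensible or perverse its rule, scores exactly 50% on the points it never saw. The domain size and the two toy learners below are illustrative choices, not Wolpert's formal setting.

```python
from itertools import product

# Domain of 4 points; training reveals the labels of the first 2.
domain = [0, 1, 2, 3]
train_pts, test_pts = [0, 1], [2, 3]

def learner_constant_zero(train_labels):
    return lambda x: 0  # ignore the data, always predict 0

def learner_majority(train_labels):
    guess = int(sum(train_labels) * 2 >= len(train_labels))
    return lambda x, g=guess: g  # predict the majority training label

def avg_ots_accuracy(learner):
    """Off-training-set accuracy averaged uniformly over all 2^|domain|
    boolean target functions: the NFL 'uniform over problems' average."""
    total, count = 0.0, 0
    for target in product([0, 1], repeat=len(domain)):
        f = learner([target[p] for p in train_pts])
        correct = sum(f(p) == target[p] for p in test_pts)
        total += correct / len(test_pts)
        count += 1
    return total / count
```

Under the uniform average, the unseen labels are independent of the training labels, so any rule for turning training data into predictions, good or deliberately bad, washes out to chance; this is exactly why the interesting content lies in non-uniform distributions.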


Unsupervised CT Metal Artifact Learning using Attention-guided beta-CycleGAN

Jul 07, 2020
Junghyun Lee, Jawook Gu, Jong Chul Ye

Metal artifact reduction (MAR) is one of the most important research topics in computed tomography (CT). With the advance of deep learning technology for image reconstruction, various deep learning methods have also been suggested for metal artifact removal, among which supervised learning methods are the most popular. However, matched metal and metal-free image pairs are difficult to obtain in real CT acquisition. Recently, a promising unsupervised learning approach for MAR was proposed using feature disentanglement, but the resulting network architecture is complicated and has difficulty handling large clinical images. To address this, here we propose a much simpler and more effective unsupervised MAR method for CT. The proposed method is based on a novel beta-cycleGAN architecture derived from optimal transport theory for appropriate feature space disentanglement. Another important contribution is to show that the attention mechanism is the key element for effectively removing metal artifacts. Specifically, by adding convolutional block attention module (CBAM) layers with a proper disentanglement parameter, experimental results confirm improved MAR that preserves the detailed texture of the original image.
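As a rough illustration of the channel-plus-spatial attention idea behind CBAM, the NumPy sketch below gates a (C, H, W) feature map in two stages. The pooling choices and the single linear map standing in for CBAM's shared MLP are simplifying assumptions for brevity, not the paper's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_like(feat, w_mlp):
    """CBAM-style attention on a (C, H, W) feature map.
    Stage 1 (channel): avg- and max-pooled channel descriptors pass
    through a shared linear map (stand-in for CBAM's small MLP) and a
    sigmoid, gating each channel. Stage 2 (spatial): channel-wise avg
    and max pooling produce a per-pixel gate."""
    avg_pool = feat.mean(axis=(1, 2))                     # (C,)
    max_pool = feat.max(axis=(1, 2))                      # (C,)
    ch_att = sigmoid(w_mlp @ avg_pool + w_mlp @ max_pool)  # (C,)
    feat = feat * ch_att[:, None, None]
    sp_att = sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # (H, W)
    return feat * sp_att[None, :, :]

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8, 8))
out = cbam_like(x, rng.normal(size=(4, 4)) * 0.1)
```

Because both gates lie in (0, 1), the module can only suppress responses, never amplify them, which is the sense in which attention lets the generator focus on artifact regions.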


Conversational Question Answering over Passages by Leveraging Word Proximity Networks

Apr 27, 2020
Magdalena Kaiser, Rishiraj Saha Roy, Gerhard Weikum

Question answering (QA) over text passages is a problem of long-standing interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces a key research challenge: understanding the context left implicit by the user in follow-up questions. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, which supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question and the coherence of query terms within them; these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end users and insightful for experts who reconfigure it for individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.

* SIGIR 2020 Demonstrations 
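The WPN idea can be sketched in a few lines: build a graph whose edges are term co-occurrences, then score a set of terms by the total weight of edges among them. The three-sentence corpus is invented, and raw counts replace CROWN's statistically significant co-occurrence weighting over large corpora; this is a toy, not the system.

```python
from collections import defaultdict
from itertools import combinations

# Tiny invented corpus; co-occurrence within a sentence defines an edge.
corpus = [
    "neural ranking models score passages",
    "conversational context helps ranking",
    "word networks store co occurrence statistics",
]

# Word proximity network: edge weight = raw co-occurrence count.
wpn = defaultdict(float)
for sent in corpus:
    for a, b in combinations(sorted(set(sent.split())), 2):
        wpn[(a, b)] += 1.0

def coherence(terms):
    """Sum of WPN edge weights among the given terms: a simplified
    stand-in for CROWN's node-and-edge scoring."""
    return sum(wpn[(a, b)] for a, b in combinations(sorted(set(terms)), 2))

score_related = coherence(["ranking", "passages", "models"])    # co-occur
score_unrelated = coherence(["ranking", "occurrence", "banana"])  # do not
```

Terms that keep appearing together across the corpus pull a passage's score up, which is how coherence complements plain query-passage similarity in the ranking.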


Towards an Integrated Platform for Big Data Analysis

Apr 27, 2020
Mahdi Bohlouli, Frank Schulz, Lefteris Angelis, David Pahor, Ivona Brandic, David Atlan, Rosemary Tate

The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled "Big Data", and their storage, processing and analysis present a plethora of new challenges to computer science researchers and IT professionals. In addition to efficient data management, additional complexity arises from dealing with semi-structured or unstructured data and from time-critical processing requirements. In order to understand these massive amounts of data, advanced visualization and data exploration techniques are required. Innovative approaches to these challenges have been developed during recent years and continue to be a hot topic for research and industry. An investigation of current approaches reveals that usually only one or two aspects are addressed, whether in data management, processing, analysis or visualization. This paper presents the vision of an integrated platform for big data analysis that combines all these aspects. The main benefits of this approach are enhanced scalability of the whole platform, better parameterization of algorithms, more efficient usage of system resources, and improved usability during the end-to-end data analysis process.
