"Time": models, code, and papers

Using the profile of publishers to predict barriers across news articles

Jan 13, 2023
Abdul Sittar, Dunja Mladenic

Detection of news propagation barriers, whether economic, cultural, political, time-zonal, or geographical, is still an open research issue. We present an approach to barrier detection in news spreading by utilizing Wikipedia concepts and metadata associated with each barrier. Solving this problem can not only convey information about the coverage of an event but also show whether an event has been able to cross a specific barrier. Experimental results on the IPoNews dataset (a dataset for information spreading over the news) reveal that simple classification models are able to detect barriers with high accuracy. We believe that our approach can provide useful insights that pave the way for the future development of a system for predicting information spreading barriers over the news.

Prompting Neural Machine Translation with Translation Memories

Jan 13, 2023
Abudurexiti Reheman, Tao Zhou, Yingfeng Luo, Di Yang, Tong Xiao, Jingbo Zhu

Improving machine translation (MT) systems with translation memories (TMs) is of great interest to practitioners in the MT community. However, previous approaches require a significant update of the model architecture, additional training effort, or both to make the models well-behaved when TMs are taken as additional input. In this paper, we present a simple but effective method to introduce TMs into neural machine translation (NMT) systems. Specifically, we treat TMs as prompts to the NMT model at test time, but leave the training process unchanged. The result is a slight update of an existing NMT system, which can be implemented in a few hours by anyone who is familiar with NMT. Experimental results on several datasets demonstrate that our system significantly outperforms strong baselines.

* Accepted to AAAI 2023

PowerQuant: Automorphism Search for Non-Uniform Quantization

Jan 24, 2023
Edouard Yvinec, Arnaud Dapogny, Matthieu Cord, Kevin Bailly

Deep neural networks (DNNs) are nowadays ubiquitous in many domains such as computer vision. However, due to their high latency, the deployment of DNNs hinges on the development of compression techniques such as quantization, which consists of lowering the number of bits used to encode the weights and activations. Growing concerns for privacy and security have motivated the development of data-free techniques, at the expense of accuracy. In this paper, we identify the uniformity of the quantization operator as a limitation of existing approaches, and propose a data-free non-uniform method. More specifically, we argue that to be readily usable without dedicated hardware and implementation, non-uniform quantization shall not change the nature of the mathematical operations performed by the DNN. This leads to a search among the continuous automorphisms of $(\mathbb{R}_+^*,\times)$, which boils down to the power functions defined by their exponent. To find this parameter, we propose to optimize the reconstruction error of each layer: in particular, we show that this procedure is locally convex and admits a unique solution. At inference time, we show that our approach, dubbed PowerQuant, only requires simple modifications in the quantized DNN activation functions. As such, with only negligible overhead, it significantly outperforms existing methods in a variety of configurations.
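
The idea in this abstract can be illustrated with a small sketch: apply a power function to the weights, quantize uniformly, then invert the power function, and pick the exponent with the lowest reconstruction error. This is not the authors' implementation. The function names, the sign-preserving extension to negative values, the symmetric quantizer, and the grid search over candidate exponents (in place of the optimization procedure the abstract mentions) are all assumptions made for illustration.

```python
import math

def quantize_uniform(values, bits):
    """Symmetric uniform quantization followed by dequantization (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / levels
    if scale == 0.0:
        return list(values)
    return [round(v / scale) * scale for v in values]

def power_transform(values, a):
    """Apply x -> sign(x) * |x|**a (sign handling is an assumption of this sketch)."""
    return [math.copysign(abs(v) ** a, v) for v in values]

def power_quantize(values, bits, a):
    """Quantize uniformly in the transformed domain, then map back with exponent 1/a."""
    transformed = power_transform(values, a)
    quantized = quantize_uniform(transformed, bits)
    return power_transform(quantized, 1.0 / a)

def reconstruction_error(values, bits, a):
    """Squared error between the original and the reconstructed values."""
    restored = power_quantize(values, bits, a)
    return sum((v - r) ** 2 for v, r in zip(values, restored))

def search_exponent(values, bits, candidates):
    """Grid search over exponents (a stand-in for the paper's optimization)."""
    return min(candidates, key=lambda a: reconstruction_error(values, bits, a))

# Hypothetical layer weights with a few large outliers.
weights = [0.01, -0.02, 0.03, 0.05, -0.04, 0.9, -1.0]
candidates = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
best = search_exponent(weights, 4, candidates)
```

With exponent 1.0 the scheme reduces to plain uniform quantization, so including 1.0 among the candidates guarantees the selected exponent is never worse than the uniform baseline on the measured error.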

Implementation of the Critical Wave Groups Method with Computational Fluid Dynamics and Neural Networks

Jan 24, 2023
Kevin M. Silva, Kevin J. Maki

Accurate and efficient prediction of extreme ship responses continues to be a challenging problem in ship hydrodynamics. Probabilistic frameworks in conjunction with computationally efficient numerical hydrodynamic tools have been developed that allow researchers and designers to better understand extremes. However, the ability of these hydrodynamic tools to represent the physics quantitatively during extreme events is limited. Previous research successfully implemented the critical wave groups (CWG) probabilistic method with computational fluid dynamics (CFD). Although the CWG method allows for less simulation time than a Monte Carlo approach, the large quantity of simulations required is cost prohibitive. The objective of the present paper is to reduce the computational cost of implementing CWG with CFD, through the construction of long short-term memory (LSTM) neural networks. After training the models with a limited quantity of simulations, the models can provide a larger quantity of predictions to calculate the probability. The new framework is demonstrated with a 2-D midship section of the Office of Naval Research Tumblehome (ONRT) hull in Sea State 7 and beam seas at zero speed. The new framework is able to produce predictions that are representative of a purely CFD-driven CWG framework, with two orders of magnitude of computational cost savings.

On Dynamic Regret and Constraint Violations in Constrained Online Convex Optimization

Jan 24, 2023
Rahul Vaze

A constrained version of the online convex optimization (OCO) problem is considered. With slotted time, for each slot, first an action is chosen. Subsequently, the loss function and the constraint violation penalty evaluated at the chosen action point are revealed. For each slot, both the loss function and the function defining the constraint set are assumed to be smooth and strongly convex. In addition, once an action is chosen, local information about a feasible set within a small neighborhood of the current action is also revealed. An algorithm is allowed to compute at most one gradient at its point of choice given the described feedback to choose the next action. The goal of an algorithm is to simultaneously minimize the dynamic regret (loss incurred compared to the oracle's loss) and the constraint violation penalty (penalty accrued compared to the oracle's penalty). We propose an algorithm that follows projected gradient descent over a suitably chosen set around the current action. We show that both the dynamic regret and the constraint violation are order-wise bounded by the path-length, the sum of the distances between the consecutive optimal actions. Moreover, we show that the derived bounds are the best possible.

* in Proc. WiOpt 2022
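
Two ingredients named in the abstract above can be sketched briefly: a projected gradient step restricted to a set around the current action, and the path-length of a sequence of optimal actions. This is an illustrative sketch only. Using a Euclidean ball of fixed radius as the "suitably chosen set", and the names `local_pgd_step`, `project_onto_ball`, and `path_length`, are assumptions and do not come from the paper.

```python
import math

def project_onto_ball(point, center, radius):
    """Euclidean projection of `point` onto the ball of `radius` around `center`."""
    offset = [p - c for p, c in zip(point, center)]
    norm = math.sqrt(sum(o * o for o in offset))
    if norm <= radius:
        return list(point)
    shrink = radius / norm
    return [c + shrink * o for c, o in zip(center, offset)]

def local_pgd_step(action, gradient, step_size, radius):
    """One gradient step, projected onto a ball around the current action.

    The ball is a hypothetical stand-in for the locally revealed feasible set.
    """
    candidate = [x - step_size * g for x, g in zip(action, gradient)]
    return project_onto_ball(candidate, action, radius)

def path_length(optimal_actions):
    """Sum of distances between consecutive optimal actions."""
    return sum(
        math.dist(a, b) for a, b in zip(optimal_actions, optimal_actions[1:])
    )

# A large gradient is clipped to the boundary of the local ball.
next_action = local_pgd_step([0.0, 0.0], [3.0, 4.0], step_size=1.0, radius=1.0)
total_drift = path_length([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
```

Each step uses a single gradient evaluation, matching the one-gradient-per-slot restriction described in the abstract.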

Probabilistic Bilevel Coreset Selection

Jan 24, 2023
Xiao Zhou, Renjie Pi, Weizhong Zhang, Yong Lin, Tong Zhang

The goal of coreset selection in supervised learning is to produce a weighted subset of data, so that training only on the subset achieves performance similar to training on the entire dataset. Existing methods have achieved promising results in resource-constrained scenarios such as continual learning and streaming. However, most of the existing algorithms are limited to traditional machine learning models. The few algorithms that can handle large models adopt greedy search approaches due to the difficulty of solving the discrete subset selection problem, which is computationally costly when the coreset becomes larger and often produces suboptimal results. In this work, for the first time, we propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample. The overall objective is posed as a bilevel optimization problem, where 1) the inner loop samples coresets and trains the model to convergence and 2) the outer loop updates the sample probability progressively according to the model's performance. Importantly, we develop an efficient solver for the bilevel optimization problem via unbiased policy gradient without the trouble of implicit differentiation. We provide the convergence property of our training procedure and demonstrate the superiority of our algorithm over various coreset selection methods in various tasks, especially in more challenging label-noise and class-imbalance scenarios.

Accelerating Machine Learning Training Time for Limit Order Book Prediction

Jun 17, 2022
Mark Joseph Bennett

Financial firms are interested in simulation to discover whether a given algorithm involving financial machine learning will operate profitably. While many versions of this type of algorithm have been published recently by researchers, the focus herein is on a particular machine learning training project due to the explainable nature and the availability of high frequency market data. For this task, hardware acceleration is expected to speed up the time required for the financial machine learning researcher to obtain the results. As the majority of the time can be spent in classifier training, there is interest in faster training steps. A published Limit Order Book algorithm for predicting stock market direction is our subject, and the machine learning training process can be time-intensive, especially when considering the iterative nature of model development. To remedy this, we deploy Graphics Processing Units (GPUs) produced by NVIDIA, available in the data center, where the computer architecture is geared to parallel high-speed arithmetic operations. In the studied configuration, this leads to significantly faster training time, allowing more efficient and extensive model development.

Towards Understanding How Self-training Tolerates Data Backdoor Poisoning

Jan 20, 2023
Soumyadeep Pal, Ren Wang, Yuguang Yao, Sijia Liu

Recent studies on backdoor attacks in model training have shown that polluting a small portion of training data is sufficient to produce incorrect manipulated predictions on poisoned test-time data while maintaining high clean accuracy in downstream tasks. The stealthiness of backdoor attacks has imposed tremendous defense challenges in today's machine learning paradigm. In this paper, we explore the potential of self-training via additional unlabeled data for mitigating backdoor attacks. We begin with a pilot study showing that vanilla self-training is not effective in backdoor mitigation. Spurred by that, we propose to defend against backdoor attacks by leveraging strong but proper data augmentations in the self-training pseudo-labeling stage. We find that the new self-training regime helps defend against backdoor attacks to a great extent. Its effectiveness is demonstrated through experiments for different backdoor triggers on CIFAR-10 and a combination of CIFAR-10 with an additional unlabeled 500K TinyImages dataset. Finally, we explore the direction of combining self-supervised representation learning with self-training for further improvement in backdoor defense.

* Accepted at SafeAI 2023: AAAI's Workshop on Artificial Intelligence Safety

Feature Relevance Analysis to Explain Concept Drift -- A Case Study in Human Activity Recognition

Jan 20, 2023
Pekka Siirtola, Juha Röning

This article studies how to detect and explain concept drift. Human activity recognition is used as a case study together with an online batch learning situation where the quality of the labels used in the model updating process starts to decrease. Drift detection is based on identifying a set of features having the largest relevance difference between the drifting model and a model that is known to be accurate, and monitoring how the relevance of these features changes over time. As a main result of this article, it is shown that feature relevance analysis can be used not only to detect the concept drift but also to explain the reason for the drift when a limited number of typical reasons for the concept drift are predefined. To explain the reason for the concept drift, it is studied how these predefined reasons affect feature relevance. In fact, it is shown that each of them has a unique effect on feature relevance, and these effects can be used to explain the reason for concept drift.

* Accepted to HASCA 2022 workshop in conjunction with UbiComp/ISWC2022
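
The detection step described in the abstract above, selecting the features whose relevance differs most between a drifting model and a trusted model and then following them over time, can be sketched as follows. This is a minimal illustration, not the authors' code. The function names, the use of absolute difference as the ranking score, and the feature names and relevance values in the example are all hypothetical.

```python
def top_relevance_shifts(reference, drifting, k):
    """Return the k feature names with the largest absolute relevance difference.

    `reference` and `drifting` map feature names to relevance scores from the
    trusted model and the possibly drifting model, respectively.
    """
    diffs = {name: abs(drifting[name] - reference[name]) for name in reference}
    return sorted(diffs, key=diffs.get, reverse=True)[:k]

def track_relevance(history, features):
    """For each snapshot in time order, keep only the monitored features."""
    return [{name: snapshot[name] for name in features} for snapshot in history]

# Hypothetical relevance scores for three sensor features.
reference = {"acc_mean": 0.5, "acc_std": 0.3, "gyro_mean": 0.2}
drifting = {"acc_mean": 0.1, "acc_std": 0.35, "gyro_mean": 0.5}

monitored = top_relevance_shifts(reference, drifting, k=2)
trace = track_relevance([reference, drifting], monitored)
```

The resulting trace is the kind of time series one could compare against predefined drift patterns, as the abstract suggests, although how those patterns are defined is not specified here.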

Traffic Prediction in Cellular Networks using Graph Neural Networks

Jan 30, 2023
Maryam Khalid

Cellular networks are ubiquitous entities that provide a major means of communication all over the world. One major challenge in cellular networks is the dynamic change in the number of users and their usage of telecommunication services, which results in overloading at certain base stations. One class of solutions to this overloading issue is the deployment of drones that can act as temporary base stations and offload the traffic from the overloaded base station. There are two main challenges in the development of this solution. Firstly, the drone is expected to be present around the base station where an overload would occur in the future, thus requiring a prediction of traffic overload. Secondly, drones are highly constrained in their resources and can only fly for a few minutes. If the affected base station is too far away, drones cannot reach it. This requires the initial placement of drones in sectors where overloading can occur, thus again requiring a traffic forecast but at a different spatial scale. It must be noted that the spatial extent of the region and the extremely limited power resources available to the drone pose a great challenge that is hard to overcome without deploying the drones in strategic positions to reduce the time to fly to the required high-demand zone. Moreover, since drones fly at a finite speed, it is important to adopt a predictive solution that can forecast traffic surges, so that drones are available to offload the traffic before an overload actually happens. Both these goals require analysis and forecasting of cellular network traffic, which is the main goal of this project.
