Videogames are nowadays one of the biggest entertainment industries in the world. Being part of this industry means competing against lots of other companies and developers, thus, making fanbases of vital importance. They are a group of clients that constantly support your company because your video games are fun. Videogames are most entertaining when the difficulty level is a good match for the player's skill, increasing the player engagement. However, not all players are equally proficient, so some kind of difficulty selection is required. In this paper, we will present Dynamic Difficulty Adjustment (DDA), a recently arising research topic, which aims to develop an automated difficulty selection mechanism that keeps the player engaged and properly challenged, neither bored nor overwhelmed. We will present some recent research addressing this issue, as well as an overview of how to implement it. Satisfactorily solving the DDA problem directly affects the player's experience when playing the game, making it of high interest to any game developer, from independent ones, to 100 billion dollar businesses, because of the potential impacts in player retention and monetization.
Pedestrian path prediction is an essential topic in computer vision and video understanding. Having insight into the movement of pedestrians is crucial for ensuring safe operation in a variety of applications including autonomous vehicles, social robots, and environmental monitoring. Current works in this area utilize complex generative or recurrent methods to capture many possible futures. However, despite the inherent real-time nature of predicting future paths, little work has been done to explore accurate and computationally efficient approaches for this task. To this end, we propose a convolutional approach for real-time pedestrian path prediction, CARPe. It utilizes a variation of Graph Isomorphism Networks in combination with an agile convolutional neural network design to form a fast and accurate path prediction approach. Notable results in both inference speed and prediction accuracy are achieved, improving FPS by at least 8x in comparison to current state-of-the-art methods while delivering competitive accuracy on well-known path prediction datasets.
Gradient tree boosting (e.g. XGB) is one of the most widely usedmachine learning models in practice. How to build a secure XGB inface of data isolation problem becomes a hot research topic. However, existing works tend to leak intermediate information and thusraise potential privacy risk. In this paper, we propose a novel framework for two parties to build secure XGB with vertically partitioneddata. Specifically, we associate Homomorphic Encryption (HE) domain with Secret Sharing (SS) domain by providing the two-waytransformation primitives. The framework generally promotes theefficiency for privacy preserving machine learning and offers theflexibility to implement other machine learning models. Then weelaborate two secure XGB training algorithms as well as a corresponding prediction algorithm under the hybrid security domains.Next, we compare our proposed two training algorithms throughboth complexity analysis and experiments. Finally, we verify themodel performance on benchmark dataset and further apply ourwork to a real-world scenario.
The real-world data usually exhibits heterogeneous properties such as modalities, views, or resources, which brings some unique challenges wherein the key is Heterogeneous Representation Learning (HRL) termed in this paper. This brief survey covers the topic of HRL, centered around several major learning settings and real-world applications. First of all, from the mathematical perspective, we present a unified learning framework which is able to model most existing learning settings with the heterogeneous inputs. After that, we conduct a comprehensive discussion on the HRL framework by reviewing some selected learning problems along with the mathematics perspectives, including multi-view learning, heterogeneous transfer learning, Learning using privileged information and heterogeneous multi-task learning. For each learning task, we also discuss some applications under these learning problems and instantiates the terms in the mathematical framework. Finally, we highlight the challenges that are less-touched in HRL and present future research directions. To the best of our knowledge, there is no such framework to unify these heterogeneous problems, and this survey would benefit the community.
The task of concept prerequisite chain learning is to automatically determine the existence of prerequisite relationships among concept pairs. In this paper, we frame learning prerequisite relationships among concepts as an unsupervised task with no access to labeled concept pairs during training. We propose a model called the Relational-Variational Graph AutoEncoder (R-VGAE) to predict concept relations within a graph consisting of concept and resource nodes. Results show that our unsupervised approach outperforms graph-based semi-supervised methods and other baseline methods by up to 9.77% and 10.47% in terms of prerequisite relation prediction accuracy and F1 score. Our method is notably the first graph-based model that attempts to make use of deep learning representations for the task of unsupervised prerequisite learning. We also expand an existing corpus which totals 1,717 English Natural Language Processing (NLP)-related lecture slide files and manual concept pair annotations over 322 topics.
Music genre classification is one of the sub-disciplines of music information retrieval (MIR) with growing popularity among researchers, mainly due to the already open challenges. Although research has been prolific in terms of number of published works, the topic still suffers from a problem in its foundations: there is no clear and formal definition of what genre is. Music categorizations are vague and unclear, suffering from human subjectivity and lack of agreement. In its first part, this paper offers a survey trying to cover the many different aspects of the matter. Its main goal is give the reader an overview of the history and the current state-of-the-art, exploring techniques and datasets used to the date, as well as identifying current challenges, such as this ambiguity of genre definitions or the introduction of human-centric approaches. The paper pays special attention to new trends in machine learning applied to the music annotation problem. Finally, we also include a music genre classification experiment that compares different machine learning models using Audioset.
This paper presents an approach based on supervised machine learning methods to discriminate between positive, negative and neutral Arabic reviews in online newswire. The corpus is labeled for subjectivity and sentiment analysis (SSA) at the sentence-level. The model uses both count and TF-IDF representations and apply six machine learning algorithms; Multinomial Naive Bayes, Support Vector Machines (SVM), Random Forest, Logistic Regression, Multi-layer perceptron and k-nearest neighbors using uni-grams, bi-grams features. With the goal of extracting users sentiment from written text. Experimental results showed that n-gram features could substantially improve performance; and showed that the Multinomial Naive Bayes approach is the most accurate in predicting topic polarity. Best results were achieved using count vectors trained by combination of word-based uni-grams and bi-grams with an overall accuracy of 85.57% over two classes and 65.64% over three classes.
We address the problem of predicting the leading political ideology, i.e., left-center-right bias, for YouTube channels of news media. Previous work on the problem has focused exclusively on text and on analysis of the language used, topics discussed, sentiment, and the like. In contrast, here we study videos, which yields an interesting multimodal setup. Starting with gold annotations about the leading political ideology of major world news media from Media Bias/Fact Check, we searched on YouTube to find their corresponding channels, and we downloaded a recent sample of videos from each channel. We crawled more than 1,000 YouTube hours along with the corresponding subtitles and metadata, thus producing a new multimodal dataset. We further developed a multimodal deep-learning architecture for the task. Our analysis shows that the use of acoustic signal helped to improve bias detection by more than 6% absolute over using text and metadata only. We release the dataset to the research community, hoping to help advance the field of multi-modal political bias detection.
Person re-identification (re-ID) solves the task of matching images across cameras and is among the research topics in vision community. Since query images in real-world scenarios might suffer from resolution loss, how to solve the resolution mismatch problem during person re-ID becomes a practical problem. Instead of applying separate image super-resolution models, we propose a novel network architecture of Resolution Adaptation and re-Identification Network (RAIN) to solve cross-resolution person re-ID. Advancing the strategy of adversarial learning, we aim at extracting resolution-invariant representations for re-ID, while the proposed model is learned in an end-to-end training fashion. Our experiments confirm that the use of our model can recognize low-resolution query images, even if the resolution is not seen during training. Moreover, the extension of our model for semi-supervised re-ID further confirms the scalability of our proposed method for real-world scenarios and applications.
The posit number system is arguably the most promising and discussed topic in Arithmetic nowadays. The recent breakthroughs claimed by the format proposed by John L. Gustafson have put posits in the spotlight. In this work, we first describe an algorithm for multiplying two posit numbers, even when the number of exponent bits is zero. This configuration, scarcely tackled in literature, is particularly interesting because it allows the deployment of a fast sigmoid function. The proposed multiplication algorithm is then integrated as a template into the well-known FloPoCo framework. Synthesis results are shown to compare with the floating point multiplication offered by FloPoCo as well. Second, the performance of posits is studied in the scenario of Neural Networks in both training and inference stages. To the best of our knowledge, this is the first time that training is done with posit format, achieving promising results for a binary classification problem even with reduced posit configurations. In the inference stage, 8-bit posits are as good as floating point when dealing with the MNIST dataset, but lose some accuracy with CIFAR-10.