Sports competitions are widely researched in computer and social science, with the goal of understanding how players act under uncertainty. While there is an abundance of computational work on player metrics prediction based on past performance, very few attempts to incorporate out-of-game signals have been made. Specifically, it was previously unclear whether linguistic signals gathered from players' interviews can add information which does not appear in performance metrics. To bridge that gap, we define text classification tasks of predicting deviations from mean in NBA players' in-game actions, which are associated with strategic choices, player behavior and risk, using their choice of language prior to the game. We collected a dataset of transcripts from key NBA players' pre-game interviews and their in-game performance metrics, totaling in 5,226 interview-metric pairs. We design neural models for players' action prediction based on increasingly more complex aspects of the language signals in their open-ended interviews. Our models can make their predictions based on the textual signal alone, or on a combination with signals from past-performance metrics. Our text-based models outperform strong baselines trained on performance metrics only, demonstrating the importance of language usage for action prediction. Moreover, the models that employ both textual input and past-performance metrics produced the best results. Finally, as neural networks are notoriously difficult to interpret, we propose a method for gaining further insight into what our models have learned. Particularly, we present an LDA-based analysis, where we interpret model predictions in terms of correlated topics. We find that our best performing textual model is most associated with topics that are intuitively related to each prediction task and that better models yield higher correlation with more informative topics.
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equation are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks with Long Short-Term Memory (LSTM) cells. Due to its ability to capture long term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate the unimportant words and detects the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state of the art methods. We emphasize that the proposed model generates sentence embedding vectors that are specially useful for web document retrieval tasks. A comparison with a well known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method in this paper significantly outperforms it for web document retrieval task.
This thesis investigates the controllability of deep learning-based, end-to-end, generative dialogue systems in both task-oriented and chit-chat scenarios. In particular, we study the different aspects of controlling generative dialogue systems, including controlling styles and topics and continuously adding and combining dialogue skills. In the three decades since the first dialogue system was commercialized, the basic architecture of such systems has remained substantially unchanged, consisting of four pipelined basic components, namely, natural language understanding (NLU), dialogue state tracking (DST), a dialogue manager (DM) and natural language generation (NLG). The dialogue manager, which is the critical component of the modularized system, controls the response content and style. This module is usually programmed by rules and is designed to be highly controllable and easily extendable. With the emergence of powerful "deep learning" architectures, end-to-end generative dialogue systems have been proposed to optimize overall system performance and simplify training. However, these systems cannot be easily controlled and extended as the modularized dialogue manager can. This is because a single neural system is used, which is usually a large pre-trained language model (e.g., GPT-2), and thus it is hard to surgically change desirable attributes (e.g., style, topics, etc.). More importantly, uncontrollable dialogue systems can generate offensive and even toxic responses. Therefore, in this thesis, we study controllable methods for end-to-end generative dialogue systems in task-oriented and chit-chat scenarios. Throughout the chapters, we describe 1) how to control the style and topics of chit-chat models, 2) how to continuously control and extend task-oriented dialogue systems, and 3) how to compose and control multi-skill dialogue models.
Background: Misinformation spread through social media is a growing problem, and the emergence of COVID-19 has caused an explosion in new activity and renewed focus on the resulting threat to public health. Given this increased visibility, in-depth analysis of COVID-19 misinformation spread is critical to understanding the evolution of ideas with potential negative public health impact. Methods: Using a curated data set of COVID-19 tweets (N ~120 million tweets) spanning late January to early May 2020, we applied methods including regular expression filtering, supervised machine learning, sentiment analysis, geospatial analysis, and dynamic topic modeling to trace the spread of misinformation and to characterize novel features of COVID-19 conspiracy theories. Results: Random forest models for four major misinformation topics provided mixed results, with narrowly-defined conspiracy theories achieving F1 scores of 0.804 and 0.857, while more broad theories performed measurably worse, with scores of 0.654 and 0.347. Despite this, analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. We were able to identify distinct increases in negative sentiment, theory-specific trends in geospatial spread, and the evolution of conspiracy theory topics and subtopics over time. Conclusions: COVID-19 related conspiracy theories show that history frequently repeats itself, with the same conspiracy theories being recycled for new situations. We use a combination of supervised learning, unsupervised learning, and natural language processing techniques to look at the evolution of theories over the first four months of the COVID-19 outbreak, how these theories intertwine, and to hypothesize on more effective public health messaging to combat misinformation in online spaces.
With the rise of Social Media, people obtain and share information almost instantly on a 24/7 basis. Many research areas have tried to gain valuable insights from these large volumes of freely available user generated content. With the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities, we designed and developed a framework that provides functionalities for parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), including filtering of non-complying tweets, text pre-processing for Portuguese and English language, topic modeling, and transportation-specific text classifiers, as well as, aggregation and data visualization. We performed an exploratory data analysis of geo-located tweets in 5 different cities: Rio de Janeiro, S\~ao Paulo, New York City, London and Melbourne, comprising a total of more than 43 million tweets in a period of 3 months. Furthermore, we performed a large scale topic modelling comparison between Rio de Janeiro and S\~ao Paulo. Interestingly, most of the topics are shared between both cities which despite being in the same country are considered very different regarding population, economy and lifestyle. We take advantage of recent developments in word embeddings and train such representations from the collections of geo-located tweets. We then use a combination of bag-of-embeddings and traditional bag-of-words to train travel-related classifiers in both Portuguese and English to filter travel-related content from non-related. We created specific gold-standard data to perform empirical evaluation of the resulting classifiers. Results are in line with research work in other application areas by showing the robustness of using word embeddings to learn word similarities that bag-of-words is not able to capture.
The objective of the study is to examine coronavirus disease (COVID-19) related discussions, concerns, and sentiments that emerged from tweets posted by Twitter users. We collected 22 million Twitter messages related to the COVID-19 pandemic using a list of 25 hashtags such as "coronavirus," "COVID-19," "quarantine" from March 1 to April 21 in 2020. We used a machine learning approach, Latent Dirichlet Allocation (LDA), to identify popular unigram, bigrams, salient topics and themes, and sentiments in the collected Tweets. Popular unigrams included "virus," "lockdown," and "quarantine." Popular bigrams included "COVID-19," "stay home," "corona virus," "social distancing," and "new cases." We identified 13 discussion topics and categorized them into different themes, such as "Measures to slow the spread of COVID-19," "Quarantine and shelter-in-place order in the U.S.," "COVID-19 in New York," "Virus misinformation and fake news," "A need for a vaccine to stop the spread," "Protest against the lockdown," and "Coronavirus new cases and deaths." The dominant sentiments for the spread of coronavirus were anticipation that measures that can be taken, followed by a mixed feeling of trust, anger, and fear for different topics. The public revealed a significant feeling of fear when they discussed the coronavirus new cases and deaths. The study concludes that Twitter continues to be an essential source for infodemiology study by tracking rapidly evolving public sentiment and measuring public interests and concerns. Already emerged pandemic fear, stigma, and mental health concerns may continue to influence public trust when there occurs a second wave of COVID-19 or a new surge of the imminent pandemic. Hearing and reacting to real concerns from the public can enhance trust between the healthcare systems and the public as well as prepare for a future public health emergency.
Target-guided open-domain conversation aims to proactively and naturally guide a dialogue agent or human to achieve specific goals, topics or keywords during open-ended conversations. Existing methods mainly rely on single-turn datadriven learning and simple target-guided strategy without considering semantic or factual knowledge relations among candidate topics/keywords. This results in poor transition smoothness and low success rate. In this work, we adopt a structured approach that controls the intended content of system responses by introducing coarse-grained keywords, attains smooth conversation transition through turn-level supervised learning and knowledge relations between candidate keywords, and drives an conversation towards an specified target with discourse-level guiding strategy. Specially, we propose a novel dynamic knowledge routing network (DKRN) which considers semantic knowledge relations among candidate keywords for accurate next topic prediction of next discourse. With the help of more accurate keyword prediction, our keyword-augmented response retrieval module can achieve better retrieval performance and more meaningful conversations. Besides, we also propose a novel dual discourse-level target-guided strategy to guide conversations to reach their goals smoothly with higher success rate. Furthermore, to push the research boundary of target-guided open-domain conversation to match real-world scenarios better, we introduce a new large-scale Chinese target-guided open-domain conversation dataset (more than 900K conversations) crawled from Sina Weibo. Quantitative and human evaluations show our method can produce meaningful and effective target-guided conversations, significantly improving over other state-of-the-art methods by more than 20% in success rate and more than 0.6 in average smoothness score.