In this paper, we lay out a vision for analysing semantic trajectory traces and generating synthetic semantic trajectory data (SSTs) using generative language model. Leveraging the advancements in deep learning, as evident by progress in the field of natural language processing (NLP), computer vision, etc. we intend to create intelligent models that can study the semantic trajectories in various contexts, predicting future trends, increasing machine understanding of the movement of animals, humans, goods, etc. enhancing human-computer interactions, and contributing to an array of applications ranging from urban-planning to personalized recommendation engines and business strategy.
We introduce NightPulse, an interactive tool for Night-time light (NTL) data visualization and analytics, which enables researchers and stakeholders to explore and analyze NTL data with a user-friendly platform. Powered by efficient system architecture, NightPulse supports image segmentation, clustering, and change pattern detection to identify urban development and sprawl patterns. It captures temporal trends of NTL and semantics of cities, answering questions about demographic factors, city boundaries, and unusual differences.
The increasing reliance on online communities for healthcare information by patients and caregivers has led to the increase in the spread of misinformation, or subjective, anecdotal and inaccurate or non-specific recommendations, which, if acted on, could cause serious harm to the patients. Hence, there is an urgent need to connect users with accurate and tailored health information in a timely manner to prevent such harm. This paper proposes an innovative approach to suggesting reliable information to participants in online communities as they move through different stages in their disease or treatment. We hypothesize that patients with similar histories of disease progression or course of treatment would have similar information needs at comparable stages. Specifically, we pose the problem of predicting topic tags or keywords that describe the future information needs of users based on their profiles, traces of their online interactions within the community (past posts, replies) and the profiles and traces of online interactions of other users with similar profiles and similar traces of past interaction with the target users. The result is a variant of the collaborative information filtering or recommendation system tailored to the needs of users of online health communities. We report results of our experiments on an expert curated data set which demonstrate the superiority of the proposed approach over the state of the art baselines with respect to accurate and timely prediction of topic tags (and hence information sources of interest).
Analyzing the geographic movement of humans, animals, and other phenomena is a growing field of research. This research has benefited urban planning, logistics, animal migration understanding, and much more. Typically, the movement is captured as precise geographic coordinates and time stamps with Global Positioning Systems (GPS). Although some research uses computational techniques to take advantage of implicit movement in descriptions of route directions, hiking paths, and historical exploration routes, innovation would accelerate with a large and diverse corpus. We created a corpus of sentences labeled as describing geographic movement or not and including the type of entity moving. Creating this corpus proved difficult without any comparable corpora to start with, high human labeling costs, and since movement can at times be interpreted differently. To overcome these challenges, we developed an iterative process employing hand labeling, crowd voting for confirmation, and machine learning to predict more labels. By merging advances in word embeddings with traditional machine learning models and model ensembling, prediction accuracy is at an acceptable level to produce a large silver-standard corpus despite the small gold-standard corpus training set. Our corpus will likely benefit computational processing of geography in text and spatial cognition, in addition to detection of movement.
Federated Learning (FL) is designed to protect the data privacy of each client during the training process by transmitting only models instead of the original data. However, the trained model may memorize certain information about the training data. With the recent legislation on right to be forgotten, it is crucially essential for the FL model to possess the ability to forget what it has learned from each client. We propose a novel federated unlearning method to eliminate a client's contribution by subtracting the accumulated historical updates from the model and leveraging the knowledge distillation method to restore the model's performance without using any data from the clients. This method does not have any restrictions on the type of neural networks and does not rely on clients' participation, so it is practical and efficient in the FL system. We further introduce backdoor attacks in the training process to help evaluate the unlearning effect. Experiments on three canonical datasets demonstrate the effectiveness and efficiency of our method.
Understanding movement described in text documents is important since text descriptions of movement contain a wealth of geographic and contextual information about the movement of people, wildlife, goods, and much more. Our research makes several contributions to improve our understanding of movement descriptions in text. First, we show how interpreting geographic movement described in text is challenging because of general spatial terms, linguistic constructions that make the thing(s) moving unclear, and many types of temporal references and groupings, among others. Next, as a step to overcome these challenges, we report on an experiment with human subjects through which we identify multiple important characteristics of movement descriptions (found in text) that humans use to differentiate one movement description from another. Based on our empirical results, we provide recommendations for computational analysis using movement described in text documents. Our findings contribute towards an improved understanding of the important characteristics of the underused information about geographic movement that is in the form of text descriptions.
Understanding the movement of animals is crucial to conservation efforts. Past research often focuses on factors affecting movement, rather than locations of interest that animals return to or habitat. We explore the use of clustering to identify locations of interest to African Elephants in regions of Sub-Saharan Africa. Our analysis was performed using publicly available datasets for tracking African elephants at Kruger National Park (KNP), South Africa; Etosha National Park, Namibia; as well as areas in Burkina Faso and the Congo. Using the DBSCAN and KMeans clustering algorithms, we calculate clusters and centroids to simplify elephant movement data and highlight important locations of interest. Through a comparison of feature spaces with and without temperature, we show that temperature is an important feature to explain movement clustering. Recognizing the importance of temperature, we develop a technique to add external temperature data from an API to other geospatial datasets that would otherwise not have temperature data. After addressing the hurdles of using external data with marginally different timestamps, we consider the quality of this data, and the quality of the centroids of the clusters calculated based on this external temperature data. Finally, we overlay these centroids onto satellite imagery and locations of human settlements to validate the real-life applications of the calculated centroids to identify locations of interest for elephants. As expected, we confirmed that elephants tend to cluster their movement around sources of water as well as some human settlements, especially those with water holes. Identifying key locations of interest for elephants is beneficial in predicting the movement of elephants and preventing poaching. These methods may in the future be applied to other animals beyond elephants to identify locations of interests for them.
Major League Baseball (MLB) has a storied history of using statistics to better understand and discuss the game of baseball, with an entire discipline of statistics dedicated to the craft, known as sabermetrics. At their core, all sabermetrics seek to quantify some aspect of the game, often a specific aspect of a player's skill set - such as a batter's ability to drive in runs (RBI) or a pitcher's ability to keep batters from reaching base (WHIP). While useful, such statistics are fundamentally limited by the fact that they are derived from an account of what happened on the field, not how it happened. As a first step towards alleviating this shortcoming, we present a novel, contrastive learning-based framework for describing player form in the MLB. We use form to refer to the way in which a player has impacted the course of play in their recent appearances. Concretely, a player's form is described by a 72-dimensional vector. By comparing clusters of players resulting from our form representations and those resulting from traditional abermetrics, we demonstrate that our form representations contain information about how players impact the course of play, not present in traditional, publicly available statistics. We believe these embeddings could be utilized to predict both in-game and game-level events, such as the result of an at-bat or the winner of a game.
Presentation slides describing the content of scientific and technical papers are an efficient and effective way to present that work. However, manually generating presentation slides is labor intensive. We propose a method to automatically generate slides for scientific papers based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites. The sentence labeling module of our method is based on SummaRuNNer, a neural sequence model for extractive summarization. Instead of ranking sentences based on semantic similarities in the whole document, our algorithm measures importance and novelty of sentences by combining semantic and lexical features within a sentence window. Our method outperforms several baseline methods including SummaRuNNer by a significant margin in terms of ROUGE score.
Malicious clients can attack federated learning systems by using malicious data, including backdoor samples, during the training phase. The compromised global model will perform well on the validation dataset designed for the task. However, a small subset of data with backdoor patterns may trigger the model to make a wrong prediction. Previously, there was an arms race. Attackers tried to conceal attacks and defenders tried to detect attacks during the aggregation stage of training on the server-side in a federated learning system. In this work, we propose a new method to mitigate backdoor attacks after the training phase. Specifically, we designed a federated pruning method to remove redundant neurons in the network and then adjust the model's extreme weight values. Experiments conducted on distributed Fashion-MNIST have shown that our method can reduce the average attack success rate from 99.7% to 1.9% with a 5.5% loss of test accuracy on the validation dataset. To minimize the pruning influence on test accuracy, we can fine-tune after pruning, and the attack success rate drops to 6.4%, with only a 1.7% loss of test accuracy.