Occlusion is a long-standing problem in computer vision, particularly in instance segmentation. The ACM MMSports 2023 DeepSportRadar challenge introduced a dataset that focuses on segmenting human subjects in a basketball context, along with a specialized evaluation metric for occlusion scenarios. Given the modest size of the dataset and the highly deformable nature of the objects to be segmented, this challenge demands robust data augmentation techniques and carefully chosen deep learning architectures. Our work (ranked 1st in the competition) first proposes a novel data augmentation technique capable of generating more training samples with a wider distribution. Then, we adopt a new architecture, the Hybrid Task Cascade (HTC) framework with CBNetV2 as the backbone and a MaskIoU head, to improve segmentation performance. Furthermore, we employ a Stochastic Weight Averaging (SWA) training strategy to improve the model's generalization. As a result, we achieve an occlusion score (OM) of 0.533 on the challenge dataset, securing the top-1 position on the leaderboard. Source code is available at https://github.com/nguyendinhson-kaist/MMSports23-Seg-AutoID.
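To make the SWA idea concrete, the following is a minimal sketch of equal-weight averaging over weight checkpoints. It is not taken from the authors' repository: the `Weights` type, the function names, and the toy checkpoint values are assumptions introduced purely for illustration, and a real training run would average framework tensors rather than Python lists.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def swa_update(avg: Weights, new: Weights, n_averaged: int) -> Weights:
    """Fold one more checkpoint into a running equal-weight average.

    n_averaged is the number of checkpoints already contained in avg.
    """
    return {
        name: [a + (w - a) / (n_averaged + 1) for a, w in zip(avg[name], new[name])]
        for name in avg
    }


def average_checkpoints(checkpoints: List[Weights]) -> Weights:
    """Return the equal-weight average of all checkpoints."""
    # Copy the first checkpoint so the inputs are never mutated.
    avg = {name: list(values) for name, values in checkpoints[0].items()}
    for n, ckpt in enumerate(checkpoints[1:], start=1):
        avg = swa_update(avg, ckpt, n)
    return avg


if __name__ == "__main__":
    # Toy checkpoints, e.g. collected at the end of several late epochs.
    ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 9.0]}]
    print(average_checkpoints(ckpts))
```

The running-average form avoids keeping every checkpoint in memory, which is why it is a common way to implement this kind of averaging.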
Traditional deep learning (DL) suffers from two core problems. First, it assumes that training samples are independent and identically distributed. However, numerous real-world datasets group samples by shared measurements (e.g., study participants or cells), violating this assumption. In these scenarios, DL can show compromised performance, limited generalization, and interpretability issues, coupled with cluster confounding that causes Type 1 and Type 2 errors. Second, models are typically trained for overall accuracy, often neglecting underrepresented groups and introducing biases. In crucial areas such as loan approvals or the setting of health insurance rates, such biases can significantly impact one's quality of life. To address both of these challenges simultaneously, we present a mixed effects deep learning (MEDL) framework. MEDL separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through the introduction of: 1) a cluster adversary, which encourages the learning of cluster-invariant FE, 2) a Bayesian neural network, which quantifies the RE, and 3) a mixing function, which combines the FE and RE into a mixed-effect (ME) prediction. We combine MEDL with adversarial debiasing, which promotes equality-of-odds fairness across FE, RE, and ME predictions for fairness-sensitive variables. We evaluated our approach using three datasets: two from census/finance focusing on income classification and one from healthcare predicting hospitalization duration, a regression task. Our framework notably enhances fairness across all sensitive variables, increasing fairness by up to 82% for age, 43% for race, 86% for sex, and 27% for marital status. Besides promoting fairness, our method maintains the robust performance and clarity of MEDL. It is versatile and suitable for various dataset types and tasks, making it broadly applicable. Our GitHub repository houses the implementation.
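As an illustration of the equality-of-odds criterion mentioned above, the sketch below measures how far a binary classifier is from equalized odds, taken here as the largest gap in true-positive or false-positive rate between groups. This is a generic definition of the gap, not the paper's evaluation code; the function names and the toy labels are assumptions.

```python
from typing import Hashable, List, Sequence, Tuple


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true-positive rate, false-positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_gap(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[Hashable]
) -> float:
    """Largest between-group difference in TPR or FPR (0.0 means equal odds)."""
    tprs: List[float] = []
    fprs: List[float] = []
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        tpr, fpr = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
        tprs.append(tpr)
        fprs.append(fpr)
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


if __name__ == "__main__":
    # Toy data: the classifier is perfect on group "a" and worse on group "b".
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
    group = ["a"] * 4 + ["b"] * 4
    print(equalized_odds_gap(y_true, y_pred, group))
```

A gap of zero means every group sees the same error rates conditional on the true label, which is the property that adversarial debiasing tries to encourage during training.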
Learning and remembering how to use APIs is difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach that suggests arguments by predicting developers' expectations when they define and use API methods. To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task, which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context. In ARIST, the LMs and the recommending features are used to rank the promising candidates identified by PA. Meanwhile, PA guides the LMs and the features to work on the set of valid candidates that satisfy the syntax, accessibility, and type-compatibility constraints defined by the programming language in use. Our evaluation on a large dataset of real-world projects shows that ARIST improves on the state-of-the-art approach by 19% and 18% in top-1 precision and recall for recommending arguments of frequently used libraries. For the general argument recommendation task, i.e., recommending arguments for every method call, ARIST outperforms the baseline approaches by up to 125% in top-1 accuracy. Moreover, for newly encountered projects, ARIST achieves more than 60% top-3 accuracy when evaluated on a larger dataset. For projects under development or maintenance, with a personalized LM to capture developers' coding practice, ARIST can rank the expected arguments at the top-1 position in 7/10 requests.
Vehicle arrival time prediction has been studied widely. With the emergence of IoT devices and deep learning techniques, estimated time of arrival (ETA) has become a critical component in intelligent transportation systems. Though many tools exist for ETA, ETA for special vehicles, such as ambulances and fire engines, is still challenging due to the limited amount of traffic data for these vehicles. Existing works use one model for all types of vehicles, which can lead to low accuracy. To tackle this, we propose, as the first in the field, a deep transfer learning framework, TLETA, for driving time prediction. TLETA constructs cellular spatial-temporal knowledge grids for extracting driving patterns and combines them with a road network structure embedding to build a deep neural network for ETA. TLETA contains transferable layers to support knowledge transfer between different categories of vehicles. Importantly, our transfer models train only the last layers to map the transferred knowledge, which reduces the training time significantly. The experimental studies show that our model predicts travel time with high accuracy and outperforms many state-of-the-art approaches.
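The idea of training only the last layers on top of transferred knowledge can be sketched with a toy model. The example below is not the TLETA architecture: it assumes a single frozen ReLU layer as the "transferred" part and a linear head fitted by gradient descent, and all names, shapes, and numbers are hypothetical.

```python
from typing import List, Sequence


def frozen_features(x: Sequence[float], base_weights: Sequence[Sequence[float]]) -> List[float]:
    """Transferred layer ReLU(W x); W is read but never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in base_weights]


def fine_tune_head(
    base_weights: Sequence[Sequence[float]],
    inputs: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.1,
    epochs: int = 500,
) -> List[float]:
    """Fit only the linear head on top of the frozen layer (mean squared error)."""
    # The frozen layer never changes, so its outputs are computed once.
    feats = [frozen_features(x, base_weights) for x in inputs]
    head = [0.0] * len(base_weights)
    n = len(targets)
    for _ in range(epochs):
        grad = [0.0] * len(head)
        for fs, t in zip(feats, targets):
            err = sum(h * f for h, f in zip(head, fs)) - t
            for j, f in enumerate(fs):
                grad[j] += 2.0 * err * f / n
        head = [h - lr * g for h, g in zip(head, grad)]
    return head


if __name__ == "__main__":
    # Hypothetical layer "learned" on ordinary vehicles, reused for special ones.
    base = [[1.0, 0.0], [0.0, 1.0]]
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ys = [2.0, 3.0, 5.0]
    print(fine_tune_head(base, xs, ys))
```

Because gradients flow only into the head, each epoch is cheap and the frozen features can be cached, which is the usual source of the training-time savings in this style of transfer.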
Technology for open-ended language generation, a key application of artificial intelligence, has advanced greatly in recent years. Large-scale language models, which are trained on large corpora of text, are being used in a wide range of applications, from virtual assistants to conversational bots. While these language models output fluent text, existing research shows that they can and do capture human biases. Many of these biases, especially those that could potentially cause harm, are being actively investigated. On the other hand, studies that infer and change the personality traits inherited by these models have been scarce or non-existent. In this work, we explore the personality traits of several large-scale language models designed for open-ended text generation and of the datasets used for training them. Our work builds on the popular Big Five factors and develops robust methods that quantify the personality traits of these models and their underlying datasets. In particular, we prompt the models with a questionnaire designed for personality assessment and subsequently classify the text responses into quantifiable traits using a zero-shot classifier. Our classification sheds light on an important anthropomorphic element found in such AI models and can help stakeholders decide how the models should be applied and how society could perceive them. We augment our analysis by studying approaches that can alter these personalities.
In this paper, we consider the IoT data discovery problem in very large and growing networks. Through analysis, examples, and experimental studies, we show the importance of peer-to-peer, unstructured routing for IoT data discovery and point out the space efficiency issue that has been overlooked in keyword-based routing algorithms in unstructured networks. Specifically, as the first in the field, this paper investigates routing table designs and various compression techniques to support effective and space-efficient IoT data discovery routing. Novel summarization algorithms, including alphabetical, hash-based, and meaning-based summarization, and their corresponding coding schemes are proposed. We also consider routing table design to support summarization without degrading lookup efficiency for discovery query routing. The issue of potentially misleading routing due to summarization is also investigated. Subsequently, we analyze the strategy of when to summarize in order to balance the tradeoff between the routing table compression rate and the chance of causing misleading routing. For the experimental study, we collected 100K IoT data streams from various IoT databases as the input dataset. Experimental results show that our summarization solution can reduce the routing table size by 20- to 30-fold with a 2-5% increase in latency compared with similar peer-to-peer discovery routing algorithms without summarization. Also, our approach outperforms DHT-based approaches by 2- to 6-fold in terms of latency and traffic.
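The tradeoff between compression and misleading routing can be illustrated with a small hash-based summary. The sketch below uses a Bloom-filter-style bit set as a stand-in; it is not the paper's summarization or coding scheme, and the class name, parameters, and keywords are assumptions chosen for illustration.

```python
import hashlib
from typing import Iterable, Iterator


class KeywordSummary:
    """Fixed-size hash summary of the keywords reachable through one neighbor."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, keyword: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{keyword}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, keyword: str) -> None:
        for pos in self._positions(keyword):
            self.bits |= 1 << pos

    def might_contain(self, keyword: str) -> bool:
        # True for every added keyword; may also be True for others
        # (a false positive, i.e. a potentially misleading route).
        return all((self.bits >> pos) & 1 for pos in self._positions(keyword))


def summarize(keywords: Iterable[str], num_bits: int = 256, num_hashes: int = 3) -> KeywordSummary:
    summary = KeywordSummary(num_bits, num_hashes)
    for kw in keywords:
        summary.add(kw)
    return summary


if __name__ == "__main__":
    table_entry = summarize(["temperature", "humidity", "traffic"])
    print(table_entry.might_contain("humidity"))
    # An over-compressed summary matches everything, so queries get misrouted.
    tiny = summarize(["temperature"], num_bits=1, num_hashes=1)
    print(tiny.might_contain("air-quality"))
```

The summary occupies a fixed number of bits regardless of how many keywords it covers, and shrinking it raises the false-positive rate, which mirrors the question of when summarizing is worth the risk of misleading routing.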
In this paper, we present a Generative Adversarial Network (GAN) machine learning model that interpolates irregularly distributed measurements across the spatial domain to construct a smooth radio frequency map (RFMap), and we then perform localization using a deep neural network. Monitoring the wireless spectrum over the spatial, temporal, and frequency domains will become a critical feature in facilitating dynamic spectrum access (DSA) in beyond-5G and 6G communication technologies. Localization, wireless signal detection, and spectrum policy-making are several of the applications where distributed spectrum sensing will play a significant role. Detecting and positioning wireless emitters is a very challenging task in a large spectral and spatial area. To construct a smooth RFMap database, a large number of measurements are required, which can be very expensive and time consuming. One approach to help realize these systems is to collect a finite set of localized measurements across a given area and then interpolate the measurement values to construct the database. Current methods in the literature employ channel modeling to construct the radio frequency map, which lacks the granularity needed for accurate localization, whereas our proposed approach reconstructs a new generalized RFMap. Localization results are presented and compared with conventional channel models.
In this paper, we consider the IoT data discovery problem in very large and growing networks. Specifically, we investigate in depth routing table summarization techniques to support effective and space-efficient IoT data discovery routing. Novel summarization algorithms, including alphabetical-based, hash-based, and meaning-based summarization, and their corresponding coding schemes are proposed. The issue of potentially misleading routing due to summarization is also investigated. Subsequently, we analyze the strategy of when to summarize in order to balance the tradeoff between the routing table compression rate and the chance of causing misleading routing. For the experimental study, we collected 100K IoT data streams from various IoT databases as the input dataset. Experimental results show that our summarization solution can reduce the routing table size by 20- to 30-fold with a 2-5% increase in latency when compared with similar peer-to-peer discovery routing algorithms without summarization. Also, our approach outperforms DHT-based approaches by 2- to 6-fold in terms of latency and traffic.
Approximate inference in deep Bayesian networks faces the dilemma of how to yield high-fidelity posterior approximations while maintaining computational efficiency and scalability. We tackle this challenge by introducing a new variational structured approximation inspired by the interpretation of Dropout training as approximate inference in Bayesian probabilistic models. Concretely, we focus on the restrictive factorized structure of the Dropout posterior, which is too inflexible to capture the rich correlations among weight parameters in the true posterior, and we then propose a novel method called Variational Structured Dropout (VSD) to overcome this limitation. VSD employs an orthogonal transformation to learn a structured representation of the variational Dropout noise and consequently induces statistical dependencies in the approximate posterior. We further gain expressive Bayesian modeling for VSD by proposing a hierarchical Dropout procedure that corresponds to joint inference in a Bayesian network. Moreover, we can scale VSD up to modern deep convolutional networks directly and at a low computational cost. Finally, we conduct extensive experiments on standard benchmarks to demonstrate the effectiveness of VSD over state-of-the-art methods in both predictive accuracy and uncertainty estimation.
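To see why an orthogonal transformation of the Dropout noise induces dependencies, consider independent zero-mean noise with per-unit variances on the diagonal of D. After rotating by an orthogonal matrix U, the covariance becomes U D U^T, which is generally no longer diagonal. The sketch below computes this for a Householder reflection; the choice of a Householder matrix, the vector v, and the variances are assumptions made for illustration, not details confirmed by the abstract.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def householder(v: Sequence[float]) -> Matrix:
    """Orthogonal reflection H = I - 2 v v^T / (v^T v)."""
    norm_sq = sum(x * x for x in v)
    n = len(v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def structured_noise_covariance(u: Matrix, variances: Sequence[float]) -> Matrix:
    """Covariance of U @ eps when eps has independent entries with these variances."""
    n = len(variances)
    diag = [[variances[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(u, diag), transpose(u))


if __name__ == "__main__":
    h = householder([1.0, 2.0])
    # Unequal variances: the off-diagonal entries become non-zero.
    print(structured_noise_covariance(h, [0.1, 0.4]))
    # Equal variances: U (a I) U^T = a I, so the noise stays uncorrelated.
    print(structured_noise_covariance(h, [0.2, 0.2]))
```

The second case shows that the rotation only adds structure when the per-unit noise variances differ, since an isotropic covariance is unchanged by any orthogonal matrix.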