Order dispatch is one of the central problems for ride-sharing platforms. Recently, value-based reinforcement learning algorithms have shown promising performance on this problem. In real-world applications, however, the non-stationarity of the demand-supply system makes it difficult to reuse data generated in different time periods to learn the value function. In this work, motivated by the observation that the relative relationship between the values of some states is largely stable across environments, we propose a pattern transfer learning framework for value-based reinforcement learning in the order dispatch problem. Our method efficiently captures these value patterns by incorporating a concordance penalty. Experiments demonstrate the superior performance of the proposed method.
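A minimal sketch of how such a concordance penalty could enter a value-learning objective, assuming pairs of states whose value ordering is believed stable across periods; the network, margin, and weighting are illustrative, not the authors' implementation:

```python
# Sketch: TD-style value loss plus a pairwise concordance (ranking) penalty.
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

def loss_with_concordance(v, s, r, s_next, pairs, gamma=0.99, lam=0.1, margin=0.0):
    """TD loss plus a hinge penalty preserving V(s_hi) >= V(s_lo) for state
    pairs whose relative ranking is assumed transferable across time periods."""
    td_target = r + gamma * v(s_next).detach()
    td_loss = ((v(s) - td_target) ** 2).mean()
    s_hi, s_lo = pairs  # states believed (from source-period data) to satisfy V(s_hi) > V(s_lo)
    concordance = torch.relu(margin + v(s_lo) - v(s_hi)).mean()
    return td_loss + lam * concordance
```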
This paper presents a novel real-time tracking system that improves body pose estimation in distributed camera networks. The first stage of our approach introduces a linear Kalman filter operating at the level of individual body joints, which fuses single-view body poses coming from different detection nodes of the network and ensures temporal consistency between them. The second stage refines the Kalman filter estimates by fitting a hierarchical model of the human body with constrained link sizes, ensuring the physical consistency of the tracking. The effectiveness of the proposed approach is demonstrated through a broad experimental validation on a set of sequences whose ground-truth references were generated by a commercial marker-based motion capture system. The results show that the proposed system outperforms the considered state-of-the-art approaches, providing accurate and reliable estimates. Moreover, the developed methodology constrains neither the number of persons to track, nor the number, position, synchronization, frame rate, or manufacturer of the RGB-D cameras used. Finally, the system's real-time performance is of paramount importance for a large number of real-world applications.
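To make the first stage concrete, here is a minimal sketch of a constant-velocity linear Kalman filter on a single 3D joint, fusing asynchronous single-view detections; the noise parameters are illustrative placeholders, not the paper's tuned values:

```python
# Sketch: per-joint linear Kalman filter fusing multi-view 3D detections.
import numpy as np

class JointKalman:
    def __init__(self, q=1e-2, r=1e-3):
        self.x = np.zeros(6)                    # state: [px, py, pz, vx, vy, vz]
        self.P = np.eye(6)
        self.Q = q * np.eye(6)                  # process noise (assumed)
        self.R = r * np.eye(3)                  # measurement noise (assumed)
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])

    def predict(self, dt):
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)              # position += velocity * dt
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z):
        """z: 3D joint position detected by one camera node."""
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x += K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
```

Because each detection node simply calls `update` with its own measurement whenever one arrives, this structure naturally accommodates unsynchronized cameras with different frame rates.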
User reviews play an essential role in the success of mobile apps. Reviews in textual form are unstructured data, which makes sentiment analysis highly complex, and previous approaches have often ignored the context of reviews. In addition, relatively small datasets cause models to overfit. BERT has been introduced as a transfer learning approach whose pre-trained models provide better contextual representations. This study examines the effectiveness of fine-tuning BERT for sentiment analysis using two different pre-trained models: besides a multilingual pre-trained model, we use a model pre-trained only on Indonesian text. The dataset consists of Indonesian user reviews of the ten best apps of 2020 on the Google Play site. We also perform hyper-parameter tuning to find the optimal trained model, and test two training-data labeling approaches, score-based and lexicon-based, to determine the effectiveness of the model. The experimental results show that models pre-trained on Indonesian achieve better average accuracy on lexicon-based data. The Indonesian pre-trained model reaches its highest accuracy, 84%, with 25 epochs and a training time of 24 minutes. These results are better than all of the machine learning and multilingual pre-trained models.
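An illustrative fine-tuning setup with Hugging Face Transformers; the abstract does not name its exact checkpoints, so the model IDs below are examples of the two variants compared (a multilingual BERT versus an Indonesian-only model), and the batch size and learning rate are assumed:

```python
# Sketch: fine-tuning a pre-trained BERT for binary sentiment classification.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_id = "bert-base-multilingual-cased"   # or an Indonesian-only model such as
                                            # "indobenchmark/indobert-base-p1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

args = TrainingArguments(output_dir="out",
                         num_train_epochs=25,              # 25 epochs, as in the abstract
                         per_device_train_batch_size=32,   # assumed hyper-parameters
                         learning_rate=2e-5)

# Given a labeled dataset `train_ds` (score-based or lexicon-based labels):
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds.map(tokenize, batched=True))
# trainer.train()
```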
Commodity RGB-D sensors capture color images along with dense pixel-wise depth information in real time. Typical RGB-D sensors ship with a factory calibration and exhibit erratic depth readings due to coarse calibration values, aging, and thermal effects, which limits their applicability in computer vision and robotics. We propose a novel method to accurately calibrate depth by considering spatial and thermal influences jointly. Our approach is based on Gaussian Process Regression in a four-dimensional domain combining Cartesian and thermal coordinates. We further propose to leverage modern GPUs for dense depth map correction in real time. For reproducibility, we make our dataset and source code publicly available.
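A minimal sketch of the core idea, assuming the depth error is regressed over a 4D input of 3D Cartesian position plus sensor temperature; the training pairs would come from a calibration rig, while the data and kernel scales below are synthetic placeholders:

```python
# Sketch: GP regression of depth error over (x, y, z, temperature).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 0.5, 20], [1, 1, 4.0, 45], size=(500, 4))  # x, y, z [m], temp [C]
y = 0.01 * X[:, 2] ** 2 + 0.001 * (X[:, 3] - 30) \
    + 0.002 * rng.standard_normal(500)            # synthetic depth error [m]

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=[1.0, 1.0, 1.0, 5.0]) + WhiteKernel(1e-4),
    normalize_y=True).fit(X, y)

# Correct a raw reading at back-projected point (x, y, z) and temperature T:
correction, std = gp.predict([[0.2, -0.1, 2.0, 35.0]], return_std=True)
corrected_depth = 2.0 - correction[0]
```

Evaluating such a GP at every pixel is what motivates the GPU implementation for real-time dense correction.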
Deep learning (DL) based language models achieve high performance on various benchmarks for Natural Language Inference (NLI), while symbolic approaches to NLI currently receive less attention. Both approaches have their advantages and weaknesses, yet no existing method combines them in a single system for NLI. To merge symbolic and deep learning methods, we propose an inference framework called NeuralLog, which utilizes both a monotonicity-based logical inference engine and a neural network language model for phrase alignment. Our framework models the NLI task as a classic search problem and uses the beam search algorithm to search for optimal inference paths. Experiments show that our joint logic and neural inference system improves accuracy on the NLI task and achieves state-of-the-art accuracy on the SICK and MED datasets.
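A schematic view of the search formulation: `expand` and `score` below stand in for the monotonicity-based inference engine and the neural alignment model respectively; these are assumed interfaces for illustration, not the authors' API:

```python
# Sketch: beam search over inference paths from premise toward hypothesis.
import heapq

def beam_search(premise, hypothesis, expand, score, beam_width=5, max_depth=8):
    """Search for an inference path rewriting `premise` into `hypothesis`."""
    beam = [(score(premise, hypothesis), [premise])]
    for _ in range(max_depth):
        candidates = []
        for s, path in beam:
            if path[-1] == hypothesis:          # goal reached: entailment shown
                return path
            for nxt in expand(path[-1]):        # apply monotonicity inference rules
                candidates.append((score(nxt, hypothesis), path + [nxt]))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if not beam:
            break
    return None                                 # no path found within depth limit
```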
Many automotive applications, such as Advanced Driver Assistance Systems (ADAS) for collision avoidance and warnings, require estimating the future automotive risk of a driving scene. We present a low-cost system that predicts the collision risk over an intermediate time horizon from a monocular video source, such as a dashboard-mounted camera. The modular system includes components for object detection, object tracking, and state estimation. We introduce solutions to the object tracking and distance estimation problems. Advanced approaches to the other tasks are used to produce real-time predictions of the automotive risk for the next 10 s at over 5 Hz. The system is designed such that alternative components can be substituted with minimal effort. It is demonstrated on common physical hardware, specifically an off-the-shelf gaming laptop and a webcam. We extend the framework to support absolute speed estimation and more advanced risk estimation techniques.
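One component that a short example can clarify is monocular distance estimation from a tracked bounding box; a common approach uses the pinhole camera model with known class height priors. This is an illustrative sketch under that assumption, not necessarily the system's exact estimator, and the focal length and height values are placeholders:

```python
# Sketch: pinhole-model distance estimation from a detected bounding box.
KNOWN_HEIGHTS_M = {"car": 1.5, "truck": 3.0, "pedestrian": 1.7}  # class priors (assumed)

def estimate_distance(bbox_height_px, class_name, focal_length_px=1000.0):
    """Distance ~ f * H_real / h_pixels for an upright object of known height."""
    real_height = KNOWN_HEIGHTS_M[class_name]
    return focal_length_px * real_height / bbox_height_px

# A car whose box is 60 px tall under f = 1000 px is roughly 25 m away:
print(estimate_distance(60, "car"))   # -> 25.0
```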
Over the past decade, Artificial Intelligence (AI) has provided enormous new possibilities and opportunities, but also new demands and requirements for software systems. In particular, Machine Learning (ML) has proven useful in almost every vertical application domain. Although other sub-disciplines of AI, such as intelligent agents and Multi-Agent Systems (MAS), have not been promoted to the same extent, they still possess the potential to be integrated into mainstream technology stacks and ecosystems, for example, due to the ongoing prevalence of the Internet of Things (IoT) and smart Cyber-Physical Systems (CPS). However, in the decade ahead, an unprecedented paradigm shift from classical computing towards Quantum Computing (QC) is expected, perhaps via a quantum-classical hybrid model. We expect the Model-Driven Engineering (MDE) paradigm to be an enabler and facilitator for quantum and quantum-classical hybrid applications, as it has already proven beneficial in the highly complex domains of IoT, smart CPS, and AI, with their inherently heterogeneous hardware and software platforms and APIs. This includes not only automated code generation, but also automated model checking and verification, model analysis in the early design phases, and model-to-model transformations both at design time and at runtime. In this paper, we present a vision focused on MDE for Quantum AI and a holistic approach integrating all of the above.
Learning representations for graphs plays a critical role in a wide spectrum of downstream applications. In this paper, we summarize the limitations of prior work along three dimensions: representation space, modeling dynamics, and modeling uncertainty. To bridge this gap, we propose, for the first time, to learn dynamic graph representations in hyperbolic space, with the aim of inferring stochastic node representations. Working in hyperbolic space, we present a novel Hyperbolic Variational Graph Neural Network, referred to as HVGNN. In particular, to model the dynamics, we introduce a Temporal GNN (TGNN) based on a theoretically grounded time encoding approach. To model the uncertainty, we devise a hyperbolic graph variational autoencoder built upon the proposed TGNN to generate stochastic node representations from hyperbolic normal distributions. Furthermore, we introduce a reparameterisable sampling algorithm for the hyperbolic normal distribution to enable gradient-based learning of HVGNN. Extensive experiments show that HVGNN outperforms state-of-the-art baselines on real-world datasets.
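For intuition, the standard wrapped-normal construction gives a reparameterisable sample on the Lorentz model of hyperbolic space: draw a Euclidean Gaussian in the tangent space at the origin, parallel-transport it to the mean, and apply the exponential map. The sketch below follows that generic construction; HVGNN's exact parameterisation may differ:

```python
# Sketch: reparameterisable wrapped-normal sampling on the Lorentz model.
import torch

def linner(x, y):
    """Minkowski inner product <x, y>_L with signature (-, +, ..., +)."""
    xy = torch.sum(x * y, dim=-1, keepdim=True)
    return xy - 2 * x[..., :1] * y[..., :1]

def sample_wrapped_normal(mu, sigma):
    """mu: point on the hyperboloid, shape (..., n+1); sigma: tangent std, (..., n)."""
    o = torch.zeros_like(mu)
    o[..., 0] = 1.0                                    # hyperboloid origin
    v = sigma * torch.randn_like(sigma)                # Euclidean reparameterisation trick
    u = torch.cat([torch.zeros_like(v[..., :1]), v], dim=-1)  # tangent vector at origin
    alpha = -linner(o, mu)
    u = u + linner(mu, u) / (alpha + 1.0) * (o + mu)   # parallel transport: origin -> mu
    norm = torch.sqrt(torch.clamp(linner(u, u), min=1e-9))
    return torch.cosh(norm) * mu + torch.sinh(norm) * u / norm  # exponential map at mu
```

Because the randomness enters only through the Euclidean draw `v`, gradients flow through `mu` and `sigma` as in a standard VAE.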
This paper proposes an efficient and robust algorithm to estimate target trajectories from multi-sensor bearing-only measurements with unknown target detection profiles and clutter rates. In particular, we propose to combine the multi-sensor Generalized Labeled Multi-Bernoulli (MS-GLMB) filter, which estimates target trajectories, with robust Cardinalized Probability Hypothesis Density (CPHD) filters, which estimate the clutter rates. Experimental results show that the proposed robust filter exhibits near-optimal performance, in the sense that it is comparable to the optimal MS-GLMB filter operating with the true clutter rate. More importantly, it outperforms other studied filters when the detection profile and clutter rate are unknown and time-variant. This is attributed to the ability of the robust filter to learn the background parameters on the fly.
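To convey the "learn the background on the fly" idea in miniature, the toy sketch below approximates the clutter-rate estimation stage with an exponential moving average of the measurement count not explained by expected target detections; the actual robust CPHD filter is considerably more involved, and all constants here are assumptions:

```python
# Toy sketch: online clutter-rate estimate fed to the tracking filter.
class ClutterRateEstimator:
    def __init__(self, init_rate=10.0, alpha=0.05):
        self.rate = init_rate          # estimated clutter points per scan
        self.alpha = alpha             # smoothing factor (assumed)

    def update(self, num_measurements, expected_detections):
        """Attribute the unexplained measurement count to clutter."""
        residual = max(num_measurements - expected_detections, 0.0)
        self.rate = (1 - self.alpha) * self.rate + self.alpha * residual
        return self.rate

# est = ClutterRateEstimator()
# lambda_c = est.update(num_measurements=37, expected_detections=5.2)
# The MS-GLMB filter would then use lambda_c in its measurement likelihood.
```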
We address voice activity detection in acoustic environments containing transients and stationary noise, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This is done through a deep encoder-decoder neural network architecture: an encoder maps spectral features with temporal information to low-dimensional representations generated by the diffusion maps method, and a decoder maps the embedded data back into the high-dimensional space. A deep neural network trained to separate speech from non-speech frames is obtained by concatenating the decoder to the encoder, resembling the known Diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods, with improvements in accuracy, robustness, and generalization ability. Our model operates in real time and can be integrated into audio-based communication systems. We also present a batch algorithm that achieves even higher accuracy for offline applications.
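A minimal sketch of the diffusion-maps embedding that serves as the encoder's low-dimensional target in this kind of construction; the kernel scale and embedding dimension are illustrative, and the real system would train the encoder to approximate this map:

```python
# Sketch: diffusion-maps embedding of per-frame spectral features.
import numpy as np

def diffusion_maps(X, dim=2, eps=1.0, t=1):
    """X: (n_frames, n_features) spectral features -> (n_frames, dim) embedding."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-d2 / eps)                                 # Gaussian affinity kernel
    P = W / W.sum(axis=1, keepdims=True)                  # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector; scale by eigenvalues^t.
    return vecs[:, 1:dim + 1] * (vals[1:dim + 1] ** t)
```

The eigendecomposition makes this batch-only, which is why a trained encoder is needed to embed new frames in real time.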