Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Deep Learning based Urban Vehicle Trajectory Analytics

Nov 15, 2021
Seongjin Choi

A `trajectory' refers to a trace generated by a moving object in geographical spaces, usually represented by of a series of chronologically ordered points, where each point consists of a geo-spatial coordinate set and a timestamp. Rapid advancements in location sensing and wireless communication technology enabled us to collect and store a massive amount of trajectory data. As a result, many researchers use trajectory data to analyze mobility of various moving objects. In this dissertation, we focus on the `urban vehicle trajectory,' which refers to trajectories of vehicles in urban traffic networks, and we focus on `urban vehicle trajectory analytics.' The urban vehicle trajectory analytics offers unprecedented opportunities to understand vehicle movement patterns in urban traffic networks including both user-centric travel experiences and system-wide spatiotemporal patterns. The spatiotemporal features of urban vehicle trajectory data are structurally correlated with each other, and consequently, many previous researchers used various methods to understand this structure. Especially, deep-learning models are getting attentions of many researchers due to its powerful function approximation and feature representation abilities. As a result, the objective of this dissertation is to develop deep-learning based models for urban vehicle trajectory analytics to better understand the mobility patterns of urban traffic networks. Particularly, this dissertation focuses on two research topics, which has high necessity, importance and applicability: Next Location Prediction, and Synthetic Trajectory Generation. In this study, we propose various novel models for urban vehicle trajectory analytics using deep learning.

* 110 pages, PhD dissertation 

  Access Paper or Ask Questions

Reinforcement Learning-powered Semantic Communication via Semantic Similarity

Aug 27, 2021
Kun Lu, Rongpeng Li, Xianfu Chen, Zhifeng Zhao, Honggang Zhang

We introduce a new semantic communication mechanism, whose key idea is to preserve the semantic information instead of strictly securing the bit-level precision. Starting by analyzing the defects of existing joint source channel coding (JSCC) methods, we show that the commonly used bit-level metrics are vulnerable of catching important semantic meaning and structures. To address this problem, we take advantage of learning from semantic similarity, instead of relying on conventional paired bit-level supervisions like cross entropy and bit error rate. However, to develop such a semantic communication system is indeed a nontrivial task, considering the nondifferentiability of most semantic metrics as well as the instability from noisy channels. To further resolve these issues, we put forward a reinforcement learning (RL)-based solution which allows us to simultaneously optimize any user-defined semantic measurement by using the policy gradient technique, and to interact with the surrounding noisy environment in a natural way. We have testified the proposed method in the challenging European-parliament dataset. Experiments on both AWGN and phase-invariant fading channel have confirmed the superiority of our method in revealing the semantic meanings, and better handling the channel noise especially in low-SNR situations. Apart from the experimental results, we further provide an indepth look at how the semantics model behaves, along with its superb generalization ability in real-life examples. As a brand new method in learning-based JSCC tasks, we also exemplify an RL-based image transmission paradigm, both to prove the generalization ability, and to leave this new topic for future discussion.

* 13 pages, 7 figures. Codes available on Github 

  Access Paper or Ask Questions

A Novel Upsampling and Context Convolution for Image Semantic Segmentation

Mar 20, 2021
Khwaja Monib Sediqi, Hyo Jong Lee

Semantic segmentation, which refers to pixel-wise classification of an image, is a fundamental topic in computer vision owing to its growing importance in robot vision and autonomous driving industries. It provides rich information about objects in the scene such as object boundary, category, and location. Recent methods for semantic segmentation often employ an encoder-decoder structure using deep convolutional neural networks. The encoder part extracts feature of the image using several filters and pooling operations, whereas the decoder part gradually recovers the low-resolution feature maps of the encoder into a full input resolution feature map for pixel-wise prediction. However, the encoder-decoder variants for semantic segmentation suffer from severe spatial information loss, caused by pooling operations or convolutions with stride, and does not consider the context in the scene. In this paper, we propose a dense upsampling convolution method based on guided filtering to effectively preserve the spatial information of the image in the network. We further propose a novel local context convolution method that not only covers larger-scale objects in the scene but covers them densely for precise object boundary delineation. Theoretical analyses and experimental results on several benchmark datasets verify the effectiveness of our method. Qualitatively, our approach delineates object boundaries at a level of accuracy that is beyond the current excellent methods. Quantitatively, we report a new record of 82.86% and 81.62% of pixel accuracy on ADE20K and Pascal-Context benchmark datasets, respectively. In comparison with the state-of-the-art methods, the proposed method offers promising improvements.

* Sensors 2021, 21, 2170 
* 11 pages, published in sensors journal 

  Access Paper or Ask Questions

Shift Equivariance for Pixel-based Self-supervised SAR-optical Feature Fusion

Mar 13, 2021
Yuxing Chen, Lorenzo Bruzzone

The effective combination of the complementary information provided by the huge amount of unlabeled multi-sensor data (e.g., Synthetic Aperture Radar (SAR), optical images) is a critical topic in remote sensing. Recently, contrastive learning methods have reached remarkable success in obtaining meaningful feature representations from multi-view data. However, these methods only focus on the image-level features, which may not satisfy the requirement for dense prediction tasks such as the land-cover mapping. In this work, we propose a new self-supervised approach to SAR-optical data fusion that can learn disentangled pixel-wise feature representations directly by taking advantage of both multi-view contrastive loss and the bootstrap your own latent (BYOL) methods. Two key contributions of the proposed approach are a multi-view contrastive loss to encode the multimodal images and a shift operation to reconstruct learned representations for each pixel by building the local consistency between different augmented views. In the experimental period, we first verified the effectiveness of multi-view contrastive loss and BYOL in self-supervised learning on SAR-optical fusion using an image-level classification task. Then we validated the proposed approach on a land-cover mapping task by training it with unlabeled SAR-optical image pairs. There we used labeled data pairs to evaluate the discriminative capability of learned features in downstream tasks. Results show that the proposed approach extracts features that result in higher accuracy and that reduces the dimension of representations with respect to the image-level contrastive learning method.

* 11 pages, 5 figures 

  Access Paper or Ask Questions

Should Answer Immediately or Wait for Further Information? A Novel Wait-or-Answer Task and Its Predictive Approach

May 27, 2020
Zehao Lin, Shaobo Cui, Xiaoming Kang, Guodun Li, Feng Ji, Haiqing Chen, Yin Zhang

Different people have different habits of describing their intents in conversations. Some people may tend to deliberate their full intents in several successive utterances, i.e., they use several consistent messages for readability instead of a long sentence to express their question. This creates a predicament faced by dialogue systems' application, especially in real-world industrial scenarios, in which the dialogue system is unsure that whether it should answer the user's query immediately or wait for users' further supplementary input. Motivated by such interesting quandary, we define a novel task: Wait-or-Answer to better tackle this dilemma faced by dialogue systems. We shed light on a new research topic about how the dialogue system can be more competent to behave in this Wait-or-Answer quandary. Further, we propose a predictive approach dubbed Imagine-then-Arbitrate (ITA) to resolve this Wait-or-Answer task. More specifically, we take advantage of an arbitrator model to help the dialogue system decide to wait or answer. The arbitrator's decision is made with the assistance of two ancillary imaginator models: a wait imaginator and an answer imaginator. The wait imaginator tries to predict what the user would supplement and use its prediction to persuade the arbitrator that the user has some information to add, so the dialogue system should wait. The answer imaginator, nevertheless, struggles to predict the answer of the dialogue system and convince the arbitrator that it's a superior choice to answer the users' query immediately. To our best knowledge, our paper is the first work to explicitly define the Wait-or-Answer task in the dialogue system. Additionally, our proposed ITA approach significantly outperforms the existing models in solving this Wait-or-Answer problem.

* This previously appeared as arXiv:2002.09616v2, which was mistakenly submitted as a replacement. arXiv admin note: text overlap with arXiv:2002.09616v3 

  Access Paper or Ask Questions

Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization

Aug 07, 2019
Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Fei Wu, Futai Zou

Temporal action localization is an important yet challenging research topic due to its various applications. Since the frame-level or segment-level annotations of untrimmed videos require amounts of labor expenditure, studies on the weakly-supervised action detection have been springing up. However, most of existing frameworks rely on Class Activation Sequence (CAS) to localize actions by minimizing the video-level classification loss, which exploits the most discriminative parts of actions but ignores the minor regions. In this paper, we propose a novel weakly-supervised framework by adversarial learning of two modules for eliminating such demerits. Specifically, the first module is designed as a well-designed Seeded Sequence Growing (SSG) Network for progressively extending seed regions (namely the highly reliable regions initialized by a CAS-based framework) to their expected boundaries. The second module is a specific classifier for mining trivial or incomplete action regions, which is trained on the shared features after erasing the seeded regions activated by SSG. In this way, a whole network composed of these two modules can be trained in an adversarial manner. The goal of the adversary is to mine features that are difficult for the action classifier. That is, erasion from SSG will force the classifier to discover minor or even new action regions on the input feature sequence, and the classifier will drive the seeds to grow, alternately. At last, we could obtain the action locations and categories from the well-trained SSG and the classifier. Extensive experiments on two public benchmarks THUMOS'14 and ActivityNet1.3 demonstrate the impressive performance of our proposed method compared with the state-of-the-arts.

* To be appeared in ACM MM2019 

  Access Paper or Ask Questions

Detection of Malfunctioning Smart Electricity Meter

Jul 26, 2019
Ming Liu, Dongpeng Liu, Guangyu Sun, Yi Zhao, Duolin Wang, Fangxing Liu, Xiang Fang, Qing He, Dong Xu

In this paper, a method for malfunctioning smart meter detection, based on Long Short-Term Memory (LSTM) and Temporal Phase Convolutional Neural Network (TPCNN), is proposed originally. This method is very useful for some developing countries where smart meters have not been popularized but in high demand. In addition, it is a new topic that people try to increase the service life span of smart meters to prevent unnecessary waste by detecting malfunctioning meters. We are the first people complete a combination of malfunctioning meters detection and prediction model based on deep learning methods. To the best our knowledge, our approach is the first method that achieves the malfunctioning meter detection of specific residential areas with their residents' data in practice. The procedure proposed creatively in this paper mainly consists of four components: data collecting and cleaning, prediction about electricity consumption based on LSTM, sliding window detection, and single user classification based on CNN. To make better classifying of malfunctioned user meters, we combine recurrence plots as image-input and combine them with sequence-input, which is the first work that applies one and two dimensions as two paths CNN's input for sequence data classification. Finally, many classical methods are compared with the method proposed in this paper. After comparison with classical methods, Elastic Net and Gradient Boosting Regression, the result shows that our method has higher accuracy. The average area under the Receiver Operating Characteristic (ROC) curve is 0.80 and the standard deviation is 0.04. The average area under the Precision-Recall Curve (PRC) is 0.84.

  Access Paper or Ask Questions

On Transfer Learning For Chatter Detection in Turning Using Wavelet Packet Transform and Empirical Mode Decomposition

May 03, 2019
Melih C. Yesilli, Firas A. Khasawneh, Andreas Otto

The increasing availability of sensor data at machine tools makes automatic chatter detection algorithms a trending topic in metal cutting. Two prominent and advanced methods for feature extraction via signal decomposition are Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We apply these two methods to time series acquired from an acceleration sensor at the tool holder of a lathe. Different turning experiments with varying dynamic behavior of the machine tool structure were performed. We compare the performance of these two methods with Support Vector Machine (SVM) classifier combined with Recursive Feature Elimination (RFE). We also show that the common WPT-based approach of choosing wavelet packets with the highest energy ratios as representative features for chatter does not always result in packets that enclose the chatter frequency, thus reducing the classification accuracy. Further, we test the transfer learning capability of each of these methods by training the classifier on one of the cutting configurations and then testing it on the other cases. It is found that when training and testing on data from the same cutting configuration both methods yield high accuracies reaching in one of the cases as high as 94% and 91%, respectively, for WPT and EEMD. However, EEMD is shown to outperform WPT in transfer learning applications with accuracy of up to 84%. Therefore, for systems where the movement of the cutting center leads to significant variations in the stiffness of the machine-tool system, we recommend using EEMD over WPT for training a classifier. This is because EEMD retains higher accuracy rates in comparison to WPT when the input data stream deviates from the data that was used to train the classifier.

  Access Paper or Ask Questions

Acceleration of expensive computations in Bayesian statistics using vector operations

Feb 25, 2019
David J. Warne, Scott A. Sisson, Christopher Drovandi

Many applications in Bayesian statistics are extremely computationally intensive. However, they are also often inherently parallel, making them prime targets for modern massively parallel central processing unit (CPU) architectures. While the use of multi-core and distributed computing is widely applied in the Bayesian community, very little attention has been given to fine-grain parallelisation using single instruction multiple data (SIMD) operations that are available on most modern commodity CPUs. Rather, most fine-grain tuning in the literature has centred around general purpose graphics processing units (GPGPUs). Since the effective utilisation of GPGPUs typically requires specialised programming languages, such technologies are not ideal for the wider Bayesian community. In this work, we practically demonstrate, using standard programming libraries, the utility of the SIMD approach for several topical Bayesian applications. In particular, we consider sampling of the prior predictive distribution for approximate Bayesian computation (ABC), and the computation of Bayesian $p$-values for testing prior weak informativeness. Through minor code alterations, we show that SIMD operations can improve the floating point arithmetic performance resulting in up to $6\times$ improvement in the overall serial algorithm performance. Furthermore $4$-way parallel versions can lead to almost $19\times$ improvement over a na\"{i}ve serial implementation. We illustrate the potential of SIMD operations for accelerating Bayesian computations and provide the reader with essential implementation techniques required to exploit modern massively parallel processing environments using standard software development tools.

  Access Paper or Ask Questions

Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture

Oct 01, 2017
R. Stuart Geiger

Scholars and practitioners across domains are increasingly concerned with algorithmic transparency and opacity, interrogating the values and assumptions embedded in automated, black-boxed systems, particularly in user-generated content platforms. I report from an ethnography of infrastructure in Wikipedia to discuss an often understudied aspect of this topic: the local, contextual, learned expertise involved in participating in a highly automated social-technical environment. Today, the organizational culture of Wikipedia is deeply intertwined with various data-driven algorithmic systems, which Wikipedians rely on to help manage and govern the "anyone can edit" encyclopedia at a massive scale. These bots, scripts, tools, plugins, and dashboards make Wikipedia more efficient for those who know how to work with them, but like all organizational culture, newcomers must learn them if they want to fully participate. I illustrate how cultural and organizational expertise is enacted around algorithmic agents by discussing two autoethnographic vignettes, which relate my personal experience as a veteran in Wikipedia. I present thick descriptions of how governance and gatekeeping practices are articulated through and in alignment with these automated infrastructures. Over the past 15 years, Wikipedian veterans and administrators have made specific decisions to support administrative and editorial workflows with automation in particular ways and not others. I use these cases of Wikipedia's bot-supported bureaucracy to discuss several issues in the fields of critical algorithms studies, critical data studies, and fairness, accountability, and transparency in machine learning -- most principally arguing that scholarship and practice must go beyond trying to "open up the black box" of such systems and also examine sociocultural processes like newcomer socialization.

* Big Data & Society 4(2). 2017 
* 14 pages, typo fixed in v2 

  Access Paper or Ask Questions