Automated pavement distresses detection using road images remains a challenging topic in the computer vision research community. Recent developments in deep learning has led to considerable research activity directed towards improving the efficacy of automated pavement distress identification and rating. Deep learning models require a large ground truth data set, which is often not readily available in the case of pavements. In this study, a labeled dataset approach is introduced as a first step towards a more robust, easy-to-deploy pavement condition assessment system. The technique is termed herein as the Pavement Image Dataset (PID) method. The dataset consists of images captured from two camera views of an identical pavement segment, i.e., a wide-view and a top-down view. The wide-view images were used to classify the distresses and to train the deep learning frameworks, while the top-down view images allowed calculation of distress density, which will be used in future studies aimed at automated pavement rating. For the wide view group dataset, 7,237 images were manually annotated and distresses classified into nine categories. Images were extracted using the Google Application Programming Interface (API), selecting street-view images using a python-based code developed for this project. The new dataset was evaluated using two mainstream deep learning frameworks: You Only Look Once (YOLO v2) and Faster Region Convolution Neural Network (Faster R-CNN). Accuracy scores using the F1 index were found to be 0.84 for YOLOv2 and 0.65 for the Faster R-CNN model runs; both quite acceptable considering the convenience of utilizing Google maps images.
Due to the potential risk of inducing cancers, radiation dose of X-ray CT should be reduced for routine patient scanning. However, in low-dose X-ray CT, severe artifacts usually occur due to photon starvation, beamhardening, etc, which decrease the reliability of diagnosis. Thus, high quality reconstruction from low-dose X-ray CT data has become one of the important research topics in CT community. Conventional model-based denoising approaches are, however, computationally very expensive, and image domain denoising approaches hardly deal with CT specific noise patterns. To address these issues, we propose an algorithm using a deep convolutional neural network (CNN), which is applied to wavelet transform coefficients of low-dose CT images. Specifically, by using a directional wavelet transform for extracting directional component of artifacts and exploiting the intra- and inter-band correlations, our deep network can effectively suppress CT specific noises. Moreover, our CNN is designed to have various types of residual learning architecture for faster network training and better denoising. Experimental results confirm that the proposed algorithm effectively removes complex noise patterns of CT images, originated from the reduced X-ray dose. In addition, we show that wavelet domain CNN is efficient in removing the noises from low-dose CT compared to an image domain CNN. Our results were rigorously evaluated by several radiologists and won the second place award in 2016 AAPM Low-Dose CT Grand Challenge. To the best of our knowledge, this work is the first deep learning architecture for low-dose CT reconstruction that has been rigorously evaluated and proven for its efficacy.
Multi-view graph embedding has become a widely studied problem in the area of graph learning. Most of the existing works on multi-view graph embedding aim to find a shared common node embedding across all the views of the graph by combining the different views in a specific way. Hub detection, as another essential topic in graph mining has also drawn extensive attentions in recent years, especially in the context of brain network analysis. Both the graph embedding and hub detection relate to the node clustering structure of graphs. The multi-view graph embedding usually implies the node clustering structure of the graph based on the multiple views, while the hubs are the boundary-spanning nodes across different node clusters in the graph and thus may potentially influence the clustering structure of the graph. However, none of the existing works in multi-view graph embedding considered the hubs when learning the multi-view embeddings. In this paper, we propose to incorporate the hub detection task into the multi-view graph embedding framework so that the two tasks could benefit each other. Specifically, we propose an auto-weighted framework of Multi-view Graph Embedding with Hub Detection (MVGE-HD) for brain network analysis. The MVGE-HD framework learns a unified graph embedding across all the views while reducing the potential influence of the hubs on blurring the boundaries between node clusters in the graph, thus leading to a clear and discriminative node clustering structure for the graph. We apply MVGE-HD on two real multi-view brain network datasets (i.e., HIV and Bipolar). The experimental results demonstrate the superior performance of the proposed framework in brain network analysis for clinical investigation and application.
In recent years, autonomous driving algorithms using low-cost vehicle-mounted cameras have attracted increasing endeavors from both academia and industry. There are multiple fronts to these endeavors, including object detection on roads, 3-D reconstruction etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks. This represents a nascent research topic in computer vision. The technical contributions of this work are three-fold. First, the model is learned and evaluated on real human driving videos that are time-synchronized with other vehicle sensors. This differs from many prior models trained from synthetic data in racing games. Second, state-of-the-art models, such as PilotNet, mostly predict the wheel angles independently on each video frame, which contradicts common understanding of driving as a stateful process. Instead, our proposed model strikes a combination of spatial and temporal cues, jointly investigating instantaneous monocular camera observations and vehicle's historical states. This is in practice accomplished by inserting carefully-designed recurrent units (e.g., LSTM and Conv-LSTM) at proper network layers. Third, to facilitate the interpretability of the learned model, we utilize a visual back-propagation scheme for discovering and visualizing image regions crucially influencing the final steering prediction. Our experimental study is based on about 6 hours of human driving data provided by Udacity. Comprehensive quantitative evaluations demonstrate the effectiveness and robustness of our model, even under scenarios like drastic lighting changes and abrupt turning. The comparison with other state-of-the-art models clearly reveals its superior performance in predicting the due wheel angle for a self-driving car.
While going deeper has been witnessed to improve the performance of convolutional neural networks (CNN), going smaller for CNN has received increasing attention recently due to its attractiveness for mobile/embedded applications. It remains an active and important topic how to design a small network while retaining the performance of large and deep CNNs (e.g., Inception Nets, ResNets). Albeit there are already intensive studies on compressing the size of CNNs, the considerable drop of performance is still a key concern in many designs. This paper addresses this concern with several new contributions. First, we propose a simple yet powerful method for compressing the size of deep CNNs based on parameter binarization. The striking difference from most previous work on parameter binarization/quantization lies at different treatments of $1\times 1$ convolutions and $k\times k$ convolutions ($k>1$), where we only binarize $k\times k$ convolutions into binary patterns. The resulting networks are referred to as pattern networks. By doing this, we show that previous deep CNNs such as GoogLeNet and Inception-type Nets can be compressed dramatically with marginal drop in performance. Second, in light of the different functionalities of $1\times 1$ (data projection/transformation) and $k\times k$ convolutions (pattern extraction), we propose a new block structure codenamed the pattern residual block that adds transformed feature maps generated by $1\times 1$ convolutions to the pattern feature maps generated by $k\times k$ convolutions, based on which we design a small network with $\sim 1$ million parameters. Combining with our parameter binarization, we achieve better performance on ImageNet than using similar sized networks including recently released Google MobileNets.
Social media popularity and importance is on the increase, due to people using it for various types of social interaction across multiple channels. This social interaction by online users includes submission of feedback, opinions and recommendations about various individuals, entities, topics, and events. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Therefore, through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research which totals 485 studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, natural language processing tasks and other aspects derived from the published studies. Such multi-source information fusion plays a fundamental role in mining of people's social opinions from social media platforms. These can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. Future research directions are presented, whereas further research and development has the potential of leaving a wider academic and societal impact.
Despite the tremendous success of neural dialogue models in recent years, it suffers a lack of relevance, diversity, and some times coherence in generated responses. Lately, transformer-based models, such as GPT-2, have revolutionized the landscape of dialogue generation by capturing the long-range structures through language modeling. Though these models have exhibited excellent language coherence, they often lack relevance and terms when used for domain-specific response generation. In this paper, we present DSRNet (Domain Specific Response Network), a transformer-based model for dialogue response generation by reinforcing domain-specific attributes. In particular, we extract meta attributes from context and infuse them with the context utterances for better attention over domain-specific key terms and relevance. We study the use of DSRNet in a multi-turn multi-interlocutor environment for domain-specific response generation. In our experiments, we evaluate DSRNet on Ubuntu dialogue datasets, which are mainly composed of various technical domain related dialogues for IT domain issue resolutions and also on CamRest676 dataset, which contains restaurant domain conversations. Trained with maximum likelihood objective, our model shows significant improvement over the state-of-the-art for multi-turn dialogue systems supported by better BLEU and semantic similarity (BertScore) scores. Besides, we also observe that the responses produced by our model carry higher relevance due to the presence of domain-specific key attributes that exhibit better overlap with the attributes of the context. Our analysis shows that the performance improvement is mostly due to the infusion of key terms along with dialogues which result in better attention over domain-relevant terms. Other contributing factors include joint modeling of dialogue context with the domain-specific meta attributes and topics.
With the wider availability of sensor technology, a number of Structural Health Monitoring (SHM) systems are deployed to monitor civil infrastructure. The continuous monitoring provides valuable information about the structure that can help in providing a decision support system for retrofits and other structural modifications. However, when the sensors are exposed to harsh environmental conditions, the data measured by the SHM systems tend to be affected by multiple anomalies caused by faulty or broken sensors. Given a deluge of high-dimensional data collected continuously over time, research into using machine learning methods to detect anomalies are a topic of great interest to the SHM community. This paper contributes to this effort by proposing the use of a relatively new time series representation named Shapelet Transform in combination with a Random Forest classifier to autonomously identify anomalies in SHM data. The shapelet transform is a unique time series representation that is solely based on the shape of the time series data. In consideration of the individual characteristics unique to every anomaly, the application of this transform yields a new shape-based feature representation that can be combined with any standard machine learning algorithm to detect anomalous data with no manual intervention. For the present study, the anomaly detection framework consists of three steps: identifying unique shapes from anomalous data, using these shapes to transform the SHM data into a local-shape space and training machine learning algorithm on this transformed data to identify anomalies. The efficacy of this method is demonstrated by the identification of anomalies in acceleration data from a SHM system installed on a long-span bridge in China. The results show that multiple data anomalies in SHM data can be automatically detected with high accuracy using the proposed method.
Precise and efficient automated identification of Gastrointestinal (GI) tract diseases can help doctors treat more patients and improve the rate of disease detection and identification. Currently, automatic analysis of diseases in the GI tract is a hot topic in both computer science and medical-related journals. Nevertheless, the evaluation of such an automatic analysis is often incomplete or simply wrong. Algorithms are often only tested on small and biased datasets, and cross-dataset evaluations are rarely performed. A clear understanding of evaluation metrics and machine learning models with cross datasets is crucial to bring research in the field to a new quality level. Towards this goal, we present comprehensive evaluations of five distinct machine learning models using Global Features and Deep Neural Networks that can classify 16 different key types of GI tract conditions, including pathological findings, anatomical landmarks, polyp removal conditions, and normal findings from images captured by common GI tract examination instruments. In our evaluation, we introduce performance hexagons using six performance metrics such as recall, precision, specificity, accuracy, F1-score, and Matthews Correlation Coefficient to demonstrate how to determine the real capabilities of models rather than evaluating them shallowly. Furthermore, we perform cross-dataset evaluations using different datasets for training and testing. With these cross-dataset evaluations, we demonstrate the challenge of actually building a generalizable model that could be used across different hospitals. Our experiments clearly show that more sophisticated performance metrics and evaluation methods need to be applied to get reliable models rather than depending on evaluations of the splits of the same dataset, i.e., the performance metrics should always be interpreted together rather than relying on a single metric.
While 3D reconstruction is a well-established and widely explored research topic, semantic 3D reconstruction has only recently witnessed an increasing share of attention from the Computer Vision community. Semantic annotations allow in fact to enforce strong class-dependent priors, as planarity for ground and walls, which can be exploited to refine the reconstruction often resulting in non-trivial performance improvements. State-of-the art methods propose volumetric approaches to fuse RGB image data with semantic labels; even if successful, they do not scale well and fail to output high resolution meshes. In this paper we propose a novel method to refine both the geometry and the semantic labeling of a given mesh. We refine the mesh geometry by applying a variational method that optimizes a composite energy made of a state-of-the-art pairwise photo-metric term and a single-view term that models the semantic consistency between the labels of the 3D mesh and those of the segmented images. We also update the semantic labeling through a novel Markov Random Field (MRF) formulation that, together with the classical data and smoothness terms, takes into account class-specific priors estimated directly from the annotated mesh. This is in contrast to state-of-the-art methods that are typically based on handcrafted or learned priors. We are the first, jointly with the very recent and seminal work of [M. Blaha et al arXiv:1706.08336, 2017], to propose the use of semantics inside a mesh refinement framework. Differently from [M. Blaha et al arXiv:1706.08336, 2017], which adopts a more classical pairwise comparison to estimate the flow of the mesh, we apply a single-view comparison between the semantically annotated image and the current 3D mesh labels; this improves the robustness in case of noisy segmentations.