In this paper, we propose a novel laser-inertial odometry and mapping method to achieve real-time, low-drift, and robust pose estimation in large-scale highway environments. The proposed method is composed of four sequential modules: a scan pre-processing module, a dynamic object detection module, a laser-inertial odometry module, and a laser mapping module. The scan pre-processing module uses inertial measurements to compensate for the motion distortion of each laser scan. The dynamic object detection module then detects and removes dynamic objects from each laser scan by applying a CNN segmentation network. After obtaining the undistorted point cloud without moving objects, the laser-inertial odometry module uses an Error State Kalman Filter to fuse the laser and IMU data and output a coarse pose estimate at high frequency. Finally, the laser mapping module performs a fine processing step, and a "Frame-to-Model" scan matching strategy is used to create a static global map. We compare the performance of our method with two state-of-the-art methods, LOAM and SuMa, using the KITTI dataset and a real highway scene dataset. Experimental results show that our method performs better than the state-of-the-art methods in real highway environments and achieves competitive accuracy on the KITTI dataset.
Small intestinal capsule endoscopy is the mainstream method for inspecting small intestinal lesions, but a single examination produces 60,000 - 120,000 images, the majority of which are similar and have no diagnostic value. It takes doctors 2 - 3 hours to identify lesions from these images. This is time-consuming and increases the probability of misdiagnosis and missed diagnosis, since doctors are likely to experience visual fatigue while focusing on a large number of similar images for an extended period of time. To solve these problems, we propose a method for screening similar wireless capsule endoscope (WCE) images based on structural similarity analysis and the hierarchical clustering of visually salient sub-image blocks. Clusters of similar images were identified automatically by hierarchical clustering on the hue, saturation, value (HSV) color-space characteristics of the images, and keyframe images were extracted based on the structural similarity of the visually salient sub-image blocks, in order to accurately identify and screen out similar small intestinal capsule endoscopic images. Subsequently, the proposed method was applied to the capsule endoscope imaging workstation. After screening out similar images in the complete data gathered by the Type I OMOM Small Intestinal Capsule Endoscope from 52 cases covering 17 common types of small intestinal lesions, we obtained a lesion recall of 100% and an average similar image reduction ratio of 76%. With similar images screened out, the average play time of the OMOM image workstation was 18 minutes, which greatly reduced the time spent by doctors viewing the images.
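The structural-similarity screening step can be illustrated with a minimal sketch. It is a simplification rather than the method described above: it computes a single-window SSIM over whole grayscale frames (given as flat lists of pixel values) and keeps a frame only when it differs enough from the last kept keyframe, with no hierarchical clustering, HSV features, or saliency detection. The function names, the 0.9 threshold, and the toy frames are assumptions made for illustration.

```python
def ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized lists of gray values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def screen_similar(frames, threshold=0.9):
    """Return indices of keyframes; a frame is dropped when it is too
    similar (SSIM >= threshold, an assumed value) to the last keyframe."""
    if not frames:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        if ssim(frames[keep[-1]], frames[i]) < threshold:
            keep.append(i)
    return keep


def reduction_ratio(frames, keep):
    """Fraction of frames removed by the screening."""
    return 1.0 - len(keep) / len(frames)
```

On a sequence in which consecutive frames differ only by a small brightness shift, this keeps one representative per run of near-duplicates; the reduction ratio is then the share of frames a reviewer no longer has to look at.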
Telepresence robots are used in various forms and use cases that help avoid the need for physical human presence at the scene of action. In this work, we focus on a telepresence robot that can be used to attend a meeting remotely with a group of people. Unlike in a one-to-one meeting, participants in a group meeting can be located in different parts of the room, especially in an informal setup. As a result, not all of them may be within the viewing angle of the robot, a.k.a. the remote participant. In such a case, to provide a better meeting experience, the robot should localize the speaker and bring the speaker to the center of the viewing angle. Though sound source localization can easily be done using a microphone array, bringing the speaker or set of speakers into the viewing angle is not a trivial task. First, the robot should react only to a human voice and not to random noises. Second, if there are multiple speakers, whom should the robot face, or should it rotate continuously with every new speaker? Lastly, most robotic platforms are resource-constrained, and to achieve a real-time response, i.e., to avoid network delay, all the algorithms should be implemented within the robot itself. This article presents a study and implementation of an attention-shifting scheme in a telepresence meeting scenario that best suits the needs and expectations of the collocated and remote attendees. We define a policy to decide when a robot should rotate and by how much, based on real-time speaker localization. Using a user satisfaction study, we show the efficacy and usability of our system in the meeting scenario. Moreover, our system can be easily adapted to other scenarios where multiple people are present.
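To make the idea of a rotation policy concrete, the following is a hypothetical sketch and not the policy defined in the article. It assumes the robot receives, at each time step, a direction-of-arrival estimate relative to its current heading and a flag from a voice activity detector. It rotates only when a human voice has persisted outside the field of view for a minimum time. The class name, the 30-degree half field of view, and the 1.5-second hold time are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AttentionPolicy:
    half_fov_deg: float = 30.0   # assumed half-width of the camera view
    min_speech_s: float = 1.5    # assumed hold time before the robot turns
    held_s: float = 0.0          # how long an off-screen voice has persisted

    def update(self, doa_deg, is_speech, dt_s):
        """Return the rotation to command in degrees; 0.0 means stay put.

        doa_deg is the direction of arrival relative to the current heading.
        """
        # Ignore non-speech sounds and speakers who are already in view.
        if not is_speech or abs(doa_deg) <= self.half_fov_deg:
            self.held_s = 0.0
            return 0.0
        # An off-screen voice must persist before the robot reacts, so that
        # short interjections do not cause continuous rotation.
        self.held_s += dt_s
        if self.held_s < self.min_speech_s:
            return 0.0
        self.held_s = 0.0
        return doa_deg
```

The hold time is the design choice that addresses the multi-speaker question: a brief remark from a second participant does not move the camera, while a sustained turn at speaking does. This sketch does not track which off-screen speaker is talking, so a real system would likely reset the timer when the direction changes substantially.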
An essential problem in automated machine learning (AutoML) is that of model selection. A unique challenge in the sequential setting is the fact that the optimal model itself may vary over time, depending on the distribution of features and labels available up to each point in time. In this paper, we propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting. This is accomplished by treating the performance at each time step as its own black-box function. In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions using deep kernel learning (DKL). To the best of our knowledge, we are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose. Using multiple real-world datasets, we verify that our proposed method outperforms both standard BO and multi-objective BO algorithms on a variety of sequence prediction tasks.
Building Automatic Speech Recognition (ASR) systems from scratch is significantly challenging, mostly due to the time-consuming and financially expensive process of annotating a large amount of audio data with transcripts. Although several unsupervised pre-training models have been proposed, applying such models directly might still be sub-optimal if more labeled training data could be obtained without a large cost. In this paper, we present a weakly supervised framework for constructing ASR systems with massive video data. As videos often contain human speech audio aligned with subtitles, we consider videos an important knowledge source and propose an effective approach to extract high-quality audio aligned with transcripts from videos based on Optical Character Recognition (OCR). The underlying ASR model can be fine-tuned to fit any domain-specific target training dataset after weakly supervised pre-training. Extensive experiments show that our framework can easily produce state-of-the-art results on six public datasets for Mandarin speech recognition.
Determining when and whether to provide personalized support is a well-known challenge called the assistance dilemma. A core problem in solving the assistance dilemma is the need to discover when students are unproductive so that the tutor can intervene. Such a task is particularly challenging for open-ended domains, even those that are well-structured with defined principles and goals. In this paper, we present a set of data-driven methods to classify, predict, and prevent unproductive problem-solving steps in the well-structured open-ended domain of logic. This approach leverages and extends the Hint Factory, a set of methods that uses prior student solution attempts to build data-driven intelligent tutors. We present a HelpNeed classification that uses prior student data to determine when students are likely to be unproductive and need help learning optimal problem-solving strategies. We present a controlled study to determine the impact of an Adaptive pedagogical policy that provides proactive hints at the start of each step based on the outcomes of our HelpNeed predictor: productive vs. unproductive. Our results show that the students in the Adaptive condition exhibited better training behaviors, with lower help avoidance and higher help appropriateness (a higher chance of receiving help when it was likely to be needed), as measured using the HelpNeed classifier, when compared to the Control. Furthermore, the results show that the students who received Adaptive hints based on HelpNeed predictions during training significantly outperformed their Control peers on the posttest, with the former producing shorter, more optimal solutions in less time. We conclude with suggestions on how these HelpNeed methods could be applied in other well-structured open-ended domains.
There has been an increasing need for scalable optimization methods, especially due to the explosion in the size of datasets and in model complexity in modern machine learning applications. Scalable solvers often distribute the computation over a network of processing units. For simple algorithms such as gradient descent, the dependence of the convergence time on the topology of this network is well known. However, for more involved algorithms such as the Alternating Direction Method of Multipliers (ADMM), much less is known. At the heart of many distributed optimization algorithms there is a gossip subroutine that averages local information over the network, and whose efficiency is crucial for the overall performance of the method. In this paper, we review recent research in this area and, with the goal of isolating this communication exchange behaviour, we compare different algorithms when applied to a canonical distributed averaging consensus problem. We also show interesting connections between ADMM and lifted Markov chains, and provide an explicit characterization of its convergence and optimal parameter tuning in terms of spectral properties of the network. Finally, we empirically study the connection between network topology and convergence rates for different algorithms on a real-world problem of sensor localization.
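The distributed averaging consensus problem mentioned above can be sketched in a few lines. This is plain synchronous linear gossip, not ADMM and not any of the specific algorithms compared in the paper: each node repeatedly moves toward its neighbors' values, and the number of rounds needed to agree on the average depends on the graph. The function names and the step size 1 / (max degree + 1) are assumptions chosen for simplicity.

```python
def ring(n):
    """Adjacency list of a cycle graph on n >= 3 nodes."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def complete(n):
    """Adjacency list of a complete graph on n nodes."""
    return [[j for j in range(n) if j != i] for i in range(n)]


def gossip_round(values, neighbors):
    """One synchronous round: every node moves toward its neighbors' values.

    The step size 1 / (max_degree + 1) is an assumed, simple choice that
    keeps the averaging matrix symmetric and doubly stochastic, so the
    network-wide sum (and hence the average) is preserved.
    """
    step = 1.0 / (max(len(nb) for nb in neighbors) + 1)
    return [x + step * sum(values[j] - x for j in neighbors[i])
            for i, x in enumerate(values)]


def rounds_to_consensus(values, neighbors, tol=1e-9, max_rounds=10_000):
    """Run gossip until every node is within tol of the true average."""
    target = sum(values) / len(values)
    rounds = 0
    while rounds < max_rounds and max(abs(v - target) for v in values) >= tol:
        values = gossip_round(values, neighbors)
        rounds += 1
    return values, rounds
```

Comparing `rounds_to_consensus(start, ring(8))` with `rounds_to_consensus(start, complete(8))` for the same starting values shows the topology dependence directly: with this step size the complete graph averages in a single round, while the ring needs many more because information travels only one hop per round.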
Online social media platforms (such as Twitter and Facebook) are extensively used for spreading news to a wider public effortlessly and at a rapid pace. However, nowadays these platforms are also used to spread rumors and fake news to a large audience in a short time span, which can cause panic, fear, and financial loss to society. Thus, it is important to detect and control these rumors before they spread to the masses. One way to control the spread of these rumors is to identify possibly suspicious users who are often involved in spreading them. Our basic assumption is that users who are often involved in spreading rumors are more likely to be suspicious than users whose involvement in spreading rumors is less frequent, because users may sometimes post rumor tweets by accident. In this paper, we use the PHEME rumor tweet dataset, which contains rumor and non-rumor tweet information on five incidents: (i) Charlie Hebdo, (ii) Germanwings crash, (iii) Ottawa shooting, (iv) Sydney siege, and (v) Ferguson. We transform this rumor tweet dataset into a suspicious users dataset before applying a Graph Neural Network (GNN) based approach to identify suspicious users. Specifically, we explore the Graph Convolutional Network (GCN), which is a type of GNN, for identifying suspicious users, and we then compare the GCN results with three baseline approaches: SVM, RF, and an LSTM-based deep learning architecture. Extensive experiments on a real-world dataset, in which we achieve up to 0.864 F1-score and 0.720 AUC-ROC, show the effectiveness of the GNN-based approach for identifying suspicious users.
Source code summarizing is the task of writing short, natural language descriptions of source code behavior during run time. Such summaries are extremely useful for software development and maintenance but are expensive to author manually; hence, this is done for only a small fraction of the code that is produced and is often ignored. Automatic code documentation can potentially solve this at a low cost. It is thus an emerging research field with further applications to program comprehension and software maintenance. Traditional methods often relied on cognitive models built in the form of templates and heuristics, and had varying degrees of adoption by the developer community. With recent advancements, however, end-to-end data-driven approaches based on neural techniques have largely overtaken the traditional techniques. Much of the current landscape employs neural translation-based architectures with recurrence and attention, whose training procedure is resource and time intensive. In this paper, we employ neural techniques to solve the task of source code summarizing and specifically compare NMT-based techniques with the simpler and more appealing Transformer architecture on a dataset of Java methods and comments. We argue for dispensing with recurrence in the training procedure. To the best of our knowledge, Transformer-based models have not been used for this task before. With more than 2.1 million supervised samples of comments and code, we reduce the training time by more than 50% and achieve a BLEU score of 17.99 on the test set.
SVM with an RBF kernel is usually one of the best classification algorithms for most data sets, but it is important to tune the two hyperparameters $C$ and $\gamma$ to the data itself. In general, the selection of the hyperparameters is a non-convex optimization problem, and thus many algorithms have been proposed to solve it, among them grid search, random search, Bayesian optimization, simulated annealing, particle swarm optimization, and Nelder-Mead. There have also been proposals to decouple the selection of $\gamma$ and $C$. We empirically compare 18 of these proposed search algorithms (with different parameterizations, for a total of 47 combinations) on 115 real-life binary data sets. We find (among other things) that trees of Parzen estimators and particle swarm optimization select better hyperparameters with only a slight increase in computation time with respect to a grid search with the same number of evaluations. We also find that spending too much computational effort searching for the hyperparameters is unlikely to result in better performance on future data, and that there are no significant differences among the different procedures for selecting the best set of hyperparameters when more than one is found by the search algorithms.
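Two of the simplest search strategies named above, grid search and random search, can be compared under the same evaluation budget with a short sketch. It makes several assumptions that are not taken from the paper: the search is done in log2 space over an assumed box for $C$ and $\gamma$, and `toy_score` is a hypothetical smooth surface standing in for cross-validated accuracy, so no SVM is actually trained here.

```python
import itertools
import math
import random

LOG2_C_RANGE = (-5.0, 15.0)      # assumed search box for log2(C)
LOG2_GAMMA_RANGE = (-15.0, 3.0)  # assumed search box for log2(gamma)


def toy_score(log2_c, log2_gamma):
    """Hypothetical stand-in for a cross-validated accuracy surface.

    A real experiment would train and validate an SVM here instead.
    """
    return math.exp(-((log2_c - 3.0) ** 2) / 50.0
                    - ((log2_gamma + 7.0) ** 2) / 20.0)


def linspace(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive (steps >= 2)."""
    return [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]


def grid_search(score, steps):
    """Evaluate a steps x steps grid that is uniform in log2 space."""
    candidates = list(itertools.product(linspace(*LOG2_C_RANGE, steps),
                                        linspace(*LOG2_GAMMA_RANGE, steps)))
    return max(candidates, key=lambda p: score(*p)), len(candidates)


def random_search(score, budget, seed=0):
    """Evaluate `budget` points drawn uniformly in log2 space."""
    rng = random.Random(seed)
    candidates = [(rng.uniform(*LOG2_C_RANGE), rng.uniform(*LOG2_GAMMA_RANGE))
                  for _ in range(budget)]
    return max(candidates, key=lambda p: score(*p)), len(candidates)


def to_hyperparameters(point):
    """Convert a (log2 C, log2 gamma) point back to (C, gamma)."""
    log2_c, log2_gamma = point
    return 2.0 ** log2_c, 2.0 ** log2_gamma
```

Calling `grid_search(toy_score, steps=5)` and `random_search(toy_score, budget=25)` spends 25 evaluations in both cases, which mirrors the "same number of evaluations" comparison in the text. Replacing `toy_score` with a function that cross-validates an actual SVM would turn the sketch into a small experiment.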