Aerial robot solutions are becoming ubiquitous for an increasing number of tasks. Among the various types of aerial robots, blimps are well suited to long-duration tasks because they are energy efficient, relatively silent, and safe. To address blimp navigation and control, in our recent work we developed a software-in-the-loop simulation and a PID-based controller for large blimps in the presence of wind disturbance. However, blimps have a deformable structure, and their dynamics are inherently non-linear and time-delayed, often resulting in large trajectory tracking errors. Moreover, the buoyancy of a blimp changes constantly with the ambient temperature and pressure. In the present paper, we explore a deep reinforcement learning (DRL) approach to address these issues. We train only in simulation, while keeping conditions as close as possible to the real-world scenario. We derive a compact state representation to reduce training time and a discrete action space to enforce control smoothness. Our initial results in simulation show the significant potential of DRL for the blimp control task, as well as robustness against moderate wind and parameter uncertainty. Extensive experiments are presented to study the robustness of our approach. We also make the source code of our approach openly available.
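One common way for a discrete action space to enforce smooth control is to let each action nudge the current actuator commands by a small fixed increment instead of setting them directly. The sketch below assumes exactly that design; the number of actuators, the increment `DELTA`, and the normalized command range are hypothetical values, not taken from the paper.

```python
import itertools

import numpy as np

# Hypothetical parameters (not from the paper): per-step change and command range.
DELTA = 0.1
LOW, HIGH = -1.0, 1.0


def build_action_table(n_actuators: int, delta: float = DELTA) -> np.ndarray:
    """Enumerate every combination of {-delta, 0, +delta} across the actuators."""
    steps = (-delta, 0.0, delta)
    return np.array(list(itertools.product(steps, repeat=n_actuators)))


def apply_action(command: np.ndarray, action_index: int, table: np.ndarray,
                 low: float = LOW, high: float = HIGH) -> np.ndarray:
    """Add the chosen increment to the current command and clip to the limits.

    Each step changes any command by at most delta, so consecutive commands
    cannot jump; this bounded rate of change is what keeps control smooth.
    """
    return np.clip(command + table[action_index], low, high)


if __name__ == "__main__":
    table = build_action_table(n_actuators=3)  # e.g. thrust, elevator, rudder (assumed)
    cmd = np.zeros(3)
    for a in (26, 26, 0):  # index 26 = all +delta, index 0 = all -delta
        cmd = apply_action(cmd, a, table)
        print(a, cmd)
```

With three actuators the policy chooses among 27 discrete actions; the table grows as 3^n, which is one reason to keep the actuator set small.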
Malignant melanoma is a common skin cancer that is mostly curable before metastasis, that is, before growths spawn in organs away from the original site. If left untreated, melanoma is the most dangerous type of skin cancer because of its high risk of metastasis. This paper presents Melatect, a machine learning (ML) model embedded in an iOS app that identifies potential malignant melanoma. Melatect classifies lesions as malignant or benign with over 96.6% accuracy, with no apparent bias or overfitting. Using the Melatect app, users can take pictures of skin lesions (moles) and then receive a classification of each mole. The app provides a convenient way to get free advice on lesions and to track these lesions over time. A recursive computer image analysis algorithm and a modified MLOps pipeline were developed to create a model that achieves higher accuracy than existing models. Our training dataset included 18,400 images of benign and malignant lesions: 18,000 from the International Skin Imaging Collaboration (ISIC) archive and 400 gathered from local dermatologists. These images were augmented to 54,054 images using DeepAugment, an AutoML tool.
Deep Convolutional Neural Networks (DCNNs) have recently shown outstanding performance in semantic image segmentation. However, state-of-the-art DCNN-based semantic segmentation methods usually suffer from high computational complexity due to their complex network architectures. This greatly limits their application in real-world scenarios that require real-time processing. In this paper, we propose a real-time, high-performance DCNN-based method for robust semantic segmentation of urban street scenes that achieves a good trade-off between accuracy and speed. Specifically, a Lightweight Baseline Network with Atrous convolution and Attention (LBN-AA) is first used as our baseline network to efficiently obtain dense feature maps. Then, the Distinctive Atrous Spatial Pyramid Pooling (DASPP), which exploits pooling operations of different sizes to encode rich and distinctive semantic information, is developed to detect objects at multiple scales. Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps that preserve detailed spatial information. Finally, a simple but practical Feature Fusion Network (FFN) effectively combines the shallow features from the spatial branch (SPN) with the deep features from the semantic branch (DASPP). Extensive experimental results show that the proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps on the challenging Cityscapes and CamVid test sets, respectively, using only a single NVIDIA TITAN X card. This demonstrates that the proposed method offers excellent performance at real-time speed for semantic segmentation of urban street scenes.
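The mIoU figures above average the per-class intersection over union, TP / (TP + FP + FN). A minimal NumPy sketch of the metric follows. It assumes integer label maps and treats 255 as an "ignore" label, which is a common convention for Cityscapes-style ground truth rather than something stated in this abstract.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Accumulate a (num_classes x num_classes) matrix: rows = ground truth, cols = prediction."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index  # drop pixels without a valid label (assumed convention)
    idx = num_classes * target[keep].astype(np.int64) + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean of TP / (TP + FP + FN) over classes present in prediction or ground truth."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # predicted + actual - overlap
    present = union > 0
    return float(np.mean(tp[present] / union[present]))


if __name__ == "__main__":
    conf = confusion_matrix([0, 1, 1, 1, 0], [0, 0, 1, 1, 255], num_classes=2)
    print(conf, mean_iou(conf))
```

Over a whole test set, the confusion matrices of all images are summed first and mIoU is computed once at the end, rather than averaging per-image scores.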
We propose a novel belief space planning technique for continuous dynamics by viewing the belief system as a hybrid dynamical system with time-driven switching. Our approach is based on the perturbation theory of differential equations and extends Sequential Action Control to stochastic belief dynamics. The resulting algorithm, which we name SACBP, does not require discretization of spaces or time and synthesizes control signals in near real-time. SACBP is an anytime algorithm that can handle general parametric Bayesian filters under certain assumptions. We demonstrate the effectiveness of our approach in an active sensing scenario and a model-based Bayesian reinforcement learning problem. In these challenging problems, we show that the algorithm significantly outperforms other existing solution techniques including approximate dynamic programming and local trajectory optimization.
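To make the "hybrid dynamical system with time-driven switching" view concrete: between measurements the belief evolves continuously, and at fixed observation times it jumps through a Bayesian update. The sketch below shows this for the simplest parametric filter, a scalar Kalman filter on an assumed linear model dx = (a x + b u) dt + noise, observed as y = x + v with noise variance r. It illustrates only the belief dynamics that a planner such as SACBP would operate on, not the SACBP control synthesis itself; all function names are hypothetical.

```python
def flow(mu, P, u, a, b, q, dt, steps):
    """Continuous mode: Euler-integrate the Gaussian belief's mean and variance ODEs.

    dmu/dt = a*mu + b*u,   dP/dt = 2*a*P + q   (q = process noise intensity)
    """
    for _ in range(steps):
        mu = mu + dt * (a * mu + b * u)
        P = P + dt * (2.0 * a * P + q)
    return mu, P


def jump(mu, P, y, r):
    """Discrete switch at an observation time: Kalman measurement update."""
    K = P / (P + r)  # Kalman gain for y = x + v, Var(v) = r
    return mu + K * (y - mu), (1.0 - K) * P


def belief_trajectory(mu0, P0, controls, observations, a, b, q, r, dt, steps_per_obs):
    """Alternate flow and jump; each control value is held over one observation period."""
    mu, P = mu0, P0
    history = [(mu, P)]
    for u, y in zip(controls, observations):
        mu, P = flow(mu, P, u, a, b, q, dt, steps_per_obs)
        mu, P = jump(mu, P, y, r)
        history.append((mu, P))
    return history


if __name__ == "__main__":
    hist = belief_trajectory(0.0, 1.0, [0.0, 0.0], [0.3, 0.1],
                             a=-0.5, b=1.0, q=0.1, r=0.2, dt=0.01, steps_per_obs=20)
    for mu, P in hist:
        print(f"mean={mu:.3f}  variance={P:.3f}")
```

Because the switching times are fixed by the sensor schedule rather than by the state, the belief is a time-driven hybrid system: smooth flow, then a jump, repeated.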
Recent advances in real-time music score following have made it possible for machines to automatically track highly complex polyphonic music, including full orchestra performances. In this paper, we attempt to take this to an even higher level, namely, live tracking of full operas. We first apply a state-of-the-art audio alignment method based on online Dynamic Time Warping (OLTW) to full-length recordings of a Mozart opera and, by analyzing the tracker's most severe errors, identify three common sources of problems specific to the opera scenario. To address these, we propose combining a DTW-based music tracker with specialized audio event detectors (for applause, silence/noise, and speech) that condition the DTW algorithm in a top-down fashion, and we show, step by step, how these detectors add robustness to the score follower. However, a number of open problems remain, which we identify as targets for ongoing and future research.
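For readers unfamiliar with the alignment step, here is a minimal sketch of classic (offline) dynamic time warping between two feature sequences, using Euclidean frame distance and the standard three-step recursion. It is not the online OLTW variant used in the paper, which aligns incrementally as live audio arrives; the audio feature representation is left abstract.

```python
import numpy as np


def dtw(X, Y):
    """Classic O(n*m) dynamic time warping between feature sequences X (n frames) and Y (m frames).

    Returns the total alignment cost and the warping path as (i, j) frame-index pairs.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded with an inf border
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            # Allowed steps: diagonal match, advance in X only, advance in Y only.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(D[i - 1, j - 1], i - 1, j - 1),
                      (D[i - 1, j], i - 1, j),
                      (D[i, j - 1], i, j - 1)]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return float(D[n, m]), path


if __name__ == "__main__":
    cost, path = dtw([1, 2, 3], [1, 1, 2, 3, 3])
    print(cost, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
```

In a live tracker the full matrix cannot be computed in advance; online variants instead extend the alignment frame by frame within a bounded search window.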
The prediction of behavior in dynamical systems frequently depends on the design of models. When a time series obtained by observing the system is available, the task can be performed either by designing the model from these observations without additional assumptions, or by assuming a preconceived model structure based on additional information about the system. In the second case, the question is how to adequately combine theory with observations and subsequently optimize the combination. In this work, we propose designing time-continuous models of dynamical systems as solutions of differential equations, from non-uniformly sampled or noisy observations, using machine learning techniques. The performance of the strategy is demonstrated on several simulated data sets and on experimental data from the Hare-Lynx population and the Coronavirus Disease 2019 (COVID-19) outbreak. Our results suggest that this approach to modeling systems can be a useful technique for both synthetic and experimental data.
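As one deliberately simple instance of fitting a differential-equation model to non-uniformly sampled data, the sketch below estimates Lotka-Volterra predator-prey parameters (the classic model for hare-lynx data) by gradient matching. Derivatives are estimated with `np.gradient`, which accepts non-uniform sample coordinates, and since the parameters enter the equations linearly they are recovered by least squares. This is a swapped-in technique chosen for illustration, not the machine learning method of the paper, and it would need smoothing before being applied to noisy observations.

```python
import numpy as np


def lotka_volterra(state, alpha, beta, delta, gamma):
    """dx/dt = alpha*x - beta*x*y (prey),  dy/dt = delta*x*y - gamma*y (predator)."""
    x, y = state
    return np.array([alpha * x - beta * x * y, delta * x * y - gamma * y])


def simulate(times, x0, params, substeps=20):
    """RK4 integration, reporting the state at the (possibly non-uniform) sample times."""
    states = [np.asarray(x0, dtype=float)]
    for t0, t1 in zip(times[:-1], times[1:]):
        s = states[-1]
        h = (t1 - t0) / substeps
        for _ in range(substeps):
            k1 = lotka_volterra(s, *params)
            k2 = lotka_volterra(s + 0.5 * h * k1, *params)
            k3 = lotka_volterra(s + 0.5 * h * k2, *params)
            k4 = lotka_volterra(s + h * k3, *params)
            s = s + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        states.append(s)
    return np.array(states)


def fit_lotka_volterra(times, states):
    """Gradient matching: estimate derivatives on the non-uniform grid, then solve
    x'/x = alpha - beta*y and y'/y = delta*x - gamma by linear least squares."""
    x, y = states[:, 0], states[:, 1]
    dx = np.gradient(x, times)  # second-order accurate in the interior, even for uneven spacing
    dy = np.gradient(y, times)
    ones = np.ones_like(x)
    alpha, beta = np.linalg.lstsq(np.column_stack([ones, -y]), dx / x, rcond=None)[0]
    delta, gamma = np.linalg.lstsq(np.column_stack([x, -ones]), dy / y, rcond=None)[0]
    return alpha, beta, delta, gamma


if __name__ == "__main__":
    t = 20.0 * np.linspace(0.0, 1.0, 400) ** 1.5  # non-uniform grid, denser near t = 0
    true_params = (1.0, 0.1, 0.075, 1.5)  # illustrative values, not fitted to real data
    data = simulate(t, (10.0, 5.0), true_params)
    print(fit_lotka_volterra(t, data))
```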
Most existing multiobjective evolutionary algorithms (MOEAs) implicitly assume that each objective function can be evaluated within the same period of time. This assumption is often untenable in real-world optimization scenarios, where evaluating different objectives involves different computer simulations or physical experiments with distinct time complexity. To address this issue, a transfer learning scheme based on surrogate-assisted evolutionary algorithms (SAEAs) is proposed, in which a co-surrogate is adopted to model the functional relationship between the fast and slow objective functions, and a transferable instance selection method is introduced to acquire useful knowledge from the search process of the fast objective. Our experimental results on the DTLZ and UF test suites demonstrate that the proposed algorithm is competitive for solving bi-objective optimization problems whose objectives have non-uniform evaluation times.
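A co-surrogate can be as simple as a regression model of how the slow objective differs from the fast one, so that cheap fast-objective evaluations plus the model yield estimates of the slow objective. The sketch below assumes that formulation and uses Gaussian radial basis function interpolation of the difference; the class name, kernel, and width `eps` are illustrative assumptions, and the paper's co-surrogate and its transferable instance selection may be defined differently.

```python
import numpy as np


def gaussian_kernel(A, B, eps):
    """Pairwise Gaussian RBF values exp(-(eps * ||a - b||)^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-(eps ** 2) * sq_dist)


class DifferenceCoSurrogate:
    """Hypothetical co-surrogate: interpolates d(x) = f_slow(x) - f_fast(x)."""

    def __init__(self, eps=3.0):
        self.eps = eps

    def fit(self, X, f_fast_vals, f_slow_vals):
        self.X = np.asarray(X, dtype=float)
        d = np.asarray(f_slow_vals, dtype=float) - np.asarray(f_fast_vals, dtype=float)
        # Exact interpolation: solve K w = d on points where both objectives are known.
        self.w = np.linalg.solve(gaussian_kernel(self.X, self.X, self.eps), d)
        return self

    def predict_slow(self, X_new, f_fast_new):
        """Estimate the slow objective from cheap fast-objective evaluations."""
        K = gaussian_kernel(np.asarray(X_new, dtype=float), self.X, self.eps)
        return np.asarray(f_fast_new, dtype=float) + K @ self.w


if __name__ == "__main__":
    g = np.linspace(0.0, 1.0, 4)
    X = np.array([[a, b] for a in g for b in g])  # 16 points with both objectives evaluated
    f_fast = lambda P: (P ** 2).sum(axis=1)  # toy cheap objective
    f_slow = lambda P: ((P - 1.0) ** 2).sum(axis=1)  # toy expensive objective
    model = DifferenceCoSurrogate().fit(X, f_fast(X), f_slow(X))
    X_new = np.array([[0.5, 0.5]])
    print(model.predict_slow(X_new, f_fast(X_new)), f_slow(X_new))
```

Modeling the difference rather than the slow objective alone lets the surrogate reuse everything the fast objective already captures, which is the intuition behind transferring knowledge from the fast search process.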
The prevalence of wearable sensors (e.g., smart wristbands) creates an unprecedented opportunity not only to inform individuals about their health and wellness, but also to assess and infer demographic information and personality. This can give us deeper personalized insight beyond how many steps we took or what our heart rate is. However, before we can achieve this goal, we have to resolve a number of shortcomings: 1) wearable sensor time series are often variable-length and incomplete due to different data collection periods (e.g., wearing behavior varies by person); and 2) individuals vary in how they respond to external factors such as stress and environment. This paper addresses these challenges, and brings us closer to personalized insights about an individual's health, personality, or job performance, by developing a novel representation learning algorithm, HeartSpace. Specifically, HeartSpace encodes variable-length time series with missing values by integrating a time series encoding module with a pattern aggregation network. Additionally, HeartSpace uses a Siamese-triplet network to optimize representations by jointly capturing intra- and inter-series correlations during embedding learning. Our empirical evaluation on two different datasets shows significant performance gains over state-of-the-art baselines in a variety of applications, including personality prediction, demographics inference, and user identification.
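The two ingredients named above can be sketched in a few lines: pooling a variable-length series with missing values into a fixed-size vector, and the triplet loss that a Siamese-triplet network minimizes. This sketch assumes NaN marks missing readings and replaces HeartSpace's learned encoding module and pattern aggregation network with a plain masked mean; the squared-Euclidean triplet loss and the margin value are likewise illustrative choices.

```python
import numpy as np


def masked_mean_embedding(series):
    """Pool a (T, D) series of any length T into a D-vector, ignoring NaN (missing) entries.

    Stand-in for a learned encoder + aggregation network (assumption, not HeartSpace itself).
    """
    x = np.asarray(series, dtype=float)
    observed = ~np.isnan(x)
    counts = observed.sum(axis=0)
    sums = np.where(observed, x, 0.0).sum(axis=0)
    # Channels with no observations at all fall back to 0.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin).

    Pulls the anchor toward the positive (e.g. another segment of the same person's
    series, assumed) and pushes it from the negative until they differ by the margin.
    """
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    a = masked_mean_embedding([[60.0, np.nan], [64.0, 1.0], [62.0, 3.0]])
    p = masked_mean_embedding([[61.0, 2.0], [63.0, np.nan]])
    n = masked_mean_embedding([[80.0, 9.0]])
    print(a, p, n, triplet_loss(a, p, n))
```

In a trainable model both embeddings would come from the same shared-weight encoder, and the loss would be averaged over a batch of triplets.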
We introduce the Recursive INsertion-based Encoder (RINE), a novel approach for semantic parsing in task-oriented dialog. Our model consists of an encoder network that incrementally builds the semantic parse tree by predicting the non-terminal label and its positions in the linearized tree. At generation time, the model constructs the semantic parse tree by recursively inserting the predicted non-terminal labels at the predicted positions until termination. RINE achieves state-of-the-art exact match accuracy on low- and high-resource versions of the conversational semantic parsing benchmark TOP (Gupta et al., 2018; Chen et al., 2020), outperforming strong sequence-to-sequence models and transition-based parsers. We also show that our model design is applicable to the nested named entity recognition task, where it performs on par with a state-of-the-art approach designed for that task. Finally, we demonstrate that our approach is 2 to 3.5 times faster than the sequence-to-sequence model at inference time.
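The recursive insertion decoding loop can be illustrated on a TOP-style bracketed parse. In this sketch the trained encoder is replaced by a hypothetical `predict` callback that returns a label with start and end positions in the current linearized tree, or `None` to terminate; the label names and the utterance are made-up examples.

```python
from typing import Callable, List, Optional, Tuple

Prediction = Optional[Tuple[str, int, int]]  # (label, start, end) or None to stop


def insert_nonterminal(seq: List[str], label: str, start: int, end: int) -> List[str]:
    """Wrap seq[start:end] (end exclusive) in '[label' ... ']'."""
    if not 0 <= start < end <= len(seq):
        raise ValueError(f"invalid span ({start}, {end}) for length {len(seq)}")
    return seq[:start] + ["[" + label] + seq[start:end] + ["]"] + seq[end:]


def decode(tokens: List[str], predict: Callable[[List[str]], Prediction],
           max_steps: int = 50) -> str:
    """Repeatedly ask the (hypothetical) model for one insertion until it signals stop."""
    seq = list(tokens)
    for _ in range(max_steps):
        pred = predict(seq)
        if pred is None:  # termination signal
            break
        seq = insert_nonterminal(seq, *pred)
    return " ".join(seq)


def scripted_predictor(steps):
    """Stand-in for a trained encoder: replays a fixed list of insertions."""
    queue = list(steps)
    return lambda seq: queue.pop(0) if queue else None


if __name__ == "__main__":
    predict = scripted_predictor([("IN:GET_WEATHER", 0, 3), ("SL:LOCATION", 3, 4)])
    print(decode(["weather", "in", "Boston"], predict))
    # [IN:GET_WEATHER weather in [SL:LOCATION Boston ] ]
```

Positions refer to the sequence as it stands after previous insertions, which is why the second span starts at index 3 rather than 2.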
Spectrograms visualize the frequency components of a signal, which may be an audio signal or any other time-series signal. Audio signals have high sampling rates and frequency content that varies considerably over time, and spectrograms capture such variations well. Vital signs, by contrast, are time-series signals with low sampling frequencies and little frequency variability, so their spectrograms fail to express variations and patterns. In this paper, we propose a novel solution that introduces frequency variability by applying frequency modulation to vital signs. We then compute spectrograms of the frequency-modulated signals to capture the patterns. The proposed approach has been evaluated on 4 different medical datasets across both prediction and classification tasks. The results demonstrate the efficacy of the approach for vital sign signals, with accuracies of 91.55% and 91.67% in the prediction and classification tasks, respectively.
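The frequency-modulation step can be sketched with NumPy alone: a low-rate vital-sign series is interpolated to a higher sampling rate, used as the message m(t) of an FM signal s(t) = cos(2π f_c t + 2π k_f ∫ m(τ) dτ), and then turned into a magnitude spectrogram with a Hann-windowed short-time Fourier transform. The carrier frequency `fc`, deviation `kf`, sampling rates, and frame sizes below are assumed values, not the paper's settings.

```python
import numpy as np


def upsample(vital, vital_fs, fs):
    """Linearly interpolate a low-rate vital-sign series onto a grid sampled at fs."""
    t_in = np.arange(len(vital)) / vital_fs
    t_out = np.arange(int(t_in[-1] * fs) + 1) / fs
    return np.interp(t_out, t_in, vital)


def fm_modulate(message, fs, fc, kf):
    """FM signal whose instantaneous frequency is fc + kf * message(t) Hz."""
    t = np.arange(len(message)) / fs
    phase = 2 * np.pi * fc * t + 2 * np.pi * kf * np.cumsum(message) / fs  # cumsum ~ integral
    return np.cos(phase)


def spectrogram(signal, fs, nperseg=256, hop=128):
    """Magnitude STFT; returns (frequencies in Hz, array of shape frames x frequencies)."""
    window = np.hanning(nperseg)
    starts = range(0, len(signal) - nperseg + 1, hop)
    frames = np.array([signal[s:s + nperseg] * window for s in starts])
    return np.fft.rfftfreq(nperseg, d=1 / fs), np.abs(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    fs = 1000.0
    heart_rate = np.array([60.0, 62.0, 70.0, 65.0, 61.0])  # made-up 1 Hz vital-sign samples
    msg = upsample(heart_rate - heart_rate.mean(), vital_fs=1.0, fs=fs)
    freqs, S = spectrogram(fm_modulate(msg, fs, fc=100.0, kf=5.0), fs)
    print(freqs[np.argmax(S, axis=1)])  # dominant frequency per frame tracks the vital sign
```

The modulation maps slow amplitude changes in the vital sign onto visible shifts of the spectral peak, which is the variability a plain spectrogram of the raw signal lacks.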