We often use perturbations to regularize neural models. For neural encoder-decoders, previous studies applied the scheduled sampling (Bengio et al., 2015) and adversarial perturbations (Sato et al., 2019) as perturbations but these methods require considerable computational time. Thus, this study addresses the question of whether these approaches are efficient enough for training time. We compare several perturbations in sequence-to-sequence problems with respect to computational time. Experimental results show that the simple techniques such as word dropout (Gal and Ghahramani, 2016) and random replacement of input tokens achieve comparable (or better) scores to the recently proposed perturbations, even though these simple methods are faster. Our code is publicly available at https://github.com/takase/rethink_perturbations.
We introduce a new image editing and synthesis framework, Stochastic Differential Editing (SDEdit), based on a recent generative model using stochastic differential equations (SDEs). Given an input image with user edits (e.g., hand-drawn color strokes), we first add noise to the input according to an SDE, and subsequently denoise it by simulating the reverse SDE to gradually increase its likelihood under the prior. Our method does not require task-specific loss function designs, which are critical components for recent image editing methods based on GAN inversion. Compared to conditional GANs, we do not need to collect new datasets of original and edited images for new applications. Therefore, our method can quickly adapt to various editing tasks at test time without re-training models. Our approach achieves strong performance on a wide range of applications, including image synthesis and editing guided by stroke paintings and image compositing.
From the simple measurement of tissue attributes in pathology workflow to designing an explainable diagnostic/prognostic AI tool, access to accurate semantic segmentation of tissue regions in histology images is a prerequisite. However, delineating different tissue regions manually is a laborious, time-consuming and costly task that requires expert knowledge. On the other hand, the state-of-the-art automatic deep learning models for semantic segmentation require lots of annotated training data and there are only a limited number of tissue region annotated images publicly available. To obviate this issue in computational pathology projects and collect large-scale region annotations efficiently, we propose an efficient interactive segmentation network that requires minimum input from the user to accurately annotate different tissue types in the histology image. The user is only required to draw a simple squiggle inside each region of interest so it will be used as the guiding signal for the model. To deal with the complex appearance and amorph geometry of different tissue regions we introduce several automatic and minimalistic guiding signal generation techniques that help the model to become robust against the variation in the user input. By experimenting on a dataset of breast cancer images, we show that not only does our proposed method speed up the interactive annotation process, it can also outperform the existing automatic and interactive region segmentation models.
Legal interpretation is a linguistic venture. In judicial opinions, for example, courts are often asked to interpret the text of statutes and legislation. As time has shown, this is not always as easy as it sounds. Matters can hinge on vague or inconsistent language and, under the surface, human biases can impact the decision-making of judges. This raises an important question: what if there was a method of extracting the meaning of statutes consistently? That is, what if it were possible to use machines to encode legislation in a mathematically precise form that would permit clearer responses to legal questions? This article attempts to unpack the notion of machine-readability, providing an overview of both its historical and recent developments. The paper will reflect on logic syntax and symbolic language to assess the capacity and limits of representing legal knowledge. In doing so, the paper seeks to move beyond existing literature to discuss the implications of various approaches to machine-readable legislation. Importantly, this study hopes to highlight the challenges encountered in this burgeoning ecosystem of machine-readable legislation against existing human-readable counterparts.
Industrial cyber-physical systems (CPS) have gained enormous attention of manufacturers in recent years due to their automation and cost reduction capabilities in the fourth industrial revolution (Industry 4.0). Such an industrial network of connected cyber and physical components may consist of highly expensive components such as robots. In order to provide efficient communication in such a network, it is imperative to improve the Quality-of-Service (QoS). Software Defined Networking (SDN) has become a key technology in realizing QoS concepts in a dynamic fashion by allowing a centralized controller to program each flow with a unified interface. However, state-of-the-art solutions do not effectively use the centralized visibility of SDN to fulfill QoS requirements of such industrial networks. In this paper, we propose an SDN-based routing mechanism which attempts to improve QoS in robotic cyber-physical systems which have hard real-time requirements. We exploit the SDN capabilities to dynamically select paths based on current link parameters in order to improve the QoS in such delay-constrained networks. We verify the efficiency of the proposed approach on a realistic industrial OpenFlow topology. Our experiments reveal that the proposed approach significantly outperforms an existing delay-based routing mechanism in terms of average throughput, end-to-end delay and jitter. The proposed solution would prove to be significant for the industrial applications in robotic cyber-physical systems.
This manuscript presents a model of compliant contacts for time-critical humanoid robot motion control. The proposed model considers the environment as a continuum of spring-damper systems, which allows us to compute the equivalent contact force and torque that the environment exerts on the contact surface. We show that the proposed model extends the linear and rotational springs and dampers - classically used to characterize soft terrains - to the case of large contact surface orientations. The contact model is then used for the real-time whole-body control of humanoid robots walking on visco-elastic environments. The overall approach is validated by simulating walking motions of the iCub humanoid robot. Furthermore, the paper compares the proposed whole-body control strategy and state of the art approaches. In this respect, we investigate the terrain compliance that makes the classical approaches assuming rigid contacts fail. We finally analyze the robustness of the presented control design with respect to non-parametric uncertainty in the contact-model.
Lifelong Learning (LL) refers to the ability to continually learn and solve new problems with incremental available information over time while retaining previous knowledge. Much attention has been given lately to Supervised Lifelong Learning (SLL) with a stream of labelled data. In contrast, we focus on resolving challenges in Unsupervised Lifelong Learning (ULL) with streaming unlabelled data when the data distribution and the unknown class labels evolve over time. Bayesian framework is natural to incorporate past knowledge and sequentially update the belief with new data. We develop a fully Bayesian inference framework for ULL with a novel end-to-end Deep Bayesian Unsupervised Lifelong Learning (DBULL) algorithm, which can progressively discover new clusters without forgetting the past with unlabelled data while learning latent representations. To efficiently maintain past knowledge, we develop a novel knowledge preservation mechanism via sufficient statistics of the latent representation for raw data. To detect the potential new clusters on the fly, we develop an automatic cluster discovery and redundancy removal strategy in our inference inspired by Nonparametric Bayesian statistics techniques. We demonstrate the effectiveness of our approach using image and text corpora benchmark datasets in both LL and batch settings.
A resilient and robust positioning, navigation, and timing (PNT) system is a necessity for the navigation of autonomous vehicles (AVs). Global Navigation Satelite System (GNSS) provides satellite-based PNT services. However, a spoofer can temper an authentic GNSS signal and could transmit wrong position information to an AV. Therefore, a GNSS must have the capability of real-time detection and feedback-correction of spoofing attacks related to PNT receivers, whereby it will help the end-user (autonomous vehicle in this case) to navigate safely if it falls into any compromises. This paper aims to develop a deep reinforcement learning (RL)-based turn-by-turn spoofing attack detection using low-cost in-vehicle sensor data. We have utilized Honda Driving Dataset to create attack and non-attack datasets, develop a deep RL model, and evaluate the performance of the RL-based attack detection model. We find that the accuracy of the RL model ranges from 99.99% to 100%, and the recall value is 100%. However, the precision ranges from 93.44% to 100%, and the f1 score ranges from 96.61% to 100%. Overall, the analyses reveal that the RL model is effective in turn-by-turn spoofing attack detection.
Steer-by-Wire (SBW) systems are being adapted widely in semi-autonomous and fully autonomous vehicles. The main control challenge in a SBW system is to follow the steering commands in the face of parametric uncertainties, external disturbances and actuator delay; crucially, perturbations in inertial parameters and damping forces give rise to state-dependent uncertainties, which cannot be bounded a priori by a constant. However, the state-of-the-art control methods of SBW system rely on a priori bounded uncertainties, and thus, become inapplicable when state-dependent dynamics become unknown. In addition, ensuring tracking accuracy under actuator delay is always a challenging task. This work proposes two control frameworks to overcome these challenges. Firstly, an adaptive controller is proposed to tackle the state-dependent uncertainties and external disturbances in a typical SBW system without any a priori knowledge of their structures and of their bounds. The stability of the closed-loop system is studied analytically via uniformly ultimately bounded notion and the effectiveness of the proposed solution is verified via simulations against the state-of-the-art solution. While this proposed controller handles the uncertainties and external perturbations, it does not consider the actuator delay which sometimes result in decreased accuracy. Therefore, a new adaptive-robust control framework is devised to tackle the same control problem of an SBW system under the influence of time-varying input delay. In comparison to the existing strategies, the proposed framework removes the conservative assumption of a priori bounded uncertainty and, in addition, the Razumikhin theorem based stability analysis allows the proposed scheme to deal with arbitrary variation in input delay. The effectiveness of the both controllers is proved using comparative simulation studies.
Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advancements in large pre-trained computer vision and language models and graph neural networks has lent document image classification many tools. However using large pre-trained models usually requires substantial computing resources which could defeat the cost-saving advantages of automatic document image classification. In the paper we propose an efficient document image classification framework that uses graph convolution neural networks and incorporates textual, visual and layout information of the document. We have rigorously benchmarked our proposed algorithm against several state-of-art vision and language models on both publicly available dataset and a real-life insurance document classification dataset. Empirical results on both publicly available and real-world data show that our methods achieve near SOTA performance yet require much less computing resources and time for model training and inference. This results in solutions than offer better cost advantages, especially in scalable deployment for enterprise applications. The results showed that our algorithm can achieve classification performance quite close to SOTA. We also provide comprehensive comparisons of computing resources, model sizes, train and inference time between our proposed methods and baselines. In addition we delineate the cost per image using our method and other baselines.