Radio frequency interference (RFI) mitigation and radar echo recovery are critically important for the proper functioning of ultra-wideband (UWB) radar systems that use one-bit sampling techniques. We recently introduced a technique for one-bit UWB radar that first uses a majorization-minimization method for RFI parameter estimation and then a sparse method for radar echo recovery. However, this technique suffers from high computational complexity because the parameters of each RFI source must be estimated separately and iteratively. In this paper, we present a computationally efficient joint RFI mitigation and radar echo recovery framework that greatly reduces the computational cost. Specifically, we exploit the sparsity of the RFI in the fast-frequency domain and the sparsity of the radar echoes in the fast-time domain to design a one-bit weighted SPICE (SParse Iterative Covariance-based Estimation) framework for joint RFI mitigation and radar echo recovery in one-bit UWB radar. Both simulated and experimental results are presented to show that the proposed one-bit weighted SPICE framework not only reduces the computational cost but also outperforms the existing approach based on decoupled RFI mitigation and radar echo recovery for one-bit UWB radar.
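As a rough illustration of the joint model, the sketch below recovers an RFI component that is sparse in the fast-frequency domain and an echo component that is sparse in the fast-time domain from unquantized data. It uses a simple proximal-gradient (ISTA) stand-in rather than the one-bit weighted SPICE solver itself, and all signal parameters are assumptions:

```python
import numpy as np

# Toy stand-in (NOT the paper's one-bit weighted SPICE): joint sparse
# recovery of RFI (sparse in fast-frequency) and echo (sparse in fast-time)
# from unquantized data via proximal gradient (ISTA). It only illustrates
# how the two sparsity domains are combined in one joint model y = F^H s + x.

def soft(z, t):
    """Complex soft-thresholding operator."""
    mag = np.abs(z)
    return np.where(mag > t, (1 - t / np.maximum(mag, 1e-12)) * z, 0)

def joint_recover(y, lam=0.1, iters=200):
    n = len(y)
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix
    s = np.zeros(n, complex)                 # RFI spectrum (fast-frequency)
    x = np.zeros(n, complex)                 # echo (fast-time)
    step = 0.5                               # 1/L for the stacked operator [F^H, I]
    for _ in range(iters):
        r = F.conj().T @ s + x - y           # residual of the joint model
        s = soft(s - step * (F @ r), step * lam)
        x = soft(x - step * r, step * lam)
    return s, x                              # x is the recovered echo
```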
The outbreak of novel coronavirus pneumonia (COVID-19) has caused morbidity and mortality worldwide. Oropharyngeal-swab (OP-swab) sampling is widely used for the diagnosis of COVID-19. To protect clinical staff from exposure to the virus, we developed a 9-degree-of-freedom (DOF) rigid-flexible coupling (RFC) robot to assist with COVID-19 OP-swab sampling. The robot is composed of a visual system, a UR5 robot arm, a micro-pneumatic actuator, and a force-sensing system. It is expected to reduce infection risk and free clinical staff from long-term, repetitive sampling work. Compared with a rigid sampling robot, the developed force-sensing RFC robot can carry out OP-swab sampling procedures in a safer and gentler way. In addition, a varying-parameter zeroing neural network-based optimization method is proposed for motion planning of the 9-DOF redundant manipulator. The developed robot system is validated by OP-swab sampling on both oral cavity phantoms and volunteers.
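For intuition, here is a minimal sketch of the zeroing neural network (ZNN) idea with a time-varying gain, on a toy planar three-link arm standing in for the 9-DOF manipulator; the link lengths, desired path, and gain schedule are assumptions:

```python
import numpy as np

# Minimal sketch of a varying-parameter ZNN tracking controller on a toy
# planar 3-link arm (a stand-in for the paper's 9-DOF manipulator). The ZNN
# design e_dot = -gamma(t) * e drives the end-effector error
# e(t) = f(theta) - x_d(t) to zero; redundancy is resolved with the Jacobian
# pseudoinverse. Link lengths, path, and gamma(t) are assumptions.

L = np.array([0.4, 0.3, 0.2])          # assumed link lengths

def fk(th):                            # forward kinematics of the planar arm
    c = np.cumsum(th)
    return np.array([np.sum(L * np.cos(c)), np.sum(L * np.sin(c))])

def jac(th):                           # analytic 2x3 Jacobian
    c = np.cumsum(th)
    J = np.zeros((2, 3))
    for i in range(3):
        J[0, i] = -np.sum(L[i:] * np.sin(c[i:]))
        J[1, i] = np.sum(L[i:] * np.cos(c[i:]))
    return J

dt, T = 1e-3, 5.0
th = np.array([0.3, 0.4, 0.5])
for k in range(int(T / dt)):
    t = k * dt
    xd = np.array([0.5 + 0.1 * np.cos(t), 0.1 * np.sin(t)])   # desired path
    xdd = np.array([-0.1 * np.sin(t), 0.1 * np.cos(t)])       # its derivative
    gamma = 5.0 + 10.0 * t / T         # time-varying (increasing) ZNN gain
    e = fk(th) - xd
    thdot = np.linalg.pinv(jac(th)) @ (xdd - gamma * e)       # ZNN update law
    th += dt * thdot
```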
Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL). However, existing auxiliary tasks do not take the characteristics of RL problems into consideration and are unsupervised. By leveraging returns, the most important feedback signal in RL, we propose a novel auxiliary task that forces the learnt representations to discriminate state-action pairs with different returns. Our auxiliary loss is theoretically justified to learn representations that capture the structure of a new form of state-action abstraction, under which state-action pairs with similar return distributions are aggregated together. In the low-data regime, our algorithm outperforms strong baselines on complex tasks from the Atari and DeepMind Control Suite benchmarks, and it achieves even better performance when combined with existing auxiliary tasks.
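A hedged sketch of what such a return-based auxiliary loss might look like (not the paper's exact formulation): state-action embeddings are pulled together when their returns are close and pushed apart otherwise. Here `encoder`, `eps`, and `margin` are assumed names and hyper-parameters:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch (NOT the paper's exact loss): an auxiliary objective
# that forces the encoder to separate state-action pairs whose returns
# differ by more than `eps`, and to pull together pairs whose returns are
# close. `encoder` maps a batch of (state, action) pairs to embeddings.

def return_contrastive_loss(encoder, states, actions, returns,
                            eps=0.1, margin=1.0):
    z = encoder(states, actions)                     # (B, d) embeddings
    d = torch.cdist(z, z)                            # pairwise embedding distances
    dr = (returns[:, None] - returns[None, :]).abs() # pairwise return gaps
    similar = (dr <= eps).float()                    # similar-return indicator
    # pull similar pairs together, push dissimilar pairs past the margin
    loss = similar * d.pow(2) + (1 - similar) * F.relu(margin - d).pow(2)
    return loss.mean()
```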
Radio frequency interference (RFI) mitigation is critical to the proper operation of ultra-wideband (UWB) radar systems, since RFI can severely degrade the radar imaging capability and target detection performance. In this paper, we address the RFI mitigation problem for one-bit UWB radar systems. A one-bit UWB system obtains its signed measurements via a low-cost, high-rate sampling scheme referred to as the Continuous Time Binary Value (CTBV) technology. This sampling strategy compares the signal to a known threshold that varies with slow time, and it can therefore achieve a rather high sampling rate and quantization resolution with simple and affordable hardware. This paper establishes a proper data model for the RFI sources and proposes a novel RFI mitigation method for one-bit UWB radar systems that use the CTBV sampling technique. Specifically, we first model the RFI sources as a sum of sinusoids whose frequencies are fixed during the coherent processing interval (CPI), and we exploit the sparsity of the RFI spectrum. We extend a majorization-minimization based 1bRELAX algorithm, referred to as 1bMMRELAX, to estimate the RFI source parameters from the signed measurements obtained with the CTBV sampling strategy. We also devise a new fast frequency initialization method based on the Alternating Direction Method of Multipliers (ADMM) for the extended 1bMMRELAX algorithm to significantly improve its computational efficiency. Moreover, an ADMM-based sparse method is introduced to recover the desired radar echoes using the estimated RFI parameters. Both simulated and experimental results are presented to demonstrate that the proposed algorithm outperforms the existing digital integration method, especially in severe RFI cases.
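To make the measurement model concrete, the sketch below generates CTBV-style one-bit data by comparing a toy signal (echo plus sinusoidal RFI plus noise) against known thresholds that vary with slow time; all waveform parameters are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of the CTBV-style signed measurement model: the received
# signal (radar echo plus sinusoidal RFI plus noise) is compared against a
# known threshold that varies with slow time, yielding one-bit data.
# All waveform parameters below are illustrative assumptions.

rng = np.random.default_rng(0)
N, M = 256, 64                        # fast-time samples, slow-time pulses
t = np.arange(N)
echo = np.exp(-0.5 * ((t - 100) / 4) ** 2)        # toy radar echo
f_rfi = np.array([0.07, 0.19])                    # RFI frequencies (cycles/sample)
rfi = sum(0.8 * np.cos(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
          for f in f_rfi)
x = echo + rfi                        # same deterministic signal each pulse
h = rng.uniform(-1.5, 1.5, size=M)    # known thresholds varying with slow time
bits = np.sign(x[None, :] + 0.1 * rng.standard_normal((M, N)) - h[:, None])
# `bits` (M x N, entries in {-1, +1}) is the one-bit data from which the
# RFI parameters and the echo must be recovered.
```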
With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by waiting for and averaging the estimates obtained from its neighbors, then correcting it on the basis of its local dataset. However, the synchronization phase can be time consuming due to the need to wait for \textit{stragglers}, i.e., slower workers. An efficient way to mitigate this effect is to let each worker wait only for updates from its fastest neighbors before updating its local parameter; the remaining neighbors are called \textit{backup workers}. To minimize the overall training time over the network, we propose a fully distributed algorithm that dynamically determines the number of backup workers for each worker. We show that our algorithm achieves a linear speedup for convergence (i.e., convergence performance increases linearly with the number of workers). We conduct extensive experiments on MNIST and CIFAR-10 to verify our theoretical results.
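A simplified sketch of this scheme with a fixed number $k$ of fastest neighbors (the paper adapts this number per worker, which is omitted here); the network, delay model, and quadratic local objectives are assumptions:

```python
import numpy as np

# Simplified sketch of consensus SGD with backup workers: each round, a
# worker averages only the k fastest neighbor estimates (the remaining
# neighbors act as backup workers whose updates are skipped), then takes a
# local gradient step. Delays and the quadratic objectives are toy choices.

rng = np.random.default_rng(1)
W, d, k, lr = 8, 5, 3, 0.1                 # workers, dim, fastest-k, step size
x = rng.standard_normal((W, d))            # local parameter estimates
targets = rng.standard_normal((W, d))      # each worker's local optimum
for _ in range(200):
    delays = rng.exponential(size=(W, W))  # random per-round compute delays
    x_new = np.empty_like(x)
    for i in range(W):
        nbrs = [j for j in range(W) if j != i]
        fastest = sorted(nbrs, key=lambda j: delays[i, j])[:k]
        avg = (x[i] + x[fastest].sum(axis=0)) / (k + 1)  # partial consensus
        grad = avg - targets[i]            # gradient of 0.5*||x - target||^2
        x_new[i] = avg - lr * grad         # local correction step
    x = x_new
```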
Integer quantization of neural networks can be defined as the approximation of the high-precision computation of the canonical neural network formulation using reduced integer precision. It plays a significant role in the efficient deployment and execution of machine learning (ML) systems, reducing memory consumption and leveraging typically faster computation. In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies, which are themselves the foundation of many production ML systems. Our quantization strategy is accurate (e.g., it works well with post-training quantization), efficient and fast to execute (utilizing 8-bit integer weights and mostly 8-bit activations), and able to target a variety of hardware (by leveraging instruction sets available in common CPU architectures, as well as available neural accelerators).
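The following sketch illustrates the basic building block, 8-bit affine quantization with integer-only accumulation (not TensorFlow's actual kernels); an integer LSTM applies this pattern to each gate's matrix product:

```python
import numpy as np

# Minimal sketch of 8-bit affine quantization: real values are mapped to
# int8 with a scale and zero-point, and a matrix product is carried out in
# integer arithmetic (int32 accumulation) with a float rescale at the end.
# An integer-only LSTM would apply this to each gate's matmul.

def quantize(x):
    scale = (x.max() - x.min()) / 255.0
    zp = np.round(-x.min() / scale) - 128          # zero-point in int8 range
    q = np.clip(np.round(x / scale + zp), -128, 127).astype(np.int8)
    return q, scale, zp

def int_matmul(qa, sa, za, qb, sb, zb):
    # subtract zero-points, accumulate in int32, then rescale to float
    acc = (qa.astype(np.int32) - int(za)) @ (qb.astype(np.int32) - int(zb))
    return sa * sb * acc

a, b = np.random.randn(4, 8), np.random.randn(8, 3)
qa, sa, za = quantize(a)
qb, sb, zb = quantize(b)
print(np.abs(int_matmul(qa, sa, za, qb, sb, zb) - a @ b).max())  # small error
```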
We consider the problem of service placement at the network edge, in which a decision maker has to choose among $N$ services to host at the edge to satisfy the demands of customers. Our goal is to design adaptive algorithms that minimize the average service delivery latency for customers. We pose the problem as a Markov decision process (MDP) whose state describes, for each service, the number of customers currently waiting at the edge to obtain that service. However, solving this $N$-services MDP is computationally expensive due to the curse of dimensionality. To overcome this challenge, we show that the optimal policy for a single-service MDP has an appealing threshold structure, and we explicitly derive the Whittle index of each service as a function of the number of customer requests, based on Whittle index theory. Since request arrival and service delivery rates are usually unknown and possibly time-varying, we then develop efficient learning-augmented algorithms that fully utilize the structure of the optimal policies and incur low learning regret. The first, UCB-Whittle, relies on the principle of optimism in the face of uncertainty. The second, Q-learning-Whittle, runs Q-learning iterations for each service using a two-timescale stochastic approximation. We characterize the non-asymptotic performance of UCB-Whittle by analyzing its learning regret, and we also analyze the convergence properties of Q-learning-Whittle. Simulation results show that the proposed policies yield excellent empirical performance.
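Schematically, a Whittle index policy reduces to a top-$M$ index rule, as in the sketch below; `whittle_index` is a hypothetical per-service lookup table standing in for the closed-form indices derived in the paper:

```python
import numpy as np

# Schematic sketch of a Whittle index policy. `whittle_index` is a
# hypothetical table: whittle_index[s][q] is the index of service s when q
# of its requests are waiting (the paper derives this in closed form). The
# controller hosts/serves the M services with the largest current indices.

def whittle_policy(queue_lengths, whittle_index, M):
    """queue_lengths: current number of waiting requests per service.
    Returns the M services to host at the edge this decision epoch."""
    idx = np.array([whittle_index[s][q] for s, q in enumerate(queue_lengths)])
    return np.argsort(idx)[-M:]        # top-M index rule
```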
While many distributed optimization algorithms have been proposed for solving smooth or convex problems over networks, few of them can handle non-convex and non-smooth problems. Based on a proximal primal-dual approach, this paper presents a new (stochastic) distributed algorithm with Nesterov momentum for accelerated optimization of non-convex and non-smooth problems. Theoretically, we show that the proposed algorithm can achieve an $\epsilon$-stationary solution under a constant step size with $\mathcal{O}(1/\epsilon^2)$ computation complexity and $\mathcal{O}(1/\epsilon)$ communication complexity. Compared to existing gradient-tracking-based methods, the proposed algorithm has the same order of computation complexity but a lower order of communication complexity. To the best of our knowledge, this is the first stochastic algorithm to achieve $\mathcal{O}(1/\epsilon)$ communication complexity for non-convex and non-smooth problems. Numerical experiments on a distributed non-convex regression problem and a deep neural network based classification problem illustrate the effectiveness of the proposed algorithm.
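As a toy stand-in (not the proposed proximal primal-dual algorithm itself), the sketch below shows how a Nesterov extrapolation step combines with consensus averaging in a decentralized method on smooth local quadratics; the ring topology, mixing matrix, and step sizes are assumptions:

```python
import numpy as np

# Toy stand-in: decentralized gradient descent with Nesterov momentum over
# a ring of agents on smooth local quadratics, illustrating how a momentum
# (extrapolation) step combines with consensus mixing. This is NOT the
# paper's proximal primal-dual method; all constants are assumptions.

rng = np.random.default_rng(2)
n, d, lr, beta = 6, 4, 0.01, 0.6
A = rng.standard_normal((n, d, d))
A = A @ A.transpose(0, 2, 1) + np.eye(d)   # local objectives 0.5 x'A_i x - b_i'x
b = rng.standard_normal((n, d))
Wmix = np.zeros((n, n))                    # doubly stochastic ring mixing matrix
for i in range(n):
    Wmix[i, i] = 0.5
    Wmix[i, (i - 1) % n] = Wmix[i, (i + 1) % n] = 0.25
x = np.zeros((n, d)); x_prev = x.copy()
for _ in range(300):
    y = x + beta * (x - x_prev)            # Nesterov extrapolation
    grad = np.einsum('ijk,ik->ij', A, y) - b
    x_prev, x = x, Wmix @ y - lr * grad    # consensus mixing + gradient step
```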
Deep learning inference on embedded devices is a burgeoning field with myriad applications, because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained: their nearest mobile counterparts exhibit at least a 100x to 1,000x difference in compute capability, memory availability, and power consumption. As a result, machine-learning (ML) models and the associated ML inference framework must not only execute efficiently but also operate within a few kilobytes of memory. Moreover, the embedded-device ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems and that allow for cross-platform interoperability, including dynamic memory allocation and virtual memory. The hardware comes in many flavors (e.g., instruction-set architecture and FPU support, or lack thereof). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. We also present an evaluation demonstrating its low resource requirements and minimal run-time performance overhead.
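As a toy illustration of the interpreter-based approach (this is not TF Micro's actual C++ API), the sketch below dispatches each node of a serialized graph to a registered kernel; TF Micro additionally plans all tensor memory within one statically sized arena, which is omitted here:

```python
import numpy as np

# Illustrative toy of an interpreter-based inference engine (NOT TF Micro's
# real API): a model is a list of op nodes, and inference walks the graph,
# dispatching each node to a registered kernel. No code generation is
# needed, which is what makes the interpreter approach portable.

KERNELS = {
    "FULLY_CONNECTED": lambda x, w, b: x @ w + b,
    "RELU": lambda x: np.maximum(x, 0.0),
}

def invoke(graph, x):
    for op, *args in graph:            # interpret the graph node by node
        x = KERNELS[op](x, *args)
    return x

w = np.random.randn(4, 2).astype(np.float32)
graph = [("FULLY_CONNECTED", w, np.zeros(2, np.float32)), ("RELU",)]
print(invoke(graph, np.ones((1, 4), np.float32)))
```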