We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-TRACK) to guide the agent and monitor its progress. Specifically, we propose a milestone builder that tags the instructions with navigation and interaction milestones which the agent needs to complete step by step, and a milestone checker that systemically checks the agent's progress in its current milestone and determines when to proceed to the next. On the challenging ALFRED dataset, our M-TRACK leads to a notable 45% and 70% relative improvement in unseen success rate over two competitive base models.
The challenge of communication-efficient distributed optimization has attracted attention in recent years. In this paper, a communication efficient algorithm, called ordering-based alternating direction method of multipliers (OADMM) is devised in a general fully decentralized network setting where a worker can only exchange messages with neighbors. Compared to the classical ADMM, a key feature of OADMM is that transmissions are ordered among workers at each iteration such that a worker with the most informative data broadcasts its local variable to neighbors first, and neighbors who have not transmitted yet can update their local variables based on that received transmission. In OADMM, we prohibit workers from transmitting if their current local variables are not sufficiently different from their previously transmitted value. A variant of OADMM, called SOADMM, is proposed where transmissions are ordered but transmissions are never stopped for each node at each iteration. Numerical results demonstrate that given a targeted accuracy, OADMM can significantly reduce the number of communications compared to existing algorithms including ADMM. We also show numerically that SOADMM can accelerate convergence, resulting in communication savings compared to the classical ADMM.
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communications efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communications load compared to the best existing algorithms without slowing down the optimization process.
The ability of a radar to discriminate in both range and Doppler velocity is completely characterized by the ambiguity function (AF) of its transmit waveform. Mathematically, it is obtained by correlating the waveform with its Doppler-shifted and delayed replicas. We consider the inverse problem of designing a radar transmit waveform that satisfies the specified AF magnitude. This process can be viewed as a signal reconstruction with some variation of phase retrieval methods. We provide a trust-region algorithm that minimizes a smoothed non-convex least-squares objective function to iteratively recover the underlying signal-of-interest for either time- or band-limited support. The method first approximates the signal using an iterative spectral algorithm and then refines the attained initialization based upon a sequence of gradient iterations. Our theoretical analysis shows that unique signal reconstruction is possible using signal samples no more than thrice the number of signal frequencies or time samples. Numerical experiments demonstrate that our method recovers both time- and band-limited signals from even sparsely and randomly sampled AFs with mean-square-error of $1\times 10^{-6}$ and $9\times 10^{-2}$ for the full noiseless samples and sparse noisy samples, respectively.
This paper addresses recovery of a kernel $\boldsymbol{h}\in \mathbb{C}^{n}$ and a signal $\boldsymbol{x}\in \mathbb{C}^{n}$ from the low-resolution phaseless measurements of their noisy circular convolution $\boldsymbol{y} = \left \rvert \boldsymbol{F}_{lo}( \boldsymbol{x}\circledast \boldsymbol{h}) \right \rvert^{2} + \boldsymbol{\eta}$, where $\boldsymbol{F}_{lo}\in \mathbb{C}^{m\times n}$ stands for a partial discrete Fourier transform ($m<n$), $\boldsymbol{\eta}$ models the noise, and $\lvert \cdot \rvert$ is the element-wise absolute value function. This problem is severely ill-posed because both the kernel and signal are unknown and, in addition, the measurements are phaseless, leading to many $\boldsymbol{x}$-$\boldsymbol{h}$ pairs that correspond to the measurements. Therefore, to guarantee a stable recovery of $\boldsymbol{x}$ and $\boldsymbol{h}$ from $\boldsymbol{y}$, we assume that the kernel $\boldsymbol{h}$ and the signal $\boldsymbol{x}$ lie in known subspaces of dimensions $k$ and $s$, respectively, such that $m\gg k+s$. We solve this problem by proposing a blind deconvolution algorithm for phaseless super-resolution (BliPhaSu) to minimize a non-convex least-squares objective function. The method first estimates a low-resolution version of both signals through a spectral algorithm, which are then refined based upon a sequence of stochastic gradient iterations. We show that our BliPhaSu algorithm converges linearly to a pair of true signals on expectation under a proper initialization that is based on spectral method. Numerical results from experimental data demonstrate perfect recovery of both $\boldsymbol{h}$ and $\boldsymbol{x}$ using our method.
Retrieving a signal from the Fourier transform of its third-order statistics or bispectrum arises in a wide range of signal processing problems. Conventional methods do not provide a unique inversion of bispectrum. In this paper, we present a an approach that uniquely recovers signals with finite spectral support (band-limited signals) from at least $3B$ measurements of its bispectrum function (BF), where $B$ is the signal's bandwidth. Our approach also extends to time-limited signals. We propose a two-step trust region algorithm that minimizes a non-convex objective function. First, we approximate the signal by a spectral algorithm. Then, we refine the attained initialization based upon a sequence of gradient iterations. Numerical experiments suggest that our proposed algorithm is able to estimate band/time-limited signals from its BF for both complete and undersampled observations.
We consider a general spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown at the receiver. In this \textit{dual-blind deconvolution} (DBD) problem, a common receiver admits the multi-carrier wireless communications signal that is overlaid with the radar signal reflected-off multiple targets. When the radar receiver is not collocated with the transmitter, such as in passive or multistatic radars, the transmitted signal is also unknown apart from the target parameters. Similarly, apart from the transmitted messages, the communications channel may also be unknown in dynamic environments such as vehicular networks. As a result, the estimation of unknown target and communications parameters in a DBD scenario is highly challenging. In this work, we exploit the sparsity of the channel to solve DBD by casting it as an atomic norm minimization problem. Our theoretical analyses and numerical experiments demonstrate perfect recovery of continuous-valued range-time and Doppler velocities of multiple targets as well as delay-Doppler communications channel parameters using uniformly-spaced time samples in the dual-blind receiver.
Robustness is key to engineering, automation, and science as a whole. However, the property of robustness is often underpinned by costly requirements such as over-provisioning, known uncertainty and predictive models, and known adversaries. These conditions are idealistic, and often not satisfiable. Resilience on the other hand is the capability to endure unexpected disruptions, to recover swiftly from negative events, and bounce back to normality. In this survey article, we analyze how resilience is achieved in networks of agents and multi-robot systems that are able to overcome adversity by leveraging system-wide complementarity, diversity, and redundancy - often involving a reconfiguration of robotic capabilities to provide some key ability that was not present in the system a priori. As society increasingly depends on connected automated systems to provide key infrastructure services (e.g., logistics, transport, and precision agriculture), providing the means to achieving resilient multi-robot systems is paramount. By enumerating the consequences of a system that is not resilient (fragile), we argue that resilience must become a central engineering design consideration. Towards this goal, the community needs to gain clarity on how it is defined, measured, and maintained. We address these questions across foundational robotics domains, spanning perception, control, planning, and learning. One of our key contributions is a formal taxonomy of approaches, which also helps us discuss the defining factors and stressors for a resilient system. Finally, this survey article gives insight as to how resilience may be achieved. Importantly, we highlight open problems that remain to be tackled in order to reap the benefits of resilient robotic systems.
In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional and a graph neural network (CNN / GNN), addressing agent-level visual perception and feature learning, as well as swarm-level communication, local information aggregation and agent action inference, respectively. By jointly training the CNN and GNN, image features and communication messages are learned in conjunction to better address the specific task. We use imitation learning to train the VGAI controller in an offline phase, relying on a centralized expert controller. This results in a learned VGAI controller that can be deployed in a distributed manner for online execution. Additionally, the controller exhibits good scaling properties, with training in smaller teams and application in larger teams. Through a multi-agent flocking application, we demonstrate that VGAI yields performance comparable to or better than other decentralized controllers, using only the visual input modality and without accessing precise location or motion state information.
Graph neural networks (GNNs) are processing architectures that exploit graph structural information to model representations from network data. Despite their success, GNNs suffer from sub-optimal generalization performance given limited training data, referred to as over-fitting. This paper proposes Topology Adaptive Edge Dropping (TADropEdge) method as an adaptive data augmentation technique to improve generalization performance and learn robust GNN models. We start by explicitly analyzing how random edge dropping increases the data diversity during training, while indicating i.i.d. edge dropping does not account for graph structural information and could result in noisy augmented data degrading performance. To overcome this issue, we consider graph connectivity as the key property that captures graph topology. TADropEdge incorporates this factor into random edge dropping such that the edge-dropped subgraphs maintain similar topology as the underlying graph, yielding more satisfactory data augmentation. In particular, TADropEdge first leverages the graph spectrum to assign proper weights to graph edges, which represent their criticality for establishing the graph connectivity. It then normalizes the edge weights and drops graph edges adaptively based on their normalized weights. Besides improving generalization performance, TADropEdge reduces variance for efficient training and can be applied as a generic method modular to different GNN models. Intensive experiments on real-life and synthetic datasets corroborate theory and verify the effectiveness of the proposed method.