Orbital angular momentum (OAM) radars can estimate the azimuth angle and rotation velocity of multiple targets without relative motion or beam scanning. Moreover, OAM wireless communications can achieve high spectral efficiency (SE) by multiplexing a set of information-bearing modes on the same frequency channel. Benefiting from these advantages, in this paper we design a novel radar-centric joint OAM radar-communication (RadCom) scheme based on uniform circular arrays (UCAs), which modulates information signals onto the existing OAM radar waveform. In detail, we first propose an OAM-based three-dimensional (3-D) super-resolution position estimation and rotation velocity detection method, which accurately estimates the 3-D position and rotation velocity of multiple targets. Then, we derive the posterior Cramér-Rao bound (PCRB) of the OAM-based estimates and, finally, analyze the transmission rate of the integrated communication system. To achieve the best trade-off between imaging and communication, the transmitted integrated OAM beams are optimized via an exhaustive search. Both mathematical analysis and simulation results show that the proposed radar-centric joint OAM RadCom scheme can accurately estimate the 3-D position and rotation velocity of multiple targets while ensuring the transmission rate of the communication receiver, making it an effective complement to existing joint RadCom schemes.
The need to increase the flexibility of production lines is calling for robots to collaborate with human workers. However, existing interactive industrial robots only guarantee intrinsic safety (reducing collision impact), but not interactive safety (collision avoidance), which greatly limits their flexibility. The issue arises from two limitations of existing control software for industrial robots: 1) lack of support for real-time trajectory modification; 2) lack of intelligent safe control algorithms with guaranteed collision avoidance under robot dynamics constraints. The first issue was addressed previously by a jerk-bounded position controller (JPC). This paper addresses the second limitation, building on top of the JPC. Specifically, we introduce a jerk-based safe set algorithm (JSSA) that ensures collision avoidance while accounting for the robot dynamics constraints. The JSSA greatly extends the scope of the original safe set algorithm, which had only been applied to second-order systems with unbounded accelerations. The JSSA is implemented on the FANUC LR Mate 200id/7L robot and validated on HRI tasks. Experiments show that the JSSA consistently keeps the robot at a safe distance from the human while executing the designated task.
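To illustrate the safe-set idea the JSSA builds on, here is a minimal sketch for a simplified point robot: a safety index becomes positive when the robot is, or is about to be, too close to the human, and the nominal task control is modified only in that case. All function names, the index form, and the gains are illustrative assumptions; the actual JSSA operates at the jerk level under the JPC's bounds, which this sketch does not model.

```python
import numpy as np

def safety_index(p_robot, p_human, v_rel, d_min=0.3, k=0.5):
    """Illustrative safety index: phi = d_min^2 - d^2 - k * d_dot.

    phi > 0 flags an unsafe (or soon-to-be unsafe) state; d is the
    robot-human distance and d_dot its time derivative (negative
    when the robot approaches the human).
    """
    diff = p_robot - p_human
    d = np.linalg.norm(diff)
    d_dot = diff @ v_rel / d  # rate of change of the distance
    return d_min**2 - d**2 - k * d_dot

def project_to_safe(u_nominal, p_robot, p_human, v_rel, eta=0.1, dt=0.01):
    """Keep the nominal control when safe; otherwise add a corrective
    term directed away from the human, scaled to drive phi back down."""
    phi = safety_index(p_robot, p_human, v_rel)
    if phi < 0:
        return u_nominal  # inside the safe set: execute the task as planned
    away = (p_robot - p_human) / np.linalg.norm(p_robot - p_human)
    return u_nominal + (phi / dt + eta) * away
```

The key property, shared with the full algorithm, is that the safety layer is inactive whenever the index is negative, so the task controller is untouched in nominal operation.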
For robots to be effectively deployed in novel environments and tasks, they must be able to understand the feedback humans express during intervention, which can either correct undesirable behavior or indicate additional preferences. Existing methods either require repeated episodes of interaction or assume prior knowledge of reward features, which is data-inefficient and transfers poorly to new tasks. We relax these assumptions by describing human tasks in terms of object-centric sub-tasks and interpreting physical interventions in relation to specific objects. Our method, Object Preference Adaptation (OPA), is composed of two key stages: 1) pre-training a base policy to produce a wide variety of behaviors, and 2) online-updating only certain weights in the model according to human feedback. The key to our fast yet simple adaptation is that the general interaction dynamics between agents and objects are kept fixed, and only object-specific preferences are updated. Our adaptation occurs online, requires only one human intervention (one-shot), and produces new behaviors never seen during training. Trained on cheap synthetic data instead of expensive human demonstrations, our policy demonstrates impressive adaptation to human perturbations on challenging, realistic tasks in our user study. Videos, code, and supplementary material are provided.
Delicate industrial insertion tasks (e.g., PC board assembly) remain challenging for industrial robots. The challenges include low error tolerance, delicacy of the components, and large task variations with respect to the components to be inserted. To deliver a feasible robotic solution for these insertion tasks, we also need to account for the hardware limits of existing robotic systems and minimize the integration effort. This paper proposes a composable framework for efficiently integrating a safe insertion policy on existing robotic platforms to accomplish these insertion tasks. The policy has an interpretable, modularized design and can be learned efficiently on hardware and transferred easily to new tasks. In particular, the policy includes a safe insertion agent as a baseline policy for insertion, an optimal configurable Cartesian tracker as an interface to the robot hardware, a probabilistic inference module to handle component variety and insertion errors, and a safe learning module to optimize the parameters of the aforementioned modules for the best performance on the designated hardware. Experimental results on a UR10 robot show that the proposed framework achieves safety (for the delicacy of components), accuracy (for low tolerance), robustness (against perception errors and component defects), adaptability and transferability (for task variations), as well as task efficiency during execution and data and time efficiency during learning.
In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both the academic and industrial communities. Inspired by the underlying mechanism of these sensors, we design a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline generates depth maps with material-dependent error patterns similar to those of a real depth sensor. We conduct extensive experiments showing that perception algorithms and reinforcement learning policies trained on our simulation platform transfer well to real-world test cases without any fine-tuning. Furthermore, due to its high degree of realism, our depth sensor simulator can serve as a convenient testbed for evaluating algorithm performance in the real world, largely reducing the human effort in developing robotic algorithms. The entire pipeline has been integrated into the SAPIEN simulator and is open-sourced to promote research in the vision and robotics communities.
Digital cameras suffer from a major limitation: the image and video formats inherited from film cameras prevent them from capturing the rapidly changing photonic world. Here, we present vidar, a bit sequence array in which each bit represents whether the accumulation of photons has reached a threshold, to record and reconstruct the scene radiance at any moment. Using only consumer-level CMOS sensors and integrated circuits, we have developed a vidar camera that is 1,000x faster than conventional cameras. By treating vidar as spike trains analogous to those in biological vision, we have further developed a spiking neural network-based machine vision system that combines the speed of machines with the mechanisms of biological vision, achieving high-speed object detection and tracking 1,000x faster than human vision. We demonstrate the utility of the vidar camera and the super vision system in an assistant referee and target pointing system. Our study is expected to fundamentally revolutionize the concepts of image and video and their related industries, including photography, movies, and visual media, and to open a new era of spiking neural network-enabled machine vision free of frame-rate limits.
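The per-pixel accumulate-and-threshold mechanism described above can be sketched in a few lines. This is a toy simulation under idealized assumptions (noise-free accumulation, uniform threshold, discrete time steps); the function names are illustrative, not from the paper.

```python
import numpy as np

def vidar_spikes(radiance, n_steps, threshold=1.0):
    """Simulate a vidar pixel array: each time step, every pixel
    accumulates photons in proportion to scene radiance; when the
    accumulator crosses the threshold, the pixel emits a 1-bit and
    resets by subtracting the threshold (integrate-and-fire).

    radiance : 2-D array of per-pixel intensity (arbitrary units/step)
    returns  : (n_steps, H, W) binary spike array
    """
    acc = np.zeros_like(radiance, dtype=float)
    spikes = np.zeros((n_steps,) + radiance.shape, dtype=np.uint8)
    for t in range(n_steps):
        acc += radiance               # photon accumulation
        fired = acc >= threshold
        spikes[t] = fired
        acc[fired] -= threshold       # reset-by-subtraction
    return spikes

def reconstruct(spikes):
    """Recover radiance from the spike rate (spikes per time step):
    a brighter pixel fires more often, so the mean over time
    approximates the radiance relative to the threshold."""
    return spikes.mean(axis=0)
```

Because the bit stream preserves the firing times, radiance can be estimated over any time window, which is what allows reconstruction "at any moment" rather than at a fixed frame rate.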
We introduce a novel masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data. Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training. MGAE has two core designs. First, we find that masking a high ratio of the input graph structure, e.g., $70\%$, yields a nontrivial and meaningful self-supervisory task that benefits downstream applications. Second, we employ a graph neural network (GNN) as an encoder to perform message propagation on the partially masked graph. To reconstruct the large number of masked edges, a tailored cross-correlation decoder is proposed, which captures the cross-correlation between the head and tail nodes of an anchor edge at multiple granularities. Coupling these two designs enables MGAE to be trained efficiently and effectively. Extensive experiments on multiple open datasets (Planetoid and OGB benchmarks) demonstrate that MGAE generally outperforms state-of-the-art unsupervised learning competitors on link prediction and node classification.
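The edge-masking step can be sketched as follows: the visible subgraph is fed to the GNN encoder, while the masked edges become reconstruction targets. The simple inner-product scorer below is only a placeholder for the paper's multi-granularity cross-correlation decoder; all names are illustrative.

```python
import numpy as np

def mask_edges(edge_index, mask_ratio=0.7, seed=0):
    """Randomly split edges into a visible set (encoder input) and a
    masked set (reconstruction targets).

    edge_index : (2, E) array of (head, tail) node pairs
    returns    : (visible_edges, masked_edges)
    """
    rng = np.random.default_rng(seed)
    n_edges = edge_index.shape[1]
    perm = rng.permutation(n_edges)
    n_masked = int(mask_ratio * n_edges)
    masked = edge_index[:, perm[:n_masked]]
    visible = edge_index[:, perm[n_masked:]]
    return visible, masked

def edge_score(z, edges):
    """Placeholder decoder: score an edge by the dot product of its
    head and tail node embeddings z (shape (N, d))."""
    return (z[edges[0]] * z[edges[1]]).sum(axis=1)
```

Training would then maximize `edge_score` on the masked (positive) edges against sampled negative node pairs, with message passing running only over the visible edges.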
We study real-time collaborative robot (cobot) handling, where the cobot maneuvers a workpiece under human commands. This is useful when it is risky for humans to handle the workpiece directly. However, it is hard to make the cobot both easy to command and flexible in its possible operations. In this work, we propose a Real-Time Collaborative Robot Handling (RTCoHand) framework that allows control of the cobot via user-customized dynamic gestures. This is challenging due to variations among users, human motion uncertainties, and noisy human input. We model the task as a probabilistic generative process, referred to as the Conditional Collaborative Handling Process (CCHP), and learn it from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot handling task with a Kinova Gen3 robot arm. We achieve seamless human-robot collaboration with both experienced and new users. Compared to classical controllers, RTCoHand enables significantly more complex maneuvers with lower user cognitive burden. It also eliminates the need for trial-and-error, rendering it advantageous in safety-critical tasks.
Traditional depth sensors generate accurate real-world depth estimates that surpass even the most advanced learning approaches trained only on simulation domains. Since ground truth depth is readily available in the simulation domain but quite difficult to obtain in the real domain, we propose a method that leverages the best of both worlds. In this paper we present a new framework, ActiveZero, a mixed-domain learning solution for active stereovision systems that requires no real-world depth annotation. First, we demonstrate the transferability of our method to out-of-distribution real data by using a mixed-domain learning strategy. In the simulation domain, we use a combination of supervised disparity loss and self-supervised losses on a shape primitives dataset. By contrast, in the real domain, we use only self-supervised losses on a dataset that is out-of-distribution from both the training simulation data and the test real data. Second, our method introduces a novel self-supervised loss called temporal IR reprojection to increase the robustness and accuracy of our reprojections in hard-to-perceive regions. Finally, we show how the method can be trained end-to-end and that each module is important for attaining the end result. Extensive qualitative and quantitative evaluations on real data demonstrate state-of-the-art results that can even beat a commercial depth sensor.
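For readers unfamiliar with self-supervised stereo losses, the generic reprojection idea underlying such training can be sketched as below: warp one IR view into the other using the predicted disparity and penalize the photometric difference. This is a bare-bones sketch with nearest-neighbour sampling (a real implementation would interpolate bilinearly and mask occlusions), and it does not capture the temporal component of the paper's temporal IR reprojection loss; all names are illustrative.

```python
import numpy as np

def reproject_left_from_right(right_img, disparity):
    """Warp the right image into the left view with predicted disparity:
    left_hat[y, x] = right[y, x - disparity[y, x]] (nearest-neighbour
    sampling, columns clipped at the image border)."""
    h, w = right_img.shape
    xs = np.arange(w)[None, :] - np.round(disparity).astype(int)
    xs = np.clip(xs, 0, w - 1)
    ys = np.repeat(np.arange(h)[:, None], w, axis=1)
    return right_img[ys, xs]

def reprojection_loss(left_img, right_img, disparity):
    """Self-supervised photometric loss: mean absolute error between the
    left image and the disparity-warped right image. Gradients through
    the disparity drive the network without any depth labels."""
    return np.abs(left_img - reproject_left_from_right(right_img, disparity)).mean()
```

The loss is zero exactly when the predicted disparity explains the left view from the right view, which is what lets the real-domain branch train without depth annotation.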
Orbital angular momentum (OAM) at radio frequency (RF) has attracted increasing attention as a novel approach to multiplexing a set of orthogonal OAM modes on the same frequency channel to achieve high spectral efficiency (SE). However, the precondition for maintaining orthogonality among different OAM modes is perfect alignment of the transmit and receive uniform circular arrays (UCAs), which is difficult to satisfy in practical wireless communication scenarios. Therefore, to achieve viable multi-mode OAM broadband wireless communication, we first investigate the effect of oblique angles on the transmission performance of the multi-mode OAM broadband system in the non-parallel misalignment case. Then, we compare the UCA-based RF analog and baseband digital transceiver structures and their corresponding beam steering schemes. Mathematical analysis and numerical simulations validate that the SE of the misaligned multi-mode OAM broadband system is quite low, while both analog and digital beam steering can significantly improve the SE of the system. However, digital beam steering achieves higher SE than analog beam steering, especially when the bandwidth and the number of array elements are large, which validates that a baseband digital transceiver with digital beam steering is more suitable for practical multi-mode OAM broadband wireless communication systems.
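The mode orthogonality that perfect UCA alignment preserves can be illustrated numerically: exciting an N-element UCA with a per-element phase increment of 2*pi*l/N generates OAM mode l, and the resulting weight vectors for distinct modes are mutually orthogonal. This is a textbook-level sketch under idealized assumptions (ideal isotropic elements, no channel); the function names are illustrative.

```python
import numpy as np

def uca_oam_weights(n_elements, mode):
    """Per-element excitation of a UCA generating OAM mode l: element n
    carries phase 2*pi*l*n/N, producing the e^{j*l*phi} azimuthal
    phase front (normalized to unit energy)."""
    n = np.arange(n_elements)
    return np.exp(1j * 2 * np.pi * mode * n / n_elements) / np.sqrt(n_elements)

def mode_correlation(n_elements, l1, l2):
    """Magnitude of the inner product of two mode weight vectors:
    1 for identical modes, 0 for distinct modes (mod N). Misalignment
    of the receive UCA breaks exactly this property."""
    w1 = uca_oam_weights(n_elements, l1)
    w2 = uca_oam_weights(n_elements, l2)
    return np.abs(np.vdot(w1, w2))
```

A misaligned receive array effectively samples the phase front at perturbed azimuths, so the off-diagonal correlations become nonzero and inter-mode interference degrades the SE, which is what beam steering compensates for.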