Advanced service robots require superior tactile intelligence to guarantee human-contact safety and to provide essential supplements to visual and auditory information for human-robot interaction, especially when a robot is in physical contact with a human. Tactile intelligence is an essential capability of perception and recognition from tactile information, based on the learning from a large amount of tactile data and the understanding of the physical meaning behind the data. This report introduces a recently collected and organized dataset "TacAct" that encloses real-time pressure distribution when a human subject touches the arms of a nursing-care robot. The dataset consists of information from 50 subjects who performed a total of 24,000 touch actions. Furthermore, the details of the dataset are described, the data are preliminarily analyzed, and the validity of the collected information is tested through a convolutional neural network LeNet-5 classifying different types of touch actions. We believe that the TacAct dataset would be more than beneficial for the community of human interactive robots to understand the tactile profile under various circumstances.
Human touching the robot to convey intentions or emotions is an essential communication pathway during physical Human-Robot Interaction (pHRI). Therefore, advanced service robots require superior tactile intelligence to guarantee naturalness and safety when making physical contact with human subjects. Tactile intelligence is the capability to percept and recognize tactile information from touch behaviors, in which understanding the physical meaning of touching actions is crucial. For this purpose, this report introduces a recently collected and organized dataset "TacAct" that encloses real-time tactile information when human subjects touched the test device mimicking a robot forearm. The dataset contains 12 types of 24,000 touch actions from 50 subjects. The dataset details are described, the data are preliminarily analyzed, and the validity of the dataset is tested through a convolutional neural network LeNet-5 which classifying different types of touch actions. We believe that the TacAct dataset would be beneficial for the community to understand the touch intention under various circumstances and to develop learning-based intelligent algorithms for different applications.
Grasping in cluttered scenes has always been a great challenge for robots, due to the requirement of the ability to well understand the scene and object information. Previous works usually assume that the geometry information of the objects is available, or utilize a step-wise, multi-stage strategy to predict the feasible 6-DoF grasp poses. In this work, we propose to formalize the 6-DoF grasp pose estimation as a simultaneous multi-task learning problem. In a unified framework, we jointly predict the feasible 6-DoF grasp poses, instance semantic segmentation, and collision information. The whole framework is jointly optimized and end-to-end differentiable. Our model is evaluated on large-scale benchmarks as well as the real robot system. On the public dataset, our method outperforms prior state-of-the-art methods by a large margin (+4.08 AP). We also demonstrate the implementation of our model on a real robotic platform and show that the robot can accurately grasp target objects in cluttered scenarios with a high success rate. Project link: https://openbyterobotics.github.io/sscl
We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis (NVS) task with multi-view images as input. Existing neural scene representations for solving the NVS problem, such as NeRF, cannot generalize to new scenes and take an excessively long time on training on each new scene from scratch. The other subsequent neural rendering methods based on stereo matching, such as PixelNeRF, SRF and IBRNet are designed to generalize to unseen scenes but suffer from view inconsistency in complex scenes with self-occlusions. To address these issues, our NeuRay method represents every scene by encoding the visibility of rays associated with the input views. This neural representation can efficiently be initialized from depths estimated by external MVS methods, which is able to generalize to new scenes and achieves satisfactory rendering images without any training on the scene. Then, the initialized NeuRay can be further optimized on every scene with little training timing to enforce spatial coherence to ensure view consistency in the presence of severe self-occlusion. Experiments demonstrate that NeuRay can quickly generate high-quality novel view images of unseen scenes with little finetuning and can handle complex scenes with severe self-occlusions which previous methods struggle with.
In real-world applications, data often come in a growing manner, where the data volume and the number of classes may increase dynamically. This will bring a critical challenge for learning: given the increasing data volume or the number of classes, one has to instantaneously adjust the neural model capacity to obtain promising performance. Existing methods either ignore the growing nature of data or seek to independently search an optimal architecture for a given dataset, and thus are incapable of promptly adjusting the architectures for the changed data. To address this, we present a neural architecture adaptation method, namely Adaptation eXpert (AdaXpert), to efficiently adjust previous architectures on the growing data. Specifically, we introduce an architecture adjuster to generate a suitable architecture for each data snapshot, based on the previous architecture and the different extent between current and previous data distributions. Furthermore, we propose an adaptation condition to determine the necessity of adjustment, thereby avoiding unnecessary and time-consuming adjustments. Extensive experiments on two growth scenarios (increasing data volume and number of classes) demonstrate the effectiveness of the proposed method.
In recent decades, with the emergence of numerous novel intelligent optimization algorithms, many optimization researchers have begun to look for a basic search mechanism for their schemes that provides a more essential explanation of their studies. This paper aims to study the basic mechanism of an algorithm for black-box optimization with quantum theory. To achieve this goal, the Schroedinger equation is employed to establish the relationship between the optimization problem and the quantum system, which makes it possible to study the dynamic search behaviors in the evolution process with quantum theory. Moreover, to explore the basic behavior of the optimization system, the optimization problem is assumed to be decomposed and approximated. Then, a multilevel approximation quantum dynamics model of the optimization algorithm is established, which provides a mathematical and physical framework for the analysis of the optimization algorithm. Correspondingly, the basic search behavior based on this model is derived, which is governed by quantum theory. Comparison experiments and analysis between different bare-bones algorithms confirm the existence of the quantum mechanic based basic search mechanism of the algorithm on black-box optimization.
We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground mask as supervision, easily get trapped in local minima, and therefore struggle with the reconstruction of objects with severe self-occlusion or thin structures. Meanwhile, recent neural methods for novel view synthesis, such as NeRF and its variants, use volume rendering to produce a neural scene representation with robustness of optimization, even for highly complex objects. However, extracting high-quality surfaces from this learned implicit representation is difficult because there are not sufficient surface constraints in the representation. In NeuS, we propose to represent a surface as the zero-level set of a signed distance function (SDF) and develop a new volume rendering method to train a neural SDF representation. We observe that the conventional volume rendering method causes inherent geometric errors (i.e. bias) for surface reconstruction, and therefore propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without the mask supervision. Experiments on the DTU dataset and the BlendedMVS dataset show that NeuS outperforms the state-of-the-arts in high-quality surface reconstruction, especially for objects and scenes with complex structures and self-occlusion.
High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored in previous Neural Architecture Search (NAS) methods that focus on image classification. This work proposes a novel NAS method, called HR-NAS, which is able to find efficient and accurate networks for different tasks, by effectively encoding multiscale contextual information while maintaining high-resolution representations. In HR-NAS, we renovate the NAS search space as well as its searching strategy. To better encode multiscale image contexts in the search space of HR-NAS, we first carefully design a lightweight transformer, whose computational complexity can be dynamically changed with respect to different objective functions and computation budgets. To maintain high-resolution representations of the learned networks, HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, inspired by HRNet. Last, we proposed an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space, and finds optimal architectures given various tasks and computation resources. HR-NAS is capable of achieving state-of-the-art trade-offs between performance and FLOPs for three dense prediction tasks and an image classification task, given only small computational budgets. For example, HR-NAS surpasses SqueezeNAS that is specially designed for semantic segmentation while improving efficiency by 45.9%. Code is available at https://github.com/dingmyu/HR-NAS
The ever-increasing demand for intelligent, automated, and connected mobility solutions pushes for the development of an innovative sixth Generation (6G) of cellular networks. A radical transformation on the physical layer of vehicular communications is planned, with a paradigm shift towards beam-based millimeter Waves or sub-Terahertz communications, which require precise beam pointing for guaranteeing the communication link, especially in high mobility. A key design aspect is a fast and proactive Initial Access (IA) algorithm to select the optimal beam to be used. In this work, we investigate alternative IA techniques to fasten the current fifth-generation (5G) standard, targeting an efficient 6G design. First, we discuss cooperative position-based schemes that rely on the position information. Then, motivated by the intuition of a non-uniform distribution of the communication directions due to road topology constraints, we design two Probabilistic Codebook (PCB) techniques of prioritized beams. In the first one, the PCBs are built leveraging past collected traffic information, while in the second one, we use the Hough Transform over the digital map to extract dominant road directions. We also show that the information coming from the angular probability distribution allows designing non-uniform codebook quantization, reducing the degradation of the performances compared to uniform one. Numerical simulation on realistic scenarios shows that PCBs-based beam selection outperforms the 5G standard in terms of the number of IA trials, with a performance comparable to position-based methods, without requiring the signaling of sensitive information.
Generative Adversarial Networks (GAN) have promoted a variety of applications in computer vision, natural language processing, etc. due to its generative model's compelling ability to generate realistic examples plausibly drawn from an existing distribution of samples. GAN not only provides impressive performance on data generation-based tasks but also stimulates fertilization for privacy and security oriented research because of its game theoretic optimization strategy. Unfortunately, there are no comprehensive surveys on GAN in privacy and security, which motivates this survey paper to summarize those state-of-the-art works systematically. The existing works are classified into proper categories based on privacy and security functions, and this survey paper conducts a comprehensive analysis of their advantages and drawbacks. Considering that GAN in privacy and security is still at a very initial stage and has imposed unique challenges that are yet to be well addressed, this paper also sheds light on some potential privacy and security applications with GAN and elaborates on some future research directions.