In this paper, we present a novel learning-based shared control framework. This framework deploys first-order Dynamical Systems (DS) as motion generators providing the desired reference motion, and a Variable Stiffness Dynamical Systems (VSDS) \cite{chen2021closed} for haptic guidance. We show how to shape several features of our controller in order to achieve authority allocation, local motion refinement, in addition to the inherent ability of the controller to automatically synchronize with the human state during joint task execution. We validate our approach in a teleoperated task scenario, where we also showcase the ability of our framework to deal with situations that require updating task knowledge due to possible changes in the task scenario, or changes in the environment. Finally, we conduct a user study to compare the performance of our VSDS controller for guidance generation to two state-of-the-art controllers in a target reaching task. The result shows that our VSDS controller has the highest successful rate of task execution among all conditions. Besides, our VSDS controller helps reduce the execution time and task load significantly, and was selected as the most favorable controller by participants.
Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released in https://github.com/xavihart/Diff-PGD
Given a visual scene, humans have strong intuitions about how a scene can evolve over time under given actions. The intuition, often termed visual intuitive physics, is a critical ability that allows us to make effective plans to manipulate the scene to achieve desired outcomes without relying on extensive trial and error. In this paper, we present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids. Our method is composed of a conditional Neural Radiance Field (NeRF)-style visual frontend and a 3D point-based dynamics prediction backend, using which we can impose strong relational and structural inductive bias to capture the structure of the underlying environment. Unlike existing intuitive point-based dynamics works that rely on the supervision of dense point trajectory from simulators, we relax the requirements and only assume access to multi-view RGB images and (imperfect) instance masks acquired using color prior. This enables the proposed model to handle scenarios where accurate point estimation and tracking are hard or impossible. We generate datasets including three challenging scenarios involving fluid, granular materials, and rigid objects in the simulation. The datasets do not include any dense particle information so most previous 3D-based intuitive physics pipelines can barely deal with that. We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space. We also show that once trained, our model can achieve strong generalization in complex scenarios under extrapolate settings.
Due to the unfamiliarity to particular words(or proper nouns) for ingredients, non-native English speakers can be extremely confused about the ordering process in restaurants like Subway. Thus, We developed a dialogue system, which supports Chinese(Mandarin)1 and English2 at the same time. In other words, users can switch arbitrarily between Chinese(Mandarin) and English as the conversation is being conducted. This system is specifically designed for Subway ordering3. In BilDOS, we designed a Discriminator module to tell the language is being used in inputted user utterance, a Translator module to translate used language into English if it is not English, and a Dialogue Manager module to detect the intention within inputted user utterances, handle outlier inputs by throwing clarification requests, map detected Intention and detailed Keyword4 into a particular intention class, locate the current ordering process, continue to give queries to finish the order, conclude the order details once the order is completed, activate the evaluation process when the conversation is done.
Despite the recent advances of graph neural networks (GNNs) in modeling graph data, the training of GNNs on large datasets is notoriously hard due to the overfitting. Adversarial training, which augments data with the worst-case adversarial examples, has been widely demonstrated to improve model's robustness against adversarial attacks and generalization ability. However, while the previous adversarial training generally focuses on protecting GNNs from spiteful attacks, it remains unclear how the adversarial training could improve the generalization abilities of GNNs in the graph analytics problem. In this paper, we investigate GNNs from the lens of weight and feature loss landscapes, i.e., the loss changes with respect to model weights and node features, respectively. We draw the conclusion that GNNs are prone to falling into sharp local minima in these two loss landscapes, where GNNs possess poor generalization performances. To tackle this problem, we construct the co-adversarial perturbation (CAP) optimization problem in terms of weights and features, and design the alternating adversarial perturbation algorithm to flatten the weight and feature loss landscapes alternately. Furthermore, we divide the training process into two stages: one conducting the standard cross-entropy minimization to ensure the quick convergence of GNN models, the other applying our alternating adversarial training to avoid falling into locally sharp minima. The extensive experiments demonstrate our CAP can generally improve the generalization performance of GNNs on a variety of benchmark graph datasets.
This paper proposes a hypothesis for the aesthetic appreciation that aesthetic images make a neural network strengthen salient concepts and discard inessential concepts. In order to verify this hypothesis, we use multi-variate interactions to represent salient concepts and inessential concepts contained in images. Furthermore, we design a set of operations to revise images towards more beautiful ones. In experiments, we find that the revised images are more aesthetic than the original ones to some extent.
This paper proposes a set of criteria to evaluate the objectiveness of explanation methods of neural networks, which is crucial for the development of explainable AI, but it also presents significant challenges. The core challenge is that people usually cannot obtain ground-truth explanations of the neural network. To this end, we design four metrics to evaluate explanation results without ground-truth explanations. Our metrics can be broadly applied to nine benchmark methods of interpreting neural networks, which provides new insights of explanation methods.