Abstract:The recent surge of interest in agile humanoid robots achieving dynamic tasks like jumping and flipping necessitates the concurrent design of a robot platform that combines exceptional hardware performance with effective control algorithms. This paper introduces the NING Humanoid, an agile and robust platform aimed at achieving human-like athletic capabilities. The NING humanoid features high-torque actuators, a resilient mechanical co-design based on the Centroidal dynamics, and a whole-body model predictive control (WB-MPC) framework. It stands at 1.1 meters tall and weighs 20 kg with 18 degrees of freedom (DOFs). It demonstrates impressive abilities such as walking, push recovery, and stair climbing at a high control bandwidth. Our presentation will encompass a hardware co-design, the control framework, as well as simulation and real-time experiments.
Abstract:Variational inference is a technique that approximates a target distribution by optimizing within the parameter space of variational families. On the other hand, Wasserstein gradient flows describe optimization within the space of probability measures where they do not necessarily admit a parametric density function. In this paper, we bridge the gap between these two methods. We demonstrate that, under certain conditions, the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow where its forward Euler scheme is the standard black-box variational inference algorithm. Specifically, the vector field of the gradient flow is generated via the path-derivative gradient estimator. We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow. Distillations can be extended to encompass $f$-divergences and non-Gaussian variational families. This extension yields a new gradient estimator for $f$-divergences, readily implementable using contemporary machine learning libraries like PyTorch or TensorFlow.
Abstract:Emotion detection presents challenges to intelligent human-robot interaction (HRI). Foundational deep learning techniques used in emotion detection are limited by information-constrained datasets or models that lack the necessary complexity to learn interactions between input data elements, such as the the variance of human emotions across different contexts. In the current effort, we introduce 1) MoEmo (Motion to Emotion), a cross-attention vision transformer (ViT) for human emotion detection within robotics systems based on 3D human pose estimations across various contexts, and 2) a data set that offers full-body videos of human movement and corresponding emotion labels based on human gestures and environmental contexts. Compared to existing approaches, our method effectively leverages the subtle connections between movement vectors of gestures and environmental contexts through the use of cross-attention on the extracted movement vectors of full-body human gestures/poses and feature maps of environmental contexts. We implement a cross-attention fusion model to combine movement vectors and environment contexts into a joint representation to derive emotion estimation. Leveraging our Naturalistic Motion Database, we train the MoEmo system to jointly analyze motion and context, yielding emotion detection that outperforms the current state-of-the-art.
Abstract:Estimating truncated density models is difficult, as these models have intractable normalising constants and hard to satisfy boundary conditions. Score matching can be adapted to solve the truncated density estimation problem, but requires a continuous weighting function which takes zero at the boundary and is positive elsewhere. Evaluation of such a weighting function (and its gradient) often requires a closed-form expression of the truncation boundary and finding a solution to a complicated optimisation problem. In this paper, we propose approximate Stein classes, which in turn leads to a relaxed Stein identity for truncated density estimation. We develop a novel discrepancy measure, truncated kernelised Stein discrepancy (TKSD), which does not require fixing a weighting function in advance, and can be evaluated using only samples on the boundary. We estimate a truncated density model by minimising the Lagrangian dual of TKSD. Finally, experiments show the accuracy of our method to be an improvement over previous works even without the explicit functional form of the boundary.
Abstract:Stein Variational Gradient Descent (SVGD) can transport particles along trajectories that reduce the KL divergence between the target and particle distribution but requires the target score function to compute the update. We introduce a new perspective on SVGD that views it as a local estimator of the reversed KL gradient flow. This perspective inspires us to propose new estimators that use local linear models to achieve the same purpose. The proposed estimators can be computed using only samples from the target and particle distribution without needing the target score function. Our proposed variational gradient estimators utilize local linear models, resulting in computational simplicity while maintaining effectiveness comparable to SVGD in terms of estimation biases. Additionally, we demonstrate that under a mild assumption, the estimation of high-dimensional gradient flow can be translated into a lower-dimensional estimation problem, leading to improved estimation accuracy. We validate our claims with experiments on both simulated and real-world datasets.
Abstract:Most multi-domain machine translation models rely on domain-annotated data. Unfortunately, domain labels are usually unavailable in both training processes and real translation scenarios. In this work, we propose a label-free multi-domain machine translation model which requires only a few or no domain-annotated data in training and no domain labels in inference. Our model is composed of three parts: a backbone model, a domain discriminator taking responsibility to discriminate data from different domains, and a set of experts that transfer the decoded features from generic to specific. We design a stage-wise training strategy and train the three parts sequentially. To leverage the extra domain knowledge and improve the training stability, in the discriminator training stage, domain differences are modeled explicitly with clustering and distilled into the discriminator through a multi-classification task. Meanwhile, the Gumbel-Max sampling is adopted as the routing scheme in the expert training stage to achieve the balance of each expert in specialization and generalization. Experimental results on the German-to-English translation task show that our model significantly improves BLEU scores on six different domains and even outperforms most of the models trained with domain-annotated data.
Abstract:Density Ratio Estimation (DRE) is an important machine learning technique with many downstream applications. We consider the challenge of DRE with missing not at random (MNAR) data. In this setting, we show that using standard DRE methods leads to biased results while our proposal (M-KLIEP), an adaptation of the popular DRE procedure KLIEP, restores consistency. Moreover, we provide finite sample estimation error bounds for M-KLIEP, which demonstrate minimax optimality with respect to both sample size and worst-case missingness. We then adapt an important downstream application of DRE, Neyman-Pearson (NP) classification, to this MNAR setting. Our procedure both controls Type I error and achieves high power, with high probability. Finally, we demonstrate promising empirical performance both synthetic data and real-world data with simulated missingness.
Abstract:The conventional understanding of adversarial training in generative adversarial networks (GANs) is that the discriminator is trained to estimate a divergence, and the generator learns to minimize this divergence. We argue that despite the fact that many variants of GANs were developed following this paradigm, the current theoretical understanding of GANs and their practical algorithms are inconsistent. In this paper, we leverage Wasserstein gradient flows which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration of GANs. We introduce a unified generative modeling framework - MonoFlow: the particle evolution is rescaled via a monotonically increasing mapping of the log density ratio. Under our framework, adversarial training can be viewed as a procedure first obtaining MonoFlow's vector field via training the discriminator and the generator learns to draw the particle flow defined by the corresponding vector field. We also reveal the fundamental difference between variational divergence minimization and adversarial training. This analysis helps us to identify what types of generator loss functions can lead to the successful training of GANs and suggest that GANs may have more loss designs beyond the literature (e.g., non-saturated loss), as long as they realize MonoFlow. Consistent empirical studies are included to validate the effectiveness of our framework.
Abstract:We introduce Sequential Neural Posterior Score Estimation (SNPSE) and Sequential Neural Likelihood Score Estimation (SNLSE), two new score-based methods for Bayesian inference in simulator-based models. Our methods, inspired by the success of score-based methods in generative modelling, leverage conditional score-based diffusion models to generate samples from the posterior distribution of interest. These models can be trained using one of two possible objective functions, one of which approximates the score of the intractable likelihood, while the other directly estimates the score of the posterior. We embed these models into a sequential training procedure, which guides simulations using the current approximation of the posterior at the observation of interest, thereby reducing the simulation cost. We validate our methods, as well as their amortised, non-sequential variants, on several numerical examples, demonstrating comparable or superior performance to existing state-of-the-art methods such as Sequential Neural Posterior Estimation (SNPE) and Sequential Neural Likelihood Estimation (SNLE).
Abstract:Noncontact particle manipulation (NPM) technology has significantly extended mankind's analysis capability into micro and nano scale, which in turn greatly promoted the development of material science and life science. Though NPM by means of electric, magnetic, and optical field has achieved great success, from the robotic perspective, it is still labor-intensive manipulation since professional human assistance is somehow mandatory in early preparation stage. Therefore, developing automated noncontact trapping of moving particles is worthwhile, particularly for applications where particle samples are rare, fragile or contact sensitive. Taking advantage of latest dynamic acoustic field modulating technology, and particularly by virtue of the great scalability of acoustic manipulation from micro-scale to sub-centimeter-scale, we propose an automated noncontact trapping of moving micro-particles with ultrasonic phased array system and microscopic vision in this paper. The main contribution of this work is for the first time, as far as we know, we achieved fully automated moving micro-particle trapping in acoustic NPM field by resorting to robotic approach. In short, the particle moving status is observed and predicted by binocular microscopic vision system, by referring to which the acoustic trapping zone is calculated and generated to capture and stably hold the particle. The problem of hand-eye relationship of noncontact robotic end-effector is also solved in this work. Experiments demonstrated the effectiveness of this work.