Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kris Kitani

SMPLOlympics: Sports Environments for Physically Simulated Humanoids

Jun 28, 2024

Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang(+2 more)

Figure 1 for SMPLOlympics: Sports Environments for Physically Simulated Humanoids

Figure 2 for SMPLOlympics: Sports Environments for Physically Simulated Humanoids

Figure 3 for SMPLOlympics: Sports Environments for Physically Simulated Humanoids

Figure 4 for SMPLOlympics: Sports Environments for Physically Simulated Humanoids

Abstract:We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for many years, there is also a plethora of existing knowledge on the preferred strategy to achieve better performance. To leverage these existing human demonstrations from videos and motion capture, we design our humanoid to be compatible with the widely-used SMPL and SMPL-X human models from the vision and graphics community. We provide a suite of individual sports environments, including golf, javelin throw, high jump, long jump, and hurdling, as well as competitive sports, including both 1v1 and 2v2 games such as table tennis, tennis, fencing, boxing, soccer, and basketball. Our analysis shows that combining strong motion priors with simple rewards can result in human-like behavior in various sports. By providing a unified sports benchmark and baseline implementation of state and reward designs, we hope that SMPLOlympics can help the control and animation communities achieve human-like and performant behaviors.

* Project page: https://smplolympics.github.io/SMPLOlympics

Via

Access Paper or Ask Questions

OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Jun 13, 2024

Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi

Figure 1 for OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Figure 2 for OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Figure 3 for OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Figure 4 for OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Abstract:We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including using real-time teleoperation through VR headset, verbal instruction, and RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4. OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward designs to enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.

* Project page: https://omni.human2humanoid.com/

Via

Access Paper or Ask Questions

Generalizable Neural Human Renderer

Apr 22, 2024

Mana Masuda, Jinhyung Park, Shun Iwase, Rawal Khirodkar, Kris Kitani

Abstract:While recent advancements in animatable human rendering have achieved remarkable results, they require test-time optimization for each subject which can be a significant limitation for real-world applications. To address this, we tackle the challenging task of learning a Generalizable Neural Human Renderer (GNH), a novel method for rendering animatable humans from monocular video without any test-time optimization. Our core method focuses on transferring appearance information from the input video to the output image plane by utilizing explicit body priors and multi-view geometry. To render the subject in the intended pose, we utilize a straightforward CNN-based image renderer, foregoing the more common ray-sampling or rasterizing-based rendering modules. Our GNH achieves remarkable generalizable, photorealistic rendering with unseen subjects with a three-stage process. We quantitatively and qualitatively demonstrate that GNH significantly surpasses current state-of-the-art methods, notably achieving a 31.3% improvement in LPIPS.

Via

Access Paper or Ask Questions

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Apr 18, 2024

Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani

Abstract:We propose G-HOP, a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field for the object. We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis. We believe that our model, trained by aggregating seven diverse real-world interaction datasets spanning across 155 categories, represents a first approach that allows jointly generating both hand and object. Our empirical evaluations demonstrate the benefit of this joint prior in video-based reconstruction and human grasp synthesis, outperforming current task-specific baselines. Project website: https://judyye.github.io/ghop-www

* accepted to CVPR2024; project page at https://judyye.github.io/ghop-www

Via

Access Paper or Ask Questions

Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration

Apr 16, 2024

Benjamin A Newman, Chris Paxton, Kris Kitani, Henny Admoni

Abstract:Agents that assist people need to have well-initialized policies that can adapt quickly to align with their partners' reward functions. Initializing policies to maximize performance with unknown partners can be achieved by bootstrapping nonlinear models using imitation learning over large, offline datasets. Such policies can require prohibitive computation to fine-tune in-situ and therefore may miss critical run-time information about a partner's reward function as expressed through their immediate behavior. In contrast, online logistic regression using low-capacity models performs rapid inference and fine-tuning updates and thus can make effective use of immediate in-task behavior for reward function alignment. However, these low-capacity models cannot be bootstrapped as effectively by offline datasets and thus have poor initializations. We propose BLR-HAC, Bootstrapped Logistic Regression for Human Agent Collaboration, which bootstraps large nonlinear models to learn the parameters of a low-capacity model which then uses online logistic regression for updates during collaboration. We test BLR-HAC in a simulated surface rearrangement task and demonstrate that it achieves higher zero-shot accuracy than shallow methods and takes far less computation to adapt online while still achieving similar performance to fine-tuned, large nonlinear models. For code, please see our project page https://sites.google.com/view/blr-hac.

* 10 pages, 4 figures, Accepted to AAMAS 2024

Via

Access Paper or Ask Questions

Zero-Shot Multi-Object Shape Completion

Mar 21, 2024

Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rares Ambrus, Sergey Zakharov

Abstract:We present a 3D shape completion method that recovers the complete geometry of multiple objects in complex scenes from a single RGB-D image. Despite notable advancements in single object 3D shape completion, high-quality reconstructions in highly cluttered real-world multi-object scenes remains a challenge. To address this issue, we propose OctMAE, an architecture that leverages an Octree U-Net and a latent 3D MAE to achieve high-quality and near real-time multi-object shape completion through both local and global geometric reasoning. Because a na\"ive 3D MAE can be computationally intractable and memory intensive even in the latent space, we introduce a novel occlusion masking strategy and adopt 3D rotary embeddings, which significantly improves the runtime and shape completion quality. To generalize to a wide range of objects in diverse scenes, we create a large-scale photorealistic dataset, featuring a diverse set of 12K 3D object models from the Objaverse dataset which are rendered in multi-object scenes with physics-based positioning. Our method outperforms the current state-of-the-art on both synthetic and real-world datasets and demonstrates a strong zero-shot capability.

* 21 pages, 8 figues

Via

Access Paper or Ask Questions

Real-Time Simulated Avatar from Head-Mounted Sensors

Mar 11, 2024

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Kris Kitani, Weipeng Xu

Abstract:We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation challenging. On the other hand, headset poses provide valuable information about overall body motion, but lack fine-grained details about the hands and feet. To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement. When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion. We design an end-to-end method that does not rely on any intermediate representations and learns to directly map from images and headset poses to humanoid control signals. To train our method, we also propose a large-scale synthetic dataset created using camera configurations compatible with a commercially available VR headset (Quest 2) and show promising results on real-world captures. To demonstrate the applicability of our framework, we also test it on an AR headset with a forward-facing camera.

* CVPR 2024. Website: https://www.zhengyiluo.com/SimXR/

Via

Access Paper or Ask Questions

Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

Mar 07, 2024

Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, Guanya Shi

Figure 1 for Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

Figure 2 for Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

Figure 3 for Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

Figure 4 for Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation

Abstract:We present Human to Humanoid (H2O), a reinforcement learning (RL) based framework that enables real-time whole-body teleoperation of a full-sized humanoid robot with only an RGB camera. To create a large-scale retargeted motion dataset of human movements for humanoid robots, we propose a scalable "sim-to-data" process to filter and pick feasible motions using a privileged motion imitator. Afterwards, we train a robust real-time humanoid motion imitator in simulation using these refined motions and transfer it to the real humanoid robot in a zero-shot manner. We successfully achieve teleoperation of dynamic whole-body motions in real-world scenarios, including walking, back jumping, kicking, turning, waving, pushing, boxing, etc. To the best of our knowledge, this is the first demonstration to achieve learning-based real-time whole-body humanoid teleoperation.

* Project website: https://human2humanoid.com/

Via

Access Paper or Ask Questions

Multi-Object Tracking by Hierarchical Visual Representations

Feb 24, 2024

Jinkun Cao, Jiangmiao Pang, Kris Kitani

Figure 1 for Multi-Object Tracking by Hierarchical Visual Representations

Figure 2 for Multi-Object Tracking by Hierarchical Visual Representations

Figure 3 for Multi-Object Tracking by Hierarchical Visual Representations

Figure 4 for Multi-Object Tracking by Hierarchical Visual Representations

Abstract:We propose a new visual hierarchical representation paradigm for multi-object tracking. It is more effective to discriminate between objects by attending to objects' compositional visual regions and contrasting with the background contextual information instead of sticking to only the semantic visual cue such as bounding boxes. This compositional-semantic-contextual hierarchy is flexible to be integrated in different appearance-based multi-object tracking methods. We also propose an attention-based visual feature module to fuse the hierarchical visual representations. The proposed method achieves state-of-the-art accuracy and time efficiency among query-based methods on multiple multi-object tracking benchmarks.

* 6 pages, 3 figures, 10 tables, accepted by ICRA 2024

Via

Access Paper or Ask Questions

Mixed Gaussian Flow for Diverse Trajectory Prediction

Feb 19, 2024

Jiahe Chen, Jinkun Cao, Dahua Lin, Kris Kitani, Jiangmiao Pang

Figure 1 for Mixed Gaussian Flow for Diverse Trajectory Prediction

Figure 2 for Mixed Gaussian Flow for Diverse Trajectory Prediction

Figure 3 for Mixed Gaussian Flow for Diverse Trajectory Prediction

Figure 4 for Mixed Gaussian Flow for Diverse Trajectory Prediction

Abstract:Existing trajectory prediction studies intensively leverage generative models. Normalizing flow is one of the genres with the advantage of being invertible to derive the probability density of predicted trajectories. However, mapping from a standard Gaussian by a flow-based model hurts the capacity to capture complicated patterns of trajectories, ignoring the under-represented motion intentions in the training data. To solve the problem, we propose a flow-based model to transform a mixed Gaussian prior into the future trajectory manifold. The model shows a better capacity for generating diverse trajectory patterns. Also, by associating each sub-Gaussian with a certain subspace of trajectories, we can generate future trajectories with controllable motion intentions. In such a fashion, the flow-based model is not encouraged to simply seek the most likelihood of the intended manifold anymore but a family of controlled manifolds with explicit interpretability. Our proposed method is demonstrated to show state-of-the-art performance in the quantitative evaluation of sampling well-aligned trajectories in top-M generated candidates. We also demonstrate that it can generate diverse, controllable, and out-of-distribution trajectories. Code is available at https://github.com/mulplue/MGF.

Via

Access Paper or Ask Questions