Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Cewu Lu

ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment

Aug 19, 2023

Bingyang Zhou, Haoyu Zhou, Tianhai Liang, Qiaojun Yu, Siheng Zhao, Yuwei Zeng, Jun Lv, Siyuan Luo, Qiancai Wang, Xinyuan Yu(+3 more)

Figure 1 for ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment

Figure 2 for ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment

Figure 3 for ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment

Figure 4 for ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment

Abstract:We present ClothesNet: a large-scale dataset of 3D clothes objects with information-rich annotations. Our dataset consists of around 4400 models covering 11 categories annotated with clothes features, boundary lines, and keypoints. ClothesNet can be used to facilitate a variety of computer vision and robot interaction tasks. Using our dataset, we establish benchmark tasks for clothes perception, including classification, boundary line segmentation, and keypoint detection, and develop simulated clothes environments for robotic interaction tasks, including rearranging, folding, hanging, and dressing. We also demonstrate the efficacy of our ClothesNet in real-world experiments. Supplemental materials and dataset are available on our project webpage.

* IEEE/CVF International Conference on Computer Vision (ICCV) 2023

Via

Access Paper or Ask Questions

Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

Aug 14, 2023

Licheng Zhong, Lixin Yang, Kailin Li, Haoyu Zhen, Mei Han, Cewu Lu

Figure 1 for Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

Figure 2 for Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

Figure 3 for Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

Figure 4 for Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

Abstract:The reconstruction of object surfaces from multi-view images or monocular video is a fundamental issue in computer vision. However, much of the recent research concentrates on reconstructing geometry through implicit or explicit methods. In this paper, we shift our focus towards reconstructing mesh in conjunction with color. We remove the view-dependent color from neural volume rendering while retaining volume rendering performance through a relighting network. Mesh is extracted from the signed distance function (SDF) network for the surface, and color for each surface vertex is drawn from the global color network. To evaluate our approach, we conceived a in hand object scanning task featuring numerous occlusions and dramatic shifts in lighting conditions. We've gathered several videos for this task, and the results surpass those of any existing methods capable of reconstructing mesh alongside color. Additionally, our method's performance was assessed using public datasets, including DTU, BlendedMVS, and OmniObject3D. The results indicated that our method performs well across all these datasets. Project page: https://colmar-zlicheng.github.io/color_neus.

Via

Access Paper or Ask Questions

RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Jul 02, 2023

Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Junbo Wang, Haoyi Zhu, Cewu Lu

Figure 1 for RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Figure 2 for RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Figure 3 for RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Figure 4 for RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Abstract:A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations. This feature is attractive for enabling robots to acquire new skills and improving task and motion planning. However, due to limitations in the training dataset, the current focus of the community has mainly been on simple cases, such as push or pick-place tasks, relying solely on visual guidance. In reality, there are many complex skills, some of which may even require both visual and tactile perception to solve. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception. To achieve this, we have collected a dataset comprising over 110,000 \emph{contact-rich} robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected \emph{in the real world}. Each sequence in the dataset includes visual, force, audio, and action information, along with a corresponding human demonstration video. We have invested significant efforts in calibrating all the sensors and ensuring a high-quality dataset. The dataset is made publicly available at rh20t.github.io

* RSS 2023 workshop on LTAMP. The project page is at rh20t.github.io

Via

Access Paper or Ask Questions

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

May 28, 2023

Yue Xu, Yong-Lu Li, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

Figure 1 for Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Figure 2 for Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Figure 3 for Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Figure 4 for Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Abstract:Data-efficient learning has drawn significant attention, especially given the current trend of large multi-modal models, where dataset distillation can be an effective solution. However, the dataset distillation process itself is still very inefficient. In this work, we model the distillation problem with reference to information theory. Observing that severe data redundancy exists in dataset distillation, we argue to put more emphasis on the utility of the training samples. We propose a family of methods to exploit the most valuable samples, which is validated by our comprehensive analysis of the optimal data selection. The new strategy significantly reduces the training cost and extends a variety of existing distillation algorithms to larger and more diversified datasets, e.g. in some cases only 0.04% training data is sufficient for comparable distillation performance. Moreover, our strategy consistently enhances the performance, which may open up new analyses on the dynamics of distillation and networks. Our method is able to extend the distillation algorithms to much larger-scale datasets and more heterogeneous datasets, e.g. ImageNet-1K and Kinetics-400. Our code will be made publicly available.

Via

Access Paper or Ask Questions

NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation

May 15, 2023

Jiefeng Li, Siyuan Bian, Qi Liu, Jiasheng Tang, Fan Wang, Cewu Lu

Abstract:With the progress of 3D human pose and shape estimation, state-of-the-art methods can either be robust to occlusions or obtain pixel-aligned accuracy in non-occlusion cases. However, they cannot obtain robustness and mesh-image alignment at the same time. In this work, we present NIKI (Neural Inverse Kinematics with Invertible Neural Network), which models bi-directional errors to improve the robustness to occlusions and obtain pixel-aligned accuracy. NIKI can learn from both the forward and inverse processes with invertible networks. In the inverse process, the model separates the error from the plausible 3D pose manifold for a robust 3D human pose estimation. In the forward process, we enforce the zero-error boundary conditions to improve the sensitivity to reliable joint positions for better mesh-image alignment. Furthermore, NIKI emulates the analytical inverse kinematics algorithms with the twist-and-swing decomposition for better interpretability. Experiments on standard and occlusion-specific benchmarks demonstrate the effectiveness of NIKI, where we exhibit robust and well-aligned results simultaneously. Code is available at https://github.com/Jeff-sjtu/NIKI

* CVPR 2023

Via

Access Paper or Ask Questions

HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-body Mesh Recovery

Apr 12, 2023

Jiefeng Li, Siyuan Bian, Chao Xu, Zhicun Chen, Lixin Yang, Cewu Lu

Abstract:Recovering whole-body mesh by inferring the abstract pose and shape parameters from visual content can obtain 3D bodies with realistic structures. However, the inferring process is highly non-linear and suffers from image-mesh misalignment, resulting in inaccurate reconstruction. In contrast, 3D keypoint estimation methods utilize the volumetric representation to achieve pixel-level accuracy but may predict unrealistic body structures. To address these issues, this paper presents a novel hybrid inverse kinematics solution, HybrIK, that integrates the merits of 3D keypoint estimation and body mesh recovery in a unified framework. HybrIK directly transforms accurate 3D joints to body-part rotations via twist-and-swing decomposition. The swing rotations are analytically solved with 3D joints, while the twist rotations are derived from visual cues through neural networks. To capture comprehensive whole-body details, we further develop a holistic framework, HybrIK-X, which enhances HybrIK with articulated hands and an expressive face. HybrIK-X is fast and accurate by solving the whole-body pose with a one-stage model. Experiments demonstrate that HybrIK and HybrIK-X preserve both the accuracy of 3D joints and the realistic structure of the parametric human model, leading to pixel-aligned whole-body mesh recovery. The proposed method significantly surpasses the state-of-the-art methods on various benchmarks for body-only, hand-only, and whole-body scenarios. Code and results can be found at https://jeffli.site/HybrIK-X/

* An eXpressive extension of HybrIK [arXiv:2011.14672], supports SMPL-X. arXiv admin note: substantial text overlap with arXiv:2011.14672

Via

Access Paper or Ask Questions

POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo

Apr 08, 2023

Lixin Yang, Jian Xu, Licheng Zhong, Xinyu Zhan, Zhicheng Wang, Kejian Wu, Cewu Lu

Figure 1 for POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo

Figure 2 for POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo

Figure 3 for POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo

Figure 4 for POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo

Abstract:Enable neural networks to capture 3D geometrical-aware features is essential in multi-view based vision tasks. Previous methods usually encode the 3D information of multi-view stereo into the 2D features. In contrast, we present a novel method, named POEM, that directly operates on the 3D POints Embedded in the Multi-view stereo for reconstructing hand mesh in it. Point is a natural form of 3D information and an ideal medium for fusing features across views, as it has different projections on different views. Our method is thus in light of a simple yet effective idea, that a complex 3D hand mesh can be represented by a set of 3D points that 1) are embedded in the multi-view stereo, 2) carry features from the multi-view images, and 3) encircle the hand. To leverage the power of points, we design two operations: point-based feature fusion and cross-set point attention mechanism. Evaluation on three challenging multi-view datasets shows that POEM outperforms the state-of-the-art in hand mesh reconstruction. Code and models are available for research at https://github.com/lixiny/POEM.

* Accepted by CVPR 2023

Via

Access Paper or Ask Questions

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Apr 04, 2023

Yong-Lu Li, Xiaoqian Wu, Xinpeng Liu, Yiming Dou, Yikun Ji, Junyi Zhang, Yixing Li, Jingru Tan, Xudong Lu, Cewu Lu

Figure 1 for From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Figure 2 for From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Figure 3 for From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Figure 4 for From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Abstract:Action understanding matters and attracts attention. It can be formed as the mapping from the action physical space to the semantic space. Typically, researchers built action datasets according to idiosyncratic choices to define classes and push the envelope of benchmarks respectively. Thus, datasets are incompatible with each other like "Isolated Islands" due to semantic gaps and various class granularities, e.g., do housework in dataset A and wash plate in dataset B. We argue that a more principled semantic space is an urgent need to concentrate the community efforts and enable us to use all datasets together to pursue generalizable action learning. To this end, we design a Poincare action semantic space given verb taxonomy hierarchy and covering massive actions. By aligning the classes of previous datasets to our semantic space, we gather (image/video/skeleton/MoCap) datasets into a unified database in a unified label system, i.e., bridging "isolated islands" into a "Pangea". Accordingly, we propose a bidirectional mapping model between physical and semantic space to fully use Pangea. In extensive experiments, our system shows significant superiority, especially in transfer learning. Code and data will be made publicly available.

* Project Webpage: https://mvig-rhos.com/pangea

Via

Access Paper or Ask Questions

Visual-Tactile Sensing for In-Hand Object Reconstruction

Mar 25, 2023

Wenqiang Xu, Zhenjun Yu, Han Xue, Ruolin Ye, Siqiong Yao, Cewu Lu

Abstract:Tactile sensing is one of the modalities humans rely on heavily to perceive the world. Working with vision, this modality refines local geometry structure, measures deformation at the contact area, and indicates the hand-object contact state. With the availability of open-source tactile sensors such as DIGIT, research on visual-tactile learning is becoming more accessible and reproducible. Leveraging this tactile sensor, we propose a novel visual-tactile in-hand object reconstruction framework \textbf{VTacO}, and extend it to \textbf{VTacOH} for hand-object reconstruction. Since our method can support both rigid and deformable object reconstruction, no existing benchmarks are proper for the goal. We propose a simulation environment, VT-Sim, which supports generating hand-object interaction for both rigid and deformable objects. With VT-Sim, we generate a large-scale training dataset and evaluate our method on it. Extensive experiments demonstrate that our proposed method can outperform the previous baseline methods qualitatively and quantitatively. Finally, we directly apply our model trained in simulation to various real-world test cases, which display qualitative results. Codes, models, simulation environment, and datasets are available at \url{https://sites.google.com/view/vtaco/}.

* Accepted in CVPR 2023

Via

Access Paper or Ask Questions

GarmentTracking: Category-Level Garment Pose Tracking

Mar 24, 2023

Han Xue, Wenqiang Xu, Jieyi Zhang, Tutian Tang, Yutong Li, Wenxin Du, Ruolin Ye, Cewu Lu

Abstract:Garments are important to humans. A visual system that can estimate and track the complete garment pose can be useful for many downstream tasks and real-world applications. In this work, we present a complete package to address the category-level garment pose tracking task: (1) A recording system VR-Garment, with which users can manipulate virtual garment models in simulation through a VR interface. (2) A large-scale dataset VR-Folding, with complex garment pose configurations in manipulation like flattening and folding. (3) An end-to-end online tracking framework GarmentTracking, which predicts complete garment pose both in canonical space and task space given a point cloud sequence. Extensive experiments demonstrate that the proposed GarmentTracking achieves great performance even when the garment has large non-rigid deformation. It outperforms the baseline approach on both speed and accuracy. We hope our proposed solution can serve as a platform for future research. Codes and datasets are available in https://garment-tracking.robotflow.ai.

Via

Access Paper or Ask Questions