
Xiao Teng


Lifelike Agility and Play on Quadrupedal Robots using Reinforcement Learning and Generative Pre-trained Models

Aug 29, 2023
Lei Han, Qingxu Zhu, Jiapeng Sheng, Chong Zhang, Tingguang Li, Yizheng Zhang, He Zhang, Yuzhen Liu, Cheng Zhou, Rui Zhao, Jie Li, Yufeng Zhang, Rui Wang, Wanchao Chi, Xiong Li, Yonghui Zhu, Lingzhu Xiang, Xiao Teng, Zhengyou Zhang


Summarizing knowledge from animals and human beings inspires robotic innovations. In this work, we propose a framework for driving legged robots to act like real animals with lifelike agility and strategy in complex environments. Inspired by the impressive performance of large pre-trained models in language and image understanding, we introduce the power of advanced deep generative models to produce motor control signals that stimulate legged robots to act like real animals. Unlike conventional controllers and end-to-end RL methods that are task-specific, we propose to pre-train generative models over animal motion datasets to preserve expressive knowledge of animal behavior. The pre-trained model holds sufficient primitive-level knowledge yet is environment-agnostic. It is then reused in a successive stage of learning to align with the environments by traversing a number of challenging obstacles that are rarely considered in previous approaches, including creeping through narrow spaces, jumping over hurdles, and freerunning over scattered blocks. Finally, a task-specific controller is trained to solve complex downstream tasks by reusing the knowledge from previous stages. Enriching the knowledge at one stage does not affect the usage of the other levels of knowledge, so this flexible framework offers the possibility of continual knowledge accumulation at different levels. We successfully apply the trained multi-level controllers to the MAX robot, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles, and play in a designed challenging multi-agent Chase Tag Game, where lifelike agility and strategy emerge on the robots. The present research pushes the frontier of robot control with new insights into reusing multi-level pre-trained knowledge and solving highly complex downstream tasks in the real world.
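A minimal PyTorch sketch of the multi-level reuse described in the abstract, assuming illustrative module sizes and interfaces (the latent, proprioception, perception, and command dimensions are placeholders, and the wiring between levels is a simplification, not the paper's architecture): a frozen primitive-level decoder is wrapped by an environment-level policy, which is in turn steered by a task-level head.

```python
import torch
import torch.nn as nn

# Placeholder sizes for illustration only; the abstract does not specify them.
LATENT, PROPRIO, PERCEPT, COMMAND, ACTION = 32, 48, 64, 8, 12

class PrimitiveDecoder(nn.Module):
    """Stage 1: environment-agnostic motor decoder pre-trained on animal motion data."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + PROPRIO, 256), nn.ELU(),
                                 nn.Linear(256, ACTION))
    def forward(self, z, proprio):
        return self.net(torch.cat([z, proprio], dim=-1))

class EnvironmentPolicy(nn.Module):
    """Stage 2: learns obstacle traversal by emitting latents for the frozen decoder."""
    def __init__(self, decoder):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(PERCEPT + PROPRIO, 256), nn.ELU(),
                                     nn.Linear(256, LATENT))
        self.decoder = decoder
        for p in self.decoder.parameters():  # primitive-level knowledge stays fixed
            p.requires_grad = False
    def forward(self, percept, proprio):
        z = self.encoder(torch.cat([percept, proprio], dim=-1))
        return self.decoder(z, proprio)

class TaskPolicy(nn.Module):
    """Stage 3: a task-specific head steers the reused lower stages for downstream tasks."""
    def __init__(self, env_policy):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(COMMAND + PROPRIO, 128), nn.ELU(),
                                  nn.Linear(128, PERCEPT))
        self.env_policy = env_policy
    def forward(self, command, proprio):
        return self.env_policy(self.head(torch.cat([command, proprio], dim=-1)), proprio)

policy = TaskPolicy(EnvironmentPolicy(PrimitiveDecoder()))
action = policy(torch.randn(1, COMMAND), torch.randn(1, PROPRIO))
print(action.shape)  # torch.Size([1, 12])
```

Because the lower levels are frozen, each stage can be retrained or enriched independently, which is the property the abstract highlights as enabling continual knowledge accumulation.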


RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

Jul 03, 2023
Yonglin Li, Jing Zhang, Xiao Teng, Long Lan


The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to its need for precise user-interactive prompts and its limited understanding of different modalities, such as language and vision. This paper presents the RefSAM model, which for the first time explores the potential of SAM for RVOS by incorporating multi-view information from diverse modalities and successive frames at different timestamps. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP that projects the text embedding of the referring expression into sparse and dense embeddings, which serve as user-interactive prompts. Subsequently, a parameter-efficient tuning strategy is employed to effectively align and fuse the language and vision features. Through comprehensive ablation studies, we demonstrate the practical and effective design choices of our strategy. Extensive experiments conducted on the Ref-YouTube-VOS and Ref-DAVIS17 datasets validate the superiority and effectiveness of our RefSAM model over existing methods. The code and models will be made publicly available at https://github.com/LancasterLi/RefSAM.

* The code and models will be made publicly available at https://github.com/LancasterLi/RefSAM
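A hedged sketch of how a Cross-Modal MLP of the kind described above could map a referring-expression embedding into sparse and dense prompt embeddings. This is not the official RefSAM code; the dimensions (text_dim=768, prompt_dim=256, two sparse tokens, a 64x64 dense map) and layer layout are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossModalMLP(nn.Module):
    """Illustrative projection of a sentence-level text embedding into sparse and dense
    prompt embeddings that a SAM-style mask decoder could consume."""
    def __init__(self, text_dim=768, prompt_dim=256, num_sparse=2, dense_hw=(64, 64)):
        super().__init__()
        self.num_sparse = num_sparse
        self.dense_hw = dense_hw
        self.sparse_proj = nn.Sequential(
            nn.Linear(text_dim, prompt_dim), nn.GELU(),
            nn.Linear(prompt_dim, num_sparse * prompt_dim),
        )
        self.dense_proj = nn.Sequential(
            nn.Linear(text_dim, prompt_dim), nn.GELU(),
            nn.Linear(prompt_dim, prompt_dim),
        )

    def forward(self, text_emb):  # text_emb: (B, text_dim)
        b = text_emb.shape[0]
        # Sparse prompts play the role of point/box tokens.
        sparse = self.sparse_proj(text_emb).view(b, self.num_sparse, -1)
        # Dense prompt is broadcast over the image-embedding grid, like a mask prompt.
        h, w = self.dense_hw
        dense = self.dense_proj(text_emb)[:, :, None, None].expand(b, -1, h, w)
        return sparse, dense

text_emb = torch.randn(2, 768)                      # stand-in for a text-encoder output
sparse, dense = CrossModalMLP()(text_emb)
print(sparse.shape, dense.shape)                    # (2, 2, 256) and (2, 256, 64, 64)
```

Only this small projection (plus the parameter-efficient tuning mentioned in the abstract) would need gradient updates, which is what keeps the adaptation lightweight.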

Multi-scale Knowledge Distillation for Unsupervised Person Re-Identification

Apr 21, 2022
Long Lan, Xiao Teng, Haoang Chi, Xiang Zhang


Unsupervised person re-identification is a challenging and promising task in computer vision. Recent unsupervised person re-identification methods have achieved great improvements by training with pseudo labels. However, appearance noise and label noise are rarely studied explicitly in the unsupervised setting. To mitigate the effect of appearance noise on the global features, we also take into account features from two local views and produce multi-scale features. We explore knowledge distillation to filter label noise. Specifically, we first train a teacher model on noisy pseudo labels in an iterative way, and then use the teacher model to guide the learning of our student model. In our setting, the student model converges quickly under the supervision of the teacher model, which reduces the interference of the noisy labels that the teacher model suffered from. After carefully handling the noise in feature learning, our multi-scale knowledge distillation proves very effective for unsupervised re-identification. Extensive experiments on three popular person re-identification datasets demonstrate the superiority of our method. In particular, our approach achieves state-of-the-art accuracy of 85.7% mAP and 94.3% Rank-1 on the challenging Market-1501 benchmark with ResNet-50 under the fully unsupervised setting.
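A brief sketch of a teacher-guided objective consistent with the description above, assuming a standard soft-target distillation term alongside cross-entropy on clustering-based pseudo labels; the weighting alpha and temperature tau are illustrative choices, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, pseudo_labels, alpha=0.5, tau=4.0):
    """Cross-entropy on noisy pseudo labels plus a KL term that pulls the student
    toward the softened teacher prediction, damping the effect of label noise."""
    ce = F.cross_entropy(student_logits, pseudo_labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    return (1 - alpha) * ce + alpha * kd

# Toy usage: random tensors stand in for classifier outputs over pseudo-identity classes.
s = torch.randn(8, 500, requires_grad=True)   # student logits
t = torch.randn(8, 500)                       # frozen teacher logits
y = torch.randint(0, 500, (8,))               # clustering-based pseudo labels
loss = distillation_loss(s, t, y)
loss.backward()
```

The same loss could be applied at each of the global and local feature scales and summed, which is one plausible way to realize the multi-scale aspect the abstract describes.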
