Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Masayoshi Tomizuka

Towards Free Data Selection with General-Purpose Models

Oct 14, 2023

Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

Figure 1 for Towards Free Data Selection with General-Purpose Models

Figure 2 for Towards Free Data Selection with General-Purpose Models

Figure 3 for Towards Free Data Selection with General-Purpose Models

Figure 4 for Towards Free Data Selection with General-Purpose Models

Abstract:A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. However, current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. In this paper, we challenge this status quo by designing a distinct data selection pipeline that utilizes existing general-purpose models to select data from various datasets with a single-pass inference without the need for additional training or supervision. A novel free data selection (FreeSel) method is proposed following this new pipeline. Specifically, we define semantic patterns extracted from inter-mediate features of the general-purpose model to capture subtle local information in each image. We then enable the selection of all data samples in a single pass through distance-based sampling at the fine-grained semantic pattern level. FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods. Extensive experiments verify the effectiveness of FreeSel on various computer vision tasks. Our code is available at https://github.com/yichen928/FreeSel.

* accepted by NeurIPS 2023

Via

Access Paper or Ask Questions

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

Oct 13, 2023

Hao Sha, Yao Mu, Yuxuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding

Abstract:Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability. To address these problems, this work employs Large Language Models (LLMs) as a decision-making component for complex AD scenarios that require human commonsense understanding. We devise cognitive pathways to enable comprehensive reasoning with LLMs, and develop algorithms for translating LLM decisions into actionable driving commands. Through this approach, LLM decisions are seamlessly integrated with low-level controllers by guided parameter matrix adaptation. Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors even multi-vehicle coordination, thanks to the commonsense reasoning capabilities of LLMs. This paper presents an initial step toward leveraging LLMs as effective decision-makers for intricate AD scenarios in terms of safety, efficiency, generalizability, and interoperability. We aspire for it to serve as inspiration for future research in this field. Project page: https://sites.google.com/view/llm-mpc

Via

Access Paper or Ask Questions

Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

Oct 11, 2023

Yuxin Chen, Chen Tang, Ran Tian, Chenran Li, Jinning Li, Masayoshi Tomizuka, Wei Zhan

Figure 1 for Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

Figure 2 for Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

Figure 3 for Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

Figure 4 for Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

Abstract:Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which an agent is influenced by unseen co-players depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on effectively training agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions

What Matters to You? Towards Visual Representation Alignment for Robot Learning

Oct 11, 2023

Ran Tian, Chenfeng Xu, Masayoshi Tomizuka, Jitendra Malik, Andrea Bajcsy

Figure 1 for What Matters to You? Towards Visual Representation Alignment for Robot Learning

Figure 2 for What Matters to You? Towards Visual Representation Alignment for Robot Learning

Figure 3 for What Matters to You? Towards Visual Representation Alignment for Robot Learning

Figure 4 for What Matters to You? Towards Visual Representation Alignment for Robot Learning

Abstract:When operating in service of people, robots need to optimize rewards aligned with end-user preferences. Since robots will rely on raw perceptual inputs like RGB images, their rewards will inevitably use visual representations. Recently there has been excitement in using representations from pre-trained visual models, but key to making these work in robotics is fine-tuning, which is typically done via proxy tasks like dynamics prediction or enforcing temporal cycle-consistency. However, all these proxy tasks bypass the human's input on what matters to them, exacerbating spurious correlations and ultimately leading to robot behaviors that are misaligned with user preferences. In this work, we propose that robots should leverage human feedback to align their visual representations with the end-user and disentangle what matters for the task. We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem and visual reward learning problem through the lens of preference-based learning and optimal transport. Across experiments in X-MAGICAL and in robotic manipulation, we find that RAPL's reward consistently generates preferred robot behaviors with high sample efficiency, and shows strong zero-shot generalization when the visual representation is learned from a different embodiment than the robot's.

Via

Access Paper or Ask Questions

Frequency Domain Analysis of Nonlinear Series Elastic Actuator via Describing Function

Oct 05, 2023

Motohiro Hirao, Burak Kurkcu, Alireza Ghanbarpour, Masayoshi Tomizuka

Figure 1 for Frequency Domain Analysis of Nonlinear Series Elastic Actuator via Describing Function

Figure 2 for Frequency Domain Analysis of Nonlinear Series Elastic Actuator via Describing Function

Figure 3 for Frequency Domain Analysis of Nonlinear Series Elastic Actuator via Describing Function

Figure 4 for Frequency Domain Analysis of Nonlinear Series Elastic Actuator via Describing Function

Abstract:Nonlinear stiffness SEAs (NSEAs) inspired by biological muscles offer promise in achieving adaptable stiffness for assistive robots. While assistive robots are often designed and compared based on torque capability and control bandwidth, NSEAs have not been systematically designed in the frequency domain due to their nonlinearity. The describing function, an analytical concept for nonlinear systems, offers a means to understand their behavior in the frequency domain. This paper introduces a frequency domain analysis of nonlinear series elastic actuators using the describing function method. This framework aims to equip researchers and engineers with tools for improved design and control in assistive robotics.

* accepted by 2023 IEEE ROBIO conference

Via

Access Paper or Ask Questions

Adaptive Spatio-Temporal Voxels Based Trajectory Planning for Autonomous Driving in Highway Traffic Flow

Oct 04, 2023

Zhiqiang Jian, Songyi Zhang, Lingfeng Sun, Wei Zhan, Masayoshi Tomizuka, Nanning Zheng

Figure 1 for Adaptive Spatio-Temporal Voxels Based Trajectory Planning for Autonomous Driving in Highway Traffic Flow

Figure 2 for Adaptive Spatio-Temporal Voxels Based Trajectory Planning for Autonomous Driving in Highway Traffic Flow

Figure 3 for Adaptive Spatio-Temporal Voxels Based Trajectory Planning for Autonomous Driving in Highway Traffic Flow

Figure 4 for Adaptive Spatio-Temporal Voxels Based Trajectory Planning for Autonomous Driving in Highway Traffic Flow

Abstract:Trajectory planning is crucial for the safe driving of autonomous vehicles in highway traffic flow. Currently, some advanced trajectory planning methods utilize spatio-temporal voxels to construct feasible regions and then convert trajectory planning into optimization problem solving based on the feasible regions. However, these feasible region construction methods cannot adapt to the changes in dynamic environments, making them difficult to apply in complex traffic flow. In this paper, we propose a trajectory planning method based on adaptive spatio-temporal voxels which improves the construction of feasible regions and trajectory optimization while maintaining the quadratic programming form. The method can adjust feasible regions and trajectory planning according to real-time traffic flow and environmental changes, realizing vehicles to drive safely in complex traffic flow. The proposed method has been tested in both open-loop and closed-loop environments, and the test results show that our method outperforms the current planning methods.

* IEEE ITSC 2023
* 8 pages, 5 figures

Via

Access Paper or Ask Questions

Long-Term Dynamic Window Approach for Kinodynamic Local Planning in Static and Crowd Environments

Oct 04, 2023

Zhiqiang Jian, Songyi Zhang, Lingfeng Sun, Wei Zhan, Nanning Zheng, Masayoshi Tomizuka

Figure 1 for Long-Term Dynamic Window Approach for Kinodynamic Local Planning in Static and Crowd Environments

Figure 2 for Long-Term Dynamic Window Approach for Kinodynamic Local Planning in Static and Crowd Environments

Figure 3 for Long-Term Dynamic Window Approach for Kinodynamic Local Planning in Static and Crowd Environments

Figure 4 for Long-Term Dynamic Window Approach for Kinodynamic Local Planning in Static and Crowd Environments

Abstract:Local planning for a differential wheeled robot is designed to generate kinodynamic feasible actions that guide the robot to a goal position along the navigation path while avoiding obstacles. Reactive, predictive, and learning-based methods are widely used in local planning. However, few of them can fit static and crowd environments while satisfying kinodynamic constraints simultaneously. To solve this problem, we propose a novel local planning method. The method applies a long-term dynamic window approach to generate an initial trajectory and then optimizes it with graph optimization. The method can plan actions under the robot's kinodynamic constraints in real time while allowing the generated actions to be safer and more jitterless. Experimental results show that the proposed method adapts well to crowd and static environments and outperforms most SOTA approaches.

* 2023 IEEE RA-L
* 9 pages, 7 figures

Via

Access Paper or Ask Questions

Human-oriented Representation Learning for Robotic Manipulation

Oct 04, 2023

Mingxiao Huo, Mingyu Ding, Chenfeng Xu, Thomas Tian, Xinghao Zhu, Yao Mu, Lingfeng Sun, Masayoshi Tomizuka, Wei Zhan

Abstract:Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks. We advocate that such a representation automatically arises from simultaneously learning about multiple simple perceptual skills that are critical for everyday scenarios (e.g., hand detection, state estimate, etc.) and is better suited for learning robot manipulation policies compared to current state-of-the-art visual representations purely based on self-supervised objectives. We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders, where each task is a perceptual skill tied to human-environment interactions. We introduce Task Fusion Decoder as a plug-and-play embedding translator that utilizes the underlying relationships among these perceptual skills to guide the representation learning towards encoding meaningful structure for what's important for all perceptual skills, ultimately empowering learning of downstream robotic manipulation tasks. Extensive experiments across a range of robotic tasks and embodiments, in both simulations and real-world environments, show that our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders including R3M, MVP, and EgoVLP, for downstream manipulation policy-learning. Project page: https://sites.google.com/view/human-oriented-robot-learning

Via

Access Paper or Ask Questions

Generalizable Long-Horizon Manipulations with Large Language Models

Oct 03, 2023

Haoyu Zhou, Mingyu Ding, Weikun Peng, Masayoshi Tomizuka, Lin Shao, Chuang Gan

Figure 1 for Generalizable Long-Horizon Manipulations with Large Language Models

Figure 2 for Generalizable Long-Horizon Manipulations with Large Language Models

Abstract:This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations with novel objects and unseen tasks. These task conditions serve as guides for the generation and adjustment of Dynamic Movement Primitives (DMP) trajectories for long-horizon task execution. We further create a challenging robotic manipulation task suite based on Pybullet for long-horizon task evaluation. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of our framework on both familiar tasks involving new objects and novel but related tasks, highlighting the potential of LLMs in enhancing robotic system versatility and adaptability. Project website: https://object814.github.io/Task-Condition-With-LLM/

Via

Access Paper or Ask Questions

RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

Oct 03, 2023

Tong Zhao, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Yintao Wei

Figure 1 for RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

Figure 2 for RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

Figure 3 for RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

Figure 4 for RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

Abstract:This paper addresses the growing demands for safety and comfort in intelligent robot systems, particularly autonomous vehicles, where road conditions play a pivotal role in overall driving performance. For example, reconstructing road surfaces helps to enhance the analysis and prediction of vehicle responses for motion planning and control systems. We introduce the Road Surface Reconstruction Dataset (RSRD), a real-world, high-resolution, and high-precision dataset collected with a specialized platform in diverse driving conditions. It covers common road types containing approximately 16,000 pairs of stereo images, original point clouds, and ground-truth depth/disparity maps, with accurate post-processing pipelines to ensure its quality. Based on RSRD, we further build a comprehensive benchmark for recovering road profiles through depth estimation and stereo matching. Preliminary evaluations with various state-of-the-art methods reveal the effectiveness of our dataset and the challenge of the task, underscoring substantial opportunities of RSRD as a valuable resource for advancing techniques, e.g., multi-view stereo towards safe autonomous driving. The dataset and demo videos are available at https://thu-rsxd.com/rsrd/

Via

Access Paper or Ask Questions