Seongwoong Cho

Meta-Controller: Few-Shot Imitation of Unseen Embodiments and Tasks in Continuous Control

Dec 10, 2024

Seongwoong Cho, Donggyun Kim, Jinwoo Lee, Seunghoon Hong

Abstract: Generalizing across robot embodiments and tasks is crucial for adaptive robotic systems. Modular policy learning approaches adapt to new embodiments but are limited to specific tasks, while few-shot imitation learning (IL) approaches often focus on a single embodiment. In this paper, we introduce a few-shot behavior cloning framework that simultaneously generalizes to unseen embodiments and tasks using a few (e.g., five) reward-free demonstrations. Our framework leverages a joint-level input-output representation to unify the state and action spaces of heterogeneous embodiments and employs a novel structure-motion state encoder that is parameterized to capture both knowledge shared across all embodiments and embodiment-specific knowledge. A matching-based policy network then predicts actions from a few demonstrations, producing an adaptive policy that is robust to overfitting. Evaluated in the DeepMind Control suite, our framework, termed Meta-Controller, demonstrates superior few-shot generalization to unseen embodiments and tasks over modular policy learning and few-shot IL approaches. Code is available at https://github.com/SeongwoongCho/meta-controller.

* NeurIPS 2024
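
The abstract does not spell out how the matching-based policy works, so the following is only a minimal NumPy sketch of what matching over joint-level tokens could look like, assuming each joint is described by a D-dimensional feature vector and each action is one scalar per joint. The function name `matching_policy`, the dot-product similarity, and the `temperature` parameter are hypothetical illustrations, not the Meta-Controller code; in particular, raw features stand in for the learned structure-motion state encoder.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def matching_policy(query_states, demo_states, demo_actions, temperature=1.0):
    """Hypothetical matching-based policy over joint-level tokens.

    query_states: (J, D)    one feature vector per joint for the current state
    demo_states:  (N, J, D) joint features of N frames pooled from the demos
    demo_actions: (N, J)    one scalar action per joint for each demo frame
    Returns a (J,) array: one predicted action per joint.
    """
    # Dot-product similarity between each query joint and the same joint
    # in every demonstration frame (raw features stand in for a learned encoder).
    sims = np.einsum("jd,njd->jn", query_states, demo_states) / temperature
    weights = softmax(sims, axis=-1)  # (J, N); each row sums to 1
    # Each joint's action is a convex combination of demonstrated actions.
    return np.einsum("jn,nj->j", weights, demo_actions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 4 joints, 8-dim joint features,
    # 50 frames (e.g. 5 demonstrations x 10 steps).
    J, D, N = 4, 8, 50
    action = matching_policy(
        rng.normal(size=(J, D)),
        rng.normal(size=(N, J, D)),
        rng.normal(size=(N, J)),
    )
    print(action.shape)  # (4,)
```

Because each joint is its own token, J can differ from one embodiment to another without changing the function. In this sketch the prediction is a convex combination of demonstrated actions, so it cannot leave the range of what the demonstrations show, which is one plausible reading of why a matching-based policy resists overfitting to a handful of examples.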

Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

Apr 29, 2024

Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong

Abstract: Large language models have evolved into data-efficient generalists, benefiting from a universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct challenge due to the variation in label structures across different tasks. Consequently, generalization to unseen dense prediction tasks in the low-data regime is not straightforward and has received less attention from previous vision generalists. In this study, we explore a universal model that can flexibly adapt to unseen dense label structures with a few examples, enabling it to serve as a data-efficient vision generalist in diverse real-world scenarios. To this end, we base our method on a powerful meta-learning framework and explore several axes, such as flexible adaptation mechanisms and scalability, to improve its performance and versatility for real-world problems. We evaluate our model across a spectrum of unseen real-world scenarios where low-shot learning is desirable, including video, 3D, medical, biological, and user-interactive tasks. Equipped with a generic architecture and an effective adaptation mechanism, our model flexibly adapts to all of these tasks with at most 50 labeled images, showcasing a significant advancement over existing data-efficient generalist approaches. Code is available at https://github.com/GitGyun/chameleon.

Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Mar 27, 2023

Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong

Abstract: Dense prediction tasks are a fundamental class of problems in computer vision. As supervised methods suffer from high pixel-wise labeling cost, a few-shot learning solution that can learn any dense task from a few labeled images is desirable. Yet current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to challenges in designing a general and unified model that can flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels, a formulation that encapsulates all tasks. VTM also flexibly adapts to any task with a tiny number of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture involving ViT backbones, where token matching is performed at multiple feature hierarchies. We evaluate VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms them using 0.1% of full supervision. Code is available at https://github.com/GitGyun/visual_token_matching.
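
As a rough illustration of non-parametric token matching, the sketch below predicts each query patch's label token as an attention-weighted average of support label tokens, with weights given by scaled dot-product similarity between image tokens. It is a single-level simplification under stated assumptions: the ViT encoders, the multi-hierarchy matching, and the task-specific modulation parameters described in the abstract are all omitted, and `token_matching` is a hypothetical name rather than part of the VTM repository.

```python
import numpy as np


def token_matching(query_img_tokens, support_img_tokens, support_lbl_tokens):
    """Hypothetical single-level token matching.

    query_img_tokens:   (M, D)    patch tokens of the query image
    support_img_tokens: (S, P, D) patch tokens of S support images
    support_lbl_tokens: (S, P, E) patch tokens of the S support labels
    Returns (M, E) predicted label tokens for the query patches.
    """
    d = query_img_tokens.shape[-1]
    keys = support_img_tokens.reshape(-1, d)                # (S*P, D)
    values = support_lbl_tokens.reshape(keys.shape[0], -1)  # (S*P, E)
    # Scaled dot-product similarity between query and support image patches.
    scores = query_img_tokens @ keys.T / np.sqrt(d)         # (M, S*P)
    scores -= scores.max(axis=1, keepdims=True)             # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)                 # rows sum to 1
    # Each query label token is a weighted average of support label tokens.
    return attn @ values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 16 query patches, 3 support pairs of 16 patches,
    # 32-dim image tokens, 32-dim label tokens.
    out = token_matching(
        rng.normal(size=(16, 32)),
        rng.normal(size=(3, 16, 32)),
        rng.normal(size=(3, 16, 32)),
    )
    print(out.shape)  # (16, 32)
```

Since there is no per-task output head, the same function applies to any label type that can be embedded as tokens, which is one way to read the claim that matching encapsulates all tasks.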

Multi-Task Processes

Oct 29, 2021

Donggyun Kim, Seongwoong Cho, Wonkwang Lee, Seunghoon Hong

Abstract: Neural Processes (NPs) consider a task as a function realized from a stochastic process and flexibly adapt to unseen tasks through inference on functions. However, naive NPs can model data from only a single stochastic process and are designed to infer each task independently. Since much real-world data represents a set of correlated tasks from multiple sources (e.g., multiple attributes and multi-sensor data), it is beneficial to infer them jointly and exploit the underlying correlation to improve predictive performance. To this end, we propose Multi-Task Processes (MTPs), an extension of NPs designed to jointly infer tasks realized from multiple stochastic processes. We build our MTPs in a hierarchical manner such that inter-task correlation is captured by conditioning all per-task latent variables on a single global latent variable. In addition, we design our MTPs to address multi-task settings with incomplete data (i.e., not all tasks share the same set of input points), which are in high practical demand in various applications. Experiments demonstrate that MTPs can successfully model multiple tasks jointly by discovering and exploiting their correlations in various real-world data such as time series of weather attributes and pixel-aligned visual modalities.

* 33 pages, 13 figures
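
The hierarchical conditioning can be illustrated with a toy sampling sketch: one global latent summarizes the context of every task, and each per-task latent is then drawn conditioned on that global latent plus the task's own context. Everything here is an assumption for illustration: the random linear maps `W_ctx` and `W_task` stand in for learned networks, the noise scale `NOISE` is arbitrary, the decoder is omitted, and the function names are hypothetical rather than taken from the MTP implementation. The two example tasks have different numbers of context points, mirroring the incomplete-data setting.

```python
import numpy as np

D_IN, D_LAT = 2, 4  # hypothetical sizes: (x, y) point features, latent width
init_rng = np.random.default_rng(0)

# Hypothetical fixed random linear maps standing in for learned encoders.
W_ctx = init_rng.normal(size=(D_IN, D_LAT))
W_task = init_rng.normal(size=(2 * D_LAT, D_LAT))
NOISE = 0.1  # arbitrary std of the Gaussian latents


def encode_context(xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Mean-pool per-point features of one task's context set -> (D_LAT,)."""
    pts = np.stack([xs, ys], axis=-1)  # (N_t, 2)
    return np.tanh(pts @ W_ctx).mean(axis=0)


def sample_latents(contexts, rng):
    """Sample a global latent z0 and one latent per task.

    contexts: list of (xs, ys) pairs, one per task. Tasks may have different
    numbers of context points, i.e. their inputs need not be shared.
    """
    r_tasks = [encode_context(xs, ys) for xs, ys in contexts]
    # Global latent: summarizes the context of ALL tasks jointly.
    r_global = np.mean(r_tasks, axis=0)
    z0 = r_global + NOISE * rng.normal(size=D_LAT)
    # Per-task latents: each conditioned on z0 and its own task's context,
    # so tasks become correlated through the shared z0.
    zs = []
    for r_t in r_tasks:
        mu_t = np.concatenate([z0, r_t]) @ W_task
        zs.append(mu_t + NOISE * rng.normal(size=D_LAT))
    return z0, zs


if __name__ == "__main__":
    contexts = [
        (np.array([0.0, 1.0, 2.0]), np.array([1.0, 0.5, 0.0])),  # task 1: 3 points
        (np.array([0.5]), np.array([2.0])),                      # task 2: 1 point
    ]
    z0, zs = sample_latents(contexts, np.random.default_rng(1))
    print(z0.shape, [z.shape for z in zs])  # (4,) [(4,), (4,)]
```

In this sketch the shared `z0` is what couples the tasks: context from a well-observed task shifts `z0` and, through it, the latent of a sparsely observed task.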
