Danica Kragic

Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies

Sep 30, 2024

Ruiyu Wang, Zheyu Zhuang, Shutong Jin, Nils Ingelhag, Danica Kragic, Florian T. Pokorny

Abstract: An end-to-end (E2E) visuomotor policy is typically treated as a unified whole, but recent approaches that pretrain the visual encoder on out-of-domain (OOD) data cleanly separate the visual encoder from the rest of the network, which is referred to as the policy. We propose Visual Alignment Testing, an experimental framework designed to evaluate the validity of this functional separation. Our results indicate that in E2E-trained models, visual encoders actively contribute to decision-making as a result of supervision from motor data, contradicting the assumed functional separation. In contrast, OOD-pretrained models, whose encoders lack this capability, experience an average performance drop of 42% in our benchmark results compared to the state-of-the-art performance achieved by E2E policies. We believe this initial exploration of the role of visual encoders can provide a first step towards guiding future pretraining methods to address their decision-making ability, for example by developing task-conditioned or context-aware encoders.

Relative Representations: Topological and Geometric Perspectives

Sep 17, 2024

Alejandro García-Castellanos, Giovanni Luca Marchetti, Danica Kragic, Martina Scolamiero

Abstract: Relative representations are an established approach to zero-shot model stitching, consisting of a non-trainable transformation of the latent space of a deep neural network. Drawing on topological and geometric insights, we propose two improvements to relative representations. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. These rescalings and permutations coincide with the symmetries in parameter space induced by common activation functions. Second, we propose to deploy topological densification, a topological regularization loss that encourages clustering within classes, when fine-tuning relative representations. We provide an empirical investigation on a natural language task, in which both proposed variations yield improved performance on zero-shot model stitching.
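
The relative transformation represents each sample by its cosine similarities to a fixed set of anchor samples. The sketch below adds one plausible normalization: centering and scaling every latent coordinate by statistics computed over the anchors before taking cosine similarities. This per-coordinate standardization is an assumption for illustration, not necessarily the authors' exact procedure, and the function name `relative_representation` is hypothetical. It does, however, exhibit invariance to non-isotropic (per-coordinate) rescalings and to permutations of the latent axes, which the example checks numerically.

```python
import numpy as np


def relative_representation(x, anchors, eps=1e-8):
    """Map latent vectors to cosine similarities with a set of anchors.

    x:       (n, d) latent vectors of the samples to encode
    anchors: (k, d) latent vectors of the anchor samples
    returns: (n, k) relative representation
    """
    # Assumed normalization: standardize each latent coordinate with the
    # mean and standard deviation over the anchors. This cancels any
    # per-coordinate (non-isotropic) positive rescaling of the latent space.
    mu = anchors.mean(axis=0)
    sigma = anchors.std(axis=0) + eps
    x_n = (x - mu) / sigma
    a_n = (anchors - mu) / sigma

    # Standard relative transformation: cosine similarity to each anchor.
    # Cosine similarity is unchanged when the same coordinate permutation is
    # applied to both vectors, so permutations of the latent axes cancel too.
    x_n /= np.linalg.norm(x_n, axis=1, keepdims=True) + eps
    a_n /= np.linalg.norm(a_n, axis=1, keepdims=True) + eps
    return x_n @ a_n.T


rng = np.random.default_rng(0)
anchors = rng.normal(size=(10, 5))
x = rng.normal(size=(3, 5))

# Simulate a second network whose latent space differs by a per-coordinate
# positive rescaling followed by a permutation of the coordinates.
scale = rng.uniform(0.5, 3.0, size=5)
perm = rng.permutation(5)
r1 = relative_representation(x, anchors)
r2 = relative_representation((x * scale)[:, perm], (anchors * scale)[:, perm])
print(np.allclose(r1, r2, atol=1e-6))  # prints True
```

Plain cosine similarity without the per-coordinate standardization would generally fail this check: it is invariant to rotations and isotropic scaling of the latent space, but not to rescaling each coordinate by a different factor.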

The 1st InterAI Workshop: Interactive AI for Human-centered Robotics

Sep 17, 2024

Yuchong Zhang, Elmira Yadollahi, Yong Ma, Di Fu, Iolanda Leite, Danica Kragic

Abstract: The workshop is affiliated with the 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2024), held August 26 to 30, 2024, in Pasadena, CA, USA. It is designed as a half-day event running from 9:00 to 12:30 Pacific Time. It accommodates both in-person and virtual (Zoom) attendees, allowing flexible participation. The agenda includes a diverse range of sessions: two keynote speeches, two dedicated paper presentation sessions, an interactive panel discussion that fosters dialogue among experts and enables deeper dives into specific topics, and a 15-minute coffee break. The workshop website is https://sites.google.com/view/interaiworkshops/home.

Puppeteer Your Robot: Augmented Reality Leader-Follower Teleoperation

Jul 16, 2024

Jonne van Haastregt, Michael C. Welle, Yuchong Zhang, Danica Kragic

Abstract: High-quality demonstrations are necessary for learning complex and challenging manipulation tasks. In this work, we introduce an approach to puppeteer a robot by controlling a virtual robot in an augmented reality setting. Our system retains the intuitiveness of physical leader-follower teleoperation while avoiding the need for an expensive physical setup. In addition, augmented reality provides the user with additional information. We validate our system in a pilot study (n=10) on block-stacking and rice-scooping tasks, in which the majority of participants rated the system favorably. The Oculus app and corresponding ROS code are available on the project website: https://ar-puppeteer.github.io/
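
In a leader-follower scheme, the physical follower repeatedly moves toward the joint configuration of the leader, which here is the virtual robot posed in augmented reality. A minimal sketch of one control step, assuming joint-position control and a per-cycle limit on joint motion for safety; `follower_step` and `max_step` are hypothetical names, not part of the authors' released ROS code.

```python
import numpy as np


def follower_step(follower_q, leader_q, max_step=0.05):
    """Return the next follower joint command (radians).

    follower_q: current joint positions of the physical (follower) robot
    leader_q:   joint positions of the virtual (leader) robot set by the user
    max_step:   hypothetical cap on per-joint motion per control cycle
    """
    follower_q = np.asarray(follower_q, dtype=float)
    error = np.asarray(leader_q, dtype=float) - follower_q
    # Clip each joint's correction so a sudden jump of the virtual leader
    # does not translate into an abrupt motion of the real robot.
    return follower_q + np.clip(error, -max_step, max_step)


q = np.zeros(3)
target = np.array([0.2, -0.01, 0.0])
for _ in range(5):
    q = follower_step(q, target)
print(q)
```

The small error on the second joint is corrected in one step, while the larger error on the first joint is closed gradually over four cycles.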

Unfolding the Literature: A Review of Robotic Cloth Manipulation

Jul 01, 2024

Alberta Longhini, Yufei Wang, Irene Garcia-Camacho, David Blanco-Mulero, Marco Moletta, Michael Welle, Guillem Alenyà, Hang Yin, Zackory Erickson, David Held (+2 more)

Abstract: Textiles span clothing, household, healthcare, sports, and industrial applications. The deformable nature of these objects poses unique challenges that prior work on rigid objects cannot fully address. Growing interest within the community in textile perception and manipulation has led to new methods that aim to address challenges in modeling, perception, and control, resulting in significant progress. However, this progress is often tailored to one specific textile or a subcategory of textiles. To understand what restricts these methods and hinders current approaches from generalizing to a broader range of real-world textiles, this review provides an overview of the field, focusing specifically on how and to what extent textile variations are addressed in modeling, perception, benchmarking, and manipulation of textiles. Finally, we identify key open problems and outline grand challenges that will drive future advancements in the field.

* 30 pages, 3 figures, 2 tables. Submitted to Annual Review of Control, Robotics, and Autonomous Systems

Vision Beyond Boundaries: An Initial Design Space of Domain-specific Large Vision Models in Human-robot Interaction

Apr 23, 2024

Yuchong Zhang, Yong Ma, Danica Kragic

Abstract: The emergence of Large Vision Models (LVMs) follows in the footsteps of the recent rise of Large Language Models (LLMs). However, there is a noticeable gap in structured research applying LVMs to Human-Robot Interaction (HRI), despite extensive evidence supporting the efficacy of vision models in enhancing interactions between humans and robots. Recognizing this vast anticipated potential, we introduce an initial design space that incorporates domain-specific LVMs, chosen for their superior performance over general-purpose models. We examine three primary dimensions: HRI contexts, vision-based tasks, and specific domains. We validated the design space empirically with 15 experts across six evaluation metrics, demonstrating its efficacy primarily in relevant decision-making scenarios. We explore the process of ideation and potential application scenarios, envisioning this design space as a foundational guideline for future HRI system design that emphasizes accurate domain alignment and model selection.

Hyperbolic Delaunay Geometric Alignment

Apr 12, 2024

Aniss Aiman Medbouhi, Giovanni Luca Marchetti, Vladislav Polianskii, Alexander Kravberg, Petra Poklukar, Anastasia Varava, Danica Kragic

Abstract: Hyperbolic machine learning is an emerging field aimed at representing data with a hierarchical structure. However, there is a lack of tools for evaluating and analyzing the resulting hyperbolic data representations. To this end, we propose Hyperbolic Delaunay Geometric Alignment (HyperDGA), a similarity score for comparing datasets in a hyperbolic space. The core idea is to count the edges of the hyperbolic Delaunay graph that connect datapoints across the given sets. We provide an empirical investigation on synthetic and real-life biological data and demonstrate that HyperDGA outperforms the hyperbolic versions of classical distances between sets. Furthermore, we showcase the potential of HyperDGA for evaluating latent representations inferred by a Hyperbolic Variational Auto-Encoder.
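
The edge-counting idea can be illustrated in the simplest setting, a one-dimensional hyperbolic space (the interval (-1, 1) in the Poincaré model). There the Delaunay graph is exact and easy to compute: two points are neighbors when no other point lies between them along the geodesic. Higher dimensions require a genuine hyperbolic Delaunay construction, which this sketch does not attempt. The score used here, the fraction of Delaunay edges joining points from different sets, is an assumed normalization for illustration and not necessarily the exact HyperDGA formula; the function names are hypothetical.

```python
import numpy as np


def delaunay_edges_1d(points):
    """Delaunay graph of points on a hyperbolic line (Poincaré model, (-1, 1)).

    On a one-dimensional geodesic, two points are Delaunay neighbors exactly
    when no other point lies between them, so the graph links consecutive
    points in sorted order. Returns index pairs into `points`.
    """
    order = np.argsort(points)
    return [(int(order[i]), int(order[i + 1])) for i in range(len(order) - 1)]


def cross_edge_fraction(points, labels):
    """Hypothetical alignment score: share of Delaunay edges whose endpoints
    belong to different sets (labels 0 and 1)."""
    edges = delaunay_edges_1d(points)
    cross = sum(labels[i] != labels[j] for i, j in edges)
    return cross / len(edges)


a = np.array([-0.9, -0.5, 0.1, 0.6])
b = np.array([-0.7, -0.2, 0.3, 0.8])   # interleaved with a
c = np.array([0.85, 0.9, 0.95, 0.99])  # separated from a
labels = np.array([0] * 4 + [1] * 4)
print(cross_edge_fraction(np.concatenate([a, b]), labels))
print(cross_edge_fraction(np.concatenate([a, c]), labels))
```

Interleaved sets produce cross-set edges everywhere (a score of 1.0), while the well-separated sets share only a single boundary edge out of seven (1/7).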

Will You Participate? Exploring the Potential of Robotics Competitions on Human-centric Topics

Mar 27, 2024

Yuchong Zhang, Miguel Vasco, Mårten Björkman, Danica Kragic

Abstract: This paper presents findings from an exploratory needfinding study investigating the current state of research in the robotics community on four human-centric topics, namely safety, privacy, explainability, and federated learning, and the community's potential participation in competitions on these topics. We conducted a survey with 34 participants across three distinguished European robotics consortia, nearly 60% of whom had over five years of research experience in robotics. Our qualitative and quantitative analysis revealed that mainstream robotics researchers currently prioritize safety and explainability, expressing a greater willingness to invest in further research in these areas. Conversely, our results indicate that privacy and federated learning garner less attention and are perceived to have lower potential. Additionally, the study suggests a lack of enthusiasm within the robotics community for participating in competitions related to these topics. Based on these findings, we recommend targeting other communities, such as the machine learning community, for future competitions related to these four human-centric topics.

* International Conference on Human-Computer Interaction (HCII) 2024

A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models

Mar 25, 2024

Nils Ingelhag, Jesper Munkeby, Jonne van Haastregt, Anastasia Varava, Michael C. Welle, Danica Kragic

Abstract: In this paper, we build on two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundation models, to obtain a robotic skill learning system. The system acquires new skills from teleoperated demonstrations through behavioral cloning with visuomotor diffusion policies. Foundation models select the skill to execute based on the user's natural-language prompt. Before executing a skill, the foundation model performs a precondition check given an observation of the workspace. We compare the performance of different foundation models for this purpose and give a detailed experimental evaluation of the skills taught by the user in simulation and the real world. Finally, we showcase the combined system on a challenging food-serving scenario in the real world. Videos of all experimental executions, as well as of the process of teaching new skills in simulation and the real world, are available on the project's website.

* https://roboskillframework.github.io
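
The execution flow described in this abstract (select a skill from the user's prompt, check its precondition on the current observation, then run the learned policy) can be sketched as follows. The foundation-model queries are replaced by hypothetical stand-in callables (`select_skill`, `check_precondition`), and the skill library maps names to placeholder policies; none of these names come from the authors' system.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Skill:
    name: str
    precondition: str              # precondition to be verified by the model
    policy: Callable[[dict], str]  # placeholder for a learned diffusion policy


def run_request(
    prompt: str,
    observation: dict,
    skills: Dict[str, Skill],
    select_skill: Callable[[str, list], str],
    check_precondition: Callable[[str, dict], bool],
) -> str:
    # 1. A foundation model picks a skill name given the prompt and skill list.
    name = select_skill(prompt, list(skills))
    if name not in skills:
        return f"no matching skill for: {prompt!r}"
    skill = skills[name]
    # 2. A foundation model checks the precondition against the observation.
    if not check_precondition(skill.precondition, observation):
        return f"precondition failed for {name}: {skill.precondition}"
    # 3. Only then is the selected visuomotor policy executed.
    return skill.policy(observation)


# Hypothetical stand-ins for the foundation-model calls (keyword matching).
def toy_select(prompt, names):
    return next((n for n in names if n.replace("_", " ") in prompt.lower()), "")


def toy_check(precondition, obs):
    return obs.get(precondition, False)


skills = {
    "serve_food": Skill("serve_food", "plate_present", lambda obs: "served"),
    "pick_cup": Skill("pick_cup", "cup_visible", lambda obs: "picked"),
}
print(run_request("Please serve food", {"plate_present": True}, skills, toy_select, toy_check))
print(run_request("Please pick cup", {"cup_visible": False}, skills, toy_select, toy_check))
```

Keeping selection and precondition checking as separate calls mirrors the two roles the abstract assigns to the foundation model, and makes it easy to swap in different models for comparison.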

Visual Action Planning with Multiple Heterogeneous Agents

Mar 25, 2024

Martina Lippi, Michael C. Welle, Marco Moletta, Alessandro Marino, Andrea Gasparri, Danica Kragic

Abstract: Visual planning methods are promising for handling complex settings in which extracting the system state is challenging. However, none of the existing works tackles the case of multiple heterogeneous agents, i.e., agents characterized by different capabilities and/or embodiments. In this work, we propose a method to realize visual action planning in multi-agent settings by exploiting a roadmap built in a low-dimensional structured latent space and used for planning. To enable multi-agent settings, we infer possible parallel actions from a dataset composed of tuples associated with individual actions. Next, we evaluate their feasibility and cost based on the capabilities of the multi-agent system and endow the roadmap with this information, building a capability latent space roadmap (C-LSR). Additionally, a capability suggestion strategy is designed to inform the human operator about possibly missing capabilities when no paths are found. The approach is validated in a simulated burger cooking task and a real-world box packing task.
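
Planning on a capability-aware roadmap can be sketched as a shortest-path search in which each edge (a transition between latent states) is usable only if the multi-agent system has the capability it requires, combined with a suggestion step that looks for missing capabilities that would make a path exist. The graph below is a hand-made toy: node names, capability labels, costs, and the functions `plan` and `suggest_capabilities` are hypothetical, and the latent-space encoding and the inference of parallel actions from data are not shown. Using Dijkstra's algorithm for the search and testing only single missing capabilities are likewise illustrative choices.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Toy roadmap: node -> list of (next node, required capability, cost).
Roadmap = Dict[str, List[Tuple[str, str, float]]]


def plan(
    roadmap: Roadmap, start: str, goal: str, caps: Set[str]
) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra search restricted to edges whose capability is available."""
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cap, step_cost in roadmap.get(node, []):
            if cap not in caps:
                continue  # infeasible: the agents lack this capability
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


def suggest_capabilities(
    roadmap: Roadmap, start: str, goal: str, caps: Set[str]
) -> Set[str]:
    """Missing capabilities that, added one at a time, make the goal reachable."""
    all_caps = {cap for edges in roadmap.values() for _, cap, _ in edges}
    return {
        cap
        for cap in all_caps - caps
        if plan(roadmap, start, goal, caps | {cap}) is not None
    }


roadmap: Roadmap = {
    "empty_grill": [("patty_on_grill", "grasp", 1.0)],
    "patty_on_grill": [("patty_cooked", "flip", 1.0)],
    # Two ways to assemble: a single-arm action or a cheaper parallel one.
    "patty_cooked": [
        ("burger_assembled", "grasp", 1.0),
        ("burger_assembled", "bimanual", 0.5),
    ],
}

print(plan(roadmap, "empty_grill", "burger_assembled", {"grasp"}))
print(suggest_capabilities(roadmap, "empty_grill", "burger_assembled", {"grasp"}))
print(plan(roadmap, "empty_grill", "burger_assembled", {"grasp", "flip"}))
print(plan(roadmap, "empty_grill", "burger_assembled", {"grasp", "flip", "bimanual"}))
```

With only grasping available no plan exists, and the suggestion step reports that adding "flip" would fix this; making the parallel "bimanual" action available as well lowers the plan cost from 3.0 to 2.5.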
