Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Danica Kragic

CAPGrasp: An $\mathbb{R}^3\times \text{SO-equivariant}$ Continuous Approach-Constrained Generative Grasp Sampler

Oct 18, 2023

Zehang Weng, Haofei Lu, Jens Lundell, Danica Kragic

$Figure 1 for CAPGrasp: An $\mathbb{R}^3\times \text{SO-equivariant}$ Continuous Approach-Constrained Generative Grasp Sampler$

$Figure 2 for CAPGrasp: An $\mathbb{R}^3\times \text{SO-equivariant}$ Continuous Approach-Constrained Generative Grasp Sampler$

$Figure 3 for CAPGrasp: An $\mathbb{R}^3\times \text{SO-equivariant}$ Continuous Approach-Constrained Generative Grasp Sampler$

$Figure 4 for CAPGrasp: An $\mathbb{R}^3\times \text{SO-equivariant}$ Continuous Approach-Constrained Generative Grasp Sampler$

Abstract:We propose CAPGrasp, an $\mathbb{R}^3\times \text{SO(2)-equivariant}$ 6-DoF continuous approach-constrained generative grasp sampler. It includes a novel learning strategy for training CAPGrasp that eliminates the need to curate massive conditionally labeled datasets and a constrained grasp refinement technique that improves grasp poses while respecting the grasp approach directional constraints. The experimental results demonstrate that CAPGrasp is more than three times as sample efficient as unconstrained grasp samplers while achieving up to 38% grasp success rate improvement. CAPGrasp also achieves 4-10% higher grasp success rates than constrained but noncontinuous grasp samplers. Overall, CAPGrasp is a sample-efficient solution when grasps must originate from specific directions, such as grasping in confined spaces.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Music- and Lyrics-driven Dance Synthesis

Sep 30, 2023

Wenjie Yin, Qingyuan Yao, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

Figure 1 for Music- and Lyrics-driven Dance Synthesis

Figure 2 for Music- and Lyrics-driven Dance Synthesis

Abstract:Lyrics often convey information about the songs that are beyond the auditory dimension, enriching the semantic meaning of movements and musical themes. Such insights are important in the dance choreography domain. However, most existing dance synthesis methods mainly focus on music-to-dance generation, without considering the semantic information. To complement it, we introduce JustLMD, a new multimodal dataset of 3D dance motion with music and lyrics. To the best of our knowledge, this is the first dataset with triplet information including dance motion, music, and lyrics. Additionally, we showcase a cross-modal diffusion-based network designed to generate 3D dance motion conditioned on music and lyrics. The proposed JustLMD dataset encompasses 4.6 hours of 3D dance motion in 1867 sequences, accompanied by musical tracks and their corresponding English lyrics.

Via

Access Paper or Ask Questions

Learning Geometric Representations of Objects via Interaction

Sep 11, 2023

Alfredo Reichlin, Giovanni Luca Marchetti, Hang Yin, Anastasiia Varava, Danica Kragic

Figure 1 for Learning Geometric Representations of Objects via Interaction

Figure 2 for Learning Geometric Representations of Objects via Interaction

Figure 3 for Learning Geometric Representations of Objects via Interaction

Figure 4 for Learning Geometric Representations of Objects via Interaction

Abstract:We address the problem of learning representations from observations of a scene involving an agent and an external object the agent interacts with. To this end, we propose a representation learning framework extracting the location in physical space of both the agent and the object from unstructured observations of arbitrary nature. Our framework relies on the actions performed by the agent as the only source of supervision, while assuming that the object is displaced by the agent via unknown dynamics. We provide a theoretical foundation and formally prove that an ideal learner is guaranteed to infer an isometric representation, disentangling the agent from the object and correctly extracting their locations. We evaluate empirically our framework on a variety of scenarios, showing that it outperforms vision-based approaches such as a state-of-the-art keypoint extractor. We moreover demonstrate how the extracted representations enable the agent to solve downstream tasks via reinforcement learning in an efficient manner.

Via

Access Paper or Ask Questions

Enabling Robot Manipulation of Soft and Rigid Objects with Vision-based Tactile Sensors

Jun 09, 2023

Michael C. Welle, Martina Lippi, Haofei Lu, Jens Lundell, Andrea Gasparri, Danica Kragic

Figure 1 for Enabling Robot Manipulation of Soft and Rigid Objects with Vision-based Tactile Sensors

Figure 2 for Enabling Robot Manipulation of Soft and Rigid Objects with Vision-based Tactile Sensors

Figure 3 for Enabling Robot Manipulation of Soft and Rigid Objects with Vision-based Tactile Sensors

Figure 4 for Enabling Robot Manipulation of Soft and Rigid Objects with Vision-based Tactile Sensors

Abstract:Endowing robots with tactile capabilities opens up new possibilities for their interaction with the environment, including the ability to handle fragile and/or soft objects. In this work, we equip the robot gripper with low-cost vision-based tactile sensors and propose a manipulation algorithm that adapts to both rigid and soft objects without requiring any knowledge of their properties. The algorithm relies on a touch and slip detection method, which considers the variation in the tactile images with respect to reference ones. We validate the approach on seven different objects, with different properties in terms of rigidity and fragility, to perform unplugging and lifting tasks. Furthermore, to enhance applicability, we combine the manipulation algorithm with a grasp sampler for the task of finding and picking a grape from a bunch without damaging~it.

* Published in IEEE International Conference on Automation Science and Engineering (CASE2023)

Via

Access Paper or Ask Questions

TD-GEM: Text-Driven Garment Editing Mapper

May 29, 2023

Reza Dadfar, Sanaz Sabzevari, Mårten Björkman, Danica Kragic

Abstract:Language-based fashion image editing allows users to try out variations of desired garments through provided text prompts. Inspired by research on manipulating latent representations in StyleCLIP and HairCLIP, we focus on these latent spaces for editing fashion items of full-body human datasets. Currently, there is a gap in handling fashion image editing due to the complexity of garment shapes and textures and the diversity of human poses. In this paper, we propose an editing optimizer scheme method called Text-Driven Garment Editing Mapper (TD-GEM), aiming to edit fashion items in a disentangled way. To this end, we initially obtain a latent representation of an image through generative adversarial network inversions such as Encoder for Editing (e4e) or Pivotal Tuning Inversion (PTI) for more accurate results. An optimization-based Contrasive Language-Image Pre-training (CLIP) is then utilized to guide the latent representation of a fashion image in the direction of a target attribute expressed in terms of a text prompt. Our TD-GEM manipulates the image accurately according to the target attribute, while other parts of the image are kept untouched. In the experiments, we evaluate TD-GEM on two different attributes (i.e., "color" and "sleeve length"), which effectively generates realistic images compared to the recent manipulation schemes.

* The first two authors contributed equally

Via

Access Paper or Ask Questions

A Virtual Reality Framework for Human-Robot Collaboration in Cloth Folding

May 12, 2023

Marco Moletta, Maciej K. Wozniak, Michael C. Welle, Danica Kragic

Abstract:We present a virtual reality (VR) framework to automate the data collection process in cloth folding tasks. The framework uses skeleton representations to help the user define the folding plans for different classes of garments, allowing for replicating the folding on unseen items of the same class. We evaluate the framework in the context of automating garment folding tasks. A quantitative analysis is performed on 3 classes of garments, demonstrating that the framework reduces the need for intervention by the user. We also compare skeleton representations with RGB and binary images in a classification task on a large dataset of clothing items, motivating the use of the framework for other classes of garments.

Via

Access Paper or Ask Questions

Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

Apr 03, 2023

Wenjie Yin, Ruibo Tu, Hang Yin, Danica Kragic, Hedvig Kjellström, Mårten Björkman

Figure 1 for Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

Figure 2 for Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

Figure 3 for Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

Figure 4 for Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

Abstract:Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.

Via

Access Paper or Ask Questions

Ensemble Latent Space Roadmap for Improved Robustness in Visual Action Planning

Mar 27, 2023

Martina Lippi, Michael C. Welle, Andrea Gasparri, Danica Kragic

Abstract:Planning in learned latent spaces helps to decrease the dimensionality of raw observations. In this work, we propose to leverage the ensemble paradigm to enhance the robustness of latent planning systems. We rely on our Latent Space Roadmap (LSR) framework, which builds a graph in a learned structured latent space to perform planning. Given multiple LSR framework instances, that differ either on their latent spaces or on the parameters for constructing the graph, we use the action information as well as the embedded nodes of the produced plans to define similarity measures. These are then utilized to select the most promising plans. We validate the performance of our Ensemble LSR (ENS-LSR) on simulated box stacking and grape harvesting tasks as well as on a real-world robotic T-shirt folding experiment.

Via

Access Paper or Ask Questions

GoNet: An Approach-Constrained Generative Grasp Sampling Network

Mar 14, 2023

Zehang Weng, Haofei Lu, Jens Lundell, Danica Kragic

Abstract:Constraining the approach direction of grasps is important when picking objects in confined spaces, such as when emptying a shelf. Yet, such capabilities are not available in state-of-the-art data-driven grasp sampling methods that sample grasps all around the object. In this work, we address the specific problem of training approach-constrained data-driven grasp samplers and how to generate good grasping directions automatically. Our solution is GoNet: a generative grasp sampler that can constrain the grasp approach direction to lie close to a specified direction. This is achieved by discretizing SO(3) into bins and training GoNet to generate grasps from those bins. At run-time, the bin aligning with the second largest principal component of the observed point cloud is selected. GoNet is benchmarked against GraspNet, a state-of-the-art unconstrained grasp sampler, in an unconfined grasping experiment in simulation and on an unconfined and confined grasping experiment in the real world. The results demonstrate that GoNet achieves higher success-over-coverage in simulation and a 12%-18% higher success rate in real-world table-picking and shelf-picking tasks than the baseline.

* IROS 2023 submission

Via

Access Paper or Ask Questions

Equivariant Representations for Non-Free Group Actions

Jan 12, 2023

Luis Armando Pérez Rey, Giovanni Luca Marchetti, Danica Kragic, Dmitri Jarnikov, Mike Holenderski

Abstract:We introduce a method for learning representations that are equivariant with respect to general group actions over data. Differently from existing equivariant representation learners, our method is suitable for actions that are not free i.e., that stabilize data via nontrivial symmetries. Our method is grounded in the orbit-stabilizer theorem from group theory, which guarantees that an ideal learner infers an isomorphic representation. Finally, we provide an empirical investigation on image datasets with rotational symmetries and show that taking stabilizers into account improves the quality of the representations.

* NeurIPS Workshop on Symmetry and Geometry in Neural Representations

Via

Access Paper or Ask Questions