Raphael Memmesheimer

Efficient Image Annotation via Semi-Supervised Object Segmentation with Label Propagation

Apr 24, 2026

Vitalii Tutevych, Raphael Memmesheimer, Luca Eichler, Dmytro Pavlichenko, Fynn Schilke, Rodja Krudewig, Sven Behnke

Abstract:Reliable object perception is necessary for general-purpose service robots. Open-vocabulary detectors struggle to generalize beyond a few classes and fully supervised training of object detectors requires time-intensive annotations. We present a semi-supervised label propagation approach for household object segmentation. A segment proposer generates class-agnostic masks, and an ensemble of Hopfield networks assigns labels by learning representative embeddings in complementary foundation model embedding spaces (CLIP, ViT, Theia). Our approach scales to 50 object classes with limited annotation overhead and can automatically label 60% of the data in a RoboCup@Home setting, where preparation time is severely constrained. Dataset and code are publicly available at https://github.com/ais-bonn/label_propagation.

* 12 pages, 6 figures, 7 tables, submitted to RoboCup 2026 Symposium

OMCL: Open-vocabulary Monte Carlo Localization

Dec 17, 2025

Evgenii Kruzhkov, Raphael Memmesheimer, Sven Behnke

Abstract:Robust robot localization is an important prerequisite for navigation planning. If the environment map was created from different sensors, robot measurements must be robustly associated with map features. In this work, we extend Monte Carlo Localization using vision-language features. These open-vocabulary features enable robust computation of the likelihood of visual observations, given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds. The abstract vision-language features make it possible to associate observations and map elements from different modalities. Global localization can be initialized by natural language descriptions of the objects present in the vicinity of locations. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.

* Accepted to IEEE RA-L

EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation

Nov 17, 2025

Jonas Bode, Raphael Memmesheimer, Sven Behnke

Abstract:Acting in human environments is a crucial capability for general-purpose robots, necessitating a robust understanding of natural language and its application to physical tasks. This paper seeks to harness the capabilities of diffusion models within a visuomotor policy framework that merges visual and textual inputs to generate precise robotic trajectories. By employing reference demonstrations during training, the model learns to execute manipulation tasks specified through textual commands within the robot's immediate environment. The proposed research aims to extend an existing model by leveraging improved embeddings, and adapting techniques from diffusion models for image generation. We evaluate our methods on the CALVIN dataset, proving enhanced performance on various manipulation tasks and an increased long-horizon success rate when multiple tasks are executed in sequence. Our approach reinforces the usefulness of diffusion models and contributes towards general multitask manipulation.

* 10 pages; 2 figures; 1 table. Preprint submitted to the European Robotics Forum 2026

LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps

Mar 15, 2025

Yihao Wang, Raphael Memmesheimer, Sven Behnke

Abstract:The availability of large language models and open-vocabulary object perception methods enables more flexibility for domestic service robots. The large variability of domestic tasks can be addressed without implementing each task individually by providing the robot with a task description along with appropriate environment information. In this work, we propose LIAM - an end-to-end model that predicts action transcripts based on language, image, action, and map inputs. Language and image inputs are encoded with a CLIP backbone, for which we designed two pre-training tasks to fine-tune its weights and pre-align the latent spaces. We evaluate our method on the ALFRED dataset, a simulator-generated benchmark for domestic tasks. Our results demonstrate the importance of pre-aligning embedding spaces from different modalities and the efficacy of incorporating semantic maps.

RoboCup@Home 2024 OPL Winner NimbRo: Anthropomorphic Service Robots using Foundation Models for Perception and Planning

Dec 19, 2024

Raphael Memmesheimer, Jan Nogga, Bastian Pätzold, Evgenii Kruzhkov, Simon Bultmann, Michael Schreiber, Jonas Bode, Bertan Karacora, Juhui Park, Alena Savinykh (+1 more)

Abstract:We present the approaches and contributions of the winning team NimbRo@Home at the RoboCup@Home 2024 competition in the Open Platform League held in Eindhoven, NL. Further, we describe our hardware setup and give an overview of the results for the task stages and the final demonstration. For this year's competition, we put a special emphasis on open-vocabulary object segmentation and grasping approaches that overcome the labeling overhead of supervised vision approaches, commonly used in RoboCup@Home. We successfully demonstrated that we can segment and grasp non-labeled objects by text descriptions. Further, we extensively employed LLMs for natural language understanding and task planning. Throughout the competition, our approaches showed robustness and generalization capabilities. A video of our performance can be found online.

* 12 pages, 8 figures, RoboCup 2024 Champion Paper

Person Segmentation and Action Classification for Multi-Channel Hemisphere Field of View LiDAR Sensors

Nov 17, 2024

Svetlana Seliunina, Artem Otelepko, Raphael Memmesheimer, Sven Behnke

Abstract:Robots need to perceive persons in their surroundings for safety and to interact with them. In this paper, we present a person segmentation and action classification approach that operates on 3D scans of hemisphere field of view LiDAR sensors. We recorded a data set with an Ouster OSDome-64 sensor consisting of scenes where persons perform three different actions and annotated it. We propose a method based on a MaskDINO model to detect and segment persons and to recognize their actions from combined spherical projected multi-channel representations of the LiDAR data with an additional positional encoding. Our approach demonstrates good performance for the person segmentation task and further performs well for the estimation of the person action states walking, waving, and sitting. An ablation study provides insights about the individual channel contributions for the person segmentation task. The trained models, code and dataset are made publicly available.

* 6 pages, 9 figures, 4 tables, accepted for publication at IEEE/SICE International Symposium on System Integration (SII), Munich, Germany, January 2025

A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics

Oct 30, 2024

Jonas Bode, Bastian Pätzold, Raphael Memmesheimer, Sven Behnke

Abstract:Recent advances in LLMs have been instrumental in autonomous robot control and human-robot interaction by leveraging their vast general knowledge and capabilities to understand and reason across a wide range of tasks and scenarios. Previous works have investigated various prompt engineering techniques for improving the performance of LLMs to accomplish tasks, while others have proposed methods that utilize LLMs to plan and execute tasks based on the available functionalities of a given robot platform. In this work, we consider both lines of research by comparing prompt engineering techniques and combinations thereof within the application of high-level task planning and execution in service robotics. We define a diverse set of tasks and a simple set of functionalities in simulation, and measure task completion accuracy and execution time for several state-of-the-art models.

* 6 pages, 3 figures, 2 tables, to be published in the 2024 IEEE-RAS International Conference on Humanoid Robots. We make our code, including all prompts, available at https://github.com/AIS-Bonn/Prompt_Engineering

Anticipating Human Behavior for Safe Navigation and Efficient Collaborative Manipulation with Mobile Service Robots

Oct 07, 2024

Simon Bultmann, Raphael Memmesheimer, Jan Nogga, Julian Hau, Sven Behnke

Abstract:The anticipation of human behavior is a crucial capability for robots to interact with humans safely and efficiently. We employ a smart edge sensor network to provide global observations along with future predictions and goal information to integrate anticipatory behavior for the control of a mobile manipulation robot. We present approaches to anticipate human behavior in the context of safe navigation and a collaborative mobile manipulation task. First, we anticipate human motion by employing projections of human trajectories from smart edge sensor network observations into the planning map of a mobile robot. Second, we anticipate human intentions in a collaborative furniture-carrying task to achieve a given goal. Our experiments indicate that anticipating human behavior allows for safer navigation and more efficient collaboration. Finally, we showcase an integrated system that anticipates human behavior and collaborates with a human to achieve a target room layout, including the placement of tables and chairs.

Self-centering 3-DOF feet controller for hands-free locomotion control in telepresence and virtual reality

Aug 05, 2024

Raphael Memmesheimer, Christian Lenz, Max Schwarz, Michael Schreiber, Sven Behnke

Abstract:We present a novel seated 3-DOF foot controller for locomotion control in telepresence robotics and virtual reality environments. Tilting the feet on two axes yields forward, backward, and sideways motion. In addition, a separate rotary joint allows for rotation around the vertical axis. Attached springs on all joints self-center the controller. The HTC Vive tracker is used to translate the trackers' orientation into locomotion commands. The proposed self-centering foot controller was used successfully for the ANA Avatar XPRIZE competition, where a naive operator navigated the robot over a longer distance, surpassing obstacles while solving various interaction and manipulation tasks in between. We publicly provide the models of the mostly 3D-printed feet controller for reproduction.

* 4 pages, 7 figures, submitted to 2024 IEEE International Conference on Telepresence (Tele 2024)

Cleaning Robots in Public Spaces: A Survey and Proposal for Benchmarking Based on Stakeholders Interviews

Jul 23, 2024

Raphael Memmesheimer, Martina Overbeck, Bjoern Kral, Lea Steffen, Sven Behnke, Martin Gersch, Arne Roennau

Abstract:Autonomous cleaning robots for public spaces have potential for addressing current societal challenges, such as labor shortages and cleanliness in public spaces. Other application domains like autonomous driving, bin picking, or search and rescue have shown that benchmarking platforms and approaches in competitive settings can advance their respective research fields, resulting in more applicable systems under real-world conditions. For this paper, we analyzed seven semi-structured, qualitative stakeholder interviews about outdoor cleaning, identified current needs as well as limitations, and considered those results for the development of a benchmarking scenario based on the previous observations.

* 12 pages, 3 figures, 4 tables, RoboCup Symposium 2024
