Space Center, Skolkovo Institute of Science and Technology
Abstract: This paper introduces a safe swarm of drones capable of robustly landing in crowded environments by combining Reinforcement Learning with Safe Learning techniques. The developed system allows us to teach a swarm of drones with different dynamics to land on moving landing pads while avoiding collisions with obstacles and between agents. The safe barrier net algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, tested indoors with a Vicon motion capture system to ensure precise localization and control. Experimental results show that the system achieves a landing accuracy of 2.25 cm, a mean landing time of 17 s, and collision-free landings, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for applications in environments where safety and precision are paramount.
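To make the safety idea concrete, below is a minimal, hypothetical sketch of a barrier-style safety filter: a reinforcement-learning policy proposes a velocity command, and the filter projects it onto the set of commands that keep the drone outside a safety radius around an obstacle. The function names, gains, and single-obstacle setup are illustrative assumptions, not the paper's safe barrier net implementation.

```python
# Hypothetical barrier-style safety filter: clip an RL velocity command so the
# barrier h(x) = ||pos - obstacle||^2 - r_safe^2 decays no faster than -alpha*h.
import numpy as np

def safety_filter(pos, v_rl, obstacle, r_safe=0.3, alpha=2.0):
    """Project the RL velocity command onto the half-space satisfying
    grad_h . v + alpha * h >= 0 (closed-form solution of the 1-constraint QP)."""
    d = pos - obstacle
    h = d @ d - r_safe**2            # barrier value (> 0 means safe)
    grad_h = 2.0 * d                 # gradient of h w.r.t. position
    violation = grad_h @ v_rl + alpha * h
    if violation >= 0.0:
        return v_rl                  # command is already safe
    # Minimal correction along grad_h that restores the barrier condition
    return v_rl - violation * grad_h / (grad_h @ grad_h)

pos = np.array([0.0, 0.0, 1.0])
v_cmd = safety_filter(pos, np.array([0.5, 0.0, -0.4]), np.array([0.4, 0.0, 0.8]))
print(v_cmd)   # corrected command steering the drone away from the obstacle
```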
Abstract: This paper introduces Shake-VLA, a Vision-Language-Action (VLA) model-based system designed to enable bimanual robotic manipulation for automated cocktail preparation. The system integrates a vision module for detecting ingredient bottles and reading labels, a speech-to-text module for interpreting user commands, and a language model to generate task-specific robotic instructions. Force Torque (FT) sensors are employed to precisely measure the quantity of liquid poured, ensuring accuracy in ingredient proportions during the mixing process. The system architecture includes a Retrieval-Augmented Generation (RAG) module for accessing and adapting recipes, an anomaly detection mechanism to address ingredient availability issues, and bimanual robotic arms for dexterous manipulation. Experimental evaluations demonstrated a high success rate across system components, with the speech-to-text module achieving a 93% success rate in noisy environments, the vision module attaining a 91% success rate in object and label detection in cluttered environments, the anomaly detection module identifying 95% of discrepancies between detected ingredients and recipe requirements, and the overall system achieving a 100% success rate in preparing cocktails, from recipe formulation to action generation.
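A toy sketch of the recipe-retrieval and anomaly-detection flow described above. The recipe store, the keyword lookup standing in for the RAG module, and the ingredient lists are invented for illustration; the paper's system uses a language-model-backed RAG module rather than this dictionary lookup.

```python
# Illustrative recipe retrieval plus anomaly check: compare the retrieved
# recipe's required ingredients against what the vision module detected.
RECIPES = {
    "margarita": ["tequila", "triple sec", "lime juice"],
    "cuba libre": ["rum", "cola", "lime juice"],
}

def retrieve_recipe(request: str):
    """Toy retrieval: pick the stored recipe whose name appears in the request."""
    for name, ingredients in RECIPES.items():
        if name in request.lower():
            return name, ingredients
    raise KeyError("no matching recipe found")

def detect_anomalies(required, detected):
    """Flag required ingredients the vision module did not find on the table."""
    return [item for item in required if item not in detected]

name, required = retrieve_recipe("Please make me a margarita")
missing = detect_anomalies(required, detected={"tequila", "lime juice"})
print(name, "missing:", missing)   # margarita missing: ['triple sec']
```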
Abstract: The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with a Visual Language Model (VLM) and the powerful capabilities of GPT, UAV-VLA enables users to generate general flight paths and action plans through simple text requests. The system leverages the rich contextual information provided by satellite images, allowing for enhanced decision-making and mission planning. The combination of visual analysis by the VLM and natural language processing by GPT provides the user with a path-and-action set, making aerial operations more efficient and accessible. Evaluation showed a 22% difference in the length of the generated trajectories and a mean error of 34.22 m (Euclidean distance) in locating objects of interest on the map, computed with the K-Nearest Neighbors (KNN) approach.
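A minimal sketch of the kind of KNN-based evaluation metric the abstract reports: each predicted object location is matched to its nearest ground-truth object and the Euclidean distances are averaged. The coordinates and the exact matching protocol are assumptions for illustration.

```python
# Nearest-neighbour (K=1) matching of predicted object locations to ground
# truth, averaging the Euclidean distances into a single mean-error figure.
import numpy as np

def mean_nn_error(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean Euclidean distance from each prediction to its nearest ground truth."""
    dists = np.linalg.norm(predicted[:, None, :] - ground_truth[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

pred = np.array([[10.0, 5.0], [40.0, 22.0]])              # metres, local map frame
gt = np.array([[12.0, 6.0], [38.0, 20.0], [70.0, 1.0]])
print(mean_nn_error(pred, gt))                            # ~2.53 m for this toy data
```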
Abstract: This paper presents a novel adaptive torsion spring mechanism for optimizing energy consumption in legged robots. By adjusting the equilibrium position and stiffness of the spring, the system improves energy efficiency during cyclic movements such as walking and jumping. The adaptive compliance mechanism, consisting of a torsion spring combined with a worm gear driven by a servo actuator, compensates for motion-induced torque and reduces motor load. Simulation results demonstrate a significant reduction in power consumption, highlighting the effectiveness of this approach in enhancing robotic locomotion.
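As a compact illustration of the compensation principle (the notation is ours, not the paper's): with joint angle $\theta$, spring stiffness $k$, and equilibrium position $\theta_0$ set by the servo-driven worm gear, the motor supplies only the residual torque.

```latex
% Illustrative torque balance (notation assumed, not taken from the paper):
% tau_load is the motion-induced joint torque, k the torsion-spring stiffness,
% theta_0 the equilibrium position set by the servo-driven worm gear.
\tau_{\mathrm{motor}}(\theta) = \tau_{\mathrm{load}}(\theta) - k\,(\theta - \theta_0)
```

Choosing $\theta_0$ (and $k$) so that the spring torque tracks the cyclic load torque drives the residual motor torque toward zero over the gait cycle, which is where the power saving comes from.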
Abstract: This paper presents a novel approach to building mission planners based on neural networks with the Transformer architecture and Large Language Models (LLMs). The approach demonstrates that a task can be set for a mobile robot and successfully executed without dedicated perception algorithms, relying only on data coming from the camera. In this work, a success rate of more than 50% was obtained for one of the basic actions of mobile robots. The proposed approach is of practical importance for warehouse logistics robots, as in the future it may eliminate the need for markings, LiDARs, beacons, and other tools for robot orientation in space. In conclusion, this approach can be scaled to any type and number of robots.
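A minimal sketch, with invented architecture sizes and action set, of the core idea: a Transformer maps raw camera input plus a learned task token directly to a discrete robot action, with no separate perception pipeline.

```python
# Toy camera-to-action Transformer: patchify the image, prepend a task token,
# encode, and classify the task token's output into a discrete action.
import torch
import torch.nn as nn

class MiniMissionPlanner(nn.Module):
    ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]

    def __init__(self, patch=16, dim=64):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.task_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(dim, len(self.ACTIONS))

    def forward(self, image):                                     # (B, 3, 64, 64)
        tokens = self.patchify(image).flatten(2).transpose(1, 2)  # (B, 16, dim)
        tokens = torch.cat([self.task_token.expand(len(image), -1, -1), tokens], dim=1)
        return self.head(self.encoder(tokens)[:, 0])              # action logits

model = MiniMissionPlanner()
logits = model(torch.randn(1, 3, 64, 64))
print(MiniMissionPlanner.ACTIONS[logits.argmax(dim=-1).item()])
```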
Abstract: The swift advancement of unmanned aerial vehicle (UAV) technologies necessitates new standards for developing human-drone interaction (HDI) interfaces. Most HDI interfaces, especially first-person view (FPV) goggles, limit the operator's ability to obtain information from the environment. This paper presents a novel interface, FlightAR, that integrates augmented reality (AR) overlays of the UAV's FPV and bottom camera feeds into a head-mounted display (HMD) to enhance the pilot's situational awareness. Using FlightAR, the system provides pilots not only with video streams from several UAV cameras simultaneously but also with the ability to observe their surroundings in real time. User evaluation with NASA-TLX and UEQ surveys showed low physical demand ($\mu=1.8$, $SD = 0.8$) and good performance ($\mu=3.4$, $SD = 0.8$), yielding better user assessments than baseline FPV goggles. Participants also rated the system highly for stimulation ($\mu=2.35$, $SD = 0.9$), novelty ($\mu=2.1$, $SD = 0.9$), and attractiveness ($\mu=1.97$, $SD = 1.0$), indicating positive user experiences. These results demonstrate the potential of the system to improve the UAV piloting experience through enhanced situational awareness and intuitive control. The code is available here: https://github.com/Sautenich/FlightAR
Abstract: Many modern robotic systems operate autonomously, yet they often lack the ability to accurately analyze the environment and adapt to changing external conditions, while teleoperation systems often require special operator skills. In laboratory automation, the number of automated processes is growing, but such systems are usually developed to perform specific tasks. In addition, many of the objects used in this field are transparent, making them difficult to analyze through visual channels. The contributions of this work include the development of a robotic framework with an autonomous mode for manipulating liquid-filled objects with different degrees of transparency in complex pose combinations. The conducted experiments demonstrated that the designed visual perception system robustly and accurately estimates object poses for autonomous manipulation, and confirmed the performance of the algorithms in dexterous operations such as liquid dispensing. The proposed robotic framework can be applied to laboratory automation, since it solves the problem of performing non-trivial manipulation tasks that require high accuracy and repeatability while analyzing the poses and liquid levels of objects with varying degrees of transparency.
Abstract: In multi-drone systems, navigating through dynamic environments from start to goal with collision-free trajectories and efficient path planning is a significant challenge. To solve this problem, we propose SwarmPath, a novel technology that integrates an Artificial Potential Field (APF) with an impedance controller. The proposed approach provides collision-free leader-follower behaviour in which drones adapt themselves to the environment. The leader is virtual, while the drones are physical followers that leverage the APF path-planning approach to find the shortest possible path to the target. Simultaneously, the drones dynamically adjust their impedance links, creating virtual links with obstacles to avoid them. Compared to conventional APF, the proposed SwarmPath system not only provides smooth collision avoidance but also enables agents to pass efficiently through narrow passages, reducing total travel time by 30% while ensuring safety in terms of drone connectivity. Lastly, the results illustrate that the discrepancy between simulated and real drone trajectories exhibits an average absolute percentage error (APE) of 6%, underscoring the reliability of our solution in real-world scenarios.
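A compact sketch of the two ingredients named above: a classic attractive/repulsive potential field plus a spring-damper (impedance) link pulling the follower toward the virtual leader, integrated with forward Euler. The gains, radii, and single follower/obstacle are illustrative assumptions, not the SwarmPath implementation.

```python
# Follower dynamics under APF forces plus an impedance link to a virtual leader.
import numpy as np

def apf_force(pos, goal, obstacle, k_att=1.0, k_rep=0.5, rho0=1.0):
    """Classic APF: linear attraction to the goal, repulsion inside radius rho0."""
    f = k_att * (goal - pos)
    d = pos - obstacle
    rho = np.linalg.norm(d)
    if rho < rho0:
        f += k_rep * (1.0 / rho - 1.0 / rho0) / rho**2 * (d / rho)
    return f

def impedance_force(pos, vel, leader_pos, stiffness=2.0, damping=1.5):
    """Virtual spring-damper link between the follower and the virtual leader."""
    return stiffness * (leader_pos - pos) - damping * vel

pos, vel, dt = np.array([0.0, 0.0]), np.array([0.0, 0.0]), 0.05
for _ in range(200):                     # simple forward-Euler rollout
    f = apf_force(pos, goal=np.array([5.0, 0.0]), obstacle=np.array([2.5, 0.2]))
    f += impedance_force(pos, vel, leader_pos=np.array([5.0, 0.0]))
    vel += f * dt
    pos += vel * dt
print(pos)   # the follower settles near the goal, having skirted the obstacle
```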
Abstract: The paper focuses on an algorithm for improving the quality of 3D reconstruction and segmentation in DSP-SLAM by enhancing RGB image quality. Our SharpSLAM algorithm aims to decrease the influence of highly dynamic motion on visual object-oriented SLAM through image deblurring, improving all aspects of object-oriented SLAM, including localization, mapping, and object reconstruction. The experimental results revealed a noticeable improvement in object detection quality, with the F-score increasing from 82.9% to 86.2% due to a higher number of features and corresponding map points. The RMSE of the signed distance function also decreased from 17.2 cm to 15.4 cm. Furthermore, our solution enhanced object positioning, with the IoU increasing from 74.5% to 75.7%. The SharpSLAM algorithm has the potential to greatly improve the quality of 3D reconstruction and segmentation in DSP-SLAM and to impact a wide range of fields, including robotics, autonomous vehicles, and augmented reality.
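To illustrate the preprocessing idea (not the paper's deblurring method), the sketch below applies a simple Wiener deconvolution with an assumed horizontal motion-blur kernel to a frame before feature extraction, and compares ORB feature counts before and after.

```python
# Deblur a grayscale frame with a Wiener filter, then count ORB features.
import cv2
import numpy as np

def motion_psf(shape, length=15):
    """Horizontal motion-blur PSF centred at the origin (wrap-around indexing)."""
    psf = np.zeros(shape)
    for dx in range(-(length // 2), length // 2 + 1):
        psf[0, dx % shape[1]] = 1.0
    return psf / psf.sum()

def wiener_deblur(gray, psf, k=0.01):
    """Frequency-domain Wiener filter: F = G * conj(H) / (|H|^2 + k)."""
    H = np.fft.fft2(psf)
    G = np.fft.fft2(gray.astype(np.float64))
    return np.real(np.fft.ifft2(G * np.conj(H) / (np.abs(H) ** 2 + k)))

blurred = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
restored = np.clip(wiener_deblur(blurred, motion_psf(blurred.shape)), 0, 255).astype(np.uint8)
orb = cv2.ORB_create()
print(len(orb.detect(blurred, None)), "->", len(orb.detect(restored, None)))
```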
Abstract: The shape of deformable objects can change drastically during grasping by robotic grippers, causing an ambiguous perception of their alignment and hence resulting in errors in robot positioning and telemanipulation. Rendering clear tactile patterns is fundamental to increasing users' precision and dexterity through tactile haptic feedback during telemanipulation. Therefore, different methods have to be studied to decode the sensors' data into haptic stimuli. This work presents a telemanipulation system for plastic pipettes that consists of a Force Dimension Omega.7 haptic interface endowed with two electro-stimulation arrays and two tactile sensor arrays embedded in the 2-finger Robotiq gripper. We propose a novel approach based on convolutional neural networks (CNN) to detect the tilt of deformable objects. The CNN generates a tactile pattern based on recognized tilt data to render further electro-tactile stimuli provided to the user during the telemanipulation. The study showed that with the CNN algorithm, users' tilt recognition increased from 23.13% with the downsized data to 57.9%, and the success rate during teleoperation increased from 53.12% using the downsized data to 92.18% using the tactile patterns generated by the CNN.
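A minimal sketch of a tilt classifier of the kind described above; the input dimensions, tilt classes, and architecture are assumptions for illustration, not the paper's network.

```python
# Toy CNN mapping a two-finger tactile frame to a discrete tilt class.
import torch
import torch.nn as nn

TILT_CLASSES = ["left", "slight_left", "level", "slight_right", "right"]

class TiltCNN(nn.Module):
    def __init__(self, n_classes=len(TILT_CLASSES)):
        super().__init__()
        self.features = nn.Sequential(           # input: (B, 2, 8, 16) — two fingers
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # -> (B, 16, 4, 8)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # -> (B, 32, 2, 4)
        )
        self.classifier = nn.Linear(32 * 2 * 4, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TiltCNN()
frame = torch.rand(1, 2, 8, 16)                  # stacked pressure maps from both fingers
print(TILT_CLASSES[model(frame).argmax(dim=-1).item()])
```

The predicted class could then index into a lookup table of electro-tactile patterns rendered on the stimulation arrays.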