Abstract:Advanced dexterous manipulation involving multiple simultaneous contacts across different surfaces, such as pinching coins off the ground or manipulating intertwined objects, remains challenging for robotic systems. Such tasks exceed the capabilities of vision and proprioception alone, requiring high-resolution tactile sensing with calibrated physical metrics. Raw optical tactile sensor images, while information-rich, lack interpretability and cross-sensor transferability, limiting their real-world utility. TensorTouch addresses this challenge by integrating finite element analysis with deep learning to extract comprehensive contact information from optical tactile sensors, including stress tensors, deformation fields, and force distributions at pixel-level resolution. The TensorTouch framework achieves sub-millimeter position accuracy and precise force estimation while supporting the large sensor deformations crucial for manipulating soft objects. Experimental validation demonstrates 90% success in selectively grasping one of two strings based on detected motion, enabling contact-rich manipulation capabilities previously inaccessible to robotic systems.
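
The abstract above describes a learned mapping from raw optical tactile images to calibrated, pixel-level physical fields. As a rough illustration only, the following PyTorch sketch shows what such an image-to-dense-field regressor could look like; the encoder-decoder layout, channel counts (6 stress-tensor components, 3 deformation, 3 force per pixel), and FEA-based supervision noted in the comments are assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of an image-to-dense-field regressor in the spirit of
# TensorTouch: a tactile image in, per-pixel physical fields out.
# Channel counts (6 stress components, 3 deformation, 3 force) are assumptions.
import torch
import torch.nn as nn

class TactileFieldRegressor(nn.Module):
    def __init__(self, out_channels: int = 6 + 3 + 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, tactile_image: torch.Tensor) -> torch.Tensor:
        # tactile_image: (B, 3, H, W) -> dense fields: (B, 12, H, W)
        return self.decoder(self.encoder(tactile_image))

# Supervision would come from finite-element simulation of the sensor gel,
# e.g. a regression loss between predicted and FEA-derived per-pixel fields.
model = TactileFieldRegressor()
fields = model(torch.randn(1, 3, 128, 128))
stress, deformation, force = fields.split([6, 3, 3], dim=1)
```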
Abstract:Safe handover in shared autonomy for vehicle control is well-established in modern vehicles. However, avoiding accidents often requires action several seconds in advance. This necessitates understanding human driver behavior and an expert control strategy for seamless intervention when a collision or unsafe state is predicted. We propose Diffusion-SAFE, a closed-loop shared autonomy framework leveraging diffusion models to: (1) predict human driving behavior for detection of potential risks, (2) generate safe expert trajectories, and (3) enable smooth handovers by blending human and expert policies over a short time horizon. Unlike prior works, which use engineered score functions to rate driving performance, our approach enables both performance evaluation and optimal action sequence generation from demonstrations. By adjusting the forward and reverse processes of the diffusion-based copilot, our method ensures a gradual transition of control authority, mimicking the driver's behavior before intervention and thereby mitigating abrupt takeovers. We evaluated Diffusion-SAFE in both simulation (CarRacing-v0) and the real world (a ROS-based race car), measuring human-driving similarity, safety, and computational efficiency. Results demonstrate a 98.5% successful handover rate, highlighting the framework's effectiveness in progressively correcting human actions and continuously sampling optimal robot actions.
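
The handover mechanism in Diffusion-SAFE operates through the copilot's diffusion forward and reverse processes, which the abstract does not detail. The sketch below illustrates only the general idea of progressively blending human and expert controls over a short horizon once a risk is flagged; the linear schedule, horizon length, and function names are hypothetical and are not the paper's diffusion-based formulation.

```python
# Illustrative sketch (not the paper's diffusion-based mechanism): a gradual
# handover that blends human and copilot controls over a short horizon once a
# risk predictor flags the human rollout as unsafe. The linear blending
# schedule and horizon length are assumptions for illustration.
import numpy as np

def blended_control(u_human: np.ndarray,
                    u_expert: np.ndarray,
                    steps_since_trigger: int,
                    handover_horizon: int = 20) -> np.ndarray:
    """Convex blend that shifts authority from human to expert over the horizon."""
    alpha = np.clip(steps_since_trigger / handover_horizon, 0.0, 1.0)
    return (1.0 - alpha) * u_human + alpha * u_expert

# Example: steering/throttle commands halfway through the handover horizon.
u = blended_control(np.array([0.2, 0.8]), np.array([-0.1, 0.4]), steps_since_trigger=10)
```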
Abstract:J-PARSE is a method for smooth first-order inverse kinematic control of a serial manipulator near kinematic singularities. The commanded end-effector velocity is interpreted component-wise, according to the available mobility in each dimension of the task space. First, a substitute "Safety" Jacobian matrix is created, keeping the aspect ratio of the manipulability ellipsoid above a threshold value. The desired motion is then projected onto non-singular and singular directions, and the latter projection is scaled down by a factor informed by the threshold value. A right-inverse of the non-singular Safety Jacobian is applied to the modified command. In the absence of joint limits and collisions, this ensures smooth transition into and out of low-rank poses, guaranteeing asymptotic stability for target poses within the workspace and stability for those outside it. Velocity control with J-PARSE is benchmarked against the Least-Squares and Damped Least-Squares inversions of the Jacobian, and shows high accuracy in reaching and leaving singular target poses. By expanding the available workspace of manipulators, the method finds applications in servoing, teleoperation, and learning.
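
To make the projection-and-scaling idea concrete, here is a simplified NumPy sketch of singularity-aware velocity inverse kinematics in the spirit of J-PARSE: decompose the command along the Jacobian's singular directions, invert the well-conditioned components normally, and attenuate the near-singular ones. The exact construction of the Safety Jacobian and the scaling law in the paper differ; the ratio threshold and quadratic attenuation below are assumptions.

```python
# Simplified sketch of the idea behind J-PARSE: decompose the commanded
# end-effector velocity along the task-space directions given by the Jacobian's
# SVD, scale down the components in near-singular directions, and invert only
# the well-conditioned part. The "Safety Jacobian" construction and scaling law
# in the paper differ; the ratio threshold here is an assumption.
import numpy as np

def jparse_like_velocity_ik(J: np.ndarray, v_des: np.ndarray,
                            ratio_threshold: float = 0.1) -> np.ndarray:
    U, s, Vt = np.linalg.svd(J)                       # J = U diag(s) Vt
    s_max = s.max()
    qdot = np.zeros(J.shape[1])
    for i, sigma in enumerate(s):
        coeff = U[:, i] @ v_des                       # command component along u_i
        if sigma >= ratio_threshold * s_max:
            qdot += Vt[i, :] * coeff / sigma          # ordinary inversion
        elif sigma > 0.0:
            scale = (sigma / (ratio_threshold * s_max)) ** 2
            qdot += Vt[i, :] * scale * coeff / sigma  # attenuated singular direction
    return qdot

# Example: 2-link planar arm very close to an outstretched (singular) pose.
J = np.array([[0.01, 0.0], [2.0, 1.0]])
print(jparse_like_velocity_ik(J, np.array([0.1, 0.0])))
```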
Abstract:When working around humans, it is important to model their perception limitations in order to predict their behavior more accurately. In this work, we consider agents with a limited field of view, a limited viewing range, and the possibility of missing objects within viewing range (e.g., due to transparency). By modeling the observation process independently from the motion policy, we can better predict the agent's behavior by accounting for and approximating these limitations. We perform a user study where human operators navigate a cluttered scene while scanning the region for obstacles with a limited field of view and range. Using imitation learning, we show that a robot can adopt a human's strategy for observing an environment under these observation limitations and navigate with minimal collisions with dynamic and static obstacles. We also show that this learned model enables a physical hardware vehicle to navigate successfully in real time.
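
A minimal sketch of the kind of perception-limited observation model described above: an obstacle is reported only if it falls within the field of view and viewing range, and even then it can be missed with some probability (e.g., due to transparency). All parameter values and the function name are illustrative, not taken from the paper.

```python
# Minimal perception-limited observation model: field of view, viewing range,
# and a per-object miss probability. Parameter values are illustrative.
import numpy as np

def observed_obstacles(agent_xy, agent_heading, obstacles_xy,
                       fov_rad=np.deg2rad(90), max_range=5.0, miss_prob=0.1,
                       rng=np.random.default_rng(0)):
    visible = []
    for obs in obstacles_xy:
        rel = np.asarray(obs) - np.asarray(agent_xy)
        dist = np.linalg.norm(rel)
        bearing = np.arctan2(rel[1], rel[0]) - agent_heading
        bearing = (bearing + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
        if dist <= max_range and abs(bearing) <= fov_rad / 2 and rng.random() > miss_prob:
            visible.append(obs)
    return visible

# Only the obstacle straight ahead is within the 90-degree cone and 5 m range.
print(observed_obstacles((0, 0), 0.0, [(2, 0), (0, 3), (-1, 0)]))
```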
Abstract:We propose a framework for active next-best-view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it can represent scenes in a manner that is both photorealistic and geometrically accurate. However, in real-world, online robotic scenes where the number of views is limited by efficiency requirements, random view selection for 3DGS becomes impractical, as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2), which we supplement with Pearson depth and surface normal losses to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes and extend depth-based FisherRF to them, demonstrating both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at https://armlabstanford.github.io/next-best-sense.
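
The abstract mentions a Pearson depth loss used to supervise rendered depth. One common form of such a loss is one minus the Pearson correlation between the rendered and reference depth maps; the PyTorch sketch below implements that form, which may differ in detail from the paper's exact formulation.

```python
# Hedged sketch of a Pearson depth loss of the kind mentioned above: penalize
# low linear correlation between rendered 3DGS depth and a reference depth map.
# The exact formulation and weighting used in the paper may differ.
import torch

def pearson_depth_loss(rendered_depth: torch.Tensor,
                       reference_depth: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
    r = rendered_depth.flatten().float()
    g = reference_depth.flatten().float()
    r = r - r.mean()                                  # center both maps
    g = g - g.mean()
    corr = (r * g).sum() / (r.norm() * g.norm() + eps)  # Pearson correlation
    return 1.0 - corr   # 0 when perfectly correlated, up to 2 when anti-correlated

loss = pearson_depth_loss(torch.rand(64, 64), torch.rand(64, 64))
```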
Abstract:Acquiring high-quality demonstration data is essential for the success of data-driven methods, such as imitation learning. Existing platforms for providing demonstrations for manipulation tasks often impose significant physical and mental demands on the demonstrator, require additional hardware systems, or necessitate specialized domain knowledge. In this work, we present a novel augmented reality (AR) interface for teleoperating robotic manipulators, emphasizing the demonstrator's experience, particularly in the context of performing complex tasks that require precision and accuracy. This interface, designed for the Microsoft HoloLens 2, leverages the adaptable nature of mixed reality (MR), enabling users to control a physical robot through digital twin surrogates. We assess the effectiveness of our approach across three complex manipulation tasks and compare its performance against OPEN TEACH, a recent virtual reality (VR) teleoperation system, as well as two traditional control methods: kinesthetic teaching and a 3D SpaceMouse for end-effector control. Our findings show that our method performs comparably to the VR approach and demonstrates the potential for AR in data collection. Additionally, we conduct a pilot study to evaluate the usability and task load associated with each method. Results indicate that our AR-based system achieves higher usability scores than the VR benchmark and significantly reduces mental demand, physical effort, and frustration experienced by users. An accompanying video can be found at https://youtu.be/w-M58ohPgrA.
Abstract:Cloth manipulation is an important aspect of many everyday tasks and remains a significant challenge for robots. While existing research has made strides in tasks like cloth smoothing and folding, many studies struggle with common failure modes (crumpled corners/edges, incorrect grasp configurations) that a preliminary step of cloth layer detection can solve. We present a novel method for classifying the number of grasped cloth layers using a custom gripper equipped with DenseTact 2.0 optical tactile sensors. After grasping a cloth, the gripper performs an anthropomorphic rubbing motion while collecting optical flow, 6-axis wrench, and joint state data. Using this data in a transformer-based network achieves a test accuracy of 98.21% in correctly classifying the number of grasped layers, showing the effectiveness of our dynamic rubbing method. Evaluating different inputs and model architectures highlights the usefulness of using tactile sensor information and a transformer model for this task. A comprehensive dataset of 368 labeled trials was collected and made open-source along with this paper. Our project page is available at https://armlabstanford.github.io/dynamic-cloth-detection.
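
As a rough illustration of the classification setup described above, the following PyTorch sketch feeds a time series of concatenated optical-flow, wrench, and joint-state features through a small transformer encoder and a classification head. The feature dimension, model depth, and class count (0-3 layers here) are assumptions rather than the paper's actual configuration.

```python
# Hypothetical transformer classifier over time-series tactile data (optical
# flow features, 6-axis wrench, joint states) for predicting the number of
# grasped cloth layers. Dimensions, depth, and class count are assumptions.
import torch
import torch.nn as nn

class ClothLayerClassifier(nn.Module):
    def __init__(self, feature_dim=32, d_model=64, num_classes=4):
        super().__init__()
        self.embed = nn.Linear(feature_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feature_dim) concatenated flow/wrench/joint features
        h = self.encoder(self.embed(x))
        return self.head(h.mean(dim=1))        # pool over time, then classify

logits = ClothLayerClassifier()(torch.randn(2, 50, 32))   # (batch, num_classes)
```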
Abstract:SAFER-Splat (Simultaneous Action Filtering and Environment Reconstruction) is a real-time, scalable, and minimally invasive action filter, based on control barrier functions, for safe robotic navigation in a detailed map constructed at runtime using Gaussian Splatting (GSplat). We propose a novel Control Barrier Function (CBF) that not only induces safety with respect to all Gaussian primitives in the scene, but, when synthesized into a controller, is capable of processing hundreds of thousands of Gaussians while maintaining a minimal memory footprint and operating at 15 Hz during online Splat training. Only a small fraction of the total compute time consumes GPU resources, enabling uninterrupted training. The safety layer is minimally invasive, correcting robot actions only when they are unsafe. To showcase the safety filter, we also introduce SplatBridge, an open-source ROS software package for real-time GSplat mapping for robots. We demonstrate the safety and robustness of our pipeline first in simulation, where our method is 20-50x faster, safer, and less conservative than competing methods based on neural radiance fields. Further, we demonstrate simultaneous GSplat mapping and safety filtering on a drone hardware platform using only on-board perception. We verify that under teleoperation a human pilot cannot cause a collision. Our videos and codebase can be found at https://chengine.github.io/safer-splat.
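
To illustrate the action-filtering idea, here is a minimal control-barrier-function safety filter for single-integrator dynamics and a single barrier h(x) >= 0 (e.g., distance to the nearest Gaussian primitive minus a safety margin). It solves the one-constraint QP in closed form and leaves safe actions untouched; SAFER-Splat's actual formulation handles many primitives at once, so this is a simplification.

```python
# Minimal CBF safety filter sketch for single-integrator dynamics and one
# barrier h(x) >= 0. Closed-form solution of the single-constraint QP
#   min ||u - u_des||^2  s.t.  grad_h . u >= -alpha * h,
# a simplification of the paper's multi-primitive formulation.
import numpy as np

def cbf_filter(u_des: np.ndarray, h: float, grad_h: np.ndarray,
               alpha: float = 1.0) -> np.ndarray:
    constraint = grad_h @ u_des + alpha * h
    if constraint >= 0.0:
        return u_des                          # desired action is already safe
    # Minimal correction along grad_h to satisfy the constraint with equality.
    return u_des - constraint * grad_h / (grad_h @ grad_h)

# Example: robot heading toward an obstacle while 0.2 m outside the margin;
# the filter reduces the approach speed from 1.0 to 0.2.
u_safe = cbf_filter(np.array([1.0, 0.0]), h=0.2, grad_h=np.array([-1.0, 0.0]))
print(u_safe)
```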
Abstract:We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills latent codes for language semantics and grasp affordance into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical for many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real world. SEE-Splat creates a "digital twin" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose candidate grasps for open-world objects. ASK-Splat is trained in real-time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real-time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks, as well as in four multi-stage manipulation tasks using the edited scene to reflect scene changes due to prior manipulation stages, which is not possible with the existing baselines. Code for this project and a link to the project page will be made available soon.
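
As a hedged illustration of how an open-vocabulary query might be resolved against per-Gaussian semantic embeddings of the kind ASK-Splat distills into the scene, the sketch below ranks Gaussians by cosine similarity to a text embedding. The embedding source, dimensionality, and threshold are assumptions, not details from the paper.

```python
# Hedged sketch: match a language query against per-Gaussian semantic
# embeddings by cosine similarity and keep the Gaussians above a threshold.
# Embedding model, dimensionality, and threshold are assumptions.
import numpy as np

def query_gaussians(gaussian_features: np.ndarray,    # (N, D) unit-norm embeddings
                    text_embedding: np.ndarray,        # (D,) unit-norm embedding
                    similarity_threshold: float = 0.25) -> np.ndarray:
    sims = gaussian_features @ text_embedding          # cosine similarity per Gaussian
    return np.where(sims > similarity_threshold)[0]    # indices of matching Gaussians

# Example with random embeddings standing in for distilled semantic features.
feats = np.random.randn(1000, 512)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
text = np.random.randn(512)
text /= np.linalg.norm(text)
print(query_gaussians(feats, text))
```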