Abraham George

Synthesizing the Kill Chain: A Zero-Shot Framework for Target Verification and Tactical Reasoning on the Edge

Feb 10, 2026

Jesse Barkley, Abraham George, Amir Barati Farimani

Abstract:Deploying autonomous edge robotics in dynamic military environments is constrained by both scarce domain-specific training data and the computational limits of edge hardware. This paper introduces a hierarchical, zero-shot framework that cascades lightweight object detection with compact Vision-Language Models (VLMs) from the Qwen and Gemma families (4B-12B parameters). Grounding DINO serves as a high-recall, text-promptable region proposer, and frames with high detection confidence are passed to edge-class VLMs for semantic verification. We evaluate this pipeline on 55 high-fidelity synthetic videos from Battlefield 6 across three tasks: false-positive filtering (up to 100% accuracy), damage assessment (up to 97.5%), and fine-grained vehicle classification (55-90%). We further extend the pipeline into an agentic Scout-Commander workflow, achieving 100% correct asset deployment and a 9.8/10 reasoning score (graded by GPT-4o) with sub-75-second latency. A novel "Controlled Input" methodology decouples perception from reasoning, revealing distinct failure phenotypes: Gemma3-12B excels at tactical logic but fails in visual perception, while Gemma3-4B exhibits reasoning collapse even with accurate inputs. These findings validate hierarchical zero-shot architectures for edge autonomy and provide a diagnostic framework for certifying VLM suitability in safety-critical applications.

* 8 Pages, 3 Figures
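
The detect-then-verify cascade described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Detection`, `cascade`, `min_score`, and the stub detector and verifier are hypothetical names, and real Grounding DINO or VLM calls would replace the stubs.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    label: str
    box: Box
    score: float  # detector confidence in [0, 1]


def cascade(frames, detect, verify, prompt, min_score=0.5):
    """Run the cheap detector on every frame; call the costly verifier
    only on detections whose confidence reaches min_score."""
    confirmed = []
    for index, frame in enumerate(frames):
        for det in detect(frame, prompt):
            if det.score < min_score:
                continue  # low-confidence proposal: never reaches the VLM
            if verify(frame, det):
                confirmed.append((index, det))
    return confirmed


# Hypothetical stand-ins for the detector and the VLM verifier.
def fake_detect(frame, prompt):
    return [Detection(prompt, (0, 0, 10, 10), s) for s in frame["scores"]]


def fake_verify(frame, det):
    return frame["is_real"]


frames = [
    {"scores": [0.9, 0.2], "is_real": True},   # one confident true target
    {"scores": [0.8], "is_real": False},       # confident false positive
    {"scores": [0.1], "is_real": True},        # below the threshold
]
confirmed = cascade(frames, fake_detect, fake_verify, prompt="tank")
```

In this toy run only the first frame's confident detection survives: the second frame is rejected by the verifier, and the third never reaches it.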

RT-cache: Efficient Robot Trajectory Retrieval System

May 14, 2025

Owen Kwon, Abraham George, Alison Bartsch, Amir Barati Farimani

Abstract:This paper introduces RT-cache, a novel trajectory-memory pipeline that accelerates real-world robot inference by leveraging big-data retrieval and learning from experience. While modern Vision-Language-Action (VLA) models can handle diverse robotic tasks, they often incur high per-step inference costs, resulting in significant latency, sometimes minutes per task. In contrast, RT-cache stores a large-scale Memory of previously successful robot trajectories and retrieves relevant multistep motion snippets, drastically reducing inference overhead. By integrating a Memory Builder with a Trajectory Retrieval, we develop an efficient retrieval process that remains tractable even for extremely large datasets. RT-cache flexibly accumulates real-world experiences and replays them whenever the current scene matches past states, adapting quickly to new or unseen environments with only a few additional samples. Experiments on the Open-X Embodiment Dataset and other real-world data demonstrate that RT-cache completes tasks both faster and more successfully than a baseline lacking retrieval, suggesting a practical, data-driven solution for real-time manipulation.

* 9 pages, 5 figures. Submitted to an IEEE robotics conference
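
The store-and-replay idea in this abstract can be illustrated with a small sketch. Everything here is an assumption for illustration: the `TrajectoryMemory` class, the cosine-similarity match, and the `min_similarity` threshold are hypothetical, and the brute-force scan stands in for whatever large-scale index the actual system uses.

```python
import math
from typing import List, Optional, Sequence, Tuple

Action = Tuple[float, ...]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TrajectoryMemory:
    """Toy memory mapping a state embedding to a multistep action snippet."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Sequence[float], List[Action]]] = []

    def add(self, embedding: Sequence[float], snippet: List[Action]) -> None:
        self._entries.append((embedding, snippet))

    def retrieve(self, query: Sequence[float],
                 min_similarity: float = 0.9) -> Optional[List[Action]]:
        """Return the snippet of the closest stored state, or None when
        nothing is similar enough (the caller then falls back to a policy)."""
        best_score, best_snippet = -1.0, None
        for embedding, snippet in self._entries:
            score = cosine_similarity(query, embedding)
            if score > best_score:
                best_score, best_snippet = score, snippet
        return best_snippet if best_score >= min_similarity else None


memory = TrajectoryMemory()
memory.add([1.0, 0.0], [(0.1, 0.0), (0.1, 0.0)])  # e.g. "move right" snippet
memory.add([0.0, 1.0], [(0.0, 0.1), (0.0, 0.1)])  # e.g. "move up" snippet

hit = memory.retrieve([0.9, 0.1])   # close to the first stored state
miss = memory.retrieve([1.0, 1.0])  # equally far from both stored states
```

A hit replays several stored steps without a model call, which is where the latency saving would come from; a miss returns `None` so the slower model can take over.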

Semantic Intelligence: Integrating GPT-4 with A* Planning in Low-Cost Robotics

May 03, 2025

Jesse Barkley, Abraham George, Amir Barati Farimani

Abstract:Classical robot navigation often relies on hardcoded state machines and purely geometric path planners, limiting a robot's ability to interpret high-level semantic instructions. In this paper, we first assess GPT-4's ability to act as a path planner compared to the A* algorithm, then present a hybrid planning framework that integrates GPT-4's semantic reasoning with A* on a low-cost robot platform operating on ROS2 Humble. Our approach eliminates explicit finite state machine (FSM) coding by using prompt-based GPT-4 reasoning to handle task logic while maintaining the accurate paths computed by A*. The GPT-4 module provides semantic understanding of instructions and environmental cues (e.g., recognizing toxic obstacles or crowded areas to avoid, or understanding low-battery situations requiring alternate route selection), and dynamically adjusts the robot's occupancy grid via obstacle buffering to enforce semantic constraints. We demonstrate multi-step reasoning for sequential tasks, such as first navigating to a resource goal and then reaching a final destination safely. Experiments on a Petoi Bittle robot with an overhead camera and Raspberry Pi Zero 2W compare classical A* against GPT-4-assisted planning. Results show that while A* is faster and more accurate for basic route generation and obstacle avoidance, the GPT-4-integrated system achieves high success rates (96-100%) on semantic tasks that are infeasible for pure geometric planners. This work highlights how affordable robots can exhibit intelligent, context-aware behaviors by leveraging large language model reasoning with minimal hardware and no fine-tuning.

* 10 pages, 4 figures, 2 tables
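
The obstacle-buffering step mentioned in this abstract can be sketched on a toy occupancy grid. This is an illustration under assumptions, not the paper's code: `inflate`, `astar`, the 4-connected grid, and the hard-coded "toxic" cell (standing in for a GPT-4 decision) are all hypothetical.

```python
import heapq


def inflate(grid, cells, radius):
    """Return a copy of grid with a square buffer of blocked cells
    around each flagged cell (1 = blocked, 0 = free)."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r, c in cells:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    out[rr][cc] = 1
    return out


def astar(grid, start, goal):
    """4-connected A* with a Manhattan heuristic; returns a list of
    (row, col) cells from start to goal, or None if no path exists."""
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] or grid[goal[0]][goal[1]]:
        return None

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale heap entry
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]]:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


grid = [[0] * 5 for _ in range(5)]
plain_path = astar(grid, (0, 0), (0, 4))

# Assumed semantic output: the language model flags cell (1, 2) as toxic
# and asks for a one-cell safety buffer around it.
buffered_grid = inflate(grid, cells=[(1, 2)], radius=1)
buffered_path = astar(buffered_grid, (0, 0), (0, 4))
```

The planner itself is unchanged; only the grid it sees is edited, which matches the abstract's description of enforcing semantic constraints through the occupancy grid while A* still computes the path.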

PLATO: Planning with LLMs and Affordances for Tool Manipulation

Sep 17, 2024

Arvind Car, Sai Sravan Yarlagadda, Alison Bartsch, Abraham George, Amir Barati Farimani

Abstract:As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language model agents to process natural language inputs, understand the environment, predict tool affordances, and generate executable actions for robotic systems. Unlike traditional systems that depend on hard-coded environmental information, PLATO employs a modular architecture of specialized agents to operate without any initial knowledge of the environment. These agents identify objects and their locations within the scene, generate a comprehensive high-level plan, translate this plan into a series of low-level actions, and verify the completion of each step. The system is particularly tested on challenging tool-use tasks, which involve handling diverse objects and require long-horizon planning. PLATO's design allows it to adapt to dynamic and unstructured settings, significantly enhancing its flexibility and robustness. By evaluating the system across various complex scenarios, we demonstrate its capability to tackle a diverse range of tasks and offer a novel solution to integrate LLMs with robotic platforms, advancing the state-of-the-art in autonomous robotic task execution. For videos and prompt details, please see our project website: https://sites.google.com/andrew.cmu.edu/plato

* 7 pages, 4 figures, submitted to ICRA 2025

Low Fidelity Visuo-Tactile Pretraining Improves Vision-Only Manipulation Performance

Jun 25, 2024

Selam Gano, Abraham George, Amir Barati Farimani

Abstract:Tactile perception is a critical component of solving real-world manipulation tasks, but tactile sensors for manipulation have barriers to use such as fragility and cost. In this work, we use a robust, low-cost tactile sensor, BeadSight, as an alternative to precise pre-calibrated sensors for a pretraining approach to manipulation. We show that tactile pretraining, even with a sensor as low-fidelity as BeadSight, can improve an imitation learning agent's performance on complex manipulation tasks. We evaluate this method on a baseline USB cable plugging task, previously achieved with a much higher precision GelSight sensor as the tactile input to pretraining. Our best BeadSight pretrained visuo-tactile agent completed the task with 70% accuracy compared to 85% for the best GelSight pretrained visuo-tactile agent, with vision-only inference for both.

BeadSight: An Inexpensive Tactile Sensor Using Hydro-Gel Beads

May 21, 2024

Abraham George, Yibo Chen, Atharva Dikshit, Peter Pak, Amir Barati Farimani

Abstract:In robotic manipulation, tactile sensors are indispensable, especially when dealing with soft objects, objects of varying dimensions, or those out of the robot's direct line of sight. Traditional tactile sensors often grapple with challenges related to cost and durability. To address these issues, our study introduces a novel approach to visuo-tactile sensing with an emphasis on economy and replaceability. Our proposed sensor, BeadSight, uses hydro-gel beads encased in a vinyl bag as an economical, easily replaceable sensing medium. When the sensor makes contact with a surface, the deformation of the hydro-gel beads is observed using a rear camera. This observation is then passed through a U-Net neural network to predict the forces acting on the surface of the bead bag, in the form of a pressure map. Our results show that the sensor can accurately predict these pressure maps, detecting the location and magnitude of forces applied to the surface. These abilities make BeadSight an effective, inexpensive, and easily replaceable tactile sensor, ideal for many robotics applications.

* 7 pages, 8 figures. BeadSight code is available at: https://github.com/Abraham190137/BeadSight

Visuo-Tactile Pretraining for Cable Plugging

Mar 18, 2024

Abraham George, Selam Gano, Pranav Katragadda, Amir Barati Farimani

Abstract:Tactile information is a critical tool for fine-grain manipulation. As humans, we rely heavily on tactile information to understand objects in our environments and how to interact with them. We use touch not only to perform manipulation tasks but also to learn how to perform these tasks. Therefore, to create robotic agents that can learn to complete manipulation tasks at a human or super-human level of performance, we need to properly incorporate tactile information into both skill execution and skill learning. In this paper, we investigate how we can incorporate tactile information into imitation learning platforms to improve performance on complex tasks. To do this, we tackle the challenge of plugging in a USB cable, a dexterous manipulation task that relies on fine-grain visuo-tactile servoing. By incorporating tactile information into imitation learning frameworks, we are able to train a robotic agent to plug in a USB cable - a first for imitation learning. Additionally, we explore how tactile information can be used to train non-tactile agents through a contrastive-loss pretraining process. Our results show that by pretraining with tactile information, the performance of a non-tactile agent can be significantly improved, reaching a level on par with visuo-tactile agents. For demonstration videos and access to our codebase, see the project website: https://sites.google.com/andrew.cmu.edu/visuo-tactile-cable-plugging/home

* 8 pages, 6 figures, submitted to IROS 2024
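
The abstract mentions contrastive-loss pretraining without naming the exact loss. As one common possibility, a symmetric InfoNCE (CLIP-style) loss over paired vision and tactile embeddings is sketched below. The choice of loss, the `info_nce` name, and the `temperature` value are assumptions, not details from the paper.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(vision, tactile, temperature=0.1):
    """Symmetric InfoNCE loss: vision[i] and tactile[i] come from the same
    timestep (positive pair); every other pairing in the batch is a negative."""
    v = [normalize(x) for x in vision]
    t = [normalize(x) for x in tactile]
    n = len(v)
    logits = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(row, target):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        return log_sum - row[target]

    vision_to_tactile = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    tactile_to_vision = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return 0.5 * (vision_to_tactile + tactile_to_vision)


vision_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(vision_batch, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(vision_batch, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls the vision embedding toward the tactile embedding of the same moment, which is one way a vision encoder could absorb tactile structure and then be used without the tactile sensor at inference.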

Pour me a drink: Robotic Precision Pouring Carbonated Beverages into Transparent Containers

Sep 19, 2023

Feiya Zhu, Shuo Hu, Letian Leng, Alison Bartsch, Abraham George, Amir Barati Farimani

Abstract:With the growing emphasis on the development and integration of service robots within household environments, we will need to endow robots with the ability to reliably pour a variety of liquids. However, liquid handling and pouring is a challenging task due to the complex dynamics and varying properties of different liquids, the exacting precision required to prevent spills and ensure accurate pouring, and the necessity for robots to adapt seamlessly to a multitude of containers in real-world scenarios. In response to these challenges, we propose a novel autonomous robotics pipeline that empowers robots to execute precision pouring tasks, encompassing both carbonated and non-carbonated liquids, as well as opaque and transparent liquids, into a variety of transparent containers. Our proposed approach maximizes the potential of RGB input alone, achieving zero-shot capability by harnessing existing pre-trained vision segmentation models. This eliminates the need for additional data collection, manual image annotations, or extensive training. Furthermore, our work integrates ChatGPT, facilitating seamless interaction between individuals without prior expertise in robotics and our pouring pipeline; this integration enables users to effortlessly request and execute pouring actions. Our experiments demonstrate the pipeline's capability to successfully pour a diverse range of carbonated and non-carbonated beverages into containers of varying sizes, relying solely on visual input.

* Supplementary materials will be available soon

One ACT Play: Single Demonstration Behavior Cloning with Action Chunking Transformers

Sep 18, 2023

Abraham George, Amir Barati Farimani

Abstract:Learning from human demonstrations (behavior cloning) is a cornerstone of robot learning. However, most behavior cloning algorithms require a large number of demonstrations to learn a task, especially for general tasks that have a large variety of initial conditions. Humans, however, can learn to complete tasks, even complex ones, after only seeing one or two demonstrations. Our work seeks to emulate this ability, using behavior cloning to learn a task given only a single human demonstration. We achieve this goal by using linear transforms to augment the single demonstration, generating a set of trajectories for a wide range of initial conditions. With these demonstrations, we are able to train a behavior cloning agent to successfully complete three block manipulation tasks. Additionally, we developed a novel addition to the temporal ensembling method used by action chunking agents during inference. By incorporating the standard deviation of the action predictions into the ensembling method, our approach is more robust to unforeseen changes in the environment, resulting in significant performance improvements.

* 7 pages, 6 figures
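
The abstract describes augmenting a single demonstration with linear transforms but does not give the transforms themselves. The sketch below shows one simple possibility in 2D: a similarity transform (rotation, uniform scale, translation) that maps the demonstration's start and end onto a new start and goal. The `retarget` function and the planar setting are assumptions for illustration only.

```python
def retarget(demo, new_start, new_goal):
    """Map a 2D demonstration onto new endpoints with z' = a * z + b,
    treating each (x, y) point as the complex number x + yj."""
    points = [complex(x, y) for x, y in demo]
    start, goal = points[0], points[-1]
    if start == goal:
        raise ValueError("demonstration must start and end at different points")
    target_start = complex(*new_start)
    target_goal = complex(*new_goal)
    a = (target_goal - target_start) / (goal - start)  # rotation and scale
    b = target_start - a * start                       # translation
    return [((a * z + b).real, (a * z + b).imag) for z in points]


demo = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # the single recorded path

# Generate trajectories for several hypothetical initial conditions.
augmented = [
    retarget(demo, new_start=(2.0, 2.0), new_goal=(2.0, 4.0)),
    retarget(demo, new_start=(-1.0, 0.0), new_goal=(0.0, -1.0)),
]
```

Each augmented trajectory keeps the shape of the original path while starting and ending where the new scene requires, so one recording can yield training data for many initial conditions.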

Fluid Property Prediction Leveraging AI and Robotics

Aug 04, 2023

Jong Hoon Park, Gauri Pramod Dalwankar, Alison Bartsch, Abraham George, Amir Barati Farimani

Abstract:Inferring liquid properties from vision is a challenging task due to the complex nature of fluids, both in behavior and detection. Nevertheless, the ability to infer their properties directly from visual information is highly valuable for autonomous fluid handling systems, as cameras are readily available. Moreover, predicting fluid properties purely from vision can accelerate the process of fluid characterization, saving considerable time and effort in various experimental environments. In this work, we present a purely vision-based approach to estimate viscosity, leveraging the fact that the behavior of the fluid oscillations is directly related to the viscosity. Specifically, we utilize a 3D convolutional autoencoder to learn latent representations of different fluid-oscillating patterns present in videos. We leverage this latent representation to visually infer the category of the fluid or the dynamic viscosity of the fluid from video.
