Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Malak Mansour

UNCLE-Grasp: Uncertainty-Aware Grasping of Leaf-Occluded Strawberries

Jan 20, 2026

Malak Mansour, Ali Abouzeid, Zezhou Sun, Qinbo Sun, Dezhen Song, Abdalla Swikir

Abstract:Robotic strawberry harvesting is challenging under partial occlusion, where leaves induce significant geometric uncertainty and make grasp decisions based on a single deterministic shape estimate unreliable. From a single partial observation, multiple incompatible 3D completions may be plausible, causing grasps that appear feasible on one completion to fail on another. We propose an uncertainty-aware grasping pipeline for partially occluded strawberries that explicitly models completion uncertainty arising from both occlusion and learned shape reconstruction. Our approach uses point cloud completion with Monte Carlo dropout to sample multiple shape hypotheses, generates candidate grasps for each completion, and evaluates grasp feasibility using physically grounded force-closure-based metrics. Rather than selecting a grasp based on a single estimate, we aggregate feasibility across completions and apply a conservative lower confidence bound (LCB) criterion to decide whether a grasp should be attempted or safely abstained. We evaluate the proposed method in simulation and on a physical robot across increasing levels of synthetic and real leaf occlusion. Results show that uncertainty-aware decision making enables reliable abstention from high-risk grasp attempts under severe occlusion while maintaining robust grasp execution when geometric confidence is sufficient, outperforming deterministic baselines in both simulated and physical robot experiments.

Via

Access Paper or Ask Questions

GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

Sep 17, 2025

Ali Abouzeid, Malak Mansour, Zezhou Sun, Dezhen Song

Figure 1 for GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

Figure 2 for GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

Figure 3 for GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

Figure 4 for GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

Abstract:Vision-Language-Action (VLA) models often fail to generalize to novel camera viewpoints, a limitation stemming from their difficulty in inferring robust 3D geometry from 2D images. We introduce GeoAware-VLA, a simple yet effective approach that enhances viewpoint invariance by integrating strong geometric priors into the vision backbone. Instead of training a visual encoder or relying on explicit 3D data, we leverage a frozen, pretrained geometric vision model as a feature extractor. A trainable projection layer then adapts these geometrically-rich features for the policy decoder, relieving it of the burden of learning 3D consistency from scratch. Through extensive evaluations on LIBERO benchmark subsets, we show GeoAware-VLA achieves substantial improvements in zero-shot generalization to novel camera poses, boosting success rates by over 2x in simulation. Crucially, these benefits translate to the physical world; our model shows a significant performance gain on a real robot, especially when evaluated from unseen camera angles. Our approach proves effective across both continuous and discrete action spaces, highlighting that robust geometric grounding is a key component for creating more generalizable robotic agents.

* Under Review

Via

Access Paper or Ask Questions

Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

Jan 07, 2025

Malak Mansour, Ahmed Aly, Bahey Tharwat, Sarim Hashmi, Dong An, Ian Reid

Figure 1 for Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

Figure 2 for Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

Figure 3 for Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

Figure 4 for Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

Abstract:Large Language Models (LLMs) such as GPT-4, trained on huge amount of datasets spanning multiple domains, exhibit significant reasoning, understanding, and planning capabilities across various tasks. This study presents the first-ever work in Arabic language integration within the Vision-and-Language Navigation (VLN) domain in robotics, an area that has been notably underexplored in existing research. We perform a comprehensive evaluation of state-of-the-art multi-lingual Small Language Models (SLMs), including GPT-4o mini, Llama 3 8B, and Phi-3 medium 14B, alongside the Arabic-centric LLM, Jais. Our approach utilizes the NavGPT framework, a pure LLM-based instruction-following navigation agent, to assess the impact of language on navigation reasoning through zero-shot sequential action prediction using the R2R dataset. Through comprehensive experiments, we demonstrate that our framework is capable of high-level planning for navigation tasks when provided with instructions in both English and Arabic. However, certain models struggled with reasoning and planning in the Arabic language due to inherent limitations in their capabilities, sub-optimal performance, and parsing issues. These findings highlight the importance of enhancing planning and reasoning capabilities in language models for effective navigation, emphasizing this as a key area for further development while also unlocking the potential of Arabic-language models for impactful real-world applications.

Via

Access Paper or Ask Questions