Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Irfan Hussain

Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates

YOLO26-RipeLoc Lite: A lightweight architecture for tomato ripeness detection and picking point localization in greenhouse robotic harvesting

May 26, 2026

Rajmeet Singh, Manveen Kaur, Shahpour Alirezaee, Irfan Hussain

Abstract:In greenhouse tomato production, automated harvesting requires accurate detection of ripe tomatoes, ripeness classification, and precise picking-point localization for robotic end-effectors. This paper proposes YOLO26-RipeLoc Lite, a lightweight deep learning architecture based on YOLO26 for simultaneous detection, ripeness classification, and center-point localization of greenhouse tomatoes. The model introduces three modifications: (1) a Lightweight Feature Pyramid Network (LFPN) with depthwise separable convolutions for efficient multi-scale fusion, (2) a Ripeness-Aware Attention Module (RAAM) with dual pooling and a learnable ripeness bias vector for enhanced color-texture discrimination, and (3) a Compact Detection Head (CDH) with shared convolutions and an integrated center-point regression branch for direct grasp planning. The model is evaluated on a custom dataset of 1,500 images with 6,227 instances (3,566 ripe, 2,661 unripe) from the SILAL greenhouse, Abu Dhabi, UAE. YOLO26-RipeLoc Lite achieves mAP@0.5 of 92.9% (95.2% ripe, 90.6% unripe) with the highest precision (95.2%) among all evaluated architectures using only 2.38M parameters. Post-training BatchNorm pruning at 30% reduces parameters to ~1.8M with negligible accuracy loss. Ablation studies confirm that greenhouse-aware HSV augmentation provides the largest improvement (+2.02 pp mAP@50), backbone freezing achieves peak precision (93.8%), and 3-phase progressive unfreezing yields the best localization quality (mAP@50:95 of 64.6%). Comparisons with YOLOv8n/s, YOLO11n/s, YOLO12n/s, and YOLO26s confirm superior accuracy-efficiency: 2.9 pp higher precision than YOLO12n with 7.0% fewer parameters and integrated center-point localization for robotic end-effector guidance.

Via

Access Paper or Ask Questions

A Novel Hybrid PID-LQR Controller for Sit-To-Stand Assistance Using a CAD-Integrated Simscape Multibody Lower Limb Exoskeleton

Apr 04, 2026

Ranjeet Kumbhar, Rajmeet Singh, Appaso M Gadade, Ashish Singla, Irfan Hussain

Abstract:Precise control of lower limb exoskeletons during sit-to-stand (STS) transitions remains a central challenge in rehabilitation robotics owing to the highly nonlinear, time-varying dynamics of the human-exoskeleton system and the stringent trajectory tracking requirements imposed by clinical safety. This paper presents the systematic design, simulation, and comparative evaluation of three control strategies: a classical Proportional-Integral-Derivative (PID) controller, a Linear Quadratic Regulator (LQR), and a novel Hybrid PID-LQR controller applied to a bilateral lower limb exoskeleton performing the sit-to-stand transition. A high-fidelity, physics-based dynamic model of the exoskeleton is constructed by importing a SolidWorks CAD assembly directly into the MATLAB/Simulink Simscape Multibody environment, preserving accurate geometric and inertial properties of all links. Physiologically representative reference joint trajectories for the hip, knee, and ankle joints are generated using OpenSim musculoskeletal simulation and decomposed into three biomechanical phases: flexion-momentum (0-33%), momentum-transfer (34-66%), and extension (67-100%). The proposed Hybrid PID-LQR controller combines the optimal transient response of LQR with the integral disturbance rejection of PID through a tuned blending coefficient alpha = 0.65. Simulation results demonstrate that the Hybrid PID-LQR achieves RMSE reductions of 72.3% and 70.4% over PID at the hip and knee joints, respectively, reduces settling time by over 90% relative to PID across all joints, and limits overshoot to 2.39%-6.10%, confirming its superiority over both baseline strategies across all evaluated performance metrics and demonstrating strong translational potential for clinical assistive exoskeleton deployment.

Via

Access Paper or Ask Questions

AgriChat: A Multimodal Large Language Model for Agriculture Image Understanding

Mar 14, 2026

Abderrahmene Boudiaf, Irfan Hussain, Sajid Javed

Abstract:The deployment of Multimodal Large Language Models (MLLMs) in agriculture is currently stalled by a critical trade-off: the existing literature lacks the large-scale agricultural datasets required for robust model development and evaluation, while current state-of-the-art models lack the verified domain expertise necessary to reason across diverse taxonomies. To address these challenges, we propose the Vision-to-Verified-Knowledge (V2VK) pipeline, a novel generative AI-driven annotation framework that integrates visual captioning with web-augmented scientific retrieval to autonomously generate the AgriMM benchmark, effectively eliminating biological hallucinations by grounding training data in verified phytopathological literature. The AgriMM benchmark contains over 3,000 agricultural classes and more than 607k VQAs spanning multiple tasks, including fine-grained plant species identification, plant disease symptom recognition, crop counting, and ripeness assessment. Leveraging this verifiable data, we present AgriChat, a specialized MLLM that presents broad knowledge across thousands of agricultural classes and provides detailed agricultural assessments with extensive explanations. Extensive evaluation across diverse tasks, datasets, and evaluation conditions reveals both the capabilities and limitations of current agricultural MLLMs, while demonstrating AgriChat's superior performance over other open-source models, including internal and external benchmarks. The results validate that preserving visual detail combined with web-verified knowledge constitutes a reliable pathway toward robust and trustworthy agricultural AI. The code and dataset are publicly available at https://github.com/boudiafA/AgriChat .

Via

Access Paper or Ask Questions

LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System

Jan 19, 2026

Muhayy Ud Din, Waseem Akram, Ahsan B. Bakht, Irfan Hussain

Abstract:Maritime port inspection plays a critical role in ensuring safety, regulatory compliance, and operational efficiency in complex maritime environments. However, existing inspection methods often rely on manual operations and conventional computer vision techniques that lack scalability and contextual understanding. This study introduces a novel integrated engineering framework that utilizes the synergy between Large Language Models (LLMs) and Vision Language Models (VLMs) to enable autonomous maritime port inspection using cooperative aerial and surface robotic platforms. The proposed framework replaces traditional state-machine mission planners with LLM-driven symbolic planning and improved perception pipelines through VLM-based semantic inspection, enabling context-aware and adaptive monitoring. The LLM module translates natural language mission instructions into executable symbolic plans with dependency graphs that encode operational constraints and ensure safe UAV-USV coordination. Meanwhile, the VLM module performs real-time semantic inspection and compliance assessment, generating structured reports with contextual reasoning. The framework was validated using the extended MBZIRC Maritime Simulator with realistic port infrastructure and further assessed through real-world robotic inspection trials. The lightweight on-board design ensures suitability for resource-constrained maritime platforms, advancing the development of intelligent, autonomous inspection systems. Project resources (code and videos) can be found here: https://github.com/Muhayyuddin/llm-vlm-fusion-port-inspection

* submitted in AEJ

Via

Access Paper or Ask Questions

Lang2Manip: A Tool for LLM-Based Symbolic-to-Geometric Planning for Manipulation

Dec 18, 2025

Muhayy Ud Din, Jan Rosell, Waseem Akram, Irfan Hussain

Figure 1 for Lang2Manip: A Tool for LLM-Based Symbolic-to-Geometric Planning for Manipulation

Figure 2 for Lang2Manip: A Tool for LLM-Based Symbolic-to-Geometric Planning for Manipulation

Figure 3 for Lang2Manip: A Tool for LLM-Based Symbolic-to-Geometric Planning for Manipulation

Figure 4 for Lang2Manip: A Tool for LLM-Based Symbolic-to-Geometric Planning for Manipulation

Abstract:Simulation is essential for developing robotic manipulation systems, particularly for task and motion planning (TAMP), where symbolic reasoning interfaces with geometric, kinematic, and physics-based execution. Recent advances in Large Language Models (LLMs) enable robots to generate symbolic plans from natural language, yet executing these plans in simulation often requires robot-specific engineering or planner-dependent integration. In this work, we present a unified pipeline that connects an LLM-based symbolic planner with the Kautham motion planning framework to achieve generalizable, robot-agnostic symbolic-to-geometric manipulation. Kautham provides ROS-compatible support for a wide range of industrial manipulators and offers geometric, kinodynamic, physics-driven, and constraint-based motion planning under a single interface. Our system converts language instructions into symbolic actions and computes and executes collision-free trajectories using any of Kautham's planners without additional coding. The result is a flexible and scalable tool for language-driven TAMP that is generalized across robots, planning modalities, and manipulation tasks.

* Submitted to ICARA

Via

Access Paper or Ask Questions

Bio-Inspired Robotic Houbara: From Development to Field Deployment for Behavioral Studies

Oct 06, 2025

Lyes Saad Saoud, Irfan Hussain

Abstract:Biomimetic intelligence and robotics are transforming field ecology by enabling lifelike robotic surrogates that interact naturally with animals under real world conditions. Studying avian behavior in the wild remains challenging due to the need for highly realistic morphology, durable outdoor operation, and intelligent perception that can adapt to uncontrolled environments. We present a next generation bio inspired robotic platform that replicates the morphology and visual appearance of the female Houbara bustard to support controlled ethological studies and conservation oriented field research. The system introduces a fully digitally replicable fabrication workflow that combines high resolution structured light 3D scanning, parametric CAD modelling, articulated 3D printing, and photorealistic UV textured vinyl finishing to achieve anatomically accurate and durable robotic surrogates. A six wheeled rocker bogie chassis ensures stable mobility on sand and irregular terrain, while an embedded NVIDIA Jetson module enables real time RGB and thermal perception, lightweight YOLO based detection, and an autonomous visual servoing loop that aligns the robot's head toward detected targets without human intervention. A lightweight thermal visible fusion module enhances perception in low light conditions. Field trials in desert aviaries demonstrated reliable real time operation at 15 to 22 FPS with latency under 100 ms and confirmed that the platform elicits natural recognition and interactive responses from live Houbara bustards under harsh outdoor conditions. This integrated framework advances biomimetic field robotics by uniting reproducible digital fabrication, embodied visual intelligence, and ecological validation, providing a transferable blueprint for animal robot interaction research, conservation robotics, and public engagement.

Via

Access Paper or Ask Questions

FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains

Jul 23, 2025

Muayad Abujabal, Lyes Saad Saoud, Irfan Hussain

Figure 1 for FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains

Figure 2 for FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains

Figure 3 for FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains

Figure 4 for FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains

Abstract:Accurate fish detection in underwater imagery is essential for ecological monitoring, aquaculture automation, and robotic perception. However, practical deployment remains limited by fragmented datasets, heterogeneous imaging conditions, and inconsistent evaluation protocols. To address these gaps, we present \textit{FishDet-M}, the largest unified benchmark for fish detection, comprising 13 publicly available datasets spanning diverse aquatic environments including marine, brackish, occluded, and aquarium scenes. All data are harmonized using COCO-style annotations with both bounding boxes and segmentation masks, enabling consistent and scalable cross-domain evaluation. We systematically benchmark 28 contemporary object detection models, covering the YOLOv8 to YOLOv12 series, R-CNN based detectors, and DETR based models. Evaluations are conducted using standard metrics including mAP, mAP@50, and mAP@75, along with scale-specific analyses (AP$_S$, AP$_M$, AP$_L$) and inference profiling in terms of latency and parameter count. The results highlight the varying detection performance across models trained on FishDet-M, as well as the trade-off between accuracy and efficiency across models of different architectures. To support adaptive deployment, we introduce a CLIP-based model selection framework that leverages vision-language alignment to dynamically identify the most semantically appropriate detector for each input image. This zero-shot selection strategy achieves high performance without requiring ensemble computation, offering a scalable solution for real-time applications. FishDet-M establishes a standardized and reproducible platform for evaluating object detection in complex aquatic scenes. All datasets, pretrained models, and evaluation tools are publicly available to facilitate future research in underwater computer vision and intelligent marine systems.

Via

Access Paper or Ask Questions

A Review of Generative AI in Aquaculture: Foundations, Applications, and Future Directions for Smart and Sustainable Farming

Jul 16, 2025

Waseem Akram, Muhayy Ud Din, Lyes Saad Soud, Irfan Hussain

Abstract:Generative Artificial Intelligence (GAI) has rapidly emerged as a transformative force in aquaculture, enabling intelligent synthesis of multimodal data, including text, images, audio, and simulation outputs for smarter, more adaptive decision-making. As the aquaculture industry shifts toward data-driven, automation and digital integration operations under the Aquaculture 4.0 paradigm, GAI models offer novel opportunities across environmental monitoring, robotics, disease diagnostics, infrastructure planning, reporting, and market analysis. This review presents the first comprehensive synthesis of GAI applications in aquaculture, encompassing foundational architectures (e.g., diffusion models, transformers, and retrieval augmented generation), experimental systems, pilot deployments, and real-world use cases. We highlight GAI's growing role in enabling underwater perception, digital twin modeling, and autonomous planning for remotely operated vehicle (ROV) missions. We also provide an updated application taxonomy that spans sensing, control, optimization, communication, and regulatory compliance. Beyond technical capabilities, we analyze key limitations, including limited data availability, real-time performance constraints, trust and explainability, environmental costs, and regulatory uncertainty. This review positions GAI not merely as a tool but as a critical enabler of smart, resilient, and environmentally aligned aquaculture systems.

Via

Access Paper or Ask Questions

Friction-Scaled Vibrotactile Feedback for Real-Time Slip Detection in Manipulation using Robotic Sixth Finger

Mar 19, 2025

Naqash Afzal, Basma Hasanen, Lakmal Seneviratne, Oussama Khatib, Irfan Hussain

Figure 1 for Friction-Scaled Vibrotactile Feedback for Real-Time Slip Detection in Manipulation using Robotic Sixth Finger

Figure 2 for Friction-Scaled Vibrotactile Feedback for Real-Time Slip Detection in Manipulation using Robotic Sixth Finger

Figure 3 for Friction-Scaled Vibrotactile Feedback for Real-Time Slip Detection in Manipulation using Robotic Sixth Finger

Figure 4 for Friction-Scaled Vibrotactile Feedback for Real-Time Slip Detection in Manipulation using Robotic Sixth Finger

Abstract:The integration of extra-robotic limbs/fingers to enhance and expand motor skills, particularly for grasping and manipulation, possesses significant challenges. The grasping performance of existing limbs/fingers is far inferior to that of human hands. Human hands can detect onset of slip through tactile feedback originating from tactile receptors during the grasping process, enabling precise and automatic regulation of grip force. The frictional information is perceived by humans depending upon slip happening between finger and object. Enhancing this capability in extra-robotic limbs or fingers used by humans is challenging. To address this challenge, this paper introduces novel approach to communicate frictional information to users through encoded vibrotactile cues. These cues are conveyed on onset of incipient slip thus allowing users to perceive friction and ultimately use this information to increase force to avoid dropping of object. In a 2-alternative forced-choice protocol, participants gripped and lifted a glass under three different frictional conditions, applying a normal force of 3.5 N. After reaching this force, glass was gradually released to induce slip. During this slipping phase, vibrations scaled according to static coefficient of friction were presented to users, reflecting frictional conditions. The results suggested an accuracy of 94.53 p/m 3.05 (mean p/mSD) in perceiving frictional information upon lifting objects with varying friction. The results indicate effectiveness of using vibrotactile feedback for sensory feedback, allowing users of extra-robotic limbs or fingers to perceive frictional information. This enables them to assess surface properties and adjust grip force according to frictional conditions, enhancing their ability to grasp, manipulate objects more effectively.

Via

Access Paper or Ask Questions

Maritime Mission Planning for Unmanned Surface Vessel using Large Language Model

Mar 15, 2025

Muhayy Ud Din, Waseem Akram, Ahsan B Bakht, Yihao Dong, Irfan Hussain

Figure 1 for Maritime Mission Planning for Unmanned Surface Vessel using Large Language Model

Figure 2 for Maritime Mission Planning for Unmanned Surface Vessel using Large Language Model

Figure 3 for Maritime Mission Planning for Unmanned Surface Vessel using Large Language Model

Figure 4 for Maritime Mission Planning for Unmanned Surface Vessel using Large Language Model

Abstract:Unmanned Surface Vessels (USVs) are essential for various maritime operations. USV mission planning approach offers autonomous solutions for monitoring, surveillance, and logistics. Existing approaches, which are based on static methods, struggle to adapt to dynamic environments, leading to suboptimal performance, higher costs, and increased risk of failure. This paper introduces a novel mission planning framework that uses Large Language Models (LLMs), such as GPT-4, to address these challenges. LLMs are proficient at understanding natural language commands, executing symbolic reasoning, and flexibly adjusting to changing situations. Our approach integrates LLMs into maritime mission planning to bridge the gap between high-level human instructions and executable plans, allowing real-time adaptation to environmental changes and unforeseen obstacles. In addition, feedback from low-level controllers is utilized to refine symbolic mission plans, ensuring robustness and adaptability. This framework improves the robustness and effectiveness of USV operations by integrating the power of symbolic planning with the reasoning abilities of LLMs. In addition, it simplifies the mission specification, allowing operators to focus on high-level objectives without requiring complex programming. The simulation results validate the proposed approach, demonstrating its ability to optimize mission execution while seamlessly adapting to dynamic maritime conditions.

* IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots

Via

Access Paper or Ask Questions