Abstract: Purse seiners play a crucial role in tuna fishing, as approximately 69% of the world's tropical tuna is caught using this gear. All tuna Regional Fisheries Management Organizations have established minimum standards for using electronic monitoring (EM) in fisheries in addition to traditional observers. EM systems produce a massive amount of video data that human analysts must process. Integrating artificial intelligence (AI) into their workflow can decrease that workload and improve the accuracy of the reports. However, species identification still poses significant challenges for AI, as achieving balanced performance across all species requires appropriate training data. Here, we quantify the difficulty experts face in distinguishing bigeye tuna (BET, Thunnus obesus) from yellowfin tuna (YFT, Thunnus albacares) using images captured by EM systems. We found inter-expert agreements of 42.9% $\pm$ 35.6% for BET and 57.1% $\pm$ 35.6% for YFT. We then present a multi-stage pipeline to estimate the species composition of the catches using a reliable ground-truth dataset based on identifications made by observers on board. Three segmentation approaches are compared: Mask R-CNN, a combination of DINOv2 with SAM2, and an integration of YOLOv9 with SAM2. We found that the last of these performs best, with a validation mean average precision of 0.66 $\pm$ 0.03 and a recall of 0.88 $\pm$ 0.03. Segmented individuals are tracked using ByteTrack. For classification, we evaluate a standard multiclass classification model and a hierarchical approach, finding that the hierarchical approach generalizes better. All our models were cross-validated during training and tested on fishing operations with fully known catch composition. Combining YOLOv9-SAM2 with the hierarchical classification produced the best estimations, with 84.8% of the individuals being segmented and classified with a mean average error of 4.5%.




Abstract: A multi-modal framework to generate user intention distributions when operating a mobile vehicle is proposed in this work. The model learns from past observed trajectories and leverages traversability information derived from the visual surroundings to produce a set of future trajectories, suitable to be directly embedded into a perception-action shared control strategy on a mobile agent, or as a safety layer to supervise the prudent operation of the vehicle. We base our solution on a conditional Generative Adversarial Network with Long Short-Term Memory cells to capture trajectory distributions conditioned on past trajectories, further fused with traversability probabilities derived from visual segmentation with a Convolutional Neural Network. The proposed data-driven framework results in a significant reduction in the error of the predicted trajectories (versus the ground truth) relative to comparable strategies in the literature (e.g. Social-GAN) that fail to account for information other than the agent's past history. Experiments were conducted on a dataset collected with a custom wheelchair model built onto the open-source urban driving simulator CARLA, also showing that the proposed framework can be used with a small, un-annotated dataset.




Abstract: A mechanism to derive non-repetitive coverage path solutions with a proven minimal number of discontinuities is proposed in this work, with the aim of avoiding unnecessary, costly end effector lift-offs for manipulators. The problem is motivated by the automatic polishing of an object. Due to the non-bijective mapping between the workspace and the joint-space, a continuous coverage path in the workspace may easily be truncated in the joint-space, incurring undesirable end effector lift-offs. Conversely, there may be multiple configuration choices to cover the same point of a coverage path through the solution of the Inverse Kinematics. The solution departs from the conventional local optimisation of the coverage path shape in task space, or choosing appropriate but possibly disconnected configurations, to instead explicitly explore the least number of discontinuous motions through the analysis of the structure of valid configurations in joint-space. The two novel contributions of this paper are a proof that the least number of path discontinuities is predicated on the surrounding environment, independent of the choice of the actual coverage path, and thus has a minimum; and an efficient finite cellular decomposition method to optimally divide the workspace into the minimum number of cells, each traversable without discontinuities by any arbitrary coverage path within. Extensive simulation examples and real-world results on a 5 DoF manipulator are presented to prove the validity of the proposed strategy in realistic settings.




Abstract: Rapidly estimating the remaining wall thickness (RWT) is paramount for the non-destructive condition assessment of large critical metallic pipelines. A robotic vehicle with embedded magnetism-based sensors has been developed to traverse the inside of a pipeline and conduct inspections at the location of a break. However, its sensing speed is constrained by the magnetic principle of operation, which slows down the overall operation when dense RWT mapping is sought. To ameliorate this drawback, this work proposes partially scanning the pipe and then employing Gaussian Processes (GPs) to infer RWT at the unseen pipe sections. Since GP prediction assumes normally distributed input data, which does not correspond with real RWT measurements, Gaussian mixture (GM) models are proven in this work to be fitting marginal distributions that effectively capture the probability of any RWT value in the inspected data. The effectiveness of the proposed approach is extensively validated on real-world data collected in collaboration with a water utility from a cast iron water main pipeline in Sydney, Australia.




Abstract: Knowing the geometry of a space is desirable for many applications, e.g. sound source localization, sound field reproduction or auralization. In circumstances where only acoustic signals can be obtained, estimating the geometry of a room is a challenging proposition. Existing methods have been proposed to reconstruct a room from the room impulse responses (RIRs). However, the sound source and microphones must be deployed in a feasible region of the room for these methods to work, which is impractical when the room is unknown. This work proposes to employ a robot equipped with a sound source and four acoustic sensors, which follows a proposed path planning strategy to move around the room and collect first image sources for room geometry estimation. The strategy can effectively drive the robot from a random initial location through the room so that the room geometry is guaranteed to be revealed. The effectiveness of the proposed approach is extensively validated in a synthetic environment, where the results obtained are highly promising.




Abstract: In this paper, we develop a system for the low-cost indoor localization and tracking problem using radio signal strength indicator, Inertial Measurement Unit (IMU), and magnetometer sensors. We develop a novel and simplified probabilistic IMU motion model as the proposal distribution of the sequential Monte-Carlo technique to track the robot trajectory. Our algorithm can globally localize and track a robot with an a priori unknown location, given an informative prior map of the Bluetooth Low Energy (BLE) beacons. We also formulate the problem as an optimization problem that serves as the Back-end of the algorithm mentioned above (Front-end). Thus, by simultaneously solving for the robot trajectory and the map of BLE beacons, we recover a continuous and smooth trajectory of the robot, corrected locations of the BLE beacons, and the time-varying IMU bias. Hardware evaluations show that the proposed closed-loop system can improve the localization performance; furthermore, the system becomes robust to errors in the map of beacons by feeding back the optimized map to the Front-end.
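
The measurement step of a sequential Monte-Carlo tracker like the one described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a range to a single beacon has already been derived from signal strength, and it assumes a Gaussian range likelihood with a hypothetical standard deviation `sigma`. The IMU-based proposal distribution and the Back-end optimization are not shown, and the function names are made up for illustration.

```python
import math

def range_likelihood(particle_xy, beacon_xy, measured_range, sigma=1.0):
    """Gaussian likelihood (unnormalized) of a range measurement for one particle.

    Assumption: range noise is zero-mean Gaussian with standard deviation sigma.
    """
    expected = math.dist(particle_xy, beacon_xy)
    return math.exp(-0.5 * ((measured_range - expected) / sigma) ** 2)

def reweight(particles, weights, beacon_xy, measured_range, sigma=1.0):
    """Multiply each particle weight by its measurement likelihood, then normalize."""
    new_weights = [
        w * range_likelihood(p, beacon_xy, measured_range, sigma)
        for p, w in zip(particles, weights)
    ]
    total = sum(new_weights)
    if total == 0.0:
        # All particles are inconsistent with the measurement: fall back to uniform.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in new_weights]
```

Particles whose distance to the beacon matches the measured range keep most of the weight, which is what lets a tracker of this kind localize globally from a known beacon map.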




Abstract: Most existing robotic exploration schemes use occupancy grid representations and geometric targets known as frontiers. The occupancy grid representation relies on the assumption of independence between grid cells and ignores structural correlations present in the environment. We develop a Gaussian Process (GP) occupancy mapping technique that is computationally tractable for online map building due to its incremental formulation and provides a continuous model of uncertainty over the map spatial coordinates. The standard way to represent geometric frontiers extracted from occupancy maps is to assign binary values to each grid cell. We extend this notion to novel probabilistic frontier maps computed efficiently using the gradient of the GP occupancy map. We also propose a mutual information-based greedy exploration technique built on that representation that takes into account all possible future observations. A major advantage of high-dimensional map inference is that such techniques require fewer observations, leading to a faster map entropy reduction during exploration for map building scenarios. Evaluations using publicly available datasets show the effectiveness of the proposed framework for robotic mapping and exploration tasks.




Abstract: In this article, we propose a sampling-based motion planning algorithm equipped with an information-theoretic convergence criterion for incremental informative motion planning. The proposed approach allows dense map representations and incorporates the full state uncertainty into the planning process. The problem is formulated as a constrained maximization problem. Our approach is built on rapidly-exploring information gathering algorithms and benefits from the advantages of sampling-based optimal motion planning algorithms. We propose two information functions and their variants for fast and online computations. We prove an information-theoretic convergence for an entire exploration and information gathering mission based on the least upper bound of the average map entropy. A natural automatic stopping criterion for information-driven motion control results from the convergence analysis. We demonstrate the performance of the proposed algorithms using three scenarios: comparison of the proposed information functions and sensor configuration selection, robotic exploration in unknown environments, and a wireless signal strength monitoring task in a lake from a publicly available dataset collected using an autonomous surface vehicle.
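
To make the idea of an entropy-based stopping criterion concrete, the following is a minimal sketch under simplifying assumptions. It treats the map as a flat list of independent occupancy probabilities and stops when the average per-cell entropy drops below a hypothetical threshold. The abstract's criterion is derived from the least upper bound of the average map entropy; the threshold value and function names here are illustrative assumptions, not the authors' formulation.

```python
import math

def cell_entropy(p):
    """Shannon entropy, in bits, of a Bernoulli occupancy probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully known cell carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def average_map_entropy(occupancy_probs):
    """Mean per-cell entropy of a map given as a list of occupancy probabilities."""
    return sum(cell_entropy(p) for p in occupancy_probs) / len(occupancy_probs)

def should_stop(occupancy_probs, threshold_bits=0.1):
    """Stop exploring once the average map entropy falls below the threshold.

    Assumption: threshold_bits is a hypothetical tuning value.
    """
    return average_map_entropy(occupancy_probs) < threshold_bits
```

An unexplored map with every cell at probability 0.5 has an average entropy of 1 bit, so the check fails until observations push most cells toward 0 or 1.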




Abstract: Acquiring the accurate 3-D position of a target person around a robot provides fundamental and valuable information that is applicable to a wide range of robotic tasks, including home service, navigation and entertainment. This paper presents a real-time robotic 3-D human tracking system which combines a monocular camera with an ultrasonic sensor using an extended Kalman filter (EKF). The proposed system consists of three sub-modules: a monocular camera sensor tracking model, an ultrasonic sensor tracking model and multi-sensor fusion. An improved visual tracking algorithm is presented to provide partial location estimation (2-D). The algorithm is designed to overcome severe occlusions, scale variation and target loss, and to achieve robust re-detection. The scale accuracy is further enhanced by the estimated 3-D information. An ultrasonic sensor array is employed to provide the range information from the target person to the robot, and Gaussian Process Regression is used for partial location estimation (2-D). The EKF is adopted to sequentially process multiple, heterogeneous measurements arriving in an asynchronous order from the vision sensor and the ultrasonic sensor separately. In the experiments, the proposed tracking system is tested both on a simulation platform and on an actual mobile robot in various indoor and outdoor scenes. The experimental results show the superior performance of the 3-D tracking system in terms of both accuracy and robustness.
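
The sequential processing of asynchronous measurements can be illustrated with a deliberately simplified sketch. It is not the EKF of the paper: it is a linear, scalar Kalman filter over a single range value with a random-walk motion model, and the sensor noise variances, process noise and function names are all hypothetical. It only shows the pattern of sorting time-stamped measurements, predicting over the elapsed interval, and updating with the noise level of whichever sensor produced the reading.

```python
def predict(x, p, dt, process_noise=0.5):
    """Random-walk prediction: the mean is unchanged, the variance grows with dt."""
    return x, p + process_noise * dt

def update(x, p, z, r):
    """Scalar Kalman measurement update with measurement variance r."""
    k = p / (p + r)  # Kalman gain
    return x + k * (z - x), (1.0 - k) * p

def fuse(measurements, x0=0.0, p0=100.0, noise_var=None):
    """Fuse (timestamp, sensor_name, value) tuples in timestamp order.

    Assumption: noise_var maps each sensor name to a hypothetical variance.
    """
    noise_var = noise_var or {"camera": 0.25, "ultrasonic": 0.04}
    x, p, t_prev = x0, p0, None
    for t, sensor, z in sorted(measurements):
        if t_prev is not None:
            x, p = predict(x, p, t - t_prev)
        x, p = update(x, p, z, noise_var[sensor])
        t_prev = t
    return x, p
```

Because each reading is applied on its own, the two sensors do not need a common sampling rate; a reading from the less noisy sensor simply pulls the estimate harder.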




Abstract: In this paper, we propose a real-time classification scheme to cope with noisy Radio Signal Strength Indicator (RSSI) measurements utilized in indoor positioning systems. RSSI values are often converted to distances for position estimation. However, due to multipath and shadowing effects, finding a unique sensor model using either parametric or non-parametric methods is highly challenging. We learn decision regions using Gaussian Process classification to accept measurements that are consistent with the operating sensor model. The proposed approach can run online, does not rely on a particular sensor model or parameters, and is robust to sensor failures. The experimental results achieved using hardware show that available positioning algorithms can benefit from incorporating the classifier into their measurement model as a meta-sensor modeling technique.
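
As an illustration of the RSSI-to-distance conversion mentioned above, here is a minimal sketch based on the widely used log-distance path-loss model. The abstract does not name a specific model, so this choice is an assumption, as are the reference power at 1 m (`tx_power_dbm`) and the path-loss exponent; both values are hypothetical defaults.

```python
import math

def distance_to_rssi(distance_m, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Log-distance path-loss model: RSSI = tx_power - 10 * n * log10(d)."""
    return tx_power_dbm - 10.0 * path_loss_exponent * math.log10(distance_m)

def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Invert the model above to recover a distance in metres from an RSSI value."""
    return 10.0 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))
```

With these defaults a reading 6 dB weaker than the 1 m reference roughly doubles the estimated distance, which suggests why multipath and shadowing noise make a single fixed sensor model unreliable.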