Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders that require rehabilitation or surgery. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic issues in hand-intensive manufacturing processes. The system comprises a multi-modal sensor testbed that collects and synchronizes operator upper-body pose, hand pose, and applied forces; a Biometric Assessment of Complete Hand (BACH) formulation that measures hand and finger risks with high fidelity; and industry-standard risk scores for upper-body posture (RULA) and hand activity (HAL). Our findings demonstrate that BACH captures injurious activity at a higher granularity than the existing metrics. Machine learning models are also used to automate RULA and HAL scoring, and they generalize well to unseen participants. Our assessment system, therefore, provides ergonomic interpretability for the manufacturing processes studied and could be used to mitigate risks through minor workplace optimizations and posture corrections.
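As a minimal illustration of the scoring-automation step, the following sketch trains a classifier to predict RULA scores from pose-derived features and evaluates it with entire participants held out, mirroring the unseen-participant setting. The feature matrix, labels, and participant IDs are all synthetic placeholders; this is not the system's actual pipeline.

```python
# Minimal sketch of automating RULA scoring from pose features, assuming a
# feature matrix X (e.g., joint angles per frame), RULA labels y, and a
# participant ID array groups -- all hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))          # e.g., 12 upper-body joint angles
y = rng.integers(1, 8, size=600)        # RULA grand scores range from 1 to 7
groups = rng.integers(0, 10, size=600)  # participant IDs for grouping

# Leaving whole participants out of each fold mimics evaluation on unseen subjects.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=5))
print(f"mean accuracy across held-out participants: {scores.mean():.2f}")
```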
We provide the first step toward a hierarchical control-estimation framework that actively plans robot trajectories for anomaly detection in confined spaces. The space is represented globally by a directed region graph, where each region is a landmark that needs to be visited (inspected). We devise a fast-mixing Markov chain to find an ergodic route that traverses this graph such that the visitation frequency of each region is proportional to its anomaly detection uncertainty, while satisfying the edge directionality (region transition) constraints. Preliminary simulation results show fast convergence to the ergodic solution and confident estimation of the presence of anomalies in the inspected regions.
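The sketch below illustrates the objective rather than the paper's fast-mixing construction: a random walk on a small directed region graph with transition weights biased by per-region uncertainty, followed by an empirical check of how closely the long-run visitation frequencies track the normalized uncertainties. The graph, uncertainty values, and weighting heuristic are all illustrative, and this simple heuristic only approximates the target distribution that the devised chain is designed to achieve.

```python
# Illustrative random walk on a directed region graph: from region i, move to
# a successor j with probability proportional to j's anomaly-detection
# uncertainty, so directionality is respected by construction.
import numpy as np

rng = np.random.default_rng(1)
succ = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}   # directed edges: i -> succ[i]
uncertainty = np.array([0.1, 0.4, 0.3, 0.2])    # hypothetical per-region values

def step(i):
    nbrs = succ[i]
    w = uncertainty[nbrs]
    return rng.choice(nbrs, p=w / w.sum())

visits = np.zeros(4)
state = 0
for _ in range(100_000):
    state = step(state)
    visits[state] += 1

print("empirical visitation:", visits / visits.sum())
print("target (normalized): ", uncertainty / uncertainty.sum())
```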
Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. Toward this goal, we extend our previous work to propose the TOPS2 descriptor, and an accompanying recognition framework, THOR2, inspired by a human reasoning mechanism known as object unity. The TOPS2 descriptor interleaves color embeddings, obtained using the Mapper algorithm for topological soft clustering, with the shape-based TOPS descriptor. THOR2, trained using synthetic data, achieves substantially higher recognition accuracy than the shape-based THOR framework and outperforms RGB-D ViT on two real-world datasets: the benchmark OCID dataset and the UW-IS Occluded dataset. THOR2 is, therefore, a promising step toward achieving robust recognition in low-cost robots.
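A rough sketch of the interleaving idea, under heavy assumptions: each 2D slice of a colored point cloud contributes a shape feature and a color embedding that are paired slice by slice. The placeholder features below stand in for the actual TOPS persistence features and Mapper-based color embeddings; none of this reflects the published implementation.

```python
# Hedged sketch: pair each slice's shape feature with a color embedding for
# the same slice, then stack the pairs into a single descriptor.
import numpy as np

def slice_point_cloud(points, n_slices=8, axis=2):
    """Partition a point cloud (N x 6: xyz + rgb) into slices along one axis."""
    z = points[:, axis]
    edges = np.linspace(z.min(), z.max() + 1e-9, n_slices + 1)
    return [points[(z >= lo) & (z < hi)] for lo, hi in zip(edges[:-1], edges[1:])]

def shape_feature(sl, dim=4):
    # Stand-in for the TOPS persistence-based feature of a slice.
    return np.histogram(sl[:, 0], bins=dim, range=(-1, 1))[0] if len(sl) else np.zeros(dim)

def color_embedding(sl, dim=4):
    # Stand-in for the Mapper-based color embedding of a slice.
    return sl[:, 3:6].mean(axis=0).repeat(2)[:dim] if len(sl) else np.zeros(dim)

points = np.random.default_rng(2).uniform(-1, 1, size=(500, 6))
descriptor = np.stack([np.concatenate([shape_feature(s), color_embedding(s)])
                       for s in slice_point_cloud(points)])
print(descriptor.shape)  # (n_slices, shape_dim + color_dim)
```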
Time series forecasting has received considerable attention, with recurrent neural networks (RNNs) among the most widely used models owing to their ability to handle sequential data. Prior studies of RNNs for time series forecasting yield inconsistent results, with limited insight into why performance varies across datasets. In this paper, we provide an approach to link the characteristics of time series with the components of RNNs via the versatile metric of distance correlation. This metric allows us to examine the information flow through the RNN activation layers so as to interpret and explain their performance. We empirically show that the RNN activation layers learn the lag structures of time series well. However, they gradually lose this information over a span of a few consecutive layers, thereby worsening the forecast quality for series with large lag structures. We also show that the activation layers cannot adequately model moving average and heteroskedastic time series processes. Lastly, we generate heatmaps for visual comparison of the activation layers across different choices of network hyperparameters to identify which of them affect forecast performance. Our findings can, therefore, help practitioners assess the effectiveness of RNNs for given time series data without actually training and evaluating the networks.
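Distance correlation itself is straightforward to compute; the self-contained sketch below implements the standard double-centering formulation (Székely et al.) and applies it to a synthetic AR(1) series and its lag, the kind of series-lag dependence the analysis traces through the activation layers. Variable names are illustrative.

```python
# Self-contained distance correlation: double-center the pairwise distance
# matrices of x and y, then normalize the resulting distance covariance.
import numpy as np

def distance_correlation(x, y):
    x = np.atleast_2d(np.asarray(x, float).T).T  # coerce to (n, d)
    y = np.atleast_2d(np.asarray(y, float).T).T
    def centered(a):
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

# Example: an AR(1) series is strongly dependent on its own lag-1 values.
rng = np.random.default_rng(3)
e = rng.normal(size=1001)
s = np.zeros(1001)
for t in range(1, 1001):
    s[t] = 0.8 * s[t - 1] + e[t]
print(distance_correlation(s[1:], s[:-1]))  # high for strong lag-1 structure
```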
Recognition of occluded objects in unseen and unstructured indoor environments is a challenging problem for mobile robots. To address this challenge, we propose a new descriptor, TOPS, for point clouds generated from depth images and an accompanying recognition framework, THOR, inspired by human reasoning. The descriptor employs a novel slicing-based approach to compute topological features from filtrations of simplicial complexes using persistent homology, and facilitates reasoning-based recognition using object unity. Apart from a benchmark dataset, we report performance on a new dataset, the UW Indoor Scenes (UW-IS) Occluded dataset, curated using commodity hardware to reflect real-world scenarios with different environmental conditions and degrees of object occlusion. THOR outperforms state-of-the-art methods on both datasets and achieves substantially higher recognition accuracy for all the scenarios of the UW-IS Occluded dataset. Therefore, THOR is a promising step toward robust recognition in low-cost robots meant for everyday use in indoor settings.
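As a hedged sketch of the per-slice computation, the snippet below computes persistent homology for one 2D slice using the gudhi library, with a Rips filtration standing in as a simple substitute for the paper's slicing-based filtration; the noisy-circle slice and parameters are illustrative.

```python
# Sketch: persistent homology of one 2D slice of a point cloud, using a Rips
# filtration as a stand-in for the slicing-based filtration (requires gudhi).
import numpy as np
import gudhi

rng = np.random.default_rng(4)
theta = rng.uniform(0, 2 * np.pi, 100)
slice_pts = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (100, 2))

rips = gudhi.RipsComplex(points=slice_pts, max_edge_length=1.0)
st = rips.create_simplex_tree(max_dimension=2)
st.persistence()  # computes all persistence pairs
# One prominent H1 (loop) interval is expected for a circular slice.
print(st.persistence_intervals_in_dimension(1))
```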
Bird's Eye View (BEV) representations are tremendously useful for perception-related automated driving tasks. However, generating BEVs from surround-view fisheye camera images is challenging due to the strong distortions introduced by such wide-angle lenses. We take the first step in addressing this challenge and introduce a baseline, F2BEV, to generate BEV height maps and semantic segmentation maps from fisheye images. F2BEV consists of a distortion-aware spatial cross attention module for querying and consolidating spatial information from fisheye image features in a transformer-style architecture, followed by a task-specific head. We evaluate single-task and multi-task variants of F2BEV on our synthetic FB-SSEM dataset, all of which generate better BEV height and segmentation maps (in terms of IoU) than a state-of-the-art BEV generation method operating on undistorted fisheye images. We also demonstrate height map generation from real-world fisheye images using F2BEV. An initial sample of our dataset is publicly available at https://tinyurl.com/58jvnscy
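A minimal PyTorch sketch of the core querying mechanism follows: a grid of BEV queries attends to flattened image-feature tokens via cross attention. The distortion-aware sampling of reference points that makes F2BEV fisheye-specific is omitted, and all shapes and names are illustrative.

```python
# Minimal cross-attention sketch: BEV grid queries consolidate information
# from (fisheye) image feature tokens, as in transformer-style BEV generation.
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, bev_queries, img_feats):
        # bev_queries: (B, H*W_bev, dim); img_feats: (B, N_tokens, dim)
        out, _ = self.attn(bev_queries, img_feats, img_feats)
        return out

bev = torch.randn(2, 32 * 32, 64)   # 32x32 BEV grid of learnable queries
feats = torch.randn(2, 400, 64)     # flattened fisheye image feature tokens
print(BEVCrossAttention()(bev, feats).shape)  # torch.Size([2, 1024, 64])
```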
Many complex vehicular systems, such as large marine vessels, contain confined spaces like water tanks, which are critical for the safe functioning of the vehicles. It is particularly hazardous for humans to inspect such spaces due to limited accessibility, poor visibility, and unstructured configuration. While robots provide a viable alternative, they encounter the same set of challenges in realizing robust autonomy. In this work, we specifically address the problem of detecting foreign object debris (FOD) left inside confined spaces using a visual mapping-based system that relies on Mahalanobis distance-driven comparisons between the nominal and online maps for local outlier identification. Simulation trials show extremely high recall but low precision for the outlier identification method. We therefore enlist remote human assistance to address the precision problem: human operators review close-up robot camera images of the flagged outlier regions. An online survey demonstrates the usefulness of this assistance process. Physical experiments on a GPU-enabled mobile robot platform inside a scaled-down prototype tank demonstrate the feasibility of the FOD detection system.
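The outlier check itself can be sketched compactly: a per-cell feature vector from the online map is scored by its Mahalanobis distance from the nominal map's feature distribution and flagged if it exceeds a threshold. The features, perturbation, and threshold below are illustrative placeholders, not the system's actual values.

```python
# Sketch of the Mahalanobis-distance comparison between nominal and online
# map cells; a large distance marks a cell as a candidate outlier (FOD).
import numpy as np

rng = np.random.default_rng(5)
nominal = rng.normal(size=(500, 3))   # per-cell features of the nominal map
mu = nominal.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(nominal, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

online_cell = mu + np.array([3.0, 0.0, 0.0])  # a cell perturbed by a possible FOD
THRESHOLD = 3.0                               # e.g., a ~3-sigma cutoff
print(mahalanobis(online_cell), mahalanobis(online_cell) > THRESHOLD)
```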
Recognition of occluded objects in unseen indoor environments is a challenging problem for mobile robots. This work proposes a new slicing-based topological descriptor that captures the 3D shape of object point clouds to address this challenge. It yields similar descriptors for occluded objects and their unoccluded counterparts, enabling object unity-based recognition using a library of trained models. The descriptor is obtained by partitioning an object's point cloud into multiple 2D slices and constructing filtrations (nested sequences of simplicial complexes) on the slices to mimic further slicing of the slices, thereby capturing detailed shapes through persistent homology-generated features. We use nine different sequences of cluttered scenes from a benchmark dataset for performance evaluation. Our method outperforms two state-of-the-art deep learning-based point cloud classification methods, namely, DGCNN and SimpleView.
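The recognition step that follows the descriptor computation can be pictured as library matching, sketched below with random placeholder descriptors: because occluded and unoccluded views of the same object yield similar descriptors, a nearest-neighbor lookup against a library of unoccluded-object descriptors recovers the label. The actual framework uses a library of trained models rather than this toy nearest-neighbor rule.

```python
# Toy library-matching illustration: the occluded object's descriptor is
# closest to the library entry of its own (unoccluded) object class.
import numpy as np

rng = np.random.default_rng(6)
library = {name: rng.normal(size=32) for name in ["mug", "bowl", "box"]}
occluded_desc = library["mug"] + rng.normal(0, 0.1, 32)  # similar despite occlusion

pred = min(library, key=lambda k: np.linalg.norm(library[k] - occluded_desc))
print(pred)  # "mug": descriptor similarity enables object unity-based recognition
```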
Object recognition in unseen indoor environments remains a challenging problem for the visual perception of mobile robots. In this letter, we propose the use of topologically persistent features, which rely on the shape information of the objects, to address this challenge. In particular, we extract two kinds of features, namely, sparse persistence image (PI) and amplitude features, by applying persistent homology to multi-directional height function-based filtrations of the cubical complexes representing the object segmentation maps. The features are then used to train a fully connected network for recognition. For performance evaluation, in addition to a widely used shape dataset, we collect a new dataset comprising scene images from two different environments, namely, a living room and a mock warehouse. The scenes in both environments include up to five different objects that are chosen from a given set of fourteen objects. The objects have varying poses and arrangements, and are imaged under different illumination conditions and camera poses. The recognition performance of our methods, which are trained using the living room images, remains relatively unaffected on the unseen warehouse images. In contrast, the performance of the state-of-the-art Faster R-CNN method decreases significantly. In fact, the use of sparse PI features yields higher overall recall and accuracy, as well as better F1 scores on many of the individual object classes. We also implement the proposed method on a real-world robot to demonstrate its usefulness.
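A minimal sketch of one directional filtration, assuming the gudhi library: each object pixel of a binary segmentation map is assigned its height (row index) as the filtration value, the background is pushed to infinity, and persistent homology of the resulting cubical complex yields the birth/death pairs from which PI and amplitude features would be derived. The toy mask is illustrative.

```python
# One height-function filtration on a toy binary segmentation map, computed
# over a cubical complex with gudhi; the sparse-PI step is omitted.
import numpy as np
import gudhi

mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True                 # toy object segmentation map
mask[6:10, 6:10] = False                # a hole, so H1 has a feature

heights = np.where(mask, np.arange(16)[:, None].repeat(16, axis=1), np.inf)
cc = gudhi.CubicalComplex(top_dimensional_cells=heights)
print(cc.persistence())                 # birth/death pairs feeding the PI features
```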