Model-free policy learning has been shown to learn manipulation policies that solve long-horizon tasks using single-step manipulation primitives. However, training these policies is time-consuming and requires large amounts of data. We propose the Local Dynamics Model (LDM), which efficiently learns the state-transition function for these manipulation primitives. By combining the LDM with model-free policy learning, we learn policies that solve complex manipulation tasks using one-step lookahead planning. We show that the LDM is both more sample-efficient than and outperforms other model architectures. When combined with planning, our approach outperforms other model-based and model-free policies on several challenging manipulation tasks in simulation.
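One-step lookahead planning with a learned dynamics model can be sketched as follows. This is an illustration, not the paper's implementation: `dynamics_model` stands in for the learned LDM, and `value_fn` for a model-free critic; both names and the toy 1-D task are assumptions.

```python
import numpy as np

def one_step_lookahead(state, candidate_actions, dynamics_model, value_fn):
    """Pick the action whose predicted next state has the highest value.

    dynamics_model(state, action) -> predicted next state (the role the
    learned LDM plays); value_fn(state) -> scalar value estimate from a
    model-free critic.
    """
    best_action, best_value = None, -np.inf
    for action in candidate_actions:
        next_state = dynamics_model(state, action)
        value = value_fn(next_state)
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Toy example: 1-D state, actions shift the state, value peaks at 0.
actions = [-1.0, 0.0, 1.0]
model = lambda s, a: s + a            # stand-in for the learned model
value = lambda s: -abs(s)             # stand-in critic
print(one_step_lookahead(2.0, actions, model, value))  # -> -1.0
```

Because the planner only ever predicts one step ahead, model errors do not compound over long rollouts, which is why a locally accurate dynamics model suffices.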
Roof-mounted spinning LiDAR sensors are widely used by autonomous vehicles, driving the need for real-time processing of 3D point sequences. However, most LiDAR semantic segmentation datasets and algorithms split these acquisitions into $360^\circ$ frames, leading to acquisition latency that is incompatible with realistic real-time applications and evaluations. We address this issue with two key contributions. First, we introduce HelixNet, a $10$ billion point dataset with fine-grained labels, timestamps, and sensor rotation information that enables an accurate assessment of the real-time readiness of segmentation algorithms. Second, we propose Helix4D, a compact and efficient spatio-temporal transformer architecture specifically designed for rotating LiDAR point sequences. Helix4D operates on acquisition slices corresponding to a fraction of a full sensor rotation, significantly reducing total latency. We present an extensive benchmark of the performance and real-time readiness of several state-of-the-art models on HelixNet and SemanticKITTI. Helix4D reaches accuracy on par with the best segmentation algorithms while reducing latency by more than $5\times$ and model size by more than $50\times$. Code and data are available at: https://romainloiseau.fr/helixnet
Automatic dietary monitoring has progressed significantly in recent years, offering a variety of solutions both in terms of sensors and algorithms and in terms of which aspects or parameters of eating behavior are measured and monitored. Automatic detection of eating based on chewing sounds has been studied extensively; however, it requires a microphone mounted on the subject's head to capture the relevant sounds. In this work, we evaluate the feasibility of using an off-the-shelf commercial device, the Razer Anzu smart glasses, for automatic chewing detection. The smart glasses are equipped with stereo speakers and microphones that communicate with smartphones via Bluetooth. The microphone placement is not optimal for capturing chewing sounds; however, we find that this does not significantly affect detection effectiveness. We apply an algorithm from the literature, with some adjustments, to a challenging dataset that we collected in-house. Leave-one-subject-out experiments yield promising results, with an F1-score of 0.96 in the best case of duration-based evaluation of eating time.
Nowadays, yoga has become a part of life for many people, and technological assistance for exercise and sport is increasingly applied to yoga pose identification. In this work, we develop a self-assistance-based yoga posture identification technique that helps users perform yoga with real-time correction feedback. The work also presents yoga hand-mudra (hand gesture) identification. We have developed the YOGI dataset, which includes 10 yoga postures with around 400-900 images per pose, as well as 5 mudras with around 500 images per mudra. Features are extracted by fitting a skeleton to the body for yoga poses and to the hand for mudra poses, using two different algorithms: one for yoga poses and a second for hand mudras. The angles of the joints are extracted as features for different machine learning and deep learning models. Among all the models, XGBoost tuned with random-search cross-validation is the most accurate, achieving 99.2\% accuracy. The complete design framework is described in the present paper.
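The joint-angle features described above can be computed from skeleton keypoints with basic vector geometry. A minimal sketch, assuming 2-D keypoints and a hypothetical shoulder-elbow-wrist triple (the keypoint names and coordinates are illustrative, not from the paper):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by segments b->a and b->c."""
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against tiny floating-point overshoot outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical elbow angle from (x, y) keypoints: shoulder, elbow, wrist.
print(joint_angle((0, 0), (1, 0), (1, 1)))  # -> 90.0
```

Computing one such angle per articulated joint turns a skeleton of keypoints into a fixed-length feature vector suitable for classifiers such as XGBoost.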
Complete surgical resection of the tumor in head and neck squamous cell carcinoma (HNSCC) remains challenging, given the devastating side effects of aggressive surgery and the anatomic proximity to vital structures. To address these clinical challenges, we introduce a wide-field, label-free imaging tool that can assist surgeons in delineating tumor margins in real time. We assume that autofluorescence lifetime is a natural indicator of tissue health, and that a ratio-metric measurement of the emission-decay state relative to the emission-peak state of excited fluorophores enables rapid lifetime mapping of tissues. Here, we describe the principle, instrumentation, and characterization of the imager, as well as the intraoperative imaging of resected tissues from 13 patients undergoing head and neck cancer resection. Imaging a 20 $\times$ 20 mm$^2$ field takes 2 seconds per frame at a working distance of 50 mm, and characterization shows a spatial resolution of 70 $\mu$m and a smallest distinguishable fluorescence lifetime difference of 0.14 ns. Comparison of tissue images with Hematoxylin-Eosin-stained slides reveals the system's capability to delineate cancerous boundaries with submillimeter accuracy, a sensitivity of 91.86\%, and a specificity of 84.38\%.
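The ratio-metric idea can be illustrated with a toy calculation. Assuming a single-exponential decay $I(t) = I_0 e^{-t/\tau}$ (a simplification; real tissue autofluorescence is multi-exponential, and the function name and signature below are invented for illustration), the lifetime follows directly from the ratio of the decay-state signal to the peak signal:

```python
import math

def lifetime_from_ratio(peak_signal, decay_signal, delay_ns):
    """Estimate fluorescence lifetime tau (ns) from a ratio-metric pair of
    measurements, assuming single-exponential decay I(t) = I0 * exp(-t/tau).

    ratio = decay_signal / peak_signal = exp(-delay_ns / tau)
    =>  tau = -delay_ns / ln(ratio)
    """
    ratio = decay_signal / peak_signal
    return -delay_ns / math.log(ratio)

# If the signal falls to 1/e (~36.8%) of its peak 1 ns later, tau = 1 ns.
print(round(lifetime_from_ratio(1.0, math.exp(-1.0), 1.0), 3))  # -> 1.0
```

Because only two gated intensity measurements are needed per pixel, this kind of estimate can be computed fast enough for wide-field, real-time lifetime mapping.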
Time series models based on recurrent neural networks (RNNs) can achieve high accuracy but are unfortunately difficult to interpret, owing to feature interactions, temporal interactions, and non-linear transformations. Interpretability is important in domains like healthcare, where building models that provide insight into the relationships they have learned is required to validate and trust model predictions. We want accurate time series models in which users can understand the contribution of individual input features. We present the Interpretable-RNN (I-RNN), which balances model complexity and accuracy by forcing the relationships between variables in the model to be additive. Interactions between the hidden states of the RNN are restricted, and the hidden states are additively combined at the final step. I-RNN specifically captures the unique characteristics of clinical time series, which are unevenly sampled in time, asynchronously acquired, and contain missing data. Importantly, the hidden state activations represent feature coefficients that correlate with the prediction target and can be visualized as risk curves that capture the global relationship between individual input features and the outcome. We evaluate the I-RNN model on the Physionet 2012 Challenge dataset, predicting in-hospital mortality, and on a real-world clinical decision support task: predicting hemodynamic interventions in the intensive care unit. I-RNN provides explanations in the form of global and local feature importances comparable to those of highly intelligible models, such as decision trees trained on hand-engineered features, while significantly outperforming them. I-RNN remains intelligible while providing accuracy comparable to state-of-the-art decay-based and interpolation-based recurrent time series models. The experimental results on real-world clinical datasets refute the myth that there is a tradeoff between accuracy and interpretability.
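The additive restriction can be sketched in a toy forward pass. This is an illustration of the general idea, not the paper's exact I-RNN: each feature gets its own scalar recurrent state, cross-feature terms are forbidden, and the per-feature states are only summed at the final step, so each term of the sum is directly readable as that feature's contribution. All weight names are assumptions.

```python
import numpy as np

def additive_rnn_forward(x, weights):
    """Toy additive recurrent model. x has shape (T, F): T steps, F features.

    Each feature f evolves its own scalar state using only x[:, f]; states
    are combined additively at the end, keeping contributions separable.
    """
    T, F = x.shape
    h = np.zeros(F)
    for t in range(T):
        # Elementwise update: no cross-feature interactions anywhere.
        h = np.tanh(weights["rec"] * h + weights["inp"] * x[t])
    contributions = weights["out"] * h        # per-feature coefficients
    logit = contributions.sum() + weights["bias"]
    return 1.0 / (1.0 + np.exp(-logit)), contributions

rng = np.random.default_rng(0)
w = {"rec": rng.normal(size=3), "inp": rng.normal(size=3),
     "out": rng.normal(size=3), "bias": 0.0}
prob, contrib = additive_rnn_forward(rng.normal(size=(5, 3)), w)
print(prob, contrib)  # prediction plus one additive contribution per feature
```

Plotting each feature's contribution against its input values over time is what yields the risk-curve style of explanation described above.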
Laser inter-satellite links (LISLs) between satellites in a free-space optical satellite network (FSOSN) can be divided into two classes: permanent LISLs (PLs) and temporary LISLs (TLs). TLs are not desirable in next-generation FSOSNs (NG-FSOSNs) due to high LISL setup times, but they may become feasible in next-next-generation FSOSNs (NNG-FSOSNs). Using the satellite constellation for Phase I of Starlink, we study the impact of TLs on network latency in an NG-FSOSN (which has only PLs) versus an NNG-FSOSN (which has PLs and TLs) under different long-distance inter-continental data communications scenarios, including Sydney-Sao Paulo, Toronto-Istanbul, Madrid-Tokyo, and New York-Jakarta, and different LISL ranges for satellites, including 659.5 km, 1,319 km, 1,500 km, 1,700 km, 2,500 km, 3,500 km, and 5,016 km. The results show that TLs provide higher satellite connectivity, and thereby higher network connectivity, and lead to lower average network latency for the NNG-FSOSN compared to the NG-FSOSN in all scenarios and at all LISL ranges. In comparison with the NG-FSOSN, the improvement in latency with the NNG-FSOSN is significant at LISL ranges of 1,500 km, 1,700 km, and 2,500 km, where the improvement is 16.83 ms, 23.43 ms, and 18.20 ms, respectively, for the Sydney-Sao Paulo inter-continental connection. For the Toronto-Istanbul, Madrid-Tokyo, and New York-Jakarta inter-continental connections, the improvement is 14.58 ms, 23.35 ms, and 23.52 ms, respectively, at the 1,700 km LISL range.
Time series prediction is a crucial task in many human activities, e.g., weather forecasting or predicting stock prices. One solution to this problem is to use Recurrent Neural Networks (RNNs). Although they can yield accurate predictions, their learning process is slow and complex. Here we propose a Quantum Recurrent Neural Network (QRNN) to address these obstacles. The design of the network is based on the continuous-variable quantum computing paradigm. We demonstrate that the network is capable of learning the time dependence of several types of temporal data. Our numerical simulations show that the QRNN converges to optimal weights in fewer epochs than the classical network. Furthermore, with a small number of trainable parameters it can achieve lower loss than its classical counterpart.
Multi-beam LiDAR sensors are increasingly used in robotics, particularly in autonomous cars for localization and perception tasks. However, perception is closely linked to localization and to the robot's ability to build a fine map of its environment. To this end, we propose a new real-time LiDAR odometry method called CT-ICP, as well as a complete SLAM with loop closure. The principle of CT-ICP is to use an elastic formulation of the trajectory, with continuity of poses within each scan and discontinuity between scans, making it more robust to high-frequency motion of the sensor. The registration is based on scan-to-map matching against a dense point-cloud map structured in sparse voxels, allowing real-time operation. In addition, a fast loop-closure detection method using elevation images, combined with pose-graph optimization, yields a complete LiDAR-only SLAM. To show the robustness of the method, we tested it on seven datasets: KITTI, KITTI-raw, KITTI-360, KITTI-CARLA, ParisLuco, Newer College, and NCLT, covering both driving and high-frequency motion scenarios. The CT-ICP odometry is implemented in C++ and available online. The loop detection and pose-graph optimization are implemented in the pyLiDAR-SLAM framework in Python and are also available online. CT-ICP currently ranks first among methods with publicly available code on the KITTI odometry leaderboard, with an average Relative Translation Error (RTE) of 0.59% and an average time per scan of 60 ms on a single CPU thread.
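The elastic intra-scan formulation boils down to assigning each LiDAR point a pose interpolated between the scan's begin and end poses according to its timestamp. A minimal sketch, assuming poses given as (unit quaternion, translation) pairs; the function names and the toy motion are illustrative, not the CT-ICP code:

```python
import numpy as np

def slerp(q0, q1, alpha):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = np.dot(q0, q1)
    if dot < 0.0:                 # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:              # nearly parallel: fall back to lerp
        q = q0 + alpha * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - alpha) * theta) * q0
            + np.sin(alpha * theta) * q1) / np.sin(theta)

def interpolate_pose(t, t_begin, t_end, pose_begin, pose_end):
    """Continuous-time pose of a point stamped at time t, assuming the
    sensor moves smoothly between the scan's begin and end poses."""
    alpha = (t - t_begin) / (t_end - t_begin)
    q = slerp(pose_begin[0], pose_end[0], alpha)
    trans = (1 - alpha) * pose_begin[1] + alpha * pose_end[1]
    return q, trans

# Midpoint of a scan during which the sensor moves 1 m forward, no rotation.
q_id = np.array([1.0, 0.0, 0.0, 0.0])
q, trans = interpolate_pose(0.05, 0.0, 0.1,
                            (q_id, np.zeros(3)),
                            (q_id, np.array([1.0, 0.0, 0.0])))
print(trans)  # -> [0.5 0.  0. ]
```

Optimizing the begin and end poses jointly during registration, rather than a single rigid pose per scan, is what makes the trajectory "elastic" and corrects motion distortion within each scan.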
Deep Neural Networks have been used in a wide variety of applications with significant success. However, their highly complex nature, owing to their millions of parameters, has led to problems when deploying them in pipelines with low-latency requirements. As a result, it is desirable to obtain lightweight neural networks that match the original's performance at inference time. In this work, we propose a weight-based pruning approach in which weights are pruned gradually based on their momentum over previous iterations. Each layer of the neural network is assigned an importance value based on its relative sparsity, followed by the magnitude of its weights in previous iterations. We evaluate our approach on networks such as AlexNet, VGG16, and ResNet50 with image classification datasets such as CIFAR-10 and CIFAR-100. We found that our results outperform previous approaches with respect to accuracy and compression ratio. Our method obtains a compression of 15\% for the same degradation in accuracy on both datasets.
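Momentum-aware magnitude pruning can be sketched as follows. This is an illustrative scoring rule, not the paper's exact criterion: it combines each weight's current magnitude with its accumulated momentum (e.g. an SGD momentum buffer), on the assumption that weights that are both small and barely moving are safe to remove.

```python
import numpy as np

def prune_by_momentum(weights, momenta, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest score.

    Score = |weight| + |momentum|: a weight is kept if it is either large
    now or still receiving large updates. Returns pruned weights and the
    boolean keep-mask (applied after every step to keep pruned weights zero).
    """
    score = np.abs(weights) + np.abs(momenta)
    k = int(sparsity * weights.size)
    if k == 0:
        return weights, np.ones_like(weights, dtype=bool)
    threshold = np.partition(score.ravel(), k - 1)[k - 1]
    mask = score > threshold
    return weights * mask, mask

rng = np.random.default_rng(1)
w = rng.normal(size=100)
m = 0.9 * rng.normal(size=100)      # stand-in for an optimizer momentum buffer
pruned, mask = prune_by_momentum(w, m, sparsity=0.5)
print(mask.sum())  # -> 50 weights kept
```

In gradual pruning, `sparsity` is increased over training steps (e.g. following a schedule toward the target compression), so the network can recover accuracy between pruning events.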