Recently, there has been increased interest in NeRF methods, which reconstruct a differentiable representation of three-dimensional scenes. One of the main limitations of such methods is their inability to assess the confidence of the model in its predictions. In this paper, we propose a new neural network model for the formation of extended vector representations, called uSF, which allows the model to predict not only the color and semantic label of each point, but also to estimate the corresponding uncertainty values. We show that, with a small number of images available for training, a model that quantifies uncertainty performs better than a model without such functionality. The code of the uSF approach is publicly available at https://github.com/sevashasla/usf/.
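To make the idea of predicting a value together with its uncertainty concrete, the following is a minimal sketch of a heteroscedastic Gaussian negative log-likelihood, a common way to train a network that outputs both a mean and a variance. The abstract does not state which loss uSF uses, so this choice and the function name `gaussian_nll` are assumptions made purely for illustration.

```python
import math


def gaussian_nll(target, mean, variance):
    """Negative log-likelihood of `target` under N(mean, variance).

    Illustrative assumption: the model predicts `mean` (e.g. one color
    channel) and `variance` (its uncertainty) for the same point.
    """
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    squared_error = (target - mean) ** 2
    return 0.5 * (math.log(variance) + squared_error / variance) + 0.5 * math.log(2.0 * math.pi)


# An accurate prediction is rewarded for being confident (small variance),
# while an inaccurate one is penalized less if it admits high uncertainty.
accurate_confident = gaussian_nll(0.5, 0.5, 0.01)
accurate_unsure = gaussian_nll(0.5, 0.5, 1.0)
wrong_confident = gaussian_nll(1.5, 0.5, 0.01)
wrong_unsure = gaussian_nll(1.5, 0.5, 1.0)
```

Under this loss the predicted variance acts as a learned per-point weight on the squared error, which is one plausible reason uncertainty-aware training can help when few images are available.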
Visual object navigation using learning methods is one of the key tasks in mobile robotics. This paper introduces a new representation of a scene semantic map formed during the embodied agent's interaction with an indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model with backpropagation of the predicted fusion loss values during inference on a regular (backward) or delayed (forward) image sequence. We have integrated this representation into a full-fledged navigation approach called SkillTron, which can select robot skills from end-to-end policies based on reinforcement learning and classic map-based planning methods. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation. We conducted extensive experiments with the proposed approach in the Habitat environment, which showed significantly better navigation quality metrics than state-of-the-art approaches. The developed code and the custom datasets used are publicly available at github.com/AIRI-Institute/skill-fusion.
This paper presents an adaptive transformer model named SegmATRon for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss function. We studied this model on datasets collected in the photorealistic Habitat and synthetic AI2-THOR simulators. We showed that obtaining additional images using the agent's actions in an indoor environment can improve the quality of semantic segmentation. The code of the proposed approach and the datasets are publicly available at https://github.com/wingrune/SegmATRon.
Originally developed for natural language problems, transformer models have recently been widely used in offline reinforcement learning tasks. This is because the agent's history can be represented as a sequence, and the whole task can be reduced to a sequence modeling task. However, the quadratic complexity of the transformer operation limits the potential increase in context. Therefore, different versions of the memory mechanism are used to work with long sequences in natural language. This paper proposes the Recurrent Memory Decision Transformer (RMDT), a model that uses a recurrent memory mechanism for reinforcement learning problems. We conduct thorough experiments on Atari games and MuJoCo control problems and show that our proposed model is significantly superior to its counterparts without the recurrent memory mechanism on Atari games. We also carefully study the effect of memory on the performance of the proposed model. These findings shed light on the potential of incorporating recurrent memory mechanisms to improve the performance of large-scale transformer models in offline reinforcement learning tasks. The Recurrent Memory Decision Transformer code is publicly available in the repository https://anonymous.4open.science/r/RMDT-4FE4.
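The general pattern behind a recurrent memory mechanism can be sketched as follows: a long sequence is split into fixed-length segments, and a memory state produced by one segment is passed to the next, so each pass only attends over a short context. This is a toy illustration, not the RMDT implementation; `run_with_recurrent_memory` and `toy_segment_fn` are hypothetical names, and the toy function merely stands in for a transformer pass over memory plus segment tokens.

```python
def run_with_recurrent_memory(tokens, segment_len, segment_fn, init_memory):
    """Process `tokens` segment by segment, carrying memory between segments."""
    memory = init_memory
    outputs = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        # The model sees only the current segment and the carried memory.
        segment_outputs, memory = segment_fn(memory, segment)
        outputs.extend(segment_outputs)
    return outputs, memory


def toy_segment_fn(memory, segment):
    """Hypothetical stand-in for a transformer pass (assumption, not RMDT)."""
    segment_outputs = [memory + token for token in segment]
    new_memory = memory + sum(segment)  # summary of everything seen so far
    return segment_outputs, new_memory


outputs, final_memory = run_with_recurrent_memory(
    [1, 2, 3, 4, 5], segment_len=2, segment_fn=toy_segment_fn, init_memory=0
)
```

The cost of each pass depends on the segment length rather than the full history length, which is how this pattern sidesteps the quadratic growth mentioned above.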
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without compromising the reliability of the network. In other words, the detection network must be sufficiently confident in its predictions. In this paper, we present a solution that improves network inference speed and precision at the same time by implementing a fast dynamic voxelizer that works on fast pillar-based models in the same way a voxelizer works on slow voxel-based models. In addition, we propose a lightweight detection sub-head model that classifies predicted objects and filters out falsely detected objects, which significantly improves model precision at negligible time and computing cost. The developed code is publicly available at: https://github.com/YoushaaMurhij/RVCDet.
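As a rough illustration of dynamic voxelization for pillar-based models, the sketch below assigns every LiDAR point to a vertical pillar on the ground plane without a fixed per-pillar point budget, so no points are dropped or padded. The grid ranges, the pillar size, and the function names are assumptions for this example, and the abstract does not confirm that the RVCDet voxelizer works this way.

```python
import math
from collections import defaultdict


def dynamic_pillarize(points, pillar_size=0.5, x_range=(0.0, 4.0), y_range=(0.0, 4.0)):
    """Group (x, y, z) points by pillar index; the grid parameters are illustrative."""
    pillars = defaultdict(list)
    for x, y, z in points:
        # Discard points outside the detection range.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        ix = int(math.floor((x - x_range[0]) / pillar_size))
        iy = int(math.floor((y - y_range[0]) / pillar_size))
        pillars[(ix, iy)].append((x, y, z))  # no cap on points per pillar
    return dict(pillars)


def pillar_means(pillars):
    """Mean coordinate per pillar, a simple per-pillar feature."""
    return {
        index: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for index, pts in pillars.items()
    }


cloud = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (1.6, 0.1, 0.5), (9.0, 9.0, 0.0)]
pillars = dynamic_pillarize(cloud)
means = pillar_means(pillars)
```

A production voxelizer would run this grouping on the GPU with scatter operations; the dictionary here only shows the point-to-pillar mapping.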
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments and loop detection in simultaneous localization and mapping. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place ("Point") at different angles. The dataset is based on the popular Habitat simulator, in which it is possible to generate photorealistic indoor scenes using both one's own sensor data and open datasets, such as Matterport3D. To study the main stages of solving the place recognition problem on the HPointLoc dataset, we propose a new modular approach named PNTR. It first performs image retrieval with the Patch-NetVLAD method, then extracts keypoints and matches them using R2D2, LoFTR, or SuperPoint with SuperGlue, and finally performs a camera pose optimization step with TEASER++. Such a solution to the place recognition problem has not been previously studied in existing publications. The PNTR approach has shown the best quality metrics on the HPointLoc dataset and has high potential for real use in localization systems for unmanned vehicles. The proposed dataset and framework are publicly available: https://github.com/metra4ok/HPointLoc.
In the problem of quantum channel discrimination, one distinguishes between a given number of quantum channels, which is done by sending an input state through a channel and measuring the output state. This work studies applications of variational quantum circuits and machine learning techniques for discriminating such channels. In particular, we explore (i) the practical implementation of embedding this task into the framework of variational quantum computing, (ii) training a quantum classifier based on variational quantum circuits, and (iii) applying the quantum kernel estimation technique. To test these three channel discrimination approaches, we considered a pair of entanglement-breaking channels and the depolarizing channel with two different depolarization factors. For approach (i), we address the quantum channel discrimination problem using the widely discussed parallel and sequential strategies. We show the advantage of the latter in terms of better convergence with fewer quantum resources. Quantum channel discrimination with a variational quantum classifier (ii) allows one to operate even with random and mixed input states and simple variational circuits. The kernel-based classification approach (iii) is also found to be effective, as it allows one to discriminate depolarizing channels associated not just with fixed values of the depolarization factor, but with ranges of it. Additionally, we discovered that a simple modification of one of the commonly used kernels significantly increases the efficiency of this approach. Finally, our numerical findings reveal that the performance of variational methods of channel discrimination depends on the trace of the product of the output states. These findings demonstrate that quantum machine learning can be used to discriminate channels, such as those representing physical noise processes.
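The quantity mentioned above, the trace of the product of the output states, can be computed directly for a single-qubit depolarizing channel. The sketch below assumes the convention rho -> (1 - p) * rho + p * Tr(rho) * I / 2 for depolarization factor p; other parameterizations exist, and the abstract does not say which one the authors use. The function names are illustrative.

```python
def depolarize(rho, p):
    """Apply a single-qubit depolarizing channel to a 2x2 density matrix.

    Assumed convention: rho -> (1 - p) * rho + p * Tr(rho) * I / 2.
    """
    trace = rho[0][0] + rho[1][1]
    return [
        [(1 - p) * rho[i][j] + (p * trace / 2 if i == j else 0) for j in range(2)]
        for i in range(2)
    ]


def trace_of_product(a, b):
    """Tr(a b) for two 2x2 matrices given as nested lists."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


# Send the same input state |0><0| through two channels with different factors.
rho_in = [[1, 0], [0, 0]]
out_weak = depolarize(rho_in, 0.2)    # mildly depolarized output
out_strong = depolarize(rho_in, 0.8)  # strongly depolarized output
overlap = trace_of_product(out_weak, out_strong)
```

A larger overlap means the two outputs are harder to tell apart, which is consistent with the stated dependence of discrimination performance on this quantity.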
In this paper, we present a real-time 3D detection approach based on spatio-temporal aggregation of feature maps from different time steps of deep neural model inference (named feature map flow, FMF). The proposed approach improves the quality of the center-based 3D detection baseline and provides real-time performance on the nuScenes and Waymo benchmarks. Code is available at https://github.com/YoushaaMurhij/FMFNet.
Solutions to many-body problem instances often involve an intractable number of degrees of freedom and admit no known approximations in general form. In practice, representing quantum-mechanical states of a given Hamiltonian using available numerical methods, in particular those based on variational Monte Carlo simulations, becomes exponentially more challenging with increasing system size. Recently, quantum algorithms implemented as variational models have been proposed to accelerate such simulations. The variational ansatz states are characterized by a polynomial number of parameters devised so as to minimize the expectation value of a given Hamiltonian, which is emulated by local measurements. In this study, we develop a means to certify the termination of variational algorithms. We demonstrate our approach by applying it to three models: the transverse field Ising model, the model of one-dimensional spinless fermions with competing interactions, and the Schwinger model of quantum electrodynamics. By comparison, we observe that our approach shows better performance near critical points in these models. We hence take a further step to improve the applicability and to certify the results of variational quantum simulators.
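To illustrate why knowing when a variational algorithm has converged is non-trivial, here is a toy sketch for a two-site transverse field Ising model, H = -J Z1 Z2 - h (X1 + X2). It minimizes the energy of a one-parameter product ansatz and compares the result with the exact ground-state energy, -sqrt(J^2 + 4 h^2). The ansatz, the grid search, and all function names are assumptions chosen for brevity; the certification method of the paper is not reproduced here.

```python
import math


def product_state_energy(theta, J, h):
    """Energy of (cos(theta/2)|0> + sin(theta/2)|1>) on both qubits.

    For this ansatz <Z> = cos(theta) and <X> = sin(theta) on each qubit,
    so <Z1 Z2> = cos(theta)^2 and <X1 + X2> = 2 sin(theta).
    """
    return -J * math.cos(theta) ** 2 - 2.0 * h * math.sin(theta)


def minimize_energy(J, h, steps=600):
    """Grid search over theta in [0, pi]; returns (best_theta, best_energy)."""
    best_theta = 0.0
    best_energy = product_state_energy(0.0, J, h)
    for i in range(1, steps + 1):
        theta = math.pi * i / steps
        energy = product_state_energy(theta, J, h)
        if energy < best_energy:
            best_theta, best_energy = theta, energy
    return best_theta, best_energy


def exact_ground_energy(J, h):
    """Exact ground-state energy of the two-site model with a single bond."""
    return -math.sqrt(J * J + 4.0 * h * h)


theta_opt, e_var = minimize_energy(J=1.0, h=0.5)
e_exact = exact_ground_energy(J=1.0, h=0.5)
gap = e_var - e_exact  # non-negative by the variational principle
```

The optimizer terminates at the best energy the ansatz can reach, yet a gap to the true ground state remains, and without the exact answer that gap is not visible from the optimization alone.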