The efficiency of sampling-based motion planning has led to its wide application in autonomous mobile robots. The conventional rapidly exploring random tree (RRT) algorithm and its variants have achieved great success, but real-time optimal motion planning for mobile robots in dynamic environments remains challenging. In this paper, based on Bidirectional RRT (Bi-RRT) and the use of an assisting metric (AM), we propose a novel motion planning algorithm, namely Bi-AM-RRT*. Unlike existing RRT-based methods, we introduce the AM to optimize the performance of robot motion planning in dynamic environments with obstacles. On this basis, we employ a bidirectional search sampling strategy to increase planning efficiency. Further, we present an improved rewiring method to shorten path lengths. The effectiveness and efficiency of the proposed Bi-AM-RRT* are demonstrated through comparative experiments in different environments. Experimental results show that the Bi-AM-RRT* algorithm achieves better performance in terms of path length and search time.
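The bidirectional search that Bi-AM-RRT* builds on can be illustrated with a plain Bi-RRT. The sketch below is a minimal, hypothetical implementation assuming a 2D square workspace with circular obstacles given as `(cx, cy, r)`, a fixed steering step, and a discretized collision check. It grows one tree from the start and one from the goal, alternating between them, and stops when a newly added node can be joined to the other tree. It does not include the assisting metric or the improved rewiring step, and all function names are illustrative.

```python
import math
import random

def collision_free(p, q, obstacles, resolution=0.05):
    # Approximate check: sample points along segment p->q against circles (cx, cy, r).
    n = max(1, int(math.dist(p, q) / resolution))
    for i in range(n + 1):
        t = i / n
        x, y = p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return False
    return True

def steer(p, q, step):
    # Move from p toward q by at most `step`.
    d = math.dist(p, q)
    if d <= step:
        return q
    return (p[0] + step * (q[0] - p[0]) / d, p[1] + step * (q[1] - p[1]) / d)

def nearest(tree, q):
    return min(tree, key=lambda node: math.dist(node, q))

def trace(tree, node):
    # Follow parent links back to the tree's root.
    path = []
    while node is not None:
        path.append(node)
        node = tree[node]
    return path

def bi_rrt(start, goal, obstacles, bounds=(0.0, 10.0), step=0.5, max_iter=5000, seed=0):
    rng = random.Random(seed)
    tree_a, tree_b = {start: None}, {goal: None}  # node -> parent
    for _ in range(max_iter):
        sample = (rng.uniform(*bounds), rng.uniform(*bounds))
        near = nearest(tree_a, sample)
        new = steer(near, sample, step)
        if new not in tree_a and collision_free(near, new, obstacles):
            tree_a[new] = near
            # Try to join the other tree to the node just added.
            other = nearest(tree_b, new)
            if math.dist(other, new) <= step and collision_free(other, new, obstacles):
                path = trace(tree_a, new)[::-1] + trace(tree_b, other)
                return path if path[0] == start else path[::-1]
        tree_a, tree_b = tree_b, tree_a  # alternate which tree grows
    return None

path = bi_rrt((1.0, 1.0), (9.0, 9.0), obstacles=[(5.0, 5.0, 1.5)])
print(len(path) if path else "no path found")
```

Alternating which tree grows keeps the two searches balanced. An RRT*-style variant would additionally pick the lowest-cost parent within a neighborhood and rewire nearby nodes after each insertion.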
We investigate the applicability of U-Net-based models for segmenting the urinary bladder (UB) in male pelvic-view ultrasound (US) images. Segmenting the UB in US images aids radiologists in diagnosis. However, the UB in US images has arbitrary shapes, indistinct boundaries, and considerably large inter- and intra-subject variability, which makes segmentation challenging. Our study of the state-of-the-art (SOTA) segmentation network, U-Net, for this problem reveals that it often fails to capture the salient characteristics of the UB due to the varying shapes and scales of anatomy in noisy US images. Moreover, U-Net has an excessive number of trainable parameters, resulting in poor computational efficiency during training. We propose Slim U-Net to address the challenges of UB segmentation. Slim U-Net aims to efficiently preserve the salient features of the UB by reshaping the U-Net structure to use fewer 2D convolution layers in the contracting path, so that these features are preserved and imposed on the expanding path. To effectively distinguish blurred boundaries, we propose a novel annotation methodology that includes the background area of the image at the boundary of a marked region of interest (RoI), thereby steering the model's attention towards boundaries. In addition, we propose a combination of loss functions for network training in the complex segmentation of the UB. The experimental results demonstrate that Slim U-Net is statistically superior to U-Net for UB segmentation. Compared with the standard U-Net, Slim U-Net further reduces the number of trainable parameters and the training time by 54% and 57.7%, respectively, without compromising segmentation accuracy.
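The abstract does not specify which losses are combined. As a hedged illustration only, the sketch below assumes an equally weighted sum of binary cross-entropy (pixel-wise accuracy) and soft Dice loss (region overlap), a common pairing for segmentation with blurred boundaries. The weights `w_bce` and `w_dice` and all function names are hypothetical, and the inputs are flattened per-pixel foreground probabilities and binary labels.

```python
import math

def bce_loss(probs, targets, eps=1e-7):
    # Mean binary cross-entropy over pixels; probabilities are clipped for stability.
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    # Soft Dice loss: 1 - (2 * sum(P*T) + smooth) / (sum(P) + sum(T) + smooth).
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(targets) + smooth)

def combined_loss(probs, targets, w_bce=0.5, w_dice=0.5):
    return w_bce * bce_loss(probs, targets) + w_dice * dice_loss(probs, targets)

mask = [1, 1, 0, 0]
print(combined_loss([0.9, 0.8, 0.1, 0.2], mask))  # good prediction: small loss
print(combined_loss([0.2, 0.1, 0.9, 0.8], mask))  # inverted prediction: large loss
```

Cross-entropy penalizes every misclassified pixel equally, while the Dice term depends on overall overlap and is less sensitive to foreground/background imbalance, which is why the two are often paired.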
Knowledge tracing (KT) is the problem of predicting students' future performance based on their historical interactions with intelligent tutoring systems. Recently, many works have proposed specialized methods for applying deep neural networks to KT from different perspectives, such as model architecture and adversarial augmentation, which make the overall algorithms and systems increasingly complex. Furthermore, due to the lack of a standardized evaluation protocol \citep{liu2022pykt}, there are no widely agreed-upon KT baselines, and published experimental comparisons have become inconsistent and self-contradictory; e.g., the reported AUC scores of DKT on ASSISTments2009 range from 0.721 to 0.821 \citep{minn2018deep,yeung2018addressing}. Therefore, in this paper, we provide a strong but simple baseline method for the KT task, named \textsc{simpleKT}. Inspired by the Rasch model in psychometrics, we explicitly model question-specific variations to capture the individual differences among questions covering the same set of knowledge components, a generalization of terms such as concepts or skills needed for learners to accomplish steps in a task or a problem. Furthermore, instead of using sophisticated representations to capture student forgetting behaviors, we use the ordinary dot-product attention function to extract the time-aware information embedded in the student learning interactions. Extensive experiments show that such a simple baseline always ranks in the top 3 in terms of AUC scores and achieves 57 wins, 3 ties and 16 losses against 12 DLKT baseline methods on 7 public datasets from different domains. We believe this work serves as a strong baseline for future KT research. Code is available at \url{https://github.com/pykt-team/pykt-toolkit}\footnote{We merged our model into the \textsc{pyKT} benchmark at \url{https://pykt.org/}.}.
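The two ingredients above can be sketched in a few lines. The sketch assumes a common Rasch-style embedding in which a question's vector is its knowledge-component embedding shifted by a scalar question difficulty along a learned variation direction; whether \textsc{simpleKT} uses exactly this form is an assumption here. It also assumes plain scaled dot-product attention with a causal mask. Vectors are plain lists, and all function names are hypothetical.

```python
import math

def rasch_embedding(concept_emb, variation_emb, difficulty):
    # x_q = c_k + mu_q * d_k: the concept embedding shifted by a scalar
    # question difficulty mu_q along a concept-specific variation vector d_k.
    return [c + difficulty * d for c, d in zip(concept_emb, variation_emb)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def causal_attention(queries, keys, values):
    # Scaled dot-product attention where step t attends only to steps 0..t.
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for t, q in enumerate(queries):
        scores = [dot(q, keys[j]) / scale for j in range(t + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(weights)
        outputs.append([sum(weights[j] / z * values[j][k] for j in range(t + 1))
                        for k in range(len(values[0]))])
    return outputs

x = rasch_embedding([1.0, 2.0], [0.5, -1.0], difficulty=2.0)
print(x)  # [2.0, 0.0]
```

In a real KT model, the mask must also keep the response being predicted at step t out of its own representation; this sketch only enforces that step t never attends to later steps.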
$Context.$ Core-collapse supernovae (CCSNe) are expected to emit gravitational-wave signals that could be detected by current and future generations of interferometers within the Milky Way and nearby galaxies. The stochastic nature of the signal arising from CCSNe requires detection methods that are alternatives to matched filtering. $Aims.$ We aim to show the potential of machine learning (ML) for multi-label classification of different simulated CCSN signals and noise transients using real data. We compared the performance of 1D and 2D convolutional neural networks (CNNs) on single- and multiple-detector data. For the first time, we also tested multi-label classification with long short-term memory (LSTM) networks. $Methods.$ We applied a search and classification procedure for CCSN signals, using an event trigger generator, the Wavelet Detection Filter (WDF), coupled with ML. We used time series and time-frequency representations of the data as inputs to the ML models. To compute classification accuracies, we simultaneously injected CCSN waveforms, obtained from recent hydrodynamical simulations of neutrino-driven core collapse, at a detectable distance of 1\,kpc onto interferometer noise from the O2 LIGO and Virgo science run. $Results.$ We compared the performance of the three models on single-detector data. We then merged the outputs of the models for single-detector classification of noise and astrophysical transients, obtaining overall accuracies of $\sim99\%$ for LIGO and $\sim80\%$ for Virgo. We extended our analysis to the multi-detector case using triggers coincident among the three interferometers (ITFs) and achieved an accuracy of $\sim98\%$.
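The injection step can be illustrated with a toy sketch. Because gravitational-wave strain amplitude scales inversely with source distance, a waveform simulated at a reference distance can be rescaled to 1\,kpc before being added to detector noise. The code assumes a hypothetical waveform stored at 10\,kpc and uses white Gaussian noise as a stand-in for real O2 LIGO/Virgo data. It ignores antenna patterns, whitening, and the WDF trigger stage, and the function names are illustrative only.

```python
import math
import random

def rescale_to_distance(strain, d_ref_kpc, d_target_kpc):
    # Strain amplitude scales as 1/distance, so moving a source closer scales it up.
    factor = d_ref_kpc / d_target_kpc
    return [h * factor for h in strain]

def inject(noise, strain, start_index):
    # Add the waveform to a copy of the noise starting at `start_index`.
    data = list(noise)
    for i, h in enumerate(strain):
        j = start_index + i
        if 0 <= j < len(data):
            data[j] += h
    return data

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]  # stand-in for detector noise
# Toy damped sinusoid standing in for a simulated CCSN waveform at 10 kpc (hypothetical).
waveform_10kpc = [0.1 * math.exp(-i / 20) * math.sin(2 * math.pi * i / 16) for i in range(64)]
data = inject(noise, rescale_to_distance(waveform_10kpc, 10.0, 1.0), start_index=100)
```

Keeping the original noise untouched and returning a copy makes it easy to generate labeled pairs (noise only versus noise plus signal) from the same segment.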
Deep audio representation learning using multi-modal audio-visual data often leads to better performance than uni-modal approaches. However, in real-world scenarios both modalities are not always available at inference time, which degrades the performance of models trained for multi-modal inference. In this work, we propose a novel approach for deep audio representation learning using audio-visual data when the video modality is absent at inference. For this purpose, we adopt teacher-student knowledge distillation under the framework of learning using privileged information (LUPI). While previous LUPI methods use soft labels generated by the teacher, our proposed method uses embeddings learned by the teacher to train the student network. We integrate our method in two different settings: sequential data, where the features are divided into multiple segments over time, and non-sequential data, where all features are treated as a single segment. In the non-sequential setting, both the teacher and student networks consist of an encoder component and a task header. We use the embeddings produced by the teacher's encoder to train the student's encoder, while the student's task header is trained using ground-truth labels. In the sequential setting, the networks have an additional aggregation component placed between the encoder and the task header. We use two sets of embeddings, produced by the teacher's encoder and aggregation component, to train the student. As in the non-sequential setting, the student's task header is trained using ground-truth labels. We test our framework on two audio-visual tasks, namely speaker recognition and speech emotion recognition, and show considerable improvements over audio-only recognition as well as prior works that use LUPI.
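The student's training signals in both settings can be sketched as separate loss terms. The abstract does not name the distance used to match embeddings, so the sketch assumes mean squared error between student and teacher embeddings and standard softmax cross-entropy for the task header. Embeddings are plain lists, the teacher outputs are treated as fixed targets, and every function name is hypothetical.

```python
import math

def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def cross_entropy(logits, label):
    # Softmax cross-entropy computed with the log-sum-exp trick.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def non_sequential_losses(student_emb, teacher_emb, student_logits, label):
    # Encoder term: match the (fixed) audio-visual teacher embedding.
    # Header term: ground-truth label only; no teacher soft labels are used.
    return mse(student_emb, teacher_emb), cross_entropy(student_logits, label)

def sequential_embedding_loss(student_segs, teacher_segs, student_agg, teacher_agg):
    # Two sets of targets: per-segment encoder embeddings and the aggregated embedding.
    seg_term = sum(mse(s, t) for s, t in zip(student_segs, teacher_segs)) / len(student_segs)
    return seg_term + mse(student_agg, teacher_agg)

enc_loss, head_loss = non_sequential_losses([0.2, 0.4], [0.0, 0.5], [2.0, 0.5], label=0)
```

Following the description above, the embedding terms are meant to train the student's encoder (and, in the sequential case, its aggregation component), while the cross-entropy term trains the task header; how the terms are weighted or scheduled is not specified in the abstract.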
Knowledge tracing (KT) is a core component of intelligent education systems. Most current KT methods either rely on expert judgments or exploit only a single network structure, which limits the full expression of learning features. To adequately mine the features of students' learning processes, this paper proposes Deep Knowledge Tracing Based on Spatial and Temporal Deep Representation Learning for Learning Performance Prediction (DKT-STDRL). DKT-STDRL extracts spatial features from students' learning history sequences and then extracts temporal features to uncover deeper hidden information. Specifically, the DKT-STDRL model first uses a CNN to extract the spatial feature information of students' exercise sequences. The spatial features are then concatenated with the original students' exercise features to form joint learning features, which are fed into the BiLSTM part. Finally, the BiLSTM part extracts temporal features from the joint learning features to predict whether students will answer correctly at the next time step. Experiments on the public education datasets ASSISTment2009, ASSISTment2015, Synthetic-5, ASSISTchall, and Statics2011 show that DKT-STDRL achieves better prediction performance than DKT and CKT.
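The feature pipeline before the BiLSTM can be sketched as follows. The sketch assumes the common DKT input encoding, where an interaction on skill `q` with correctness `a` becomes a one-hot vector of length `2 * num_skills` at index `q + a * num_skills`, and a causal 1D convolution along time as the CNN stage. It then concatenates each original feature vector with its convolutional features to form the joint features. Kernel shapes, the causal padding, and all names are assumptions, and the BiLSTM stage is omitted.

```python
def encode_interaction(skill, correct, num_skills):
    # DKT-style one-hot of length 2 * num_skills: index skill + correct * num_skills.
    v = [0.0] * (2 * num_skills)
    v[skill + correct * num_skills] = 1.0
    return v

def causal_conv1d(seq, kernel):
    # seq: T x F features; kernel: K x F x C weights.
    # Left zero-padding so the output at step t uses only steps t-K+1 .. t.
    T, K, C = len(seq), len(kernel), len(kernel[0][0])
    out = []
    for t in range(T):
        row = [0.0] * C
        for k in range(K):
            src = t - (K - 1) + k
            if src < 0:
                continue
            for f, x in enumerate(seq[src]):
                for c in range(C):
                    row[c] += x * kernel[k][f][c]
        out.append(row)
    return out

def joint_features(seq, kernel):
    # Concatenate each original feature vector with its convolutional features.
    spatial = causal_conv1d(seq, kernel)
    return [orig + feat for orig, feat in zip(seq, spatial)]

num_skills = 2
history = [(0, 1), (1, 0), (0, 0)]  # (skill, correct) pairs
seq = [encode_interaction(q, a, num_skills) for q, a in history]
kernel = [[[1.0] for _ in range(2 * num_skills)] for _ in range(2)]  # K=2, F=4, C=1
joint = joint_features(seq, kernel)  # 3 steps x (4 + 1) features, input to the BiLSTM
```

The causal padding is a choice made here so that features at step t never depend on later interactions; the abstract does not state how DKT-STDRL pads its convolution.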
Miscommunication and communication challenges between instructors and students represent one of the primary barriers to post-secondary learning. Students often avoid or miss opportunities to ask questions during office hours due to insecurities or scheduling conflicts. Moreover, students need to work at their own pace to have the freedom and time for the self-contemplation needed to build conceptual understanding and develop creative thinking skills. To eliminate barriers to student engagement, academic institutions need to redefine their fundamental approach to education by proposing flexible educational pathways that recognize continuous learning. To this end, we developed an AI-augmented intelligent educational assistance framework based on a powerful language model (i.e., GPT-3) that automatically generates course-specific intelligent assistants regardless of discipline or academic level. The virtual intelligent teaching assistant (TA) system will serve as a voice-enabled helper capable of answering course-specific questions concerning curriculum, logistics, and course policies. It is envisioned to improve students' access to course-related information and to reduce the logistical workload for instructors and TAs. Its GPT-3-based knowledge discovery component and generalized system architecture are presented, accompanied by a methodical evaluation of the system's accuracy and performance.
Physics-Informed Neural Networks (PINNs) are neural network architectures trained to emulate solutions of differential equations without requiring solution data. They are currently ubiquitous in the scientific literature due to their flexibility and promise. However, very little of the available research provides practical studies aimed at a better quantitative understanding of such architectures and how they function. In this paper, we analyze the performance of PINNs for various architectural hyperparameters and algorithmic settings based on a novel error metric and other factors such as training time. The proposed metric and approach are tailored to evaluate how well a PINN generalizes to points outside its training domain. In addition, we investigate the effect of the algorithmic setup on the predictions of a PINN, inside and outside its training domain, to explore the effect of each hyperparameter. Through our study, we assess how the algorithmic setup of PINNs influences their potential for generalization and deduce the settings that maximize a PINN's potential for accurate generalization. Our study yields insightful and at times counterintuitive results on PINNs. These results can be useful in PINN applications when defining and evaluating the model.
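The idea of training without solution data can be made concrete with a toy problem. The sketch assumes the ODE $u'(t) = -u(t)$ with $u(0) = 1$ on the training domain $[0, 1]$. A candidate solution is scored only by its differential-equation residual at collocation points plus an initial-condition penalty. Central finite differences stand in for the automatic differentiation a real PINN would use, and no network is trained. The relative $L^2$ error used to compare in-domain and out-of-domain points is a standard stand-in, not the novel metric proposed in the paper.

```python
import math

def derivative(u, t, h=1e-5):
    # Central finite difference; a real PINN would use automatic differentiation.
    return (u(t + h) - u(t - h)) / (2.0 * h)

def physics_loss(u, collocation):
    # Mean squared residual of u' + u = 0 plus the initial-condition penalty (u(0) - 1)^2.
    residual = sum((derivative(u, t) + u(t)) ** 2 for t in collocation) / len(collocation)
    return residual + (u(0.0) - 1.0) ** 2

def relative_l2_error(u, exact, points):
    num = math.sqrt(sum((u(t) - exact(t)) ** 2 for t in points))
    den = math.sqrt(sum(exact(t) ** 2 for t in points))
    return num / den

def exact(t):
    return math.exp(-t)

def taylor(t):
    # Crude candidate that fits well near t = 0 only.
    return 1.0 - t + 0.5 * t * t

inside = [i / 20 for i in range(21)]            # training domain [0, 1]
outside = [1.0 + i / 20 for i in range(1, 21)]  # extrapolation domain (1, 2]
for name, u in [("exact", exact), ("taylor", taylor)]:
    print(name, physics_loss(u, inside),
          relative_l2_error(u, exact, inside), relative_l2_error(u, exact, outside))
```

The truncated Taylor polynomial has a small residual on $[0, 1]$ but drifts away from $e^{-t}$ on $(1, 2]$, the kind of in-domain versus out-of-domain gap a generalization metric is meant to expose.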
The accurate identification and precise localization of cephalometric landmarks enable the classification and quantification of anatomical abnormalities. The traditional way of marking cephalometric landmarks on lateral cephalograms is a monotonous and time-consuming job. Endeavours to develop automated landmark detection systems have been made persistently; however, these systems are inadequate for orthodontic applications due to the unavailability of a reliable dataset. We propose a new state-of-the-art dataset to facilitate the development of robust AI solutions for quantitative morphometric analysis. The dataset includes 1000 lateral cephalometric radiographs (LCRs) obtained from 7 different radiographic imaging devices with varying resolutions, making it the most diverse and comprehensive cephalometric dataset to date. The clinical experts on our team meticulously annotated each radiograph with 29 cephalometric landmarks, including the most significant soft-tissue landmarks ever marked in any publicly available dataset. Additionally, our experts labelled the cervical vertebral maturation (CVM) stage of the patient in each radiograph, making this dataset the first standard resource for CVM classification. We believe that this dataset will be instrumental in the development of reliable automated landmark detection frameworks for use in orthodontics and beyond.
The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons. This work takes a step in this direction by investigating contextual linear bandits where the current reward depends on at most $s$ prior actions and contexts (not necessarily consecutive), up to a time horizon of $h$. To avoid polynomial dependence on $h$, we propose new algorithms that leverage sparsity to discover the dependence pattern and arm parameters jointly. We consider both the data-poor ($T<h$) and data-rich ($T\ge h$) regimes, and derive respective regret upper bounds $\tilde O(d\sqrt{sT} +\min\{ q, T\})$ and $\tilde O(\sqrt{sdT})$, with sparsity $s$, feature dimension $d$, total time horizon $T$, and $q$ that is adaptive to the reward dependence pattern. Complementing the upper bounds, we also show that learning over a single trajectory brings inherent challenges: while the dependence pattern and arm parameters form a rank-1 matrix, circulant matrices are not isometric over rank-1 manifolds, and sample complexity indeed benefits from the sparse reward dependence structure. Our results necessitate a new analysis to address long-range temporal dependencies across data and avoid polynomial dependence on the reward horizon $h$. Specifically, we utilize connections to the restricted isometry property of circulant matrices formed by dependent sub-Gaussian vectors and establish new guarantees that are also of independent interest.
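One way to see why the dependence pattern and arm parameters form a rank-1 matrix is to write the reward as a sparse weighted sum over the last $h$ feature vectors. The sketch assumes the noiseless model $r_t = \sum_{\ell=0}^{h-1} w_\ell \langle \theta, x_{t-\ell} \rangle$ with an $s$-sparse weight vector $w$ (the dependence pattern) and arm parameter $\theta$. This is one formalization consistent with the description above, not the paper's exact setup. The code checks numerically that this equals $\langle w\theta^\top, H_t \rangle$, where row $\ell$ of the history matrix $H_t$ is $x_{t-\ell}$. Function names are hypothetical.

```python
import random

def reward_from_history(history, w, theta):
    # r_t = sum_l w_l * <theta, x_{t-l}>; history[l] holds x_{t-l}, l = 0..h-1.
    return sum(w_l * sum(a * b for a, b in zip(theta, x)) for w_l, x in zip(w, history))

def rank_one_inner_product(history, w, theta):
    # The same reward as the Frobenius inner product <w theta^T, H_t>.
    outer = [[w_l * th for th in theta] for w_l in w]  # rank-1 matrix w theta^T (h x d)
    return sum(outer[l][j] * history[l][j]
               for l in range(len(w)) for j in range(len(theta)))

def sparse_pattern(h, s, rng):
    # s nonzero lags chosen anywhere in 0..h-1 (not necessarily consecutive).
    w = [0.0] * h
    for lag in rng.sample(range(h), s):
        w[lag] = rng.gauss(0.0, 1.0)
    return w

rng = random.Random(0)
h, s, d = 8, 2, 3
w = sparse_pattern(h, s, rng)
theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
history = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(h)]
print(reward_from_history(history, w, theta), rank_one_inner_product(history, w, theta))
```

Viewed this way, each reward is a linear measurement of the rank-1 matrix $w\theta^\top$, and stacking consecutive histories along a single trajectory produces the shifted (circulant-like) structure the analysis has to handle.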