Gaussian processes are probabilistic models that are commonly used as functional priors in machine learning. Due to their probabilistic nature, they can be used to capture the prior information on the statistics of noise, smoothness of the functions, and training data uncertainty. However, their computational complexity quickly becomes intractable as the size of the data set grows. We propose a Hilbert space approximation-based quantum algorithm for Gaussian process regression to overcome this limitation. Our method consists of a combination of classical basis function expansion with quantum computing techniques of quantum principal component analysis, conditional rotations, and Hadamard and Swap tests. The quantum principal component analysis is used to estimate the eigenvalues while the conditional rotations and the Hadamard and Swap tests are employed to evaluate the posterior mean and variance of the Gaussian process. Our method provides polynomial computational complexity reduction over the classical method.
This paper presents sandra, a neuro-symbolic reasoner combining vectorial representations with deductive reasoning. Sandra builds a vector space constrained by an ontology and performs reasoning over it. The geometric nature of the reasoner allows its combination with neural networks, bridging the gap with symbolic knowledge representations. Sandra is based on the Description and Situation (DnS) ontology design pattern, a formalization of frame semantics. Given a set of facts (a situation) it allows to infer all possible perspectives (descriptions) that can provide a plausible interpretation for it, even in presence of incomplete information. We prove that our method is correct with respect to the DnS model. We experiment with two different tasks and their standard benchmarks, demonstrating that, without increasing complexity, sandra (i) outperforms all the baselines (ii) provides interpretability in the classification process, and (iii) allows control over the vector space, which is designed a priori.
Self-supervised features are typically used in place of filter-banks in speaker verification models. However, these models were originally designed to ingest filter-banks as inputs, and thus, training them on top of self-supervised features assumes that both feature types require the same amount of learning for the task. In this work, we observe that pre-trained self-supervised speech features inherently include information required for downstream speaker verification task, and therefore, we can simplify the downstream model without sacrificing performance. To this end, we revisit the design of the downstream model for speaker verification using self-supervised features. We show that we can simplify the model to use 97.51% fewer parameters while achieving a 29.93% average improvement in performance on SUPERB. Consequently, we show that the simplified downstream model is more data efficient compared to baseline--it achieves better performance with only 60% of the training data.
The bispectrum stands out as a revolutionary tool in frequency domain analysis, leaping the usual power spectrum by capturing crucial phase information between frequency components. In our innovative study, we have utilized the bispectrum to analyze and decode complex grasping movements, gathering EEG data from five human subjects. We put this data through its paces with three classifiers, focusing on both magnitude and phase-related features. The results highlight the bispectrum's incredible ability to delve into neural activity and differentiate between various grasping motions with the Support Vector Machine (SVM) classifier emerging as a standout performer. In binary classification, it achieved a remarkable 97\% accuracy in identifying power grasp, and in the more complex multiclass tasks, it maintained an impressive 94.93\% accuracy. This finding not only underscores the bispectrum's analytical strength but also showcases the SVM's exceptional capability in classification, opening new doors in our understanding of movement and neural dynamics.
Recommender Systems (RS) provide a relevant tool to mitigate the information overload problem. A large number of researchers have published hundreds of papers to improve different RS features. It is advisable to use RS frameworks that simplify RS researchers: a) to design and implement recommendations methods and, b) to speed up the execution time of the experiments. In this paper, we present CF4J, a Java library designed to carry out Collaborative Filtering based RS research experiments. CF4J has been designed from researchers to researchers. It allows: a) RS datasets reading, b) full and easy access to data and intermediate or final results, c) to extend their main functionalities, d) to concurrently execute the implemented methods, and e) to provide a thorough evaluation for the implementations by quality measures. In summary, CF4J serves as a library specifically designed for the research trial and error process.
It is well-known that there is no universal metric for image quality evaluation. In this case, distortion-specific metrics can be more reliable. The artifact imposed by image compression can be considered as a combination of various distortions. Depending on the image context, this combination can be different. As a result, Generalization can be regarded as the major challenge in compressed image quality assessment. In this approach, stacking is employed to provide a reliable method. Both semantic and low-level information are employed in the presented IQA to predict the human visual system. Moreover, the results of the Full-Reference (FR) and No-Reference (NR) models are aggregated to improve the proposed Full-Reference method for compressed image quality evaluation. The accuracy of the quality benchmark of the clic2024 perceptual image challenge was achieved 79.6\%, which illustrates the effectiveness of the proposed fusion-based approach.
Online Judge (OJ) systems are typically considered within programming-related courses as they yield fast and objective assessments of the code developed by the students. Such an evaluation generally provides a single decision based on a rubric, most commonly whether the submission successfully accomplished the assignment. Nevertheless, since in an educational context such information may be deemed insufficient, it would be beneficial for both the student and the instructor to receive additional feedback about the overall development of the task. This work aims to tackle this limitation by considering the further exploitation of the information gathered by the OJ and automatically inferring feedback for both the student and the instructor. More precisely, we consider the use of learning-based schemes -- particularly, multi-instance learning (MIL) and classical machine learning formulations -- to model student behavior. Besides, explainable artificial intelligence (XAI) is contemplated to provide human-understandable feedback. The proposal has been evaluated considering a case of study comprising 2500 submissions from roughly 90 different students from a programming-related course in a computer science degree. The results obtained validate the proposal: The model is capable of significantly predicting the user outcome (either passing or failing the assignment) solely based on the behavioral pattern inferred by the submissions provided to the OJ. Moreover, the proposal is able to identify prone-to-fail student groups and profiles as well as other relevant information, which eventually serves as feedback to both the student and the instructor.
This paper proposes a Simultaneous Calibration and Navigation (SCAN) algorithm of a multiple Ultrasonic Local Positioning Systems (ULPSs) that cover an extensive indoor area. The idea is the development of the same concept than SLAM (Simultaneous Localization and Mapping), in which a Mobile Robot (MR) estimates the map while it is navigating. The MR calibrates the beacons of several ULPSs while it is moving inside the localization area. The concept of calibration is the estimation of the position of the beacons referenced to a known map. The scenario is composed of some calibrated ULPSs that we denote as Globally Referenced Ultrasonic Local Positioning Systems (GRULPSs) that are located in strategic points like entrances covering the start and the end of a possible trajectory in the environment. Additionally, there are several non-calibrated ULPSs named Locally Referenced Ultrasonic Local Positioning Systems (LRULPSs) that are placed around the localization area. The proposal uses a MR with odometer for calibrating the beacons of the LRULPSs while it is navigating on their coverage area and go from one GRULPS to another. The algorithm is based on multiple filters running in parallel (one filter for each LRULPS and another one for the GRULPSs) that estimate the global and local trajectories of the MR (one trajectory for each local reference system of the LRULPSs) fusing the information related to the Ultrasound Signals (US) and the odometer of the MR. The position of the beacons of the LRULPSs are obtained by a transformation vector for each LRULPS that converts the local coordinates to the global reference system. Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF) and H-Inf Filter have been tested, in simulations and real experiments, in order to compare their performance in this case.
The increasing reliance on AI-driven solutions, particularly Large Language Models (LLMs) like the GPT series, for information retrieval highlights the critical need for their factuality and fairness, especially amidst the rampant spread of misinformation and disinformation online. Our study evaluates the factual accuracy, stability, and biases in widely adopted GPT models, including GPT-3.5 and GPT-4, contributing to reliability and integrity of AI-mediated information dissemination. We introduce 'Global-Liar,' a dataset uniquely balanced in terms of geographic and temporal representation, facilitating a more nuanced evaluation of LLM biases. Our analysis reveals that newer iterations of GPT models do not always equate to improved performance. Notably, the GPT-4 version from March demonstrates higher factual accuracy than its subsequent June release. Furthermore, a concerning bias is observed, privileging statements from the Global North over the Global South, thus potentially exacerbating existing informational inequities. Regions such as Africa and the Middle East are at a disadvantage, with much lower factual accuracy. The performance fluctuations over time suggest that model updates may not consistently benefit all regions equally. Our study also offers insights into the impact of various LLM configuration settings, such as binary decision forcing, model re-runs and temperature, on model's factuality. Models constrained to binary (true/false) choices exhibit reduced factuality compared to those allowing an 'unclear' option. Single inference at a low temperature setting matches the reliability of majority voting across various configurations. The insights gained highlight the need for culturally diverse and geographically inclusive model training and evaluation. This approach is key to achieving global equity in technology, distributing AI benefits fairly worldwide.
In recent years, weakly supervised semantic segmentation using image-level labels as supervision has received significant attention in the field of computer vision. Most existing methods have addressed the challenges arising from the lack of spatial information in these labels by focusing on facilitating supervised learning through the generation of pseudo-labels from class activation maps (CAMs). Due to the localized pattern detection of Convolutional Neural Networks (CNNs), CAMs often emphasize only the most discriminative parts of an object, making it challenging to accurately distinguish foreground objects from each other and the background. Recent studies have shown that Vision Transformer (ViT) features, due to their global view, are more effective in capturing the scene layout than CNNs. However, the use of hierarchical ViTs has not been extensively explored in this field. This work explores the use of Swin Transformer by proposing "SWTformer" to enhance the accuracy of the initial seed CAMs by bringing local and global views together. SWTformer-V1 generates class probabilities and CAMs using only the patch tokens as features. SWTformer-V2 incorporates a multi-scale feature fusion mechanism to extract additional information and utilizes a background-aware mechanism to generate more accurate localization maps with improved cross-object discrimination. Based on experiments on the PascalVOC 2012 dataset, SWTformer-V1 achieves a 0.98% mAP higher localization accuracy, outperforming state-of-the-art models. It also yields comparable performance by 0.82% mIoU on average higher than other methods in generating initial localization maps, depending only on the classification network. SWTformer-V2 further improves the accuracy of the generated seed CAMs by 5.32% mIoU, further proving the effectiveness of the local-to-global view provided by the Swin transformer.