Matrix factorization (MF) has been widely used in e.g., recommender systems, topic modeling and word embedding. Stochastic gradient descent (SGD) is popular in solving MF problems because it can deal with large data sets and is easy to do incremental learning. We observed that SGD for MF is memory bound. Meanwhile, single-node CPU systems with caching performs well only for small data sets; distributed systems have higher aggregated memory bandwidth but suffer from relatively slow network connection. This observation inspires us to accelerate MF by utilizing GPUs's high memory bandwidth and fast intra-node connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF problems. On a single CPU, we design two workload schedule schemes, i.e., batch-Hogwild! and wavefront-update that fully exploit the massive amount of cores. Especially, batch-Hogwild! as a vectorized version of Hogwild! overcomes the issue of memory discontinuity. We also develop highly-optimized kernels for SGD update, leveraging cache, warp-shuffle instructions and half-precision floats. We also design a partition scheme to utilize multiple GPUs while addressing the well-known convergence issue when parallelizing SGD. On three data sets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1X-28.2X as fast compared with state-of-art CPU solutions on 1-64 CPU nodes. Evaluations also show that cuMF_SGD scales well on multiple GPUs in large data sets.
Soft biometrics inference in surveillance scenarios is a topic of interest for various applications, particularly in security-related areas. However, soft biometric analysis is not extensively reported in wild conditions. In particular, previous works on gender recognition report their results in face datasets, with relatively good image quality and frontal poses. Given the uncertainty of the availability of the facial region in wild conditions, we consider that these methods are not adequate for surveillance settings. To overcome these limitations, we: 1) present frontal and wild face versions of three well-known surveillance datasets; and 2) propose a model that effectively and dynamically combines facial and body information, which makes it suitable for gender recognition in wild conditions. The frontal and wild face datasets derive from widely used Pedestrian Attribute Recognition (PAR) sets (PETA, PA-100K, and RAP), using a pose-based approach to filter the frontal samples and facial regions. This approach retrieves the facial region of images with varying image/subject conditions, where the state-of-the-art face detectors often fail. Our model combines facial and body information through a learnable fusion matrix and a channel-attention sub-network, focusing on the most influential body parts according to the specific image/subject features. We compare it with five PAR methods, consistently obtaining state-of-the-art results on gender recognition, and reducing the prediction errors by up to 24% in frontal samples. The announced PAR datasets versions and model serve as the basis for wild soft biometrics classification and are available in https://github.com/Tiago-Roxo.
The autonomous real-time optical navigation of planetary UAV is of the key technologies to ensure the success of the exploration. In such a GPS denied environment, vision-based localization is an optimal approach. In this paper, we proposed a multi-modal registration based SLAM algorithm, which estimates the location of a planet UAV using a nadir view camera on the UAV compared with pre-existing digital terrain model. To overcome the scale and appearance difference between on-board UAV images and pre-installed digital terrain model, a theoretical model is proposed to prove that topographic features of UAV image and DEM can be correlated in frequency domain via cross power spectrum. To provide the six-DOF of the UAV, we also developed an optimization approach which fuses the geo-referencing result into a SLAM system via LBA (Local Bundle Adjustment) to achieve robust and accurate vision-based navigation even in featureless planetary areas. To test the robustness and effectiveness of the proposed localization algorithm, a new cross-source drone-based localization dataset for planetary exploration is proposed. The proposed dataset includes 40200 synthetic drone images taken from nine planetary scenes with related DEM query images. Comparison experiments carried out demonstrate that over the flight distance of 33.8km, the proposed method achieved average localization error of 0.45 meters, compared to 1.31 meters by ORB-SLAM, with the processing speed of 12hz which will ensure a real-time performance. We will make our datasets available to encourage further work on this promising topic.
Automatic hate speech detection in online social networks is an important open problem in Natural Language Processing (NLP). Hate speech is a multidimensional issue, strongly dependant on language and cultural factors. Despite its relevance, research on this topic has been almost exclusively devoted to English. Most supervised learning resources, such as labeled datasets and NLP tools, have been created for this same language. Considering that a large portion of users worldwide speak in languages other than English, there is an important need for creating efficient approaches for multilingual hate speech detection. In this work we propose to address the problem of multilingual hate speech detection from the perspective of transfer learning. Our goal is to determine if knowledge from one particular language can be used to classify other language, and to determine effective ways to achieve this. We propose a hate specific data representation and evaluate its effectiveness against general-purpose universal representations most of which, unlike our proposed model, have been trained on massive amounts of data. We focus on a cross-lingual setting, in which one needs to classify hate speech in one language without having access to any labeled data for that language. We show that the use of our simple yet specific multilingual hate representations improves classification results. We explain this with a qualitative analysis showing that our specific representation is able to capture some common patterns in how hate speech presents itself in different languages. Our proposal constitutes, to the best of our knowledge, the first attempt for constructing multilingual specific-task representations. Despite its simplicity, our model outperformed the previous approaches for most of the experimental setups. Our findings can orient future solutions toward the use of domain-specific representations.
An important topic in the community of remote sensing is how to combine the complementary information provided by the huge amount of unlabeled multi-sensor data, such as Synthetic Aperture Radar (SAR) and optical images. Recently, contrastive learning methods have reached remarkable success in obtaining meaningful feature representations from multi-view data. However, these methods only focus on the image-level features, which may not satisfy the requirement for dense prediction tasks such as land-cover mapping. In this work, we propose a new self-supervised approach for SAR-optical data-fusion that can learn disentangled pixel-wise feature representations directly by taking advantage of both multi-view contrastive learning and the BYOL. The two key contributions were proposed for this approach: multi-view contrastive loss to encode the multi-modal images and the shift operation to reconstruct learned representations for each pixel by building the local consistency between different augmented views. With the aim to validate the effectiveness of the proposed approach, we conduct experiments on the land cover mapping task, where we trained the proposed approach using unlabeled SAR-optical image pairs while labeled data pairs were used for the linear classification and finetuning evaluations. We empirically show that the presented approach outperforms the state-of-the-art methods. In particular, it achieves an improvement on both linear classification and finetuning evaluations and reduces the dimension of representations with respect to the image-level contrastive learning method. Moreover, the proposed method is also validated to bring a sharp improvement on SAR-optical feature fusion than the early fusion fashion for the land-cover mapping task.
This paper reports on the state-of-the-art in the application of multidimensional scaling (MDS) techniques to create semantic maps in linguistic research. MDS refers to a statistical technique that represents objects (lexical items, linguistic contexts, languages, etc.) as points in a space so that close similarity between the objects corresponds to close distances between the corresponding points in the representation. We focus on the recent trend to apply MDS to parallel corpus data in order to investigate a certain linguistic phenomenon from a cross-linguistic perspective. We first introduce the mathematical foundations of MDS, intended for non-experts, so that readers understand notions such as 'eigenvalues', 'dimensionality reduction', 'stress values', etc. as they appear in linguistic MDS writing. We then give an exhaustive overview of past research that employs MDS techniques in combination with parallel corpus data, and propose a set of terminology to succinctly describe the key parameters of a particular MDS application. We go over various research questions that have been answered with the aid of MDS maps, showing that the methodology covers topics in a spectrum ranging from classic typology (e.g. language classification) to formal linguistics (e.g. study of a phenomenon in a single language). We finally identify two lines of future research that build on the insights of earlier MDS research described in the paper. First, we envisage the use of MDS in the investigation of cross-linguistic variation of compositional structures, an important area in variation research that has not been approached by parallel corpus work yet. Second, we discuss how MDS can be complemented and compared with other dimensionality reduction techniques that have seen little use in the linguistic domain so far.
Physical embodiment is a required component for robots that are structurally coupled with their real-world environments. However, most socially interactive robots do not need to physically interact with their environments in order to perform their tasks. When and why should embodied robots be used instead of simpler and cheaper virtual agents? This paper reviews the existing work that explores the role of physical embodiment in socially interactive robots. This class consists of robots that are not only capable of engaging in social interaction with humans, but are using primarily their social capabilities to perform their desired functions. Socially interactive robots provide entertainment, information, and/or assistance; this last category is typically encompassed by socially assistive robotics. In all cases, such robots can achieve their primary functions without performing functional physical work. To comprehensively evaluate the existing body of work on embodiment, we first review work from established related fields including psychology, philosophy, and sociology. We then systematically review 65 studies evaluating aspects of embodiment published from 2003 to 2017 in major peer-reviewed robotics publication venues. We examine relevant aspects of the selected studies, focusing on the embodiments compared, tasks evaluated, social roles of robots, and measurements. We introduce three taxonomies for the types of robot embodiment, robot social roles, and human-robot tasks. These taxonomies are used to deconstruct the design and interaction spaces of socially interactive robots and facilitate analysis and discussion of the reviewed studies. We use this newly-defined methodology to critically discuss existing works, revealing topics within embodiment research for social interaction, assistive robotics, and service robotics.
Identifying the arrival times of seismic P-phases plays a significant role in real-time seismic monitoring, which provides critical guidance for emergency response activities. While considerable research has been conducted on this topic, efficiently capturing the arrival times of seismic P-phases hidden within intensively distributed and noisy seismic waves, such as those generated by the aftershocks of destructive earthquakes, remains a real challenge since existing methods rely on laborious expert supervision. To this end, in this paper, we present a machine learning-enhanced framework, ML-Picker, for the automatic identification of seismic P-phase arrivals on continuous and massive waveforms. More specifically, ML-Picker consists of three modules, namely, Trigger, Classifier, and Refiner, and an ensemble learning strategy is exploited to integrate several machine learning classifiers. An evaluation of the aftershocks following the $M8.0$ Wenchuan earthquake demonstrates that ML-Picker can not only achieve the best identification performance but also identify 120% more seismic P-phase arrivals as complementary data. Meanwhile, experimental results also reveal both the applicability of different machine learning models for waveforms collected from different seismic stations and the regularities of seismic P-phase arrivals that might be neglected during manual inspection. These findings clearly validate the effectiveness, efficiency, flexibility and stability of ML-Picker. In particular, with the preliminary version of ML-Picker, we won the championship in the First Season and were the runner-up in the Finals of the 2017 International Aftershock Detection Contest hosted by the China Earthquake Administration, in which 1,143 teams participated from around the world.
We consider the problem of identifying intermediate variables (or mediators) that regulate the effect of a treatment on a response variable. While there has been significant research on this topic, little work has been done when the set of potential mediators is high-dimensional and when they are interrelated. In particular, we assume that the causal structure of the treatment, the potential mediators and the response is a directed acyclic graph (DAG). High-dimensional DAG models have previously been used for the estimation of causal effects from observational data and methods called IDA and joint-IDA have been developed for estimating the effects of single interventions and multiple simultaneous interventions respectively. In this paper, we propose an IDA-type method called MIDA for estimating mediation effects from high-dimensional observational data. Although IDA and joint-IDA estimators have been shown to be consistent in certain sparse high-dimensional settings, their asymptotic properties such as convergence in distribution and inferential tools in such settings remained unknown. We prove high-dimensional consistency of MIDA for linear structural equation models with sub-Gaussian errors. More importantly, we derive distributional convergence results for MIDA in similar high-dimensional settings, which are applicable to IDA and joint-IDA estimators as well. To the best of our knowledge, these are the first distributional convergence results facilitating inference for IDA-type estimators. These results have been built on our novel theoretical results regarding uniform bounds for linear regression estimators over varying subsets of high-dimensional covariates, which may be of independent interest. Finally, we empirically validate our asymptotic theory and demonstrate the usefulness of MIDA in identifying large mediation effects via simulations and application to real data in genomics.
Educational technology has obtained great importance over the last fifteen years. At present, the umbrella of educational technology incorporates multitudes of engaging online environments and fields. Learning analytics and Massive Open Online Courses (MOOCs) are two of the most relevant emerging topics in this domain. Since they are open to everyone at no cost, MOOCs excel in attracting numerous participants that can reach hundreds and hundreds of thousands. Experts from different disciplines have shown significant interest in MOOCs as the phenomenon has rapidly grown. In fact, MOOCs have been proven to scale education in disparate areas. Their benefits are crystallized in the improvement of educational outcomes, reduction of costs and accessibility expansion. Due to their unusual massiveness, the large datasets of MOOC platforms require advanced tools and methodologies for further examination. The key importance of learning analytics is reflected here. MOOCs offer diverse challenges and practices for learning analytics to tackle. In view of that, this thesis combines both fields in order to investigate further steps in the learning analytics capabilities in MOOCs. The primary research of this dissertation focuses on the integration of learning analytics in MOOCs, and thereafter looks into examining students' behavior on one side and bridging MOOC issues on the other side. The research was done on the Austrian iMooX xMOOC platform. We followed the prototyping and case studies research methodology to carry out the research questions of this dissertation. The main contributions incorporate designing a general learning analytics framework, learning analytics prototype, records of students' behavior in nearly every MOOC's variables (discussion forums, interactions in videos, self-assessment quizzes, login frequency), a cluster of student engagement...