With the advent of wearable body-cameras, human activity classification from First-Person Videos (FPV) has become a topic of increasing importance for various applications, including life-logging, law enforcement, sports, the workplace, and healthcare. One of the challenging aspects of FPV is its exposure to potentially sensitive objects within the user's field of view. In this work, we developed a privacy-aware activity classification system focusing on office videos. We utilized a Mask-RCNN with an Inception-ResNet hybrid as a feature extractor to detect and then blur sensitive objects (e.g., digital screens, human faces, paper) in the videos. For activity classification, we incorporate an ensemble of Recurrent Neural Networks (RNNs) with ResNet-, ResNext-, and DenseNet-based feature extractors. The proposed system was trained and evaluated on the FPV office video dataset, which includes 18 classes and was made available through the IEEE Video and Image Processing (VIP) Cup 2019 competition. On the original unprotected FPVs, the proposed activity classifier ensemble reached an accuracy of 85.078% with precision, recall, and F1 scores of 0.88, 0.85, and 0.86, respectively. On privacy-protected videos, performance was slightly degraded, with accuracy, precision, recall, and F1 scores of 73.68%, 0.79, 0.75, and 0.74, respectively. The presented system won the 3rd prize in the IEEE VIP Cup 2019 competition.
It is commonly believed in the machine learning (ML) community that industry influence on the community itself, as well as on the scientific process, is increasing, since tech companies have begun to allocate large amounts of human and monetary resources to ML. However, the concrete ethical implications and the quantitative scale of this influence are largely unknown. To address this, we have not only carried out an informed ethical analysis of the field, but have also inspected all papers of the main ML conferences NeurIPS, CVPR and ICML of the last 5 years (almost 11,000 papers in total). Our statistical approach focuses on conflicts of interest, innovation and gender equality. We have obtained four main findings: (1) Academic-corporate collaborations are growing in number. At the same time, we found that conflicts of interest are rarely disclosed. (2) Industry publishes papers about trending ML topics on average two years earlier than academia. (3) Industry papers are not lagging behind academic papers concerning social impact considerations. (4) Finally, we demonstrate that industrial papers fall short of their academic counterparts with respect to gender diversity. The results have been reviewed in the light of related research works from ethics and other disciplines. For the first time, we have quantitatively analysed the influence of industry on the ML community. We believe that this is a good starting point for further fine-grained discussion. The main recommendation that follows from our research is for the community to openly declare conflicts of interest, including subtle or merely potential ones, to foster trustworthiness and transparency.
In this paper, we address the problem of answering complex information needs by conducting conversations with search engines, in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agents (CAs) and Conversational Search (CS). However, they either do not address complex information needs, or they are limited to the development of conceptual frameworks and/or laboratory-based user studies. We pursue two goals in this paper: (1) the creation of a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines, and (2) the development of a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE), using this dataset. SaaC is built on a multi-turn conversational search dataset, for which we further employ workers from a crowdsourcing platform to summarize each relevant passage into a short, conversational response. CaSE enhances the state of the art by introducing a supporting token identification module and a prior-aware pointer generator, which enable us to generate more accurate responses. We carry out experiments to show that CaSE is able to outperform strong baselines. We also conduct extensive analyses on the SaaC dataset to show where there is room for further improvement beyond CaSE. Finally, we release the SaaC dataset and the code for CaSE and all models used for comparison to facilitate future research on this topic.
Speech recognition is one of the key topics in artificial intelligence, as speech is one of the most common forms of communication in humans. Researchers have developed many speech-controlled prosthetic hands in the past decades, utilizing conventional speech recognition systems that combine a neural network with a hidden Markov model. Recent advancements in general-purpose graphics processing units (GPGPUs) enable intelligent devices to run deep neural networks in real-time. Thus, state-of-the-art speech recognition systems have rapidly shifted from the paradigm of composite subsystem optimization to the paradigm of end-to-end optimization. However, a low-power embedded GPGPU cannot run these speech recognition systems in real-time. In this paper, we show the development of deep convolutional neural networks (CNNs) for speech control of prosthetic hands that run in real-time on an NVIDIA Jetson TX2 developer kit. First, the device captures speech and converts it into 2D features (such as a spectrogram). The CNN receives the 2D features and classifies the hand gestures. Finally, the hand gesture classes are sent to the prosthetic hand motion control system. The whole system is written in Python with Keras, a deep learning library that has a TensorFlow backend. Our experiments on the CNN demonstrate 91% accuracy and a 2 ms running time for producing hand gestures (text output) from speech commands, which can be used to control the prosthetic hands in real-time.
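The first stage of the pipeline described above, turning a 1D speech signal into 2D features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length of 64 samples, the hop of 32 samples, and the use of a naive DFT without a window function are all assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1D signal into overlapping frames (hypothetical helper)."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=64, hop=32):
    """2D feature map: one row per time frame, one column per frequency bin."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


# Usage: a pure tone with 4 cycles per frame concentrates its energy in bin 4.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
features = spectrogram(tone)
peak_bins = [row.index(max(row)) for row in features]
```

In a real system the resulting 2D array would be passed to the CNN classifier; a production front end would normally use an FFT routine and a window function rather than the naive transform shown here.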
Change detection in heterogeneous multitemporal satellite images is an emerging and challenging topic in remote sensing. In particular, one of the main challenges is to tackle the problem in an unsupervised manner. In this paper we propose an unsupervised framework for bitemporal heterogeneous change detection based on the comparison of affinity matrices and image regression. First, our method quantifies the similarity of affinity matrices computed from co-located image patches in the two images. This is done to automatically identify pixels that are likely to be unchanged. With the identified pixels as pseudo-training data, we learn a transformation to map the first image to the domain of the other image, and vice versa. Four regression methods are selected to carry out the transformation: Gaussian process regression, support vector regression, random forest regression, and a recently proposed kernel regression method called homogeneous pixel transformation. To evaluate the potential and limitations of our framework, as well as the benefits and disadvantages of each regression method, we perform experiments on two real data sets. The results indicate that the comparison of the affinity matrices can already be considered a change detection method by itself. However, image regression is shown to improve the results obtained by the previous step alone and produces accurate change detection maps despite the heterogeneity of the multitemporal input data. Notably, the random forest regression approach excels by achieving accuracy similar to that of the other methods, but with a significantly lower computational cost and with fast and robust tuning of hyperparameters.
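The idea of comparing affinity matrices from co-located patches can be sketched as follows. This is an illustrative sketch only, not the paper's exact procedure: the Gaussian kernel, the per-image bandwidths, the scalar (single-channel) pixels, and the normalized Frobenius distance are assumptions. The point it illustrates is that affinities describe relations between pixels within one image, so they remain comparable even when the two sensors use different value ranges.

```python
import math


def affinity_matrix(patch, bandwidth):
    """Gaussian-kernel affinity between every pair of pixels in a flattened patch."""
    return [
        [math.exp(-((p - q) ** 2) / bandwidth ** 2) for q in patch]
        for p in patch
    ]


def affinity_change_score(patch_a, patch_b, bandwidth_a, bandwidth_b):
    """Normalized Frobenius distance between the two patches' affinity matrices.

    A score near 0 suggests the patch structure is preserved (likely unchanged).
    """
    aff_a = affinity_matrix(patch_a, bandwidth_a)
    aff_b = affinity_matrix(patch_b, bandwidth_b)
    n = len(patch_a)
    squared = sum(
        (aff_a[i][j] - aff_b[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return math.sqrt(squared) / n


# Usage: same spatial structure in two different value ranges versus a changed layout.
before = [0.0, 0.0, 1.0, 1.0]          # flattened 2x2 patch, first sensor
after_same = [10.0, 10.0, 30.0, 30.0]  # same structure, second sensor
after_changed = [10.0, 30.0, 10.0, 30.0]
score_same = affinity_change_score(before, after_same, 1.0, 20.0)
score_changed = affinity_change_score(before, after_changed, 1.0, 20.0)
```

Pixels whose patches receive low scores could then serve as the pseudo-training data for the regression step.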
Adapting to social conventions is an unavoidable requirement for the acceptance of assistive and social robots. While the scientific community broadly accepts that assistive robots and social robot companions are unlikely to have widespread use in the near future, their presence in health care and other medium-sized institutions is becoming a reality. These robots will have a beneficial impact in industry and other fields such as health care. The growing number of research contributions to social navigation is also indicative of the importance of the topic. To foster the future prevalence of these robots, they must be useful, but also socially accepted. The first step towards being able to actively ask for collaboration or permission is to estimate whether the robot would otherwise make people feel uncomfortable, and that is precisely the goal of algorithms evaluating social navigation compliance. Some approaches provide analytic models, whereas others use machine learning techniques such as neural networks. This data report presents and describes SocNav1, a dataset for social navigation conventions. The aims of SocNav1 are two-fold: a) enabling comparison of the algorithms that robots use to assess the convenience of their presence in a particular position when navigating; b) providing a sufficient amount of data so that modern machine learning algorithms such as deep neural networks can be used. Because of the structured nature of the data, SocNav1 is particularly well-suited for benchmarking non-Euclidean machine learning algorithms such as Graph Neural Networks. The dataset has been made available in a public repository.
This article identifies and characterises political narratives regarding Europe that were broadcast in the UK press during 2016 and 2017. A new theoretical and operational framework is proposed for typifying discourse narratives propagated in the public opinion space, based on social constructivism and structural linguistics approaches and on the mathematical theory of hypernetworks, where elementary units are aggregated into high-level entities. In this line of thought, a narrative is understood as a social construct where a related and coherent aggregate of terms within public discourse is repeated and propagated in the media until it can be identified as a communication pattern, embodying meaning in a way that provides individuals with some interpretation of their world. An inclusive methodology, using state-of-the-art technologies from natural language processing and network theory, implements this concept of narrative. A corpus from the Observatorium database, including articles from six UK newspapers and incorporating far-right, right-wing, and left-wing narratives, is analysed. The research revealed clear distinctions between narratives along the political spectrum. In 2016, the far-right was particularly focused on emigration and refugees. In particular, during the referendum campaign, Europe was related to attacks on women and children, sexual offences, and terrorism. The right-wing was mainly focused on internal politics, while the left-wing notably mentioned a diversity of non-political topics, such as sports, side by side with economics. During 2017, terrorism was in general mentioned less, and negotiations with the EU, in particular regarding economics, finance, and Ireland, became central.
The analysis of lesions within medical image data is desirable for efficient disease diagnosis, treatment and prognosis. Common lesion analysis tasks such as segmentation and classification are mainly based on supervised learning with well-paired image-level or voxel-level labels. However, labeling lesions in medical images is laborious and requires highly specialized knowledge. Inspired by the fact that radiologists make diagnoses based on expert knowledge of "healthiness" and "unhealthiness" developed from extensive experience, we propose a medical image synthesis model named abnormal-to-normal translation generative adversarial network (ANT-GAN) to predict a normal-looking medical image from its abnormal-looking counterpart without the need for paired training data. Unlike typical GANs, whose aim is to generate realistic samples with variations, our more restrictive model aims at producing the underlying normal-looking image corresponding to an image containing lesions, and thus requires a specialized design. With the ability to segment normal from abnormal tissue, our model is able to generate a highly realistic lesion-free medical image from its true lesion-containing counterpart. Being able to provide a "normal" version of a medical image (possibly the same image if there is no illness) is not only an intriguing topic, but can also serve as a pre-processing step and provide useful side information for medical imaging tasks such as lesion segmentation or classification, as validated by our experiments.
The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions.
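To make the constrained setting above concrete, the sketch below estimates value at risk and conditional value at risk from sampled costs and forms a Lagrangian relaxation of a risk-constrained objective. It is a minimal illustration under assumptions, not the survey's algorithm template: the function names, the simple tail-average CVaR estimator (which includes the VaR sample itself), and the fixed multiplier are all hypothetical choices.

```python
import math


def value_at_risk(costs, alpha):
    """Empirical alpha-quantile of sampled costs (higher cost is worse)."""
    ordered = sorted(costs)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


def conditional_value_at_risk(costs, alpha):
    """Average of the sampled costs at or above the empirical VaR."""
    var = value_at_risk(costs, alpha)
    tail = [c for c in costs if c >= var]
    return sum(tail) / len(tail)


def lagrangian(expected_cost, risk, risk_bound, multiplier):
    """Relaxed objective: expected cost plus a penalty on constraint violation."""
    return expected_cost + multiplier * (risk - risk_bound)


# Usage: four sampled episode costs for one fixed policy (made-up numbers).
sampled_costs = [1.0, 2.0, 3.0, 10.0]
mean_cost = sum(sampled_costs) / len(sampled_costs)
cvar = conditional_value_at_risk(sampled_costs, alpha=0.5)
objective = lagrangian(mean_cost, cvar, risk_bound=4.0, multiplier=2.0)
```

In a full algorithm the policy parameters would be updated to decrease this relaxed objective while the multiplier is increased whenever the risk estimate exceeds the bound; both updates are omitted here.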
Sparse modeling is a major topic in machine learning and statistics. LASSO (Least Absolute Shrinkage and Selection Operator) is a popular sparse modeling method, but it is known to yield an unexpectedly large bias, especially at a sparse representation. Several studies have sought to mitigate this problem, for example by introducing non-convex regularization terms. Importantly, this bias directly affects model selection in applications, since a sparse representation cannot be selected by prediction-error-based model selection even if it is a good representation. In this article, we address this problem by introducing a scaling that expands the LASSO estimator to compensate for its excessive shrinkage and thus for its large bias. We give an empirical value for the amount of scaling. This scaling method has two advantages. First, since the proposed scaling value is calculated from the LASSO estimator, we only need the LASSO estimator, which is obtained by a fast and stable optimization procedure such as LARS (Least Angle Regression) under the LASSO modification or coordinate descent. Second, the simplicity of our scaling method enables us to derive SURE (Stein's Unbiased Risk Estimate) for the modified LASSO estimator with scaling. Our scaling method, together with model selection based on SURE, is fully empirical and does not need additional hyper-parameters. In a simple numerical example, we verified that our scaling method actually improves LASSO and that the SURE-based model selection criterion can stably choose an appropriate sparse model.
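The shrinkage bias discussed above, and the general idea of expanding the estimate by a scalar, can be illustrated in the special case of an orthonormal design, where the LASSO solution is soft-thresholding of the least-squares coefficients. The scaling factor below is a hypothetical least-squares choice made for illustration; it is not the empirical scaling value proposed in the article, which the abstract does not specify.

```python
def soft_threshold(value, lam):
    """LASSO solution for one coefficient under an orthonormal design."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_orthonormal(ols_coefficients, lam):
    """Apply soft-thresholding to every least-squares coefficient."""
    return [soft_threshold(z, lam) for z in ols_coefficients]


def least_squares_scale(ols_coefficients, shrunk):
    """Scalar a minimizing sum((z - a * b)^2); an assumed stand-in scaling rule."""
    denominator = sum(b * b for b in shrunk)
    if denominator == 0.0:
        return 1.0
    numerator = sum(z * b for z, b in zip(ols_coefficients, shrunk))
    return numerator / denominator


# Usage: every surviving coefficient is pulled towards zero by lam.
ols = [3.0, -2.0, 0.5]
shrunk = lasso_orthonormal(ols, lam=1.0)
scale = least_squares_scale(ols, shrunk)
rescaled = [scale * b for b in shrunk]
```

The rescaled estimate keeps the sparsity pattern selected by LASSO (zeros stay zero) while moving the nonzero coefficients back towards their unshrunk size, which is the qualitative effect a scaling correction aims for.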