"Topic": models, code, and papers

VaryFairyTED: A Fair Rating Predictor for Public Speeches by Awareness of Verbal and Gesture Quality

Dec 11, 2020
Rupam Acharyya, Ankani Chattoraj, Shouman Das, Md. Iftekhar Tanveer, Ehsan Hoque

The role of verbal and non-verbal cues in great public speaking has been a topic of exploration for many decades. We identify a commonality across present theories: the element of "variety or heterogeneity" in channels or modes of communication (e.g. resorting to stories, scientific facts, emotional connections, facial expressions etc.), which is essential for effectively communicating information. We use this observation to formalize a novel HEterogeneity Metric, HEM, that quantifies the quality of a talk in both the verbal and non-verbal domains (transcript and facial gestures). We use TED talks as an input repository of public speeches because it includes speakers from a diverse community and has a wide outreach. We show that there is an interesting relationship between HEM and the ratings of TED talks given to speakers by viewers. This relationship emphasizes that HEM inherently and successfully represents the quality of a talk based on "variety or heterogeneity". Further, we discover that HEM successfully captures the prevalent bias in ratings with respect to race and gender, which we call sensitive attributes (because predictions based on these might result in unfair outcomes). We incorporate the HEM metric into the loss function of a neural network with the goal of reducing unfairness in rating predictions with respect to race and gender. Our results show that the modified loss function improves fairness in prediction without considerably affecting the prediction accuracy of the neural network. Our work ties together a novel metric for public speeches in both the verbal and non-verbal domains with the computational power of a neural network to design a fair prediction system for speakers.

3DPVNet: Patch-level 3D Hough Voting Network for 6D Pose Estimation

Sep 15, 2020
Yuanpeng Liu, Jun Zhou, Yuqi Zhang, Chao Ding, Jun Wang

In this paper, we focus on estimating the 6D pose of objects in point clouds. Although the topic has been widely studied, pose estimation in point clouds remains a challenging problem due to noise and occlusion. To address the problem, a novel 3DPVNet is presented in this work, which utilizes 3D local patches to vote for the object 6D poses. 3DPVNet is comprised of three modules. In particular, a Patch Unification (PU) module is first introduced to normalize the input patch, and also to create a standard local coordinate frame on it to generate a reliable vote. We then devise a Weight-guided Neighboring Feature Fusion (WNFF) module in the network, which fuses the neighboring features to yield a semi-global feature for the center patch. The WNFF module mines the neighboring information of a local patch, such that the capability to represent local geometric characteristics is significantly enhanced, making the method robust to a certain level of noise. Moreover, we present a Patch-level Voting (PV) module to regress transformations and generate pose votes. After the aggregation of all votes from patches and a refinement step, the final pose of the object can be obtained. Compared to recent voting-based methods, 3DPVNet is patch-level and is carried out directly on point clouds. Therefore, 3DPVNet requires less computation than point- or pixel-level voting schemes, and is robust to partial data. Experiments on several datasets demonstrate that 3DPVNet achieves state-of-the-art performance, and is also robust against noise and occlusions.

* 9 pages, 5 figures 

Neutral Face Game Character Auto-Creation via PokerFace-GAN

Aug 17, 2020
Tianyang Shi, Zhengxia Zou, Xinhui Song, Zheng Song, Changjian Gu, Changjie Fan, Yi Yuan

Game character customization is one of the core features of many recent Role-Playing Games (RPGs), where players can edit the appearance of their in-game characters according to their preferences. This paper studies the problem of automatically creating in-game characters from a single photo. In recent literature on this topic, neural networks are introduced to make the game engine differentiable, and self-supervised learning is used to predict facial customization parameters. However, in previous methods, the expression parameters and facial identity parameters are highly coupled with each other, making it difficult to model the intrinsic facial features of the character. Besides, the neural network based renderer used in previous methods is also difficult to extend to multi-view rendering cases. In this paper, considering the above problems, we propose a novel method named "PokerFace-GAN" for neutral face game character auto-creation. We first build a differentiable character renderer which is more flexible than the previous methods in multi-view rendering cases. We then take advantage of adversarial training to effectively disentangle the expression parameters from the identity parameters and thus generate player-preferred neutral face (expression-less) characters. Since all components of our method are differentiable, our method can be easily trained under a multi-task self-supervised learning paradigm. Experimental results show that our method can generate vivid neutral face game characters that are highly similar to the input photos. The effectiveness of our method is verified by comparison results and ablation studies.

* Accepted by ACMMM 2020 

Deep Multi-attributed Graph Translation with Node-Edge Co-evolution

Mar 22, 2020
Xiaojie Guo, Liang Zhao, Cameron Nowzari, Setareh Rafatirad, Houman Homayoun, Sai Manoj Pudukotai Dinakarrao

Generalized from image and language translation, graph translation aims to generate a graph in the target domain conditioned on an input graph in the source domain. This promising topic has attracted fast-increasing attention recently. Existing works are limited to either merely predicting the node attributes of graphs with fixed topology or predicting only the graph topology without considering node attributes, but cannot simultaneously predict both of them, due to substantial challenges: 1) difficulty in characterizing the interactive, iterative, and asynchronous translation process of both nodes and edges and 2) difficulty in discovering and maintaining the inherent consistency between the nodes and edges in predicted graphs. These challenges prevent a generic, end-to-end framework for joint node and edge attribute prediction, which is needed for real-world applications such as malware confinement in IoT networks and structural-to-functional network translation. These real-world applications depend heavily on hand-crafted and ad-hoc heuristic models, but cannot sufficiently utilize massive historical data. In this paper, we term this generic problem "multi-attributed graph translation" and develop a novel framework integrating both node and edge translations seamlessly. The novel edge translation path is generic, and is proven to be a generalization of the existing topology translation models. Then, a spectral graph regularization based on our non-parametric graph Laplacian is proposed in order to learn and maintain the consistency of the predicted nodes and edges. Finally, extensive experiments on both synthetic and real-world application data demonstrate the effectiveness of the proposed method.

* International Conference on Data Mining (ICDM), Beijing, China, 2019, pp. 250-259 
* This paper has been accepted by International Conference on Data Mining (ICDM), Beijing, China, 2019 
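
The spectral regularization mentioned in the abstract above can be illustrated with a generic Laplacian smoothness penalty. This is only a sketch under stated assumptions: it uses the standard combinatorial Laplacian L = D - A as a stand-in for the paper's non-parametric graph Laplacian, and the function name `laplacian_smoothness` and the toy graph are hypothetical. The penalty is small when connected nodes carry similar attributes and large when they disagree.

```python
def laplacian_smoothness(adjacency, features):
    """Return tr(X^T L X) with L = D - A, computed edge by edge.

    adjacency: symmetric matrix (list of lists) of edge weights.
    features: one attribute vector per node.
    Equals 0.5 * sum_ij A[i][j] * ||x_i - x_j||^2.
    """
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if adjacency[i][j]:
                sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                total += 0.5 * adjacency[i][j] * sq_dist
    return total


# Toy 3-node path graph 0 - 1 - 2 (made-up example, not from the paper).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
consistent = [[1.0], [1.0], [1.0]]     # neighbours agree
inconsistent = [[1.0], [-1.0], [1.0]]  # neighbours disagree

smooth_penalty = laplacian_smoothness(A, consistent)
rough_penalty = laplacian_smoothness(A, inconsistent)
print(smooth_penalty, rough_penalty)
```

Adding such a term to a training loss would push predicted node attributes and predicted edges toward mutual consistency; how the paper actually constructs and weights its regularizer is not stated in the abstract.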

Cataract influence on iris recognition performance

Sep 01, 2018
Mateusz Trokielewicz, Adam Czajka, Piotr Maciejewicz

This paper presents an experimental study revealing weaker performance of automatic iris recognition methods for cataract-affected eyes when compared to healthy eyes. There is little research on the topic, mostly incorporating scarce databases that are often deficient in images representing more than one illness. We built our own database, acquiring 1288 eye images of 37 patients of the Medical University of Warsaw. Those images represent several common ocular diseases, such as cataract, along with less ordinary conditions, such as iris pattern alterations derived from illness or eye trauma. Images were captured in near-infrared light (used in biometrics) and for selected cases also in visible light (used in ophthalmological diagnosis). Since cataract is the disorder with the most samples in the database, in this paper we focus solely on this illness. To assess the extent of the performance deterioration we use three iris recognition methodologies (commercial and academic solutions) to calculate genuine match scores for healthy eyes and those influenced by cataract. Results show a significant degradation in iris recognition reliability, manifested as worse genuine scores in all three matchers used in this study (a 12% increase in genuine scores for an academic matcher, and up to a 175% increase for an example commercial matcher). This increase in genuine scores affected the final false non-match rate in two matchers. To the best of our knowledge this is the only study of its kind that employs more than one iris matcher and analyzes the iris image segmentation as a potential source of decreased reliability.

* Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2014, 929020 (2014) 
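
The abstract above links an increase in genuine scores to a higher false non-match rate, which suggests the matchers report dissimilarity-like scores (lower means a better match). A minimal sketch of that relationship, assuming distance-like scores and a fixed acceptance threshold; the function name, the threshold, and all score values are hypothetical illustrations, not data from the study.

```python
def false_non_match_rate(genuine_scores, threshold):
    """Fraction of same-eye comparisons rejected at the given threshold.

    Assumes distance-like scores: a comparison is accepted as a match
    when its score is at or below the threshold.
    """
    if not genuine_scores:
        raise ValueError("need at least one genuine comparison score")
    rejected = sum(1 for s in genuine_scores if s > threshold)
    return rejected / len(genuine_scores)


# Illustrative, made-up genuine scores (not taken from the paper).
healthy = [0.20, 0.25, 0.22, 0.30, 0.28]
cataract = [0.27, 0.35, 0.31, 0.36, 0.29]
THRESHOLD = 0.32  # hypothetical decision threshold

fnmr_healthy = false_non_match_rate(healthy, THRESHOLD)
fnmr_cataract = false_non_match_rate(cataract, THRESHOLD)
print(fnmr_healthy, fnmr_cataract)
```

With a fixed threshold, shifting the genuine score distribution upward can only keep the false non-match rate the same or raise it, which is consistent with the abstract reporting an effect in two of the three matchers.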

An Iterative Co-Saliency Framework for RGBD Images

Nov 04, 2017
Runmin Cong, Jianjun Lei, Huazhu Fu, Weisi Lin, Qingming Huang, Xiaochun Cao, Chunping Hou

As a newly emerging and significant topic in the computer vision community, co-saliency detection aims at discovering the common salient objects in multiple related images. The existing methods often generate the co-saliency map through a direct forward pipeline which is based on the designed cues or initialization, but lack a refinement-cycle scheme. Moreover, they mainly focus on RGB images and ignore the depth information available in RGBD images. In this paper, we propose an iterative RGBD co-saliency framework, which utilizes the existing single saliency maps as the initialization, and generates the final RGBD co-saliency map by using a refinement-cycle model. Three schemes are employed in the proposed RGBD co-saliency framework: the addition scheme, the deletion scheme, and the iteration scheme. The addition scheme is used to highlight the salient regions based on intra-image depth propagation and saliency propagation, while the deletion scheme filters the salient regions and removes the non-common salient regions based on an inter-image constraint. The iteration scheme is proposed to obtain a more homogeneous and consistent co-saliency map. Furthermore, a novel descriptor, named depth shape prior, is proposed in the addition scheme to introduce the depth information and enhance the identification of co-salient objects. The proposed method can effectively exploit any existing 2D saliency model to work well in RGBD co-saliency scenarios. The experiments on two RGBD co-saliency datasets demonstrate the effectiveness of our proposed framework.

* 13 pages, 13 figures, Accepted by IEEE Transactions on Cybernetics 2017. Project URL: 

OMNIRank: Risk Quantification for P2P Platforms with Deep Learning

Apr 27, 2017
Honglun Zhang, Haiyang Wang, Xiaming Chen, Yongkun Wang, Yaohui Jin

P2P lending presents itself as an innovative and flexible alternative to conventional lending institutions like banks, where lenders and borrowers directly make transactions and benefit each other without complicated verifications. However, due to a lack of specialized laws, delegated monitoring and effective management, P2P platforms may spawn potential risks, such as withdrawal failures, involvement in investigations and even runaway bosses, which cause great losses to lenders and are especially serious and notorious in China. Although there is abundant public information and data available on the Internet related to P2P platforms, the challenges of multi-sourcing and heterogeneity remain. In this paper, we propose a novel deep learning model, OMNIRank, which comprehends multi-dimensional features of P2P platforms for risk quantification and produces scores for ranking. We first construct a large-scale flexible crawling framework and obtain large amounts of multi-source heterogeneous data on domestic P2P platforms since 2007 from the Internet. Purification steps such as duplicate and noise removal, null handling, format unification and fusion are applied to improve data quality. Then we extract deep features of P2P platforms via text comprehension, topic modeling, knowledge graphs and sentiment analysis, which are delivered as inputs to OMNIRank, a deep learning model for risk quantification of P2P platforms. Finally, according to the rankings generated by OMNIRank, we provide rich data visualizations and interactions, offering lenders comprehensive information support, decision suggestions and safety guarantees.

* 9 pages, in Chinese, 7 figures, CCFBD2016 

Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution

Mar 24, 2017
Eitan Adam Pechenick, Christopher M. Danforth, Peter Sheridan Dodds

It is tempting to treat frequency trends from the Google Books data sets as indicators of the "true" popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical of academic articles but less common in general, such as references to time in the form of citations. We highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800-2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets. Our findings emphasize the need to fully characterize the dynamics of the Google Books corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.

* 13 pages, 16 figures 
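
The idea of ranking "major contributions to the statistical divergence" between two decades can be sketched as follows. The abstract does not name the divergence measure, so this example assumes the Jensen-Shannon divergence, which splits cleanly into non-negative per-word terms; the function name and the toy frequencies are hypothetical and are not Google Books data.

```python
import math


def jsd_contributions(p, q):
    """Per-word contributions to the Jensen-Shannon divergence (in bits).

    p and q map words to relative frequencies that each sum to 1.
    """
    contributions = {}
    for word in set(p) | set(q):
        pi, qi = p.get(word, 0.0), q.get(word, 0.0)
        mi = 0.5 * (pi + qi)  # frequency in the mixed distribution
        term = 0.0
        if pi > 0:
            term += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            term += 0.5 * qi * math.log2(qi / mi)
        contributions[word] = term
    return contributions


# Toy, made-up frequencies for two decades (illustration only).
decade_a = {"the": 0.6, "ship": 0.3, "figure": 0.1}
decade_b = {"the": 0.6, "ship": 0.1, "figure": 0.3}

contrib = jsd_contributions(decade_a, decade_b)
total = sum(contrib.values())
for word, value in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {value:.4f}")
print(f"JSD = {total:.4f}")
```

Sorting words by their contribution shows which terms drive the difference between two periods; a word with the same relative frequency in both decades contributes nothing.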

"What is Relevant in a Text Document?": An Interpretable Machine Learning Approach

Dec 23, 2016
Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, Wojciech Samek

Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, making it possible to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. The resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores to generate novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability, which makes it more comprehensible for humans and potentially more useful for other applications.

* 19 pages, 7 figures 
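
For a linear bag-of-words classifier, decomposing a prediction onto words is straightforward, and a small example may make the idea in the abstract above concrete. This is a sketch, not the authors' code: the function name, the document, and the weights are hypothetical, and spreading the bias evenly over the words present in the document is an assumption made here so that the relevances sum to the classifier score. The CNN case needs layer-by-layer propagation and is not shown.

```python
def word_relevances(counts, weights, bias=0.0):
    """Decompose a linear score f(x) = sum_i w_i * x_i + b onto words.

    counts: word -> bag-of-words feature value for one document.
    weights: word -> weight of the linear classifier for one class.
    The bias is split evenly over the words present in the document
    (an assumption of this sketch) so that relevances sum to the score.
    """
    present = [w for w, c in counts.items() if c != 0]
    if not present:
        return {}
    share = bias / len(present)
    return {w: weights.get(w, 0.0) * counts[w] + share for w in present}


# Hypothetical document and class weights, for illustration only.
doc = {"orbit": 2, "launch": 1, "recipe": 1}
space_weights = {"orbit": 1.5, "launch": 1.0, "recipe": -2.0}

relevance = word_relevances(doc, space_weights, bias=0.5)
score = sum(space_weights[w] * c for w, c in doc.items()) + 0.5
print(relevance)
print(score)
```

Positive relevances mark words that speak for the class and negative ones mark words that speak against it; because the scores sum to the classifier output, they can be read as shares of the decision.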

Geospatial Narratives and their Spatio-Temporal Dynamics: Commonsense Reasoning for High-level Analyses in Geographic Information Systems

Dec 16, 2013
Mehul Bhatt, Jan Oliver Wallgruen

The modelling, analysis, and visualisation of dynamic geospatial phenomena has been identified as a key developmental challenge for next-generation Geographic Information Systems (GIS). In this context, the envisaged paradigmatic extensions to contemporary foundational GIS technology raise fundamental questions concerning the ontological, formal representational, and (analytical) computational methods that would underlie their spatial information theoretic underpinnings. We present the conceptual overview and architecture for the development of high-level semantic and qualitative analytical capabilities for dynamic geospatial domains. Building on formal methods in the areas of commonsense reasoning, qualitative reasoning, spatial and temporal representation and reasoning, reasoning about actions and change, and computational models of narrative, we identify concrete theoretical and practical challenges that accrue in the context of formal reasoning about "space, events, actions, and change". With this as a basis, and against the backdrop of an illustrated scenario involving the spatio-temporal dynamics of urban narratives, we address specific problems and solution techniques chiefly involving "qualitative abstraction", "data integration and spatial consistency", and "practical geospatial abduction". From a broad topical viewpoint, we propose that next-generation dynamic GIS technology demands a transdisciplinary scientific perspective that brings together Geography, Artificial Intelligence, and Cognitive Science. Keywords: artificial intelligence; cognitive systems; human-computer interaction; geographic information systems; spatio-temporal dynamics; computational models of narrative; geospatial analysis; geospatial modelling; ontology; qualitative spatial modelling and reasoning; spatial assistance systems

* ISPRS International Journal of Geo-Information (ISSN 2220-9964); Special Issue on Geospatial Monitoring and Modelling of Environmental Change. IJGI. Editor: Duccio Rocchini. (pre-print of article in press) 
