Game character customization is one of the core features of many recent Role-Playing Games (RPGs), where players can edit the appearance of their in-game characters with their preferences. This paper studies the problem of automatically creating in-game characters with a single photo. In recent literature on this topic, neural networks are introduced to make game engine differentiable and the self-supervised learning is used to predict facial customization parameters. However, in previous methods, the expression parameters and facial identity parameters are highly coupled with each other, making it difficult to model the intrinsic facial features of the character. Besides, the neural network based renderer used in previous methods is also difficult to be extended to multi-view rendering cases. In this paper, considering the above problems, we propose a novel method named "PokerFace-GAN" for neutral face game character auto-creation. We first build a differentiable character renderer which is more flexible than the previous methods in multi-view rendering cases. We then take advantage of the adversarial training to effectively disentangle the expression parameters from the identity parameters and thus generate player-preferred neutral face (expression-less) characters. Since all components of our method are differentiable, our method can be easily trained under a multi-task self-supervised learning paradigm. Experiment results show that our method can generate vivid neutral face game characters that are highly similar to the input photos. The effectiveness of our method is verified by comparison results and ablation studies.
Generalized from image and language translation, graph translation aims to generate a graph in the target domain by conditioning an input graph in the source domain. This promising topic has attracted fast-increasing attention recently. Existing works are limited to either merely predicting the node attributes of graphs with fixed topology or predicting only the graph topology without considering node attributes, but cannot simultaneously predict both of them, due to substantial challenges: 1) difficulty in characterizing the interactive, iterative, and asynchronous translation process of both nodes and edges and 2) difficulty in discovering and maintaining the inherent consistency between the node and edge in predicted graphs. These challenges prevent a generic, end-to-end framework for joint node and edge attributes prediction, which is a need for real-world applications such as malware confinement in IoT networks and structural-to-functional network translation. These real-world applications highly depend on hand-crafting and ad-hoc heuristic models, but cannot sufficiently utilize massive historical data. In this paper, we termed this generic problem "multi-attributed graph translation" and developed a novel framework integrating both node and edge translations seamlessly. The novel edge translation path is generic, which is proven to be a generalization of the existing topology translation models. Then, a spectral graph regularization based on our non-parametric graph Laplacian is proposed in order to learn and maintain the consistency of the predicted nodes and edges. Finally, extensive experiments on both synthetic and real-world application data demonstrated the effectiveness of the proposed method.
This paper presents the experimental study revealing weaker performance of the automatic iris recognition methods for cataract-affected eyes when compared to healthy eyes. There is little research on the topic, mostly incorporating scarce databases that are often deficient in images representing more than one illness. We built our own database, acquiring 1288 eye images of 37 patients of the Medical University of Warsaw. Those images represent several common ocular diseases, such as cataract, along with less ordinary conditions, such as iris pattern alterations derived from illness or eye trauma. Images were captured in near-infrared light (used in biometrics) and for selected cases also in visible light (used in ophthalmological diagnosis). Since cataract is a disorder that is most populated by samples in the database, in this paper we focus solely on this illness. To assess the extent of the performance deterioration we use three iris recognition methodologies (commercial and academic solutions) to calculate genuine match scores for healthy eyes and those influenced by cataract. Results show a significant degradation in iris recognition reliability manifesting by worsening the genuine scores in all three matchers used in this study (12% of genuine score increase for an academic matcher, up to 175% of genuine score increase obtained for an example commercial matcher). This increase in genuine scores affected the final false non-match rate in two matchers. To our best knowledge this is the only study of such kind that employs more than one iris matcher, and analyzes the iris image segmentation as a potential source of decreased reliability.
As a newly emerging and significant topic in computer vision community, co-saliency detection aims at discovering the common salient objects in multiple related images. The existing methods often generate the co-saliency map through a direct forward pipeline which is based on the designed cues or initialization, but lack the refinement-cycle scheme. Moreover, they mainly focus on RGB image and ignore the depth information for RGBD images. In this paper, we propose an iterative RGBD co-saliency framework, which utilizes the existing single saliency maps as the initialization, and generates the final RGBD cosaliency map by using a refinement-cycle model. Three schemes are employed in the proposed RGBD co-saliency framework, which include the addition scheme, deletion scheme, and iteration scheme. The addition scheme is used to highlight the salient regions based on intra-image depth propagation and saliency propagation, while the deletion scheme filters the saliency regions and removes the non-common salient regions based on interimage constraint. The iteration scheme is proposed to obtain more homogeneous and consistent co-saliency map. Furthermore, a novel descriptor, named depth shape prior, is proposed in the addition scheme to introduce the depth information to enhance identification of co-salient objects. The proposed method can effectively exploit any existing 2D saliency model to work well in RGBD co-saliency scenarios. The experiments on two RGBD cosaliency datasets demonstrate the effectiveness of our proposed framework.
P2P lending presents as an innovative and flexible alternative for conventional lending institutions like banks, where lenders and borrowers directly make transactions and benefit each other without complicated verifications. However, due to lack of specialized laws, delegated monitoring and effective managements, P2P platforms may spawn potential risks, such as withdraw failures, investigation involvements and even runaway bosses, which cause great losses to lenders and are especially serious and notorious in China. Although there are abundant public information and data available on the Internet related to P2P platforms, challenges of multi-sourcing and heterogeneity matter. In this paper, we promote a novel deep learning model, OMNIRank, which comprehends multi-dimensional features of P2P platforms for risk quantification and produces scores for ranking. We first construct a large-scale flexible crawling framework and obtain great amounts of multi-source heterogeneous data of domestic P2P platforms since 2007 from the Internet. Purifications like duplication and noise removal, null handing, format unification and fusion are applied to improve data qualities. Then we extract deep features of P2P platforms via text comprehension, topic modeling, knowledge graph and sentiment analysis, which are delivered as inputs to OMNIRank, a deep learning model for risk quantification of P2P platforms. Finally, according to rankings generated by OMNIRank, we conduct flourish data visualizations and interactions, providing lenders with comprehensive information supports, decision suggestions and safety guarantees.
It is tempting to treat frequency trends from the Google Books data sets as indicators of the "true" popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. We highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800--2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets. Our findings emphasize the need to fully characterize the dynamics of the Google Books corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, allowing to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability which makes it more comprehensible for humans and potentially more useful for other applications.
The modelling, analysis, and visualisation of dynamic geospatial phenomena has been identified as a key developmental challenge for next-generation Geographic Information Systems (GIS). In this context, the envisaged paradigmatic extensions to contemporary foundational GIS technology raises fundamental questions concerning the ontological, formal representational, and (analytical) computational methods that would underlie their spatial information theoretic underpinnings. We present the conceptual overview and architecture for the development of high-level semantic and qualitative analytical capabilities for dynamic geospatial domains. Building on formal methods in the areas of commonsense reasoning, qualitative reasoning, spatial and temporal representation and reasoning, reasoning about actions and change, and computational models of narrative, we identify concrete theoretical and practical challenges that accrue in the context of formal reasoning about `space, events, actions, and change'. With this as a basis, and within the backdrop of an illustrated scenario involving the spatio-temporal dynamics of urban narratives, we address specific problems and solutions techniques chiefly involving `qualitative abstraction', `data integration and spatial consistency', and `practical geospatial abduction'. From a broad topical viewpoint, we propose that next-generation dynamic GIS technology demands a transdisciplinary scientific perspective that brings together Geography, Artificial Intelligence, and Cognitive Science. Keywords: artificial intelligence; cognitive systems; human-computer interaction; geographic information systems; spatio-temporal dynamics; computational models of narrative; geospatial analysis; geospatial modelling; ontology; qualitative spatial modelling and reasoning; spatial assistance systems
As machine learning and artificial intelligence are more frequently being leveraged to tackle problems in the health sector, there has been increased interest in utilizing them in clinical decision-support. This has historically been the case in single modal data such as electronic health record data. Attempts to improve prediction and resemble the multimodal nature of clinical expert decision-making this has been met in the computational field of machine learning by a fusion of disparate data. This review was conducted to summarize this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) extension for Scoping Reviews to characterize multi-modal data fusion in health. We used a combination of content analysis and literature searches to establish search strings and databases of PubMed, Google Scholar, and IEEEXplore from 2011 to 2021. A final set of 125 articles were included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. However, there exist a wide breadth of current applications. The most common form of information fusion was early fusion. Notably, there was an improvement in predictive performance performing heterogeneous data fusion. Lacking from the papers were clear clinical deployment strategies and pursuit of FDA-approved tools. These findings provide a map of the current literature on multimodal data fusion as applied to health diagnosis/prognosis problems. Multi-modal machine learning, while more robust in its estimations over unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.
Salient object detection (SOD) in optical remote sensing images (RSIs), or RSI-SOD, is an emerging topic in understanding optical RSIs. However, due to the difference between optical RSIs and natural scene images (NSIs), directly applying NSI-SOD methods to optical RSIs fails to achieve satisfactory results. In this paper, we propose a novel Adjacent Context Coordination Network (ACCoNet) to explore the coordination of adjacent features in an encoder-decoder architecture for RSI-SOD. Specifically, ACCoNet consists of three parts: an encoder, Adjacent Context Coordination Modules (ACCoMs), and a decoder. As the key component of ACCoNet, ACCoM activates the salient regions of output features of the encoder and transmits them to the decoder. ACCoM contains a local branch and two adjacent branches to coordinate the multi-level features simultaneously. The local branch highlights the salient regions in an adaptive way, while the adjacent branches introduce global information of adjacent levels to enhance salient regions. Additionally, to extend the capabilities of the classic decoder block (i.e., several cascaded convolutional layers), we extend it with two bifurcations and propose a Bifurcation-Aggregation Block to capture the contextual information in the decoder. Extensive experiments on two benchmark datasets demonstrate that the proposed ACCoNet outperforms 22 state-of-the-art methods under nine evaluation metrics, and runs up to 81 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/ACCoNet.