Synchronizing expectations and knowledge about the state of the world is an essential capability for effective collaboration. For robots to effectively collaborate with humans and other autonomous agents, it is critical that they be able to generate intelligible explanations to reconcile differences between their understanding of the world and that of their collaborators. In this work we present Single-shot Policy Explanation for Augmenting Rewards (SPEAR), a novel sequential optimization algorithm that uses semantic explanations derived from combinations of planning predicates to augment agents' reward functions, driving their policies to exhibit more optimal behavior. We provide an experimental validation of our algorithm's policy manipulation capabilities in two practically grounded applications and conclude with a performance analysis of SPEAR on domains of increasingly complex state space and predicate counts. We demonstrate that our method makes substantial improvements over the state-of-the-art in terms of runtime and addressable problem size, enabling an agent to leverage its own expertise to communicate actionable information to improve another's performance.
Generating a vivid, novel, and diverse essay with only several given topic words is a challenging task of natural language generation. In previous work, there are two problems left unsolved: neglect of sentiment beneath the text and insufficient utilization of topic-related knowledge. Therefore, we propose a novel Sentiment-Controllable topic-to-essay generator with a Topic Knowledge Graph enhanced decoder, named SCTKG, which is based on the conditional variational autoencoder (CVAE) framework. We firstly inject the sentiment information into the generator for controlling sentiment for each sentence, which leads to various generated essays. Then we design a Topic Knowledge Graph enhanced decoder. Unlike existing models that use knowledge entities separately, our model treats the knowledge graph as a whole and encodes more structured, connected semantic information in the graph to generate a more relevant essay. Experimental results show that our SCTKG can generate sentiment controllable essays and outperform the state-of-the-art approach in terms of topic relevance, fluency, and diversity on both automatic and human evaluation.
To alleviate the resource constraint for real-time point clouds applications that run on edge devices, we present BiPointNet, the first model binarization approach for efficient deep learning on point clouds. In this work, we discover that the immense performance drop of binarized models for point clouds is caused by two main challenges: aggregation-induced feature homogenization that leads to a degradation of information entropy, and scale distortion that hinders optimization and invalidates scale-sensitive structures. With theoretical justifications and in-depth analysis, we propose Entropy-Maximizing Aggregation(EMA) to modulate the distribution before aggregation for the maximum information entropy, andLayer-wise Scale Recovery(LSR) to efficiently restore feature scales. Extensive experiments show that our BiPointNet outperforms existing binarization methods by convincing margins, at the level even comparable with the full precision counterpart. We highlight that our techniques are generic which show significant improvements on various fundamental tasks and mainstream backbones. BiPoint-Net gives an impressive 14.7 times speedup and 18.9 times storage saving on real-world resource-constrained devices.
Physics and computer science have a long tradition of cross-fertilization. One of the latest outcomes of this mutually beneficial relationship is quantum information science, which comprises the study of information processing tasks that can be accomplished using quantum mechanical systems. Quantum Image Processing (QIMP) is an emergent field of quantum information science whose main goal is to strengthen our capacity for storing, processing, and retrieving visual information from images and video either by transitioning from digital to quantum paradigms or by complementing digital imaging with quantum techniques. The expectation is that harnessing the properties of quantum mechanical systems in QIMP will result in the realization of advanced technologies that will outperform, enhance or complement existing and upcoming digital technologies for image and video processing tasks.
In this paper we apply the constructor-theoretic approach to the theory of constructed emotions, showing that core affect valence and knowledge can be considered as two different observables, leading to information or superinformation conditions: this depends on subject's strategy, coherently with the affect infusion model. In the second part of the article we show that additional hypotheses on the structure of information allows to study emotions in terms of the contructor-theoretic version of phase task. Quantum algorithms are presented as an example of the connection between emotions and memory tasks.
Quantitative Susceptibility Mapping (QSM) is a new phase-based technique for quantifying magnetic susceptibility. The existing QSM reconstruction methods generally require complicated pre-processing on high-quality phase data. In this work, we propose to explore a new value of the high-pass filtered phase data generated in susceptibility weighted imaging (SWI), and develop an end-to-end Cross-connected {\psi}-Net (C{\psi}-Net) to reconstruct QSM directly from these phase data in SWI without additional pre-processing. C{\psi}-Net adds an intermediate branch in the classical U-Net to form a {\psi}-like structure. The specially designed dilated interaction block is embedded in each level of this branch to enlarge the receptive fields for capturing more susceptibility information from a wider spatial range of phase images. Moreover, the crossed connections are utilized between branches to implement a multi-resolution feature fusion scheme, which helps C{\psi}-Net capture rich contextual information for accurate reconstruction. The experimental results on a human dataset show that C{\psi}-Net achieves superior performance in our task over other QSM reconstruction algorithms.
We represent affine sub-manifolds of exponential family distributions as minimum relative entropy sub-manifolds. With such representation we derive analytical formulas for the inference from partial information on expectations and covariances of multivariate normal distributions; and we improve the numerical implementation via Monte Carlo simulations for the inference from partial information of generalized expectation type.
Textbook Question Answering is a complex task in the intersection of Machine Comprehension and Visual Question Answering that requires reasoning with multimodal information from text and diagrams. For the first time, this paper taps on the potential of transformer language models and bottom-up and top-down attention to tackle the language and visual understanding challenges this task entails. Rather than training a language-visual transformer from scratch we rely on pre-trained transformers, fine-tuning and ensembling. We add bottom-up and top-down attention to identify regions of interest corresponding to diagram constituents and their relationships, improving the selection of relevant visual information for each question and answer options. Our system ISAAQ reports unprecedented success in all TQA question types, with accuracies of 81.36%, 71.11% and 55.12% on true/false, text-only and diagram multiple choice questions. ISAAQ also demonstrates its broad applicability, obtaining state-of-the-art results in other demanding datasets.
Community affiliation of a node plays an important role in determining its contextual position in the network, which may raise privacy concerns when a sensitive node wants to hide its identity in a network. Oftentimes, a target community seeks to protect itself from adversaries so that its constituent members remain hidden inside the network. The current study focuses on hiding such sensitive communities so that the community affiliation of the targeted nodes can be concealed. This leads to the problem of community deception which investigates the avenues of minimally rewiring nodes in a network so that a given target community maximally hides from a community detection algorithm. We formalize the problem of community deception and introduce NEURAL, a novel method that greedily optimizes a node-centric objective function to determine the rewiring strategy. Theoretical settings pose a restriction on the number of strategies that can be employed to optimize the objective function, which in turn reduces the overhead of choosing the best strategy from multiple options. We also show that our objective function is submodular and monotone. When tested on both synthetic and 7 real-world networks, NEURAL is able to deceive 6 widely used community detection algorithms. We benchmark its performance with respect to 4 state-of-the-art methods on 4 evaluation metrics. Additionally, our qualitative analysis of 3 other attributed real-world networks reveals that NEURAL, quite strikingly, captures important meta-information about edges that otherwise could not be inferred by observing only their topological structures.
Cross-view geo-localization is to spot images of the same geographic target from different platforms, e.g., drone-view cameras and satellites. It is challenging in the large visual appearance changes caused by extreme viewpoint variations. Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center, but underestimate the contextual information in neighbor areas. In this work, we argue that neighbor areas can be leveraged as auxiliary information, enriching discriminative clues for geo-localization. Specifically, we introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information in an end-to-end manner. Without using extra part estimators, LPN adopts a square-ring feature partition strategy, which provides the attention according to the distance to the image center. It eases the part matching and enables the part-wise representation learning. Owing to the square-ring partition design, the proposed LPN has good scalability to rotation variations and achieves competitive results on two prevailing benchmarks, i.e., University-1652 and CVUSA. Besides, we also show the proposed LPN can be easily embedded into other frameworks to further boost performance.