Thanks to the rapid development of diffusion models, unprecedented progress has been witnessed in image synthesis. Prior works mostly rely on pre-trained linguistic models, but a text is often too abstract to properly specify all the spatial properties of an image, e.g., the layout configuration of a scene, leading to the sub-optimal results of complex scene generation. In this paper, we achieve accurate complex scene generation by proposing a semantically controllable Layout-AWare diffusion model, termed LAW-Diffusion. Distinct from the previous Layout-to-Image generation (L2I) methods that only explore category-aware relationships, LAW-Diffusion introduces a spatial dependency parser to encode the location-aware semantic coherence across objects as a layout embedding and produces a scene with perceptually harmonious object styles and contextual relations. To be specific, we delicately instantiate each object's regional semantics as an object region map and leverage a location-aware cross-object attention module to capture the spatial dependencies among those disentangled representations. We further propose an adaptive guidance schedule for our layout guidance to mitigate the trade-off between the regional semantic alignment and the texture fidelity of generated objects. Moreover, LAW-Diffusion allows for instance reconfiguration while maintaining the other regions in a synthesized image by introducing a layout-aware latent grafting mechanism to recompose its local regional semantics. To better verify the plausibility of generated scenes, we propose a new evaluation metric for the L2I task, dubbed Scene Relation Score (SRS) to measure how the images preserve the rational and harmonious relations among contextual objects. Comprehensive experiments demonstrate that our LAW-Diffusion yields the state-of-the-art generative performance, especially with coherent object relations.
In Open Set Domain Adaptation (OSDA), large amounts of target samples are drawn from the implicit categories that never appear in the source domain. Due to the lack of their specific belonging, existing methods indiscriminately regard them as a single class unknown. We challenge this broadly-adopted practice that may arouse unexpected detrimental effects because the decision boundaries between the implicit categories have been fully ignored. Instead, we propose Self-supervised Class-Discovering Adapter (SCDA) that attempts to achieve OSDA by gradually discovering those implicit classes, then incorporating them to restructure the classifier and update the domain-adaptive features iteratively. SCDA performs two alternate steps to achieve implicit class discovery and self-supervised OSDA, respectively. By jointly optimizing for two tasks, SCDA achieves the state-of-the-art in OSDA and shows a competitive performance to unearth the implicit target classes.
Human doctors with well-structured medical knowledge can diagnose a disease merely via a few conversations with patients about symptoms. In contrast, existing knowledge-grounded dialogue systems often require a large number of dialogue instances to learn as they fail to capture the correlations between different diseases and neglect the diagnostic experience shared among them. To address this issue, we propose a more natural and practical paradigm, i.e., low-resource medical dialogue generation, which can transfer the diagnostic experience from source diseases to target ones with a handful of data for adaptation. It is capitalized on a commonsense knowledge graph to characterize the prior disease-symptom relations. Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the needs of a large number of dialogues. More importantly, by dynamically evolving disease-symptom graphs, GEML also well addresses the real-world challenges that the disease-symptom correlations of each disease may vary or evolve along with more diagnostic cases. Extensive experiment results on the CMDD dataset and our newly-collected Chunyu dataset testify the superiority of our approach over state-of-the-art approaches. Besides, our GEML can generate an enriched dialogue-sensitive knowledge graph in an online manner, which could benefit other tasks grounded on knowledge graph.
Medical automatic diagnosis (MAD) aims to learn an agent that mimics the behavior of a human doctor, i.e. inquiring symptoms and informing diseases. Due to medical ethics concerns, it is impractical to directly apply reinforcement learning techniques to solving MAD, e.g., training a reinforced agent with the human patient. Developing a patient simulator by using the collected patient-doctor dialogue records has been proposed as a promising approach to MAD. However, most of these existing works overlook the causal relationship between patient symptoms and disease diagnoses. For example, these simulators simply generate the ``not-sure'' response to the inquiry (i.e., symptom) that was not observed in one dialogue record. As a result, the MAD agent is usually trained without exploiting the counterfactual reasoning beyond the factual observations. To address this problem, this paper presents a propensity-based patient simulator (PBPS), which is capable of facilitating the training of MAD agents by generating informative counterfactual answers along with the disease diagnosis. Specifically, our PBPS estimates the propensity score of each record with the patient-doctor dialogue reasoning, and can thus generate the counterfactual answers by searching across records. That is, the unrecorded symptom for one patient can be found in the records of other patients according to the propensity score matching. A progressive assurance agent (P2A) can be thus trained with PBPS, which includes two separate yet cooperative branches accounting for the execution of symptom-inquiry and disease-diagnosis actions, respectively. The disease-diagnosis predicts the confidence of disease and drives the symptom-inquiry in terms of enhancing the confidence, and the two branches are jointly optimized with benefiting from each other.
Resembling the rapid learning capability of human, low-shot learning empowers vision systems to understand new concepts by training with few samples. Leading approaches derived from meta-learning on images with a single visual object. Obfuscated by a complex background and multiple objects in one image, they are hard to promote the research of low-shot object detection/segmentation. In this work, we present a flexible and general methodology to achieve these tasks. Our work extends Faster /Mask R-CNN by proposing meta-learning over RoI (Region-of-Interest) features instead of a full image feature. This simple spirit disentangles multi-object information merged with the background, without bells and whistles, enabling Faster /Mask R-CNN turn into a meta-learner to achieve the tasks. Specifically, we introduce a Predictor-head Remodeling Network (PRN) that shares its main backbone with Faster /Mask R-CNN. PRN receives images containing low-shot objects with their bounding boxes or masks to infer their class attentive vectors. The vectors take channel-wise soft-attention on RoI features, remodeling those R-CNN predictor heads to detect or segment the objects that are consistent with the classes these vectors represent. In our experiments, Meta R-CNN yields the state of the art in low-shot object detection and improves low-shot object segmentation by Mask R-CNN.
It is feasible and practically-valuable to bridge the characteristics between graph neural networks (GNNs) and logical reasoning. Despite considerable efforts and successes witnessed to solve Boolean satisfiability (SAT), it remains a mystery of GNN-based solvers for more complex predicate logic formulae. In this work, we conjectures with some evidences, that generally-defined GNNs present several limitations to certify the unsatisfiability (UNSAT) in Boolean formulae. It implies that GNNs may probably fail in learning the logical reasoning tasks if they contain proving UNSAT as the sub-problem included by most predicate logic formulae.
A broad range of cross-$m$-domain generation researches boil down to matching a joint distribution by deep generative models (DGMs). Hitherto algorithms excel in pairwise domains while as $m$ increases, remain struggling to scale themselves to fit a joint distribution. In this paper, we propose a domain-scalable DGM, i.e., MMI-ALI for $m$-domain joint distribution matching. As an $m$-domain ensemble model of ALIs \cite{dumoulin2016adversarially}, MMI-ALI is adversarially trained with maximizing Multivariate Mutual Information (MMI) w.r.t. joint variables of each pair of domains and their shared feature. The negative MMIs are upper bounded by a series of feasible losses that provably lead to matching $m$-domain joint distributions. MMI-ALI linearly scales as $m$ increases and thus, strikes a right balance between efficacy and scalability. We evaluate MMI-ALI in diverse challenging $m$-domain scenarios and verify its superiority.
(Unsupervised) Domain Adaptation (DA) seeks for classifying target instances when solely provided with source labeled and target unlabeled examples for training. Learning domain-invariant features helps to achieve this goal, whereas it underpins unlabeled samples drawn from a single or multiple explicit target domains (Multi-target DA). In this paper, we consider a more realistic transfer scenario: our target domain is comprised of multiple sub-targets implicitly blended with each other, so that learners could not identify which sub-target each unlabeled sample belongs to. This Blending-target Domain Adaptation (BTDA) scenario commonly appears in practice and threatens the validities of most existing DA algorithms, due to the presence of domain gaps and categorical misalignments among these hidden sub-targets. To reap the transfer performance gains in this new scenario, we propose Adversarial Meta-Adaptation Network (AMEAN). AMEAN entails two adversarial transfer learning processes. The first is a conventional adversarial transfer to bridge our source and mixed target domains. To circumvent the intra-target category misalignment, the second process presents as ``learning to adapt'': It deploys an unsupervised meta-learner receiving target data and their ongoing feature-learning feedbacks, to discover target clusters as our ``meta-sub-target'' domains. These meta-sub-targets auto-design our meta-sub-target DA loss, which empirically eliminates the implicit category mismatching in our mixed target. We evaluate AMEAN and a variety of DA algorithms in three benchmarks under the BTDA setup. Empirical results show that BTDA is a quite challenging transfer setup for most existing DA algorithms, yet AMEAN significantly outperforms these state-of-the-art baselines and effectively restrains the negative transfer effects in BTDA.
In this paper, we propose Hard Person Identity Mining (HPIM) that attempts to refine the hard example mining to improve the exploration efficacy in person re-identification. It is motivated by following observation: the more attributes some people share, the more difficult to separate their identities. Based on this observation, we develop HPIM via a transferred attribute describer, a deep multi-attribute classifier trained from the source noisy person attribute datasets. We encode each image into the attribute probabilistic description in the target person re-ID dataset. Afterwards in the attribute code space, we consider each person as a distribution to generate his view-specific attribute codes in different practical scenarios. Hence we estimate the person-specific statistical moments from zeroth to higher order, which are further used to calculate the central moment discrepancies between persons. Such discrepancy is a ground to choose hard identity to organize proper mini-batches, without concerning the person representation changing in metric learning. It presents as a complementary tool of hard example mining, which helps to explore the global instead of the local hard example constraint in the mini-batch built by randomly sampled identities. Extensive experiments on two person re-identification benchmarks validated the effectiveness of our proposed algorithm.
In this paper, we investigate the feasibility of learning GNN (Graph Neural Network) based solvers and GNN-based heuristics for specified QBF (Quantified Boolean Formula) problems. We design and evaluate several GNN architectures for 2QBF formulae, and conjecture that GNN has limitations in learning 2QBF solvers. Then we show how to learn a heuristic CEGAR 2QBF solver. We further explore generalizing GNN-based heuristics to larger unseen instances, and uncover some interesting challenges. In summary, this paper provides a comprehensive surveying view of applying GNN-embeddings to specified QBF solvers, and aims to offer guidance in applying ML to more complicated symbolic reasoning problems.