Diffusion models are a powerful class of generative models that can produce high-quality images, but they may suffer from data bias. Data bias occurs when the training data does not reflect the true distribution of the data domain, but rather exhibits some skewed or imbalanced patterns. For example, the CelebA dataset contains more female images than male images, which can lead to biased generation results and affect downstream applications. In this paper, we propose a novel method to mitigate data bias in diffusion models by applying manifold guidance. Our key idea is to estimate the manifold of the training data using a learnable information-theoretic approach, and then use it to guide the sampling process of diffusion models. In this way, we can encourage the generated images to be uniformly distributed on the data manifold, without changing the model architecture or requiring labels or retraining. We provide theoretical analysis and empirical evidence to show that our method can improve the quality and unbiasedness of image generation compared to standard diffusion models.
Recently, aspect sentiment quad prediction has received widespread attention in the field of aspect-based sentiment analysis. Existing studies extract quadruplets via pre-trained generative language models to paraphrase the original sentence into a templated target sequence. However, previous works only focus on what to generate but ignore what not to generate. We argue that considering the negative samples also leads to potential benefits. In this work, we propose a template-agnostic method to control the token-level generation, which boosts original learning and reduces mistakes simultaneously. Specifically, we introduce Monte Carlo dropout to understand the built-in uncertainty of pre-trained language models, acquiring the noises and errors. We further propose marginalized unlikelihood learning to suppress the uncertainty-aware mistake tokens. Finally, we introduce minimization entropy to balance the effects of marginalized unlikelihood learning. Extensive experiments on four public datasets demonstrate the effectiveness of our approach on various generation templates.
Generative Adversarial Networks (GANs) and their variants have achieved remarkable success on natural images. It aims to approximate the distribution of the training datasets. However, their performance degrades when applied to remote sensing (RS) images, and the discriminator often suffers from the overfitting problem. In this paper, we examine the differences between natural and RS images and find that the intrinsic dimensions of RS images are much lower than those of natural images. Besides, the low-dimensional data manifold of RS images may exacerbate the uneven sampling of training datasets and introduce biased information. The discriminator can easily overfit to the biased training distribution, leading to a faulty generation model, even the mode collapse problem. While existing GANs focus on the general distribution of RS datasets, they often neglect the underlying data manifold. In respond, we propose a learnable information-theoretic measure that preserves the intrinsic structures of the original data, and establish a unified GAN framework for manifold alignment in supervised and unsupervised RS image generation.
Optimizing and rendering Neural Radiance Fields is computationally expensive due to the vast number of samples required by volume rendering. Recent works have included alternative sampling approaches to help accelerate their methods, however, they are often not the focus of the work. In this paper, we investigate and compare multiple sampling approaches and demonstrate that improved sampling is generally applicable across NeRF variants under an unified concept of transmittance estimator. To facilitate future experiments, we develop NerfAcc, a Python toolbox that provides flexible APIs for incorporating advanced sampling methods into NeRF related methods. We demonstrate its flexibility by showing that it can reduce the training time of several recent NeRF methods by 1.5x to 20x with minimal modifications to the existing codebase. Additionally, highly customized NeRFs, such as Instant-NGP, can be implemented in native PyTorch using NerfAcc.
Benefiting from the injection of human prior knowledge, graphs, as derived discrete data, are semantically dense so that models can efficiently learn the semantic information from such data. Accordingly, graph neural networks (GNNs) indeed achieve impressive success in various fields. Revisiting the GNN learning paradigms, we discover that the relationship between human expertise and the knowledge modeled by GNNs still confuses researchers. To this end, we introduce motivating experiments and derive an empirical observation that the human expertise is gradually learned by the GNNs in general domains. By further observing the ramifications of introducing expertise logic into graph representation learning, we conclude that leading the GNNs to learn human expertise can improve the model performance. By exploring the intrinsic mechanism behind such observations, we elaborate the Structural Causal Model for the graph representation learning paradigm. Following the theoretical guidance, we innovatively introduce the auxiliary causal logic learning paradigm to improve the model to learn the expertise logic causally related to the graph representation learning task. In practice, the counterfactual technique is further performed to tackle the insufficient training issue during optimization. Plentiful experiments on the crafted and real-world domains support the consistent effectiveness of the proposed method.
We study the recent progress on dynamic view synthesis (DVS) from monocular video. Though existing approaches have demonstrated impressive results, we show a discrepancy between the practical capture process and the existing experimental protocols, which effectively leaks in multi-view signals during training. We define effective multi-view factors (EMFs) to quantify the amount of multi-view signal present in the input capture sequence based on the relative camera-scene motion. We introduce two new metrics: co-visibility masked image metrics and correspondence accuracy, which overcome the issue in existing protocols. We also propose a new iPhone dataset that includes more diverse real-life deformation sequences. Using our proposed experimental protocol, we show that the state-of-the-art approaches observe a 1-2 dB drop in masked PSNR in the absence of multi-view cues and 4-5 dB drop when modeling complex motion. Code and data can be found at https://hangg7.com/dycheck.
Recently, aspect sentiment quad prediction (ASQP) has become a popular task in the field of aspect-level sentiment analysis. Previous work utilizes a predefined template to paraphrase the original sentence into a structure target sequence, which can be easily decoded as quadruplets of the form (aspect category, aspect term, opinion term, sentiment polarity). The template involves the four elements in a fixed order. However, we observe that this solution contradicts with the order-free property of the ASQP task, since there is no need to fix the template order as long as the quadruplet is extracted correctly. Inspired by the observation, we study the effects of template orders and find that some orders help the generative model achieve better performance. It is hypothesized that different orders provide various views of the quadruplet. Therefore, we propose a simple but effective method to identify the most proper orders, and further combine multiple proper templates as data augmentation to improve the ASQP task. Specifically, we use the pre-trained language model to select the orders with minimal entropy. By fine-tuning the pre-trained language model with these template orders, our approach improves the performance of quad prediction, and outperforms state-of-the-art methods significantly in low-resource settings.
Nowadays, transformer-based models gradually become the default choice for artificial intelligence pioneers. The models also show superiority even in the few-shot scenarios. In this paper, we revisit the classical methods and propose a new few-shot alternative. Specifically, we investigate the few-shot one-class problem, which actually takes a known sample as a reference to detect whether an unknown instance belongs to the same class. This problem can be studied from the perspective of sequence match. It is shown that with meta-learning, the classical sequence match method, i.e. Compare-Aggregate, significantly outperforms transformer ones. The classical approach requires much less training cost. Furthermore, we perform an empirical comparison between two kinds of sequence match approaches under simple fine-tuning and meta-learning. Meta-learning causes the transformer models' features to have high-correlation dimensions. The reason is closely related to the number of layers and heads of transformer models. Experimental codes and data are available at https://github.com/hmt2014/FewOne
The prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover an ever-overlooked phenomenon: the pre-trained graph representation learning model tested with full graphs underperforms the model tested with well-pruned graphs. This observation reveals that there exist confounders in graphs, which may interfere with the model learning semantic information, and current graph representation learning methods have not eliminated their influence. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn robust graph representations against confounding effects. RCGRL introduces an active approach to generate instrumental variables under unconditional moment restrictions, which empowers the graph representation learning model to eliminate confounders, thereby capturing discriminative information that is causally related to downstream predictions. We offer theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. The results demonstrate that compared with state-of-the-art methods, RCGRL achieves better prediction performance and generalization ability.
Near-term quantum systems tend to be noisy. Crosstalk noise has been recognized as one of several major types of noises in superconducting Noisy Intermediate-Scale Quantum (NISQ) devices. Crosstalk arises from the concurrent execution of two-qubit gates on nearby qubits, such as \texttt{CX}. It might significantly raise the error rate of gates in comparison to running them individually. Crosstalk can be mitigated through scheduling or hardware machine tuning. Prior scientific studies, however, manage crosstalk at a really late phase in the compilation process, usually after hardware mapping is done. It may miss great opportunities of optimizing algorithm logic, routing, and crosstalk at the same time. In this paper, we push the envelope by considering all these factors simultaneously at the very early compilation stage. We propose a crosstalk-aware quantum program compilation framework called CQC that can enhance crosstalk mitigation while achieving satisfactory circuit depth. Moreover, we identify opportunities for translation from intermediate representation to the circuit for application-specific crosstalk mitigation, for instance, the \texttt{CX} ladder construction in variational quantum eigensolvers (VQE). Evaluations through simulation and on real IBM-Q devices show that our framework can significantly reduce the error rate by up to 6$\times$, with only $\sim$60\% circuit depth compared to state-of-the-art gate scheduling approaches. In particular, for VQE, we demonstrate 49\% circuit depth reduction with 9.6\% fidelity improvement over prior art on the H4 molecule using IBMQ Guadalupe. Our CQC framework will be released on GitHub.