Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kenji Kawaguchi

ChOiRe: Characterizing and Predicting Human Opinions with Chain of Opinion Reasoning

Nov 15, 2023
Xuan Long Do, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen

Aligning language models (LMs) with human opinion is challenging yet vital to enhance their grasp of human values, preferences, and beliefs. We present ChOiRe, a four-step solution framework to predict human opinion that differentiates between the user explicit personae (i.e. demographic or ideological attributes) that are manually declared and implicit personae inferred from user historical opinions. Specifically, it consists of (i) an LM analyzing the user explicit personae to filter out irrelevant attributes; (ii) the LM ranking the implicit persona opinions into a preferential list; (iii) Chain-of-Opinion (CoO) reasoning, where the LM sequentially analyzes the explicit personae and the most relevant implicit personae to perform opinion prediction; (iv) and where ChOiRe executes Step (iii) CoO multiple times with increasingly larger lists of implicit personae to overcome insufficient personae information to infer a final result. ChOiRe achieves new state-of-the-art effectiveness with limited inference calls, improving previous LLM-based techniques significantly by 3.22%.

* 17 pages

Via

Access Paper or Ask Questions

Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

Oct 23, 2023
Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. However, the previous MGM studies focus extensively on graph masking and encoder, while there is limited understanding of tokenizer and decoder. To bridge the gap, we first summarize popular molecule tokenizers at the granularity of node, edge, motif, and Graph Neural Networks (GNNs), and then examine their roles as the MGM's reconstruction targets. Further, we explore the potential of adopting an expressive decoder in MGM. Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning. Finally, we propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy. We empirically validate that our method outperforms the existing molecule self-supervised learning methods. Our codes and checkpoints are available at https://github.com/syr-cn/SimSGT.

* NeurIPS 2023. 10 pages

Via

Access Paper or Ask Questions

PICProp: Physics-Informed Confidence Propagation for Uncertainty Quantification

Oct 20, 2023
Qianli Shen, Wai Hoh Tang, Zhun Deng, Apostolos Psaros, Kenji Kawaguchi

Standard approaches for uncertainty quantification in deep learning and physics-informed learning have persistent limitations. Indicatively, strong assumptions regarding the data likelihood are required, the performance highly depends on the selection of priors, and the posterior can be sampled only approximately, which leads to poor approximations because of the associated computational cost. This paper introduces and studies confidence interval (CI) estimation for deterministic partial differential equations as a novel problem. That is, to propagate confidence, in the form of CIs, from data locations to the entire domain with probabilistic guarantees. We propose a method, termed Physics-Informed Confidence Propagation (PICProp), based on bi-level optimization to compute a valid CI without making heavy assumptions. We provide a theorem regarding the validity of our method, and computational experiments, where the focus is on physics-informed learning.

* Accepted at NeurIPS 2023. Code is available at https://github.com/ShenQianli/PICProp

Via

Access Paper or Ask Questions

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

Oct 19, 2023
Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception - a critical ability of human professionals in comprehending molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. MolCA enables an LM (e.g., Galactica) to understand both text- and graph-based molecular contents via the cross-modal projector. Specifically, the cross-modal projector is implemented as a Q-Former to connect a graph encoder's representation space and an LM's text space. Further, MolCA employs a uni-modal adapter (i.e., LoRA) for the LM's efficient adaptation to downstream tasks. Unlike previous studies that couple an LM with a graph encoder via cross-modal contrastive learning, MolCA retains the LM's ability of open-ended text generation and augments it with 2D graph information. To showcase its effectiveness, we extensively benchmark MolCA on tasks of molecule captioning, IUPAC name prediction, and molecule-text retrieval, on which MolCA significantly outperforms the baselines. Our codes and checkpoints can be found at https://github.com/acharkq/MolCA.

* EMNLP main conference. 9 pages

Via

Access Paper or Ask Questions

Self-Supervised Dataset Distillation for Transfer Learning

Oct 16, 2023
Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

Figure 1 for Self-Supervised Dataset Distillation for Transfer Learning

Figure 2 for Self-Supervised Dataset Distillation for Transfer Learning

Figure 3 for Self-Supervised Dataset Distillation for Transfer Learning

Figure 4 for Self-Supervised Dataset Distillation for Transfer Learning

Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is \textit{biased} due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the \textit{self-supervised target model}. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.

Via

Access Paper or Ask Questions

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Oct 10, 2023
Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi

Figure 1 for AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Figure 2 for AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Figure 3 for AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Figure 4 for AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.

* 32 pages including Appendix

Via

Access Paper or Ask Questions

Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Oct 10, 2023
Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

Figure 1 for Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Figure 2 for Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Figure 3 for Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Figure 4 for Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Via

Access Paper or Ask Questions

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Aug 17, 2023
Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

Figure 1 for A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Figure 2 for A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Figure 3 for A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Figure 4 for A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.

* 16 pages, 14 figures

Via

Access Paper or Ask Questions

Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

Aug 09, 2023
Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi

Figure 1 for Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

Figure 2 for Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

Figure 3 for Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

Figure 4 for Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

The curse-of-dimensionality (CoD) taxes computational resources heavily with exponentially increasing computational cost as the dimension increases. This poses great challenges in solving high-dimensional PDEs as Richard Bellman first pointed out over 60 years ago. While there has been some recent success in solving numerically partial differential equations (PDEs) in high dimensions, such computations are prohibitively expensive, and true scaling of general nonlinear PDEs to high dimensions has never been achieved. In this paper, we develop a new method of scaling up physics-informed neural networks (PINNs) to solve arbitrary high-dimensional PDEs. The new method, called Stochastic Dimension Gradient Descent (SDGD), decomposes a gradient of PDEs into pieces corresponding to different dimensions and samples randomly a subset of these dimensional pieces in each iteration of training PINNs. We theoretically prove the convergence guarantee and other desired properties of the proposed method. We experimentally demonstrate that the proposed method allows us to solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman (HJB) and the Schr\"{o}dinger equations in thousands of dimensions very fast on a single GPU using the PINNs mesh-free approach. For instance, we solve nontrivial nonlinear PDEs (one HJB equation and one Black-Scholes equation) in 100,000 dimensions in 6 hours on a single GPU using SDGD with PINNs. Since SDGD is a general training methodology of PINNs, SDGD can be applied to any current and future variants of PINNs to scale them up for arbitrary high-dimensional PDEs.

* 37 pages, 8 figures

Via

Access Paper or Ask Questions