Kenji Kawaguchi

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Aug 17, 2023
Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. We therefore propose two new perspectives within the faithfulness paradigm that capture intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives rest on a firm mathematical foundation and yield quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
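
To make the soundness/completeness framing concrete, the following is a minimal perturbation-style sketch in PyTorch: keeping only the top-k attributed features probes whether they are predictive (a soundness-style proxy), while removing them probes whether anything predictive was missed (a completeness-style proxy). The helper names, top-k masking, and baseline value are illustrative assumptions, not the paper's metrics or algorithms.

```python
# Illustrative perturbation-style evaluation of an attribution map (hypothetical
# helper names; the paper defines its own soundness/completeness metrics).
import torch

def topk_mask(attr: torch.Tensor, k: int) -> torch.Tensor:
    """Boolean mask selecting the k most strongly attributed features."""
    flat = attr.abs().flatten()
    mask = torch.zeros_like(flat, dtype=torch.bool)
    mask[flat.topk(k).indices] = True
    return mask.view_as(attr)

@torch.no_grad()
def perturbation_scores(model, x, attr, target, k, baseline=0.0):
    mask = topk_mask(attr, k)
    keep_topk = torch.where(mask, x, torch.full_like(x, baseline))  # keep only attributed features
    drop_topk = torch.where(mask, torch.full_like(x, baseline), x)  # remove attributed features
    p_full = model(x.unsqueeze(0)).softmax(-1)[0, target]
    p_keep = model(keep_topk.unsqueeze(0)).softmax(-1)[0, target]
    p_drop = model(drop_topk.unsqueeze(0)).softmax(-1)[0, target]
    soundness_proxy = (p_keep / p_full).item()            # high if attributed features are predictive
    completeness_proxy = 1.0 - (p_drop / p_full).item()   # high if all predictive features were found
    return soundness_proxy, completeness_proxy
```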

* 16 pages, 14 figures 

Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

Aug 09, 2023
Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi

The curse of dimensionality (CoD) taxes computational resources heavily, with cost growing exponentially as the dimension increases. This poses great challenges in solving high-dimensional partial differential equations (PDEs), as Richard Bellman first pointed out over 60 years ago. While there has been some recent success in numerically solving PDEs in high dimensions, such computations are prohibitively expensive, and true scaling of general nonlinear PDEs to high dimensions has never been achieved. In this paper, we develop a new method for scaling up physics-informed neural networks (PINNs) to solve arbitrary high-dimensional PDEs. The new method, called Stochastic Dimension Gradient Descent (SDGD), decomposes the gradient of the PDE into pieces corresponding to different dimensions and randomly samples a subset of these dimensional pieces in each iteration of training PINNs. We theoretically prove the convergence guarantee and other desired properties of the proposed method. We experimentally demonstrate that it allows us to solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman (HJB) and the Schrödinger equations, in thousands of dimensions very quickly on a single GPU using the mesh-free PINN approach. For instance, we solve nontrivial nonlinear PDEs (one HJB equation and one Black-Scholes equation) in 100,000 dimensions in 6 hours on a single GPU using SDGD with PINNs. Since SDGD is a general training methodology for PINNs, it can be applied to any current or future variant of PINNs to scale them up for arbitrary high-dimensional PDEs.
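
As a rough illustration of the dimension-sampling idea (not the paper's implementation), the sketch below trains a PINN on a toy Poisson-type equation, estimating the Laplacian from a random subset of dimensions each step and rescaling to keep the estimate unbiased. The network size, sampling rate, and toy PDE are assumptions, and the boundary-condition loss is omitted for brevity.

```python
# Rough SDGD-flavoured sketch: a PINN for a toy Poisson-type equation in 1,000 dimensions,
# with the Laplacian estimated from a random subset of dimensions each step.
import torch
import torch.nn as nn

d, batch, sample_dims = 1000, 64, 16
net = nn.Sequential(nn.Linear(d, 128), nn.Tanh(), nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.rand(batch, d, requires_grad=True)          # interior collocation points
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    dims = torch.randperm(d)[:sample_dims]                # sampled dimensional pieces
    lap_est = 0.0
    for i in dims:                                        # second derivative along each sampled dim
        u_ii = torch.autograd.grad(grad_u[:, i].sum(), x, create_graph=True)[0][:, i]
        lap_est = lap_est + u_ii
    lap_est = lap_est * (d / sample_dims)                 # rescale for an unbiased Laplacian estimate
    loss = ((lap_est - 1.0) ** 2).mean()                  # residual of the toy PDE: Laplacian(u) = 1
    opt.zero_grad(); loss.backward(); opt.step()
```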

* 37 pages, 8 figures 

IF2Net: Innately Forgetting-Free Networks for Continual Learning

Jun 18, 2023
Depeng Li, Tianqi Wang, Bingrong Xu, Kenji Kawaguchi, Zhigang Zeng, Ponnuthurai Nagaratnam Suganthan

Continual learning aims to absorb new concepts incrementally without interfering with previously learned knowledge. Motivated by the characteristics of neural networks, in which information is stored in the weights on connections, we investigate how to design an Innately Forgetting-Free Network (IF2Net) for the continual learning setting. This study proposes a straightforward yet effective learning paradigm that keeps the weights associated with each seen task untouched before and after learning a new task. We first present a novel representation-level learning scheme for task sequences with random weights: the drifted representations caused by randomization are tweaked back to their separate task-optimal working states, while the involved weights remain frozen and reused (in contrast to the well-known layer-wise updates of weights). Then, sequential decision-making without forgetting is achieved by projecting the output weight updates into a parsimonious orthogonal space, so that adaptations do not disturb old knowledge while maintaining model plasticity. By integrating the respective strengths of randomization and orthogonalization, IF2Net allows a single network to inherently learn unlimited mapping rules without being told task identities at test time. We validate the effectiveness of our approach through extensive theoretical analysis and empirical study.
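
The orthogonal-projection step can be pictured with the recursive-least-squares-style projector commonly used by orthogonal-projection continual learners; the sketch below is a generic version under that assumption, not necessarily IF2Net's exact update.

```python
# Generic orthogonal-projection update in the spirit of "projecting the output weight
# updates into a parsimonious orthogonal space". The recursive rule below is the standard
# RLS-style projector used by this family of methods, not necessarily IF2Net's exact one.
import torch

class OrthogonalProjector:
    def __init__(self, dim: int, alpha: float = 1e-3):
        # P starts as a scaled identity: nothing learned yet, so no direction is protected.
        self.P = torch.eye(dim) / alpha

    def update(self, a: torch.Tensor) -> None:
        """Fold one (dim,) representation vector from a finished task into the projector."""
        a = a.view(-1, 1)
        Pa = self.P @ a
        self.P = self.P - (Pa @ Pa.T) / (1.0 + a.T @ Pa)

    def project(self, grad: torch.Tensor) -> torch.Tensor:
        """Project a (out_dim, dim) weight gradient so it barely disturbs protected directions."""
        return grad @ self.P
```

In use, the projector would be updated with the frozen, randomized representations of each finished task, and the output-layer gradient would be replaced with `projector.project(grad)` before each optimizer step on a new task.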

* 16 pages, 8 figures. Under review 

Multi-View Class Incremental Learning

Jun 16, 2023
Depeng Li, Tianqi Wang, Junwei Chen, Kenji Kawaguchi, Cheng Lian, Zhigang Zeng

Multi-view learning (MVL) has achieved great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. To make MVL methods more practical in an open-ended environment, this paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views, requiring no access to earlier views of data. However, MVCIL is challenged by catastrophic forgetting of old information and interference with learning new concepts. To address this, we first develop a randomization-based representation learning technique for feature extraction, during which multiple views belonging to a class are presented sequentially, to guarantee their separate view-optimal working states; then we integrate the views one by one in an orthogonality fusion subspace spanned by the extracted features; finally, we introduce selective weight consolidation for learning-without-forgetting decision-making when encountering new classes. Extensive experiments on synthetic and real-world datasets validate the effectiveness of our approach.
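
"Selective weight consolidation" is described only at a high level here; one plausible reading, sketched below, is an EWC-style quadratic penalty restricted to a selected subset of decision weights. The masks, importance weights, and coefficient are hypothetical, not the paper's definition.

```python
# One plausible reading of "selective weight consolidation" (hypothetical): an EWC-style
# quadratic penalty applied only to weights flagged as important for earlier classes/views,
# added to the task loss when learning new classes.
import torch

def consolidation_penalty(params, old_params, importance, selected, lam=100.0):
    """Penalize drift of selected weights away from their values after earlier learning."""
    penalty = 0.0
    for p, p_old, omega, mask in zip(params, old_params, importance, selected):
        penalty = penalty + (mask * omega * (p - p_old) ** 2).sum()
    return lam * penalty

# total_loss = classification_loss + consolidation_penalty(
#     model.parameters(), saved_params, importance_weights, selection_masks)
```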

* 34 pages, 4 figures. Under review 

Fast Diffusion Model

Jun 12, 2023
Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang

Despite their success in real data synthesis, diffusion models (DMs) often suffer from slow and costly training and sampling, limiting their broader applications. To mitigate this, we propose the Fast Diffusion Model (FDM), which improves the diffusion process of DMs from a stochastic optimization perspective to speed up both training and sampling. Specifically, we first observe that the diffusion process of DMs accords with the stochastic optimization process of stochastic gradient descent (SGD) on a stochastic time-variant problem. Since momentum SGD uses both the current gradient and an extra momentum term to achieve more stable and faster convergence, we are inspired to introduce momentum into the diffusion process to accelerate both training and sampling. However, this comes with the challenge of deriving the noise perturbation kernel from the momentum-based diffusion process. To this end, we frame the momentum-based process as a damped oscillation system whose critically damped state -- the kernel solution -- avoids oscillation and thus yields faster convergence of the diffusion process. Empirical results show that FDM can be applied to several popular DM frameworks, e.g., VP, VE, and EDM, and reduces their training cost by about 50% with comparable image synthesis performance on the CIFAR-10, FFHQ, and AFHQv2 datasets. Moreover, FDM decreases their sampling steps by about 3x while achieving similar performance under the same deterministic samplers. The code is available at https://github.com/sail-sg/FDM.
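
The SGD-versus-momentum analogy can be illustrated with the toy update rules below; this only contrasts a plain noisy gradient step with a heavy-ball step and is not FDM's actual forward process, whose perturbation kernel is derived from the critically damped oscillator in the paper.

```python
# Toy contrast between a plain noisy gradient step and a heavy-ball (momentum) step,
# illustrating the analogy the abstract draws; not FDM's forward diffusion process.
import torch

def plain_noisy_step(x, grad_fn, lr=0.01, sigma=0.1):
    return x - lr * grad_fn(x) + sigma * torch.randn_like(x)

def momentum_noisy_step(x, v, grad_fn, lr=0.01, beta=0.9, sigma=0.1):
    v = beta * v - lr * grad_fn(x) + sigma * torch.randn_like(x)
    return x + v, v
```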

How Does Information Bottleneck Help Deep Learning?

May 30, 2023
Kenji Kawaguchi, Zhun Deng, Xu Ji, Jiaoyang Huang

Numerous deep learning algorithms have been inspired by and understood via the notion of the information bottleneck, where unnecessary information is (often implicitly) minimized while task-relevant information is maximized. However, a rigorous argument for why it is desirable to control information bottlenecks has been elusive. In this paper, we provide the first rigorous learning theory justifying the benefit of the information bottleneck in deep learning by mathematically relating the information bottleneck to generalization errors. Our theory proves that controlling the information bottleneck is one way to control generalization errors in deep learning, although it is not the only or necessary way. We investigate the merit of our new mathematical findings with experiments across a range of architectures and learning settings. In many cases, generalization errors are shown to correlate with the degree of information bottleneck, i.e., the amount of unnecessary information at hidden layers. This paper provides a theoretical foundation for current and future methods through the lens of the information bottleneck. Our new generalization bounds scale with the degree of information bottleneck, unlike previous bounds that scale with the number of parameters, VC dimension, Rademacher complexity, stability, or robustness. Our code is publicly available at: https://github.com/xu-ji/information-bottleneck
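
For intuition about the quantity the bounds depend on, the sketch below estimates mutual information between 1-D summaries of inputs and a hidden representation with a crude binning estimator; practical experiments use more careful estimators, and the variable choices here are illustrative only.

```python
# Crude binned estimate of I(X; Z) between 1-D summaries of inputs and hidden activations,
# just to illustrate the "unnecessary information at hidden layers" the bounds depend on.
import numpy as np

def binned_mutual_information(x: np.ndarray, z: np.ndarray, bins: int = 30) -> float:
    joint, _, _ = np.histogram2d(x, z, bins=bins)
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # p(x), summing out z
    pz = joint.sum(axis=0, keepdims=True)   # p(z), summing out x
    nonzero = joint > 0
    return float((joint[nonzero] * np.log(joint[nonzero] / (px @ pz)[nonzero])).sum())
```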

* Accepted at ICML 2023. Code is available at https://github.com/xu-ji/information-bottleneck 

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

May 28, 2023
Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deploying LLMs in real-world applications can be challenging because of their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small language models (LMs) by fine-tuning them with labeled data or by distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs to memorize the required knowledge. Motivated by our theoretical analysis of memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales with augmented knowledge retrieved from an external knowledge base. We further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and Flan-T5 models on challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE and StrategyQA. Notably, our method enables 250M-parameter models to outperform fine-tuned 3B-parameter models, which have 12 times more parameters, on both the MedQA-USMLE and StrategyQA benchmarks.
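
A rough sketch of a KARD-style inference flow with Hugging Face Transformers is shown below: retrieved passages condition a small LM's rationale, which then conditions its answer. The `retrieve` callable stands in for the external knowledge base and neural reranker, and the prompt templates and model checkpoint are illustrative assumptions, not the paper's.

```python
# Sketch of a KARD-style inference flow: retrieved passages condition a small LM's
# rationale, and the rationale conditions its final answer. `retrieve` is a hypothetical
# stand-in for the knowledge base + neural reranker.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
lm = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    inputs = tok(prompt, return_tensors="pt", truncation=True)
    out = lm.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

def answer(question: str, retrieve) -> str:
    docs = "\n".join(retrieve(question, k=3))                               # top reranked passages
    rationale = generate(f"Knowledge:\n{docs}\n\nQuestion: {question}\nRationale:")
    return generate(f"Question: {question}\nRationale: {rationale}\nAnswer:")
```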

* Preprint. Under review 

Automatic Model Selection with Large Language Models for Reasoning

May 23, 2023
Xu Zhao, Yuxi Xie, Kenji Kawaguchi, Junxian He, Qizhe Xie

Chain-of-Thought and Program-Aided Language Models represent two distinct reasoning methods, each with its own strengths and weaknesses. We demonstrate that it is possible to combine the best of both worlds by using different models for different problems, employing a large language model (LLM) to perform model selection. Through a theoretical analysis, we discover that the performance improvement is determined by the differences between the combined methods and the success rate of choosing the correct model. On eight reasoning datasets, our proposed approach shows significant improvements. Furthermore, we achieve new state-of-the-art results on GSM8K and SVAMP with accuracies of 96.5% and 93.7%, respectively. Our code is publicly available at https://github.com/XuZhao0/Model-Selection-Reasoning.
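
In spirit, the selection step can be as simple as prompting an LLM to choose between the two candidate solutions; the sketch below uses a hypothetical `call_llm` completion function and an illustrative prompt, not the paper's.

```python
# Minimal sketch of LLM-based selection between a Chain-of-Thought solution and a
# Program-Aided (PAL) solution. `call_llm` is a hypothetical completion function.
def select_answer(question: str, cot_solution: str, pal_solution: str, call_llm) -> str:
    prompt = (
        f"Question: {question}\n\n"
        f"Solution (A), natural-language reasoning:\n{cot_solution}\n\n"
        f"Solution (B), program-aided reasoning:\n{pal_solution}\n\n"
        "Which solution is correct? Reply with (A) or (B)."
    )
    choice = call_llm(prompt)
    return cot_solution if "(A)" in choice else pal_solution
```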
