Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dinh Phung

Optimal Transport for Structure Learning Under Missing Data

Feb 23, 2024

Vy Vo, He Zhao, Trung Le, Edwin V. Bonilla, Dinh Phung

Figure 1 for Optimal Transport for Structure Learning Under Missing Data

Figure 2 for Optimal Transport for Structure Learning Under Missing Data

Figure 3 for Optimal Transport for Structure Learning Under Missing Data

Figure 4 for Optimal Transport for Structure Learning Under Missing Data

Abstract:Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or preferably causal relations among variables. Merely filling in missing values with existing imputation methods and subsequently applying structure learning on the complete data is empirical shown to be sub-optimal. To this end, we propose in this paper a score-based algorithm, based on optimal transport, for learning causal structure from missing data. This optimal transport viewpoint diverges from existing score-based approaches that are dominantly based on EM. We project structure learning as a density fitting problem, where the goal is to find the causal model that induces a distribution of minimum Wasserstein distance with the distribution over the observed data. Through extensive simulations and real-data experiments, our framework is shown to recover the true causal graphs more effectively than the baselines in various simulations and real-data experiments. Empirical evidences also demonstrate the superior scalability of our approach, along with the flexibility to incorporate any off-the-shelf causal discovery methods for complete data.

Via

Access Paper or Ask Questions

Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Feb 17, 2024

Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari

Figure 1 for Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Figure 2 for Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Figure 3 for Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Figure 4 for Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Abstract:Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers. However, previous research on evaluating LLMs has solely focused on answer accuracy, neglecting the correctness of the generated CoT. In this paper, we delve deeper into the CoT reasoning capabilities of LLMs in multi-hop question answering by utilizing knowledge graphs (KGs). We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT. Through experiments conducted on 5 different families of LLMs across 2 multi-hop question-answering datasets, we find that LLMs possess sufficient knowledge to perform reasoning. However, there exists a significant disparity between answer accuracy and faithfulness of the CoT reasoning generated by LLMs, indicating that they often arrive at correct answers through incorrect reasoning.

* Minh-Vuong Nguyen and Linhao Luo are co-first authors and contributed equally to the preparation of this manuscript

Via

Access Paper or Ask Questions

A Class-aware Optimal Transport Approach with Higher-Order Moment Matching for Unsupervised Domain Adaptation

Jan 29, 2024

Tuan Nguyen, Van Nguyen, Trung Le, He Zhao, Quan Hung Tran, Dinh Phung

Abstract:Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. In this paper, we introduce a novel approach called class-aware optimal transport (OT), which measures the OT distance between a distribution over the source class-conditional distributions and a mixture of source and target data distribution. Our class-aware OT leverages a cost function that determines the matching extent between a given data example and a source class-conditional distribution. By optimizing this cost function, we find the optimal matching between target examples and source class-conditional distributions, effectively addressing the data and label shifts that occur between the two domains. To handle the class-aware OT efficiently, we propose an amortization solution that employs deep neural networks to formulate the transportation probabilities and the cost function. Additionally, we propose minimizing class-aware Higher-order Moment Matching (HMM) to align the corresponding class regions on the source and target domains. The class-aware HMM component offers an economical computational approach for accurately evaluating the HMM distance between the two distributions. Extensive experiments on benchmark datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art baselines.

* 18 pages

Via

Access Paper or Ask Questions

Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay

Dec 10, 2023

Khanh Doan, Quyen Tran, Tuan Nguyen, Dinh Phung, Trung Le

Figure 1 for Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay

Figure 2 for Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay

Figure 3 for Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay

Figure 4 for Class-Prototype Conditional Diffusion Model for Continual Learning with Generative Replay

Abstract:Mitigating catastrophic forgetting is a key hurdle in continual learning. Deep Generative Replay (GR) provides techniques focused on generating samples from prior tasks to enhance the model's memory capabilities. With the progression in generative AI, generative models have advanced from Generative Adversarial Networks (GANs) to the more recent Diffusion Models (DMs). A major issue is the deterioration in the quality of generated data compared to the original, as the generator continuously self-learns from its outputs. This degradation can lead to the potential risk of catastrophic forgetting occurring in the classifier. To address this, we propose the Class-Prototype Conditional Diffusion Model (CPDM), a GR-based approach for continual learning that enhances image quality in generators and thus reduces catastrophic forgetting in classifiers. The cornerstone of CPDM is a learnable class-prototype that captures the core characteristics of images in a given class. This prototype, integrated into the diffusion model's denoising process, ensures the generation of high-quality images. It maintains its effectiveness for old tasks even when new tasks are introduced, preserving image generation quality and reducing the risk of catastrophic forgetting in classifiers. Our empirical studies on diverse datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art models, highlighting its exceptional ability to preserve image quality and enhance the model's memory retention.

Via

Access Paper or Ask Questions

KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Nov 30, 2023

Quyen Tran, Lam Tran, Khoat Than, Toan Tran, Dinh Phung, Trung Le

Figure 1 for KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Figure 2 for KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Figure 3 for KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Figure 4 for KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Abstract:Drawing inspiration from prompt tuning techniques applied to Large Language Models, recent methods based on pre-trained ViT networks have achieved remarkable results in the field of Continual Learning. Specifically, these approaches propose to maintain a set of prompts and allocate a subset of them to learn each task using a key-query matching strategy. However, they may encounter limitations when lacking control over the correlations between old task queries and keys of future tasks, the shift of features in the latent space, and the relative separation of latent vectors learned in independent tasks. In this work, we introduce a novel key-query learning strategy based on orthogonal projection, inspired by model-agnostic meta-learning, to enhance prompt matching efficiency and address the challenge of shifting features. Furthermore, we introduce a One-Versus-All (OVA) prototype-based component that enhances the classification head distinction. Experimental results on benchmark datasets demonstrate that our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.

Via

Access Paper or Ask Questions

Robust Contrastive Learning With Theory Guarantee

Nov 16, 2023

Ngoc N. Tran, Lam Tran, Hoang Phan, Anh Bui, Tung Pham, Toan Tran, Dinh Phung, Trung Le

Figure 1 for Robust Contrastive Learning With Theory Guarantee

Figure 2 for Robust Contrastive Learning With Theory Guarantee

Figure 3 for Robust Contrastive Learning With Theory Guarantee

Figure 4 for Robust Contrastive Learning With Theory Guarantee

Abstract:Contrastive learning (CL) is a self-supervised training paradigm that allows us to extract meaningful features without any label information. A typical CL framework is divided into two phases, where it first tries to learn the features from unlabelled data, and then uses those features to train a linear classifier with the labeled data. While a fair amount of existing theoretical works have analyzed how the unsupervised loss in the first phase can support the supervised loss in the second phase, none has examined the connection between the unsupervised loss and the robust supervised loss, which can shed light on how to construct an effective unsupervised loss for the first phase of CL. To fill this gap, our work develops rigorous theories to dissect and identify which components in the unsupervised loss can help improve the robust supervised loss and conduct proper experiments to verify our findings.

* 27 pages, 0 figures. arXiv admin note: text overlap with arXiv:2305.10252

Via

Access Paper or Ask Questions

PhoGPT: Generative Pre-training for Vietnamese

Nov 06, 2023

Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Nhung Nguyen, Thien Huu Nguyen, Dinh Phung, Hung Bui

Figure 1 for PhoGPT: Generative Pre-training for Vietnamese

Abstract:We open-source a state-of-the-art 7.5B-parameter generative model series named PhoGPT for Vietnamese, which includes the base pre-trained monolingual model PhoGPT-7B5 and its instruction-following variant, PhoGPT-7B5-Instruct. In addition, we also demonstrate its superior performance compared to previous open-source models through a human evaluation experiment. GitHub: https://github.com/VinAIResearch/PhoGPT

* PhoGPT Technical Report - 4 pages

Via

Access Paper or Ask Questions

Systematic Assessment of Factual Knowledge in Large Language Models

Oct 30, 2023

Linhao Luo, Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari

Abstract:Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate the state-of-the-art LLMs with KGs in generic and specific domains. The experiment shows that ChatGPT is consistently the top performer across all domains. We also find that LLMs performance depends on the instruction finetuning, domain and question complexity and is prone to adversarial context.

* Accepted by EMNLP 2023 Findings

Via

Access Paper or Ask Questions

Cross-adversarial local distribution regularization for semi-supervised medical image segmentation

Oct 02, 2023

Thanh Nguyen-Duc, Trung Le, Roland Bammer, He Zhao, Jianfei Cai, Dinh Phung

Abstract:Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data. Existing semi-supervised segmentation methods are usually based on the smoothness assumption. This assumption implies that the model output distributions of two similar data samples are encouraged to be invariant. In other words, the smoothness assumption states that similar samples (e.g., adding small perturbations to an image) should have similar outputs. In this paper, we introduce a novel cross-adversarial local distribution (Cross-ALD) regularization to further enhance the smoothness assumption for semi-supervised medical image segmentation task. We conducted comprehensive experiments that the Cross-ALD archives state-of-the-art performance against many recent methods on the public LA and ACDC datasets.

* MICCAI 2023

Via

Access Paper or Ask Questions

Unleash Data Generation for Efficient and Effective Data-free Knowledge Distillation

Sep 30, 2023

Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Quan Hung Tran, Dinh Phung

Figure 1 for Unleash Data Generation for Efficient and Effective Data-free Knowledge Distillation

Figure 2 for Unleash Data Generation for Efficient and Effective Data-free Knowledge Distillation

Figure 3 for Unleash Data Generation for Efficient and Effective Data-free Knowledge Distillation

Figure 4 for Unleash Data Generation for Efficient and Effective Data-free Knowledge Distillation

Abstract:Data-Free Knowledge Distillation (DFKD) has recently made remarkable advancements with its core principle of transferring knowledge from a teacher neural network to a student neural network without requiring access to the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to effectively map this noise to the ground-truth sample distribution, resulting in the production of low-quality data and imposing substantial time requirements for training the generator. In this paper, we propose a novel Noisy Layer Generation method (NAYER) which relocates the randomness source from the input to a noisy layer and utilizes the meaningful label-text embedding (LTE) as the input. The significance of LTE lies in its ability to contain substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. Simultaneously, the noisy layer plays a key role in addressing the issue of diversity in sample generation by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to facilitate the generation of diverse samples while still retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments carried out on multiple datasets demonstrate that our NAYER not only outperforms the state-of-the-art methods but also achieves speeds 5 to 15 times faster than previous approaches.

Via

Access Paper or Ask Questions