Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dinh Phung

RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization

Sep 29, 2023

Tuan Truong, Hoang-Phi Nguyen, Tung Pham, Minh-Tuan Tran, Mehrtash Harandi, Dinh Phung, Trung Le

Figure 1 for RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization

Figure 2 for RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization

Figure 3 for RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization

Figure 4 for RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization

Abstract:Nowadays, understanding the geometry of the loss landscape shows promise in enhancing a model's generalization ability. In this work, we draw upon prior works that apply geometric principles to optimization and present a novel approach to improve robustness and generalization ability for constrained optimization problems. Indeed, this paper aims to generalize the Sharpness-Aware Minimization (SAM) optimizer to Riemannian manifolds. In doing so, we first extend the concept of sharpness and introduce a novel notion of sharpness on manifolds. To support this notion of sharpness, we present a theoretical analysis characterizing generalization capabilities with respect to manifold sharpness, which demonstrates a tighter bound on the generalization gap, a result not known before. Motivated by this analysis, we introduce our algorithm, Riemannian Sharpness-Aware Minimization (RSAM). To demonstrate RSAM's ability to enhance generalization ability, we evaluate and contrast our algorithm on a broad set of problems, such as image classification and contrastive learning across different datasets, including CIFAR100, CIFAR10, and FGVCAircraft. Our code is publicly available at \url{https://t.ly/RiemannianSAM}.

Via

Access Paper or Ask Questions

Towards Generalising Neural Topical Representations

Jul 24, 2023

Xiaohao Yang, He Zhao, Dinh Phung, Lan Du

Figure 1 for Towards Generalising Neural Topical Representations

Figure 2 for Towards Generalising Neural Topical Representations

Figure 3 for Towards Generalising Neural Topical Representations

Figure 4 for Towards Generalising Neural Topical Representations

Abstract:Topic models have evolved from conventional Bayesian probabilistic models to Neural Topic Models (NTMs) over the last two decays. Although NTMs have achieved promising performance when trained and tested on a specific corpus, their generalisation ability across corpora is rarely studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representation for documents in a different target corpus without retraining. In this work, we aim to improve NTMs further so that their benefits generalise reliably across corpora and tasks. To do so, we propose to model similar documents by minimising their semantical distance when training NTMs. Specifically, similar documents are created by data augmentation during training; The semantical distance between documents is measured by the Hierarchical Topic Transport Distance (HOTT), which computes the Optimal Transport (OT) distance between the topical representations. Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalisation ability regarding neural topical representation across corpora.

Via

Access Paper or Ask Questions

Optimal Transport Model Distributional Robustness

Jun 07, 2023

Van-Anh Nguyen, Trung Le, Anh Tuan Bui, Thanh-Toan Do, Dinh Phung

Abstract:Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in data space. In this work, we explore an optimal transport-based distributional robustness framework on model spaces. Specifically, we examine a model distribution in a Wasserstein ball of a given center model distribution that maximizes the loss. We have developed theories that allow us to learn the optimal robust center model distribution. Interestingly, through our developed theories, we can flexibly incorporate the concept of sharpness awareness into training a single model, ensemble models, and Bayesian Neural Networks by considering specific forms of the center model distribution, such as a Dirac delta distribution over a single model, a uniform distribution over several models, and a general Bayesian Neural Network. Furthermore, we demonstrate that sharpness-aware minimization (SAM) is a specific case of our framework when using a Dirac delta distribution over a single model, while our framework can be viewed as a probabilistic extension of SAM. We conduct extensive experiments to demonstrate the usefulness of our framework in the aforementioned settings, and the results show remarkable improvements in our approaches to the baselines.

Via

Access Paper or Ask Questions

Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

May 26, 2023

Michael Fu, Trung Le, Van Nguyen, Chakkrit Tantithamthavorn, Dinh Phung

Figure 1 for Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

Figure 2 for Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

Figure 3 for Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

Figure 4 for Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

Abstract:Deep learning (DL) models have become increasingly popular in identifying software vulnerabilities. Prior studies found that vulnerabilities across different vulnerable programs may exhibit similar vulnerable scopes, implicitly forming discernible vulnerability patterns that can be learned by DL models through supervised training. However, vulnerable scopes still manifest in various spatial locations and formats within a program, posing challenges for models to accurately identify vulnerable statements. Despite this challenge, state-of-the-art vulnerability detection approaches fail to exploit the vulnerability patterns that arise in vulnerable programs. To take full advantage of vulnerability patterns and unleash the ability of DL models, we propose a novel vulnerability-matching approach in this paper, drawing inspiration from program analysis tools that locate vulnerabilities based on pre-defined patterns. Specifically, a vulnerability codebook is learned, which consists of quantized vectors representing various vulnerability patterns. During inference, the codebook is iterated to match all learned patterns and predict the presence of potential vulnerabilities within a given program. Our approach was extensively evaluated on a real-world dataset comprising more than 188,000 C/C++ functions. The evaluation results show that our approach achieves an F1-score of 94% (6% higher than the previous best) and 82% (19% higher than the previous best) for function and statement-level vulnerability identification, respectively. These substantial enhancements highlight the effectiveness of our approach to identifying vulnerabilities. The training code and pre-trained models are available at https://github.com/optimatch/optimatch.

Via

Access Paper or Ask Questions

Learning Directed Graphical Models with Optimal Transport

May 25, 2023

Vy Vo, Trung Le, Long-Tung Vuong, He Zhao, Edwin Bonilla, Dinh Phung

Figure 1 for Learning Directed Graphical Models with Optimal Transport

Figure 2 for Learning Directed Graphical Models with Optimal Transport

Figure 3 for Learning Directed Graphical Models with Optimal Transport

Figure 4 for Learning Directed Graphical Models with Optimal Transport

Abstract:Estimating the parameters of a probabilistic directed graphical model from incomplete data remains a long-standing challenge. This is because, in the presence of latent variables, both the likelihood function and posterior distribution are intractable without further assumptions about structural dependencies or model classes. While existing learning methods are fundamentally based on likelihood maximization, here we offer a new view of the parameter learning problem through the lens of optimal transport. This perspective licenses a framework that operates on many directed graphs without making unrealistic assumptions on the posterior over the latent variables or resorting to black-box variational approximations. We develop a theoretical framework and support it with extensive empirical evidence demonstrating the flexibility and versatility of our approach. Across experiments, we show that not only can our method recover the ground-truth parameters but it also performs competitively on downstream applications, notably the non-trivial task of discrete representation learning.

Via

Access Paper or Ask Questions

Sharpness & Shift-Aware Self-Supervised Learning

May 17, 2023

Ngoc N. Tran, Son Duong, Hoang Phan, Tung Pham, Dinh Phung, Trung Le

Figure 1 for Sharpness & Shift-Aware Self-Supervised Learning

Figure 2 for Sharpness & Shift-Aware Self-Supervised Learning

Figure 3 for Sharpness & Shift-Aware Self-Supervised Learning

Figure 4 for Sharpness & Shift-Aware Self-Supervised Learning

Abstract:Self-supervised learning aims to extract meaningful features from unlabeled data for further downstream tasks. In this paper, we consider classification as a downstream task in phase 2 and develop rigorous theories to realize the factors that implicitly influence the general loss of this classification task. Our theories signify that sharpness-aware feature extractors benefit the classification task in phase 2 and the existing data shift between the ideal (i.e., the ideal one used in theory development) and practical (i.e., the practical one used in implementation) distributions to generate positive pairs also remarkably affects this classification task. Further harvesting these theoretical findings, we propose to minimize the sharpness of the feature extractor and a new Fourier-based data augmentation technique to relieve the data shift in the distributions generating positive pairs, reaching Sharpness & Shift-Aware Contrastive Learning (SSA-CLR). We conduct extensive experiments to verify our theoretical findings and demonstrate that sharpness & shift-aware contrastive learning can remarkably boost the performance as well as obtaining more robust extracted features compared with the baselines.

Via

Access Paper or Ask Questions

Active Continual Learning: Labelling Queries in a Sequence of Tasks

May 06, 2023

Thuy-Trang Vu, Shahram Khadivi, Dinh Phung, Gholamreza Haffari

Figure 1 for Active Continual Learning: Labelling Queries in a Sequence of Tasks

Figure 2 for Active Continual Learning: Labelling Queries in a Sequence of Tasks

Figure 3 for Active Continual Learning: Labelling Queries in a Sequence of Tasks

Figure 4 for Active Continual Learning: Labelling Queries in a Sequence of Tasks

Abstract:Acquiring new knowledge without forgetting what has been learned in a sequence of tasks is the central focus of continual learning (CL). While tasks arrive sequentially, the training data are often prepared and annotated independently, leading to CL of incoming supervised learning tasks. This paper considers the under-explored problem of active continual learning (ACL) for a sequence of active learning (AL) tasks, where each incoming task includes a pool of unlabelled data and an annotation budget. We investigate the effectiveness and interplay between several AL and CL algorithms in the domain, class and task-incremental scenarios. Our experiments reveal the trade-off between two contrasting goals of not forgetting the old knowledge and the ability to quickly learn in CL and AL. While conditioning the query strategy on the annotations collected for the previous tasks leads to improved task performance on the domain and task incremental learning, our proposed forgetting-learning profile suggests a gap in balancing the effect of AL and CL for the class-incremental scenario.

Via

Access Paper or Ask Questions

Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Apr 26, 2023

Anh Bui, Trung Le, He Zhao, Quan Tran, Paul Montague, Dinh Phung

Figure 1 for Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Figure 2 for Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Figure 3 for Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Figure 4 for Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Abstract:Deep learning models, even the-state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods to improve the model's robustness. The key factor for the success of adversarial training is the capability to generate qualified and divergent adversarial examples which satisfy some objectives/goals (e.g., finding adversarial examples that maximize the model losses for simultaneously attacking multiple models). Therefore, multi-objective optimization (MOO) is a natural tool for adversarial example generation to achieve multiple objectives/goals simultaneously. However, we observe that a naive application of MOO tends to maximize all objectives/goals equally, without caring if an objective/goal has been achieved yet. This leads to useless effort to further improve the goal-achieved tasks, while putting less focus on the goal-unachieved tasks. In this paper, we propose \emph{Task Oriented MOO} to address this issue, in the context where we can explicitly define the goal achievement for a task. Our principle is to only maintain the goal-achieved tasks, while letting the optimizer spend more effort on improving the goal-unachieved tasks. We conduct comprehensive experiments for our Task Oriented MOO on various adversarial example generation schemes. The experimental results firmly demonstrate the merit of our proposed approach. Our code is available at \url{https://github.com/tuananhbui89/TAMOO}.

Via

Access Paper or Ask Questions

Hyperbolic Geometry in Computer Vision: A Survey

Apr 21, 2023

Pengfei Fang, Mehrtash Harandi, Trung Le, Dinh Phung

Figure 1 for Hyperbolic Geometry in Computer Vision: A Survey

Figure 2 for Hyperbolic Geometry in Computer Vision: A Survey

Figure 3 for Hyperbolic Geometry in Computer Vision: A Survey

Figure 4 for Hyperbolic Geometry in Computer Vision: A Survey

Abstract:Hyperbolic geometry, a Riemannian manifold endowed with constant sectional negative curvature, has been considered an alternative embedding space in many learning scenarios, \eg, natural language processing, graph learning, \etc, as a result of its intriguing property of encoding the data's hierarchical structure (like irregular graph or tree-likeness data). Recent studies prove that such data hierarchy also exists in the visual dataset, and investigate the successful practice of hyperbolic geometry in the computer vision (CV) regime, ranging from the classical image classification to advanced model adaptation learning. This paper presents the first and most up-to-date literature review of hyperbolic spaces for CV applications. To this end, we first introduce the background of hyperbolic geometry, followed by a comprehensive investigation of algorithms, with geometric prior of hyperbolic space, in the context of visual applications. We also conclude this manuscript and identify possible future directions.

* First survey paper for the hyperbolic geometry in CV applications

Via

Access Paper or Ask Questions

Vector Quantized Wasserstein Auto-Encoder

Feb 12, 2023

Tung-Long Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung

Abstract:Learning deep discrete latent presentations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE), most of work in learning deep discrete representations has mainly focused on improving the original VQ-VAE form and none of them has studied learning deep discrete representations from the generative viewpoint. In this work, we study learning deep discrete representations from the generative viewpoint. Specifically, we endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution via minimizing a WS distance between them. We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution. Finally, we empirically evaluate our method on several well-known benchmarks, where it achieves better qualitative and quantitative performances than the other VQ-VAE variants in terms of the codebook utilization and image reconstruction/generation.

Via

Access Paper or Ask Questions