Quan Tran

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Jul 24, 2023
Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen

Punctuation restoration is an important task in automatic speech recognition (ASR) which aims to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant in written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts for training punctuation restoration systems on ASR texts. This paper proposes a reinforcement learning method that exploits in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. Experiments show that our method achieves state-of-the-art performance on the ASR test sets of two benchmark datasets for punctuation restoration.
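
The abstract leaves the training details out; as a rough illustration only, the sketch below shows one way a REINFORCE-style policy gradient could fine-tune a token-level punctuation tagger against a sequence-level reward. The model, label set, and reward are placeholder assumptions, not the paper's actual system.

```python
# Minimal sketch (not the paper's implementation): REINFORCE-style fine-tuning
# of a token-level punctuation tagger. All names and the reward are illustrative.
import torch
import torch.nn as nn

PUNCT = ["", ",", ".", "?"]  # label set: no mark, or a punctuation mark after the token

class Tagger(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, len(PUNCT))

    def forward(self, tokens):                      # tokens: (B, T) int ids
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)                          # (B, T, |PUNCT|) logits

def reinforce_step(model, optim, tokens, ref_labels):
    """One policy-gradient step; reward = per-sentence label accuracy,
    a stand-in for whatever sequence-level reward the paper actually uses."""
    logits = model(tokens)
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample()                                 # sampled punctuation sequence
    reward = (sample == ref_labels).float().mean(dim=1)    # (B,)
    baseline = reward.mean()                               # simple variance-reduction baseline
    log_prob = dist.log_prob(sample).sum(dim=1)            # (B,)
    loss = -((reward - baseline) * log_prob).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
    return reward.mean().item()

model = Tagger()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (4, 12))                   # toy batch
ref = torch.randint(0, len(PUNCT), (4, 12))
print(reinforce_step(model, optim, tokens, ref))
```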

* Accepted at INTERSPEECH 2023, 6 pages 

Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Apr 26, 2023
Anh Bui, Trung Le, He Zhao, Quan Tran, Paul Montague, Dinh Phung

Deep learning models, even state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods for improving a model's robustness. The key factor in the success of adversarial training is the ability to generate qualified and diverse adversarial examples that satisfy some objectives/goals (e.g., finding adversarial examples that maximize the model losses so as to attack multiple models simultaneously). Therefore, multi-objective optimization (MOO) is a natural tool for adversarial example generation when multiple objectives/goals must be achieved simultaneously. However, we observe that a naive application of MOO tends to maximize all objectives/goals equally, without regard to whether an objective/goal has already been achieved. This leads to wasted effort on further improving tasks whose goals have been achieved, while devoting less attention to tasks whose goals have not. In this paper, we propose Task Oriented MOO to address this issue, in settings where goal achievement for a task can be defined explicitly. Our principle is to merely maintain the goal-achieved tasks, while letting the optimizer spend more effort on improving the goal-unachieved tasks. We conduct comprehensive experiments with Task Oriented MOO on various adversarial example generation schemes. The experimental results firmly demonstrate the merit of our proposed approach. Our code is available at https://github.com/tuananhbui89/TAMOO.
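
The goal-aware weighting principle can be made concrete with a small sketch. The following is an assumed, simplified reading of the idea (not the authors' TAMOO implementation): when attacking several models at once, each model is a task, a task's goal counts as achieved once that model is fooled, and achieved tasks are merely maintained with a small weight while unachieved tasks keep full gradient weight.

```python
# Hedged sketch of a goal-aware multi-model attack; weights and schedule are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def task_oriented_attack(models, x, y, eps=0.03, alpha=0.005, steps=10):
    """PGD-style attack on multiple models with goal-aware task weighting.
    The 'goal' here is simply that a model already misclassifies the example."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        losses, achieved = [], []
        for m in models:
            logits = m(x + delta)
            losses.append(F.cross_entropy(logits, y, reduction="none"))   # (B,)
            achieved.append((logits.argmax(dim=1) != y).float())          # 1 = goal met
        losses = torch.stack(losses)        # (K, B)
        achieved = torch.stack(achieved)    # (K, B)
        # Down-weight goal-achieved tasks, keep full weight on unachieved ones.
        weights = torch.where(achieved > 0,
                              torch.full_like(losses, 0.1),
                              torch.ones_like(losses))
        grad = torch.autograd.grad((weights * losses).sum(), delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()

# toy usage with two small classifiers
models = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)) for _ in range(2)]
x_adv = task_oriented_attack(models, torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,)))
```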


A Unified Wasserstein Distributional Robustness Framework for Adversarial Training

Feb 27, 2022
Tuan Anh Bui, Trung Le, Quan Tran, He Zhao, Dinh Phung

It is well known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As a result, adversarial training (AT), which incorporates adversarial examples during training, represents a natural and effective approach to strengthening the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and TRADES, typically seek a pointwise adversary that generates the worst-case adversarial example by independently perturbing each data sample, as a way to "probe" the vulnerability of the classifier. Arguably, there are unexplored benefits in considering such adversarial effects over an entire distribution. To this end, this paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions with which we show that standard AT methods are special cases of their counterparts in our framework. This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributionally robust AT algorithms. Extensive experiments show that our distributionally robust AT algorithms further robustify their standard AT counterparts in various settings.
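
For context, the pointwise adversary that the abstract contrasts with is the standard PGD-AT inner maximization, sketched below in simplified form. Hyperparameters and the toy model are illustrative; the paper's contribution is the distributional, Wasserstein-ball generalization of this scheme, which is not shown here.

```python
# Minimal sketch of standard PGD-AT: per-sample inner maximization, then an outer
# training step on the resulting adversarial examples. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_at_step(model, optim, x, y, eps=8/255, alpha=2/255, steps=10):
    # inner max: find a worst-case perturbation for each sample independently
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # outer min: update the classifier on the adversarial examples
    optim.zero_grad()
    F.cross_entropy(model((x + delta).detach()), y).backward()
    optim.step()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optim = torch.optim.SGD(model.parameters(), lr=0.1)
pgd_at_step(model, optim, torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,)))
```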


Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images

Oct 01, 2021
Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille

While neural symbolic methods demonstrate impressive performance in visual question answering on synthetic images, their performance suffers on real images. We identify the long-tail distribution of visual concepts and the unequal importance of reasoning steps in real data as the two key obstacles limiting these models' real-world potential. To address these challenges, we propose a new paradigm, Calibrating Concepts and Operations (CCO), which enables neural symbolic models to capture underlying data characteristics and to reason with hierarchical importance. Specifically, we introduce an executor with learnable concept embedding magnitudes for handling distribution imbalance, and an operation calibrator for highlighting important operations and suppressing redundant ones. Our experiments show that CCO substantially boosts the performance of neural symbolic methods on real images. Evaluated on the real-world dataset GQA, CCO helps the neural symbolic method NSCL outperform its vanilla counterpart by 9.1% (from 47.0% to 56.1%); this result also substantially narrows the performance gap between symbolic and non-symbolic methods. Additionally, we create a perturbed test set for better understanding and analyzing model performance on real images. Code is available at https://github.com/Lizw14/CaliCO.git.
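
A minimal sketch of how learnable concept embedding magnitudes might look, based only on the abstract's description; the class name, cosine-similarity scoring, and sizes are assumptions, not the released CaliCO code.

```python
# Hedged sketch: concept scores scaled by a learnable per-concept magnitude,
# so head and tail concepts are not forced onto the same scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CalibratedConceptScorer(nn.Module):
    def __init__(self, num_concepts, dim):
        super().__init__()
        self.concepts = nn.Parameter(torch.randn(num_concepts, dim))
        self.log_mag = nn.Parameter(torch.zeros(num_concepts))  # one magnitude per concept

    def forward(self, obj_feat):                       # obj_feat: (B, dim)
        # cosine similarity between object features and concept directions
        sim = F.normalize(obj_feat, dim=-1) @ F.normalize(self.concepts, dim=-1).T
        return sim * self.log_mag.exp()                # magnitude-calibrated scores

scorer = CalibratedConceptScorer(num_concepts=50, dim=128)
scores = scorer(torch.randn(4, 128))                   # (4, 50)
```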

* To appear in ICCV2021; Code at https://github.com/Lizw14/CaliCO.git 

Learning to Predict Visual Attributes in the Wild

Jun 17, 2021
Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava

Visual attributes constitute a large portion of the information contained in a scene. Objects can be described using a wide variety of attributes which portray their visual appearance (color, texture), geometry (shape, size, posture), and other intrinsic properties (state, action). Existing work is mostly limited to the study of attribute prediction in specific domains. In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances. Formally, object attribute prediction is a multi-label classification problem in which all attributes that apply to an object must be predicted. Our dataset poses significant challenges to existing methods due to the large number of attributes, label sparsity, data imbalance, and object occlusion. To address these challenges, we propose several techniques that systematically tackle them, including a base model that utilizes both low- and high-level CNN features with multi-hop attention, reweighting and resampling techniques, a novel negative label expansion scheme, and a novel supervised attribute-aware contrastive learning algorithm. Using these techniques, we achieve improvements of nearly 3.7 mAP and 5.7 overall F1 points over the current state of the art. Further details about the VAW dataset can be found at http://vawdataset.com/.
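
As an illustration of the multi-label setup with reweighting and label sparsity, here is a hedged sketch of a masked, per-attribute-weighted BCE loss; the shapes, weight values, and masking scheme are toy assumptions, not the paper's exact loss.

```python
# Hedged sketch: multi-label attribute loss with per-attribute reweighting and a
# mask for attributes that are simply unannotated for a given instance.
import torch
import torch.nn.functional as F

def masked_weighted_bce(logits, targets, annotated_mask, pos_weight):
    """logits/targets/annotated_mask: (B, A); pos_weight: (A,)."""
    loss = F.binary_cross_entropy_with_logits(
        logits, targets, pos_weight=pos_weight, reduction="none")
    loss = loss * annotated_mask                      # ignore unannotated attributes
    return loss.sum() / annotated_mask.sum().clamp(min=1)

B, A = 8, 100                                         # toy sizes
logits = torch.randn(B, A)
targets = torch.randint(0, 2, (B, A)).float()
mask = torch.randint(0, 2, (B, A)).float()            # 1 = attribute annotated
pos_weight = torch.ones(A) * 5.0                      # up-weight rare positives
print(masked_weighted_bce(logits, targets, mask, pos_weight))
```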

* Accepted to CVPR 2021 

Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering

Nov 05, 2020
Quan Tran, Nhan Dam, Tuan Lai, Franck Dernoncourt, Trung Le, Nham Le, Dinh Phung

Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreed-upon notions on which the explanation process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, humans tend to rely on similar situations and/or associations from the past. Hence, arguably, a promising approach to making the model transparent is to design it such that it explicitly connects the current sample with previously seen ones and bases its decision on these samples. Grounded in this principle, we propose an explainable, evidence-based memory network architecture that learns to summarize the dataset and extract supporting evidence for its decisions. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e., TrecQA and WikiQA). Via further analysis, we show that the model can reliably trace the errors it makes in the validation step back to the training instances that might have caused them. We believe this error-tracing capability provides significant benefit for improving dataset quality in many applications.
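
A toy sketch of an evidence-based memory classifier in the spirit of the abstract (illustrative names and a simple nearest-neighbor vote, not the paper's architecture): the retrieved training items form both the basis of the prediction and its supporting evidence.

```python
# Hedged sketch: predictions come from the most similar stored training examples,
# which double as the evidence returned alongside the decision.
import torch
import torch.nn.functional as F

class EvidenceMemory:
    def __init__(self, keys, labels):                 # keys: (N, D), labels: (N,)
        self.keys = F.normalize(keys, dim=-1)
        self.labels = labels

    def predict(self, query, k=3):                    # query: (D,)
        sim = self.keys @ F.normalize(query, dim=-1)  # (N,) cosine similarities
        top = sim.topk(k)
        weights = top.values.softmax(dim=0)
        votes = torch.zeros(int(self.labels.max()) + 1)
        votes.index_add_(0, self.labels[top.indices], weights)
        return votes.argmax().item(), top.indices.tolist()   # label + evidence ids

mem = EvidenceMemory(torch.randn(100, 32), torch.randint(0, 2, (100,)))
label, evidence = mem.predict(torch.randn(32))
```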

* Accepted to COLING 2020 

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

Aug 04, 2020
Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions. It is a challenging task given the large variation of image domains and the lack of training supervision. Our approach takes advantage of a unified visual-semantic embedding space pretrained on a general image-caption dataset, and manipulates the embedded visual features by applying text-guided vector arithmetic to the image feature maps. A structure-preserving image decoder then generates the manipulated images from the manipulated feature maps. We further propose an on-the-fly, sample-specific optimization approach with cycle-consistency constraints to regularize the manipulated images and force them to preserve details of the source images. Our approach shows promising results in manipulating open-vocabulary color, texture, and high-level attributes across various scenarios of open-domain images.
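
The text-guided vector arithmetic can be sketched in a few lines; the dimensionality and tensors below are placeholders, and the real system operates in a pretrained visual-semantic embedding space with a structure-preserving decoder, neither of which is shown here.

```python
# Hedged sketch: shift every spatial location of the image feature map along the
# direction from the source text embedding to the target text embedding.
import torch

def edit_feature_map(feat_map, src_text_emb, tgt_text_emb, alpha=1.0):
    """feat_map: (C, H, W); text embeddings: (C,) in the shared embedding space."""
    direction = tgt_text_emb - src_text_emb           # e.g. "green" minus "yellow"
    return feat_map + alpha * direction[:, None, None]

feat = torch.randn(512, 16, 16)
src, tgt = torch.randn(512), torch.randn(512)
edited = edit_feature_map(feat, src, tgt, alpha=0.8)  # then fed to the image decoder
```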

* To appear on ECCV 2020. Introduction video at https://youtu.be/8E3bwvjCHYE and code at https://github.com/xh-liu/Open-Edit 

Context-Aware Group Captioning via Self-Attention and Contrastive Features

Apr 07, 2020
Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille

While image captioning has progressed rapidly, existing work focuses mainly on describing single images. In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images. Context-aware group captioning requires not only summarizing information from both the target and reference image groups but also contrasting the two. To solve this problem, we propose a framework combining a self-attention mechanism with contrastive feature construction to effectively summarize the common information within each image group while capturing the discriminative information between them. To build the dataset for this task, we group the images and generate the group captions from single-image captions using scene graph matching. Our datasets are constructed on top of the public Conceptual Captions dataset and our new Stock Captions dataset. Experiments on the two datasets show the effectiveness of our method on this new task. The related datasets and code are released at https://lizw14.github.io/project/groupcap .
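
A hedged sketch of the summarize-then-contrast encoder suggested by the abstract; the pooling choice, dimensions, and the way the contrastive feature is formed are assumptions, and the actual model's attention stack and caption decoder differ.

```python
# Hedged sketch: self-attention summarizes each image group, and the difference
# between the two summaries acts as a contrastive feature for the caption decoder.
import torch
import torch.nn as nn

class GroupContrastEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def summarize(self, feats):                        # feats: (B, N, D)
        pooled, _ = self.attn(feats, feats, feats)     # self-attention over the group
        return pooled.mean(dim=1)                      # (B, D) group summary

    def forward(self, target_feats, reference_feats):
        t = self.summarize(target_feats)
        r = self.summarize(reference_feats)
        # contrastive feature: what distinguishes the targets from the references
        return torch.cat([t, r, t - r], dim=-1)        # fed to a caption decoder

enc = GroupContrastEncoder(dim=64)
out = enc(torch.randn(2, 5, 64), torch.randn(2, 8, 64))   # (2, 192)
```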

* To appear in CVPR 2020; Project page: https://lizw14.github.io/project/groupcap 

Named Entity Recognition with stack residual LSTM and trainable bias decoding

Jul 11, 2017
Quan Tran, Andrew MacKinlay, Antonio Jimeno Yepes

Recurrent neural network models are the state of the art for Named Entity Recognition (NER). We present two innovations to improve the performance of these models. The first is the introduction of residual connections between the layers of the stacked recurrent neural network to address the degradation problem of deep networks. The second is a bias decoding mechanism that allows the trained system to adapt to non-differentiable and externally computed objectives, such as the entity-based F-measure. Our work improves the state-of-the-art results for both Spanish and English on the standard train/development/test split of the CoNLL 2003 Shared Task NER dataset.
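
A minimal sketch of the two ideas, residual connections between stacked LSTM layers and post-hoc label biases applied at decoding time, under assumed shapes and label counts (not the paper's exact model):

```python
# Hedged sketch: each LSTM layer's output is added to its input (residual stacking),
# and per-label decoding biases can be tuned on dev data after training to trade
# precision against recall for the entity-based F-measure.
import torch
import torch.nn as nn

class ResidualStackedLSTM(nn.Module):
    def __init__(self, dim, num_layers=4, num_labels=9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.LSTM(dim, dim, batch_first=True) for _ in range(num_layers)])
        self.out = nn.Linear(dim, num_labels)
        # decoding biases, adjusted post-hoc rather than learned by backprop
        self.label_bias = nn.Parameter(torch.zeros(num_labels), requires_grad=False)

    def forward(self, x):                              # x: (B, T, dim)
        for lstm in self.layers:
            h, _ = lstm(x)
            x = x + h                                  # residual connection between layers
        return self.out(x) + self.label_bias           # biased per-token label scores

model = ResidualStackedLSTM(dim=50)
scores = model(torch.randn(2, 7, 50))                  # (2, 7, 9)
```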
