Alert button
Picture for Trang Nguyen

Trang Nguyen

Alert button

Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering

Oct 09, 2023
Trang Nguyen, Naoaki Okazaki

Generalization in Visual Question Answering (VQA) requires models to answer questions about images with contexts beyond the training distribution. Existing attempts primarily refine unimodal aspects, overlooking enhancements in multimodal aspects. Besides, diverse interpretations of the input lead to various modes of answer generation, highlighting the role of causal reasoning between interpreting and answering steps in VQA. Through this lens, we propose Cognitive pathways VQA (CopVQA) improving the multimodal predictions by emphasizing causal reasoning factors. CopVQA first operates a pool of pathways that capture diverse causal reasoning flows through interpreting and answering stages. Mirroring human cognition, we decompose the responsibility of each stage into distinct experts and a cognition-enabled component (CC). The two CCs strategically execute one expert for each stage at a time. Finally, we prioritize answer predictions governed by pathways involving both CCs while disregarding answers produced by either CC, thereby emphasizing causal reasoning and supporting generalization. Our experiments on real-life and medical data consistently verify that CopVQA improves VQA performance and generalization across baselines and domains. Notably, CopVQA achieves a new state-of-the-art (SOTA) on PathVQA dataset and comparable accuracy to the current SOTA on VQA-CPv2, VQAv2, and VQA RAD, with one-fourth of the model size.

Viaarxiv icon

Causal Inference in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems

Oct 05, 2023
Trang Nguyen, Alexander Tong, Kanika Madan, Yoshua Bengio, Dianbo Liu

Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling the gene interactions in cellular processes. However, causal discovery in GRNs is a challenging problem for multiple reasons including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assume acyclic structure) or struggle with scalability. We introduce Swift-DynGFN as a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and to lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets showcase the advancement in learning causal structure in GRNs and scalability in larger systems.

Viaarxiv icon

Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior

May 31, 2023
Ayush Chakravarthy, Trang Nguyen, Anirudh Goyal, Yoshua Bengio, Michael C. Mozer

Figure 1 for Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior
Figure 2 for Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior
Figure 3 for Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior
Figure 4 for Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior

The aim of object-centric vision is to construct an explicit representation of the objects in a scene. This representation is obtained via a set of interchangeable modules called \emph{slots} or \emph{object files} that compete for local patches of an image. The competition has a weak inductive bias to preserve spatial continuity; consequently, one slot may claim patches scattered diffusely throughout the image. In contrast, the inductive bias of human vision is strong, to the degree that attention has classically been described with a spotlight metaphor. We incorporate a spatial-locality prior into state-of-the-art object-centric vision models and obtain significant improvements in segmenting objects in both synthetic and real-world datasets. Similar to human visual attention, the combination of image content and spatial constraints yield robust unsupervised object-centric learning, including less sensitivity to model hyperparameters.

* 16 pages, 3 figures, under review at NeurIPS 2023 
Viaarxiv icon

The 2022 NIST Language Recognition Evaluation

Feb 28, 2023
Yooyoung Lee, Craig Greenberg, Eliot Godard, Asad A. Butt, Elliot Singer, Trang Nguyen, Lisa Mason, Douglas Reynolds

Figure 1 for The 2022 NIST Language Recognition Evaluation
Figure 2 for The 2022 NIST Language Recognition Evaluation
Figure 3 for The 2022 NIST Language Recognition Evaluation
Figure 4 for The 2022 NIST Language Recognition Evaluation

In 2022, the U.S. National Institute of Standards and Technology (NIST) conducted the latest Language Recognition Evaluation (LRE) in an ongoing series administered by NIST since 1996 to foster research in language recognition and to measure state-of-the-art technology. Similar to previous LREs, LRE22 focused on conversational telephone speech (CTS) and broadcast narrowband speech (BNBS) data. LRE22 also introduced new evaluation features, such as an emphasis on African languages, including low resource languages, and a test set consisting of segments containing between 3s and 35s of speech randomly sampled and extracted from longer recordings. A total of 21 research organizations, forming 16 teams, participated in this 3-month long evaluation and made a total of 65 valid system submissions to be evaluated. This paper presents an overview of LRE22 and an analysis of system performance over different evaluation conditions. The evaluation results suggest that Oromo and Tigrinya are easier to detect while Xhosa and Zulu are more challenging. A greater confusability is seen for some language pairs. When speech duration increased, system performance significantly increased up to a certain duration, and then a diminishing return on system performance is observed afterward.

* 5 pages, 10 figures 
Viaarxiv icon

Reusable Slotwise Mechanisms

Feb 21, 2023
Trang Nguyen, Amin Mansouri, Kanika Madan, Khuong Nguyen, Kartik Ahuja, Dianbo Liu, Yoshua Bengio

Figure 1 for Reusable Slotwise Mechanisms
Figure 2 for Reusable Slotwise Mechanisms
Figure 3 for Reusable Slotwise Mechanisms
Figure 4 for Reusable Slotwise Mechanisms

Agents that can understand and reason over the dynamics of objects can have a better capability to act robustly and generalize to novel scenarios. Such an ability, however, requires a suitable representation of the scene as well as an understanding of the mechanisms that govern the interactions of different subsets of objects. To address this problem, we propose RSM, or Reusable Slotwise Mechanisms, that jointly learns a slotwise representation of the scene and a modular architecture that dynamically chooses one mechanism among a set of reusable mechanisms to predict the next state of each slot. RSM crucially takes advantage of a \textit{Central Contextual Information (CCI)}, which lets each selected reusable mechanism access the rest of the slots through a bottleneck, effectively allowing for modeling higher order and complex interactions that might require a sparse subset of objects. We show how this model outperforms state-of-the-art methods in a variety of next-step prediction tasks ranging from grid-world environments to Atari 2600 games. Particularly, we challenge methods that model the dynamics with Graph Neural Networks (GNNs) on top of slotwise representations, and modular architectures that restrict the interactions to be only pairwise. Finally, we show that RSM is able to generalize to scenes with objects varying in number and shape, highlighting its out-of-distribution generalization capabilities. Our implementation is available online\footnote{\hyperlink{}{}}.

Viaarxiv icon

Fast Approximation of the Generalized Sliced-Wasserstein Distance

Oct 19, 2022
Dung Le, Huy Nguyen, Khai Nguyen, Trang Nguyen, Nhat Ho

Figure 1 for Fast Approximation of the Generalized Sliced-Wasserstein Distance
Figure 2 for Fast Approximation of the Generalized Sliced-Wasserstein Distance

Generalized sliced Wasserstein distance is a variant of sliced Wasserstein distance that exploits the power of non-linear projection through a given defining function to better capture the complex structures of the probability distributions. Similar to sliced Wasserstein distance, generalized sliced Wasserstein is defined as an expectation over random projections which can be approximated by the Monte Carlo method. However, the complexity of that approximation can be expensive in high-dimensional settings. To that end, we propose to form deterministic and fast approximations of the generalized sliced Wasserstein distance by using the concentration of random projections when the defining functions are polynomial function, circular function, and neural network type function. Our approximations hinge upon an important result that one-dimensional projections of a high-dimensional random vector are approximately Gaussian.

* 22 pages, 2 figures. Dung Le, Huy Nguyen and Khai Nguyen contributed equally to this work 
Viaarxiv icon