Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anh Nguyen

DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Apr 14, 2024

Yuqi Wang, Zeqiang Wang, Wei Wang, Qi Chen, Kaizhu Huang, Anh Nguyen, Suparna De

Figure 1 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Figure 2 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Figure 3 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Figure 4 for DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Abstract:Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement and adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method in improving robustness. Our best-performing model ranked 12th in terms of faithfulness and 8th in terms of consistency, respectively, out of the 32 participants.

Via

Access Paper or Ask Questions

Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Apr 11, 2024

Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen

Figure 1 for Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Figure 2 for Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Figure 3 for Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Figure 4 for Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Abstract:High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as Convolutional Neural Network (CNN) and Vision Transformer (ViT). CNN-based methods struggle with handling such high-resolution images due to their limited receptive field, while ViT faces challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed images, named Samba. Samba utilizes an encoder-decoder architecture, with Samba blocks serving as the encoder for efficient multi-level semantic information extraction, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets, comparing its performance against top-performing CNN and ViT methods. The results reveal that Samba achieved unparalleled performance on commonly used remote sensing datasets for semantic segmentation. Our proposed Samba demonstrates for the first time the effectiveness of SSM in semantic segmentation of remotely sensed images, setting a new benchmark in performance for Mamba-based techniques in this specific application. The source code and baseline implementations are available at https://github.com/zhuqinfeng1999/Samba.

Via

Access Paper or Ask Questions

Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization

Apr 11, 2024

Olatunji Mumini Omisore, Toluwanimi Akinyemi, Anh Nguyen, Lei Wang

Abstract:Although robot-assisted cardiovascular catheterization is commonly performed for intervention of cardiovascular diseases, more studies are needed to support the procedure with automated tool segmentation. This can aid surgeons on tool tracking and visualization during intervention. Learning-based segmentation has recently offered state-of-the-art segmentation performances however, generating ground-truth signals for fully-supervised methods is labor-intensive and time consuming for the interventionists. In this study, a weakly-supervised learning method with multi-lateral pseudo labeling is proposed for tool segmentation in cardiac angiograms. The method includes a modified U-Net model with one encoder and multiple lateral-branched decoders that produce pseudo labels as supervision signals under different perturbation. The pseudo labels are self-generated through a mixed loss function and shared consistency in the decoders. We trained the model end-to-end with weakly-annotated data obtained during robotic cardiac catheterization. Experiments with the proposed model shows weakly annotated data has closer performance to when fully annotated data is used. Compared to three existing weakly-supervised methods, our approach yielded higher segmentation performance across three different cardiac angiogram data. With ablation study, we showed consistent performance under different parameters. Thus, we offer a less expensive method for real-time tool segmentation and tracking during robot-assisted cardiac catheterization.

Via

Access Paper or Ask Questions

ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation

Mar 22, 2024

Minh Tran, Winston Bounsavy, Khoa Vo, Anh Nguyen, Tri Nguyen, Ngan Le

Abstract:Amodal Instance Segmentation (AIS) presents a challenging task as it involves predicting both visible and occluded parts of objects within images. Existing AIS methods rely on a bidirectional approach, encompassing both the transition from amodal features to visible features (amodal-to-visible) and from visible features to amodal features (visible-to-amodal). Our observation shows that the utilization of amodal features through the amodal-to-visible can confuse the visible features due to the extra information of occluded/hidden segments not presented in visible display. Consequently, this compromised quality of visible features during the subsequent visible-to-amodal transition. To tackle this issue, we introduce ShapeFormer, a decoupled Transformer-based model with a visible-to-amodal transition. It facilitates the explicit relationship between output segmentations and avoids the need for amodal-to-visible transitions. ShapeFormer comprises three key modules: (i) Visible-Occluding Mask Head for predicting visible segmentation with occlusion awareness, (ii) Shape-Prior Amodal Mask Head for predicting amodal and occluded masks, and (iii) Category-Specific Shape Prior Retriever aims to provide shape prior knowledge. Comprehensive experiments and extensive ablation studies across various AIS benchmarks demonstrate the effectiveness of our ShapeFormer. The code is available at: https://github.com/UARK-AICV/ShapeFormer

* Accepted to IJCNN 2024

Via

Access Paper or Ask Questions

On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification

Mar 19, 2024

Simon Klüttermann, Jérôme Rutinowski, Anh Nguyen, Britta Grimme, Moritz Roidl, Emmanuel Müller

Figure 1 for On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification

Figure 2 for On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification

Figure 3 for On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification

Figure 4 for On the Effectiveness of Heterogeneous Ensemble Methods for Re-identification

Abstract:In this contribution, we introduce a novel ensemble method for the re-identification of industrial entities, using images of chipwood pallets and galvanized metal plates as dataset examples. Our algorithms replace commonly used, complex siamese neural networks with an ensemble of simplified, rudimentary models, providing wider applicability, especially in hardware-restricted scenarios. Each ensemble sub-model uses different types of extracted features of the given data as its input, allowing for the creation of effective ensembles in a fraction of the training duration needed for more complex state-of-the-art models. We reach state-of-the-art performance at our task, with a Rank-1 accuracy of over 77% and a Rank-10 accuracy of over 99%, and introduce five distinct feature extraction approaches, and study their combination using different ensemble methods.

Via

Access Paper or Ask Questions

PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

Mar 08, 2024

Thang M. Pham, Peijie Chen, Tin Nguyen, Seunghyun Yoon, Trung Bui, Anh Nguyen

Figure 1 for PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

Figure 2 for PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

Figure 3 for PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

Figure 4 for PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

Abstract:CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder. That is, CLIP performs poorly on new classes or the classes whose names rarely appear on the Internet (e.g., scientific names of birds). For fine-grained classification, we propose PEEB - an explainable and editable classifier to (1) express the class name into a set of pre-defined text descriptors that describe the visual parts of that class; and (2) match the embeddings of the detected parts to their textual descriptors in each class to compute a logit score for classification. In a zero-shot setting where the class names are unknown, PEEB outperforms CLIP by a large margin (~10x in accuracy). Compared to part-based classifiers, PEEB is not only the state-of-the-art on the supervised-learning setting (88.80% accuracy) but also the first to enable users to edit the class definitions to form a new classifier without retraining. Compared to concept bottleneck models, PEEB is also the state-of-the-art in both zero-shot and supervised learning settings.

* Under review

Via

Access Paper or Ask Questions

Autonomous Catheterization with Open-source Simulator and Expert Trajectory

Jan 20, 2024

Tudor Jianu, Baoru Huang, Tuan Vo, Minh Nhat Vu, Jingxuan Kang, Hoan Nguyen, Olatunji Omisore, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen

Figure 1 for Autonomous Catheterization with Open-source Simulator and Expert Trajectory

Figure 2 for Autonomous Catheterization with Open-source Simulator and Expert Trajectory

Figure 3 for Autonomous Catheterization with Open-source Simulator and Expert Trajectory

Figure 4 for Autonomous Catheterization with Open-source Simulator and Expert Trajectory

Abstract:Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, the acquisition of large-scale datasets for training machine learning algorithms with endovascular robots is usually infeasible due to expensive medical procedures. In this chapter, we introduce CathSim, the first open-source simulator for endovascular intervention to address these limitations. CathSim emphasizes real-time performance to enable rapid development and testing of learning algorithms. We validate CathSim against the real robot and show that our simulator can successfully mimic the behavior of the real robot. Based on CathSim, we develop a multimodal expert navigation network and demonstrate its effectiveness in downstream endovascular navigation tasks. The intensive experimental results suggest that CathSim has the potential to significantly accelerate research in the autonomous catheterization field. Our project is publicly available at https://github.com/airvlab/cathsim.

* Code: https://github.com/airvlab/cathsim

Via

Access Paper or Ask Questions

WAVER: Writing-style Agnostic Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge

Dec 27, 2023

Huy Le, Tung Kieu, Anh Nguyen, Ngan Le

Abstract:Text-video retrieval, a prominent sub-field within the domain of multimodal information retrieval, has witnessed remarkable growth in recent years. However, existing methods assume video scenes are consistent with unbiased descriptions. These limitations fail to align with real-world scenarios since descriptions can be influenced by annotator biases, diverse writing styles, and varying textual perspectives. To overcome the aforementioned problems, we introduce WAVER, a cross-domain knowledge distillation framework via vision-language models through open-vocabulary knowledge designed to tackle the challenge of handling different writing styles in video descriptions. WAVER capitalizes on the open-vocabulary properties that lie in pre-trained vision-language models and employs an implicit knowledge distillation approach to transfer text-based knowledge from a teacher model to a vision-based student. Empirical studies conducted across four standard benchmark datasets, encompassing various settings, provide compelling evidence that WAVER can achieve state-of-the-art performance in text-video retrieval task while handling writing-style variations.

* Accepted to ICASSP 2024

Via

Access Paper or Ask Questions

Leveraging Habitat Information for Fine-grained Bird Identification

Dec 22, 2023

Tin Nguyen, Anh Nguyen

Abstract:Traditional bird classifiers mostly rely on the visual characteristics of birds. Some prior works even train classifiers to be invariant to the background, completely discarding the living environment of birds. Instead, we are the first to explore integrating habitat information, one of the four major cues for identifying birds by ornithologists, into modern bird classifiers. We focus on two leading model types: (1) CNNs and ViTs trained on the downstream bird datasets; and (2) original, multi-modal CLIP. Training CNNs and ViTs with habitat-augmented data results in an improvement of up to +0.83 and +0.23 points on NABirds and CUB-200, respectively. Similarly, adding habitat descriptors to the prompts for CLIP yields a substantial accuracy boost of up to +0.99 and +1.1 points on NABirds and CUB-200, respectively. We find consistent accuracy improvement after integrating habitat features into the image augmentation process and into the textual descriptors of vision-language CLIP classifiers. Code is available at: https://anonymous.4open.science/r/reasoning-8B7E/.

Via

Access Paper or Ask Questions

TinyGSM: achieving >80% on GSM8k with small language models

Dec 14, 2023

Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, Yi Zhang

Figure 1 for TinyGSM: achieving >80% on GSM8k with small language models

Figure 2 for TinyGSM: achieving >80% on GSM8k with small language models

Figure 3 for TinyGSM: achieving >80% on GSM8k with small language models

Figure 4 for TinyGSM: achieving >80% on GSM8k with small language models

Abstract:Small-scale models offer various computational advantages, and yet to which extent size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break the 80\% barrier on the GSM8K benchmark remains to be 34B. Our work studies how high-quality datasets may be the key for small language models to acquire mathematical reasoning. We introduce \texttt{TinyGSM}, a synthetic dataset of 12.3M grade school math problems paired with Python solutions, generated fully by GPT-3.5. After finetuning on \texttt{TinyGSM}, we find that a duo of a 1.3B generation model and a 1.3B verifier model can achieve 81.5\% accuracy, outperforming existing models that are orders of magnitude larger. This also rivals the performance of the GPT-3.5 ``teacher'' model (77.4\%), from which our model's training data is generated. Our approach is simple and has two key components: 1) the high-quality dataset \texttt{TinyGSM}, 2) the use of a verifier, which selects the final outputs from multiple candidate generations.

Via

Access Paper or Ask Questions