Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haoliang Li

Nanyang Technological University, Singapore

Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Jun 14, 2024

Hui Liu, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li

Figure 1 for Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Figure 2 for Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Figure 3 for Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Figure 4 for Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Abstract:Large Language Models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities from few-shot demonstration exemplars. While recent learning-based demonstration selection methods have proven beneficial to ICL by choosing more useful exemplars, their underlying mechanisms are opaque, hindering efforts to address limitations such as high training costs and poor generalization across tasks. These methods generally assume the selection process captures similarities between the exemplar and the target instance, however, it remains unknown what kinds of similarities are captured and vital to performing ICL. To dive into this question, we analyze the working mechanisms of the learning-based demonstration selection methods and empirically identify two important factors related to similarity measurement: 1) The ability to integrate different levels of task-agnostic text similarities between the input of exemplars and test cases enhances generalization power across different tasks. 2) Incorporating task-specific labels when measuring the similarities significantly improves the performance on each specific task. We validate these two findings through extensive quantitative and qualitative analyses across ten datasets and various LLMs. Based on our findings, we introduce two effective yet simplified exemplar selection methods catering to task-agnostic and task-specific demands, eliminating the costly LLM inference overhead.

Via

Access Paper or Ask Questions

Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

May 28, 2024

Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang(+1 more)

Figure 1 for Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

Figure 2 for Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

Figure 3 for Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

Figure 4 for Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

Abstract:The advancement of artificial intelligence (AI) for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often struggle with flexibility for partially annotated datasets and extensibility for new classes due to limitations in the one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework enabling a single model, termed Universal Model, to deal with multiple public datasets and adapt to new classes (e.g., organs/tumors). Firstly, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models, enriching semantic encoding compared with one-hot encoding. Secondly, the conventional output layers are replaced with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors and ease the addition of new classes. We train our Universal Model on 3,410 CT volumes assembled from 14 publicly available datasets and then test it on 6,173 CT volumes from four external datasets. Universal Model achieves first place on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model exhibits remarkable computational efficiency (6x faster than other dataset-specific models), demonstrates strong generalization across different hospitals, transfers well to numerous downstream tasks, and more importantly, facilitates the extensibility to new classes while alleviating the catastrophic forgetting of previously learned classes. Codes, models, and datasets are available at https://github.com/ljwztc/CLIP-Driven-Universal-Model

* Accepted to Medical Image Analysis

Via

Access Paper or Ask Questions

Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method

May 07, 2024

Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, Haoliang Li

Abstract:The generative model has made significant advancements in the creation of realistic videos, which causes security issues. However, this emerging risk has not been adequately addressed due to the absence of a benchmark dataset for AI-generated videos. In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms with various semantic contents. Besides, typical video lossy operations over network transmission are adopted to generate degraded samples. Then, by analyzing local and global temporal defects of current AI-generated videos, a novel detection framework by adaptively learning local motion information and global appearance variation is constructed to expose fake videos. Finally, experiments are conducted to evaluate the generalization and robustness of different spatial and temporal domain detection methods, where the results can serve as the baseline and demonstrate the research challenge for future studies.

Via

Access Paper or Ask Questions

Gradient-Congruity Guided Federated Sparse Training

May 02, 2024

Chris Xing Tian, Yibing Liu, Haoliang Li, Ray C. C. Cheung, Shiqi Wang

Figure 1 for Gradient-Congruity Guided Federated Sparse Training

Figure 2 for Gradient-Congruity Guided Federated Sparse Training

Figure 3 for Gradient-Congruity Guided Federated Sparse Training

Figure 4 for Gradient-Congruity Guided Federated Sparse Training

Abstract:Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges such as high computational and communication costs regarding resource-constrained devices, and poor generalization performance due to the heterogeneity of data across edge clients and the presence of out-of-distribution data. In this paper, we propose the Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into federated learning framework to address these issues. Our method leverages the idea that the neurons, in which the associated gradients with conflicting directions with respect to the global model contain irrelevant or less generalized information for other clients, and could be pruned during the sparse training process. Conversely, the neurons where the associated gradients with consistent directions could be grown in a higher priority. In this way, FedSGC can greatly reduce the local computation and communication overheads while, at the same time, enhancing the generalization abilities of FL. We evaluate our method on challenging non-i.i.d settings and show that it achieves competitive accuracy with state-of-the-art FL methods across various scenarios while minimizing computation and communication costs.

Via

Access Paper or Ask Questions

MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection

Apr 12, 2024

Chenqi Kong, Anwei Luo, Song Xia, Yi Yu, Haoliang Li, Alex C. Kot

Abstract:Deepfakes have recently raised significant trust issues and security concerns among the public. Compared to CNN face forgery detectors, ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance. However, these approaches still exhibit the following limitations: (1). Fully fine-tuning ViT-based models from ImageNet weights demands substantial computational and storage resources; (2). ViT-based methods struggle to capture local forgery clues, leading to model bias and limited generalizability. To tackle these challenges, this work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach. MoE-FFD only updates lightweight Low-Rank Adaptation (LoRA) and Adapter layers while keeping the ViT backbone frozen, thereby achieving parameter-efficient training. Moreover, MoE-FFD leverages the expressivity of transformers and local priors of CNNs to simultaneously extract global and local forgery clues. Additionally, novel MoE modules are designed to scale the model's capacity and select optimal forgery experts, further enhancing forgery detection performance. The proposed MoE learning scheme can be seamlessly adapted to various transformer backbones in a plug-and-play manner. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art face forgery detection performance with reduced parameter overhead. The code will be released upon acceptance.

Via

Access Paper or Ask Questions

Log Neural Controlled Differential Equations: The Lie Brackets Make a Difference

Feb 28, 2024

Benjamin Walker, Andrew D. McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, Terry Lyons

Abstract:The vector field of a controlled differential equation (CDE) describes the relationship between a control path and the evolution of a solution path. Neural CDEs (NCDEs) treat time series data as observations from a control path, parameterise a CDE's vector field using a neural network, and use the solution path as a continuously evolving hidden state. As their formulation makes them robust to irregular sampling rates, NCDEs are a powerful approach for modelling real-world data. Building on neural rough differential equations (NRDEs), we introduce Log-NCDEs, a novel and effective method for training NCDEs. The core component of Log-NCDEs is the Log-ODE method, a tool from the study of rough paths for approximating a CDE's solution. On a range of multivariate time series classification benchmarks, Log-NCDEs are shown to achieve a higher average test set accuracy than NCDEs, NRDEs, and two state-of-the-art models, S5 and the linear recurrent unit.

* 8 pages, 4 figures

Via

Access Paper or Ask Questions

TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection

Feb 12, 2024

Hui Liu, Wenya Wang, Haoru Li, Haoliang Li

Figure 1 for TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection

Figure 2 for TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection

Figure 3 for TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection

Figure 4 for TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection

Abstract:The proliferation of fake news has emerged as a severe societal problem, raising significant interest from industry and academia. While existing deep-learning based methods have made progress in detecting fake news accurately, their reliability may be compromised caused by the non-transparent reasoning processes, poor generalization abilities and inherent risks of integration with large language models (LLMs). To address this challenge, we propose {\methodname}, a novel framework for trustworthy fake news detection that prioritizes explainability, generalizability and controllability of models. This is achieved via a dual-system framework that integrates cognition and decision systems, adhering to the principles above. The cognition system harnesses human expertise to generate logical predicates, which guide LLMs in generating human-readable logic atoms. Meanwhile, the decision system deduces generalizable logic rules to aggregate these atoms, enabling the identification of the truthfulness of the input news across diverse domains and enhancing transparency in the decision-making process. Finally, we present comprehensive evaluation results on four datasets, demonstrating the feasibility and trustworthiness of our proposed framework. Our implementation is available at \url{https://github.com/less-and-less-bugs/Trust_TELLER}.

* 28 pages, 2 figures, 16 tables

Via

Access Paper or Ask Questions

Domain Generalization with Small Data

Feb 09, 2024

Kecheng Chen, Elena Gal, Hong Yan, Haoliang Li

Figure 1 for Domain Generalization with Small Data

Figure 2 for Domain Generalization with Small Data

Figure 3 for Domain Generalization with Small Data

Figure 4 for Domain Generalization with Small Data

Abstract:In this work, we propose to tackle the problem of domain generalization in the context of \textit{insufficient samples}. Instead of extracting latent feature embeddings based on deterministic models, we propose to learn a domain-invariant representation based on the probabilistic framework by mapping each data point into probabilistic embeddings. Specifically, we first extend empirical maximum mean discrepancy (MMD) to a novel probabilistic MMD that can measure the discrepancy between mixture distributions (i.e., source domains) consisting of a series of latent distributions rather than latent points. Moreover, instead of imposing the contrastive semantic alignment (CSA) loss based on pairs of latent points, a novel probabilistic CSA loss encourages positive probabilistic embedding pairs to be closer while pulling other negative ones apart. Benefiting from the learned representation captured by probabilistic models, our proposed method can marriage the measurement on the \textit{distribution over distributions} (i.e., the global perspective alignment) and the distribution-based contrastive semantic alignment (i.e., the local perspective alignment). Extensive experimental results on three challenging medical datasets show the effectiveness of our proposed method in the context of insufficient data compared with state-of-the-art methods.

* This paper has been accepted by International Journal of Computer Vision

Via

Access Paper or Ask Questions

Target-agnostic Source-free Domain Adaptation for Regression Tasks

Dec 01, 2023

Tianlang He, Zhiqiu Xia, Jierun Chen, Haoliang Li, S. -H. Gary Chan

Figure 1 for Target-agnostic Source-free Domain Adaptation for Regression Tasks

Figure 2 for Target-agnostic Source-free Domain Adaptation for Regression Tasks

Figure 3 for Target-agnostic Source-free Domain Adaptation for Regression Tasks

Figure 4 for Target-agnostic Source-free Domain Adaptation for Regression Tasks

Abstract:Unsupervised domain adaptation (UDA) seeks to bridge the domain gap between the target and source using unlabeled target data. Source-free UDA removes the requirement for labeled source data at the target to preserve data privacy and storage. However, work on source-free UDA assumes knowledge of domain gap distribution, and hence is limited to either target-aware or classification task. To overcome it, we propose TASFAR, a novel target-agnostic source-free domain adaptation approach for regression tasks. Using prediction confidence, TASFAR estimates a label density map as the target label distribution, which is then used to calibrate the source model on the target domain. We have conducted extensive experiments on four regression tasks with various domain gaps, namely, pedestrian dead reckoning for different users, image-based people counting in different scenes, housing-price prediction at different districts, and taxi-trip duration prediction from different departure points. TASFAR is shown to substantially outperform the state-of-the-art source-free UDA approaches by averagely reducing 22% errors for the four tasks and achieve notably comparable accuracy as source-based UDA without using source data.

* Accepted by ICDE 2024

Via

Access Paper or Ask Questions

Robust Domain Misinformation Detection via Multi-modal Feature Alignment

Nov 24, 2023

Hui Liu, Wenya Wang, Hao Sun, Anderson Rocha, Haoliang Li

Figure 1 for Robust Domain Misinformation Detection via Multi-modal Feature Alignment

Figure 2 for Robust Domain Misinformation Detection via Multi-modal Feature Alignment

Figure 3 for Robust Domain Misinformation Detection via Multi-modal Feature Alignment

Figure 4 for Robust Domain Misinformation Detection via Multi-modal Feature Alignment

Abstract:Social media misinformation harms individuals and societies and is potentialized by fast-growing multi-modal content (i.e., texts and images), which accounts for higher "credibility" than text-only news pieces. Although existing supervised misinformation detection methods have obtained acceptable performances in key setups, they may require large amounts of labeled data from various events, which can be time-consuming and tedious. In turn, directly training a model by leveraging a publicly available dataset may fail to generalize due to domain shifts between the training data (a.k.a. source domains) and the data from target domains. Most prior work on domain shift focuses on a single modality (e.g., text modality) and ignores the scenario where sufficient unlabeled target domain data may not be readily available in an early stage. The lack of data often happens due to the dynamic propagation trend (i.e., the number of posts related to fake news increases slowly before catching the public attention). We propose a novel robust domain and cross-modal approach (\textbf{RDCM}) for multi-modal misinformation detection. It reduces the domain shift by aligning the joint distribution of textual and visual modalities through an inter-domain alignment module and bridges the semantic gap between both modalities through a cross-modality alignment module. We also propose a framework that simultaneously considers application scenarios of domain generalization (in which the target domain data is unavailable) and domain adaptation (in which unlabeled target domain data is available). Evaluation results on two public multi-modal misinformation detection datasets (Pheme and Twitter Datasets) evince the superiority of the proposed model. The formal implementation of this paper can be found in this link: https://github.com/less-and-less-bugs/RDCM

* Accepted by TIFS 2023

Via

Access Paper or Ask Questions