Jiaqi Ma

A Metadata-Driven Approach to Understand Graph Neural Networks

Oct 30, 2023
Ting Wei Li, Qiaozhu Mei, Jiaqi Ma

Graph Neural Networks (GNNs) have achieved remarkable success in various applications, but their performance can be sensitive to specific data properties of the graph datasets they operate on. Current literature on understanding the limitations of GNNs has primarily employed a $\textit{model-driven}$ approach that leverages heuristics and domain knowledge from network science or graph theory to model GNN behaviors, a process that is time-consuming and highly subjective. In this work, we propose a $\textit{metadata-driven}$ approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. We perform a multivariate sparse regression analysis on the metadata derived from benchmarking GNN performance across diverse datasets, yielding a set of salient data properties. To validate the effectiveness of our data-driven approach, we focus on one identified data property, the degree distribution, and investigate how this property influences GNN performance through theoretical analysis and controlled experiments. Our theoretical findings reveal that datasets with a more balanced degree distribution exhibit better linear separability of node representations and thus better GNN performance. We also conduct controlled experiments using synthetic datasets with varying degree distributions, and the results align well with our theoretical findings. Collectively, the theoretical analysis and controlled experiments verify that the proposed metadata-driven approach is effective in identifying critical data properties for GNNs.
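The key analytical step here, a multivariate sparse regression from data properties to GNN performance, can be sketched in a few lines. The following is a minimal illustration with synthetic stand-in metadata, not the authors' actual pipeline; the number of datasets, the property matrix, and the regularization strength are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical metadata: 40 benchmark datasets x 8 graph data properties
# (e.g., degree-distribution statistics, density, homophily).
X = rng.normal(size=(40, 8))
# Synthetic stand-in for GNN test accuracy, driven by two of the properties.
y = 0.7 * X[:, 0] - 0.4 * X[:, 3] + rng.normal(scale=0.1, size=40)

# Standardize so the sparse coefficients are comparable across properties.
X_std = StandardScaler().fit_transform(X)

# The L1 penalty drives non-salient coefficients to exactly zero.
model = Lasso(alpha=0.05).fit(X_std, y)
salient = [i for i, w in enumerate(model.coef_) if abs(w) > 1e-6]
print("salient property indices:", salient)
```

The columns with surviving nonzero coefficients play the role of the "salient data properties" the abstract refers to.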


Investigating the Fairness of Large Language Models for Predictions on Tabular Data

Oct 23, 2023
Yanchen Liu, Srishti Gautam, Jiaqi Ma, Himabindu Lakkaraju

Recent literature has suggested the potential of using large language models (LLMs) to make predictions for tabular tasks. However, LLMs have been shown to exhibit harmful social biases that reflect the stereotypes and inequalities present in society. Given this, as well as the widespread use of tabular data in many high-stakes applications, it is imperative to explore the following questions: what sources of information do LLMs draw upon when making predictions for tabular tasks; whether and to what extent LLM predictions for tabular tasks are influenced by social biases and stereotypes; and what the consequential implications are for fairness. Through a series of experiments, we delve into these questions and show that LLMs tend to inherit social biases from their training data, which significantly impact their fairness in tabular prediction tasks. Furthermore, our investigations show that, in the context of bias mitigation, although in-context learning and fine-tuning have a moderate effect, the fairness gap between different subgroups remains larger than in traditional machine learning models such as Random Forests and shallow neural networks. This observation emphasizes that the social biases are inherent to the LLMs themselves and inherited from their pre-training corpus, not only from the downstream task datasets. In addition, we demonstrate that label-flipping of in-context examples can significantly reduce biases, further highlighting the presence of inherent bias within LLMs.
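As one concrete example of the subgroup fairness gaps discussed above, the sketch below computes a demographic parity gap between two groups from binary predictions. The predictions, group labels, and choice of metric are hypothetical; the paper's exact metrics and datasets are not reproduced here.

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in positive-prediction rates between two groups."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

# Toy predictions from an LLM-based tabular classifier vs. a baseline model.
y_pred_llm = np.array([1, 1, 0, 1, 0, 1, 1, 0])
y_pred_rf = np.array([1, 0, 0, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # sensitive attribute

print("LLM gap:", demographic_parity_gap(y_pred_llm, group))
print("RF gap: ", demographic_parity_gap(y_pred_rf, group))
```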


Optimizing the Placement of Roadside LiDARs for Autonomous Driving

Oct 11, 2023
Wentao Jiang, Hao Xiang, Xinyu Cai, Runsheng Xu, Jiaqi Ma, Yikang Li, Gim Hee Lee, Si Liu

Multi-agent cooperative perception is an increasingly popular topic in the field of autonomous driving, where roadside LiDARs play an essential role. However, how to optimize the placement of roadside LiDARs is a crucial but often overlooked problem. This paper proposes an approach to optimize the placement of roadside LiDARs by selecting positions within the scene that yield better perception performance. To efficiently obtain the best combination of locations, we propose a greedy algorithm based on perceptual gain, which sequentially selects the location that maximizes the perceptual gain. We define perceptual gain as the increase in perceptual capability when a new LiDAR is placed. To estimate this perception capability, we propose a perception predictor that learns to evaluate LiDAR placement using only a single point cloud frame. A dataset named Roadside-Opt is created using the CARLA simulator to facilitate research on the roadside LiDAR placement problem.
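The greedy loop described in this abstract has a simple generic form, sketched below. The perceptual-gain oracle is a placeholder function: in the paper it is a learned perception predictor evaluated from a single point cloud frame, which is abstracted here as an arbitrary scoring function over placements.

```python
from typing import Callable, FrozenSet, List

def greedy_lidar_placement(
    candidates: List[int],
    budget: int,
    perception_score: Callable[[FrozenSet[int]], float],
) -> List[int]:
    """Sequentially pick the candidate with the largest perceptual gain."""
    selected: List[int] = []
    for _ in range(budget):
        base = perception_score(frozenset(selected))
        # Perceptual gain of a candidate = score increase from adding it.
        best = max(
            (c for c in candidates if c not in selected),
            key=lambda c: perception_score(frozenset(selected + [c])) - base,
        )
        selected.append(best)
    return selected

# Toy stand-in oracle: each position contributes a fixed coverage value.
score = lambda s: sum(1.0 / (1 + i) for i in s)
print(greedy_lidar_placement(list(range(10)), budget=3, perception_score=score))
```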


Can LLMs Effectively Leverage Graph Structural Information: When and Why

Sep 29, 2023
Jin Huang, Xingjian Zhang, Qiaozhu Mei, Jiaqi Ma

This paper studies Large Language Models (LLMs) augmented with structured data, particularly graphs, a crucial data modality that remains underexplored in the LLM literature. We aim to understand when and why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs on node classification tasks with textual features. To address the ``when'' question, we examine a variety of prompting methods for encoding structural information, in settings where textual node features are either rich or scarce. To address the ``why'' question, we probe two potential contributing factors to LLM performance: data leakage and homophily. Our exploration of these questions reveals that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence that the performance of LLMs is significantly attributable to data leakage; and (iii) the performance of LLMs on a target node is strongly and positively related to the node's local homophily ratio\footnote{Codes and datasets are at: \url{https://github.com/TRAIS-Lab/LLM-Structured-Data}}.
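Finding (iii) hinges on the local homophily ratio, the fraction of a node's neighbors that share its label. Below is a minimal sketch of that quantity, assuming a simple adjacency-list graph representation chosen for illustration.

```python
from typing import Dict, List

def local_homophily(node: int, adj: Dict[int, List[int]], labels: List[int]) -> float:
    """Fraction of a node's neighbors that share its label."""
    neighbors = adj[node]
    if not neighbors:
        return 0.0
    same = sum(labels[n] == labels[node] for n in neighbors)
    return same / len(neighbors)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = [0, 0, 0, 1]
print(local_homophily(2, adj, labels))  # 2 of 3 neighbors share label 0 -> 0.667
```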


Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception

Aug 31, 2023
Si Liu, Chen Gao, Yuan Chen, Xingyu Peng, Xianghao Kong, Kun Wang, Runsheng Xu, Wentao Jiang, Hao Xiang, Jiaqi Ma, Miao Wang


Vehicle-to-everything (V2X) autonomous driving opens up a promising direction for developing a new generation of intelligent transportation systems. Collaborative perception (CP), an essential component for achieving V2X, can overcome the inherent limitations of individual perception, including occlusion and limited long-range perception. In this survey, we provide a comprehensive review of CP methods for V2X scenarios, offering the community a deep and thorough understanding of the field. Specifically, we first introduce the architecture and workflow of typical V2X systems, which affords a broader perspective on the entire V2X system and the role of CP within it. We then summarize and analyze existing V2X perception datasets and CP methods, reviewing numerous CP methods from various crucial perspectives, including collaboration stages, roadside sensor placement, latency compensation, the performance-bandwidth trade-off, attack/defense, pose alignment, and more. Moreover, we conduct extensive experimental analyses to compare and examine current CP methods, revealing some essential and previously unexplored insights. Specifically, we analyze the performance of different methods under different bandwidths, providing deep insight into the performance-bandwidth trade-off, and we examine methods under different LiDAR ranges. To study model robustness, we further investigate the effects of various simulated real-world noises on the performance of different CP methods, covering communication latency, lossy communication, localization errors, and mixed noises. In addition, we look into the sim-to-real generalization ability of existing CP methods. Finally, we thoroughly discuss issues and challenges, highlighting promising directions for future work. Our code for the experimental analysis will be made public at https://github.com/memberRE/Collaborative-Perception.
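As an illustration of one of the simulated noises examined in the survey's robustness experiments, the sketch below perturbs agent poses with Gaussian localization error before collaborative fusion. The pose format and noise scales are assumptions, not values from the survey.

```python
import numpy as np

def add_localization_noise(poses: np.ndarray, pos_std: float = 0.2,
                           yaw_std_deg: float = 0.2) -> np.ndarray:
    """poses: (N, 3) array of (x, y, yaw_deg) for N collaborating agents."""
    noisy = poses.copy()
    noisy[:, :2] += np.random.normal(scale=pos_std, size=(len(poses), 2))
    noisy[:, 2] += np.random.normal(scale=yaw_std_deg, size=len(poses))
    return noisy

poses = np.array([[10.0, 5.0, 90.0], [30.0, -2.0, 45.0]])
print(add_localization_noise(poses))
```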

* 19 pages 

Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage

Aug 08, 2023
Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju


Machine learning models are increasingly utilized across impactful domains to predict individual outcomes. As such, many models provide algorithmic recourse to individuals who receive negative outcomes. However, recourse can be leveraged by adversaries to disclose private information. This work presents the first attempt at mitigating such attacks. We present two novel methods for generating differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR). Using logistic regression classifiers and real-world and synthetic datasets, we find that DPM and LR perform well at reducing what an adversary can infer, especially at low false positive rates (FPR). When the training dataset is sufficiently large, our novel LR method is particularly successful at preventing privacy leakage while maintaining model and recourse accuracy.
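While the paper's exact LR recipe is not reproduced here, its core ingredient is the standard Laplace mechanism: noise with scale proportional to sensitivity over the privacy budget epsilon. Below is a minimal sketch, where the sensitivity value and the injection point in the recourse pipeline are assumptions.

```python
import numpy as np

def laplace_mechanism(value: np.ndarray, sensitivity: float, epsilon: float) -> np.ndarray:
    """Standard Laplace mechanism: noise scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return value + np.random.laplace(loc=0.0, scale=scale, size=value.shape)

# Example: privatize a recourse vector (feature changes suggested to a user).
recourse = np.array([0.8, -0.3, 0.0, 1.2])
private_recourse = laplace_mechanism(recourse, sensitivity=1.0, epsilon=0.5)
print(private_recourse)
```

Smaller epsilon means stronger privacy but noisier (less accurate) recourse, which is the trade-off the paper evaluates.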

* Proceedings of The Second Workshop on New Frontiers in Adversarial Machine Learning (AdvML-Frontiers @ ICML 2023) 

Fair Machine Unlearning: Data Removal while Mitigating Disparities

Jul 27, 2023
Alex Oesterling, Jiaqi Ma, Flavio P. Calmon, Hima Lakkaraju


As public consciousness regarding the collection and use of personal information by corporations grows, it is of increasing importance that consumers be active participants in the curation of corporate datasets. In light of this, data governance frameworks such as the General Data Protection Regulation (GDPR) have outlined the right to be forgotten as a key principle allowing individuals to request that their personal data be deleted from the databases and models used by organizations. To achieve forgetting in practice, several machine unlearning methods have been proposed to address the computational inefficiency of retraining a model from scratch for each unlearning request. While these methods are efficient online alternatives to retraining, it is unclear how they impact other properties critical to real-world applications, such as fairness. In this work, we propose the first fair machine unlearning method that can provably and efficiently unlearn data instances while preserving group fairness. We derive theoretical results demonstrating these guarantees, and extensive experiments on real-world datasets highlight the efficacy of our method at unlearning data instances while preserving fairness.
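The abstract does not spell out the unlearning update, so the sketch below shows a common general pattern: an influence-function-style Newton step that approximately removes one point from a regularized logistic regression model. It should not be read as the paper's specific method; the model form, loss, and step derivation are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_unlearn(theta, X, y, remove_idx, lam=1e-2):
    """One Newton step approximating removal of a training point from an
    L2-regularized logistic regression model (first-order influence update)."""
    xi, yi = X[remove_idx], y[remove_idx]
    # Gradient of the removed point's loss at the current parameters.
    g = (sigmoid(xi @ theta) - yi) * xi
    # Hessian of the average regularized objective.
    p = sigmoid(X @ theta)
    H = (X.T * (p * (1 - p))) @ X / len(X) + lam * np.eye(X.shape[1])
    # Removing the point shifts parameters by +H^{-1} g / n, to first order.
    return theta + np.linalg.solve(H, g) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100).astype(float)
theta = np.zeros(5)
print(newton_unlearn(theta, X, y, remove_idx=7))
```

After such an update, one would re-evaluate a group fairness metric to confirm disparities were not amplified, which is the property the paper's method is designed to preserve.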

* 27 pages, 3 figures, accepted to ICML 2023 DMLR Workshop 

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

Jul 25, 2023
Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju


Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks. Understanding why CoT prompting is effective, and whether the improvement reflects desired model behavior, is a critical prerequisite for responsible model deployment, yet little work has addressed this question. We address it by leveraging gradient-based feature attribution methods, which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importance they assign to particular input tokens. Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt compared to standard few-shot prompting, it increases the robustness of saliency scores to question perturbations and variations in model output.
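A minimal sketch of the underlying attribution machinery follows: gradient-based token saliency for a causal LLM via Hugging Face Transformers. The model choice ("gpt2" as a small stand-in), the prompt, and the plain gradient norm are assumptions; the paper's exact attribution variant and probed models are not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in for the open-source LLMs probed in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "Q: If I have 3 apples and buy 2 more, how many apples? A:"
ids = tok(prompt, return_tensors="pt").input_ids
# Work with embeddings directly so gradients flow back to the input tokens.
embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)

logits = model(inputs_embeds=embeds).logits
# Saliency: gradient of the top next-token logit w.r.t. each input embedding.
logits[0, -1].max().backward()
saliency = embeds.grad.norm(dim=-1).squeeze(0)
for t, s in zip(tok.convert_ids_to_tokens(ids[0]), saliency.tolist()):
    print(f"{t:>12s}  {s:.4f}")
```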

* Accepted to Workshop on Challenges in Deployable Generative AI at ICML 2023 

Domain Adaptation based Enhanced Detection for Autonomous Driving in Foggy and Rainy Weather

Jul 20, 2023
Jinlong Li, Runsheng Xu, Jin Ma, Qin Zou, Jiaqi Ma, Hongkai Yu


Object detection methods for autonomous driving that rely on supervised learning typically assume a consistent feature distribution between training and testing data; however, this assumption can fail under different weather conditions. Due to the domain gap, a detection model trained in clear weather may not perform well in foggy and rainy conditions. Overcoming these detection bottlenecks is a real challenge for autonomous vehicles deployed in the wild. To bridge the domain gap and improve object detection performance in foggy and rainy weather, this paper presents a novel framework for domain-adaptive object detection. Adaptations at both the image level and the object level are intended to minimize the differences in image style and object appearance between domains. Furthermore, to improve the model's performance on challenging examples, we introduce a novel adversarial gradient reversal layer that performs adversarial mining on difficult instances in addition to domain adaptation. We also propose generating an auxiliary domain through data augmentation to enforce a new domain-level metric regularization. Experimental results on a public V2V benchmark show a substantial improvement in object detection for foggy and rainy driving scenarios.
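The gradient reversal layer (GRL) that the adversarial component above builds on is a standard construct, sketched below in PyTorch; the paper's adversarial-mining variant on difficult instances is not reproduced here.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)  # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Negated, scaled gradient on the backward pass, so the feature
        # extractor is trained adversarially against the domain classifier.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd: float = 1.0):
    return GradReverse.apply(x, lambd)

feat = torch.randn(4, 16, requires_grad=True)
grad_reverse(feat, lambd=0.5).sum().backward()
print(feat.grad[0, :4])  # gradients are sign-flipped and scaled by 0.5
```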

* This version only changes the title of the paper 

S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality

Jul 18, 2023
Jinlong Li, Runsheng Xu, Xinyu Liu, Baolu Li, Qin Zou, Jiaqi Ma, Hongkai Yu


Due to the lack of real multi-agent data and the time-consuming nature of labeling, existing multi-agent cooperative perception algorithms usually rely on simulated sensor data for training and validation. However, perception performance degrades when these simulation-trained models are deployed in the real world, due to the significant domain gap between simulated and real data. In this paper, we propose the first simulation-to-reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Implementation Gap and the Feature Gap between simulated and real data. We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively mitigate the Implementation Gap, together with an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap. Our extensive experiments on the public multi-agent cooperative perception datasets OPV2V and V2V4Real demonstrate that the proposed S2R-ViT can effectively bridge the gap from simulation to reality and significantly outperform other methods on point cloud-based 3D object detection.
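To make the adversarial feature-adaptation idea concrete, the sketch below sets up separate discriminators in the spirit of the inter-agent and ego-agent design described above. Network sizes, the loss weighting, and the surrounding uncertainty-aware transformer are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """Classifies whether a feature vector came from simulated or real data."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feat):
        return self.net(feat)  # logit: simulated (0) vs. real (1)

ego_disc = DomainDiscriminator()
agent_disc = DomainDiscriminator()
bce = nn.BCEWithLogitsLoss()

sim_feat, real_feat = torch.randn(8, 64), torch.randn(8, 64)
# Each discriminator learns to tell simulated from real features; the
# perception backbone would be trained against them (e.g., via a gradient
# reversal layer as sketched earlier) to close the feature gap.
loss = bce(ego_disc(sim_feat), torch.zeros(8, 1)) + bce(ego_disc(real_feat), torch.ones(8, 1))
loss = loss + bce(agent_disc(sim_feat), torch.zeros(8, 1)) + bce(agent_disc(real_feat), torch.ones(8, 1))
print(loss.item())
```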

* corrected the compile error in Fig. 5 