Xin Wang

AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

Jun 04, 2024

Li Lin, Santosh, Xin Wang, Shu Hu

Abstract: AI-generated faces have enriched many areas of human life, such as entertainment, education, and art. However, they also pose risks of misuse. Detecting AI-generated faces has therefore become crucial, yet current detectors show biased performance across demographic groups. Such biases can be mitigated by designing algorithmic fairness methods, which usually require demographically annotated face datasets for model training. However, no existing dataset comprehensively covers both demographic attributes and diverse generative methods, which hinders the development of fair detectors for AI-generated faces. In this work, we introduce the AI-Face dataset, the first million-scale demographically annotated AI-generated face image dataset, which includes real faces, faces from deepfake videos, and faces generated by Generative Adversarial Networks and Diffusion Models. Based on this dataset, we conduct the first comprehensive fairness benchmark to assess various AI face detectors and provide insights and findings to promote the fair design of future AI face detectors. Our AI-Face dataset and benchmark code are publicly available at https://github.com/Purdue-M2/AI-Face-FairnessBench.
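The benchmark's actual fairness metrics are defined in the paper and its repository, not in this abstract. As a hypothetical illustration of what "biased performance across demographic groups" means in practice, the sketch below computes per-group accuracy of a binary real/fake detector and the largest gap between groups. The function names, the 0 = real / 1 = fake label convention, and the group names are assumptions made for this example only.

```python
from collections import defaultdict


def group_accuracy(labels, preds, groups):
    """Accuracy of a binary real (0) / fake (1) detector within each demographic group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(labels, preds, groups):
    """Largest accuracy difference between any two groups; 0.0 means parity on this metric."""
    acc = group_accuracy(labels, preds, groups)
    return max(acc.values()) - min(acc.values())


# Toy example with two hypothetical groups "A" and "B".
labels = [1, 1, 0, 0, 1, 0]
preds = [1, 1, 0, 0, 1, 1]
groups = ["A", "A", "A", "B", "B", "B"]
print(group_accuracy(labels, preds, groups))  # A: 1.0, B: about 0.667
print(max_accuracy_gap(labels, preds, groups))
```

A single accuracy gap is only one view of fairness; benchmarks of this kind usually also compare error rates (for example, false positives on real faces) per group, which the same grouping pattern can compute.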

Precise Analysis of Covariance Identifiability for Activity Detection in Grant-Free Random Access

Jun 03, 2024

Shengsong Luo, Junjie Ma, Chongbin Xu, Xin Wang

Abstract: We consider the identifiability issue of maximum-likelihood-based activity detection in massive-MIMO-based grant-free random access. Prior work by Chen et al. indicates that identifiability undergoes a phase transition for commonly used random signatures. In this paper, we provide an analytical characterization of the boundary of the phase transition curve. Our theoretical results agree well with numerical experiments.
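For orientation, here is a sketch of the covariance-based maximum-likelihood formulation commonly used in the grant-free activity-detection literature. The model assumptions are not stated in the abstract and are assumed here: N potential users with signatures a_n of length L stacked in A, M base-station antennas, channel matrix H with i.i.d. CN(0, 1) entries, noise W with i.i.d. CN(0, σ²) entries, and nonnegative large-scale/activity coefficients γ_n. The paper's exact notation and assumptions may differ.

```latex
\begin{aligned}
\mathbf{Y} &= \mathbf{A}\boldsymbol{\Gamma}^{1/2}\mathbf{H} + \mathbf{W} \in \mathbb{C}^{L\times M},
  \qquad \boldsymbol{\Gamma} = \operatorname{diag}(\gamma_1,\dots,\gamma_N),\ \gamma_n \ge 0,\\
\boldsymbol{\Sigma}(\boldsymbol{\gamma}) &= \sum_{n=1}^{N}\gamma_n\,\mathbf{a}_n\mathbf{a}_n^{\mathsf H} + \sigma^2\mathbf{I}_L,
  \qquad \hat{\boldsymbol{\Sigma}} = \tfrac{1}{M}\mathbf{Y}\mathbf{Y}^{\mathsf H},\\
\hat{\boldsymbol{\gamma}} &= \arg\min_{\boldsymbol{\gamma}\ge\mathbf{0}}\;
  \log\det\boldsymbol{\Sigma}(\boldsymbol{\gamma})
  + \operatorname{tr}\!\big(\boldsymbol{\Sigma}(\boldsymbol{\gamma})^{-1}\hat{\boldsymbol{\Sigma}}\big),\\
&\text{$\boldsymbol{\gamma}^{\circ}$ identifiable} \iff
  \Big[\textstyle\sum_{n}(\gamma_n-\gamma_n^{\circ})\,\mathbf{a}_n\mathbf{a}_n^{\mathsf H} = \mathbf{0},\ \boldsymbol{\gamma}\ge\mathbf{0}
  \;\Rightarrow\; \boldsymbol{\gamma}=\boldsymbol{\gamma}^{\circ}\Big].
\end{aligned}
```

The columns of Y are i.i.d. CN(0, Σ(γ°)) under this model, so as M grows the sample covariance approaches Σ(γ°), and the objective above is minimized exactly where Σ(γ) = Σ(γ°). Identifiability therefore reduces to the uniqueness condition in the last line; characterizing when it holds for random signatures is the subject of the paper and is not derived here.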

Balancing User Preferences by Social Networks: A Condition-Guided Social Recommendation Model for Mitigating Popularity Bias

May 27, 2024

Xin He, Wenqi Fan, Ruobing Wang, Yili Wang, Ying Wang, Shirui Pan, Xin Wang

Abstract: Social recommendation models weave social interactions into their design to provide uniquely personalized recommendation results for users. However, social networks not only amplify the popularity bias in recommendation models, leading to more frequent recommendation of popular items and fewer long-tail items, but also contain a substantial amount of redundant information that contributes essentially nothing to the model's performance. Existing social recommendation models fail to address popularity bias and the redundancy of social information, because they directly characterize social influence across the entire social network without targeted adjustments. In this paper, we propose a Condition-Guided Social Recommendation Model (CGSoRec) that mitigates the model's popularity bias by denoising the social network and adjusting the weights of users' social preferences. More specifically, CGSoRec first uses a Condition-Guided Social Denoising Model (CSD) to remove redundant relations from the social network so that users' social preferences for items are captured more precisely. Then, CGSoRec computes users' social preferences from the denoised social network and adjusts their weights so that they counteract the popularity bias present in the recommendation model. Finally, CGSoRec uses a Condition-Guided Diffusion Recommendation Model (CGD) that takes the adjusted social preferences as conditions to steer the recommendation results in a debiased direction. Comprehensive experiments on three real-world datasets demonstrate the effectiveness of the proposed method. The code is available at https://github.com/hexin5515/CGSoRec.

* 11 pages, 7 figures

Rethinking Independent Cross-Entropy Loss For Graph-Structured Data

May 27, 2024

Rui Miao, Kaixiong Zhou, Yili Wang, Ninghao Liu, Ying Wang, Xin Wang

Abstract: Graph neural networks (GNNs) have exhibited prominent performance in learning from graph-structured data. For the node classification task, traditional supervised learning assumes that node labels are i.i.d., simply sums the cross-entropy losses of the independent training nodes, and uses the average loss to optimize the GNN's weights. Unlike other data formats, however, nodes are naturally connected. It has been found that modeling node labels as independently distributed restricts GNNs' ability to generalize over the entire graph and to defend against adversarial attacks. In this work, we propose a new framework, termed joint-cluster supervised learning, which models the joint distribution of each node with its corresponding cluster. We learn the joint distribution of node and cluster labels conditioned on their representations, and train GNNs with the resulting joint loss. In this way, the data-label reference signals extracted from the local cluster explicitly strengthen the discrimination ability on the target node. Extensive experiments demonstrate that joint-cluster supervised learning effectively improves GNNs' node classification accuracy. Furthermore, benefiting from reference signals that may be free from malicious interference, our learning paradigm significantly protects node classification from being affected by adversarial attacks.
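To make the contrast concrete, the sketch below computes the standard objective the abstract describes (the average of independent per-node cross-entropy losses) next to a hypothetical joint variant: one softmax over all (node class, cluster class) pairs, scored at the observed pair. How the paper defines cluster labels and produces joint logits from node and cluster representations is not given in the abstract; the flattening scheme `node_class * num_classes + cluster_class` and the assumption that cluster labels share the node label space are illustrative choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def independent_ce(node_logits, labels):
    """Standard i.i.d. objective: mean of per-node cross-entropy losses."""
    losses = [-math.log(softmax(z)[y]) for z, y in zip(node_logits, labels)]
    return sum(losses) / len(losses)


def joint_ce(joint_logits, node_labels, cluster_labels, num_classes):
    """Hypothetical joint objective: each row holds num_classes**2 logits, one per
    (node class, cluster class) pair; the loss is the mean negative log-probability
    of the observed pair."""
    losses = []
    for z, y, c in zip(joint_logits, node_labels, cluster_labels):
        assert len(z) == num_classes * num_classes
        losses.append(-math.log(softmax(z)[y * num_classes + c]))
    return sum(losses) / len(losses)


logits = [[2.0, 0.5], [0.1, 1.2]]
print(independent_ce(logits, [0, 1]))
print(joint_ce([[0.3, 1.5, -0.2, 0.0]], [0], [1], num_classes=2))
```

The design point the sketch isolates is that the joint loss couples each node's prediction to a cluster-level label, whereas the independent loss scores every node in isolation.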

* ICML 2024
* 20 pages, 4 figures

Augmenting Textual Generation via Topology Aware Retrieval

May 27, 2024

Yu Wang, Nedim Lipka, Ruiyi Zhang, Alexa Siu, Yuying Zhao, Bo Ni, Xin Wang, Ryan Rossi, Tyler Derr

Abstract: Despite the impressive advancements of Large Language Models (LLMs) in generating text, they are often limited by the knowledge contained in the input and are prone to producing inaccurate or hallucinated content. To tackle these issues, Retrieval-Augmented Generation (RAG) is employed as an effective strategy to expand the available knowledge base and ground responses in reality by pulling additional texts from external databases. In real-world applications, texts are often linked through entities within a graph, such as citations between academic papers or comments in social networks. This paper exploits these topological relationships to guide the retrieval process in RAG. Specifically, we explore two kinds of topological connections: proximity-based, which focuses on closely connected nodes, and role-based, which considers nodes sharing similar subgraph structures. Our empirical research confirms their relevance to text relationships, leading us to develop a Topology-aware Retrieval-augmented Generation framework. This framework includes a retrieval module that selects texts based on their topological relationships and an aggregation module that integrates these texts into prompts to guide LLMs in text generation. We have curated established text-attributed networks and conducted comprehensive experiments to validate the effectiveness of this framework, demonstrating its potential to enhance RAG with topological awareness.
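The two kinds of topological connection can be illustrated with a small, hypothetical sketch: proximity-based candidates are nodes within k hops of the target (found by breadth-first search), and role-based candidates are nodes whose structure resembles the target's. Degree similarity stands in here as a deliberately crude role proxy; the paper's actual role measure, retrieval scoring, and prompt format are not described in the abstract, so every function name and the prompt template below are assumptions.

```python
from collections import deque


def proximity_candidates(adj, source, k):
    """Proximity-based retrieval: nodes within k hops of `source` (BFS), nearest first."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:  # do not expand beyond k hops
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted((v for v in dist if v != source), key=lambda v: (dist[v], v))


def role_candidates(adj, source, top_n):
    """Role-based retrieval with a crude structural proxy: rank other nodes by how close
    their degree is to the source's degree. `adj` must list every node as a key."""
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    others = [u for u in adj if u != source]
    others.sort(key=lambda u: (abs(deg[u] - deg[source]), u))
    return others[:top_n]


def build_prompt(texts, target, retrieved):
    """Aggregate retrieved texts into a plain prompt for an LLM (format is illustrative)."""
    context = "\n".join(f"- {texts[v]}" for v in retrieved)
    return f"Related texts:\n{context}\n\nWrite about: {texts[target]}"


# Toy undirected graph (symmetric adjacency) with one text per node.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
texts = {0: "paper A", 1: "paper B", 2: "paper C", 3: "paper D", 4: "paper E"}
# dict.fromkeys removes duplicates while keeping retrieval order.
retrieved = list(dict.fromkeys(proximity_candidates(adj, 0, k=1) + role_candidates(adj, 0, top_n=1)))
print(build_prompt(texts, 0, retrieved))
```

In a realistic system the role signature would capture richer subgraph structure than degree, but the split into a retrieval step followed by an aggregation step mirrors the two modules the abstract describes.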

Causal-Aware Graph Neural Architecture Search under Distribution Shifts

May 26, 2024

Peiwen Li, Xin Wang, Zeyang Zhang, Yijian Qin, Ziwei Zhang, Jialong Wang, Yang Li, Wenwu Zhu

Abstract: Graph NAS has emerged as a promising approach for autonomously designing GNN architectures by leveraging the correlations between graphs and architectures. Existing methods fail to generalize under the distribution shifts that are ubiquitous in real-world graph scenarios, mainly because the graph-architecture correlations they exploit may be spurious and vary across distributions. We propose to handle distribution shifts in the graph architecture search process by discovering and exploiting the causal relationship between graphs and architectures, so as to search for optimal architectures that generalize under distribution shifts. This problem remains unexplored and poses two challenges: how to discover a causal graph-architecture relationship that has stable predictive ability across distributions, and how to use the discovered relationship to handle distribution shifts when searching for generalized graph architectures. To address these challenges, we propose Causal-aware Graph Neural Architecture Search (CARNAS), which captures the causal graph-architecture relationship during the architecture search process and discovers generalized graph architectures under distribution shifts. Specifically, we propose Disentangled Causal Subgraph Identification to capture causal subgraphs that have stable predictive ability across distributions. Then, we propose Graph Embedding Intervention to intervene on causal subgraphs within the latent space, ensuring that these subgraphs encapsulate the features essential for prediction while excluding non-causal elements. Additionally, we propose Invariant Architecture Customization to reinforce the causal invariance of the causal subgraphs, which are used to tailor generalized graph architectures. Extensive experiments demonstrate that CARNAS achieves strong out-of-distribution generalization.

UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation

May 25, 2024

Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching, Hongtu Zhu, Xin Wang

Abstract: Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. However, manual segmentation is labor-intensive, time-consuming, and prone to variability, necessitating automated methods. Current machine learning approaches, while promising, face challenges such as overfitting, high computational demands, and the need for extensive annotated data. To address these issues, we propose UU-Mamba, a model that integrates U-Mamba with the Sharpness-Aware Minimization (SAM) optimizer and an uncertainty-aware loss function. SAM enhances generalization by finding flat minima in the loss landscape, mitigating overfitting. The uncertainty-aware loss combines region-based, distribution-based, and pixel-based losses, improving segmentation accuracy and robustness. Evaluated on the ACDC cardiac dataset, our method outperforms state-of-the-art models (TransUNet, Swin-Unet, nnUNet, nnFormer) in Dice Similarity Coefficient and Mean Squared Error, demonstrating its effectiveness for cardiac MRI segmentation.
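The abstract credits SAM with finding flat minima. A minimal sketch of the generic SAM update (not UU-Mamba's training code) shows how: first step to the approximate worst-case point inside an L2 ball of radius rho, then apply the gradient measured there to the original parameters. The toy quadratic loss, its curvatures, and the values lr=0.1 and rho=0.05 are illustrative assumptions.

```python
import math


def sam_step(params, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) update on a list of floats.

    1. Perturb toward the first-order worst-case point within an L2 ball of radius rho.
    2. Take the gradient there and apply it to the ORIGINAL parameters.
    """
    g = grad_fn(params)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # guard against a zero gradient
    eps = [rho * gi / norm for gi in g]
    g_sam = grad_fn([p + e for p, e in zip(params, eps)])
    return [p - lr * gs for p, gs in zip(params, g_sam)]


# Toy quadratic loss f(w) = 0.5 * sum(a_i * w_i^2) with hypothetical curvatures a.
curv = [1.0, 4.0]


def quad_loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(curv, w))


def quad_grad(w):
    return [a * wi for a, wi in zip(curv, w)]


w = [1.0, -2.0]
for _ in range(200):
    w = sam_step(w, quad_grad)
print(quad_loss(w))  # small; with a fixed rho, SAM tends to hover near the minimum
```

Each SAM step costs two gradient evaluations instead of one, which is the usual price paid for its preference for flat regions of the loss landscape.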

Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances

May 21, 2024

Hanlei Zhang, Hua Xu, Fei Long, Xin Wang, Kai Gao

Abstract: Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interaction. Existing methods are limited in leveraging nonverbal information to discern complex semantics in unsupervised scenarios. This paper introduces a novel unsupervised multimodal clustering method (UMC), a pioneering contribution to this field. UMC introduces a unique approach to constructing augmentation views for multimodal data, which are then used for pre-training to establish well-initialized representations for subsequent clustering. An innovative strategy dynamically selects high-quality samples, gauged by the density of each sample's nearest neighbors, as guidance for representation learning. In addition, UMC automatically determines the optimal value of the top-K parameter in each cluster to refine sample selection. Finally, both high- and low-quality samples are used to learn representations conducive to effective clustering. We build baselines on benchmark multimodal intent and dialogue act datasets. UMC shows remarkable improvements of 2-6% in clustering metric scores over state-of-the-art methods, marking the first successful endeavor in this domain. The complete code and data are available at https://github.com/thuiar/UMC.
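Density-based sample selection can be sketched as follows: within each cluster, score every member by the inverse of its mean distance to its k nearest neighbours and keep the densest fraction as "high-quality" samples. This is a hypothetical illustration, not UMC's code. In particular, UMC determines the top-K value automatically per cluster, whereas here `k` and `ratio` are fixed inputs chosen for the example.

```python
import math


def knn_density(points, k):
    """Density of each point: inverse of the mean distance to its k nearest neighbours."""
    dens = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (sum(dists[:k]) / k + 1e-12))
    return dens


def select_high_quality(points, assignments, k, ratio):
    """Per cluster, keep the densest ceil(ratio * size) members; returns sorted indices.

    k and ratio are fixed here (hypothetical); UMC instead chooses top-K automatically.
    """
    selected = []
    for c in sorted(set(assignments)):
        members = [i for i, a in enumerate(assignments) if a == c]
        if len(members) == 1:
            selected.extend(members)
            continue
        dens = knn_density([points[i] for i in members], min(k, len(members) - 1))
        ranked = sorted(range(len(members)), key=lambda j: dens[j], reverse=True)
        n_keep = max(1, math.ceil(ratio * len(members)))
        selected.extend(members[j] for j in ranked[:n_keep])
    return sorted(selected)


# Two toy clusters; point 3 is an outlier inside cluster 0.
points = [(0, 0), (0.1, 0), (0, 0.2), (5, 5), (10, 10), (10.1, 10), (12, 12)]
assignments = [0, 0, 0, 0, 1, 1, 1]
print(select_high_quality(points, assignments, k=2, ratio=0.5))  # [0, 1, 4, 5]
```

The outlier in cluster 0 and the far point in cluster 1 receive the lowest densities and are left out, which is the behaviour a density criterion is meant to provide. The remaining (low-quality) samples are simply the complement of the returned indices.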

* Accepted by ACL 2024, Main Conference, Long Paper

DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control

May 21, 2024

Hong Chen, Xin Wang, Yipeng Zhang, Yuwei Zhou, Zeyang Zhang, Siao Tang, Wenwu Zhu

Abstract: Generating customized content in videos has recently received increasing attention. However, existing works primarily focus on customized text-to-video generation for a single subject, and they suffer from subject-missing and attribute-binding problems when the video is expected to contain multiple subjects. Furthermore, existing models struggle to assign the desired actions to the corresponding subjects (the action-binding problem), failing to achieve satisfactory multi-subject generation performance. To tackle these problems, we propose DisenStudio, a novel framework that can generate text-guided videos for multiple customized subjects, given a few images of each subject. Specifically, DisenStudio enhances a pretrained diffusion-based text-to-video model with our proposed spatial-disentangled cross-attention mechanism to associate each subject with the desired action. The model is then customized for the multiple subjects with the proposed motion-preserved disentangled finetuning, which involves three tuning strategies: multi-subject co-occurrence tuning, masked single-subject tuning, and multi-subject motion-preserved tuning. The first two strategies guarantee subject occurrence and preserve the subjects' visual attributes, and the third helps the model maintain its temporal motion-generation ability when finetuning on static images. Extensive experiments demonstrate that DisenStudio significantly outperforms existing methods on various metrics. Additionally, we show that DisenStudio can serve as a powerful tool for various controllable generation applications.
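One way to picture spatially disentangled cross-attention is masked cross-attention: each spatial position inside a subject's region may attend only to shared prompt tokens and to the tokens describing that subject (and its action), so one subject's attributes cannot leak into another's region. The sketch below is a hypothetical single-head illustration of that idea, not DisenStudio's implementation. How regions are obtained and how tokens are assigned to subjects are assumptions.

```python
import math


def build_mask(region_of_query, subject_of_token):
    """allowed[i][j]: spatial query i (in subject region r, or None for background)
    may attend to token j if j is shared (subject None) or belongs to subject r."""
    return [[subj is None or subj == region for subj in subject_of_token]
            for region in region_of_query]


def masked_cross_attention(queries, keys, values, allowed):
    """Single-head scaled dot-product cross-attention; disallowed pairs get weight 0.
    Each query must be allowed at least one token."""
    d = len(keys[0])
    out = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if allowed[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 removes blocked tokens
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                    for c in range(len(values[0]))])
    return out


# Tokens: one shared token, one for subject 0, one for subject 1 (hypothetical layout).
subject_of_token = [None, 0, 1]
region_of_query = [0, 1]  # two spatial positions, one inside each subject's region
keys = [[1.0], [1.0], [1.0]]
values = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[1.0], [1.0]]
out = masked_cross_attention(queries, keys, values, build_mask(region_of_query, subject_of_token))
print(out)  # [[0.5, 0.0], [0.0, 0.5]]
```

Even though all keys score identically here, each position only picks up its own subject's value, which is the binding behaviour the masking is meant to enforce.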

EPPS: Advanced Polyp Segmentation via Edge Information Injection and Selective Feature Decoupling

May 20, 2024

Mengqi Lei, Xin Wang

Abstract: Accurate segmentation of polyps in colonoscopy images is essential for early-stage diagnosis and management of colorectal cancer. Despite advances in deep learning for polyp segmentation, limitations persist. Polyp edges are typically ambiguous, making them difficult to distinguish from the background, and model performance is often compromised by irrelevant or unimportant features. To alleviate these challenges, we propose a novel model named Edge-Prioritized Polyp Segmentation (EPPS). Specifically, we incorporate an Edge Mapping Engine (EME) aimed at accurately extracting polyp edges. Subsequently, an Edge Information Injector (EII) is devised to augment mask prediction by injecting the captured edge information into the decoder blocks. Furthermore, we introduce a Selective Feature Decoupler (SFD) to suppress the influence of noise and extraneous features on the model. Extensive experiments on three widely used polyp segmentation benchmarks demonstrate the superior performance of our method compared with other state-of-the-art approaches.
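The abstract does not say how the Edge Mapping Engine is supervised. One common, here purely hypothetical, way to derive edge targets from a ground-truth polyp mask is boundary extraction: mark every foreground pixel that has at least one 4-connected background or out-of-image neighbour. A minimal pure-Python sketch:

```python
def mask_edges(mask):
    """Boundary pixels of a binary mask: foreground pixels with at least one
    4-connected background (or out-of-bounds) neighbour."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    edges[r][c] = 1
                    break
    return edges


# A 3x3 "polyp" in the middle of a 5x5 image: its edge map is the 8-pixel ring.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
for row in mask_edges(mask):
    print(row)
```

An edge map like this gives an edge branch an explicit, thin target, which is one plausible reason to predict edges separately when polyp boundaries are ambiguous; the paper's actual edge targets may differ.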
