Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yifan Chen

Partition-wise Graph Filtering: A Unified Perspective Through the Lens of Graph Coarsening

May 20, 2025

Guoming Li, Jian Yang, Yifan Chen

Abstract:Filtering-based graph neural networks (GNNs) constitute a distinct class of GNNs that employ graph filters to handle graph-structured data, achieving notable success in various graph-related tasks. Conventional methods adopt a graph-wise filtering paradigm, imposing a uniform filter across all nodes, yet recent findings suggest that this rigid paradigm struggles with heterophilic graphs. To overcome this, recent works have introduced node-wise filtering, which assigns distinct filters to individual nodes, offering enhanced adaptability. However, a fundamental gap remains: a comprehensive framework unifying these two strategies is still absent, limiting theoretical insights into the filtering paradigms. Moreover, through the lens of Contextual Stochastic Block Model, we reveal that a synthesis of graph-wise and node-wise filtering provides a sufficient solution for classification on graphs exhibiting both homophily and heterophily, suggesting the risk of excessive parameterization and potential overfitting with node-wise filtering. To address the limitations, this paper introduces Coarsening-guided Partition-wise Filtering (CPF). CPF innovates by performing filtering on node partitions. The method begins with structure-aware partition-wise filtering, which filters node partitions obtained via graph coarsening algorithms, and then performs feature-aware partition-wise filtering, refining node embeddings via filtering on clusters produced by $k$-means clustering over features. In-depth analysis is conducted for each phase of CPF, showing its superiority over other paradigms. Finally, benchmark node classification experiments, along with a real-world graph anomaly detection application, validate CPF's efficacy and practical utility.

* Accepted at the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 February Cycle

Via

Access Paper or Ask Questions

New affine invariant ensemble samplers and their dimensional scaling

May 05, 2025

Yifan Chen

Abstract:We introduce some new affine invariant ensemble samplers that are easy to construct and improve upon existing widely used algorithms, especially for high-dimensional problems. Specifically, we propose a derivative-free ensemble side move sampler that performs favorably compared to popular samplers in the \texttt{emcee} package. Additionally, we develop a class of derivative-based ensemble Hamiltonian Monte Carlo (HMC) samplers with affine invariance, which outperform standard HMC without affine invariance when sampling highly skewed distributions. We provide asymptotic scaling analysis for high-dimensional Gaussian targets to further elucidate the properties of these affine invariant ensemble samplers. In particular, with derivative information, the affine invariant ensemble HMC can scale much better with dimension compared to derivative-free ensemble samplers.

* Any feedback welcome!

Via

Access Paper or Ask Questions

Embedding Empirical Distributions for Computing Optimal Transport Maps

Apr 24, 2025

Mingchen Jiang, Peng Xu, Xichen Ye, Xiaohui Chen, Yun Yang, Yifan Chen

Abstract:Distributional data have become increasingly prominent in modern signal processing, highlighting the necessity of computing optimal transport (OT) maps across multiple probability distributions. Nevertheless, recent studies on neural OT methods predominantly focused on the efficient computation of a single map between two distributions. To address this challenge, we introduce a novel approach to learning transport maps for new empirical distributions. Specifically, we employ the transformer architecture to produce embeddings from distributional data of varying length; these embeddings are then fed into a hypernetwork to generate neural OT maps. Various numerical experiments were conducted to validate the embeddings and the generated OT maps. The model implementation and the code are provided on https://github.com/jiangmingchen/HOTET.

Via

Access Paper or Ask Questions

ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration

Mar 10, 2025

Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, Girish Varatkar, Bita Darvish Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He

Abstract:Mixture-of-Experts (MoE) Transformer, the backbone architecture of multiple phenomenal language models, leverages sparsity by activating only a fraction of model parameters for each input token. The sparse structure, while allowing constant time costs, results in space inefficiency: we still need to load all the model parameters during inference. We introduce ResMoE, an innovative MoE approximation framework that utilizes Wasserstein barycenter to extract a common expert (barycenter expert) and approximate the residuals between this barycenter expert and the original ones. ResMoE enhances the space efficiency for inference of large-scale MoE Transformers in a one-shot and data-agnostic manner without retraining while maintaining minimal accuracy loss, thereby paving the way for broader accessibility to large language models. We demonstrate the effectiveness of ResMoE through extensive experiments on Switch Transformer, Mixtral, and DeepSeekMoE models. The results show that ResMoE can reduce the number of parameters in an expert by up to 75% while maintaining comparable performance. The code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/ResMoE.

* KDD 2025

Via

Access Paper or Ask Questions

LexRAG: Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation

Feb 28, 2025

Haitao Li, Yifan Chen, Yiran Hu, Qingyao Ai, Junjie Chen, Xiaoyu Yang, Jianhui Yang, Yueyue Wu, Zeyang Liu, Yiqun Liu

Abstract:Retrieval-augmented generation (RAG) has proven highly effective in improving large language models (LLMs) across various domains. However, there is no benchmark specifically designed to assess the effectiveness of RAG in the legal domain, which restricts progress in this area. To fill this gap, we propose LexRAG, the first benchmark to evaluate RAG systems for multi-turn legal consultations. LexRAG consists of 1,013 multi-turn dialogue samples and 17,228 candidate legal articles. Each sample is annotated by legal experts and consists of five rounds of progressive questioning. LexRAG includes two key tasks: (1) Conversational knowledge retrieval, requiring accurate retrieval of relevant legal articles based on multi-turn context. (2) Response generation, focusing on producing legally sound answers. To ensure reliable reproducibility, we develop LexiT, a legal RAG toolkit that provides a comprehensive implementation of RAG system components tailored for the legal domain. Additionally, we introduce an LLM-as-a-judge evaluation pipeline to enable detailed and effective assessment. Through experimental analysis of various LLMs and retrieval methods, we reveal the key limitations of existing RAG systems in handling legal consultation conversations. LexRAG establishes a new benchmark for the practical application of RAG systems in the legal domain, with its code and data available at https://github.com/CSHaitao/LexRAG.

* 10 pages

Via

Access Paper or Ask Questions

CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation

Feb 25, 2025

Haitao Li, Jiaying Ye, Yiran Hu, Jia Chen, Qingyao Ai, Yueyue Wu, Junjie Chen, Yifan Chen, Cheng Luo, Quan Zhou(+1 more)

Figure 1 for CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation

Figure 2 for CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation

Figure 3 for CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation

Figure 4 for CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation

Abstract:Legal case documents play a critical role in judicial proceedings. As the number of cases continues to rise, the reliance on manual drafting of legal case documents is facing increasing pressure and challenges. The development of large language models (LLMs) offers a promising solution for automating document generation. However, existing benchmarks fail to fully capture the complexities involved in drafting legal case documents in real-world scenarios. To address this gap, we introduce CaseGen, the benchmark for multi-stage legal case documents generation in the Chinese legal domain. CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections. It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results. To the best of our knowledge, CaseGen is the first benchmark designed to evaluate LLMs in the context of legal case document generation. To ensure an accurate and comprehensive evaluation, we design the LLM-as-a-judge evaluation framework and validate its effectiveness through human annotations. We evaluate several widely used general-domain LLMs and legal-specific LLMs, highlighting their limitations in case document generation and pinpointing areas for potential improvement. This work marks a step toward a more effective framework for automating legal case documents drafting, paving the way for the reliable application of AI in the legal field. The dataset and code are publicly available at https://github.com/CSHaitao/CaseGen.

* 18 pages

Via

Access Paper or Ask Questions

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025

M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, Kang Zhu, Minghao Liu, Yiming Liang, Xiaolong Jin(+85 more)

Abstract:Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields-particularly in light industry, agriculture, and service-oriented disciplines-remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

Via

Access Paper or Ask Questions

Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems

Jan 08, 2025

Baojun Che, Yifan Chen, Zhenghao Huan, Daniel Zhengyu Huang, Weijie Wang

Figure 1 for Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems

Figure 2 for Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems

Figure 3 for Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems

Figure 4 for Stable Derivative Free Gaussian Mixture Variational Inference for Bayesian Inverse Problems

Abstract:This paper is concerned with the approximation of probability distributions known up to normalization constants, with a focus on Bayesian inference for large-scale inverse problems in scientific computing. In this context, key challenges include costly repeated evaluations of forward models, multimodality, and inaccessible gradients for the forward model. To address them, we develop a variational inference framework that combines Fisher-Rao natural gradient with specialized quadrature rules to enable derivative free updates of Gaussian mixture variational families. The resulting method, termed Derivative Free Gaussian Mixture Variational Inference (DF-GMVI), guarantees covariance positivity and affine invariance, offering a stable and efficient framework for approximating complex posterior distributions. The effectiveness of DF-GMVI is demonstrated through numerical experiments on challenging scenarios, including distributions with multiple modes, infinitely many modes, and curved modes in spaces with up to hundreds of dimensions. The method's practicality is further demonstrated in a large-scale application, where it successfully recovers the initial conditions of the Navier-Stokes equations from solution data at positive times.

* 25 pages, 10 figures

Via

Access Paper or Ask Questions

Catch Causal Signals from Edges for Label Imbalance in Graph Classification

Jan 07, 2025

Fengrui Zhang, Yujia Yin, Hongzong Li, Yifan Chen, Tianyi Qu

Figure 1 for Catch Causal Signals from Edges for Label Imbalance in Graph Classification

Figure 2 for Catch Causal Signals from Edges for Label Imbalance in Graph Classification

Figure 3 for Catch Causal Signals from Edges for Label Imbalance in Graph Classification

Figure 4 for Catch Causal Signals from Edges for Label Imbalance in Graph Classification

Abstract:Despite significant advancements in causal research on graphs and its application to cracking label imbalance, the role of edge features in detecting the causal effects within graphs has been largely overlooked, leaving existing methods with untapped potential for further performance gains. In this paper, we enhance the causal attention mechanism through effectively leveraging edge information to disentangle the causal subgraph from the original graph, as well as further utilizing edge features to reshape graph representations. Capturing more comprehensive causal signals, our design leads to improved performance on graph classification tasks with label imbalance issues. We evaluate our approach on real-word datasets PTC, Tox21, and ogbg-molhiv, observing improvements over baselines. Overall, we highlight the importance of edge features in graph causal detection and provide a promising direction for addressing label imbalance challenges in graph-level tasks. The model implementation details and the codes are available on https://github.com/fengrui-z/ECAL

* ICASSP 2025

Via

Access Paper or Ask Questions

LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Jan 05, 2025

Jiaping Wang, Simiao Zhang, Qiao-Chu He, Yifan Chen

Figure 1 for LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Figure 2 for LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Figure 3 for LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Figure 4 for LeetDecoding: A PyTorch Library for Exponentially Decaying Causal Linear Attention with CUDA Implementations

Abstract:The machine learning and data science community has made significant while dispersive progress in accelerating transformer-based large language models (LLMs), and one promising approach is to replace the original causal attention in a generative pre-trained transformer (GPT) with \emph{exponentially decaying causal linear attention}. In this paper, we present LeetDecoding, which is the first Python package that provides a large set of computation routines for this fundamental operator. The launch of LeetDecoding was motivated by the current lack of (1) clear understanding of the complexity regarding this operator, (2) a comprehensive collection of existing computation methods (usually spread in seemingly unrelated fields), and (3) CUDA implementations for fast inference on GPU. LeetDecoding's design is easy to integrate with existing linear-attention LLMs, and allows for researchers to benchmark and evaluate new computation methods for exponentially decaying causal linear attention. The usage of LeetDecoding does not require any knowledge of GPU programming and the underlying complexity analysis, intentionally making LeetDecoding accessible to LLM practitioners. The source code of LeetDecoding is provided at \href{https://github.com/Computational-Machine-Intelligence/LeetDecoding}{this GitHub repository}, and users can simply install LeetDecoding by the command \texttt{pip install leet-decoding}.

* The source code of LeetDecoding is hosted at https://github.com/Computational-Machine-Intelligence/LeetDecoding

Via

Access Paper or Ask Questions