Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jing Liu

Perry

Two Layer Walk: A Community-Aware Graph Embedding

Dec 18, 2024

He Yu, Jing Liu

Figure 1 for Two Layer Walk: A Community-Aware Graph Embedding

Figure 2 for Two Layer Walk: A Community-Aware Graph Embedding

Figure 3 for Two Layer Walk: A Community-Aware Graph Embedding

Figure 4 for Two Layer Walk: A Community-Aware Graph Embedding

Abstract:Community structures are critical for understanding the mesoscopic organization of networks, bridging local and global patterns. While methods such as DeepWalk and node2vec capture local positional information through random walks, they fail to preserve community structures. Other approaches like modularized nonnegative matrix factorization and evolutionary algorithms address this gap but are computationally expensive and unsuitable for large-scale networks. To overcome these limitations, we propose Two Layer Walk (TLWalk), a novel graph embedding algorithm that incorporates hierarchical community structures. TLWalk balances intra- and inter-community relationships through a community-aware random walk mechanism without requiring additional parameters. Theoretical analysis demonstrates that TLWalk effectively mitigates locality bias. Experiments on benchmark datasets show that TLWalk outperforms state-of-the-art methods, achieving up to 3.2% accuracy gains for link prediction tasks. By encoding dense local and sparse global structures, TLWalk proves robust and scalable across diverse networks, offering an efficient solution for network analysis.

Via

Access Paper or Ask Questions

Numerical Pruning for Efficient Autoregressive Models

Dec 17, 2024

Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen(+5 more)

Figure 1 for Numerical Pruning for Efficient Autoregressive Models

Figure 2 for Numerical Pruning for Efficient Autoregressive Models

Figure 3 for Numerical Pruning for Efficient Autoregressive Models

Figure 4 for Numerical Pruning for Efficient Autoregressive Models

Abstract:Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high computational costs due to their substantial model size. This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning to improve the model efficiency while preserving performance for both language and image generation tasks. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively. Besides, we further propose another compensation algorithm to recover the pruned model for better performance. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks

Dec 17, 2024

Shibing Mo, Kai Wu, Qixuan Gao, Xiangyi Teng, Jing Liu

Figure 1 for AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks

Figure 2 for AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks

Figure 3 for AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks

Figure 4 for AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks

Abstract:In real-world applications, spectral Graph Neural Networks (GNNs) are powerful tools for processing diverse types of graphs. However, a single GNN often struggles to handle different graph types-such as homogeneous and heterogeneous graphs-simultaneously. This challenge has led to the manual design of GNNs tailored to specific graph types, but these approaches are limited by the high cost of labor and the constraints of expert knowledge, which cannot keep up with the rapid growth of graph data. To overcome these challenges, we propose AutoSGNN, an automated framework for discovering propagation mechanisms in spectral GNNs. AutoSGNN unifies the search space for spectral GNNs by integrating large language models with evolutionary strategies to automatically generate architectures that adapt to various graph types. Extensive experiments on nine widely-used datasets, encompassing both homophilic and heterophilic graphs, demonstrate that AutoSGNN outperforms state-of-the-art spectral GNNs and graph neural architecture search methods in both performance and efficiency.

Via

Access Paper or Ask Questions

TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

Dec 17, 2024

Gangqiang Hu, Jianfeng Lu, Jianmin Han, Shuqin Cao, Jing Liu, Hao Fu

Figure 1 for TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

Figure 2 for TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

Figure 3 for TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

Abstract:Due to the sensitivity of data, federated learning (FL) is employed to enable distributed machine learning while safeguarding data privacy and accommodating the requirements of various devices. However, in the context of semi-decentralized federated learning (SD-FL), clients' communication and training states are dynamic. This variability arises from local training fluctuations, heterogeneous data distributions, and intermittent client participation. Most existing studies primarily focus on stable client states, neglecting the dynamic challenges present in real-world scenarios. To tackle this issue, we propose a trust-aware client scheduling mechanism (TRAIL) that assesses client states and contributions, enhancing model training efficiency through selective client participation. Our focus is on a semi-decentralized federated learning framework where edge servers and clients train a shared global model using unreliable intra-cluster model aggregation and inter-cluster model consensus. First, we develop an adaptive hidden semi-Markov model (AHSMM) to estimate clients' communication states and contributions. Next, we address a client-server association optimization problem to minimize global training loss. Using convergence analysis, we propose a greedy client scheduling algorithm. Finally, our experiments conducted on real-world datasets demonstrate that TRAIL outperforms state-of-the-art baselines, achieving an improvement of 8.7\% in test accuracy and a reduction of 15.3\% in training loss.

Via

Access Paper or Ask Questions

Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction

Dec 13, 2024

Jing Liu, Abdellah Fourtassi

Figure 1 for Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction

Figure 2 for Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction

Figure 3 for Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction

Abstract:LLMs can generate human-like dialogues, yet their ability to simulate early child-adult interactions remains largely unexplored. In this paper, we examined how effectively LLMs can capture the distinctive features of child-caregiver language in interaction, using both static and interactive benchmarking methods. We found that state-of-the-art LLMs like Llama 3 and GPT-4o can approximate child-caregiver dialogues at the word and utterance level, but they struggle to reproduce the child and caregiver's discursive patterns, exaggerate alignment, and fail to reach the level of diversity shown by humans. The broader goal of this work is to initiate the development of a comprehensive benchmark for LLMs in child-oriented applications.

Via

Access Paper or Ask Questions

Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark

Dec 13, 2024

Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Bingwen Zhu, Jixuan Xu(+3 more)

Figure 1 for Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark

Figure 2 for Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark

Figure 3 for Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark

Figure 4 for Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark

Abstract:Animation has gained significant interest in the recent film and TV industry. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they lack the same effectiveness in handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artist styles, violating the laws of physics and exaggerated motions. In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation dataset. Supported by the data processing pipeline with over 10M high-quality data, the generation model incorporates a spatiotemporal mask module to facilitate key animation production functions such as image-to-video generation, frame interpolation, and localized image-guided animation. We also collect an evaluation benchmark of 948 various animation videos, the evaluation on VBench and human double-blind test demonstrates consistency in character and motion, achieving state-of-the-art results in animation video generation. %We also collect an evaluation benchmark of 948 various animation videos, with specifically developed metrics for animation video generation. Our model access API and evaluation benchmark will be publicly available.

Via

Access Paper or Ask Questions

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Dec 05, 2024

Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba

Abstract:We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model's capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs linear modulation to dynamically adjust internal representations, enabling fine-grained adaptability without significantly altering the original model behavior. Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels in under-resourced and unseen tasks. Specifically, CA-SSLR achieves a 10% relative reduction in LID errors, a 37% improvement in ASR CER on the ML-SUPERB benchmark, and a 27% decrease in SV EER on VoxCeleb-1, demonstrating its effectiveness.

* 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Via

Access Paper or Ask Questions

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Nov 26, 2024

Jinqi Xiao, Shen Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Linjie Luo, Bo Yuan

Figure 1 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Figure 2 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Figure 3 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Figure 4 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Abstract:Training large-scale neural networks in vision, and multimodal domains demands substantial memory resources, primarily due to the storage of optimizer states. While LoRA, a popular parameter-efficient method, reduces memory usage, it often suffers from suboptimal performance due to the constraints of low-rank updates. Low-rank gradient projection methods (e.g., GaLore, Flora) reduce optimizer memory by projecting gradients and moment estimates into low-rank spaces via singular value decomposition or random projection. However, they fail to account for inter-projection correlation, causing performance degradation, and their projection strategies often incur high computational costs. In this paper, we present COAP (Correlation-Aware Gradient Projection), a memory-efficient method that minimizes computational overhead while maintaining training performance. Evaluated across various vision, language, and multimodal tasks, COAP outperforms existing methods in both training speed and model performance. For LLaMA-1B, it reduces optimizer memory by 61% with only 2% additional time cost, achieving the same PPL as AdamW. With 8-bit quantization, COAP cuts optimizer memory by 81% and achieves 4x speedup over GaLore for LLaVA-v1.5-7B fine-tuning, while delivering higher accuracy.

Via

Access Paper or Ask Questions

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

Nov 22, 2024

Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu

Abstract:As multimodal large language models (MLLMs) advance rapidly, rigorous evaluation has become essential, providing further guidance for their development. In this work, we focus on a unified and robust evaluation of \textbf{vision perception} abilities, the foundational skill of MLLMs. We find that existing perception benchmarks, each focusing on different question types, domains, and evaluation metrics, introduce significant evaluation variance, complicating comprehensive assessments of perception abilities when relying on any single benchmark. To address this, we introduce \textbf{AbilityLens}, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics. With the assistance of AbilityLens, we: (1) identify the strengths and weaknesses of current models, highlighting stability patterns and revealing a notable performance gap between open-source and closed-source models; (2) introduce an online evaluation mode, which uncovers interesting ability conflict and early convergence phenomena during MLLM training; and (3) design a simple ability-specific model merging method that combines the best ability checkpoint from early training stages, effectively mitigating performance decline due to ability conflict. The benchmark and online leaderboard will be released soon.

Via

Access Paper or Ask Questions

Privacy-Preserving Video Anomaly Detection: A Survey

Nov 21, 2024

Jing Liu, Yang Liu, Xiaoguang Zhu

Figure 1 for Privacy-Preserving Video Anomaly Detection: A Survey

Figure 2 for Privacy-Preserving Video Anomaly Detection: A Survey

Figure 3 for Privacy-Preserving Video Anomaly Detection: A Survey

Figure 4 for Privacy-Preserving Video Anomaly Detection: A Survey

Abstract:Video Anomaly Detection (VAD) aims to automatically analyze spatiotemporal patterns in surveillance videos collected from open spaces to detect anomalous events that may cause harm without physical contact. However, vision-based surveillance systems such as closed-circuit television often capture personally identifiable information. The lack of transparency and interpretability in video transmission and usage raises public concerns about privacy and ethics, limiting the real-world application of VAD. Recently, researchers have focused on privacy concerns in VAD by conducting systematic studies from various perspectives including data, features, and systems, making Privacy-Preserving Video Anomaly Detection (P2VAD) a hotspot in the AI community. However, current research in P2VAD is fragmented, and prior reviews have mostly focused on methods using RGB sequences, overlooking privacy leakage and appearance bias considerations. To address this gap, this article systematically reviews the progress of P2VAD for the first time, defining its scope and providing an intuitive taxonomy. We outline the basic assumptions, learning frameworks, and optimization objectives of various approaches, analyzing their strengths, weaknesses, and potential correlations. Additionally, we provide open access to research resources such as benchmark datasets and available code. Finally, we discuss key challenges and future opportunities from the perspectives of AI development and P2VAD deployment, aiming to guide future work in the field.

* 19 pages, 6 figures

Via

Access Paper or Ask Questions