Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feng Zhou

Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Feb 11, 2025

Quyu Kong, Yixuan Zhang, Yang Liu, Panrong Tong, Enqi Liu, Feng Zhou

Figure 1 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Figure 2 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Figure 3 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Figure 4 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Abstract:Temporal Point Processes (TPPs) have been widely used for event sequence modeling, but they often struggle to incorporate rich textual event descriptions effectively. Conversely, while Large Language Models (LLMs) have been shown remarkable capabilities in processing textual data, they lack mechanisms for handling temporal dynamics. To bridge this gap, we introduce Language-TPP, a unified framework that integrates TPPs with LLMs for enhanced event sequence modeling. Language-TPP introduces a novel temporal encoding mechanism that converts continuous time intervals into specialized byte-tokens, enabling seamless integration with standard LLM architectures. This approach allows Language-TPP to achieve state-of-the-art performance across multiple TPP tasks, including event time prediction, type prediction, and intensity estimation, on five datasets. Additionally, we demonstrate that incorporating temporal information significantly improves the quality of generated event descriptions.

Via

Access Paper or Ask Questions

Advances in Temporal Point Processes: Bayesian, Deep, and LLM Approaches

Jan 24, 2025

Feng Zhou, Quyu Kong, Yixuan Zhang

Abstract:Temporal point processes (TPPs) are stochastic process models used to characterize event sequences occurring in continuous time. Traditional statistical TPPs have a long-standing history, with numerous models proposed and successfully applied across diverse domains. In recent years, advances in deep learning have spurred the development of neural TPPs, enabling greater flexibility and expressiveness in capturing complex temporal dynamics. The emergence of large language models (LLMs) has further sparked excitement, offering new possibilities for modeling and analyzing event sequences by leveraging their rich contextual understanding. This survey presents a comprehensive review of recent research on TPPs from three perspectives: Bayesian, deep learning, and LLM approaches. We begin with a review of the fundamental concepts of TPPs, followed by an in-depth discussion of model design and parameter estimation techniques in these three frameworks. We also revisit classic application areas of TPPs to highlight their practical relevance. Finally, we outline challenges and promising directions for future research.

Via

Access Paper or Ask Questions

Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild

Jan 07, 2025

Wanpeng Hu, Haodi Liu, Lin Chen, Feng Zhou, Changming Xiao, Qi Yang, Changshui Zhang

Figure 1 for Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild

Figure 2 for Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild

Figure 3 for Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild

Figure 4 for Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild

Abstract:Complex visual reasoning remains a key challenge today. Typically, the challenge is tackled using methodologies such as Chain of Thought (COT) and visual instruction tuning. However, how to organically combine these two methodologies for greater success remains unexplored. Also, issues like hallucinations and high training cost still need to be addressed. In this work, we devise an innovative multi-round training and reasoning framework suitable for lightweight Multimodal Large Language Models (MLLMs). Our self-questioning approach heuristically guides MLLMs to focus on visual clues relevant to the target problem, reducing hallucinations and enhancing the model's ability to describe fine-grained image details. This ultimately enables the model to perform well in complex visual reasoning and question-answering tasks. We have named this framework Socratic Questioning(SQ). To facilitate future research, we create a multimodal mini-dataset named CapQA, which includes 1k images of fine-grained activities, for visual instruction tuning and evaluation, our proposed SQ method leads to a 31.2% improvement in the hallucination score. Our extensive experiments on various benchmarks demonstrate SQ's remarkable capabilities in heuristic self-questioning, zero-shot visual reasoning and hallucination mitigation. Our model and code will be publicly available.

Via

Access Paper or Ask Questions

Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA

Dec 30, 2024

Qingyun Jin, Xiaohui Song, Feng Zhou, Zengchang Qin

Abstract:Large language models have been shown to perform well on a variety of natural language processing problems. However, as the model size and the input sequence's length increase, the rapid increase of KV Cache significantly slows down inference speed. Therefore GQA model, as an alternative to MHA model, has been widely introduced into LLMs. In this work, we propose a low-cost method for pruning MHA models into GQA models with any compression ratio of key-value heads. Our method is based on $\mathit{L_0}$ masks to gradually remove redundant parameters. In addition, we apply orthogonal transformations to attention heads without changing the model to increase similarity between attention heads before pruning training, in order to further improve performance of the model. Our method can be compatible with rotary position embedding (RoPE), which means the model after training can be fully adapted to the mainstream standard GQA framework. Experiments demonstrate that our strategy can compress up to 87.5% of key-value heads of the LLaMA2-7B model without too much performance degradation, just achieved through supervised fine-tuning.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions

Navigating Towards Fairness with Data Selection

Dec 15, 2024

Yixuan Zhang, Zhidong Li, Yang Wang, Fang Chen, Xuhui Fan, Feng Zhou

Figure 1 for Navigating Towards Fairness with Data Selection

Figure 2 for Navigating Towards Fairness with Data Selection

Figure 3 for Navigating Towards Fairness with Data Selection

Figure 4 for Navigating Towards Fairness with Data Selection

Abstract:Machine learning algorithms often struggle to eliminate inherent data biases, particularly those arising from unreliable labels, which poses a significant challenge in ensuring fairness. Existing fairness techniques that address label bias typically involve modifying models and intervening in the training process, but these lack flexibility for large-scale datasets. To address this limitation, we introduce a data selection method designed to efficiently and flexibly mitigate label bias, tailored to more practical needs. Our approach utilizes a zero-shot predictor as a proxy model that simulates training on a clean holdout set. This strategy, supported by peer predictions, ensures the fairness of the proxy model and eliminates the need for an additional holdout set, which is a common requirement in previous methods. Without altering the classifier's architecture, our modality-agnostic method effectively selects appropriate training data and has proven efficient and effective in handling label bias and improving fairness across diverse datasets in experimental evaluations.

Via

Access Paper or Ask Questions

Task Diversity in Bayesian Federated Learning: Simultaneous Processing of Classification and Regression

Dec 14, 2024

Junliang Lyu, Yixuan Zhang, Xiaoling Lu, Feng Zhou

Figure 1 for Task Diversity in Bayesian Federated Learning: Simultaneous Processing of Classification and Regression

Figure 2 for Task Diversity in Bayesian Federated Learning: Simultaneous Processing of Classification and Regression

Figure 3 for Task Diversity in Bayesian Federated Learning: Simultaneous Processing of Classification and Regression

Figure 4 for Task Diversity in Bayesian Federated Learning: Simultaneous Processing of Classification and Regression

Abstract:This work addresses a key limitation in current federated learning approaches, which predominantly focus on homogeneous tasks, neglecting the task diversity on local devices. We propose a principled integration of multi-task learning using multi-output Gaussian processes (MOGP) at the local level and federated learning at the global level. MOGP handles correlated classification and regression tasks, offering a Bayesian non-parametric approach that naturally quantifies uncertainty. The central server aggregates the posteriors from local devices, updating a global MOGP prior redistributed for training local models until convergence. Challenges in performing posterior inference on local devices are addressed through the P\'{o}lya-Gamma augmentation technique and mean-field variational inference, enhancing computational efficiency and convergence rate. Experimental results on both synthetic and real data demonstrate superior predictive performance, OOD detection, uncertainty calibration and convergence rate, highlighting the method's potential in diverse applications. Our code is publicly available at https://github.com/JunliangLv/task_diversity_BFL.

Via

Access Paper or Ask Questions

Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Dec 12, 2024

Feng Zhou, Ruiyang Liu, Chen Liu, Gaofeng He, Yong-Lu Li, Xiaogang Jin, Huamin Wang

Figure 1 for Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Figure 2 for Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Figure 3 for Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Figure 4 for Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Abstract:Sewing patterns, the essential blueprints for fabric cutting and tailoring, act as a crucial bridge between design concepts and producible garments. However, existing uni-modal sewing pattern generation models struggle to effectively encode complex design concepts with a multi-modal nature and correlate them with vectorized sewing patterns that possess precise geometric structures and intricate sewing relations. In this work, we propose a novel sewing pattern generation approach Design2GarmentCode based on Large Multimodal Models (LMMs), to generate parametric pattern-making programs from multi-modal design concepts. LMM offers an intuitive interface for interpreting diverse design inputs, while pattern-making programs could serve as well-structured and semantically meaningful representations of sewing patterns, and act as a robust bridge connecting the cross-domain pattern-making knowledge embedded in LMMs with vectorized sewing patterns. Experimental results demonstrate that our method can flexibly handle various complex design expressions such as images, textual descriptions, designer sketches, or their combinations, and convert them into size-precise sewing patterns with correct stitches. Compared to previous methods, our approach significantly enhances training efficiency, generation quality, and authoring flexibility. Our code and data will be publicly available.

Via

Access Paper or Ask Questions

Position-aware Guided Point Cloud Completion with CLIP Model

Dec 11, 2024

Feng Zhou, Qi Zhang, Ju Dai, Lei Li, Qing Fan, Junliang Xing

Figure 1 for Position-aware Guided Point Cloud Completion with CLIP Model

Figure 2 for Position-aware Guided Point Cloud Completion with CLIP Model

Figure 3 for Position-aware Guided Point Cloud Completion with CLIP Model

Figure 4 for Position-aware Guided Point Cloud Completion with CLIP Model

Abstract:Point cloud completion aims to recover partial geometric and topological shapes caused by equipment defects or limited viewpoints. Current methods either solely rely on the 3D coordinates of the point cloud to complete it or incorporate additional images with well-calibrated intrinsic parameters to guide the geometric estimation of the missing parts. Although these methods have achieved excellent performance by directly predicting the location of complete points, the extracted features lack fine-grained information regarding the location of the missing area. To address this issue, we propose a rapid and efficient method to expand an unimodal framework into a multimodal framework. This approach incorporates a position-aware module designed to enhance the spatial information of the missing parts through a weighted map learning mechanism. In addition, we establish a Point-Text-Image triplet corpus PCI-TI and MVP-TI based on the existing unimodal point cloud completion dataset and use the pre-trained vision-language model CLIP to provide richer detail information for 3D shapes, thereby enhancing performance. Extensive quantitative and qualitative experiments demonstrate that our method outperforms state-of-the-art point cloud completion methods.

* Accepted by AAAI25

Via

Access Paper or Ask Questions

Human Motion Instruction Tuning

Nov 25, 2024

Lei Li, Sen Jia, Wang Jianhao, Zhongyu Jiang, Feng Zhou, Ju Dai, Tianfang Zhang, Wu Zongkai, Jenq-Neng Hwang

Figure 1 for Human Motion Instruction Tuning

Figure 2 for Human Motion Instruction Tuning

Figure 3 for Human Motion Instruction Tuning

Figure 4 for Human Motion Instruction Tuning

Abstract:This paper presents LLaMo (Large Language and Human Motion Assistant), a multimodal framework for human motion instruction tuning. In contrast to conventional instruction-tuning approaches that convert non-linguistic inputs, such as video or motion sequences, into language tokens, LLaMo retains motion in its native form for instruction tuning. This method preserves motion-specific details that are often diminished in tokenization, thereby improving the model's ability to interpret complex human behaviors. By processing both video and motion data alongside textual inputs, LLaMo enables a flexible, human-centric analysis. Experimental evaluations across high-complexity domains, including human behaviors and professional activities, indicate that LLaMo effectively captures domain-specific knowledge, enhancing comprehension and prediction in motion-intensive scenarios. We hope LLaMo offers a foundation for future multimodal AI systems with broad applications, from sports analytics to behavioral prediction. Our code and models are available on the project website: https://github.com/ILGLJ/LLaMo.

Via

Access Paper or Ask Questions

Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method

Nov 23, 2024

Pan Yin, Kaiyu Li, Xiangyong Cao, Jing Yao, Lei Liu, Xueru Bai, Feng Zhou, Deyu Meng

Abstract:Recently, road graph extraction has garnered increasing attention due to its crucial role in autonomous driving, navigation, etc. However, accurately and efficiently extracting road graphs remains a persistent challenge, primarily due to the severe scarcity of labeled data. To address this limitation, we collect a global-scale satellite road graph extraction dataset, i.e. Global-Scale dataset. Specifically, the Global-Scale dataset is $\sim20 \times$ larger than the largest existing public road extraction dataset and spans over 13,800 $km^2$ globally. Additionally, we develop a novel road graph extraction model, i.e. SAM-Road++, which adopts a node-guided resampling method to alleviate the mismatch issue between training and inference in SAM-Road, a pioneering state-of-the-art road graph extraction model. Furthermore, we propose a simple yet effective ``extended-line'' strategy in SAM-Road++ to mitigate the occlusion issue on the road. Extensive experiments demonstrate the validity of the collected Global-Scale dataset and the proposed SAM-Road++ method, particularly highlighting its superior predictive power in unseen regions. The dataset and code are available at \url{https://github.com/earth-insights/samroadplus}.

Via

Access Paper or Ask Questions