Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ming Gong

Multi-Level Service Performance Forecasting via Spatiotemporal Graph Neural Networks

Aug 09, 2025

Zhihao Xue, Yun Zi, Nia Qi, Ming Gong, Yujun Zou

Abstract:This paper proposes a spatiotemporal graph neural network-based performance prediction algorithm to address the challenge of forecasting performance fluctuations in distributed backend systems with multi-level service call structures. The method abstracts system states at different time slices into a sequence of graph structures. It integrates the runtime features of service nodes with the invocation relationships among services to construct a unified spatiotemporal modeling framework. The model first applies a graph convolutional network to extract high-order dependency information from the service topology. Then it uses a gated recurrent network to capture the dynamic evolution of performance metrics over time. A time encoding mechanism is also introduced to enhance the model's ability to represent non-stationary temporal sequences. The architecture is trained in an end-to-end manner, optimizing the multi-layer nested structure to achieve high-precision regression of future service performance metrics. To validate the effectiveness of the proposed method, a large-scale public cluster dataset is used. A series of multi-dimensional experiments are designed, including variations in time windows and concurrent load levels. These experiments comprehensively evaluate the model's predictive performance and stability. The experimental results show that the proposed model outperforms existing representative methods across key metrics such as MAE, RMSE, and R2. It maintains strong robustness under varying load intensities and structural complexities. These results demonstrate the model's practical potential for backend service performance management tasks.

Via

Access Paper or Ask Questions

Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data

Mar 07, 2025

Shiping Yang, Jie Wu, Wenbiao Ding, Ning Wu, Shining Liang, Ming Gong, Hengyuan Zhang, Dongmei Zhang

Abstract:Robustness has become a critical attribute for the deployment of RAG systems in real-world applications. Existing research focuses on robustness to explicit noise (e.g., document semantics) but overlooks spurious features (a.k.a. implicit noise). While previous works have explored spurious features in LLMs, they are limited to specific features (e.g., formats) and narrow scenarios (e.g., ICL). In this work, we statistically confirm the presence of spurious features in the RAG paradigm, a robustness problem caused by the sensitivity of LLMs to semantic-agnostic features. Moreover, we provide a comprehensive taxonomy of spurious features and empirically quantify their impact through controlled experiments. Further analysis reveals that not all spurious features are harmful and they can even be beneficial sometimes. Extensive evaluation results across multiple LLMs suggest that spurious features are a widespread and challenging problem in the field of RAG. The code and dataset will be released to facilitate future research. We release all codes and data at: $\\\href{https://github.com/maybenotime/RAG-SpuriousFeatures}{https://github.com/maybenotime/RAG-SpuriousFeatures}$.

Via

Access Paper or Ask Questions

MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads

Feb 19, 2025

Weihao Liu, Ning Wu, Shiping Yang, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang

Abstract:Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input, which severely impairs their long-context capabilities. Inspired by recent studies on the effectiveness of retrieval heads in long-context factutality, we aim at addressing this distraction issue through improving such retrieval heads directly. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning. According to the experimental results, MuDAF can significantly improve the long-context question answering performance of LLMs, especially in multi-document question answering. Extensive evaluations on retrieval scores and attention visualizations show that MuDAF possesses great potential in making attention heads more focused on relevant information and reducing attention distractions.

* 18 pages

Via

Access Paper or Ask Questions

AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Aug 21, 2024

Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, Wangmeng Zuo, Nan Duan

Figure 1 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Figure 2 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Figure 3 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Figure 4 for AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Abstract:With the advancement of generative models, the synthesis of different sensory elements such as music, visuals, and speech has achieved significant realism. However, the approach to generate multi-sensory outputs has not been fully explored, limiting the application on high-value scenarios such as of directing a film. Developing a movie director agent faces two major challenges: (1) Lack of parallelism and online scheduling with production steps: In the production of multi-sensory films, there are complex dependencies between different sensory elements, and the production time for each element varies. (2) Diverse needs and clear communication demands with users: Users often cannot clearly express their needs until they see a draft, which requires human-computer interaction and iteration to continually adjust and optimize the film content based on user feedback. To address these issues, we introduce AutoDirector, an interactive multi-sensory composition framework that supports long shots, special effects, music scoring, dubbing, and lip-syncing. This framework improves the efficiency of multi-sensory film production through automatic scheduling and supports the modification and improvement of interactive tasks to meet user needs. AutoDirector not only expands the application scope of human-machine collaboration but also demonstrates the potential of AI in collaborating with humans in the role of a film director to complete multi-sensory films.

Via

Access Paper or Ask Questions

Towards Truthful Multilingual Large Language Models: Benchmarking and Alignment Strategies

Jun 20, 2024

Weihao Liu, Ning Wu, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang

Abstract:In the era of large language models (LLMs), building multilingual large language models (MLLMs) that can serve users worldwide holds great significance. However, existing research seldom focuses on the truthfulness of MLLMs. Meanwhile, contemporary multilingual aligning technologies struggle to balance massive languages and often exhibit serious truthfulness gaps across different languages, especially those that differ greatly from English. In our work, we construct a benchmark for truthfulness evaluation in multilingual scenarios and explore the ways to align facts across languages to enhance the truthfulness of MLLMs. Furthermore, we propose Fact-aware Multilingual Selective Synergy (FaMSS) to optimize the data allocation across a large number of languages and different data types. Experimental results demonstrate that our approach can effectively reduce the multilingual representation disparity and enhance the multilingual capabilities of LLMs.

* 15 pages

Via

Access Paper or Ask Questions

Step-Back Profiling: Distilling User History for Personalized Scientific Writing

Jun 20, 2024

Xiangru Tang, Xingyao Zhang, Yanjun Shao, Jie Wu, Yilun Zhao, Arman Cohan, Ming Gong, Dongmei Zhang, Mark Gerstein

Abstract:Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals, particularly in real-world scenarios like scientific writing. Addressing this challenge, we introduce Step-Back Profiling to personalize LLMs by distilling user history into concise profiles, including essential traits and preferences of users. Regarding our experiments, we construct a Personalized Scientific Writing (PSW) dataset to study multiuser personalization. PSW requires the models to write scientific papers given specialized author groups with diverse academic backgrounds. As for the results, we demonstrate the effectiveness of capturing user characteristics via Step-Back Profiling for collaborative writing. Moreover, our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark (LaMP), including 7 personalization LLM tasks. Our extensive ablation studies validate the contributions of different components in our method and provide insights into our task definition. Our dataset and code are available at \url{https://github.com/gersteinlab/step-back-profiling}.

Via

Access Paper or Ask Questions

TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images

Mar 26, 2024

Liping Yang, Joshua Driscol, Ming Gong, Shujie Wang, Catherine G. Potts

Figure 1 for TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images

Figure 2 for TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images

Figure 3 for TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images

Figure 4 for TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images

Abstract:Line detection is a classic and essential problem in image processing, computer vision and machine intelligence. Line detection has many important applications, including image vectorization (e.g., document recognition and art design), indoor mapping, and important societal challenges (e.g., sea ice fracture line extraction from satellite imagery). Many line detection algorithms and methods have been developed, but robust and intuitive methods are still lacking. In this paper, we proposed and implemented a topological graph-guided algorithm, named TGGLinesPlus, for line detection. Our experiments on images from a wide range of domains have demonstrated the flexibility of our TGGLinesPlus algorithm. We also benchmarked our algorithm with five classic and state-of-the-art line detection methods and the results demonstrate the robustness of TGGLinesPlus. We hope our open-source implementation of TGGLinesPlus will inspire and pave the way for many applications where spatial science matters.

* Our TGGLinesPlus Python implementation is open source. 27 pages, 8 figures and 4 tables

Via

Access Paper or Ask Questions

Hypertext Entity Extraction in Webpage

Mar 04, 2024

Yifei Yang, Tianqiao Liu, Bo Shao, Hai Zhao, Linjun Shou, Ming Gong, Daxin Jiang

Abstract:Webpage entity extraction is a fundamental natural language processing task in both research and applications. Nowadays, the majority of webpage entity extraction models are trained on structured datasets which strive to retain textual content and its structure information. However, existing datasets all overlook the rich hypertext features (e.g., font color, font size) which show their effectiveness in previous works. To this end, we first collect a \textbf{H}ypertext \textbf{E}ntity \textbf{E}xtraction \textbf{D}ataset (\textit{HEED}) from the e-commerce domains, scraping both the text and the corresponding explicit hypertext features with high-quality manual entity annotations. Furthermore, we present the \textbf{Mo}E-based \textbf{E}ntity \textbf{E}xtraction \textbf{F}ramework (\textit{MoEEF}), which efficiently integrates multiple features to enhance model performance by Mixture of Experts and outperforms strong baselines, including the state-of-the-art small-scale models and GPT-3.5-turbo. Moreover, the effectiveness of hypertext features in \textit{HEED} and several model components in \textit{MoEEF} are analyzed.

Via

Access Paper or Ask Questions

Grounding Language Models for Visual Entity Recognition

Feb 28, 2024

Zilin Xiao, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez

Figure 1 for Grounding Language Models for Visual Entity Recognition

Figure 2 for Grounding Language Models for Visual Entity Recognition

Figure 3 for Grounding Language Models for Visual Entity Recognition

Figure 4 for Grounding Language Models for Visual Entity Recognition

Abstract:We introduce AutoVER, an Autoregressive model for Visual Entity Recognition. Our model extends an autoregressive Multi-modal Large Language Model by employing retrieval augmented constrained generation. It mitigates low performance on out-of-domain entities while excelling in queries that require visually-situated reasoning. Our method learns to distinguish similar entities within a vast label space by contrastively training on hard negative pairs in parallel with a sequence-to-sequence objective without an external retriever. During inference, a list of retrieved candidate answers explicitly guides language generation by removing invalid decoding paths. The proposed method achieves significant improvements across different dataset splits in the recently proposed Oven-Wiki benchmark. Accuracy on the Entity seen split rises from 32.7% to 61.5%. It also demonstrates superior performance on the unseen and query splits by a substantial double-digit margin.

Via

Access Paper or Ask Questions

Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

Jan 04, 2024

Xiaoquan Li, Stephan Weiss, Yijun Yan, Yinhe Li, Jinchang Ren, John Soraghan, Ming Gong

Figure 1 for Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

Figure 2 for Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

Figure 3 for Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

Figure 4 for Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

Abstract:Understanding and identifying musical shape plays an important role in music education and performance assessment. To simplify the otherwise time- and cost-intensive musical shape evaluation, in this paper we explore how artificial intelligence (AI) driven models can be applied. Considering musical shape evaluation as a classification problem, a light-weight Siamese residual neural network (S-ResNN) is proposed to automatically identify musical shapes. To assess the proposed approach in the context of piano musical shape evaluation, we have generated a new dataset, containing 4116 music pieces derived by 147 piano preparatory exercises and performed in 28 categories of musical shapes. The experimental results show that the S-ResNN significantly outperforms a number of benchmark methods in terms of the precision, recall and F1 score.

* X.Li, S.Weiss, Y.Yan, Y.Li, J.Ren, J.Soraghan, M.Gong,"Siamese residual neural network for musical shape evaluation in piano performance assessment" in Proc. of the 31st European Signal Processing Conference, Helsinki, Finland

Via

Access Paper or Ask Questions