Alert button
Picture for Xiaoming Shi

Xiaoming Shi

Alert button

Prompt-augmented Temporal Point Process for Streaming Event Sequence

Oct 13, 2023
Siqiao Xue, Yan Wang, Zhixuan Chu, Xiaoming Shi, Caigao Jiang, Hongyan Hao, Gangwei Jiang, Xiaoyun Feng, James Y. Zhang, Jun Zhou

Figure 1 for Prompt-augmented Temporal Point Process for Streaming Event Sequence
Figure 2 for Prompt-augmented Temporal Point Process for Streaming Event Sequence
Figure 3 for Prompt-augmented Temporal Point Process for Streaming Event Sequence
Figure 4 for Prompt-augmented Temporal Point Process for Streaming Event Sequence

Neural Temporal Point Processes (TPPs) are the prevalent paradigm for modeling continuous-time event sequences, such as user activities on the web and financial transactions. In real-world applications, event data is typically received in a \emph{streaming} manner, where the distribution of patterns may shift over time. Additionally, \emph{privacy and memory constraints} are commonly observed in practical scenarios, further compounding the challenges. Therefore, the continuous monitoring of a TPP to learn the streaming event sequence is an important yet under-explored problem. Our work paper addresses this challenge by adopting Continual Learning (CL), which makes the model capable of continuously learning a sequence of tasks without catastrophic forgetting under realistic constraints. Correspondingly, we propose a simple yet effective framework, PromptTPP\footnote{Our code is available at {\small \url{ https://github.com/yanyanSann/PromptTPP}}}, by integrating the base TPP with a continuous-time retrieval prompt pool. The prompts, small learnable parameters, are stored in a memory space and jointly optimized with the base TPP, ensuring that the model learns event streams sequentially without buffering past examples or task-specific attributes. We present a novel and realistic experimental setup for modeling event streams, where PromptTPP consistently achieves state-of-the-art performance across three real user behavior datasets.

* NeurIPS 2023 camera ready version 
Viaarxiv icon

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

Oct 03, 2023
Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, Qingsong Wen

Figure 1 for Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
Figure 2 for Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
Figure 3 for Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
Figure 4 for Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

Time series forecasting holds significant importance in many real-world dynamic systems and has been extensively studied. Unlike natural language process (NLP) and computer vision (CV), where a single large model can tackle multiple tasks, models for time series forecasting are often specialized, necessitating distinct designs for different tasks and applications. While pre-trained foundation models have made impressive strides in NLP and CV, their development in time series domains has been constrained by data sparsity. Recent studies have revealed that large language models (LLMs) possess robust pattern recognition and reasoning abilities over complex sequences of tokens. However, the challenge remains in effectively aligning the modalities of time series data and natural language to leverage these capabilities. In this work, we present Time-LLM, a reprogramming framework to repurpose LLMs for general time series forecasting with the backbone language models kept intact. We begin by reprogramming the input time series with text prototypes before feeding it into the frozen LLM to align the two modalities. To augment the LLM's ability to reason with time series data, we propose Prompt-as-Prefix (PaP), which enriches the input context and directs the transformation of reprogrammed input patches. The transformed time series patches from the LLM are finally projected to obtain the forecasts. Our comprehensive evaluations demonstrate that Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models. Moreover, Time-LLM excels in both few-shot and zero-shot learning scenarios.

Viaarxiv icon

LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation

Aug 15, 2023
Xiaoming Shi, Jie Xu, Jinru Ding, Jiali Pang, Sichen Liu, Shuqing Luo, Xingwei Peng, Lu Lu, Haihong Yang, Mingtao Hu, Tong Ruan, Shaoting Zhang

Figure 1 for LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation
Figure 2 for LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation
Figure 3 for LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation
Figure 4 for LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation

There is an increasing interest in developing LLMs for medical diagnosis to improve diagnosis efficiency. Despite their alluring technological potential, there is no unified and comprehensive evaluation criterion, leading to the inability to evaluate the quality and potential risks of medical LLMs, further hindering the application of LLMs in medical treatment scenarios. Besides, current evaluations heavily rely on labor-intensive interactions with LLMs to obtain diagnostic dialogues and human evaluation on the quality of diagnosis dialogue. To tackle the lack of unified and comprehensive evaluation criterion, we first initially establish an evaluation criterion, termed LLM-specific Mini-CEX to assess the diagnostic capabilities of LLMs effectively, based on original Mini-CEX. To address the labor-intensive interaction problem, we develop a patient simulator to engage in automatic conversations with LLMs, and utilize ChatGPT for evaluating diagnosis dialogues automatically. Experimental results show that the LLM-specific Mini-CEX is adequate and necessary to evaluate medical diagnosis dialogue. Besides, ChatGPT can replace manual evaluation on the metrics of humanistic qualities and provides reproducible and automated comparisons between different LLMs.

Viaarxiv icon

EasyTPP: Towards Open Benchmarking the Temporal Point Processes

Jul 16, 2023
Siqiao Xue, Xiaoming Shi, Zhixuan Chu, Yan Wang, Fan Zhou, Hongyan Hao, Caigao Jiang, Chen Pan, Yi Xu, James Y. Zhang, Qingsong Wen, Jun Zhou, Hongyuan Mei

Figure 1 for EasyTPP: Towards Open Benchmarking the Temporal Point Processes
Figure 2 for EasyTPP: Towards Open Benchmarking the Temporal Point Processes
Figure 3 for EasyTPP: Towards Open Benchmarking the Temporal Point Processes
Figure 4 for EasyTPP: Towards Open Benchmarking the Temporal Point Processes

Continuous-time event sequences play a vital role in real-world domains such as healthcare, finance, online shopping, social networks, and so on. To model such data, temporal point processes (TPPs) have emerged as the most advanced generative models, making a significant impact in both academic and application communities. Despite the emergence of many powerful models in recent years, there is still no comprehensive benchmark to evaluate them. This lack of standardization impedes researchers and practitioners from comparing methods and reproducing results, potentially slowing down progress in this field. In this paper, we present EasyTPP, which aims to establish a central benchmark for evaluating TPPs. Compared to previous work that also contributed datasets, our EasyTPP has three unique contributions to the community: (i) a comprehensive implementation of eight highly cited neural TPPs with the integration of commonly used evaluation metrics and datasets; (ii) a standardized benchmarking pipeline for a transparent and thorough comparison of different methods on different datasets; (iii) a universal framework supporting multiple ML libraries (e.g., PyTorch and TensorFlow) as well as custom implementations. Our benchmark is open-sourced: all the data and implementation can be found at this \href{https://github.com/ant-research/EasyTemporalPointProcess}{\textcolor{blue}{Github repository}}\footnote{\url{https://github.com/ant-research/EasyTemporalPointProcess}.}. We will actively maintain this benchmark and welcome contributions from other researchers and practitioners. Our benchmark will help promote reproducible research in this field, thus accelerating research progress as well as making more significant real-world impacts.

Viaarxiv icon

MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Jun 14, 2023
Xiaoming Shi, Zeming Liu, Chuan Wang, Haitao Leng, Kui Xue, Xiaofan Zhang, Shaoting Zhang

Figure 1 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation
Figure 2 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation
Figure 3 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation
Figure 4 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Most medical dialogue systems assume that patients have clear goals (medicine querying, surgical operation querying, etc.) before medical consultation. However, in many real scenarios, due to the lack of medical knowledge, it is usually difficult for patients to determine clear goals with all necessary slots. In this paper, we identify this challenge as how to construct medical consultation dialogue systems to help patients clarify their goals. To mitigate this challenge, we propose a novel task and create a human-to-human mixed-type medical consultation dialogue corpus, termed MidMed, covering five dialogue types: task-oriented dialogue for diagnosis, recommendation, knowledge-grounded dialogue, QA, and chitchat. MidMed covers four departments (otorhinolaryngology, ophthalmology, skin, and digestive system), with 8,175 dialogues. Furthermore, we build baselines on MidMed and propose an instruction-guiding medical dialogue generation framework, termed InsMed, to address this task. Experimental results show the effectiveness of InsMed.

* Accepted by ACL 2023 main conference. The first two authors contributed equally to this work 
Viaarxiv icon

Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning

May 26, 2023
Xiaoming Shi, Siqiao Xue, Kangrui Wang, Fan Zhou, James Y. Zhang, Jun Zhou, Chenhao Tan, Hongyuan Mei

Figure 1 for Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning
Figure 2 for Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning
Figure 3 for Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning
Figure 4 for Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning

Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction accuracy of event sequence models. We design a modeling and prediction framework where a large language model performs abductive reasoning to assist an event sequence model: the event model proposes predictions on future events given the past; instructed by a few expert-annotated demonstrations, the language model learns to suggest possible causes for each proposal; a search module finds out the previous events that match the causes; a scoring function learns to examine whether the retrieved events could actually cause the proposal. Through extensive experiments on two challenging real-world datasets (Amazon Review and GDELT), we demonstrate that our framework -- thanks to the reasoning ability of language models -- could significantly outperform the state-of-the-art event sequence models.

Viaarxiv icon

MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine

May 12, 2023
Jie Xu, Lu Lu, Sen Yang, Bilin Liang, Xinwei Peng, Jiali Pang, Jinru Ding, Xiaoming Shi, Lingrui Yang, Huan Song, Kang Li, Xin Sun, Shaoting Zhang

Figure 1 for MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine
Figure 2 for MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine
Figure 3 for MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine
Figure 4 for MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine

METHODS: First, a set of evaluation criteria is designed based on a comprehensive literature review. Second, existing candidate criteria are optimized for using a Delphi method by five experts in medicine and engineering. Third, three clinical experts design a set of medical datasets to interact with LLMs. Finally, benchmarking experiments are conducted on the datasets. The responses generated by chatbots based on LLMs are recorded for blind evaluations by five licensed medical experts. RESULTS: The obtained evaluation criteria cover medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with sixteen detailed indicators. The medical datasets include twenty-seven medical dialogues and seven case reports in Chinese. Three chatbots are evaluated, ChatGPT by OpenAI, ERNIE Bot by Baidu Inc., and Doctor PuJiang (Dr. PJ) by Shanghai Artificial Intelligence Laboratory. Experimental results show that Dr. PJ outperforms ChatGPT and ERNIE Bot in both multiple-turn medical dialogue and case report scenarios.

Viaarxiv icon

Full Scaling Automation for Sustainable Development of Green Data Centers

May 01, 2023
Shiyu Wang, Yinbo Sun, Xiaoming Shi, Shiyi Zhu, Lin-Tao Ma, James Zhang, Yifei Zheng, Jian Liu

Figure 1 for Full Scaling Automation for Sustainable Development of Green Data Centers
Figure 2 for Full Scaling Automation for Sustainable Development of Green Data Centers
Figure 3 for Full Scaling Automation for Sustainable Development of Green Data Centers
Figure 4 for Full Scaling Automation for Sustainable Development of Green Data Centers

The rapid rise in cloud computing has resulted in an alarming increase in data centers' carbon emissions, which now accounts for >3% of global greenhouse gas emissions, necessitating immediate steps to combat their mounting strain on the global climate. An important focus of this effort is to improve resource utilization in order to save electricity usage. Our proposed Full Scaling Automation (FSA) mechanism is an effective method of dynamically adapting resources to accommodate changing workloads in large-scale cloud computing clusters, enabling the clusters in data centers to maintain their desired CPU utilization target and thus improve energy efficiency. FSA harnesses the power of deep representation learning to accurately predict the future workload of each service and automatically stabilize the corresponding target CPU usage level, unlike the previous autoscaling methods, such as Autopilot or FIRM, that need to adjust computing resources with statistical models and expert knowledge. Our approach achieves significant performance improvement compared to the existing work in real-world datasets. We also deployed FSA on large-scale cloud computing clusters in industrial data centers, and according to the certification of the China Environmental United Certification Center (CEC), a reduction of 947 tons of carbon dioxide, equivalent to a saving of 1538,000 kWh of electricity, was achieved during the Double 11 shopping festival of 2022, marking a critical step for our company's strategic goal towards carbon neutrality by 2030.

Viaarxiv icon

A Graph Regularized Point Process Model For Event Propagation Sequence

Nov 21, 2022
Siqiao Xue, Xiaoming Shi, Hongyan Hao, Lintao Ma, Shiyu Wang, Shijun Wang, James Zhang

Figure 1 for A Graph Regularized Point Process Model For Event Propagation Sequence
Figure 2 for A Graph Regularized Point Process Model For Event Propagation Sequence
Figure 3 for A Graph Regularized Point Process Model For Event Propagation Sequence
Figure 4 for A Graph Regularized Point Process Model For Event Propagation Sequence

Point process is the dominant paradigm for modeling event sequences occurring at irregular intervals. In this paper we aim at modeling latent dynamics of event propagation in graph, where the event sequence propagates in a directed weighted graph whose nodes represent event marks (e.g., event types). Most existing works have only considered encoding sequential event history into event representation and ignored the information from the latent graph structure. Besides they also suffer from poor model explainability, i.e., failing to uncover causal influence across a wide variety of nodes. To address these problems, we propose a Graph Regularized Point Process (GRPP) that can be decomposed into: 1) a graph propagation model that characterizes the event interactions across nodes with neighbors and inductively learns node representations; 2) a temporal attentive intensity model, whose excitation and time decay factors of past events on the current event are constructed via the contextualization of the node embedding. Moreover, by applying a graph regularization method, GRPP provides model interpretability by uncovering influence strengths between nodes. Numerical experiments on various datasets show that GRPP outperforms existing models on both the propagation time and node prediction by notable margins.

* 2021 International Joint Conference on Neural Networks (IJCNN) (pp. 1-7). IEEE  
* IJCNN 2021 
Viaarxiv icon