Xianjun Yang

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

May 27, 2023
Xianjun Yang, Wei Cheng, Linda Petzold, William Yang Wang, Haifeng Chen

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limited flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLM to regenerate the remaining part. By comparing the original and regenerated remainders, via N-gram analysis in the black-box setting or probability divergence in the white-box setting, we can clearly illustrate significant discrepancies between machine-generated and human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach achieves state-of-the-art performance in distinguishing between human-written and GPT-generated text on four English datasets and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. Additionally, our method provides reasonable explanations and evidence to support its decisions, a unique feature of explainable detection. Our method is also robust under revised-text attacks and can additionally solve the model-sourcing problem. Code is available at https://github.com/Xianjun-Yang/DNA-GPT.
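
To make the black-box variant concrete, here is a minimal Python sketch of the N-gram divergence score, assuming whitespace tokenization, weighting by n-gram length, and K regenerated continuations; the paper's exact truncation ratio and weighting scheme may differ.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def dna_gpt_score(original_tail, regenerated_tails, n_lo=3, n_hi=25):
    """Black-box DNA-GPT-style score: weighted n-gram overlap between the
    original continuation and K continuations regenerated by the LLM."""
    orig_tokens = original_tail.split()
    score = 0.0
    for tail in regenerated_tails:
        regen_tokens = tail.split()
        max_n = min(n_hi, len(orig_tokens), len(regen_tokens))
        for n in range(n_lo, max_n + 1):
            overlap = sum((Counter(ngrams(orig_tokens, n))
                           & Counter(ngrams(regen_tokens, n))).values())
            denom = max(1, len(regen_tokens) - n + 1)
            score += n * overlap / denom  # longer shared n-grams weigh more
    return score / max(1, len(regenerated_tails))
```

In use, one would truncate the candidate text in the middle, ask the suspected source model to continue the first half K times, and flag the text as machine-generated when the score exceeds a threshold.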

Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

May 22, 2023
Xinlu Zhang, Shiyang Li, Xianjun Yang, Chenxin Tian, Yao Qin, Linda Ruth Petzold

Large language models (LLMs) demonstrate remarkable medical expertise, but data privacy concerns impede their direct use in healthcare environments. Although they offer improved data privacy protection, domain-specific small language models (SLMs) often underperform LLMs, emphasizing the need for methods that reduce this performance gap while alleviating privacy concerns. In this paper, we present a simple yet effective method that harnesses LLMs' medical proficiency to boost SLM performance on medical tasks under privacy-restricted scenarios. Specifically, we mitigate patient-privacy issues by extracting keywords from medical data and prompting the LLM to generate a medical knowledge-intensive context by simulating clinicians' thought processes. This context serves as additional input for SLMs, augmenting their decision-making capabilities. Our method significantly enhances performance in both few-shot and full-training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results on two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments on two general-domain datasets showcase its generalizability and broad applicability.
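
A minimal sketch of this data flow follows, where `llm` is any text-in/text-out model and the keyword allowlist stands in for a learned medical extractor (both hypothetical; the paper's actual extractor and prompt differ).

```python
from typing import Callable

# Toy keyword allowlist standing in for a learned medical extractor
# (hypothetical; the paper uses an actual keyword-extraction step).
MEDICAL_TERMS = {"fever", "cough", "hypertension", "metformin", "dyspnea"}

def extract_keywords(note: str) -> list[str]:
    # Only de-identified keywords ever leave the local environment.
    return sorted({w.strip(".,;").lower() for w in note.split()} & MEDICAL_TERMS)

def build_context_prompt(keywords: list[str]) -> str:
    # Illustrative prompt that asks the LLM to reason like a clinician.
    return ("Act as a clinician. Given these findings: "
            + ", ".join(keywords)
            + ", explain the relevant medical knowledge step by step.")

def enhance_slm_input(note: str, llm: Callable[[str], str]) -> str:
    """Compose the SLM input: the private note stays local, while the
    external LLM sees only the extracted keywords."""
    context = llm(build_context_prompt(extract_keywords(note)))
    return f"Context: {context}\n\nPatient note: {note}"
```

The design point is that only the extracted keywords ever reach the external LLM; the raw note is consumed solely by the local SLM.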

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

May 18, 2023
Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang

Figure 1 for LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Figure 2 for LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Figure 3 for LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Figure 4 for LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

Existing automatic evaluation methods for text-to-image synthesis can only provide an image-text matching score without considering object-level compositionality, which results in poor correlation with human judgments. In this work, we propose LLMScore, a new framework that offers evaluation scores with multi-granularity compositionality. LLMScore leverages large language models (LLMs) to evaluate text-to-image models. It first transforms the image into image-level and object-level visual descriptions. Then an evaluation instruction is fed into the LLM to measure the alignment between the synthesized image and the text, ultimately producing a score accompanied by a rationale. Our extensive analysis reveals that LLMScore has the highest correlation with human judgments across a wide range of datasets (Attribute Binding Contrast, Concept Conjunction, MSCOCO, DrawBench, PaintSkills). Notably, LLMScore achieves a Kendall's tau correlation with human evaluations that is 58.8% and 31.2% higher than the commonly used text-image matching metrics CLIP and BLIP, respectively.
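
A minimal sketch of this scoring step, with the captioner and detector outputs passed in as plain strings and an illustrative instruction (the paper's actual prompts and model choices may differ):

```python
from typing import Callable

def llmscore(text_prompt: str,
             image_description: str,
             object_descriptions: list[str],
             llm: Callable[[str], str]) -> str:
    """Compose captioner and detector outputs with an evaluation
    instruction and let the LLM return a score plus a rationale."""
    instruction = (
        "You are evaluating a text-to-image model.\n"
        f"Text prompt: {text_prompt}\n"
        f"Image-level description: {image_description}\n"
        f"Object-level descriptions: {'; '.join(object_descriptions)}\n"
        "Rate the alignment between the image and the text from 0 to 100, "
        "then justify the rating in one short paragraph."
    )
    return llm(instruction)
```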

Dynamic Prompting: A Unified Framework for Prompt Tuning

Mar 06, 2023
Xianjun Yang, Wei Cheng, Xujiang Zhao, Linda Petzold, Haifeng Chen

Prompt tuning has been shown to be highly effective at efficiently eliciting knowledge from language models (LMs). However, prompt tuning still lags behind fine-tuning, especially when the LMs are small. P-tuning v2 (Liu et al., 2021b) makes it comparable to fine-tuning by adding continuous prompts at every layer of the pre-trained model. However, prepending fixed soft prompts to all instances, regardless of their differences, is questionable. In particular, the inserted prompt position, the prompt length, and the prompt representations for diverse instances across different tasks can all affect prompt-tuning performance. To fill this gap, we propose dynamic prompting (DP): the position, length, and representation of prompts can all be dynamically optimized with respect to different tasks and instances. We conduct comprehensive experiments on the SuperGLUE benchmark to validate our hypothesis and demonstrate substantial improvements. We also derive a unified framework supporting our dynamic prompting strategy. In particular, we use a simple learning network and Gumbel-Softmax to learn instance-dependent guidance. Experimental results show that simple instance-level position-aware soft prompts improve classification accuracy by up to 6 points on average across five datasets, reducing the gap with fine-tuning. Moreover, we verify its usefulness under full-data, few-shot, and multitask regimes. Combining these components can further unleash the power of DP, narrowing the gap with fine-tuning even more.
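
As a rough illustration, instance-dependent position selection with Gumbel-Softmax could be sketched in PyTorch as below; the candidate-position scheme, mean pooling, and module shapes are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicPositionPrompt(nn.Module):
    """Instance-dependent soft-prompt insertion: a small scorer picks one
    of a few candidate positions per instance via hard Gumbel-Softmax."""

    def __init__(self, hidden_size, prompt_len, num_positions=4):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)
        self.position_scorer = nn.Linear(hidden_size, num_positions)
        self.num_positions = num_positions

    def forward(self, embeds, tau=1.0):
        # embeds: (batch, seq_len, hidden) input token embeddings
        batch, seq_len, _ = embeds.shape
        logits = self.position_scorer(embeds.mean(dim=1))       # (batch, P)
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)  # differentiable one-hot

        # Candidate insertion points spread evenly over the sequence.
        positions = torch.linspace(0, seq_len, self.num_positions).long()

        # Build one candidate sequence per position, then mix them with the
        # one-hot weights so gradients flow through the position choice.
        candidates = []
        for p in positions.tolist():
            candidates.append(torch.cat([embeds[:, :p],
                                         self.prompt.expand(batch, -1, -1),
                                         embeds[:, p:]], dim=1))
        stacked = torch.stack(candidates, dim=1)  # (batch, P, seq+prompt, hidden)
        return (one_hot[:, :, None, None] * stacked).sum(dim=1)
```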

* Work in progress 

Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization

Feb 16, 2023
Xianjun Yang, Yan Li, Xinlu Zhang, Haifeng Chen, Wei Cheng

Text summarization has been a crucial problem in natural language processing (NLP) for several decades. It aims to condense lengthy documents into shorter versions while retaining the most critical information. Various methods have been proposed for text summarization, including extractive and abstractive approaches. The emergence of large language models (LLMs) like GPT-3 and ChatGPT has recently created significant interest in using these models for text summarization tasks. Recent studies (Goyal et al., 2022; Zhang et al., 2023) have shown that LLM-generated news summaries are already on par with human-written ones. However, the performance of LLMs on more practical applications like aspect- or query-based summarization is underexplored. To fill this gap, we evaluated ChatGPT's performance on four widely used benchmark datasets, encompassing diverse summaries from Reddit posts, news articles, dialogue meetings, and stories. Our experiments reveal that ChatGPT's performance is comparable to traditional fine-tuning methods in terms of ROUGE scores. Moreover, we highlight some unique differences between ChatGPT-generated summaries and human references, providing valuable insights into ChatGPT's strengths on diverse text summarization tasks. Our findings call for new directions in this area, and we plan to conduct further research that systematically examines the characteristics of ChatGPT-generated summaries through extensive human evaluation.
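
For reference, ROUGE comparisons of the kind reported here can be sketched with the `rouge-score` package; dataset loading and ChatGPT querying are omitted, and the metric configuration is an assumption.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)

def average_rouge(references, candidates):
    """Mean ROUGE F1 over paired reference/candidate summaries."""
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for ref, cand in zip(references, candidates):
        scores = scorer.score(ref, cand)
        for key in totals:
            totals[key] += scores[key].fmeasure
    return {k: v / len(references) for k, v in totals.items()}
```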

* Work in progress 

MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures

Feb 11, 2023
Xianjun Yang, Stephen Wilson, Linda Petzold

In this paper, we present a novel approach to knowledge extraction and retrieval for materials science using Natural Language Processing (NLP) techniques. Our goal is to automatically mine structured knowledge from millions of research articles in the field of polycrystalline materials and make it easily accessible to the broader community. The proposed method leverages NLP techniques such as entity recognition and document classification to extract relevant information and build an extensive knowledge base from a collection of 9.5 million publications. The resulting knowledge base is integrated into a search engine that enables users to search for information about specific materials, properties, and experiments with greater precision than traditional search engines like Google. We hope our results can help materials scientists quickly locate desired experimental procedures, compare their differences, and even inspire them to design new experiments. Our website will be available soon on GitHub at https://github.com/Xianjun-Yang/PcMSP.git.
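
A minimal sketch of the retrieval side using an off-the-shelf sentence encoder (the actual MatKB index, encoder, and ranking may differ):

```python
# pip install sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

# Off-the-shelf encoder as a stand-in for whatever MatKB actually uses.
model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(paragraphs):
    # Embed every extracted synthesis-procedure paragraph once, offline.
    return model.encode(paragraphs, normalize_embeddings=True)

def search(query, index, paragraphs, k=5):
    # Cosine similarity reduces to a dot product on normalized vectors.
    q = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]
    return [(paragraphs[i], float(index[i] @ q)) for i in top]
```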

* Work in progress 

ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval

Feb 05, 2023
Kexun Zhang, Xianjun Yang, William Yang Wang, Lei Li

Diffusion models show promising generation capability across a variety of data types. Despite their high generation quality, inference with diffusion models is still time-consuming due to the numerous sampling iterations required. To accelerate inference, we propose ReDi, a simple, learning-free Retrieval-based Diffusion sampling framework. From a precomputed knowledge base, ReDi retrieves a trajectory similar to the partially generated trajectory at an early stage of generation, skips a large portion of the intermediate steps, and continues sampling from a later step in the retrieved trajectory. We theoretically prove that the generation performance of ReDi is guaranteed. Our experiments demonstrate that ReDi improves model inference efficiency with a 2x speedup. Furthermore, ReDi generalizes well in zero-shot cross-domain image generation such as image stylization.
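
The core retrieve-then-skip step can be sketched in a few lines, assuming the knowledge base stores flattened states paired between an early step k and a later step m, and that `sampler` is any diffusion sampler resumable from a given step (all names here are illustrative):

```python
import numpy as np

def redi_sample(x_k, kb_early, kb_late, sampler, m):
    """Retrieval-then-skip: match the partially denoised state at an early
    step k against a precomputed knowledge base, jump to the paired state
    at a much later step m, and only sample the remaining steps.

    kb_early, kb_late: (N, dim) flattened states recorded at steps k and m.
    """
    dists = np.linalg.norm(kb_early - x_k.reshape(1, -1), axis=1)
    nearest = int(np.argmin(dists))
    x_m = kb_late[nearest]        # steps k..m are skipped entirely
    return sampler(x_m, start_step=m)
```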

OASum: Large-Scale Open Domain Aspect-based Summarization

Dec 19, 2022
Xianjun Yang, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Xiaoman Pan, Linda Petzold, Dong Yu

Aspect- or query-based summarization has recently attracted more attention, as it can generate differentiated summaries based on users' interests. However, current datasets for aspect- or query-based summarization either focus on specific domains, contain relatively small-scale instances, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its utility for diverse aspect-based summary generation. To overcome the data-scarcity problem in specific domains, we also perform zero-shot, few-shot, and fine-tuning experiments on seven downstream datasets. Specifically, the zero/few-shot and fine-tuning results show that a model pre-trained on our corpus demonstrates strong aspect- or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.
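
A hypothetical sketch of how such instances might be assembled from a Wikipedia page, using section headings as aspects; the `aligner` callable (which decides whether a summary sentence covers a section) stands in for the paper's actual alignment heuristics.

```python
def wikipedia_to_aspect_instances(page_title, sections, summary_sentences, aligner):
    """Turn one Wikipedia page into (document, aspect, summary) instances.

    sections: {heading: section_text}; headings act as aspects.
    aligner(sentence, section_text) -> bool decides whether a summary
    sentence covers a section (stand-in for real alignment heuristics).
    """
    document = " ".join(sections.values())
    instances = []
    for aspect, section_text in sections.items():
        covered = [s for s in summary_sentences if aligner(s, section_text)]
        if covered:
            instances.append({"page": page_title,
                              "document": document,
                              "aspect": aspect,
                              "summary": " ".join(covered)})
    return instances
```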

Few-Shot Document-Level Event Argument Extraction

Sep 06, 2022
Xianjun Yang, Yujie Lu, Linda Petzold

Event argument extraction (EAE) has been well studied at the sentence level but remains under-explored at the document level. In this paper, we study how to capture event arguments that actually spread across sentences in documents. Prior works mainly assume full access to rich document supervision, ignoring the fact that argument supervision is limited in documents. To fill this gap, we present FewDocAE, a Few-Shot Document-Level Event Argument Extraction benchmark based on DocEE, the largest document-level event extraction dataset. We first define the new problem and reconstruct the corpus with a novel N-Way-D-Doc sampling strategy instead of the traditional N-Way-K-Shot strategy. We then adapt advanced document-level neural models to the few-shot setting to provide baseline results under in-domain and cross-domain settings. Since argument extraction depends on context from multiple sentences and the learning process is limited to very few examples, we find the task to be very challenging, with substantially low performance. Considering that FewDocAE is closely related to practical use under low-resource regimes, we hope this benchmark encourages more research in this direction. Our data and code will be available online.
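
The sampling idea can be sketched as follows, assuming documents are grouped by event type; drawing whole documents (rather than K isolated argument instances) keeps multi-sentence context intact. Split sizes and key names are illustrative.

```python
import random

def sample_episode(docs_by_event_type, n_way, d_doc, seed=None):
    """N-Way-D-Doc episode sampling: draw N event types and, for each,
    D whole support documents plus D query documents, unlike sampling K
    isolated instances per class as in N-Way-K-Shot."""
    rng = random.Random(seed)
    event_types = rng.sample(sorted(docs_by_event_type), n_way)
    support, query = [], []
    for etype in event_types:
        # Assumes each event type has at least 2 * d_doc documents.
        docs = rng.sample(docs_by_event_type[etype], 2 * d_doc)
        support.extend(docs[:d_doc])
        query.extend(docs[d_doc:])
    return event_types, support, query
```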
