Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mengnan Du

LawLLM: Law Large Language Model for the US Legal System

Jul 27, 2024

Dong Shu, Haoran Zhao, Xukun Liu, David Demeter, Mengnan Du, Yongfeng Zhang

Figure 1 for LawLLM: Law Large Language Model for the US Legal System

Figure 2 for LawLLM: Law Large Language Model for the US Legal System

Figure 3 for LawLLM: Law Large Language Model for the US Legal System

Figure 4 for LawLLM: Law Large Language Model for the US Legal System

Abstract:In the rapidly evolving field of legal analytics, finding relevant cases and accurately predicting judicial outcomes are challenging because of the complexity of legal language, which often includes specialized terminology, complex syntax, and historical context. Moreover, the subtle distinctions between similar and precedent cases require a deep understanding of legal knowledge. Researchers often conflate these concepts, making it difficult to develop specialized techniques to effectively address these nuanced tasks. In this paper, we introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain to address these challenges. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). By clearly distinguishing between precedent and similar cases, we provide essential clarity, guiding future research in developing specialized strategies for these tasks. We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format. Furthermore, we also use techniques such as in-context learning (ICL) and advanced information retrieval methods in LawLLM. The evaluation results demonstrate that LawLLM consistently outperforms existing baselines in both zero-shot and few-shot scenarios, offering unparalleled multi-task capabilities and filling critical gaps in the legal domain.

* 21 pages, 2 figures, accepted at the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024) for the Applied Research Paper track

Via

Access Paper or Ask Questions

When Search Engine Services meet Large Language Models: Visions and Challenges

Jun 28, 2024

Haoyi Xiong, Jiang Bian, Yuchen Li, Xuhong Li, Mengnan Du, Shuaiqiang Wang, Dawei Yin, Sumi Helal

Figure 1 for When Search Engine Services meet Large Language Models: Visions and Challenges

Figure 2 for When Search Engine Services meet Large Language Models: Visions and Challenges

Figure 3 for When Search Engine Services meet Large Language Models: Visions and Challenges

Figure 4 for When Search Engine Services meet Large Language Models: Visions and Challenges

Abstract:Combining Large Language Models (LLMs) with search engine services marks a significant shift in the field of services computing, opening up new possibilities to enhance how we search for and retrieve information, understand content, and interact with internet services. This paper conducts an in-depth examination of how integrating LLMs with search engines can mutually benefit both technologies. We focus on two main areas: using search engines to improve LLMs (Search4LLM) and enhancing search engine functions using LLMs (LLM4Search). For Search4LLM, we investigate how search engines can provide diverse high-quality datasets for pre-training of LLMs, how they can use the most relevant documents to help LLMs learn to answer queries more accurately, how training LLMs with Learning-To-Rank (LTR) tasks can enhance their ability to respond with greater precision, and how incorporating recent search results can make LLM-generated content more accurate and current. In terms of LLM4Search, we examine how LLMs can be used to summarize content for better indexing by search engines, improve query outcomes through optimization, enhance the ranking of search results by analyzing document relevance, and help in annotating data for learning-to-rank tasks in various learning contexts. However, this promising integration comes with its challenges, which include addressing potential biases and ethical issues in training models, managing the computational and other costs of incorporating LLMs into search services, and continuously updating LLM training with the ever-changing web content. We discuss these challenges and chart out required research directions to address them. We also discuss broader implications for service computing, such as scalability, privacy concerns, and the need to adapt search engine architectures for these advanced models.

* Under Review

Via

Access Paper or Ask Questions

Towards Imperceptible Backdoor Attack in Self-supervised Learning

May 23, 2024

Hanrong Zhang, Zhenting Wang, Tingxu Han, Mingyu Jin, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma

Figure 1 for Towards Imperceptible Backdoor Attack in Self-supervised Learning

Figure 2 for Towards Imperceptible Backdoor Attack in Self-supervised Learning

Figure 3 for Towards Imperceptible Backdoor Attack in Self-supervised Learning

Figure 4 for Towards Imperceptible Backdoor Attack in Self-supervised Learning

Abstract:Self-supervised learning models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective in self-supervised learning often involve noticeable triggers, like colored patches, which are vulnerable to human inspection. In this paper, we propose an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers designed for supervised learning are not as effective in compromising self-supervised models. We then identify this ineffectiveness is attributed to the overlap in distributions between the backdoor and augmented samples used in self-supervised learning. Building on this insight, we design an attack using optimized triggers that are disentangled to the augmented transformation in the self-supervised learning, while also remaining imperceptible to human vision. Experiments on five datasets and seven SSL algorithms demonstrate our attack is highly effective and stealthy. It also has strong resistance to existing backdoor defenses. Our code can be found at https://github.com/Zhang-Henry/IMPERATIVE.

Via

Access Paper or Ask Questions

Neural Operator for Accelerating Coronal Magnetic Field Model

May 21, 2024

Yutao Du, Qin Li, Raghav Gnanasambandam, Mengnan Du, Haimin Wang, Bo Shen

Figure 1 for Neural Operator for Accelerating Coronal Magnetic Field Model

Figure 2 for Neural Operator for Accelerating Coronal Magnetic Field Model

Figure 3 for Neural Operator for Accelerating Coronal Magnetic Field Model

Figure 4 for Neural Operator for Accelerating Coronal Magnetic Field Model

Abstract:Studying the sun's outer atmosphere is challenging due to its complex magnetic fields impacting solar activities. Magnetohydrodynamics (MHD) simulations help model these interactions but are extremely time-consuming (usually on a scale of days). Our research applies the Fourier Neural Operator (FNO) to accelerate the coronal magnetic field modeling, specifically, the Bifrost MHD model. We apply Tensorized FNO (TFNO) to generate solutions from partial differential equations (PDEs) over a 3D domain efficiently. TFNO's performance is compared with other deep learning methods, highlighting its accuracy and scalability. Physics analysis confirms that TFNO is reliable and capable of accelerating MHD simulations with high precision. This advancement improves efficiency in data handling, enhances predictive capabilities, and provides a better understanding of magnetic topologies.

Via

Access Paper or Ask Questions

Quantifying Multilingual Performance of Large Language Models Across Languages

Apr 17, 2024

Zihao Li, Yucheng Shi, Zirui Liu, Fan Yang, Ninghao Liu, Mengnan Du

Abstract:The training process of Large Language Models (LLMs) requires extensive text corpus. However, these data are often unevenly distributed in different languages. As a result, LLMs perform well on common languages, such as English, German, and French, but perform poorly on low-resource languages. However, currently there is no work to quantitatively measure the performance of LLMs in low-resource languages. To fill this gap, we proposed the Language Ranker that aims to benchmark and rank different languages according to the performance of LLMs on those languages. We employ the LLM's performance on the English corpus as a baseline to compare the performances of different languages and English. We have the following three findings: 1. The performance rankings of different LLMs in all languages are roughly the same. 2. LLMs with different sizes have the same partial order of performance. 3. There is a strong correlation between LlaMa2's performance in different languages and the proportion of the pre-training corpus. These findings illustrate that the Language Ranker can be used as an indicator to measure the language performance of LLMs.

Via

Access Paper or Ask Questions

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

Apr 10, 2024

Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding(+3 more)

Abstract:This paper studies the phenomenon that different concepts are learned in different layers of large language models, i.e. more difficult concepts are fully acquired with deeper layers. We define the difficulty of concepts by the level of abstraction, and here it is crudely categorized by factual, emotional, and inferential. Each category contains a spectrum of tasks, arranged from simple to complex. For example, within the factual dimension, tasks range from lie detection to categorizing mathematical problems. We employ a probing technique to extract representations from different layers of the model and apply these to classification tasks. Our findings reveal that models tend to efficiently classify simpler tasks, indicating that these concepts are learned in shallower layers. Conversely, more complex tasks may only be discernible at deeper layers, if at all. This paper explores the implications of these findings for our understanding of model learning processes and internal representations. Our implementation is available at \url{https://github.com/Luckfort/CD}.

* 12 pages

Via

Access Paper or Ask Questions

Knowledge Graph Large Language Model for Link Prediction

Mar 19, 2024

Dong Shu, Tianle Chen, Mingyu Jin, Yiting Zhang, Chong Zhang, Mengnan Du, Yongfeng Zhang

Figure 1 for Knowledge Graph Large Language Model for Link Prediction

Figure 2 for Knowledge Graph Large Language Model for Link Prediction

Figure 3 for Knowledge Graph Large Language Model for Link Prediction

Figure 4 for Knowledge Graph Large Language Model for Link Prediction

Abstract:The task of predicting multiple links within knowledge graphs (KGs) stands as a challenge in the field of knowledge graph analysis, a challenge increasingly resolvable due to advancements in natural language processing (NLP) and KG embedding techniques. This paper introduces a novel methodology, the Knowledge Graph Large Language Model Framework (KG-LLM), which leverages pivotal NLP paradigms, including chain-of-thought (CoT) prompting and in-context learning (ICL), to enhance multi-hop link prediction in KGs. By converting the KG to a CoT prompt, our framework is designed to discern and learn the latent representations of entities and their interrelations. To show the efficacy of the KG-LLM Framework, we fine-tune three leading Large Language Models (LLMs) within this framework, employing both non-ICL and ICL tasks for a comprehensive evaluation. Further, we explore the framework's potential to provide LLMs with zero-shot capabilities for handling previously unseen prompts. Our experimental findings discover that integrating ICL and CoT not only augments the performance of our approach but also significantly boosts the models' generalization capacity, thereby ensuring more precise predictions in unfamiliar scenarios.

* 23 pages, 2 figures

Via

Access Paper or Ask Questions

Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Mar 13, 2024

Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang, Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du(+1 more)

Figure 1 for Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Figure 2 for Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Figure 3 for Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Figure 4 for Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Abstract:Explainable AI (XAI) refers to techniques that provide human-understandable insights into the workings of AI models. Recently, the focus of XAI is being extended towards Large Language Models (LLMs) which are often criticized for their lack of transparency. This extension calls for a significant transformation in XAI methodologies because of two reasons. First, many existing XAI methods cannot be directly applied to LLMs due to their complexity advanced capabilities. Second, as LLMs are increasingly deployed across diverse industry applications, the role of XAI shifts from merely opening the "black box" to actively enhancing the productivity and applicability of LLMs in real-world settings. Meanwhile, unlike traditional machine learning models that are passive recipients of XAI insights, the distinct abilities of LLMs can reciprocally enhance XAI. Therefore, in this paper, we introduce Usable XAI in the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems, and (2) how LLMs can contribute to the advancement of XAI. We introduce 10 strategies, introducing the key techniques for each and discussing their associated challenges. We also provide case studies to demonstrate how to obtain and leverage explanations. The code used in this paper can be found at: https://github.com/JacksonWuxs/UsableXAI_LLM.

* 38 pages, 4 figures

Via

Access Paper or Ask Questions

What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Feb 21, 2024

Mingyu Jin, Beichen Wang, Zhaoqian Xue, Suiyuan Zhu, Wenyue Hua, Hua Tang, Kai Mei, Mengnan Du, Yongfeng Zhang

Figure 1 for What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Figure 2 for What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Figure 3 for What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Figure 4 for What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Abstract:In this study, we introduce "CosmoAgent," an innovative artificial intelligence framework utilizing Large Language Models (LLMs) to simulate complex interactions between human and extraterrestrial civilizations, with a special emphasis on Stephen Hawking's cautionary advice about not sending radio signals haphazardly into the universe. The goal is to assess the feasibility of peaceful coexistence while considering potential risks that could threaten well-intentioned civilizations. Employing mathematical models and state transition matrices, our approach quantitatively evaluates the development trajectories of civilizations, offering insights into future decision-making at critical points of growth and saturation. Furthermore, the paper acknowledges the vast diversity in potential living conditions across the universe, which could foster unique cosmologies, ethical codes, and worldviews among various civilizations. Recognizing the Earth-centric bias inherent in current LLM designs, we propose the novel concept of using LLMs with diverse ethical paradigms and simulating interactions between entities with distinct moral principles. This innovative research provides a new way to understand complex inter-civilizational dynamics, expanding our perspective while pioneering novel strategies for conflict resolution, crucial for preventing interstellar conflicts. We have also released the code and datasets to enable further academic investigation into this interesting area of research. The code is available at https://github.com/agiresearch/AlienAgent.

Via

Access Paper or Ask Questions

Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities

Feb 19, 2024

Mingyu Jin, Hua Tang, Chong Zhang, Qinkai Yu, Chengzhi Liu, Suiyuan Zhu, Yongfeng Zhang, Mengnan Du

Figure 1 for Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities

Figure 2 for Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities

Figure 3 for Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities

Figure 4 for Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities

Abstract:Large language models (LLMs) have been applied in many fields with rapid development in recent years. As a classic machine learning task, time series forecasting has recently received a boost from LLMs. However, there is a research gap in the LLMs' preferences in this field. In this paper, by comparing LLMs with traditional models, many properties of LLMs in time series prediction are found. For example, our study shows that LLMs excel in predicting time series with clear patterns and trends but face challenges with datasets lacking periodicity. We explain our findings through designing prompts to require LLMs to tell the period of the datasets. In addition, the input strategy is investigated, and it is found that incorporating external knowledge and adopting natural language paraphrases positively affects the predictive performance of LLMs for time series. Overall, this study contributes to insight into the advantages and limitations of LLMs in time series forecasting under different conditions.

Via

Access Paper or Ask Questions