Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Meng Fang

Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

Jul 05, 2024

Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

Abstract:Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods. Our study indicates that traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption, especially when the amount of data is limited. This suggests the potential of sequential modeling for tackling data corruption in offline RL. To further unleash the potential of sequence modeling methods, we propose Robust Decision Transformer (RDT) by incorporating several robust techniques. Specifically, we introduce Gaussian weighted learning and iterative data correction to reduce the effect of corrupted data. Additionally, we leverage embedding dropout to enhance the model's resistance to erroneous inputs. Extensive experiments on MoJoCo, KitChen, and Adroit tasks demonstrate RDT's superior performance under diverse data corruption compared to previous methods. Moreover, RDT exhibits remarkable robustness in a challenging setting that combines training-time data corruption with testing-time observation perturbations. These results highlight the potential of robust sequence modeling for learning from noisy or corrupted offline datasets, thereby promoting the reliable application of offline RL in real-world tasks.

Via

Access Paper or Ask Questions

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Jun 26, 2024

Meng Fang, Xiangpeng Wan, Fei Lu, Fei Xing, Kai Zou

Figure 1 for MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Figure 2 for MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Figure 3 for MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Figure 4 for MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Abstract:Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset. The dataset includes diverse mathematical problems at high school and university levels, created by experts from notable institutions to rigorously test LLMs in advanced problem-solving scenarios and cover a wider range of subject areas. By providing the MathOdyssey dataset as a resource to the AI community, we aim to contribute to the understanding and improvement of AI capabilities in complex mathematical problem-solving. We conduct benchmarking on open-source models, such as Llama-3 and DBRX-Instruct, and closed-source models from the GPT series and Gemini models. Our results indicate that while LLMs perform well on routine and moderately difficult tasks, they face significant challenges with Olympiad-level problems and complex university-level questions. Our analysis shows a narrowing performance gap between open-source and closed-source models, yet substantial challenges remain, particularly with the most demanding problems. This study highlights the ongoing need for research to enhance the mathematical reasoning of LLMs. The dataset, results, and code are publicly available.

Via

Access Paper or Ask Questions

LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Jun 25, 2024

Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath(+30 more)

Figure 1 for LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Figure 2 for LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Figure 3 for LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Figure 4 for LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Abstract:This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on the topic of LLMs assist NLP Researchers, particularly examining the effectiveness of LLM in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with "deficiency" labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers", how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers", how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

Via

Access Paper or Ask Questions

Synergizing Foundation Models and Federated Learning: A Survey

Jun 18, 2024

Shenghui Li, Fanghua Ye, Meng Fang, Jiaxu Zhao, Yun-Hin Chan, Edith C. -H. Ngai, Thiemo Voigt

Figure 1 for Synergizing Foundation Models and Federated Learning: A Survey

Figure 2 for Synergizing Foundation Models and Federated Learning: A Survey

Figure 3 for Synergizing Foundation Models and Federated Learning: A Survey

Figure 4 for Synergizing Foundation Models and Federated Learning: A Survey

Abstract:The recent development of Foundation Models (FMs), represented by large language models, vision transformers, and multimodal models, has been making a significant impact on both academia and industry. Compared with small-scale models, FMs have a much stronger demand for high-volume data during the pre-training phase. Although general FMs can be pre-trained on data collected from open sources such as the Internet, domain-specific FMs need proprietary data, posing a practical challenge regarding the amount of data available due to privacy concerns. Federated Learning (FL) is a collaborative learning paradigm that breaks the barrier of data availability from different participants. Therefore, it provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy. This survey paper discusses the potentials and challenges of synergizing FL and FMs and summarizes core techniques, future directions, and applications. A periodically updated paper collection on FM-FL is available at https://github.com/lishenghui/awesome-fm-fl.

Via

Access Paper or Ask Questions

Augmenting Biomedical Named Entity Recognition with General-domain Resources

Jun 15, 2024

Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

Figure 1 for Augmenting Biomedical Named Entity Recognition with General-domain Resources

Figure 2 for Augmenting Biomedical Named Entity Recognition with General-domain Resources

Figure 3 for Augmenting Biomedical Named Entity Recognition with General-domain Resources

Figure 4 for Augmenting Biomedical Named Entity Recognition with General-domain Resources

Abstract:Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle those challenges through transfer learning from easily accessible resources with fewer concept overlaps with biomedical datasets. In this paper, we proposed GERBERA, a simple-yet-effective method that utilized a general-domain NER dataset for training. Specifically, we performed multi-task learning to train a pre-trained biomedical language model with both the target BioNER dataset and the general-domain dataset. Subsequently, we fine-tuned the models specifically for the BioNER dataset. We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models demonstrated superior performance compared to baseline models trained with multiple additional BioNER datasets. Specifically, our models consistently outperformed the baselines in six out of eight entity types, achieving an average improvement of 0.9% over the best baseline performance across eight biomedical entity types sourced from five different corpora. Our method was especially effective in amplifying performance on BioNER datasets characterized by limited data, with a 4.7% improvement in F1 scores on the JNLPBA-RNA dataset.

* We make data, codes, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera

Via

Access Paper or Ask Questions

Revisiting Catastrophic Forgetting in Large Language Model Tuning

Jun 07, 2024

Hongyu Li, Liang Ding, Meng Fang, Dacheng Tao

Figure 1 for Revisiting Catastrophic Forgetting in Large Language Model Tuning

Figure 2 for Revisiting Catastrophic Forgetting in Large Language Model Tuning

Figure 3 for Revisiting Catastrophic Forgetting in Large Language Model Tuning

Figure 4 for Revisiting Catastrophic Forgetting in Large Language Model Tuning

Abstract:Catastrophic Forgetting (CF) means models forgetting previously acquired knowledge when learning new data. It compromises the effectiveness of large language models (LLMs) during fine-tuning, yet the underlying causes have not been thoroughly investigated. This paper takes the first step to reveal the direct link between the flatness of the model loss landscape and the extent of CF in the field of LLMs. Based on this, we introduce the sharpness-aware minimization to mitigate CF by flattening the loss landscape. Experiments on three widely-used fine-tuning datasets, spanning different model scales, demonstrate the effectiveness of our method in alleviating CF. Analyses show that we nicely complement the existing anti-forgetting strategies, further enhancing the resistance of LLMs to CF.

Via

Access Paper or Ask Questions

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

May 30, 2024

Xuanfa Jin, Ziyan Wang, Yali Du, Meng Fang, Haifeng Zhang, Jun Wang

Abstract:Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with these often neglect the control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies due to the potential role changes that increase the uncertainty and complexity of the game. In this work, we first present the existence of the Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results showcase that the discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from the analyses, we propose an RL-instructed language agent framework, where a discussion policy trained by reinforcement learning (RL) is employed to determine appropriate discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework.

* 27 pages, 5 figures

Via

Access Paper or Ask Questions

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

May 21, 2024

Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, Xinping Yi, Xiaowei Huang

Figure 1 for Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

Figure 2 for Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

Figure 3 for Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

Figure 4 for Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

Abstract:With the proliferation of red-teaming strategies for Large Language Models (LLMs), the deficiency in the literature about improving the safety and robustness of LLM defense strategies is becoming increasingly pronounced. This paper introduces the LLM-based \textbf{sentinel} model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few ($<30$) additional tokens, effectively reducing toxicity in responses from target LLMs. The sentinel model naturally overcomes the \textit{parameter inefficiency} and \textit{limited model accessibility} for fine-tuning large target models. We employ an interleaved training regimen using Proximal Policy Optimization (PPO) to optimize both red team and sentinel models dynamically, incorporating a value head-sharing mechanism inspired by the multi-agent centralized critic to manage the complex interplay between agents. Our extensive experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs, even when dealing with larger models like \texttt{Llama-2}, \texttt{GPT-3.5} and \texttt{Stable-Diffusion}, highlighting the potential of our framework in enhancing safety and robustness in various applications.

* Preprint, 10 pages main with 10 pages appendix

Via

Access Paper or Ask Questions

Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion

May 18, 2024

Zeyu Zhang, Yiran Wang, Biao Wu, Shuo Chen, Zhiyuan Zhang, Shiya Huang, Wenbo Zhang, Meng Fang, Ling Chen, Yang Zhao

Abstract:In recent years, there has been significant interest in creating 3D avatars and motions, driven by their diverse applications in areas like film-making, video games, AR/VR, and human-robot interaction. However, current efforts primarily concentrate on either generating the 3D avatar mesh alone or producing motion sequences, with integrating these two aspects proving to be a persistent challenge. Additionally, while avatar and motion generation predominantly target humans, extending these techniques to animals remains a significant challenge due to inadequate training data and methods. To bridge these gaps, our paper presents three key contributions. Firstly, we proposed a novel agent-based approach named Motion Avatar, which allows for the automatic generation of high-quality customizable human and animal avatars with motions through text queries. The method significantly advanced the progress in dynamic 3D character generation. Secondly, we introduced a LLM planner that coordinates both motion and avatar generation, which transforms a discriminative planning into a customizable Q&A fashion. Lastly, we presented an animal motion dataset named Zoo-300K, comprising approximately 300,000 text-motion pairs across 65 animal categories and its building pipeline ZooGen, which serves as a valuable resource for the community. See project website https://steve-zeyu-zhang.github.io/MotionAvatar/

Via

Access Paper or Ask Questions

MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

Apr 28, 2024

Zirui Song, Yaohang Li, Meng Fang, Zhenhao Chen, Zecheng Shi, Yuan Huang

Figure 1 for MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

Figure 2 for MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

Figure 3 for MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

Figure 4 for MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

Abstract:Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, restricting their versatility. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), a framework utilizes the collective expertise of diverse agents to enhance interaction ability with operating systems. The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on their specific domain knowledge, effectively reducing the hallucination associated with knowledge domain gaps. To evaluate the performance of MMAC-Copilot, we conducted experiments using both the GAIA benchmark and our newly introduced Visual Interaction Benchmark (VIBench). VIBench focuses on non-API-interactable applications across various domains, including 3D gaming, recreation, and office scenarios. MMAC-Copilot achieved exceptional performance on GAIA, with an average improvement of 6.8\% over existing leading systems. Furthermore, it demonstrated remarkable capability on VIBench, particularly in managing various methods of interaction within systems and applications. These results underscore MMAC-Copilot's potential in advancing the field of autonomous virtual agents through its innovative approach to agent collaboration.

* In processing

Via

Access Paper or Ask Questions