Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jie Tang

Tony

Parameter-Efficient Fine-Tuning for Foundation Models

Jan 23, 2025

Dan Zhang, Tao Feng, Lilong Xue, Yuandong Wang, Yuxiao Dong, Jie Tang

Abstract:This survey delves into the realm of Parameter-Efficient Fine-Tuning (PEFT) within the context of Foundation Models (FMs). PEFT, a cost-effective fine-tuning technique, minimizes parameters and computational complexity while striving for optimal downstream task performance. FMs, like ChatGPT, DALL-E, and LLaVA specialize in language understanding, generative tasks, and multimodal tasks, trained on diverse datasets spanning text, images, and videos. The diversity of FMs guides various adaptation strategies for PEFT. Therefore, this survey aims to provide a comprehensive overview of PEFT techniques applied to diverse FMs and address critical gaps in understanding the techniques, trends, and applications. We start by providing a detailed development of FMs and PEFT. Subsequently, we systematically review the key categories and core mechanisms of PEFT across diverse FMs to offer a comprehensive understanding of trends. We also explore the most recent applications across various FMs to demonstrate the versatility of PEFT, shedding light on the integration of systematic PEFT methods with a range of FMs. Furthermore, we identify potential research and development directions for improving PEFTs in the future. This survey provides a valuable resource for both newcomers and experts seeking to understand and use the power of PEFT across FMs. All reviewed papers are listed at \url{https://github.com/THUDM/Awesome-Parameter-Efficient-Fine-Tuning-for-Foundation-Models}.

* 25 pages, 6 figures, 7 tables

Via

Access Paper or Ask Questions

Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

Jan 20, 2025

Zhenyu Hou, Xin Lv, Rui Lu, Jiajie Zhang, Yujiang Li, Zijun Yao, Juanzi Li, Jie Tang, Yuxiao Dong

Figure 1 for Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

Figure 2 for Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

Figure 3 for Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

Figure 4 for Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

Abstract:Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. While reinforcement learning (RL) holds promise for enabling self-exploration and learning from feedback, recent attempts yield only modest improvements in complex reasoning. In this paper, we present T1 to scale RL by encouraging exploration and understand inference scaling. We first initialize the LLM using synthesized chain-of-thought data that integrates trial-and-error and self-verification. To scale RL training, we promote increased sampling diversity through oversampling. We further employ an entropy bonus as an auxiliary loss, alongside a dynamic anchor for regularization to facilitate reward optimization. We demonstrate that T1 with open LLMs as its base exhibits inference scaling behavior and achieves superior performance on challenging math reasoning benchmarks. For example, T1 with Qwen2.5-32B as the base model outperforms the recent Qwen QwQ-32B-Preview model on MATH500, AIME2024, and Omni-math-500. More importantly, we present a simple strategy to examine inference scaling, where increased inference budgets directly lead to T1's better performance without any additional verification. We will open-source the T1 models and the data used to train them at \url{https://github.com/THUDM/T1}.

Via

Access Paper or Ask Questions

ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Jan 17, 2025

Lucen Zhong, Zhengxiao Du, Xiaohan Zhang, Haiyi Hu, Jie Tang

Figure 1 for ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Figure 2 for ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Figure 3 for ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Figure 4 for ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Abstract:Enhancing large language models (LLMs) with real-time APIs can help generate more accurate and up-to-date responses. However, evaluating the function calling abilities of LLMs in real-world scenarios remains under-explored due to the complexity of data collection and evaluation. In this work, we introduce ComplexFuncBench, a benchmark for complex function calling across five real-world scenarios. Compared to existing benchmarks, ComplexFuncBench encompasses multi-step and constrained function calling, which requires long-parameter filing, parameter value reasoning, and 128k long context. Additionally, we propose an automatic framework, ComplexEval, for quantitatively evaluating complex function calling tasks. Through comprehensive experiments, we demonstrate the deficiencies of state-of-the-art LLMs in function calling and suggest future directions for optimizing these capabilities. The data and code are available at \url{https://github.com/THUDM/ComplexFuncBench}.

Via

Access Paper or Ask Questions

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Jan 06, 2025

Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang

Figure 1 for MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Figure 2 for MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Figure 3 for MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Figure 4 for MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Abstract:In recent years, vision language models (VLMs) have made significant advancements in video understanding. However, a crucial capability - fine-grained motion comprehension - remains under-explored in current benchmarks. To address this gap, we propose MotionBench, a comprehensive evaluation benchmark designed to assess the fine-grained motion comprehension of video understanding models. MotionBench evaluates models' motion-level perception through six primary categories of motion-oriented question types and includes data collected from diverse sources, ensuring a broad representation of real-world video content. Experimental results reveal that existing VLMs perform poorly in understanding fine-grained motions. To enhance VLM's ability to perceive fine-grained motion within a limited sequence length of LLM, we conduct extensive experiments reviewing VLM architectures optimized for video feature compression and propose a novel and efficient Through-Encoder (TE) Fusion method. Experiments show that higher frame rate inputs and TE Fusion yield improvements in motion understanding, yet there is still substantial room for enhancement. Our benchmark aims to guide and motivate the development of more capable video understanding models, emphasizing the importance of fine-grained motion comprehension. Project page: https://motion-bench.github.io .

* 20 pages

Via

Access Paper or Ask Questions

CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis

Jan 03, 2025

Bohan Zhang, Xiaokang Zhang, Jing Zhang, Jifan Yu, Sijia Luo, Jie Tang

Figure 1 for CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis

Figure 2 for CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis

Figure 3 for CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis

Figure 4 for CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis

Abstract:Current inference scaling methods, such as Self-consistency and Best-of-N, have proven effective in improving the accuracy of LLMs on complex reasoning tasks. However, these methods rely heavily on the quality of candidate responses and are unable to produce correct answers when all candidates are incorrect. In this paper, we propose a novel inference scaling strategy, CoT-based Synthesizer, which leverages CoT reasoning to synthesize superior answers by analyzing complementary information from multiple candidate responses, even when all candidate responses are flawed. To enable a lightweight and cost-effective implementation, we introduce an automated data generation pipeline that creates diverse training data. This allows smaller LLMs trained on this data to improve the inference accuracy of larger models, including API-based LLMs. Experimental results across four benchmark datasets with seven policy models demonstrate that our method significantly enhances performance, with gains of 11.8% for Llama3-8B and 10.3% for GPT-4o on the MATH dataset. The corresponding training data and code are publicly available on https://github.com/RUCKBReasoning/CoT-based-Synthesizer.

Via

Access Paper or Ask Questions

ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Jan 03, 2025

Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang

Figure 1 for ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Figure 2 for ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Figure 3 for ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Figure 4 for ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Abstract:Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. Like, SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. In practice, permission to access gradient information is not always granted (the gradient ban), such as black-box APIs, hardware limitations, and non-differentiable systems. To bridge this gap, we introduce the first benchmark ZeroFlow to evaluate gradient-free optimization algorithms for overcoming forgetting. This benchmark examines a suite of forward pass methods across multiple methods, forgetting scenarios, and datasets. We find that forward passes alone are enough to overcome forgetting. Our findings reveal new optimization principles that highlight the potential of forward-pass in mitigating forgetting, managing task conflicts, and reducing memory demands, alongside novel enhancements that further mitigate forgetting with just one forward pass. This work provides essential insights and tools for advancing forward pass methods to overcome forgetting.

Via

Access Paper or Ask Questions

Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems

Jan 02, 2025

Shunxing Yang, Junteng Yao, Jie Tang, Tuo Wu, Maged Elkashlan, Chau Yuen, Merouane Debbah, Hyundong Shin, Matthew Valenti

Figure 1 for Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems

Figure 2 for Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems

Figure 3 for Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems

Abstract:Fluid antenna systems (FAS) enable dynamic antenna positioning, offering new opportunities to enhance integrated sensing and communication (ISAC) performance. However, existing studies primarily focus on communication enhancement or single-target sensing, leaving multi-target scenarios underexplored. Additionally, the joint optimization of beamforming and antenna positions poses a highly non-convex problem, with traditional methods becoming impractical as the number of fluid antennas increases. To address these challenges, this letter proposes a block coordinate descent (BCD) framework integrated with a deep reinforcement learning (DRL)-based approach for intelligent antenna positioning. By leveraging the deep deterministic policy gradient (DDPG) algorithm, the proposed framework efficiently balances sensing and communication performance. Simulation results demonstrate the scalability and effectiveness of the proposed approach.

Via

Access Paper or Ask Questions

Dynamic Scaling of Unit Tests for Code Reward Modeling

Jan 02, 2025

Zeyao Ma, Xiaokang Zhang, Jing Zhang, Jifan Yu, Sijia Luo, Jie Tang

Figure 1 for Dynamic Scaling of Unit Tests for Code Reward Modeling

Figure 2 for Dynamic Scaling of Unit Tests for Code Reward Modeling

Figure 3 for Dynamic Scaling of Unit Tests for Code Reward Modeling

Figure 4 for Dynamic Scaling of Unit Tests for Code Reward Modeling

Abstract:Current large language models (LLMs) often struggle to produce accurate responses on the first attempt for complex reasoning tasks like code generation. Prior research tackles this challenge by generating multiple candidate solutions and validating them with LLM-generated unit tests. The execution results of unit tests serve as reward signals to identify correct solutions. As LLMs always confidently make mistakes, these unit tests are not reliable, thereby diminishing the quality of reward signals. Motivated by the observation that scaling the number of solutions improves LLM performance, we explore the impact of scaling unit tests to enhance reward signal quality. Our pioneer experiment reveals a positive correlation between the number of unit tests and reward signal quality, with greater benefits observed in more challenging problems. Based on these insights, we propose CodeRM-8B, a lightweight yet effective unit test generator that enables efficient and high-quality unit test scaling. Additionally, we implement a dynamic scaling mechanism that adapts the number of unit tests based on problem difficulty, further improving efficiency. Experimental results show that our approach significantly improves performance across various models on three benchmarks (e.g., with gains of 18.43% for Llama3-8B and 3.42% for GPT-4o-mini on HumanEval Plus).

* Homepage: https://code-reward-model.github.io/

Via

Access Paper or Ask Questions

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Dec 30, 2024

Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li(+11 more)

Abstract:We present a general strategy to aligning visual generation models -- both image and video generation -- with human preference. To start with, we build VisionReward -- a fine-grained and multi-dimensional reward model. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions, linearly weighted and summed to an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction. Based on VisionReward, we develop a multi-objective preference learning algorithm that effectively addresses the issue of confounding factors within preference data. Our approach significantly outperforms existing image and video scoring methods on both machine metrics and human evaluation. All code and datasets are provided at https://github.com/THUDM/VisionReward.

* 27 pages

Via

Access Paper or Ask Questions

OpenAI o1 System Card

Dec 21, 2024

OpenAI, :, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry(+253 more)

Abstract:The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

Via

Access Paper or Ask Questions