Rui Zhao

Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Offline RLHF Methods Need More Accurate Supervision Signals

Aug 18, 2024

Shiqi Wang, Zhengze Zhang, Rui Zhao, Fei Tan, Cam Tu Nguyen

Abstract: With the rapid advances in Large Language Models (LLMs), aligning LLMs with human preferences has become increasingly important. Although Reinforcement Learning with Human Feedback (RLHF) proves effective, it is complicated and highly resource-intensive. As such, offline RLHF has been introduced as an alternative solution, which directly optimizes LLMs with ranking losses on a fixed preference dataset. Current offline RLHF only captures the "ordinal relationship" between responses, overlooking the crucial aspect of "how much" one is preferred over the others. To address this issue, we propose a simple yet effective solution called Reward Difference Optimization, abbreviated as RDO. Specifically, we introduce reward difference coefficients to reweigh sample pairs in offline RLHF. We then develop a difference model involving rich interactions between a pair of responses for predicting these difference coefficients. Experiments with 7B LLMs on the HH and TL;DR datasets substantiate the effectiveness of our method in both automatic metrics and human evaluation, thereby highlighting its potential for aligning LLMs with human intent and values.

* under review
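
The reweighting idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact RDO objective, so this example assumes a plain logistic ranking loss and simply scales each pair's loss by a coefficient; the function name `weighted_pairwise_loss` and its inputs are hypothetical, and the coefficients are taken as given rather than predicted by a difference model.

```python
import math


def weighted_pairwise_loss(margins, coefficients):
    """Mean of c_i * -log(sigmoid(m_i)) over preference pairs.

    margins: score of the preferred response minus the score of the
        rejected response, one value per pair (assumed input).
    coefficients: non-negative weights standing in for reward
        difference coefficients, one value per pair (assumed input).
    """
    if len(margins) != len(coefficients):
        raise ValueError("margins and coefficients must have equal length")
    if not margins:
        raise ValueError("at least one pair is required")
    total = 0.0
    for m, c in zip(margins, coefficients):
        # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for stability.
        if m >= 0:
            nll = math.log1p(math.exp(-m))
        else:
            nll = -m + math.log1p(math.exp(m))
        total += c * nll
    return total / len(margins)


# A pair with a larger coefficient contributes more to the loss.
print(weighted_pairwise_loss([0.5, 0.5], [1.0, 1.0]))
print(weighted_pairwise_loss([0.5, 0.5], [1.0, 3.0]))
```

With all coefficients set to 1 the sketch reduces to an ordinary unweighted ranking loss, which is the "ordinal only" case the abstract contrasts against.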

SimCT: A Simple Consistency Test Protocol in LLMs Development Lifecycle

Jul 24, 2024

Fufangchen Zhao, Guoqiang Jin, Rui Zhao, Jiangheng Huang, Fei Tan

Abstract: In this work, we report our efforts to advance the standard operation procedure of developing Large Language Models (LLMs) or LLMs-based systems or services in industry. We introduce the concept of Large Language Model Development Lifecycle (LDLC) and then highlight the importance of consistency test in ensuring the delivery quality. The principled solution of consistency test, however, is usually overlooked by industrial practitioners and not urgent in academia, and current practical solutions are insufficiently rigorous and labor-intensive. We thus propose a simple yet effective consistency test protocol, named SimCT. SimCT mainly aims to proactively check the consistency across different development stages of "bare metal" LLMs or associated services without accessing the model artifacts, in an attempt to expedite the delivery by reducing the back-and-forth alignment communications among multiple teams involved in different development stages. Specifically, SimCT encompasses response-wise and model-wise tests. We implement the protocol with LightGBM and Student's t-test for two components respectively, and perform extensive experiments to substantiate the effectiveness of SimCT and the involved components.
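
As a rough illustration of the Student's t-test named in the abstract above, the sketch below computes the pooled-variance two-sample t statistic using only the standard library. What SimCT actually feeds into the test is not stated in the abstract, so the two score lists here are assumed placeholders, and `student_t_statistic` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def student_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance (equal-variance assumption).
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("both samples are constant; t is undefined")
    return (mean(sample_a) - mean(sample_b)) / standard_error, dof


# Assumed per-prompt scores from two development stages of the same model.
stage_one = [0.71, 0.69, 0.74, 0.70, 0.72]
stage_two = [0.70, 0.68, 0.75, 0.71, 0.71]
t_value, dof = student_t_statistic(stage_one, stage_two)
print(t_value, dof)
```

A t value near zero is consistent with the two stages behaving alike; turning it into a p-value requires the t distribution, which a statistics library would supply.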

CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models

Jul 24, 2024

Jiawei Gu, Zacc Yang, Chuanghao Ding, Rui Zhao, Fei Tan

Abstract: Large Language Models (LLMs) excel in diverse tasks but often underperform in specialized fields due to limited domain-specific or proprietary corpus. Continual pre-training (CPT) enhances LLM capabilities by imbuing new domain-specific or proprietary knowledge while replaying general corpus to prevent catastrophic forgetting. The data mixture ratio of general corpus and domain-specific corpus, however, has been chosen heuristically, leading to sub-optimal training efficiency in practice. In this context, we attempt to re-visit the scaling behavior of LLMs under the hood of CPT, and discover a power-law relationship between loss, mixture ratio, and training tokens scale. We formalize the trade-off between general and domain-specific capabilities, leading to a well-defined Critical Mixture Ratio (CMR) of general and domain data. By striking the balance, CMR maintains the model's general ability and achieves the desired domain transfer, ensuring the highest utilization of available resources. Therefore, if we value the balance between efficiency and effectiveness, CMR can be considered as the optimal mixture ratio. Through extensive experiments, we ascertain the predictability of CMR, propose the CMR scaling law, and substantiate its generalization. These findings offer practical guidelines for optimizing LLM training in specialized domains, ensuring both general and domain-specific performance while efficiently managing training resources.

GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents

Jul 15, 2024

Haoyuan Jiang, Xuantang Xiong, Ziyue Li, Hangyu Mao, Guanghu Sui, Jingqing Ruan, Yuheng Cheng, Hua Wei, Wolfgang Ketter, Rui Zhao

Abstract: Currently, traffic signal control (TSC) methods based on reinforcement learning (RL) have proven superior to traditional methods. However, most RL methods face difficulties when applied in the real world due to three factors: input, output, and the cycle-flow relation. The industry's observable input is much more limited than that of simulation-based RL methods. For real-world solutions, only flow can be reliably collected, whereas common RL methods need more. For the output action, most RL methods focus on acyclic control, which real-world signal controllers do not support. Most importantly, industry standards require a consistent cycle-flow relationship: non-decreasing and different response strategies for low, medium, and high-level flows, which is ignored by the RL methods. To narrow the gap between RL methods and industry standards, we innovatively propose to use industry solutions to guide the RL agent. Specifically, we design behavior cloning and curriculum learning to guide the agent to mimic and meet industry requirements and, at the same time, leverage the power of exploration and exploitation in RL for better performance. We theoretically prove that such guidance can largely decrease the sample complexity to polynomials in the horizon when searching for an optimal policy. Our rigorous experiments show that our method has good cycle-flow relation and superior performance.

* Under Review of IEEE Transactions on Intelligent Transportation Systems

Eliminating Feature Ambiguity for Few-Shot Segmentation

Jul 13, 2024

Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao

Abstract: Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features, typically based on cross attention, which selectively activates query foreground (FG) features that correspond to the same-class support FG features. However, due to the large receptive fields in deep layers of the backbone, the extracted query and support FG features are inevitably mingled with background (BG) features, impeding the FG-FG matching in cross attention. Hence, the query FG features are fused with fewer support FG features, i.e., the support information is not well utilized. This paper presents a novel plug-in termed ambiguity elimination network (AENet), which can be plugged into any existing cross attention-based FSS methods. The main idea is to mine discriminative query FG regions to rectify the ambiguous FG features, increasing the proportion of FG information, so as to suppress the negative impacts of the doped BG features. In this way, the FG-FG matching is naturally enhanced. We plug AENet into three baselines CyCTR, SCCAN and HDMNet for evaluation, and their scores are improved by large margins, e.g., the 1-shot performance of SCCAN can be improved by 3.0%+ on both PASCAL-5$^i$ and COCO-20$^i$. The code is available at https://github.com/Sam1224/AENet.

* This paper is accepted by ECCV'24

CLEAR: Can Language Models Really Understand Causal Graphs?

Jun 24, 2024

Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu

Abstract: Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we develop a framework to define causal graph understanding, by assessing language models' behaviors through four practical criteria derived from diverse disciplines (e.g., philosophy and psychology). We then develop CLEAR, a novel benchmark that defines three complexity levels and encompasses 20 causal graph-based tasks across these levels. Finally, based on our framework and benchmark, we conduct extensive experiments on six leading language models and summarize five empirical findings. Our results indicate that while language models demonstrate a preliminary understanding of causal graphs, significant potential for improvement remains. Our project website is at https://github.com/OpenCausaLab/CLEAR.

TimeCMA: Towards LLM-Empowered Time Series Forecasting via Cross-Modality Alignment

Jun 03, 2024

Chenxi Liu, Qianxiong Xu, Hao Miao, Sun Yang, Lingzheng Zhang, Cheng Long, Ziyue Li, Rui Zhao

Abstract: The widespread adoption of scalable mobile sensing has led to large amounts of time series data for real-world applications. A fundamental application is multivariate time series forecasting (MTSF), which aims to predict future time series values based on historical observations. Existing MTSF methods suffer from limited parameterization and small-scale training data. Recently, large language models (LLMs) have been introduced in time series, which achieve promising forecasting performance but incur heavy computational costs. To solve these challenges, we propose TimeCMA, an LLM-empowered framework for time series forecasting with cross-modality alignment. We design a dual-modality encoding module with two branches, where the time series encoding branch extracts relatively low-quality yet pure embeddings of time series through an inverted Transformer. In addition, the LLM-empowered encoding branch wraps the same time series as prompts to obtain high-quality yet entangled prompt embeddings via a pre-trained LLM. Then, we design a cross-modality alignment module to retrieve high-quality and pure time series embeddings from the prompt embeddings. Moreover, we develop a time series forecasting module to decode the aligned embeddings while capturing dependencies among multiple variables for forecasting. Notably, we tailor the prompt to encode sufficient temporal information into the last token and design the last token embedding storage to reduce computational costs. Extensive experiments on real data offer insight into the accuracy and efficiency of the proposed framework.

A Vlogger-augmented Graph Neural Network Model for Micro-video Recommendation

May 28, 2024

Weijiang Lai, Beihong Jin, Beibei Li, Yiyuan Zheng, Rui Zhao

Abstract: Existing micro-video recommendation models exploit the interactions between users and micro-videos and/or multi-modal information of micro-videos to predict the next micro-video a user will watch, ignoring the information related to vloggers, i.e., the producers of micro-videos. However, in micro-video scenarios, vloggers play a significant role in user-video interactions, since vloggers generally focus on specific topics and users tend to follow the vloggers they are interested in. Therefore, in this paper, we propose a vlogger-augmented graph neural network model, VA-GNN, which takes the effect of vloggers into consideration. Specifically, we construct a tripartite graph with users, micro-videos, and vloggers as nodes, capturing user preferences from different views, i.e., the video-view and the vlogger-view. Moreover, we conduct cross-view contrastive learning to keep the consistency between node embeddings from the two different views. Besides, when predicting the next user-video interaction, we adaptively combine the user preferences for a video itself and its vlogger. We conduct extensive experiments on two real-world datasets. The experimental results show that VA-GNN outperforms multiple existing GNN-based recommendation models.

* (2023) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (pp. 684-699). Cham: Springer Nature Switzerland

CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

May 27, 2024

Jingqing Ruan, Ziyue Li, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao, Rui Zhao

Abstract: Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, a considerable amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator selection as a second policy to be learned, updated concurrently with the original signal-controlling policy. Specifically, the selection policy adaptively selects the best teammates in real time according to phase- and intersection-level features. Empirical results on both synthetic and real-world datasets provide robust validation for the superiority of our approach, offering significant improvements over existing state-of-the-art methods. The code is available at https://github.com/AnonymousAccountss/CoSLight.

* Accepted by KDD 2024

What Makes Good Few-shot Examples for Vision-Language Models?

May 22, 2024

Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, ZhenXing Qian, Fei Tan

Abstract: Despite the notable advancements achieved by leveraging pre-trained vision-language (VL) models through few-shot tuning for downstream tasks, our detailed empirical study highlights a significant dependence of few-shot learning outcomes on the careful selection of training examples - a facet that has been previously overlooked in research. In this study, we delve into devising more effective strategies for the meticulous selection of few-shot training examples, as opposed to relying on random sampling, to enhance the potential of existing few-shot prompt learning methodologies. To achieve this, we assess the effectiveness of various Active Learning (AL) techniques for instance selection, such as Entropy and Margin of Confidence, within the context of few-shot training. Furthermore, we introduce two innovative selection methods - Representativeness (REPRE) and Gaussian Monte Carlo (Montecarlo) - designed to proactively pinpoint informative examples for labeling in relation to pre-trained VL models. Our findings demonstrate that both REPRE and Montecarlo significantly surpass both random selection and AL-based strategies in few-shot training scenarios. The research also underscores that these instance selection methods are model-agnostic, offering a versatile enhancement to a wide array of few-shot training methodologies.

* 8 pages, 4 figures
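
The two Active Learning baselines named in the abstract above, Entropy and Margin of Confidence, can be sketched as follows. This is a generic textbook-style illustration, not the paper's code: the pool format (a dict from example id to predicted class probabilities) and the function names are assumptions, and the paper's own REPRE and Montecarlo methods are not reproduced here.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin_of_confidence(probs):
    """Gap between the two most likely classes (lower = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_most_uncertain(pool, k, method="entropy"):
    """Pick the k example ids whose predictions are most uncertain.

    pool: dict mapping an example id to its class probabilities (assumed format).
    """
    if method == "entropy":
        ranked = sorted(pool, key=lambda i: entropy(pool[i]), reverse=True)
    elif method == "margin":
        ranked = sorted(pool, key=lambda i: margin_of_confidence(pool[i]))
    else:
        raise ValueError("method must be 'entropy' or 'margin'")
    return ranked[:k]


# Made-up predictions for three unlabeled examples.
pool = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.60, 0.30, 0.10],
}
print(select_most_uncertain(pool, 2, "entropy"))
print(select_most_uncertain(pool, 2, "margin"))
```

Both criteria rank example "b" first here because its prediction is closest to uniform; on other pools the two orderings can differ.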
