Dan Zhang

ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Jan 03, 2025

Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang

Abstract:Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. For instance, SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. In practice, access to gradient information is not always granted (the gradient ban), for example with black-box APIs, hardware limitations, and non-differentiable systems. To bridge this gap, we introduce ZeroFlow, the first benchmark to evaluate gradient-free optimization algorithms for overcoming forgetting. This benchmark examines a suite of forward-pass methods across multiple methods, forgetting scenarios, and datasets. We find that forward passes alone are enough to overcome forgetting. Our findings reveal new optimization principles that highlight the potential of forward-pass methods in mitigating forgetting, managing task conflicts, and reducing memory demands, alongside novel enhancements that further mitigate forgetting with just one forward pass. This work provides essential insights and tools for advancing forward-pass methods to overcome forgetting.
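
To make the idea of gradient-free, forward-pass-only weight updates concrete, here is a minimal sketch of a generic two-point zeroth-order gradient estimate. It is not taken from the ZeroFlow benchmark: the toy `loss` function, the names `zo_gradient` and `zo_sgd`, and all hyperparameters are assumptions chosen for illustration.

```python
import random

def loss(w):
    # Hypothetical stand-in for a model's forward pass: a quadratic
    # whose minimum is at w = [1.0, -2.0].
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def zo_gradient(f, w, eps=1e-3, rng=random):
    # Two-point zeroth-order estimate: perturb the weights along one random
    # +/-1 direction and difference two forward passes. No backpropagation.
    z = [rng.choice((-1.0, 1.0)) for _ in w]
    w_plus = [wi + eps * zi for wi, zi in zip(w, z)]
    w_minus = [wi - eps * zi for wi, zi in zip(w, z)]
    scale = (f(w_plus) - f(w_minus)) / (2.0 * eps)
    return [scale * zi for zi in z]

def zo_sgd(f, w, lr=0.05, steps=500, seed=0):
    # Plain SGD-style update driven by the forward-pass-only estimate.
    rng = random.Random(seed)
    for _ in range(steps):
        g = zo_gradient(f, w, rng=rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_final = zo_sgd(loss, [0.0, 0.0])
```

Each step costs two evaluations of `loss` and stores no activations for a backward pass, which is the general reason forward-pass methods can lower memory demands.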

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Dec 16, 2024

Jiale Cheng, Xiao Liu, Cunxiang Wang, Xiaotao Gu, Yida Lu, Dan Zhang, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang

Abstract:Instruction-following is a fundamental capability of language models, requiring the model to recognize even the most subtle requirements in the instructions and accurately reflect them in its output. Such an ability is well-suited for and often optimized by preference learning. However, existing methods often directly sample multiple independent responses from the model when creating preference pairs. Such practice can introduce content variations irrelevant to whether the instruction is precisely followed (e.g., different expressions of the same meaning), interfering with the goal of teaching models to recognize the key differences that lead to improved instruction following. In light of this, we introduce SPaR, a self-play framework integrating tree-search self-refinement to yield valid and comparable preference pairs free from distractions. By playing against itself, an LLM employs a tree-search strategy to refine its previous responses with respect to the instruction while minimizing unnecessary variations. Our experiments show that a LLaMA3-8B model, trained over three iterations guided by SPaR, surpasses GPT-4-Turbo on the IFEval benchmark without losing general capabilities. Furthermore, SPaR demonstrates promising scalability and transferability, greatly enhancing models like GLM-4-9B and LLaMA3-70B. We also identify how inference scaling in tree search would impact model performance. Our code and data are publicly available at https://github.com/thu-coai/SPaR.

Object-Focused Data Selection for Dense Prediction Tasks

Dec 13, 2024

Niclas Popp, Dan Zhang, Jan Hendrik Metzen, Matthias Hein, Lukas Schott

Abstract:Dense prediction tasks such as object detection and segmentation require high-quality labels at pixel level, which are costly to obtain. Recent advances in foundation models have enabled the generation of autolabels, which we find to be competitive but not yet sufficient to fully replace human annotations, especially for more complex datasets. Thus, we consider the challenge of selecting a representative subset of images for labeling from a large pool of unlabeled images under a constrained annotation budget. This task is further complicated by imbalanced class distributions, as rare classes are often underrepresented in selected subsets. We propose object-focused data selection (OFDS) which leverages object-level representations to ensure that the selected image subsets semantically cover the target classes, including rare ones. We validate OFDS on PASCAL VOC and Cityscapes for object detection and semantic segmentation tasks. Our experiments demonstrate that prior methods which employ image-level representations fail to consistently outperform random selection. In contrast, OFDS consistently achieves state-of-the-art performance with substantial improvements over all baselines in scenarios with imbalanced class distributions. Moreover, we demonstrate that pre-training with autolabels on the full datasets before fine-tuning on human-labeled subsets selected by OFDS further enhances the final performance.

Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates

Dec 10, 2024

Li Shi, Houjiang Liu, Yian Wong, Utkarsh Mujumdar, Dan Zhang, Jacek Gwizdka, Matthew Lease

Abstract:Large language models (LLMs) are enabling designers to give life to exciting new user experiences for information access. In this work, we present a system that generates LLM personas to debate a topic of interest from different perspectives. How might information seekers use and benefit from such a system? Can centering information access around diverse viewpoints help to mitigate thorny challenges like confirmation bias in which information seekers over-trust search results matching existing beliefs? How do potential biases and hallucinations in LLMs play out alongside human users who are also fallible and possibly biased? Our study exposes participants to multiple viewpoints on controversial issues via a mixed-methods, within-subjects study. We use eye-tracking metrics to quantitatively assess cognitive engagement alongside qualitative feedback. Compared to a baseline search system, we see more creative interactions and diverse information-seeking with our multi-persona debate system, which more effectively reduces user confirmation bias and conviction toward their initial beliefs. Overall, our study contributes to the emerging design space of LLM-based information access systems, specifically investigating the potential of simulated personas to promote greater exposure to information diversity, emulate collective intelligence, and mitigate bias in information seeking.

UNCOVER: Unknown Class Object Detection for Autonomous Vehicles in Real-time

Dec 05, 2024

Lars Schmarje, Kaspar Sakman, Reinhard Koch, Dan Zhang

Abstract:Autonomous driving (AD) operates in open-world scenarios, where encountering unknown objects is inevitable. However, standard object detectors trained on a limited number of base classes tend to ignore any unknown objects, posing potential risks on the road. To address this, it is important to learn a generic rather than a class specific objectness from objects seen during training. We therefore introduce an occupancy prediction together with bounding box regression. It learns to score the objectness by calculating the ratio of the predicted area occupied by actual objects. To enhance its generalizability, we increase the object diversity by exploiting data from other domains via Mosaic and Mixup augmentation. The objects outside the AD training classes are classified as a newly added out-of-distribution (OOD) class. Our solution UNCOVER, for UNknown Class Object detection for autonomous VEhicles in Real-time, excels at achieving both real-time detection and high recall of unknown objects on challenging AD benchmarks. To further attain very low false positive rates, particularly for close objects, we introduce a post-hoc filtering step that utilizes geometric cues extracted from the depth map, typically available within the AD system.
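
The occupancy-ratio idea can be illustrated with a small sketch. It assumes a grid of per-cell occupancy probabilities and an integer-aligned box, and it thresholds cells at 0.5. The function name `occupancy_objectness`, the grid representation, and the threshold are all hypothetical and are not described in the abstract.

```python
def occupancy_objectness(occupancy, box, threshold=0.5):
    # occupancy: 2D list (rows x cols) of predicted occupancy probabilities.
    # box: (x0, y0, x1, y1) in cell coordinates, with x1 and y1 exclusive.
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    # Count cells inside the box that are predicted to be occupied.
    occupied = sum(
        1
        for y in range(y0, y1)
        for x in range(x0, x1)
        if occupancy[y][x] >= threshold
    )
    # Objectness score: fraction of the box area covered by predicted objects.
    return occupied / area

# Toy 4x4 prediction with a 2x2 occupied patch in the centre.
grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0],
    [0.0, 0.9, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
tight = occupancy_objectness(grid, (1, 1, 3, 3))  # box hugging the patch
loose = occupancy_objectness(grid, (0, 0, 4, 4))  # box covering the whole grid
```

A box that tightly encloses the occupied region scores higher than a loose one, regardless of which class the object belongs to.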

Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models

Nov 26, 2024

Peng Cui, Guande He, Dan Zhang, Zhijie Deng, Yinpeng Dong, Jun Zhu

Abstract:Datasets collected from the open world unavoidably suffer from various forms of randomness or noisiness, leading to the ubiquity of aleatoric (data) uncertainty. Quantifying such uncertainty is particularly pivotal for object detection, where images contain multi-scale objects with occlusion, obscureness, and even noisy annotations, in contrast to images with centric and similar-scale objects in classification. This paper suggests modeling and exploiting the uncertainty inherent in object detection data with vision foundation models and develops a data-centric reliable training paradigm. Technically, we propose to estimate the data uncertainty of each object instance based on the feature space of vision foundation models, which are trained on ultra-large-scale datasets and able to exhibit universal data representation. In particular, we assume a mixture-of-Gaussians structure of the object features and devise Mahalanobis distance-based measures to quantify the data uncertainty. Furthermore, we suggest two crucial and practical usages of the estimated uncertainty: 1) defining an uncertainty-aware sample filter to abandon noisy and redundant instances to avoid over-fitting, and 2) defining a sample-adaptive regularizer to balance easy/hard samples for adaptive training. The estimated aleatoric uncertainty serves as an extra level of annotations of the dataset, so it can be utilized in a plug-and-play manner with any model. Extensive empirical studies verify the effectiveness of the proposed aleatoric uncertainty measure on various advanced detection models and challenging benchmarks.
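
For reference, the standard Mahalanobis distance of a feature vector to one Gaussian component is given below. The symbols are notation assumed here, not taken from the paper: $x$ is an object feature, and $\mu_k$ and $\Sigma_k$ are the mean and covariance of the $k$-th component. The paper's actual uncertainty measures may combine or rescale this quantity differently.

```latex
d_M(x;\, \mu_k, \Sigma_k) = \sqrt{(x - \mu_k)^{\top} \, \Sigma_k^{-1} \, (x - \mu_k)}
```

Unlike Euclidean distance, this measure rescales each direction by the component's covariance, so a feature far from the mean along a low-variance direction counts as more atypical.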

Snippet-based Conversational Recommender System

Nov 09, 2024

Haibo Sun, Naoki Otani, Hannah Kim, Dan Zhang, Nikita Bhutani

Abstract:Conversational Recommender Systems (CRS) engage users in interactive dialogues to gather preferences and provide personalized recommendations. Traditionally, CRS rely on pre-defined attributes or expensive, domain-specific annotated datasets to guide conversations, which limits flexibility and adaptability across domains. In this work, we introduce SnipRec, a novel CRS that enhances dialogues and recommendations by extracting diverse expressions and preferences from user-generated content (UGC) like customer reviews. Using large language models, SnipRec maps user responses and UGC to concise snippets, which are used to generate clarification questions and retrieve relevant items. Our approach eliminates the need for domain-specific training, making it adaptable to new domains and effective without prior knowledge of user preferences. Extensive experiments on the Yelp dataset demonstrate the effectiveness of snippet-based representations against document and sentence-based representations. Additionally, SnipRec is able to improve Hits@10 by 0.25 over the course of five conversational turns, underscoring the efficiency of SnipRec in capturing user preferences through multi-turn conversations.
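
The Hits@10 metric mentioned above is, in its usual definition, the fraction of test cases whose target item appears among the top 10 retrieved items. A minimal sketch follows. The function name `hits_at_k` and the toy data are illustrative assumptions, and the paper's exact evaluation protocol is not specified here.

```python
def hits_at_k(ranked_lists, targets, k=10):
    # ranked_lists: one ranked list of item ids per test case, best first.
    # targets: the ground-truth item id for each test case.
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, target in zip(ranked_lists, targets)
        if target in ranked[:k]
    )
    return hits / len(ranked_lists)

# Toy example: two users, a cutoff of 2 instead of 10 for brevity.
ranked = [["a", "b", "c"], ["d", "e", "f"]]
truth = ["b", "z"]
score = hits_at_k(ranked, truth, k=2)
```

Under this definition, an improvement of 0.25 means the target item enters the top 10 for an additional quarter of the test cases.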

FactLens: Benchmarking Fine-Grained Fact Verification

Nov 08, 2024

Kushan Mitra, Dan Zhang, Sajjadur Rahman, Estevam Hruschka

Abstract:Large Language Models (LLMs) have shown impressive capability in language generation and understanding, but their tendency to hallucinate and produce factually incorrect information remains a key limitation. To verify LLM-generated contents and claims from other sources, traditional verification approaches often rely on holistic models that assign a single factuality label to complex claims, potentially obscuring nuanced errors. In this paper, we advocate for a shift toward fine-grained verification, where complex claims are broken down into smaller sub-claims for individual verification, allowing for more precise identification of inaccuracies, improved transparency, and reduced ambiguity in evidence retrieval. However, generating sub-claims poses challenges, such as maintaining context and ensuring semantic equivalence with respect to the original claim. We introduce FactLens, a benchmark for evaluating fine-grained fact verification, with metrics and automated evaluators of sub-claim quality. The benchmark data is manually curated to ensure high-quality ground truth. Our results show alignment between automated FactLens evaluators and human judgments, and we discuss the impact of sub-claim characteristics on the overall verification performance.

* 12 pages, under review

Dynamic-Attention-based EEG State Transition Modeling for Emotion Recognition

Nov 07, 2024

Xinke Shen, Runmin Gan, Kaixuan Wang, Shuyi Yang, Qingzhu Zhang, Quanying Liu, Dan Zhang, Sen Song

Abstract:Electroencephalogram (EEG)-based emotion decoding can objectively quantify people's emotional state and has broad application prospects in human-computer interaction and early detection of emotional disorders. Recently emerging deep learning architectures have significantly improved the performance of EEG emotion decoding. However, existing methods still fall short of fully capturing the complex spatiotemporal dynamics of neural signals, which are crucial for representing emotion processing. This study proposes a Dynamic-Attention-based EEG State Transition (DAEST) modeling method to characterize EEG spatiotemporal dynamics. The model extracts spatiotemporal components of EEG that represent multiple parallel neural processes and estimates dynamic attention weights on these components to capture transitions in brain states. The model is optimized within a contrastive learning framework for cross-subject emotion recognition. The proposed method achieved state-of-the-art performance on three publicly available datasets: FACED, SEED, and SEED-V. It achieved 75.4% accuracy in the binary classification of positive and negative emotions and 59.3% in nine-class discrete emotion classification on the FACED dataset, 88.1% in the three-class classification of positive, negative, and neutral emotions on the SEED dataset, and 73.6% in five-class discrete emotion classification on the SEED-V dataset. The learned EEG spatiotemporal patterns and dynamic transition properties offer valuable insights into neural dynamics underlying emotion processing.

* 14 pages, 6 figures

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Oct 31, 2024

Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, Yuxiao Dong

Abstract:Autonomous agents have become increasingly important for interacting with the real world. Android agents, in particular, have recently become a frequently mentioned interaction method. However, existing studies for training and evaluating Android agents lack systematic research on both open-source and closed-source models. In this work, we propose AndroidLab as a systematic Android agent framework. It includes an operation environment with different modalities, action space, and a reproducible benchmark. It supports both large language models (LLMs) and multimodal models (LMMs) in the same action space. The AndroidLab benchmark includes predefined Android virtual devices and 138 tasks across nine apps built on these devices. By using the AndroidLab environment, we develop an Android Instruction dataset and train six open-source LLMs and LMMs, lifting the average success rates from 4.59% to 21.50% for LLMs and from 1.93% to 13.28% for LMMs. AndroidLab is open-sourced and publicly available at https://github.com/THUDM/Android-Lab.
